
OpenAlex is an open-access bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!
Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking a citation count opens this same listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.
Requested Article:
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang, Xiujun Li, Xiaowei Hu, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 5575-5584
Open Access | Times Cited: 662
Showing 26-50 of 662 citing articles:
Prompting Large Language Models with Answer Heuristics for Knowledge-Based Visual Question Answering
Zhenwei Shao, Yu Zhou, Meng Wang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 14974-14983
Closed Access | Times Cited: 96
Generalized Decoding for Pixel, Image, and Language
Xueyan Zou, Zi-Yi Dou, Jianwei Yang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 15116-15127
Open Access | Times Cited: 95
Large-scale Multi-modal Pre-trained Models: A Comprehensive Survey
Xiao Wang, Guangyao Chen, Guangwu Qian, et al.
Deleted Journal (2023) Vol. 20, Iss. 4, pp. 447-482
Open Access | Times Cited: 92
A Survey on Long-Tailed Visual Recognition
Lu Yang, He Jiang, Qing Song, et al.
International Journal of Computer Vision (2022) Vol. 130, Iss. 7, pp. 1837-1872
Closed Access | Times Cited: 90
Everything at Once – Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova, Brian Chen, Andrew Rouditchenko, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 90
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel, Yoav Shalev, Idan Schwartz, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 17897-17907
Open Access | Times Cited: 85
General Facial Representation Learning in a Visual-Linguistic Manner
Yinglin Zheng, Hao Yang, Ting Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 18676-18688
Open Access | Times Cited: 82
CLIP-Event: Connecting Text and Images with Event Structures
Manling Li, Ruochen Xu, Shuohang Wang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 16399-16408
Open Access | Times Cited: 80
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
Yutong Chen, Fangyun Wei, Xiao Sun, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 78
GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features
Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani
Lecture notes in computer science (2022), pp. 167-184
Open Access | Times Cited: 75
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation
Henghui Ding, Chang Liu, Suchen Wang, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2022) Vol. 45, Iss. 6, pp. 7900-7916
Open Access | Times Cited: 73
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
Sanjay Subramanian, William Merrill, Trevor Darrell, et al.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2022), pp. 5198-5215
Open Access | Times Cited: 70
See Finer, See More: Implicit Modality Alignment for Text-Based Person Retrieval
Xiujun Shu, Wei Wen, Haoqian Wu, et al.
Lecture notes in computer science (2023), pp. 624-641
Closed Access | Times Cited: 68
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi, Hamid Reza Pourreza, Hamidreza Mahyar
ACM Computing Surveys (2023) Vol. 56, Iss. 3, pp. 1-39
Open Access | Times Cited: 67
From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models
Jiaxian Guo, Junnan Li, Dongxu Li, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 10867-10877
Closed Access | Times Cited: 59
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Jiang Liu, Hui Ding, Zhaowei Cai, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 18653-18663
Open Access | Times Cited: 53
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li, Zhe Gan, Kevin Lin, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 23119-23129
Open Access | Times Cited: 44
NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario
Tianwen Qian, Jingjing Chen, Linhai Zhuo, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2024) Vol. 38, Iss. 5, pp. 4542-4550
Open Access | Times Cited: 33
MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Yue Xiang, Yuansheng Ni, Tianyu Zheng, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 32, pp. 9556-9567
Closed Access | Times Cited: 31
CPT: Color-based Prompt Tuning for pre-trained vision-language models
Yuan Yao, Ao Zhang, Zhengyan Zhang, et al.
AI Open (2024)
Open Access | Times Cited: 28
Bootstrapping Interactive Image–Text Alignment for Remote Sensing Image Captioning
Cong Yang, Zuchao Li, Lefei Zhang
IEEE Transactions on Geoscience and Remote Sensing (2024) Vol. 62, pp. 1-12
Open Access | Times Cited: 16
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein, David Bensaïd, Shaked Brody, et al.
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024), pp. 5677-5688
Open Access | Times Cited: 15
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng, Xinsong Zhang, Hang Li
arXiv (Cornell University) (2021)
Open Access | Times Cited: 89
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao, Ao Zhang, Zhengyan Zhang, et al.
arXiv (Cornell University) (2021)
Open Access | Times Cited: 84
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala, Marimuthu Kalimuthu, Dietrich Klakow
Journal of Artificial Intelligence Research (2021) Vol. 71, pp. 1183-1317
Open Access | Times Cited: 69