
OpenAlex is an open-access bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!
If you click the article title, you'll navigate to the article, as listed in CrossRef. If you click the Open Access links, you'll navigate to the "best Open Access location". Clicking the citation count will open this listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.
Requested Article:
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang, Xiujun Li, Xiaowei Hu, et al.
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 5575-5584
Open Access | Times Cited: 659
Showing 1-25 of 659 citing articles:
Transformers in Vision: A Survey
Salman Khan, Muzammal Naseer, Munawar Hayat, et al.
ACM Computing Surveys (2022) Vol. 54, Iss. 10s, pp. 1-41
Open Access | Times Cited: 1896
Grounded Language-Image Pre-training
Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 10955-10965
Open Access | Times Cited: 461
Multimodal Learning With Transformers: A Survey
Peng Xu, Xiatian Zhu, David A. Clifton
IEEE Transactions on Pattern Analysis and Machine Intelligence (2023) Vol. 45, Iss. 10, pp. 12113-12132
Open Access | Times Cited: 325
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang, Jiahui Yu, Adams Wei Yu, et al.
arXiv (Cornell University) (2021)
Open Access | Times Cited: 300
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 15617-15629
Open Access | Times Cited: 294
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 16772-16782
Open Access | Times Cited: 278
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
Pengchuan Zhang, Xiyang Dai, Jianwei Yang, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021), pp. 2978-2988
Open Access | Times Cited: 258
Image as a Foreign Language: BEIT Pretraining for Vision and Vision-Language Tasks
Wenhui Wang, Hangbo Bao, Dong Li, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 19175-19186
Closed Access | Times Cited: 254
A Survey of Visual Transformers
Yang Liu, Yao Zhang, Yixin Wang, et al.
IEEE Transactions on Neural Networks and Learning Systems (2023) Vol. 35, Iss. 6, pp. 7478-7498
Open Access | Times Cited: 244
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Zi-Yi Dou, Yichong Xu, Zhe Gan, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 216
From Show to Tell: A Survey on Deep Learning-Based Image Captioning
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2022) Vol. 45, Iss. 1, pp. 539-559
Open Access | Times Cited: 210
Vision-Language Pre-Training with Triple Contrastive Learning
Jinyu Yang, Jiali Duan, Son N. Tran, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 15650-15659
Open Access | Times Cited: 181
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Wenhui Wang, Hangbo Bao, Dong Li, et al.
arXiv (Cornell University) (2021)
Open Access | Times Cited: 173
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang, Zhe Gan, Jianfeng Wang, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2022) Vol. 36, Iss. 3, pp. 3081-3089
Open Access | Times Cited: 152
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen, Liunian Harold Li, Hao Tan, et al.
arXiv (Cornell University) (2021)
Open Access | Times Cited: 146
Scaling Up Vision-Language Pretraining for Image Captioning
Xiaowei Hu, Zhe Gan, Jianfeng Wang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 145
VLP: A Survey on Vision-language Pre-training
Feilong Chen, Duzhen Zhang, Minglun Han, et al.
Machine Intelligence Research (2023) Vol. 20, Iss. 1, pp. 38-56
Open Access | Times Cited: 128
Transformers in Vision: A Survey
Salman Khan, Muzammal Naseer, Munawar Hayat, et al.
arXiv (Cornell University) (2021)
Closed Access | Times Cited: 125
Unified Contrastive Learning in Image-Text-Label Space
Jianwei Yang, Chunyuan Li, Pengchuan Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 19141-19151
Open Access | Times Cited: 123
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li, Yifan Du, Kun Zhou, et al.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023)
Open Access | Times Cited: 123
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Tristan Thrush, Ryan Jiang, Max Bartolo, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 114
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
Jun Chen, Han Guo, Kai Yi, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 18009-18019
Open Access | Times Cited: 98
A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge
Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, et al.
Lecture notes in computer science (2022), pp. 146-162
Closed Access | Times Cited: 98
MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers, Jiasen Lu, Ximing Lu, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 16354-16366
Open Access | Times Cited: 98
Towards Language-Free Training for Text-to-Image Generation
Yufan Zhou, Ruiyi Zhang, Changyou Chen, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 17886-17896
Closed Access | Times Cited: 97