OpenAlex Citation Counts


OpenAlex is an open-access bibliographic catalogue of scientific papers, authors, and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you find this listing of citing articles useful!

Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking a citation count opens this listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.
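For readers who would rather fetch such a listing programmatically, here is a minimal sketch of how the request and the pagination arithmetic might look. It assumes the public OpenAlex works endpoint with its `cites:` filter, `cited_by_count:desc` sort, and `per-page`/`page` parameters; whether this page is built exactly this way is an assumption. The work ID `W0000000000` is a hypothetical placeholder, and the helper names are invented for illustration.

```python
import math
from urllib.parse import urlencode

# Public OpenAlex works endpoint (assumed to be the data source for listings like this one).
OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def citing_works_url(work_id: str, page: int = 1, per_page: int = 25) -> str:
    """Build a URL that lists works citing `work_id`, most cited first."""
    params = {
        "filter": f"cites:{work_id}",
        "sort": "cited_by_count:desc",
        "per-page": per_page,
        "page": page,
    }
    # Keep ':' unescaped so the filter and sort values stay readable.
    return f"{OPENALEX_WORKS_URL}?{urlencode(params, safe=':')}"


def page_count(total: int, per_page: int = 25) -> int:
    """Number of pages needed to show `total` citing articles."""
    return math.ceil(total / per_page)


def page_range(page: int, total: int, per_page: int = 25) -> tuple[int, int]:
    """1-based positions of the first and last item shown on `page`."""
    first = (page - 1) * per_page + 1
    last = min(page * per_page, total)
    return first, last


if __name__ == "__main__":
    # Hypothetical placeholder ID; substitute the real OpenAlex ID of the requested article.
    print(citing_works_url("W0000000000", page=1))
    print(page_count(659))      # pages needed for 659 citing articles
    print(page_range(1, 659))   # items shown on the first page
```

With 659 citing articles at 25 per page, `page_count` gives 27 pages, and the final page holds items 651 through 659.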

Requested Article:

VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang, Xiujun Li, Xiaowei Hu, et al.
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 5575-5584
Open Access | Times Cited: 659

Showing 1-25 of 659 citing articles:

Transformers in Vision: A Survey
Salman Khan, Muzammal Naseer, Munawar Hayat, et al.
ACM Computing Surveys (2022) Vol. 54, Iss. 10s, pp. 1-41
Open Access | Times Cited: 1896

Grounded Language-Image Pre-training
Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 10955-10965
Open Access | Times Cited: 461

Multimodal Learning With Transformers: A Survey
Peng Xu, Xiatian Zhu, David A. Clifton
IEEE Transactions on Pattern Analysis and Machine Intelligence (2023) Vol. 45, Iss. 10, pp. 12113-12132
Open Access | Times Cited: 325

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang, Jiahui Yu, Adams Wei Yu, et al.
arXiv (Cornell University) (2021)
Open Access | Times Cited: 300

FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 15617-15629
Open Access | Times Cited: 294

RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 16772-16782
Open Access | Times Cited: 278

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
Pengchuan Zhang, Xiyang Dai, Jianwei Yang, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021), pp. 2978-2988
Open Access | Times Cited: 258

Image as a Foreign Language: BEIT Pretraining for Vision and Vision-Language Tasks
Wenhui Wang, Hangbo Bao, Dong Li, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 19175-19186
Closed Access | Times Cited: 254

A Survey of Visual Transformers
Yang Liu, Yao Zhang, Yixin Wang, et al.
IEEE Transactions on Neural Networks and Learning Systems (2023) Vol. 35, Iss. 6, pp. 7478-7498
Open Access | Times Cited: 244

An Empirical Study of Training End-to-End Vision-and-Language Transformers
Zi-Yi Dou, Yichong Xu, Zhe Gan, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 216

From Show to Tell: A Survey on Deep Learning-Based Image Captioning
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2022) Vol. 45, Iss. 1, pp. 539-559
Open Access | Times Cited: 210

Vision-Language Pre-Training with Triple Contrastive Learning
Jinyu Yang, Jiali Duan, Son N. Tran, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 15650-15659
Open Access | Times Cited: 181

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Wenhui Wang, Hangbo Bao, Dong Li, et al.
arXiv (Cornell University) (2021)
Open Access | Times Cited: 173

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang, Zhe Gan, Jianfeng Wang, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2022) Vol. 36, Iss. 3, pp. 3081-3089
Open Access | Times Cited: 152

How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen, Liunian Harold Li, Hao Tan, et al.
arXiv (Cornell University) (2021)
Open Access | Times Cited: 146

Scaling Up Vision-Language Pretraining for Image Captioning
Xiaowei Hu, Zhe Gan, Jianfeng Wang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 145

VLP: A Survey on Vision-language Pre-training
Feilong Chen, Duzhen Zhang, Minglun Han, et al.
Machine Intelligence Research (2023) Vol. 20, Iss. 1, pp. 38-56
Open Access | Times Cited: 128

Transformers in Vision: A Survey
Salman Khan, Muzammal Naseer, Munawar Hayat, et al.
arXiv (Cornell University) (2021)
Closed Access | Times Cited: 125

Unified Contrastive Learning in Image-Text-Label Space
Jianwei Yang, Chunyuan Li, Pengchuan Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 19141-19151
Open Access | Times Cited: 123

Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li, Yifan Du, Kun Zhou, et al.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023)
Open Access | Times Cited: 123

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Tristan Thrush, Ryan Jiang, Max Bartolo, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 114

VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
Jun Chen, Han Guo, Kai Yi, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 18009-18019
Open Access | Times Cited: 98

A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge
Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, et al.
Lecture Notes in Computer Science (2022), pp. 146-162
Closed Access | Times Cited: 98

MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers, Jiasen Lu, Ximing Lu, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 16354-16366
Open Access | Times Cited: 98

Towards Language-Free Training for Text-to-Image Generation
Yufan Zhou, Ruiyi Zhang, Changyou Chen, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 17886-17896
Closed Access | Times Cited: 97

Page 1 - Next Page
