OpenAlex Citation Counts

OpenAlex is an open-access bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you find this listing of citing articles useful!

Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking a citation count opens this same listing for that article. Lastly, basic pagination options are at the bottom of the page.

Requested Article:

VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang, Xiujun Li, Xiaowei Hu, et al.
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 5575-5584
Open Access | Times Cited: 662

Showing 26-50 of 662 citing articles:

Prompting Large Language Models with Answer Heuristics for Knowledge-Based Visual Question Answering
Zhenwei Shao, Yu Zhou, Meng Wang, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 14974-14983
Closed Access | Times Cited: 96

Generalized Decoding for Pixel, Image, and Language
Xueyan Zou, Zi-Yi Dou, Jianwei Yang, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 15116-15127
Open Access | Times Cited: 95

Large-scale Multi-modal Pre-trained Models: A Comprehensive Survey
Xiao Wang, Guangyao Chen, Guangwu Qian, et al.
Deleted Journal (2023) Vol. 20, Iss. 4, pp. 447-482
Open Access | Times Cited: 92

A Survey on Long-Tailed Visual Recognition
Lu Yang, He Jiang, Qing Song, et al.
International Journal of Computer Vision (2022) Vol. 130, Iss. 7, pp. 1837-1872
Closed Access | Times Cited: 90

Everything at Once – Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova, Brian Chen, Andrew Rouditchenko, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 90

ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel, Yoav Shalev, Idan Schwartz, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 17897-17907
Open Access | Times Cited: 85

General Facial Representation Learning in a Visual-Linguistic Manner
Yinglin Zheng, Hao Yang, Ting Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 18676-18688
Open Access | Times Cited: 82

CLIP-Event: Connecting Text and Images with Event Structures
Manling Li, Ruochen Xu, Shuohang Wang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 16399-16408
Open Access | Times Cited: 80

A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
Yutong Chen, Fangyun Wei, Xiao Sun, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 78

GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features
Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani
Lecture Notes in Computer Science (2022), pp. 167-184
Open Access | Times Cited: 75

VLT: Vision-Language Transformer and Query Generation for Referring Segmentation
Henghui Ding, Chang Liu, Suchen Wang, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2022) Vol. 45, Iss. 6, pp. 7900-7916
Open Access | Times Cited: 73

ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
Sanjay Subramanian, William Merrill, Trevor Darrell, et al.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2022), pp. 5198-5215
Open Access | Times Cited: 70

See Finer, See More: Implicit Modality Alignment for Text-Based Person Retrieval
Xiujun Shu, Wei Wen, Haoqian Wu, et al.
Lecture Notes in Computer Science (2023), pp. 624-641
Closed Access | Times Cited: 68

Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi, Hamid Reza Pourreza, Hamidreza Mahyar
ACM Computing Surveys (2023) Vol. 56, Iss. 3, pp. 1-39
Open Access | Times Cited: 67

From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models
Jiaxian Guo, Junnan Li, Dongxu Li, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 10867-10877
Closed Access | Times Cited: 59

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Jiang Liu, Hui Ding, Zhaowei Cai, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 18653-18663
Open Access | Times Cited: 53

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li, Zhe Gan, Kevin Lin, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 23119-23129
Open Access | Times Cited: 44

NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario
Tianwen Qian, Jingjing Chen, Linhai Zhuo, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2024) Vol. 38, Iss. 5, pp. 4542-4550
Open Access | Times Cited: 33

MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Yue Xiang, Yuansheng Ni, Tianyu Zheng, et al.
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024), pp. 9556-9567
Closed Access | Times Cited: 31

CPT: Color-based Prompt Tuning for pre-trained vision-language models
Yuan Yao, Ao Zhang, Zhengyan Zhang, et al.
AI Open (2024)
Open Access | Times Cited: 28

Bootstrapping Interactive Image–Text Alignment for Remote Sensing Image Captioning
Cong Yang, Zuchao Li, Lefei Zhang
IEEE Transactions on Geoscience and Remote Sensing (2024) Vol. 62, pp. 1-12
Open Access | Times Cited: 16

FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein, David Bensaïd, Shaked Brody, et al.
2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024), pp. 5677-5688
Open Access | Times Cited: 15

Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng, Xinsong Zhang, Hang Li
arXiv (Cornell University) (2021)
Open Access | Times Cited: 89

CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao, Ao Zhang, Zhengyan Zhang, et al.
arXiv (Cornell University) (2021)
Open Access | Times Cited: 84

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala, Marimuthu Kalimuthu, Dietrich Klakow
Journal of Artificial Intelligence Research (2021) Vol. 71, pp. 1183-1317
Open Access | Times Cited: 69

Previous Page - Page 2 - Next Page
