OpenAlex Citation Counts

OpenAlex is an openly accessible bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!

If you click the article title, you'll navigate to the article, as listed in CrossRef. If you click the Open Access links, you'll navigate to the "best Open Access location". Clicking the citation count will open this listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.

Requested Article:

Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng, Xinsong Zhang, Hang Li
arXiv (Cornell University) (2021)
Open Access | Times Cited: 89

Showing 1-25 of 89 citing articles:

ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís, Sachit Menon, Carl Vondrick
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 11854-11864
Open Access | Times Cited: 84

ULIP-2: Towards Scalable Multimodal Pre-Training for 3D Understanding
Le Xue, Ning Yu, Shu Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 34, pp. 27081-27091
Closed Access | Times Cited: 33

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng, Hang Zhang, Guanzheng Chen, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 35, pp. 13872-13882
Closed Access | Times Cited: 15

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision
Jilan Xu, Junlin Hou, Yuejie Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 39

VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng, Xizi Wang, Jie Lei, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 38

Robust Multi-Drone Multi-Target Tracking to Resolve Target Occlusion: A Benchmark
Zhihao Liu, Yuanyuan Shang, Timing Li, et al.
IEEE Transactions on Multimedia (2023) Vol. 25, pp. 1462-1476
Closed Access | Times Cited: 35

BridgeTower: Building Bridges between Encoders in Vision-Language Representation Learning
Xu Xiao, Chenfei Wu, Shachar Rosenman, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2023) Vol. 37, Iss. 9, pp. 10637-10647
Open Access | Times Cited: 32

Detecting and Grounding Multi-Modal Media Manipulation
Rui Shao, Tianxing Wu, Ziwei Liu
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 29

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Chaoya Jiang, Haiyang Xu, Mengfan Dong, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 35, pp. 27026-27036
Closed Access | Times Cited: 12

HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh, Rohit Girdhar, Lorenzo Torresani, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) Vol. 114, pp. 23066-23078
Open Access | Times Cited: 20

Position-Guided Text Prompt for Vision-Language Pre-Training
Jinpeng Wang, Pan Zhou, Mike Zheng Shou, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 19

HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo, Zsolt Kira
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 11039-11049
Open Access | Times Cited: 18

Transferable Multimodal Attack on Vision-Language Pre-training Models
Haodi Wang, Kai Dong, Zhilei Zhu, et al.
2022 IEEE Symposium on Security and Privacy (SP) (2024) Vol. 34, pp. 1722-1740
Closed Access | Times Cited: 6

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Jack Urbanek, Florian Bordes, Pietro Astolfi, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024), pp. 26690-26699
Closed Access | Times Cited: 6

A Simple Framework for Text-Supervised Semantic Segmentation
Muyang Yi, Quan Cui, Hao Wu, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Closed Access | Times Cited: 14

Open-vocabulary Attribute Detection
María A. Bravo, Sudhanshu Mittal, Simon Ging, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 14

Perceptual Grouping in Contrastive Vision-Language Models
Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 5548-5561
Open Access | Times Cited: 14

TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
Kan Wu, Houwen Peng, Zhenghong Zhou, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 21913-21923
Open Access | Times Cited: 13

ReDiT: re-evaluating large visual question answering model confidence by defining input scenario difficulty and applying temperature mapping
Modafar Al-Shouha, Gábor Szűcs
Multimedia Systems (2025) Vol. 31, Iss. 1
Closed Access

Understand and Detect: Multi-step zero-shot detection with image-level specific prompt
Miaotian Guo, Kewei Wu, Zhuqing Jiang, et al.
Knowledge-Based Systems (2025) Vol. 311, pp. 113083-113083
Closed Access

Stimulating conversation-style emergencies of multi-modal LMs
順時湯浅, Bingquan Liu, Chengjie Sun, et al.
Information Fusion (2025), pp. 103047-103047
Closed Access

UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility
Yonglin Tian, Fei Lin, Yuqing Li, et al.
Information Fusion (2025), pp. 103158-103158
Closed Access

PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao, Qianyu Chen, Ao Zhang, et al.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (2022), pp. 11104-11117
Open Access | Times Cited: 21

Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
Yan Zeng, Wangchunshu Zhou, Ao Luo, et al.
(2023), pp. 5731-5746
Open Access | Times Cited: 11

MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Jingjing Jiang, Nanning Zheng
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 24203-24213
Open Access | Times Cited: 11

Page 1 - Next Page
