OpenAlex Citation Counts


OpenAlex is an open-access bibliographic catalogue of scientific papers, authors, and institutions, named after the ancient Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!

If you click an article title, you'll navigate to that article as listed in CrossRef. If you click an Open Access link, you'll navigate to the article's "best Open Access location". Clicking a citation count will open this same listing for that article. Lastly, at the bottom of the page you'll find basic pagination options.
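
If you'd like to pull the same information programmatically, a listing like this one can be approximated with the public OpenAlex works API and its cites: filter. The sketch below is illustrative only and is not the code behind this page; the work ID is a placeholder you would replace with the OpenAlex ID of the requested article.

    # Minimal sketch (assumption: the standard OpenAlex REST API, not this page's own code).
    # WORK_ID is a placeholder; substitute the OpenAlex ID of the article whose citers you want.
    import requests

    WORK_ID = "W0000000000"  # placeholder OpenAlex work ID

    def citing_works(work_id, page=1, per_page=25):
        """Fetch one page of works that cite `work_id`, most-cited first."""
        resp = requests.get(
            "https://api.openalex.org/works",
            params={
                "filter": f"cites:{work_id}",
                "sort": "cited_by_count:desc",
                "per-page": per_page,
                "page": page,
            },
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    data = citing_works(WORK_ID)
    print(f"Showing {len(data['results'])} of {data['meta']['count']} citing articles")
    for work in data["results"]:
        oa = work.get("open_access") or {}
        access = "Open Access" if oa.get("is_oa") else "Closed Access"
        print(f"{work['display_name']} ({work.get('publication_year')}) | "
              f"{access} | Times Cited: {work.get('cited_by_count', 0)}")

Note that the listing on this page may sort or group results a little differently than the raw API response.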

Requested Article:

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Wenhui Wang, Hangbo Bao, Li Dong, et al.
arXiv (Cornell University) (2021)
Open Access | Times Cited: 174

Showing 1-25 of 174 citing articles:

Image as a Foreign Language: BEIT Pretraining for Vision and Vision-Language Tasks
Wenhui Wang, Hangbo Bao, Li Dong, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 19175-19186
Closed Access | Times Cited: 260

MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Zifeng Wang, Zhenbang Wu, D. C. Agarwal, et al.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022)
Open Access | Times Cited: 192

Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang, Jiaxing Huang, Sheng Jin, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2024) Vol. 46, Iss. 8, pp. 5625-5644
Open Access | Times Cited: 111

Prompting Large Language Models with Answer Heuristics for Knowledge-Based Visual Question Answering
Zhenwei Shao, Zhou Yu, Meng Wang, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 14974-14983
Closed Access | Times Cited: 96

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li, Haiyang Xu, Junfeng Tian, et al.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022)
Open Access | Times Cited: 93

UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition
Guimin Hu, Ting-En Lin, Yi Zhao, et al.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022)
Open Access | Times Cited: 87

Review of large vision models and visual prompt engineering
Jiaqi Wang, Zhengliang Liu, Lin Zhao, et al.
Meta-Radiology (2023) Vol. 1, Iss. 3, pp. 100047-100047
Open Access | Times Cited: 77

A Survey of Vision-Language Pre-Trained Models
Yifan Du, Zikang Liu, Junyi Li, et al.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (2022), pp. 5436-5443
Open Access | Times Cited: 75

ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts
Zhida Feng, Zhenyu Zhang, Xintong Yu, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 60

MixGen: A New Multi-Modal Data Augmentation
Xiaoshuai Hao, Yi Zhu, Srikar Appalaraju, et al.
(2023)
Open Access | Times Cited: 55

Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing
Shruthi Bannur, Stephanie L. Hyland, Qianchu Liu, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 15016-15027
Open Access | Times Cited: 54

When brain-inspired AI meets AGI
Lin Zhao, Lu Zhang, Zihao Wu, et al.
Meta-Radiology (2023) Vol. 1, Iss. 1, pp. 100005-100005
Open Access | Times Cited: 48

Multimodal Large Language Models: A Survey
Jiayang Wu, Wensheng Gan, Zefeng Chen, et al.
2023 IEEE International Conference on Big Data (Big Data) (2023)
Open Access | Times Cited: 46

VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
Junjie Ke, Keren Ye, Jiahui Yu, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 10041-10051
Open Access | Times Cited: 41

GIT-Mol: A multi-modal large language model for molecular science with graph, image, and text
Pengfei Liu, Yiming Ren, Jun Tao, et al.
Computers in Biology and Medicine (2024) Vol. 171, pp. 108073-108073
Open Access | Times Cited: 22

USER: Unified Semantic Enhancement With Momentum Contrast for Image-Text Retrieval
Yan Zhang, Zhong Ji, Di Wang, et al.
IEEE Transactions on Image Processing (2024) Vol. 33, pp. 595-609
Open Access | Times Cited: 17

CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment
Haoyu Song, Li Dong, Weinan Zhang, et al.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2022)
Open Access | Times Cited: 65

Align and Attend: Multimodal Summarization with Dual Contrastive Losses
Bo He, Jun Wang, Jielin Qiu, et al.
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 14867-14878
Open Access | Times Cited: 36

BridgeTower: Building Bridges between Encoders in Vision-Language Representation Learning
Xu Xiao, Chenfei Wu, Shachar Rosenman, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2023) Vol. 37, Iss. 9, pp. 10637-10647
Open Access | Times Cited: 32

Cross-modal Contrastive Learning for Multimodal Fake News Detection
Longzheng Wang, Chuang Zhang, Hongbo Xu, et al.
(2023), pp. 5696-5704
Open Access | Times Cited: 28

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Shraman Pramanick, Yale Song, Sayan Nag, et al.
2023 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 5262-5274
Open Access | Times Cited: 23

X2-VLM: All-in-One Pre-Trained Model for Vision-Language Tasks
Yan Zeng, Xinsong Zhang, Hang Li, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2023) Vol. 46, Iss. 5, pp. 3156-3168
Open Access | Times Cited: 22

From image to language: A critical analysis of Visual Question Answering (VQA) approaches, challenges, and opportunities
Md Farhan Ishmam, Md Sakib Hossain Shovon, M. F. Mridha, et al.
Information Fusion (2024) Vol. 106, pp. 102270-102270
Open Access | Times Cited: 13

Prompting large language model with context and pre-answer for knowledge-based VQA
Zhongjian Hu, Peng Yang, Yuanshuang Jiang, et al.
Pattern Recognition (2024) Vol. 151, pp. 110399-110399
Closed Access | Times Cited: 11

Deep learning models for ischemic stroke lesion segmentation in medical images: A survey
Jialin Luo, Peishan Dai, Zhuang He, et al.
Computers in Biology and Medicine (2024) Vol. 175, pp. 108509-108509
Closed Access | Times Cited: 11

Page 1 - Next Page
