OpenAlex Citation Counts

OpenAlex is an openly accessible bibliographic catalogue of scientific papers, authors, and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you find this listing of citing articles useful!

Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking a citation count opens this same listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.

Requested Article:

Monkey: Image Resolution and Text Label are Important Things for Large Multi-Modal Models
Zhang Li, Biao Yang, Qiang Liu, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 35, pp. 26753-26763
Closed Access | Times Cited: 22

Showing 22 citing articles:

A Survey on Multimodal Large Language Models
Shukang Yin, Chaoyou Fu, Sirui Zhao, et al.
National Science Review (2024) Vol. 11, Iss. 12
Open Access | Times Cited: 71

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024), pp. 24185-24198
Closed Access | Times Cited: 38

LMEye: An Interactive Perception Network for Large Language Models
Yunxin Li, Baotian Hu, Xinyu Chen, et al.
IEEE Transactions on Multimedia (2024) Vol. 26, pp. 10952-10964
Open Access | Times Cited: 6

Has multimodal learning delivered universal intelligence in healthcare? A comprehensive survey
Qika Lin, Y. C. Zhu, Mei Xin, et al.
Information Fusion (2024), pp. 102795-102795
Open Access | Times Cited: 5

AI Computing Systems for Large Language Models Training
Zhenxing Zhang, Yuanbo Wen, Hairong Lyu, et al.
Journal of Computer Science and Technology (2025) Vol. 40, Iss. 1, pp. 6-41
Closed Access

MM1: Methods, Analysis and Insights from Multimodal LLM Pre-training
Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, et al.
Lecture notes in computer science (2024), pp. 304-323
Closed Access | Times Cited: 4

LLaVA-UHD: An LMM Perceiving Any Aspect Ratio and High-Resolution Images
Zonghao Guo, Ruyi Xu, Yuan Yao, et al.
Lecture notes in computer science (2024), pp. 390-406
Closed Access | Times Cited: 4

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen, Haozhe Zhao, Tianyu Liu, et al.
Lecture notes in computer science (2024), pp. 19-35
Closed Access | Times Cited: 3

AutoGraph: Enabling Visual Context via Graph Alignment in Open Domain Multi-Modal Dialogue Generation
Deji Zhao, Donghong Han, Ye Yuan, et al.
(2024), pp. 2079-2088
Closed Access | Times Cited: 2

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You, Haotian Zhang, Eldon Schoop, et al.
Lecture notes in computer science (2024), pp. 240-255
Closed Access | Times Cited: 1

Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
Shi Liu, Kecheng Zheng, Wei Chen
Lecture notes in computer science (2024), pp. 125-140
Closed Access | Times Cited: 1

TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang, Yanzhe Zhang, Jian Chen, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 35, pp. 22584-22594
Closed Access

Towards Cross-Domain Multimodal Automated Service Regulation Systems
Jianwei Yin, Tiancheng Zhao, Li Kuang
(2024) Vol. 37, pp. 426-436
Closed Access

Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
Jing Tan, Anissa Mokraoui, Ban-Hoe Kwan, et al.
(2024) Vol. 10, pp. 79-84
Closed Access

Mitigating Hallucination in Visual-Language Models via Re-balancing Contrastive Decoding
Xiaoyu Liang, Jiayuan Yu, Lianrui Mu, et al.
Lecture notes in computer science (2024), pp. 482-496
Closed Access

PSALM: Pixelwise SegmentAtion with Large Multi-modal Model
Zheng Zhang, Yeyao Ma, Enming Zhang, et al.
Lecture notes in computer science (2024), pp. 74-91
Closed Access

Enhancing Visual Information Extraction with Large Language Models Through Layout-Aware Instruction Tuning
Teng Li, Jiapeng Wang, Lianwen Jin
Lecture notes in computer science (2024), pp. 276-289
Closed Access

WAS: Dataset and Methods for Artistic Text Segmentation
Xudong Xie, Yuzhe Li, Yang Liu, et al.
Lecture notes in computer science (2024), pp. 237-254
Closed Access

Large Vision-Language Model Security: A Survey
Taowen Wang, Fang Zheng, Haochen Xue, et al.
Communications in computer and information science (2024), pp. 3-22
Closed Access

Multimodal Mamba: A Versatile Multimodal Model for Seamless Integration into Diverse Downstream Tasks
Z. H. Li, Guibo Zhu, Dongyi Yi, et al.
(2024), pp. 303-313
Closed Access

ElderEase AR: Enhancing Elderly Daily Living with the Multimodal Large Language Model and Augmented Reality
Tianyu Song, Zhengyi Liu, Ruibin Zhao, et al.
(2024), pp. 60-67
Closed Access
