
OpenAlex is an open-access bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!
Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking the citation count opens this listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.
Requested Article:
Monkey: Image Resolution and Text Label are Important Things for Large Multi-Modal Models
Zhang Li, Biao Yang, Qiang Liu, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 35, pp. 26753-26763
Closed Access | Times Cited: 22
Showing 22 citing articles:
A Survey on Multimodal Large Language Models
Shukang Yin, Chaoyou Fu, Sirui Zhao, et al.
National Science Review (2024) Vol. 11, Iss. 12
Open Access | Times Cited: 71
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024), pp. 24185-24198
Closed Access | Times Cited: 38
LMEye: An Interactive Perception Network for Large Language Models
Yunxin Li, Baotian Hu, Xinyu Chen, et al.
IEEE Transactions on Multimedia (2024) Vol. 26, pp. 10952-10964
Open Access | Times Cited: 6
Has multimodal learning delivered universal intelligence in healthcare? A comprehensive survey
Qika Lin, Y. C. Zhu, Mei Xin, et al.
Information Fusion (2024), pp. 102795-102795
Open Access | Times Cited: 5
The Blessing of Depth Anything: An Almost Unsupervised Approach to Crop Segmentation with Depth-Informed Pseudo Labeling
Shuyu Cao, Binghui Xu, Wei Zhou, et al.
Plant Phenomics (2025), pp. 100005-100005
Open Access
AI Computing Systems for Large Language Models Training
Zhenxing Zhang, Yuanbo Wen, Hairong Lyu, et al.
Journal of Computer Science and Technology (2025) Vol. 40, Iss. 1, pp. 6-41
Closed Access
MM1: Methods, Analysis and Insights from Multimodal LLM Pre-training
Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, et al.
Lecture notes in computer science (2024), pp. 304-323
Closed Access | Times Cited: 4
LLaVA-UHD: An LMM Perceiving Any Aspect Ratio and High-Resolution Images
Zonghao Guo, Ruyi Xu, Yuan Yao, et al.
Lecture notes in computer science (2024), pp. 390-406
Closed Access | Times Cited: 4
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen, Haozhe Zhao, Tianyu Liu, et al.
Lecture notes in computer science (2024), pp. 19-35
Closed Access | Times Cited: 3
AutoGraph: Enabling Visual Context via Graph Alignment in Open Domain Multi-Modal Dialogue Generation
Deji Zhao, Donghong Han, Ye Yuan, et al.
(2024), pp. 2079-2088
Closed Access | Times Cited: 2
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You, Haotian Zhang, Eldon Schoop, et al.
Lecture notes in computer science (2024), pp. 240-255
Closed Access | Times Cited: 1
Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
Shi Liu, Kecheng Zheng, Wei Chen
Lecture notes in computer science (2024), pp. 125-140
Closed Access | Times Cited: 1
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang, Yanzhe Zhang, Jian Chen, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 35, pp. 22584-22594
Closed Access
Towards Cross-Domain Multimodal Automated Service Regulation Systems
Jianwei Yin, Tiancheng Zhao, Li Kuang
(2024) Vol. 37, pp. 426-436
Closed Access
Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
Jing Tan, Anissa Mokraoui, Ban-Hoe Kwan, et al.
(2024) Vol. 10, pp. 79-84
Closed Access
Mitigating Hallucination in Visual-Language Models via Re-balancing Contrastive Decoding
Xiaoyu Liang, Jiayuan Yu, Lianrui Mu, et al.
Lecture notes in computer science (2024), pp. 482-496
Closed Access
PSALM: Pixelwise SegmentAtion with Large Multi-modal Model
Zheng Zhang, Yeyao Ma, Enming Zhang, et al.
Lecture notes in computer science (2024), pp. 74-91
Closed Access
Enhancing Visual Information Extraction with Large Language Models Through Layout-Aware Instruction Tuning
Teng Li, Jiapeng Wang, Lianwen Jin
Lecture notes in computer science (2024), pp. 276-289
Closed Access
WAS: Dataset and Methods for Artistic Text Segmentation
Xudong Xie, Yuzhe Li, Yang Liu, et al.
Lecture notes in computer science (2024), pp. 237-254
Closed Access
Large Vision-Language Model Security: A Survey
Taowen Wang, Fang Zheng, Haochen Xue, et al.
Communications in computer and information science (2024), pp. 3-22
Closed Access
Multimodal Mamba: A Versatile Multimodal Model for Seamless Integration into Diverse Downstream Tasks
Z. H. Li, Guibo Zhu, Dongyi Yi, et al.
(2024), pp. 303-313
Closed Access
ElderEase AR: Enhancing Elderly Daily Living with the Multimodal Large Language Model and Augmented Reality
Tianyu Song, Zhengyi Liu, Ruibin Zhao, et al.
(2024), pp. 60-67
Closed Access