
OpenAlex is an open-access bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!
Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking a citation count opens this same listing for that article. Lastly, basic pagination options are at the bottom of the page.
Requested Article:
Scaling Up Vision-Language Pretraining for Image Captioning
Xiaowei Hu, Zhe Gan, Jianfeng Wang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 146
Showing 1-25 of 146 citing articles:
Multimodal Learning With Transformers: A Survey
Peng Xu, Xiatian Zhu, David A. Clifton
IEEE Transactions on Pattern Analysis and Machine Intelligence (2023) Vol. 45, Iss. 10, pp. 12113-12132
Open Access | Times Cited: 338
From Show to Tell: A Survey on Deep Learning-Based Image Captioning
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2022) Vol. 45, Iss. 1, pp. 539-559
Open Access | Times Cited: 213
Reproducible Scaling Laws for Contrastive Language-Image Learning
Mehdi Cherti, Romain Beaumont, Ross Wightman, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 2818-2829
Closed Access | Times Cited: 198
A comprehensive survey on applications of transformers for deep learning tasks
Saidul Islam, Hanae Elmekki, Ahmed Elsebai, et al.
Expert Systems with Applications (2023) Vol. 241, pp. 122666-122666
Open Access | Times Cited: 106
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 97
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li, Haiyang Xu, Junfeng Tian, et al.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (2022)
Open Access | Times Cited: 93
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai, Basil Mustafa, А. И. Колесников, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023)
Open Access | Times Cited: 76
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi, Hamid Reza Pourreza, Hamidreza Mahyar
ACM Computing Surveys (2023) Vol. 56, Iss. 3, pp. 1-39
Open Access | Times Cited: 67
Learning Video Representations from Large Language Models
Yue Zhao, Ishan Misra, Philipp Krähenbühl, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 64
Smallcap: Lightweight Image Captioning Prompted with Retrieval Augmentation
Rita Ramos, Bruno Martins, Desmond Elliott, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 47
GLaMM: Pixel Grounding Large Multimodal Model
Hanoona Rasheed, Muhammad Maaz, Sahal Shaji, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024), pp. 13009-13018
Closed Access | Times Cited: 24
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein, David Bensaïd, Shaked Brody, et al.
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024), pp. 5677-5688
Open Access | Times Cited: 15
The Unreasonable Effectiveness of CLIP Features for Image Captioning: An Experimental Analysis
Manuele Barraco, Marcella Cornia, Silvia Cascianelli, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2022)
Open Access | Times Cited: 54
Translation between Molecules and Natural Language
Carl K. Edwards, Tuan Lai, Kevin Ros, et al.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (2022)
Open Access | Times Cited: 54
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Zhengyuan Yang, Zhe Gan, Jianfeng Wang, et al.
Lecture notes in computer science (2022), pp. 521-539
Closed Access | Times Cited: 53
Text-Only Training for Image Captioning using Noise-Injected CLIP
David Nukrai, Ron Mokady, Amir Globerson
(2022)
Open Access | Times Cited: 42
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
Wenliang Dai, Lu Hou, Lifeng Shang, et al.
Findings of the Association for Computational Linguistics: ACL 2022 (2022)
Open Access | Times Cited: 40
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng, Xizi Wang, Jie Lei, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 38
Deep image captioning: A review of methods, trends and future challenges
Liming Xu, Quan Tang, Jiancheng Lv, et al.
Neurocomputing (2023) Vol. 546, pp. 126287-126287
Closed Access | Times Cited: 29
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Zequn Zeng, Hao Zhang, Ruiying Lu, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 23465-23476
Open Access | Times Cited: 26
MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection
Ruiyang Xia, Decheng Liu, Jie Li, et al.
IEEE Transactions on Information Forensics and Security (2024) Vol. 19, pp. 3409-3422
Open Access | Times Cited: 8
Universal and extensible language-vision models for organ segmentation and tumor detection from abdominal computed tomography
Jie Liu, Yixiao Zhang, Kang Wang, et al.
Medical Image Analysis (2024) Vol. 97, pp. 103226-103226
Open Access | Times Cited: 8
ViTs as backbones: Leveraging vision transformers for feature extraction
Omar Elharrouss, Yassine Himeur, Yasir Mahmood, et al.
Information Fusion (2025), pp. 102951-102951
Closed Access | Times Cited: 1
Controllable Image Captioning via Prompting
Ning Wang, Jiahao Xie, Jihao Wu, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2023) Vol. 37, Iss. 2, pp. 2617-2625
Open Access | Times Cited: 21
Context-aware Alignment and Mutual Masking for 3D-Language Pre-training
Jin Zhao, Munawar Hayat, Yuwei Yang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 10984-10994
Closed Access | Times Cited: 21