
OpenAlex is an open-access bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you find this listing of citing articles useful!
If you click an article title, you'll navigate to the article as listed in CrossRef. If you click an Open Access link, you'll navigate to the article's "best Open Access location". Clicking a citation count will open this listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.
Requested Article:
Scaling Up Vision-Language Pretraining for Image Captioning
Xiaowei Hu, Zhe Gan, Jianfeng Wang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Open Access | Times Cited: 147
Showing 51-75 of 147 citing articles:
Transforming Visual Scene Graphs to Image Captions
Xu Yang, Jiawei Peng, Zihua Wang, et al.
(2023), pp. 12427-12440
Open Access | Times Cited: 9
Image Captioning With Controllable and Adaptive Length Levels
Ning Ding, Chaorui Deng, Mingkui Tan, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2023) Vol. 46, Iss. 2, pp. 764-779
Closed Access | Times Cited: 9
Towards Models that Can See and Read
Roy Ganz, Oren Nuriel, Aviad Aberdam, et al.
IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 21661-21671
Open Access | Times Cited: 9
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
Bang Yang, Fenglin Liu, Yuexian Zou, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2024) Vol. 46, Iss. 8, pp. 5712-5724
Open Access | Times Cited: 3
Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning
Zhiyue Liu, Jinyuan Liu, Fanrong Ma
Proceedings of the AAAI Conference on Artificial Intelligence (2024) Vol. 38, Iss. 4, pp. 3864-3872
Open Access | Times Cited: 3
CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning
Zhaoheng Zheng, Haidong Zhu, Ram Nevatia
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024), pp. 1710-1720
Open Access | Times Cited: 3
Detours for Navigating Instructional Videos
Kumar Ashutosh, Zihui Xue, Tushar Nagarajan, et al.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 33, pp. 18804-18815
Closed Access | Times Cited: 3
TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-spoofing
Xudong Wang, Ke-Yue Zhang, Taiping Yao, et al.
Lecture Notes in Computer Science (2024), pp. 148-168
Closed Access | Times Cited: 3
Arabic Image Captioning using Pre-training of Deep Bidirectional Transformers
Jonathan Emami, Pierre Nugues, Ashraf Elnagar, et al.
(2022), pp. 40-51
Open Access | Times Cited: 13
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Zhuolin Yang, Ping Wei, Zihan Liu, et al.
(2023), pp. 11844-11857
Open Access | Times Cited: 7
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning
Jia Cheng Hu, Roberto Cavicchioli, Alessandro Capotondi
IEEE International Conference on Big Data (Big Data) (2023), pp. 2173-2182
Closed Access | Times Cited: 7
Deep Learning Approaches for Image Captioning: Opportunities, Challenges and Future Potential
Azhar Jamil, Saif Ur Rehman, Khalid Mahmood, et al.
IEEE Access (2024), pp. 1-1
Open Access | Times Cited: 2
A survey on advancements in image–text multimodal models: From general techniques to biomedical implementations
Ruifeng Guo, Jingxuan Wei, Linzhuang Sun, et al.
Computers in Biology and Medicine (2024) Vol. 178, pp. 108709-108709
Closed Access | Times Cited: 2
Reproducible scaling laws for contrastive language-image learning
Mehdi Cherti, Romain Beaumont, Ross Wightman, et al.
arXiv (Cornell University) (2022)
Open Access | Times Cited: 11
TAVT: Towards Transferable Audio-Visual Text Generation
Lin Wang, Tao Jin, Wenwen Pan, et al.
(2023), pp. 14983-14999
Open Access | Times Cited: 6
A Review of Transformer-Based Approaches for Image Captioning
Oscar Ondeng, Heywood Ouma, Peter O. Akuon
Applied Sciences (2023) Vol. 13, Iss. 19, pp. 11103-11103
Open Access | Times Cited: 6
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco, Sara Sarto, Marcella Cornia, et al.
IEEE/CVF International Conference on Computer Vision (ICCV) (2023)
Open Access | Times Cited: 6
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data
Vladislav Lialin, Stephen Rawls, David W. Chan, et al.
(2023), pp. 390-400
Open Access | Times Cited: 4
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, et al.
arXiv (Cornell University) (2023)
Open Access | Times Cited: 4
From methods to datasets: A survey on Image-Caption Generators
Lakshita Agarwal, Bindu Verma
Multimedia Tools and Applications (2023) Vol. 83, Iss. 9, pp. 28077-28123
Closed Access | Times Cited: 4
Are metrics measuring what they should? An evaluation of Image Captioning task metrics
Othón González-Chávez, Guillermo Ruíz, Daniela Moctezuma, et al.
Signal Processing: Image Communication (2023) Vol. 120, pp. 117071-117071
Open Access | Times Cited: 4
Efficient text-based query based on multi-level and deep-semantic multimedia indexing and retrieval
Mohamed Hamroun, Sonia Lajmi, Maryam Jallouli, et al.
Multimedia Tools and Applications (2023) Vol. 83, Iss. 18, pp. 55811-55850
Closed Access | Times Cited: 4
Learning to Follow and Generate Instructions for Language-Capable Navigation
Xiaohan Wang, Wenguan Wang, Jiayi Shao, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2023) Vol. 46, Iss. 5, pp. 3334-3350
Closed Access | Times Cited: 4
Enhancing Visual Grounding in Vision-Language Pre-Training With Position-Guided Text Prompts
Jinpeng Wang, Pan Zhou, Mike Zheng Shou, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2023) Vol. 46, Iss. 5, pp. 3406-3421
Closed Access | Times Cited: 4
HACAN: a hierarchical answer-aware and context-aware network for question generation
Ruijun Sun, Hanqin Tao, Yanmin Chen, et al.
Frontiers of Computer Science (2023) Vol. 18, Iss. 5
Closed Access | Times Cited: 4