OpenAlex Citation Counts

OpenAlex is an openly accessible bibliographic catalogue of scientific papers, authors, and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!

Clicking an article title takes you to the article as listed in Crossref. Clicking an Open Access link takes you to the article's "best Open Access location". Clicking a citation count opens this same listing for that article. Lastly, basic pagination options are at the bottom of the page.

Requested Article:

GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features
Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani
Lecture Notes in Computer Science (2022), pp. 167-184
Open Access | Times Cited: 78

Showing 1-25 of 78 citing articles:

Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi, Hamid Reza Pourreza, Hamidreza Mahyar
ACM Computing Surveys (2023) Vol. 56, Iss. 3, pp. 1-39
Open Access | Times Cited: 69

KiUT: Knowledge-injected U-Transformer for Radiology Report Generation
Zhongzhen Huang, Xiaofan Zhang, Shaoting Zhang
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 19809-19818
Open Access | Times Cited: 43

ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Zequn Zeng, Hao Zhang, Ruiying Lu, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 23465-23476
Open Access | Times Cited: 27

Memory-Based Augmentation Network for Video Captioning
Shuaiqi Jing, Haonan Zhang, Pengpeng Zeng, et al.
IEEE Transactions on Multimedia (2023) Vol. 26, pp. 2367-2379
Closed Access | Times Cited: 25

LGR-NET: Language Guided Reasoning Network for Referring Expression Comprehension
Mingcong Lu, Ruifan Li, Fangxiang Feng, et al.
IEEE Transactions on Circuits and Systems for Video Technology (2024) Vol. 34, Iss. 8, pp. 7771-7784
Closed Access | Times Cited: 11

MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng, Yan Xie, Hao Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 35, pp. 14100-14110
Closed Access | Times Cited: 9

Leveraging ensemble deep models and llm for visual polysemy and word sense disambiguation
Insaf Setitra, Praboda Rajapaksha, Aung Kaung Myat, et al.
Multimedia Tools and Applications (2025)
Closed Access | Times Cited: 1

End-to-End 3D Dense Captioning with Vote2Cap-DETR
Sijin Chen, Hongyuan Zhu, Xin Chen, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) Vol. 41, pp. 11124-11133
Open Access | Times Cited: 21

HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh, Rohit Girdhar, Lorenzo Torresani, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) Vol. 114, pp. 23066-23078
Open Access | Times Cited: 20

Improving visual question answering for bridge inspection by pre‐training with external data of image–text pairs
Thannarot Kunlamai, Tatsuro Yamane, Masanori Suganuma, et al.
Computer-Aided Civil and Infrastructure Engineering (2023) Vol. 39, Iss. 3, pp. 345-361
Open Access | Times Cited: 19

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
Sijin Chen, Hongyuan Zhu, Mingsheng Li, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2024) Vol. 46, Iss. 11, pp. 7331-7347
Open Access | Times Cited: 7

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada, Kanta Kaneda, Daichi Saito, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 37, pp. 13559-13568
Closed Access | Times Cited: 7

Embedded Heterogeneous Attention Transformer for Cross-Lingual Image Captioning
Zijie Song, Zhenzhen Hu, Yuanen Zhou, et al.
IEEE Transactions on Multimedia (2024) Vol. 26, pp. 9008-9020
Closed Access | Times Cited: 5

ExpansionNet v2: Block Static Expansion in fast end to end training for Image Captioning
Jia Cheng Hu, Roberto Cavicchioli, Alessandro Capotondi
arXiv (Cornell University) (2022)
Open Access | Times Cited: 19

Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates
Nicholas Moratelli, Manuele Barraco, Davide Morelli, et al.
Sensors (2023) Vol. 23, Iss. 3, pp. 1286-1286
Open Access | Times Cited: 12

Detours for Navigating Instructional Videos
Kumar Ashutosh, Zihui Xue, Tushar Nagarajan, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 33, pp. 18804-18815
Closed Access | Times Cited: 4

SPT: Spatial Pyramid Transformer for Image Captioning
Haonan Zhang, Pengpeng Zeng, Lianli Gao, et al.
IEEE Transactions on Circuits and Systems for Video Technology (2023) Vol. 34, Iss. 6, pp. 4829-4842
Closed Access | Times Cited: 10

SCAP: enhancing image captioning through lightweight feature sifting and hierarchical decoding
Yuhao Zhang, Jiaqi Tong, Honglin Liu
The Visual Computer (2025)
Closed Access

Scene graph sorting and shuffle polishing based controllable image captioning
Guichang Wu, Qian Zhao, Xiushu Liu
Signal Image and Video Processing (2025) Vol. 19, Iss. 4
Closed Access

CDZL: a controllable diversity zero-shot image caption model using large language models
Xin Zhao, Weiwei Kong, Zongyao Liu, et al.
Signal Image and Video Processing (2025) Vol. 19, Iss. 4
Closed Access

Dual-visual collaborative enhanced transformer for image captioning
Zhenping Mou, Tianqi Song, Luo Hong
Multimedia Systems (2025) Vol. 31, Iss. 2
Closed Access

Multimodal artificial intelligence approaches using large language models for expert‐level landslide image analysis
Kittitouch Areerob, Van‐Quang Nguyen, Xianfeng Li, et al.
Computer-Aided Civil and Infrastructure Engineering (2025)
Open Access

Enhancing visual contextual semantic information for image captioning
Ronggui Wang, Shuo Li, Lixia Xue, et al.
International Journal of Machine Learning and Cybernetics (2025)
Closed Access

Positional-enhanced and normalized transformer for image captioning
Faming Gong, Shi Zhong, Xingfang Zhao
Signal Image and Video Processing (2025) Vol. 19, Iss. 7
Closed Access

Image Captioning With Controllable and Adaptive Length Levels
Ning Ding, Chaorui Deng, Mingkui Tan, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2023) Vol. 46, Iss. 2, pp. 764-779
Closed Access | Times Cited: 9

Page 1 - Next Page
