
OpenAlex is an open-access bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!
Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking a citation count opens this listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.
Requested Article:
CLIP-Event: Connecting Text and Images with Event Structures
Manling Li, Ruochen Xu, Shuohang Wang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 16399-16408
Open Access | Times Cited: 82
Showing 1-25 of 82 citing articles:
Multimodal Learning With Transformers: A Survey
Peng Xu, Xiatian Zhu, David A. Clifton
IEEE Transactions on Pattern Analysis and Machine Intelligence (2023) Vol. 45, Iss. 10, pp. 12113-12132
Open Access | Times Cited: 338
Large-scale Multi-modal Pre-trained Models: A Comprehensive Survey
Xiao Wang, Guangyao Chen, Guangwu Qian, et al.
Machine Intelligence Research (2023) Vol. 20, Iss. 4, pp. 447-482
Open Access | Times Cited: 92
Effective conditioned and composed image retrieval combining CLIP-based features
Alberto Baldrati, Marco Bertini, Tiberio Uricchio, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 21434-21442
Open Access | Times Cited: 73
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
Jiadong Wang, Xinyuan Qian, Malu Zhang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) Vol. 9, pp. 14653-14662
Open Access | Times Cited: 44
Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang, Amir Zadeh, Louis‐Philippe Morency
ACM Computing Surveys (2024) Vol. 56, Iss. 10, pp. 1-42
Open Access | Times Cited: 24
Conditioned and composed image retrieval combining and partially fine-tuning CLIP-based features
Alberto Baldrati, Marco Bertini, Tiberio Uricchio, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2022), pp. 4955-4964
Open Access | Times Cited: 44
Verbs in Action: Improving verb understanding in video-language models
Liliane Momeni, Mathilde Caron, Arsha Nagrani, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023)
Open Access | Times Cited: 25
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You, Luowei Zhou, Bin Xiao, et al.
Lecture notes in computer science (2022), pp. 69-87
Closed Access | Times Cited: 28
Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology
Brett A. Halperin, Stephanie M. Lukin
(2023), pp. 1-21
Open Access | Times Cited: 16
RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training
Chen-Wei Xie, Siyang Sun, Xiong Xiong, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Closed Access | Times Cited: 13
VERITE: a Robust benchmark for multimodal misinformation detection accounting for unimodal bias
Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, et al.
International Journal of Multimedia Information Retrieval (2024) Vol. 13, Iss. 1
Open Access | Times Cited: 5
M2KGRL: A semantic-matching based framework for multimodal knowledge graph representation learning
Tao Chen, Tiexin Wang, Huihui Zhang, et al.
Expert Systems with Applications (2025) Vol. 269, pp. 126388-126388
Closed Access
Exploiting instance-label dynamics through reciprocal anchored contrastive learning for few-shot relation extraction
Yanglei Gan, Qiao Liu, Run Lin, et al.
Neural Networks (2025) Vol. 187, pp. 107259-107259
Closed Access
Multimedia event extraction based on multimodal low-dimensional feature representation space
Yiming Cui, Bin Sun, Tao Jiang, et al.
Signal Image and Video Processing (2025) Vol. 19, Iss. 5
Closed Access
Multi-axis fusion with optimal transport learning for multimodal aspect-based sentiment analysis
Li Jia, Tinghuai Ma, Huan Rong, et al.
Expert Systems with Applications (2025), pp. 127353-127353
Closed Access
Training-Free Zero-Shot Composed Image Retrieval via Weighted Modality Fusion and Similarity
Rebecca Wu, Yanling Lin, Huei‐Fang Yang
Communications in computer and information science (2025), pp. 77-90
Closed Access
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval
Xudong Lin, Simran Tiwari, Shiyuan Huang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) Vol. 33, pp. 14846-14855
Open Access | Times Cited: 11
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
Sixun Dong, Huazhang Hu, Dongze Lian, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) Vol. 30, pp. 2437-2447
Open Access | Times Cited: 11
Multi-grained Gradual Inference Model for Multimedia Event Extraction
Yang Liu, Fang Liu, Licheng Jiao, et al.
IEEE Transactions on Circuits and Systems for Video Technology (2024) Vol. 34, Iss. 10, pp. 10507-10520
Closed Access | Times Cited: 4
Probing the Symbolic Logical Reasoning Ability of Large Language Models
Jianchao Ji, Zelong Li, Shuyuan Xu, et al.
ACM Transactions on Intelligent Systems and Technology (2025)
Closed Access
RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios
Xinya Du, Zixuan Zhang, Sha Li, et al.
(2022), pp. 54-63
Open Access | Times Cited: 14
Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval
Mustafa Shukor, Nicolas Thome, Matthieu Cord
(2023)
Open Access | Times Cited: 7
Video Event Extraction with Multi-View Interaction Knowledge Distillation
Kaiwen Wei, Runyan Du, Jin Li, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2024) Vol. 38, Iss. 17, pp. 19224-19233
Open Access | Times Cited: 2
Learning Generalizable Perceptual Representations for Data-Efficient No-Reference Image Quality Assessment
Suhas Srinath, Shankhanil Mitra, Shika Rao, et al.
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024), pp. 22-31
Open Access | Times Cited: 2
Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval
Mustafa Shukor, Nicolas Thome, Matthieu Cord
Computer Vision and Image Understanding (2024) Vol. 247, pp. 104071-104071
Open Access | Times Cited: 2