
OpenAlex is an open-access bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!
Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking the citation count opens this listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.
Requested Article:
Multi-modal Alignment using Representation Codebook
Jiali Duan, Li‐Qun Chen, Son N. Tran, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 15630-15639
Open Access | Times Cited: 36
Showing 1-25 of 36 citing articles:
Vision-Language Pre-Training with Triple Contrastive Learning
Jinyu Yang, Jiali Duan, Son N. Tran, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 15650-15659
Open Access | Times Cited: 183
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma, Jerry Hong, Mustafa Omer Gul, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) Vol. 33, pp. 10910-10921
Open Access | Times Cited: 39
A survey of multimodal hybrid deep learning for computer vision: Architectures, applications, trends, and challenges
Khaled Bayoudh
Information Fusion (2023) Vol. 105, pp. 102217-102217
Closed Access | Times Cited: 32
Cross-Modal Concept Learning and Inference for Vision-Language Models
Yi Zhang, Ce Zhang, Yushun Tang, et al.
Neurocomputing (2024) Vol. 583, pp. 127530-127530
Open Access | Times Cited: 9
Context-aware Alignment and Mutual Masking for 3D-Language Pre-training
Jin Zhao, Munawar Hayat, Yuwei Yang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 10984-10994
Closed Access | Times Cited: 21
Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning
Qian Jiang, Changyou Chen, Han Zhao, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 7661-7671
Open Access | Times Cited: 17
Concept-Guided Prompt Learning for Generalization in Vision-Language Models
Yi Zhang, Ce Zhang, Ke Yu, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2024) Vol. 38, Iss. 7, pp. 7377-7386
Open Access | Times Cited: 6
Domain Aligned CLIP for Few-shot Classification
Muhammad Waleed Gondal, Jochen Gast, Inigo Alonso Ruiz, et al.
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024)
Open Access | Times Cited: 5
Multimedia event extraction based on multimodal low-dimensional feature representation space
Yiming Cui, Bin Sun, Tao Jiang, et al.
Signal Image and Video Processing (2025) Vol. 19, Iss. 5
Closed Access
Disentanglement and codebook learning-induced feature match network to diagnose neurodegenerative diseases on incomplete multimodal data
Wei Xiong, Tao Wang, Xiumei Chen, et al.
Pattern Recognition (2025), pp. 111597-111597
Closed Access
Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation
Xueting Hu, Ce Zhang, Yi Zhang, et al.
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024), pp. 5582-5591
Open Access | Times Cited: 4
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen, Longteng Guo, Jia Sun, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2024) Vol. 38, Iss. 2, pp. 1110-1119
Open Access | Times Cited: 3
MultiCAD: Contrastive Representation Learning for Multi-modal 3D Computer-Aided Design Models
Weijian Ma, Minyang Xu, Xueyang Li, et al.
(2023), pp. 1766-1776
Closed Access | Times Cited: 8
MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling
Zijia Zhao, Longteng Guo, Xingjian He, et al.
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023)
Open Access | Times Cited: 6
TOT: Topology-Aware Optimal Transport for Multimodal Hate Detection
Linhao Zhang, Jin Li, Xian Sun, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2023) Vol. 37, Iss. 4, pp. 4884-4892
Open Access | Times Cited: 6
Mobile image restoration via prior quantization
Shiqi Chen, Jingwen Zhou, Menghao Li, et al.
Pattern Recognition Letters (2023) Vol. 174, pp. 64-70
Open Access | Times Cited: 6
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) Vol. 9, pp. 23432-23444
Open Access | Times Cited: 4
Multi-Modal Representation Learning with Text-Driven Soft Masks
Jaeyoo Park, Bohyung Han
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 4
Multimodal Optimal Transport Knowledge Distillation for Cross-domain Recommendation
Wei Yang, Jie Yang, Yuan Liu
(2023), pp. 2959-2968
Closed Access | Times Cited: 4
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Yimu Wang, Xiangru Jian, Bo Xue
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (2023), pp. 10542-10567
Open Access | Times Cited: 4
Unsupervised Prototype Adapter for Vision-Language Models
Yi Zhang, Ce Zhang, Xueting Hu, et al.
Lecture notes in computer science (2023), pp. 197-209
Closed Access | Times Cited: 4
Learning incremental audio-visual representation for continual multimodal understanding
Boqing Zhu, Changjian Wang, Kele Xu, et al.
Knowledge-Based Systems (2024), pp. 112513-112513
Open Access | Times Cited: 1
SCCS: Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment
Jielin Qiu, Jiacheng Zhu, Mengdi Xu, et al.
Findings of the Association for Computational Linguistics: ACL 2022 (2023), pp. 1584-1601
Open Access | Times Cited: 3
Knowledge-Embedded Mutual Guidance for Visual Reasoning
Wenbo Zheng, Lan Yan, Long Chen, et al.
IEEE Transactions on Cybernetics (2023) Vol. 54, Iss. 4, pp. 2579-2591
Closed Access | Times Cited: 2
Better Integrating Vision and Semantics for Improving Few-shot Classification
Zhuoling Li, Yong Wang
(2023), pp. 4737-4746
Closed Access | Times Cited: 2