OpenAlex Citation Counts

OpenAlex is an openly accessible bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!

Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking a citation count opens this same listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.

Requested Article:

MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers, Jiasen Lu, Ximing Lu, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 16354-16366
Open Access | Times Cited: 98

Showing 1-25 of 98 citing articles:

Objaverse: A Universe of Annotated 3D Objects
Matt Deitke, Dustin Schwenk, Jordi Salvador, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 13142-13153
Open Access | Times Cited: 224

VLP: A Survey on Vision-language Pre-training
Feilong Chen, Duzhen Zhang, Minglun Han, et al.
Deleted Journal (2023) Vol. 20, Iss. 1, pp. 38-56
Open Access | Times Cited: 128

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 98

All in One: Exploring Unified Video-Language Pre-Training
Jinpeng Wang, Yixiao Ge, Rui Yan, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 6598-6608
Open Access | Times Cited: 88

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li, Zhe Gan, Kevin Lin, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 23119-23129
Open Access | Times Cited: 44

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
AJ Piergiovanni, Weicheng Kuo, Anelia Angelova
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 43

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering
Difei Gao, Luowei Zhou, Lei Ji, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 14773-14783
Open Access | Times Cited: 40

FlexiViT: One Model for All Patch Sizes
Lucas Beyer, Pavel Izmailov, А. И. Колесников, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Open Access | Times Cited: 38

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
Qinghao Ye, Guohai Xu, Ming Yan, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 15359-15370
Open Access | Times Cited: 26

Verbs in Action: Improving verb understanding in video-language models
Liliane Momeni, Mathilde Caron, Arsha Nagrani, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023)
Open Access | Times Cited: 25

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren, Linli Yao, Shicheng Li, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. abs/2305.06500, pp. 14313-14323
Closed Access | Times Cited: 12

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
Yi Wang, Kunchang Li, Xinhao Li, et al.
Lecture notes in computer science (2024), pp. 396-416
Closed Access | Times Cited: 10

Can I Trust Your Answer? Visually Grounded Video Question Answering
Junbin Xiao, Angela Yao, Yicong Li, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 35, pp. 13204-13214
Closed Access | Times Cited: 8

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu, Christopher M. Clark, Sang-Ho Lee, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 7, pp. 26429-26445
Closed Access | Times Cited: 8

VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
Yanan Wang, Michihiro Yasunaga, Hongyu Ren, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 21525-21535
Open Access | Times Cited: 19

i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang, Yuwei Fang, Chenguang Zhu, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2023) Vol. 37, Iss. 9, pp. 10880-10890
Open Access | Times Cited: 18

Unified Coarse-to-Fine Alignment for Video-Text Retrieval
Ziyang Wang, Yi-Lin Sung, Feng Cheng, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 2804-2815
Open Access | Times Cited: 17

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You, Rui Sun, Zhecan Wang, et al.
(2023), pp. 11289-11303
Open Access | Times Cited: 16

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Mehmet Saygın Seyfioğlu, Wisdom O. Ikezogwo, Fatemeh Ghezloo, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 3, pp. 13183-13192
Closed Access | Times Cited: 6

EclipSE: Efficient Long-Range Video Retrieval Using Sight and Sound
Yan-Bo Lin, Jie Lei, Mohit Bansal, et al.
Lecture notes in computer science (2022), pp. 413-430
Open Access | Times Cited: 23

Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou, Roberto Martín-Martín, Mubbasir Kapadia, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) Vol. 33, pp. 10727-10738
Open Access | Times Cited: 13

Streaming Dense Video Captioning
Xingyi Zhou, Anurag Arnab, Shyamal Buch, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024), pp. 18243-18252
Closed Access | Times Cited: 5

AutoAD III: The Prequel - Back to the Pixels
Tengda Han, Max Bain, Arsha Nagrani, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024), pp. 18164-18174
Closed Access | Times Cited: 5

Distilling Vision-Language Models on Millions of Videos
Yue Zhao, L. Zhao, Xingyi Zhou, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024), pp. 13106-13116
Closed Access | Times Cited: 4

CF-CAD: A Contrastive Fusion Network For 3D Computer-Aided Design Generative Modeling
Xueyang Li, Haotian Chen, Yunzhong Lou, et al.
Lecture notes in computer science (2025), pp. 435-450
Closed Access

Page 1 - Next Page
