OpenAlex Citation Counts

OpenAlex is an openly accessible bibliographic catalogue of scientific papers, authors, and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!

If you click an article title, you'll navigate to the article as listed in CrossRef. If you click an Open Access link, you'll navigate to its "best Open Access location". Clicking the citation count will open this same listing for that article. Lastly, at the bottom of the page you'll find basic pagination options.
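
If you'd rather pull this listing programmatically, the same data is available through the public OpenAlex REST API. Below is a minimal sketch in Python, assuming the requests library; the work ID is a placeholder for the requested article's OpenAlex ID, while the cites: filter, cited_by_count field, and sort/per-page parameters are documented API options.

    # Sketch: fetch one page of articles citing a given work from the OpenAlex API.
    # The work ID below is a placeholder -- substitute the OpenAlex ID of the
    # requested article (e.g. taken from its OpenAlex landing page).
    import requests

    WORK_ID = "W0000000000"  # placeholder OpenAlex work ID

    resp = requests.get(
        "https://api.openalex.org/works",
        params={
            "filter": f"cites:{WORK_ID}",   # works that cite WORK_ID
            "sort": "cited_by_count:desc",  # most-cited citing articles first
            "per-page": 25,                 # one page of 25, as in the listing above
            "page": 1,
        },
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()

    print(f"{data['meta']['count']} citing articles found")
    for work in data["results"]:
        print(f"{work['display_name']} ({work.get('publication_year')}) "
              f"- Times Cited: {work['cited_by_count']}")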

Requested Article:

Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann, Elizabeth A. Clark, Thibault Sellam
Journal of Artificial Intelligence Research (2023) Vol. 77, pp. 103-166
Open Access | Times Cited: 69

Showing 1-25 of 69 citing articles:

Towards a Unified Multi-Dimensional Evaluator for Text Generation
Ming Zhong, Yang Liu, Da Yin, et al.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022)
Open Access | Times Cited: 57

Prompted Opinion Summarization with GPT-3.5
Adithya Bhaskar, Alex Fabbri, Greg Durrett
Findings of the Association for Computational Linguistics: ACL 2023 (2023)
Open Access | Times Cited: 26

Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes, Aman Madaan, Emmy Liu, et al.
Transactions of the Association for Computational Linguistics (2023) Vol. 11, pp. 1643-1668
Open Access | Times Cited: 24

Benchmarking the Hallucination Tendency of Google Gemini and Moonshot Kimi
Ruoxi Shan, Qiang Ming, Guang Hong, et al.
(2024)
Open Access | Times Cited: 9

EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
Tae Soo Kim, Yoonjoo Lee, Jamin Shin, et al.
(2024), pp. 1-21
Open Access | Times Cited: 7

Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri, Hannah Rashkin, Tal Linzen, et al.
Transactions of the Association for Computational Linguistics (2022) Vol. 10, pp. 1066-1083
Open Access | Times Cited: 30

LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna, Erin Bransom, Bailey Kuehl, et al.
(2023)
Open Access | Times Cited: 18

From text to treatment: the crucial role of validation for generative large language models in health care
Anne de Hond, Tuur Leeuwenberg, Richard Bartels, et al.
The Lancet Digital Health (2024) Vol. 6, Iss. 7, pp. e441-e443
Open Access | Times Cited: 5

Toward cultural interpretability: A linguistic anthropological framework for describing and evaluating large language models
Graham M. Jones, Shai Satran, Arvind Satyanarayan
Big Data & Society (2025) Vol. 12, Iss. 1
Open Access

A Critical Evaluation of Evaluations for Long-form Question Answering
Fangyuan Xu, Yixiao Song, Mohit Iyyer, et al.
(2023), pp. 3225-3245
Open Access | Times Cited: 12

MCRanker: Generating Diverse Criteria On-the-Fly to Improve Pointwise LLM Rankers
Fang Guo, Wenyu Li, Honglei Zhuang, et al.
(2025), pp. 944-953
Closed Access

Human-Centered Evaluation and Auditing of Language Models
Ziang Xiao, Wesley Hanwen Deng, Michelle S. Lam, et al.
(2024), pp. 1-6
Open Access | Times Cited: 3

SQuALITY: Building a Long-Document Summarization Dataset the Hard Way
Alex Wang, Richard Yuanzhe Pang, Angelica Chen, et al.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022), pp. 1139-1156
Open Access | Times Cited: 17

Evaluating factual accuracy in complex data-to-text
Craig Thomson, Ehud Reiter, Barkavi Sundararajan
Computer Speech & Language (2023) Vol. 80, pp. 101482-101482
Open Access | Times Cited: 9

Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)
Krishnaram Kenthapadi, Mehrnoosh Sameki, Ankur Taly
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2024), pp. 6523-6533
Open Access | Times Cited: 3

SafeText: A Benchmark for Exploring Physical Safety in Language Models
Sharon Levy, Emily Allaway, Melanie Subbiah, et al.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022), pp. 2407-2421
Open Access | Times Cited: 14

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, et al.
(2022), pp. 266-281
Open Access | Times Cited: 14

Dialect-robust Evaluation of Generated Text
Jiao Sun, Thibault Sellam, Elizabeth A. Clark, et al.
(2023), pp. 6010-6028
Open Access | Times Cited: 7

Common Flaws in Running Human Evaluation Experiments in NLP
Craig Thomson, Ehud Reiter, Anja Belz
Computational Linguistics (2024) Vol. 50, Iss. 2, pp. 795-805
Open Access | Times Cited: 2

Automatic Histograms: Leveraging Language Models for Text Dataset Exploration
Emily Reif, Crystal Qian, James Wexler, et al.
(2024), pp. 1-9
Open Access | Times Cited: 2
