
OpenAlex, named after the Library of Alexandria, is an open-access bibliographic catalogue of scientific papers, authors, and institutions. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!
Clicking an article title navigates to the article as listed in CrossRef. Clicking an Open Access link navigates to the "best Open Access location". Clicking a citation count opens this listing for that article. Basic pagination options appear at the bottom of the page.
Requested Article:
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann, Elizabeth A. Clark, Thibault Sellam
Journal of Artificial Intelligence Research (2023) Vol. 77, pp. 103-166
Open Access | Times Cited: 69
Showing 1-25 of 69 citing articles:
Towards a Unified Multi-Dimensional Evaluator for Text Generation
Ming Zhong, Yang Liu, Da Yin, et al.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (2022)
Open Access | Times Cited: 57
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu, Alex Fabbri, Pengfei Liu, et al.
(2023)
Open Access | Times Cited: 37
Large Language Models Effectively Leverage Document-level Context for Literary Translation, but Critical Errors Persist
Marzena Karpinska, Mohit Iyyer
(2023)
Open Access | Times Cited: 32
Prompted Opinion Summarization with GPT-3.5
Adithya Bhaskar, Alex Fabbri, Greg Durrett
Findings of the Association for Computational Linguistics: ACL 2022 (2023)
Open Access | Times Cited: 26
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes, Aman Madaan, Emmy Liu, et al.
Transactions of the Association for Computational Linguistics (2023) Vol. 11, pp. 1643-1668
Open Access | Times Cited: 24
Benchmarking the Hallucination Tendency of Google Gemini and Moonshot Kimi
Ruoxi Shan, Qiang Ming, Guang Hong, et al.
(2024)
Open Access | Times Cited: 9
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
Tae Soo Kim, Yoonjoo Lee, Jamin Shin, et al.
(2024), pp. 1-21
Open Access | Times Cited: 7
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri, Hannah Rashkin, Tal Linzen, et al.
Transactions of the Association for Computational Linguistics (2022) Vol. 10, pp. 1066-1083
Open Access | Times Cited: 30
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna, Erin Bransom, Bailey Kuehl, et al.
(2023)
Open Access | Times Cited: 18
A Scoping Study of Evaluation Practices for Responsible AI Tools: Steps Towards Effectiveness Evaluations
Glen Berman, Nitesh Goyal, Michael Madaio
(2024), pp. 1-24
Open Access | Times Cited: 5
Intelligence as Agency: Evaluating the Capacity of Generative AI to Empower or Constrain Human Action
Arvind Satyanarayan, Graham M. Jones
(2024)
Open Access | Times Cited: 5
From text to treatment: the crucial role of validation for generative large language models in health care
Anne de Hond, Tuur Leeuwenberg, Richard Bartels, et al.
The Lancet Digital Health (2024) Vol. 6, Iss. 7, pp. e441-e443
Open Access | Times Cited: 5
Toward cultural interpretability: A linguistic anthropological framework for describing and evaluating large language models
Gregory R. Jones, Shai Satran, Arvind Satyanarayan
Big Data & Society (2025) Vol. 12, Iss. 1
Open Access
Evaluation Workflows for Large Language Models (LLMs) that Integrate Domain Expertise for Complex Knowledge Tasks
Annalisa Szymanski
(2025), pp. 215-217
Closed Access
A Critical Evaluation of Evaluations for Long-form Question Answering
Fangyuan Xu, Yixiao Song, Mohit Iyyer, et al.
(2023), pp. 3225-3245
Open Access | Times Cited: 12
MCRanker: Generating Diverse Criteria On-the-Fly to Improve Pointwise LLM Rankers
Fang Guo, Wenyu Li, Honglei Zhuang, et al.
(2025), pp. 944-953
Closed Access
Human-Centered Evaluation and Auditing of Language Models
Ziang Xiao, Wesley Hanwen Deng, Michelle S. Lam, et al.
(2024), pp. 1-6
Open Access | Times Cited: 3
SQuALITY: Building a Long-Document Summarization Dataset the Hard Way
Alex Wang, Richard Yuanzhe Pang, Angelica Chen, et al.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (2022), pp. 1139-1156
Open Access | Times Cited: 17
Evaluating factual accuracy in complex data-to-text
Craig Thomson, Ehud Reiter, Barkavi Sundararajan
Computer Speech & Language (2023) Vol. 80, pp. 101482-101482
Open Access | Times Cited: 9
Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)
Krishnaram Kenthapadi, Mehrnoosh Sameki, Ankur Taly
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2024), pp. 6523-6533
Open Access | Times Cited: 3
SafeText: A Benchmark for Exploring Physical Safety in Language Models
Sharon Levy, Emily Allaway, Melanie Subbiah, et al.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (2022), pp. 2407-2421
Open Access | Times Cited: 14
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, et al.
(2022), pp. 266-281
Open Access | Times Cited: 14
Dialect-robust Evaluation of Generated Text
Jiao Sun, Thibault Sellam, Elizabeth A. Clark, et al.
(2023), pp. 6010-6028
Open Access | Times Cited: 7
Common Flaws in Running Human Evaluation Experiments in NLP
Craig Thomson, Ehud Reiter, Anja Belz
Computational Linguistics (2024) Vol. 50, Iss. 2, pp. 795-805
Open Access | Times Cited: 2
Automatic Histograms: Leveraging Language Models for Text Dataset Exploration
Emily Reif, Crystal Qian, James Wexler, et al.
(2024), pp. 1-9
Open Access | Times Cited: 2