
OpenAlex is an open-access bibliographic catalogue of scientific papers, authors, and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you find this listing of citing articles useful!
Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking a citation count opens this same listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.
Requested Article:
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Yixin Song, Zeyu Mi, Haotong Xie, et al.
(2024), pp. 590-606
Open Access | Times Cited: 8
Showing 8 citing articles:
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
Shiyi Cao, Shu Liu, Tyler Griggs, et al.
(2025), pp. 715-730
Closed Access
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo, Feng Cheng, Zhixu Du, et al.
IEEE Circuits and Systems Magazine (2025) Vol. 25, Iss. 1, pp. 35-57
Open Access
Minhui Xie, Shaoxun Zeng, Hao Guo, et al.
(2025), pp. 509-523
Closed Access
Accelerating Mixture-of-Experts language model inference via plug-and-play lookahead gate on a single GPU
Jie Ou, Yueming Chen, Buyao Xiong, et al.
Computer Standards & Interfaces (2025), pp. 103996-103996
Closed Access
Achieving Peak Performance for Large Language Models: A Systematic Review
Zhyar Rzgar K Rostam, Sándor Szénási, Gábor Kertész
IEEE Access (2024) Vol. 12, pp. 96017-96050
Open Access | Times Cited: 2
A review of AI edge devices and lightweight CNN deployment
Kailai Sun, Xinwei Wang, Xi Miao, et al.
Neurocomputing (2024) Vol. 614, pp. 128791-128791
Closed Access | Times Cited: 2
Governing Open Vocabulary Data Leaks Using an Edge LLM through Programming by Example
Qiyu Li, J. Wen, Haojian Jin
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (2024) Vol. 8, Iss. 4, pp. 1-31
Open Access
Cambricon-LLM: A Chiplet-Based Hybrid Architecture for On-Device Inference of 70B LLM
Zhongkai Yu, Shengwen Liang, Tianyun Ma, et al.
(2024), pp. 1474-1488
Closed Access