Optimizing Large‑Scale Language Model Inference via Firmware‑Level and Architectural Attention Sparsity. (2025). International Journal of Modern Medicine, 4(10), 14-20. https://intjmm.com/index.php/ijmm/article/view/78