The Transformer Cookbook
Published on arXiv, 2025
We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer’s parameters. This work addresses the steep learning curve of such endeavors, a problem exacerbated by a fragmented literature where key results are scattered across numerous papers. To that end, we synthesize this disparate body of findings into a curated set of recipes that demonstrate how to implement everything from basic arithmetic in feed-forward layers to complex data routing via self-attention. Our mise en place of formulations serves both newcomers seeking an accessible entry point and experts in need of a systematic reference. This unified presentation of transformer constructions provides a foundation for future work ranging from theoretical research in computational complexity to empirical investigations in architecture design and interpretability.
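To give a flavor of what "encoding an algorithm into parameters" can look like, here is a minimal sketch of a one-hidden-layer ReLU feed-forward network whose hand-set weights compute min(x, y). It is an illustration only, not taken from the paper: the names `W1`, `W2`, and `ffn_min` are hypothetical, and the construction relies on the standard identities x = ReLU(x) - ReLU(-x) and min(x, y) = x - ReLU(x - y).

```python
# Illustrative sketch (not from the paper): a ReLU feed-forward layer
# with hand-chosen weights that computes min(x, y) exactly.

# First layer maps [x, y] to three pre-activations: x, -x, x - y.
W1 = [
    [1.0, 0.0],    # x
    [-1.0, 0.0],   # -x
    [1.0, -1.0],   # x - y
]

# Second layer combines the hidden units:
# ReLU(x) - ReLU(-x) - ReLU(x - y) = x - ReLU(x - y) = min(x, y)
W2 = [1.0, -1.0, -1.0]


def relu(values):
    """Elementwise ReLU over a list of floats."""
    return [max(0.0, v) for v in values]


def ffn_min(x, y):
    """Evaluate the two-layer network on the input [x, y]."""
    pre_activations = [row[0] * x + row[1] * y for row in W1]
    hidden = relu(pre_activations)
    return sum(w * h for w, h in zip(W2, hidden))


if __name__ == "__main__":
    for x, y in [(3.0, 5.0), (5.0, 3.0), (-2.0, 4.0), (-1.0, -4.0)]:
        print(f"min({x}, {y}) = {ffn_min(x, y)}")
```

No training is involved: the weights are written down directly, which is the sense in which an algorithm is "encoded" rather than learned.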
Citation BibTeX:
@article{yang2025transformercookbook,
title={The Transformer Cookbook},
author={Andy Yang and Christopher Watson and Anton Xue and Satwik Bhattamishra and Jose Llarena and William Merrill and Emile Dos Santos Ferreira and Anej Svete and David Chiang},
year={2025},
eprint={2510.00368},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.00368},
journal={arXiv preprint arXiv:2510.00368},
}
