Transformers Can Represent n-gram Language Models

Date: