Posts by Collection



A Geometric Notion of Causal Probing

Published in arXiv, 2023

We propose a formal definition of intrinsic information about a concept (feature) in a subspace of a language model’s representation space, together with a counterfactual approach that avoids the failure mode of spurious correlations by treating the components in the subspace and in its orthogonal complement independently.
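The decomposition underlying this idea can be sketched in a few lines. This is an illustrative toy (the basis `B`, projector `P`, and the random edit are hypothetical, not the paper's method): a representation is split into its component inside a concept subspace and its component in the orthogonal complement, and a counterfactual edit touches only one of the two.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 2                                   # representation dim, subspace dim
B, _ = np.linalg.qr(rng.normal(size=(d, k)))  # orthonormal basis of a toy concept subspace
P = B @ B.T                                   # orthogonal projector onto that subspace

h = rng.normal(size=d)                        # a representation vector
h_sub = P @ h                                 # component inside the concept subspace
h_perp = h - h_sub                            # component in the orthogonal complement

# A counterfactual edit: replace only the subspace component,
# leaving the orthogonal complement untouched.
h_counterfactual = h_perp + B @ rng.normal(size=k)

assert np.allclose(h_sub + h_perp, h)         # the two components recompose h exactly
```

Because `P` is an orthogonal projector, the two components are independent in the geometric sense: editing `h_sub` cannot leak into `h_perp`, which is the property that blocks the spurious-correlation failure mode described above.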

Formal Aspects of Language Modeling

Published in arXiv, 2023

Large language models have become one of the most commonly deployed NLP inventions. In the past half-decade, their integration into core natural language processing tools has dramatically increased the performance of such tools, and they have entered the public discourse surrounding artificial intelligence. Consequently, it is important for both developers and researchers alike to understand the mathematical foundations of large language models, as well as how to implement them. These notes are the accompaniment to the theoretical portion of the ETH Zürich course on large language models, covering what constitutes a language model from a formal, theoretical perspective.
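As a one-line taste of that formal perspective, a standard starting definition (a common textbook formulation, not quoted from the notes) treats a language model as a probability distribution over all finite strings:

```latex
p \colon \Sigma^* \to [0, 1], \qquad \sum_{\boldsymbol{y} \in \Sigma^*} p(\boldsymbol{y}) = 1,
```

where $\Sigma$ is the alphabet and $\Sigma^*$ the set of all finite strings over it.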

On the Representational Capacity of Recurrent Neural Language Models

Published in EMNLP, 2023

This work investigates the computational expressivity of language models based on recurrent neural networks. We extend the Turing completeness result of Siegelmann and Sontag (1992) to the probabilistic case, showing how a rationally weighted recurrent language model (RLM) with unbounded computation time can simulate any probabilistic Turing machine (PTM).

Recurrent Neural Language Models as Probabilistic Finite-state Automata

Published in EMNLP, 2023

We study which classes of probability distributions RNN LMs can represent and show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata, and can thus model a strict subset of the probability distributions expressible by finite-state models.
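To make the comparison class concrete, here is a minimal sketch of a probabilistic finite-state automaton, the device family the result relates simple RNN LMs to. The toy automaton (two states, alphabet `{'a', 'b'}`, matrices `T`, vectors `initial` and `final`) is illustrative, not taken from the paper:

```python
import numpy as np

# Transition matrices: T[sym][i, j] is the probability of emitting
# sym while moving from state i to state j. Per-state outgoing
# probabilities (all symbols plus halting) sum to 1.
T = {
    'a': np.array([[0.4, 0.2], [0.0, 0.5]]),
    'b': np.array([[0.1, 0.0], [0.3, 0.0]]),
}
initial = np.array([1.0, 0.0])   # start in state 0
final = np.array([0.3, 0.2])     # per-state halting probability

def string_prob(s: str) -> float:
    """Probability that the automaton generates s and then halts."""
    alpha = initial
    for sym in s:
        alpha = alpha @ T[sym]   # sum over all state paths emitting sym
    return float(alpha @ final)

print(string_prob('ab'))         # probability mass assigned to the string "ab"
```

Summing `string_prob` over all strings yields 1, so the automaton defines a distribution over strings in exactly the sense a language model does; the paper's result says simple RNN LMs express a strict subclass of such machines.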