Building the Parser — Recursive Descent and the Shape of Meaning
Every grammar rule becomes a function. Every function calls the ones below it. That's the entire trick of recursive descent — and it's enough to parse any programming language.
Every grammar rule becomes a function. Every function calls the ones below it. That's the entire trick of recursive descent parsing.
I could dress it up. I could start with history — Niklaus Wirth's original formulation, the distinction between LL and LR parsers, the decades-long war between hand-written and generated parsers. But the core idea fits in one sentence, and once you see it, you can't unsee it.
As we built in the lexer chapter, a scanner turns raw text into tokens. A flat list: Var, Identifier("x"), Equal, Number(10), Plus, Number(20), Semicolon. The lexer doesn't know that 10 + 20 is an addition. It doesn't know that var x = ... is a declaration. It just produces tokens and moves on.
The parser's job is to give that flat list shape. To discover that 10 + 20 is a binary expression with + as the operator and two numbers as operands. That var x = 10 + 20; is a variable declaration where the initializer is that binary expression. The shape is a tree — an abstract syntax tree — and building it is what today is about.
The tree
Before we write the parser, we need to define what it produces. Each node in the AST represents a grammatical construct:
public abstract record Expr
{
public record Binary(Expr Left, Token Op, Expr Right) : Expr;
public record Unary(Token Op, Expr Operand) : Expr;
public record Literal(object? Value) : Expr;
public record Grouping(Expr Expression) : Expr;
public record Variable(Token Name) : Expr;
public record Assign(Token Name, Expr Value) : Expr;
public record Logical(Expr Left, Token Op, Expr Right) : Expr;
}
public abstract record Stmt
{
public record ExpressionStmt(Expr Expression) : Stmt;
public record PrintStmt(Expr Expression) : Stmt;
public record VarDecl(Token Name, Expr? Initializer) : Stmt;
public record Block(List<Stmt> Statements) : Stmt;
public record IfStmt(Expr Condition, Stmt ThenBranch, Stmt? ElseBranch) : Stmt;
public record WhileStmt(Expr Condition, Stmt Body) : Stmt;
}C# records are ideal here. Each AST node is immutable, carries exactly the data it needs, and gets structural equality for free. The Expr/Stmt split mirrors the grammar: expressions produce values, statements produce effects.
Notice there's no Add or Subtract node. A Binary node with a + token handles addition. A Binary node with a * token handles multiplication. The operator is data, not structure. This keeps the tree small — seven expression types and six statement types cover our entire language.
The insight from music
There's a parallel in music notation that makes operator precedence click.
When a musician reads a score, they don't process notes one at a time left to right. They parse in hierarchical groups. Individual notes form beats. Beats form measures. Measures form phrases. Phrases form sections. Each level of grouping has its own rules, and the higher levels contain the lower ones.
A measure of 4/4 time with eighth notes, quarter notes, and a half note isn't a flat sequence — it's a tree. The half note occupies beats 3 and 4. The quarter note occupies beat 2. The two eighth notes share beat 1. The rhythmic hierarchy determines how the notes group, just like operator precedence determines how expressions group.
2 + 3 * 4 isn't (2 + 3) * 4. It's 2 + (3 * 4). Multiplication "binds tighter" — it sits deeper in the tree, closer to the leaves. The same way a sixteenth note subdivision sits inside its parent beat, inside its parent measure.
Parsing is hierarchical grouping. The grammar defines the hierarchy. Recursive descent walks it from the top.
In recursive descent, each level of precedence is a function. The function for addition calls the function for multiplication (which binds tighter). The function for multiplication calls the function for unary operators (tighter still). Unary calls primary (literals, variables, parenthesized expressions — the leaves). Each function is a level in the rhythmic hierarchy, and the recursive calls descend through the levels until they hit the bottom.
The parser skeleton
public class Parser
{
private readonly List<Token> _tokens;
private int _current = 0;
public Parser(List<Token> tokens) => _tokens = tokens;
public List<Stmt> Parse()
{
var statements = new List<Stmt>();
while (!IsAtEnd())
statements.Add(Declaration());
return statements;
}
private Token Peek() => _tokens[_current];
private Token Previous() => _tokens[_current - 1];
private bool IsAtEnd() => Peek().Type == TokenType.Eof;
private Token Advance()
{
if (!IsAtEnd()) _current++;
return Previous();
}
private bool Check(TokenType type)
=> !IsAtEnd() && Peek().Type == type;
private bool Match(params TokenType[] types)
{
foreach (var type in types)
{
if (Check(type))
{
Advance();
return true;
}
}
return false;
}
}Same two-pointer pattern as the lexer. _current is the read head, Peek() looks at the current token without consuming, Advance() consumes and returns. The parser is a cursor walking through a flat list — the recursion comes from how the parsing methods call each other.
Statements: the top of the hierarchy
Parsing starts at the top. Declaration() handles variable declarations and falls through to Statement() for everything else:
private Stmt Declaration()
{
if (Match(TokenType.Var)) return VarDeclaration();
return Statement();
}
private Stmt VarDeclaration()
{
var name = Consume(TokenType.Identifier, "Expected variable name.");
Expr? initializer = null;
if (Match(TokenType.Equal))
initializer = Expression();
Consume(TokenType.Semicolon, "Expected ';' after variable declaration.");
return new Stmt.VarDecl(name, initializer);
}
private Stmt Statement()
{
if (Match(TokenType.Print)) return PrintStatement();
if (Match(TokenType.LeftBrace)) return BlockStatement();
if (Match(TokenType.If)) return IfStatement();
if (Match(TokenType.While)) return WhileStatement();
return ExpressionStatement();
}Each method follows the same rhythm: check what token we're looking at, consume the tokens that match our grammatical rule, return the corresponding AST node. If nothing matches at this level, fall through to the next level down.
Consume is Match with teeth — it advances if the expected token is there, and throws if it isn't:
private Token Consume(TokenType type, string message)
{
if (Check(type)) return Advance();
throw new ParseException(Peek(), message);
}Control flow
if and while follow their grammar rules directly:
private Stmt IfStatement()
{
Consume(TokenType.LeftParen, "Expected '(' after 'if'.");
var condition = Expression();
Consume(TokenType.RightParen, "Expected ')' after if condition.");
var thenBranch = Statement();
Stmt? elseBranch = null;
if (Match(TokenType.Else))
elseBranch = Statement();
return new Stmt.IfStmt(condition, thenBranch, elseBranch);
}
private Stmt WhileStatement()
{
Consume(TokenType.LeftParen, "Expected '(' after 'while'.");
var condition = Expression();
Consume(TokenType.RightParen, "Expected ')' after while condition.");
var body = Statement();
return new Stmt.WhileStmt(condition, body);
}
private Stmt BlockStatement()
{
var statements = new List<Stmt>();
while (!Check(TokenType.RightBrace) && !IsAtEnd())
statements.Add(Declaration());
Consume(TokenType.RightBrace, "Expected '}' after block.");
return new Stmt.Block(statements);
}The else in IfStatement is optional — we only consume it if it's there. This means else always binds to the nearest if, which is the behavior you'd expect. The grammar handles the "dangling else" problem for free.
The precedence ladder
This is the heart of the parser. Each level of precedence is a method that handles operators at that level and delegates to the next tighter level:
private Expr Expression() => Assignment();
private Expr Assignment()
{
var expr = Or();
if (Match(TokenType.Equal))
{
var value = Assignment(); // right-associative
if (expr is Expr.Variable v)
return new Expr.Assign(v.Name, value);
throw new ParseException(Previous(), "Invalid assignment target.");
}
return expr;
}
private Expr Or()
{
var expr = And();
while (Match(TokenType.Or))
expr = new Expr.Logical(expr, Previous(), And());
return expr;
}
private Expr And()
{
var expr = Equality();
while (Match(TokenType.And))
expr = new Expr.Logical(expr, Previous(), Equality());
return expr;
}
private Expr Equality()
{
var expr = Comparison();
while (Match(TokenType.BangEqual, TokenType.EqualEqual))
expr = new Expr.Binary(expr, Previous(), Comparison());
return expr;
}
private Expr Comparison()
{
var expr = Term();
while (Match(TokenType.Greater, TokenType.GreaterEqual,
TokenType.Less, TokenType.LessEqual))
expr = new Expr.Binary(expr, Previous(), Term());
return expr;
}
private Expr Term()
{
var expr = Factor();
while (Match(TokenType.Minus, TokenType.Plus))
expr = new Expr.Binary(expr, Previous(), Factor());
return expr;
}
private Expr Factor()
{
var expr = Unary();
while (Match(TokenType.Slash, TokenType.Star))
expr = new Expr.Binary(expr, Previous(), Unary());
return expr;
}Read it from bottom to top and the precedence emerges. Factor (multiplication, division) calls Unary. Term (addition, subtraction) calls Factor. Comparison calls Term. Each level wraps its result in a Binary node, with the operator it consumed and the tighter-binding expression as the right operand.
The while loop in each method handles left-associativity. 1 - 2 - 3 parses as (1 - 2) - 3, not 1 - (2 - 3), because the loop builds the tree leftward — each iteration wraps the previous result as the new left child.
Assignment is the exception: it's right-associative (the recursive call is to Assignment() itself, not the next level down). a = b = 5 parses as a = (b = 5).
The bottom of the ladder
Unary operators and primary expressions are the leaves:
private Expr Unary()
{
if (Match(TokenType.Bang, TokenType.Minus))
return new Expr.Unary(Previous(), Unary()); // recursive!
return Primary();
}
private Expr Primary()
{
if (Match(TokenType.False)) return new Expr.Literal(false);
if (Match(TokenType.True)) return new Expr.Literal(true);
if (Match(TokenType.Nil)) return new Expr.Literal(null);
if (Match(TokenType.Number, TokenType.String))
return new Expr.Literal(Previous().Literal);
if (Match(TokenType.Identifier))
return new Expr.Variable(Previous());
if (Match(TokenType.LeftParen))
{
var expr = Expression();
Consume(TokenType.RightParen, "Expected ')' after expression.");
return new Expr.Grouping(expr);
}
throw new ParseException(Peek(), "Expected expression.");
}The Unary method is recursive — --x is a valid expression (negate the negation of x). Each ! or - wraps another unary, until we hit something that isn't a unary operator and fall through to Primary.
Primary is where everything bottoms out. Literals, variables, and parenthesized expressions. The parenthesized case is beautiful: consume the (, recursively parse a full expression (jumping all the way back to the top of the precedence ladder), then consume the ). This is how grouping overrides precedence — (1 + 2) * 3 works because the parenthesized sub-expression is parsed as a complete expression before * ever sees it.
Running it
Let's connect the lexer to the parser and see the tree:
var source = "var x = 10 + 20 * 3;";
var tokens = new Scanner(source).ScanTokens();
var statements = new Parser(tokens).Parse();
// Simple tree printer
void PrintAst(Expr expr, int indent = 0)
{
var pad = new string(' ', indent * 2);
switch (expr)
{
case Expr.Binary b:
Console.WriteLine($"{pad}({b.Op.Lexeme}");
PrintAst(b.Left, indent + 1);
PrintAst(b.Right, indent + 1);
Console.WriteLine($"{pad})");
break;
case Expr.Literal l:
Console.WriteLine($"{pad}{l.Value}");
break;
case Expr.Variable v:
Console.WriteLine($"{pad}{v.Name.Lexeme}");
break;
}
}
foreach (var stmt in statements)
{
if (stmt is Stmt.VarDecl decl)
{
Console.WriteLine($"var {decl.Name.Lexeme} =");
PrintAst(decl.Initializer!, 1);
}
}Output:
var x =
(+
10
(*
20
3
)
)The tree's shape encodes what the flat token list couldn't: * binds before +. The 20 * 3 multiplication is a subtree inside the addition. No parentheses needed — the structure itself carries the precedence.
What we built and what it costs
The full parser is about 180 lines of C#. No parser generator, no lookup tables, no grammar file. Just methods calling methods, each one handling one level of the grammar.
The trade-off is real, though. A recursive descent parser can only handle LL(1) grammars — grammars where one token of lookahead is enough to decide which rule to apply. Our if/else works because else is always unambiguous in context. But languages with more complex syntax (C++'s angle brackets for templates, for instance) hit cases where one token isn't enough. Those languages need GLR parsers, parser combinators, or hand-written hacks that break the clean recursive descent model.
For our language, LL(1) is more than enough. And there's something satisfying about a parser you can read top-to-bottom and understand completely.
What's next
We have tokens. We have a tree. In the interpreter chapter, we make it run.
The interpreter will walk the AST, evaluate expressions, execute statements, and manage an environment of variable bindings. var x = 10 + 20 * 3; won't just be a tree — it'll produce 70 and bind it to x. Our language will compute.
Without structure, there's no meaning. The parser gave us structure. The interpreter gives us life.




