Introduction to Parser Combinators

So I have been playing with parser combinators for quite a while now and they felt so interesting that I wanted to write about them.

What is a parser?

Basically a parser is a function which accepts some sort of character input such a string or a character stream and returns some structure as output that contains the parsed result. It is as simple as it sounds, no intricacies here.

Parsers are used everywhere in software in order to interpret character sequences into complex structures. Some simple example are JSON and XML parsers, which take some string as argument and return the corresponding data structure.

Using F#, we can define a simple parser type as follows:

1
2
3
4
5
type ParserResult<'a> =
    | Success of 'a * list<char>
    | Failure

type Parser<'a> = list<char> -> ParserResult<'a>

This means that a Parser will receive a list<char> and will return a ParserResult<'a> which can be either a Failure if nothing could be parsed or a Success otherwise and return both the parsed element and the rest of the list.

Having this definition, we can now build our first parser:

1
2
3
4
5
6
7
/// If the stream starts with c, returns Success, otherwise returns Failure
let CharParser (c: char) : Parser<char> =
    let p stream =
        match stream with
        | x::xs when x = c -> Success(x, xs)
        | _ -> Failure
    in p

which whill check if the given list starts with the specified element and if it does, returns a Success with both the element and the rest of the list. If it doesn’t, it returns a Failure.

Wait… what? Is that a parser?

Yes! and is probably one of the most important. Here is how it works:

1
2
3
4
5
> (CharParser 'a') ['a'; 'b'; 'c']
val it : ParserResult<char> = Success ('a',['b'; 'c'])

> (CharParser 'b') ['a'; 'b'; 'c']
val it : ParserResult<char> = Failure

Simple and stupid by itself, but useful when used as a building block for more complex parsers.

Ok, you showed what a parser is, what are those parser combinators you mention in the title?

Gluing parsers with parser combinators

A parser combinator is a higher order function that operates on one or more parsers in order to create a new parser.

Basically, parser combinators are the glue that allows us to mix small and simple parser in order to create more complex parsers that eventually, will allow us to parse whatever it is we want to parse.

We could, for example, define the following Either parser combinator:

1
2
3
4
5
6
7
/// If p1 fails uses p2, otherwise returns p1's result
let Either (p1: Parser<'a>) (p2: Parser<'a>) : Parser<'a> =
    let p stream =
        match p1 stream with
        | Failure -> p2 stream
        | res -> res
    in p

which builds a new parser that will try to use the first input parser and, if it fails, will use the second.

Imagine now, we would like to build a digit parser. Now that we have defined our Either operator we could define the parser as:

1
2
3
4
5
/// Parses any digit character
let DigitParser : Parser<char> =
    ['0'..'9']
    |> List.map CharParser
    |> List.reduce Either

What if we wanted that parser to return an int instead? We could simply define our second parser combinator:

1
2
3
4
5
6
7
/// Applies the function f to the result of p if it succeeds
let Apply (p: Parser<'a>) (f: 'a -> 'b) : Parser<'b> =
    let q stream =
        match p stream with
        | Success(x, rest) -> Success(f x, rest)
        | Failure -> Failure
    in q

and now we could use it to define a new digit parser:

1
let DigitParserInt = Apply DigitParser (fun c -> (int c) - (int '0'))

But parsing only a single character is not very useful so we could build the following combinator

1
2
3
4
5
6
7
/// Applies p as many times as possible
let rec Many (p: Parser<'a>) : Parser<List<'a>> =
    let q stream =
        match p stream with
        | Failure -> Success([], stream)
        | Success(x, rest) ->  (Apply (Many p) (fun xs -> x::xs)) rest
    in q

which will run the input parser p as many times as possible until it fails and will return a list of the successfully parsed elements.

This new combinator allows us to build the following integer parser:

1
2
3
/// Parses (positive) integer numbers
let IntegerParser : Parser<int> =
    Apply (Many DigitParserInt) (List.reduce (fun x y -> x * 10 + y))

that will parse integer numbers.

Enough

We could go like this forever but fortunately smarter people did it before and made their work freely available. The most famous library is Parsec for Haskell but adaptations have been made to other languages including C++, Java, Python, Javascript among many.

I myself have written a small library in F# for the sake of learning and I used it to build a JSON parser. I think it can be a nice introduction parser combinators since it is not optimized in any way and the ideas can be seen more clearly.

I posted all the code snippets of this post here to make it possible to try them online.

In my next post I will talk about monads in F# and explain how parser combinators is probably one of the better examples to understand what they are.

Comments