--- title: You could have invented Parsec date: August 17, 2016 01:29 AM synopsys: 2 --- As most of us should know, [Parsec](https://hackage.haskell.org/package/parsec) is a relatively fast, lightweight monadic parser combinator library. In this post I aim to show that monadic parsing is not only useful, but a simple concept to grok. We shall implement a simple parsing library with instances of common typeclasses of the domain, such as Monad, Functor and Applicative, and some example combinators to show how powerful this abstraction really is. --- Getting the buzzwords out of the way, being _monadic_ just means that Parsers instances of `Monad`{.haskell}. Recall the Monad typeclass, as defined in `Control.Monad`{.haskell}, ```haskell class Applicative m => Monad m where return :: a -> m a (>>=) :: m a -> (a -> m b) -> m b {- Some fields omitted -} ``` How can we fit a parser in the above constraints? To answer that, we must first define what a parser _is_. A naïve implementation of the `Parser`{.haskell} type would be a simple type synonym. ```haskell type Parser a = String -> (a, String) ``` This just defines that a parser is a function from a string to a result pair with the parsed value and the resulting stream. This would mean that parsers are just state transformers, and if we define it as a synonym for the existing mtl `State`{.haskell} monad, we get the Monad, Functor and Applicative instances for free! But alas, this will not do. Apart from modeling the state transformation that a parser expresses, we need a way to represent failure. You already know that `Maybe a`{.haskell} expresses failure, so we could try something like this: ```haskell type Parser a = String -> Maybe (a, String) ``` But, as you might have guessed, this is not the optimal representation either: `Maybe`{.haskell} _does_ model failure, but in a way that is lacking. It can only express that a computation was successful or that it failed, not why it failed. We need a way to fail with an error message. That is, the `Either`{.haskell} monad. ```haskell type Parser e a = String -> Either e (a, String) ``` Notice how we have the `Maybe`{.haskell} and `Either`{.haskell} outside the tuple, so that when an error happens we stop parsing immediately. We could instead have them inside the tuple for better error reporting, but that's out of scope for a simple blag post. This is pretty close to the optimal representation, but there are still some warts things to address: `String`{.haskell} is a bad representation for textual data, so ideally you'd have your own `Stream`{.haskell} class that has instances for things such as `Text`{.haskell}, `ByteString`{.haskell} and `String`{.haskell}. One issue, however, is more glaring: You _can't_ define typeclass instances for type synonyms! The fix, however, is simple: make `Parser`{.haskell} a newtype. ```haskell newtype Parser a = Parser { parse :: String -> Either String (a, String) } ``` --- Now that that's out of the way, we can actually get around to instancing some typeclasses. Since the AMP landed in GHC 7.10 (base 4.8), the hierarchy of the Monad typeclass is as follows: ```haskell class Functor (m :: * -> *) where class Functor m => Applicative m where class Applicative m => Monad m where ``` That is, we need to implement Functor and Applicative before we can actually implement Monad. We shall also add an `Alternative`{.haskell} instance for expressing choice. First we need some utility functions, such as `runParser`{.haskell}, that runs a parser from a given stream. ```haskell runParser :: Parser a -> String -> Either String a runParser (Parser p) s = fst <$> p s ``` We could also use function for modifying error messages. For convenience, we make this an infix operator, ``{.haskell}. ```haskell () :: Parser a -> String -> Parser a (Parser p) err = Parser go where go s = case p s of Left _ -> Left err Right x -> return x infixl 2 ``` `Functor` ======= Remember that Functor models something that can be mapped over (technically, `fmap`-ed over). We need to define semantics for `fmap` on Parsers. A sane implementation would only map over the result, and keeping errors the same. This is a homomorphism, and follows the Functor laws. However, since we can't modify a function in place, we need to return a new parser that applies the given function _after_ the parsing is done. ```haskell instance Functor Parser where fn `fmap` (Parser p) = Parser go where go st = case p st of Left e -> Left e Right (res, str') -> Right (fn res, str') ``` ### `Applicative` While Functor is something that can be mapped over, Applicative defines semantics for applying a function inside a context to something inside a context. The Applicative class is defined as ```haskell class Functor m => Applicative m where pure :: a -> m a (<*>) :: f (a -> b) -> f a -> f b ``` Notice how the `pure`{.haskell} and the `return`{.haskell} methods are equivalent, so we only have to implement one of them. Let's go over this by parts. ```haskell instance Applicative Parser where pure x = Parser $ \str -> Right (x, str) ``` The `pure`{.haskell} function leaves the stream untouched, and sets the result to the given value. The `(<*>)`{.haskell} function needs to to evaluate and parse the left-hand side to get the in-context function to apply it. ```haskell (Parser p) <*> (Parser p') = Parser go where go st = case p st of Left e -> Left e Right (fn, st') -> case p' st' of Left e' -> Left e' Right (v, st'') -> Right (fn v, st'') ``` ### `Alternative` Since the only superclass of Alternative is Applicative, we can instance it without a Monad instance defined. We do, however, need an import of `Control.Applicative`{.haskell}. ```haskell instance Alternative Parser where empty = Parser $ \_ -> Left "empty parser" (Parser p) <|> (Parser p') = Parser go where go st = case p st of Left _ -> p' st Right x -> Right x ``` ### `Monad` After almost a thousand words, one would be excused for forgetting we're implementing a _monadic_ parser combinator library. That means, we need an instance of the `Monad`{.haskell} typeclass. Since we have an instance of Applicative, we don't need an implementation of return: it is equivalent to `pure`, save for the class constraint. ```haskell instance Monad Parser where return = pure ``` The `(>>=)`{.haskell} implementation, however, needs a bit more thought. Its type signature is ```haskell (>>=) :: m a -> (a -> m b) -> m b ``` That means we need to extract a value from the Parser monad and apply it to the given function, producing a new Parser. ```haskell (Parser p) >>= f = Parser go where go s = case p s of Left e -> Left e Right (x, s') -> parse (f x) s' ``` While some people think that the `fail`{.haskell} is not supposed to be in the Monad typeclass, we do need an implementation for when pattern matching fails. It is also convenient to use `fail`{.haskell} for the parsing action that returns an error with a given message. ```haskell fail m = Parser $ \_ -> Left m ``` --- We now have a `Parser`{.haskell} monad, that expresses a parsing action. But, a parser library is no good when actual parsing is made harder than easier. To make parsing easier, we define _combinators_, functions that modify a parser in one way or another. But first, we should get some parsing functions. ### any, satisfying `any` is the parsing action that pops a character off the stream and returns that. It does no further parsing at all. ```haskell any :: Parser Char any = Parser go where go [] = Left "any: end of file" go (x:xs) = Right (x,xs) ``` `satisfying` tests the parsed value against a function of type `Char -> Bool`{.haskell} before deciding if it's successful or a failure. ```haskell satisfy :: (Char -> Bool) -> Parser Char satisfy f = d x <- any if f x then return x else fail "satisfy: does not satisfy" ``` We use the `fail`{.haskell} function defined above to represent failure. ### `oneOf`, `char` These functions are defined in terms of `satisfying`, and parse individual characters. ```haskell char :: Char -> Parser Char char c = satisfy (c ==) "char: expected literal " ++ [c] oneOf :: String -> Parser Char oneOf s = satisfy (`elem` s) "oneOf: expected one of '" ++ s ++ "'" ``` ### `string` This parser parses a sequence of characters, in order. ```haskell string :: String -> Parser String string [] = return [] string (x:xs) = do char x string xs return $ x:xs ``` --- And that's it! In a few hundred lines, we have built a working parser combinator library with Functor, Applicative, Alternative, and Monad instances. While it's not as complex or featureful as Parsec in any way, it is powerful enough to define grammars for simple languages. [A transcription](/static/Parser.hs) ([with syntax highlighting](/static/Parser.hs.html)) of this file is available as runnable Haskell. The transcription also features some extra combinators for use.