|
|
- ---
- title: You could have invented Parsec
- date: August 17, 2016 01:29 AM
- ---
-
- As most of us should know, [Parsec](https://hackage.haskell.org/package/parsec)
- is a relatively fast, lightweight monadic parser combinator library.
-
- In this post I aim to show that monadic parsing is not only useful, but a simple
- concept to grok.
-
- We shall implement a simple parsing library with instances of common typeclasses
- of the domain, such as Monad, Functor and Applicative, and some example
- combinators to show how powerful this abstraction really is.
-
- ---
-
- Getting the buzzwords out of the way, being _monadic_ just means that Parsers
- are instances of `Monad`{.haskell}. Recall the Monad typeclass, as exported by
- `Control.Monad`{.haskell},
-
- ```haskell
- class Applicative m => Monad m where
-   return :: a -> m a
-   (>>=)  :: m a -> (a -> m b) -> m b
-   {- Some methods omitted -}
- ```
-
- How can we fit a parser in the above constraints? To answer that, we must first
- define what a parser _is_.
-
- A naïve implementation of the `Parser`{.haskell} type would be a simple type
- synonym.
-
- ```haskell
- type Parser a = String -> (a, String)
- ```
-
- This just defines that a parser is a function from a string to a result pair
- with the parsed value and the resulting stream. This would mean that parsers are
- just state transformers, and if we define it as a synonym for the existing mtl
- `State`{.haskell} monad, we get the Monad, Functor and Applicative instances for
- free! But alas, this will not do.
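-
- To make the state-transformer reading concrete, here is a minimal sketch using
- the naive synonym. The names `item` and `twoItems` are made up for
- illustration and are not part of the library we build later.
-
- ```haskell
- -- The naive representation from the text: consume input, return a value
- -- together with the rest of the stream.
- type Parser a = String -> (a, String)
-
- -- 'item' (a hypothetical name) pops one character off the stream.
- item :: Parser Char
- item (c:cs) = (c, cs)
- item []     = error "item: no input left" -- no honest way to report this
-
- -- Sequencing is plain state threading: whatever the first parser leaves
- -- behind is fed to the second one.
- twoItems :: Parser (Char, Char)
- twoItems s =
-   let (a, s')  = item s
-       (b, s'') = item s'
-   in ((a, b), s'')
- ```
-
- Here `twoItems "abc"` evaluates to `(('a','b'),"c")`, while the empty-input
- case of `item` has nothing sensible to return and has to crash.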
-
- Apart from modeling the state transformation that a parser expresses, we need a
- way to represent failure. You already know that `Maybe a`{.haskell} expresses
- failure, so we could try something like this:
-
- ```haskell
- type Parser a = String -> Maybe (a, String)
- ```
-
- But, as you might have guessed, this is not the optimal representation either:
- `Maybe`{.haskell} _does_ model failure, but in a way that is lacking. It can
- only express that a computation was successful or that it failed, not why it
- failed. We need a way to fail with an error message. That is, the
- `Either`{.haskell} monad.
-
- ```haskell
- type Parser e a = String -> Either e (a, String)
- ```
-
- Notice how we have the `Maybe`{.haskell} and `Either`{.haskell} outside the
- tuple, so that when an error happens we stop parsing immediately. We could
- instead have them inside the tuple for better error reporting, but that's out of
- scope for a simple blag post.
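-
- The "stop immediately" behaviour can be sketched with the `Either`-based
- synonym alone, leaning on the `Either` monad to do the short-circuiting. As
- before, `item` and `twoItems` are illustrative names only.
-
- ```haskell
- -- The Either-based synonym from the text.
- type Parser e a = String -> Either e (a, String)
-
- -- 'item' (a hypothetical name) now reports failure as a value.
- item :: Parser String Char
- item (c:cs) = Right (c, cs)
- item []     = Left "item: unexpected end of input"
-
- -- The do-block runs in the Either monad: the first Left aborts the rest.
- twoItems :: Parser String (Char, Char)
- twoItems s = do
-   (a, s')  <- item s
-   (b, s'') <- item s'
-   return ((a, b), s'')
- ```
-
- With this, `twoItems "ab"` gives `Right (('a','b'),"")`, and `twoItems "a"`
- gives `Left "item: unexpected end of input"` without ever building a pair.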
-
- This is pretty close to the optimal representation, but there are still some
- warts to address: `String`{.haskell} is a bad representation for textual
- data, so ideally you'd have your own `Stream`{.haskell} class that has instances
- for things such as `Text`{.haskell}, `ByteString`{.haskell} and
- `String`{.haskell}.
-
- One issue, however, is more glaring: you _can't_ define instances of classes
- such as `Functor`{.haskell} for a type synonym like this one, because a synonym
- cannot be partially applied! The fix, however, is simple: make
- `Parser`{.haskell} a newtype.
-
- ```haskell
- newtype Parser a
-   = Parser { parse :: String -> Either String (a, String) }
- ```
-
- ---
-
- Now that that's out of the way, we can actually get around to instancing some
- typeclasses.
-
- Since the AMP landed in GHC 7.10 (base 4.8), the hierarchy of the Monad
- typeclass is as follows:
-
- ```haskell
- class Functor (m :: * -> *) where
- class Functor m => Applicative m where
- class Applicative m => Monad m where
- ```
-
- That is, we need to implement Functor and Applicative before we can actually
- implement Monad.
-
- We shall also add an `Alternative`{.haskell} instance for expressing choice.
-
- First we need some utility functions, such as `runParser`{.haskell}, which runs
- a parser on a given stream and discards whatever input is left over.
-
- ```haskell
- runParser :: Parser a -> String -> Either String a
- runParser (Parser p) s = fst <$> p s
- ```
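-
- As a quick illustration of the difference between the `parse` field and
- `runParser`, here is a sketch with a hand-written primitive. `firstChar` is a
- hypothetical parser used only for this example.
-
- ```haskell
- newtype Parser a
-   = Parser { parse :: String -> Either String (a, String) }
-
- runParser :: Parser a -> String -> Either String a
- runParser (Parser p) s = fst <$> p s
-
- -- Hypothetical primitive: succeed with the first character, if there is one.
- firstChar :: Parser Char
- firstChar = Parser go where
-   go []     = Left "firstChar: end of input"
-   go (x:xs) = Right (x, xs)
- ```
-
- `parse firstChar "hi"` yields `Right ('h',"i")`, keeping the remaining stream,
- while `runParser firstChar "hi"` yields just `Right 'h'`. Note that
- `runParser` does not complain about unconsumed input.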
-
- We could also use a function for modifying error messages. For convenience, we
- make this an infix operator, `<?>`{.haskell}.
-
- ```haskell
- (<?>) :: Parser a -> String -> Parser a
- (Parser p) <?> err = Parser go where
-   go s = case p s of
-     Left _  -> Left err
-     Right x -> return x
- infixl 2 <?>
- ```
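-
- A small sketch of `<?>` in use: it swaps a low-level error for a friendlier
- one and leaves successes alone. The `digit` parser below is a hypothetical
- example, written by hand so the block stands on its own.
-
- ```haskell
- import Data.Char (isDigit)
-
- newtype Parser a
-   = Parser { parse :: String -> Either String (a, String) }
-
- (<?>) :: Parser a -> String -> Parser a
- (Parser p) <?> err = Parser go where
-   go s = case p s of
-     Left _  -> Left err
-     Right x -> Right x
- infixl 2 <?>
-
- -- Hypothetical primitive whose own error message gets replaced by (<?>).
- digit :: Parser Char
- digit = Parser go <?> "expected a digit"
-   where
-     go (c:cs) | isDigit c = Right (c, cs)
-     go _                  = Left "digit: low-level failure"
- ```
-
- `parse digit "7a"` is `Right ('7',"a")`, and `parse digit "x"` is
- `Left "expected a digit"` rather than the low-level message.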
-
-
- ### `Functor`
-
- Remember that Functor models something that can be mapped over (technically,
- `fmap`-ed over).
-
- We need to define semantics for `fmap` on Parsers. A sane implementation would
- only map over the result and keep errors the same. This is a homomorphism,
- and follows the Functor laws.
-
- However, since we can't modify a function in place, we need to return a new
- parser that applies the given function _after_ the parsing is done.
-
- ```haskell
- instance Functor Parser where
-   fn `fmap` (Parser p) = Parser go where
-     go st = case p st of
-       Left e            -> Left e
-       Right (res, str') -> Right (fn res, str')
- ```
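-
- To see the instance at work, here is a sketch that maps a conversion function
- over a parser's result. `digit` and `digitValue` are hypothetical names for
- this example.
-
- ```haskell
- import Data.Char (digitToInt, isDigit)
-
- newtype Parser a
-   = Parser { parse :: String -> Either String (a, String) }
-
- instance Functor Parser where
-   fmap fn (Parser p) = Parser go where
-     go st = case p st of
-       Left e            -> Left e
-       Right (res, str') -> Right (fn res, str')
-
- -- Hypothetical primitive: accept a single decimal digit.
- digit :: Parser Char
- digit = Parser go where
-   go (c:cs) | isDigit c = Right (c, cs)
-   go _                  = Left "digit: expected a digit"
-
- -- fmap changes the result type but not what gets consumed, or the error.
- digitValue :: Parser Int
- digitValue = digitToInt <$> digit
- ```
-
- `parse digitValue "7a"` is `Right (7,"a")`, and on `"x"` the error from
- `digit` passes through untouched.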
-
- ### `Applicative`
-
- While Functor is something that can be mapped over, Applicative defines
- semantics for applying a function inside a context to something inside a
- context.
-
- The Applicative class is defined as
-
- ```haskell
- class Functor f => Applicative f where
-   pure  :: a -> f a
-   (<*>) :: f (a -> b) -> f a -> f b
- ```
-
- Notice how the `pure`{.haskell} and the `return`{.haskell} methods are
- equivalent, so we only have to implement one of them.
-
- Let's go over this by parts.
-
- ```haskell
- instance Applicative Parser where
-   pure x = Parser $ \str -> Right (x, str)
- ```
-
- The `pure`{.haskell} function leaves the stream untouched, and sets the result
- to the given value.
-
- The `(<*>)`{.haskell} function needs to run the left-hand parser first to get
- the in-context function, then run the right-hand parser on the remaining input
- and apply the function to its result.
-
- ```haskell
-   (Parser p) <*> (Parser p') = Parser go where
-     go st = case p st of
-       Left e -> Left e
-       Right (fn, st') -> case p' st' of
-         Left e' -> Left e'
-         Right (v, st'') -> Right (fn v, st'')
- ```
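-
- With both methods in place, parsers can be run in sequence and their results
- combined by an ordinary function. A minimal sketch, where `item` and `pair`
- are names invented for the example:
-
- ```haskell
- newtype Parser a
-   = Parser { parse :: String -> Either String (a, String) }
-
- instance Functor Parser where
-   fmap fn (Parser p) = Parser $ \st -> case p st of
-     Left e         -> Left e
-     Right (x, st') -> Right (fn x, st')
-
- instance Applicative Parser where
-   pure x = Parser $ \str -> Right (x, str)
-   (Parser p) <*> (Parser p') = Parser go where
-     go st = case p st of
-       Left e          -> Left e
-       Right (fn, st') -> case p' st' of
-         Left e'         -> Left e'
-         Right (v, st'') -> Right (fn v, st'')
-
- -- Hypothetical primitive: consume any single character.
- item :: Parser Char
- item = Parser go where
-   go []     = Left "item: end of input"
-   go (x:xs) = Right (x, xs)
-
- -- Run 'item' twice, left to right, and pair up the results.
- pair :: Parser (Char, Char)
- pair = (,) <$> item <*> item
- ```
-
- `parse pair "abc"` is `Right (('a','b'),"c")`. If either side fails, as in
- `parse pair "a"`, the whole thing fails with that side's error.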
-
- ### `Alternative`
-
- Since the only superclass of Alternative is Applicative, we can instance it
- without a Monad instance defined. We do, however, need an import of
- `Control.Applicative`{.haskell}.
-
- ```haskell
- instance Alternative Parser where
-   empty = Parser $ \_ -> Left "empty parser"
-   (Parser p) <|> (Parser p') = Parser go where
-     go st = case p st of
-       Left _  -> p' st
-       Right x -> Right x
- ```
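-
- Besides choice, an `Alternative` instance gives us `many` and `some` through
- the class defaults. The sketch below shows both; `digit`, `digitOrZero` and
- `digits` are hypothetical names.
-
- ```haskell
- import Control.Applicative (Alternative (..))
- import Data.Char (isDigit)
-
- newtype Parser a
-   = Parser { parse :: String -> Either String (a, String) }
-
- instance Functor Parser where
-   fmap fn (Parser p) = Parser $ \st -> case p st of
-     Left e         -> Left e
-     Right (x, st') -> Right (fn x, st')
-
- instance Applicative Parser where
-   pure x = Parser $ \st -> Right (x, st)
-   (Parser p) <*> (Parser p') = Parser $ \st -> case p st of
-     Left e          -> Left e
-     Right (fn, st') -> case p' st' of
-       Left e'         -> Left e'
-       Right (v, st'') -> Right (fn v, st'')
-
- instance Alternative Parser where
-   empty = Parser $ \_ -> Left "empty parser"
-   (Parser p) <|> (Parser p') = Parser $ \st -> case p st of
-     Left _  -> p' st -- retry the alternative on the *original* input
-     Right x -> Right x
-
- -- Hypothetical primitive: accept a single decimal digit.
- digit :: Parser Char
- digit = Parser go where
-   go (c:cs) | isDigit c = Right (c, cs)
-   go _                  = Left "digit: expected a digit"
-
- -- Choice: fall back to a default value when no digit is present.
- digitOrZero :: Parser Char
- digitOrZero = digit <|> pure '0'
-
- -- Zero or more digits, using the default 'many' from Alternative.
- digits :: Parser String
- digits = many digit
- ```
-
- `parse digits "42!"` is `Right ("42","!")`, and `parse digitOrZero "x"` is
- `Right ('0',"x")` because the failed branch consumed nothing. Note that the
- error of the left branch is simply thrown away.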
-
- ### `Monad`
-
- After almost a thousand words, one would be excused for forgetting we're
- implementing a _monadic_ parser combinator library. That means we need an
- instance of the `Monad`{.haskell} typeclass.
-
- Since we have an instance of Applicative, we don't need an implementation of
- return: it is equivalent to `pure`, save for the class constraint.
-
- ```haskell
- instance Monad Parser where
-   return = pure
- ```
-
-
- The `(>>=)`{.haskell} implementation, however, needs a bit more thought. Its
- type signature is
-
- ```haskell
- (>>=) :: m a -> (a -> m b) -> m b
- ```
-
- That means we need to extract a value from the Parser monad and apply the
- given function to it, producing a new Parser that we then run on the rest of
- the input.
-
- ```haskell
-   (Parser p) >>= f = Parser go where
-     go s = case p s of
-       Left e -> Left e
-       Right (x, s') -> parse (f x) s'
- ```
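-
- What `(>>=)` buys us over `(<*>)` is that later parsers can depend on earlier
- results. A sketch of that, using a made-up length-prefixed format (`item` and
- `lengthPrefixed` are hypothetical, and `return` is left to default to `pure`):
-
- ```haskell
- import Control.Monad (replicateM)
- import Data.Char (digitToInt, isDigit)
-
- newtype Parser a
-   = Parser { parse :: String -> Either String (a, String) }
-
- instance Functor Parser where
-   fmap fn (Parser p) = Parser $ \st -> case p st of
-     Left e         -> Left e
-     Right (x, st') -> Right (fn x, st')
-
- instance Applicative Parser where
-   pure x = Parser $ \st -> Right (x, st)
-   (Parser p) <*> (Parser p') = Parser $ \st -> case p st of
-     Left e          -> Left e
-     Right (fn, st') -> case p' st' of
-       Left e'         -> Left e'
-       Right (v, st'') -> Right (fn v, st'')
-
- instance Monad Parser where
-   (Parser p) >>= f = Parser go where
-     go s = case p s of
-       Left e        -> Left e
-       Right (x, s') -> parse (f x) s'
-
- -- Hypothetical primitive: consume any single character.
- item :: Parser Char
- item = Parser go where
-   go []     = Left "item: end of input"
-   go (x:xs) = Right (x, xs)
-
- -- A digit n followed by exactly n characters: how much we read next
- -- depends on a value we parsed earlier.
- lengthPrefixed :: Parser String
- lengthPrefixed = do
-   c <- item
-   if isDigit c
-     then replicateM (digitToInt c) item
-     else Parser (\_ -> Left "lengthPrefixed: expected a digit")
- ```
-
- `parse lengthPrefixed "3abcde"` is `Right ("abc","de")`, while `"3ab"` runs
- out of input and fails with `Left "item: end of input"`.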
-
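- While some people think that `fail`{.haskell} is not supposed to be in the
- Monad typeclass, we do need an implementation for when pattern matching fails.
- It is also convenient to use `fail`{.haskell} for the parsing action that
- returns an error with a given message. (Since GHC 8.8, `fail`{.haskell} has in
- fact moved out of `Monad`{.haskell} and into the `MonadFail`{.haskell} class,
- so on newer compilers this definition belongs in an
- `instance MonadFail Parser`{.haskell} instead.)
-
- ```haskell
-   fail m = Parser $ \_ -> Left m
- ```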
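-
- For readers on GHC 8.8 or newer, where `fail` is a method of `MonadFail`
- rather than `Monad`, the same definition can be sketched as a separate
- instance. This is an adaptation for newer compilers, not the code from the
- transcription; `item` and `openParen` are hypothetical names.
-
- ```haskell
- newtype Parser a
-   = Parser { parse :: String -> Either String (a, String) }
-
- instance Functor Parser where
-   fmap fn (Parser p) = Parser $ \st -> case p st of
-     Left e         -> Left e
-     Right (x, st') -> Right (fn x, st')
-
- instance Applicative Parser where
-   pure x = Parser $ \st -> Right (x, st)
-   (Parser p) <*> (Parser p') = Parser $ \st -> case p st of
-     Left e          -> Left e
-     Right (fn, st') -> case p' st' of
-       Left e'         -> Left e'
-       Right (v, st'') -> Right (fn v, st'')
-
- instance Monad Parser where
-   (Parser p) >>= f = Parser $ \s -> case p s of
-     Left e        -> Left e
-     Right (x, s') -> parse (f x) s'
-
- -- Assumes base >= 4.13 (GHC 8.8+), where the Prelude exports MonadFail.
- instance MonadFail Parser where
-   fail m = Parser $ \_ -> Left m
-
- -- Hypothetical primitive: consume any single character.
- item :: Parser Char
- item = Parser go where
-   go []     = Left "item: end of input"
-   go (x:xs) = Right (x, xs)
-
- -- A refutable pattern in a do-block calls 'fail' when it does not match.
- openParen :: Parser ()
- openParen = do
-   '(' <- item
-   return ()
- ```
-
- `parse openParen "(x"` is `Right ((),"x")`. On `"x"` the pattern match fails
- and we get a `Left` carrying a compiler-generated message.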
-
- ---
-
- We now have a `Parser`{.haskell} monad that expresses a parsing action. But a
- parser library is no good when actual parsing is made harder rather than
- easier. To make parsing easier, we define _combinators_, functions that modify
- a parser in one way or another.
-
- But first, we should get some parsing functions.
-
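- ### `any`, `satisfy`
-
- `any` is the parsing action that pops a character off the stream and returns
- that. It does no further parsing at all. Since the name clashes with the
- Prelude's `any`, the module needs `import Prelude hiding (any)`{.haskell}.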
-
- ```haskell
- any :: Parser Char
- any = Parser go where
-   go [] = Left "any: end of file"
-   go (x:xs) = Right (x,xs)
- ```
-
- `satisfy` tests the parsed value against a function of type
- `Char -> Bool`{.haskell} before deciding if it's successful or a failure.
-
- ```haskell
- satisfy :: (Char -> Bool) -> Parser Char
- satisfy f = do
-   x <- any
-   if f x
-     then return x
-     else fail "satisfy: does not satisfy"
- ```
-
- We use the `fail`{.haskell} function defined above to represent failure.
-
- ### `oneOf`, `char`
-
- These functions are defined in terms of `satisfy`, and parse individual
- characters.
-
- ```haskell
- char :: Char -> Parser Char
- char c = satisfy (c ==) <?> "char: expected literal " ++ [c]
-
- oneOf :: String -> Parser Char
- oneOf s = satisfy (`elem` s) <?> "oneOf: expected one of '" ++ s ++ "'"
- ```
-
- ### `string`
-
- This parser parses a sequence of characters, in order.
-
- ```haskell
- string :: String -> Parser String
- string [] = return []
- string (x:xs) = do
-   char x
-   string xs
-   return $ x:xs
- ```
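-
- Putting `string` together with `(<|>)` already gives a tiny grammar. The
- sketch below parses the literals `true` and `false`; `bool` is a hypothetical
- example, and `char` is written directly here (without `satisfy`) only to keep
- the block short.
-
- ```haskell
- import Control.Applicative (Alternative (..))
-
- newtype Parser a
-   = Parser { parse :: String -> Either String (a, String) }
-
- instance Functor Parser where
-   fmap fn (Parser p) = Parser $ \st -> case p st of
-     Left e         -> Left e
-     Right (x, st') -> Right (fn x, st')
-
- instance Applicative Parser where
-   pure x = Parser $ \st -> Right (x, st)
-   (Parser p) <*> (Parser p') = Parser $ \st -> case p st of
-     Left e          -> Left e
-     Right (fn, st') -> case p' st' of
-       Left e'         -> Left e'
-       Right (v, st'') -> Right (fn v, st'')
-
- instance Alternative Parser where
-   empty = Parser $ \_ -> Left "empty parser"
-   (Parser p) <|> (Parser p') = Parser $ \st -> case p st of
-     Left _  -> p' st
-     Right x -> Right x
-
- instance Monad Parser where
-   (Parser p) >>= f = Parser $ \s -> case p s of
-     Left e        -> Left e
-     Right (x, s') -> parse (f x) s'
-
- -- Simplified stand-in for the post's 'char', written as a primitive.
- char :: Char -> Parser Char
- char c = Parser go where
-   go (x:xs) | x == c = Right (x, xs)
-   go _               = Left ("char: expected literal " ++ [c])
-
- string :: String -> Parser String
- string []     = return []
- string (x:xs) = do
-   _ <- char x
-   _ <- string xs
-   return (x:xs)
-
- -- Hypothetical grammar: a boolean literal.
- bool :: Parser Bool
- bool = (True <$ string "true") <|> (False <$ string "false")
- ```
-
- `parse bool "true!"` is `Right (True,"!")`. On `"trap"` the first branch fails
- partway through, and the second branch starts over from the original input,
- so the reported error is the one from `string "false"`.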
-
- ---
-
- And that's it! In a few hundred lines, we have built a working parser combinator
- library with Functor, Applicative, Alternative, and Monad instances. While it's
- not as complex or featureful as Parsec in any way, it is powerful enough to
- define grammars for simple languages.
-
- [A transcription](/static/Parser.hs) ([with syntax
- highlighting](/static/Parser.hs.html)) of this file is available as runnable
- Haskell. The transcription also features some extra combinators for use.
|