---
title: You could have invented Parsec
date: August 17, 2016 01:29 AM
---

As most of us should know, [Parsec](https://hackage.haskell.org/package/parsec)
is a relatively fast, lightweight monadic parser combinator library.

In this post I aim to show that monadic parsing is not only useful, but also a
simple concept to grok.

We shall implement a simple parsing library with instances of the common
typeclasses of the domain, such as Monad, Functor and Applicative, and some
example combinators to show how powerful this abstraction really is.

---

Getting the buzzwords out of the way, being _monadic_ just means that Parsers
are instances of `Monad`{.haskell}. Recall the Monad typeclass, as defined in
`Control.Monad`{.haskell},

```haskell
class Applicative m => Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b
  {- Some methods omitted -}
```

How can we fit a parser in the above constraints? To answer that, we must first
define what a parser _is_.

A naïve implementation of the `Parser`{.haskell} type would be a simple type
synonym.

```haskell
type Parser a = String -> (a, String)
```

This says that a parser is a function from a string to a pair of the parsed
value and the remaining stream. This would mean that parsers are just state
transformers, and if we define it as a synonym for the existing mtl
`State`{.haskell} monad, we get the Monad, Functor and Applicative instances for
free! But alas, this will not do.

Apart from modeling the state transformation that a parser expresses, we need a
way to represent failure. You already know that `Maybe a`{.haskell} expresses
failure, so we could try something like this:

```haskell
type Parser a = String -> Maybe (a, String)
```

But, as you might have guessed, this is not the optimal representation either:
`Maybe`{.haskell} _does_ model failure, but in a way that is lacking. It can
only express that a computation was successful or that it failed, not why it
failed. We need a way to fail with an error message. That is, the
`Either`{.haskell} monad.

```haskell
type Parser e a = String -> Either e (a, String)
```

Notice how we have the `Maybe`{.haskell} and `Either`{.haskell} outside the
tuple, so that when an error happens we stop parsing immediately. We could
instead have them inside the tuple for better error reporting, but that's out of
scope for a simple blag post.
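
To make the `Either`{.haskell} version concrete, here is a minimal sketch of two parsers written directly as functions of that type. The names `item` and `digit` and the error messages are made up for illustration; they are not part of any library.

```haskell
import Data.Char (isDigit)

-- The type synonym from the text.
type Parser e a = String -> Either e (a, String)

-- Hypothetical primitive: pop one character off the stream,
-- or fail with a message when the stream is empty.
item :: Parser String Char
item []     = Left "item: unexpected end of input"
item (c:cs) = Right (c, cs)

-- Hypothetical parser: run `item`, then inspect the result by hand.
-- A failure short-circuits, because `Either` sits outside the tuple.
digit :: Parser String Char
digit s = case item s of
  Left e -> Left e
  Right (c, rest)
    | isDigit c -> Right (c, rest)
    | otherwise -> Left ("digit: unexpected " ++ show c)
```

Here `digit "7x"` gives `Right ('7', "x")`, while `digit "x7"` gives `Left "digit: unexpected 'x'"`. The manual `case` plumbing in `digit` is exactly the boilerplate that the typeclass instances are meant to absorb.
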

This is pretty close to the optimal representation, but there are still some
warts to address: `String`{.haskell} is a bad representation for textual
data, so ideally you'd have your own `Stream`{.haskell} class that has instances
for things such as `Text`{.haskell}, `ByteString`{.haskell} and
`String`{.haskell}.

One issue, however, is more glaring: You _can't_ define typeclass instances for
type synonyms! The fix, however, is simple: make `Parser`{.haskell} a newtype.

```haskell
newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }
```

---

Now that that's out of the way, we can actually get around to instancing some
typeclasses.

Since the AMP landed in GHC 7.10 (base 4.8), the hierarchy of the Monad
typeclass is as follows:

```haskell
class Functor (m :: * -> *) where
class Functor m => Applicative m where
class Applicative m => Monad m where
```

That is, we need to implement Functor and Applicative before we can actually
implement Monad.

We shall also add an `Alternative`{.haskell} instance for expressing choice.

First we need some utility functions, such as `runParser`{.haskell}, which runs
a parser on a given stream and discards the leftover input.

```haskell
runParser :: Parser a -> String -> Either String a
runParser (Parser p) s = fst <$> p s
```

We could also use a function for modifying error messages. For convenience, we
make this an infix operator, `<?>`{.haskell}.

```haskell
(<?>) :: Parser a -> String -> Parser a
(Parser p) <?> err = Parser go where
  go s = case p s of
    Left _  -> Left err      -- replace whatever error the parser produced
    Right x -> return x      -- successes pass through untouched
infixl 2 <?>
```
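
As a quick illustration of how these two helpers behave, the sketch below repeats the definitions so that it stands alone and adds a tiny primitive to try them on. The `eof` and `labelled` parsers and their messages are hypothetical, chosen only for this example.

```haskell
newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

runParser :: Parser a -> String -> Either String a
runParser (Parser p) s = fst <$> p s

(<?>) :: Parser a -> String -> Parser a
(Parser p) <?> err = Parser go where
  go s = case p s of
    Left _  -> Left err
    Right x -> Right x
infixl 2 <?>

-- Hypothetical primitive: succeeds only when no input is left.
eof :: Parser ()
eof = Parser go where
  go [] = Right ((), [])
  go _  = Left "eof: leftover input"

-- The same parser with a friendlier error message attached.
labelled :: Parser ()
labelled = eof <?> "expected end of input"
```

`runParser eof "x"` gives `Left "eof: leftover input"`, while `runParser labelled "x"` gives `Left "expected end of input"`. Note that `<?>`{.haskell} throws the original message away, so the label should say everything the user needs to know.
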

### `Functor`

Remember that Functor models something that can be mapped over (technically,
`fmap`-ed over).

We need to define semantics for `fmap` on Parsers. A sane implementation would
only map over the result and keep errors the same. This is a homomorphism,
and follows the Functor laws.

However, since we can't modify a function in place, we need to return a new
parser that applies the given function _after_ the parsing is done.

```haskell
instance Functor Parser where
  fn `fmap` (Parser p) = Parser go where
    go st = case p st of
      Left e            -> Left e
      Right (res, str') -> Right (fn res, str')
```
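
To see what the Functor instance buys us, here is a small self-contained sketch that turns a parser of digit characters into a parser of numbers. The `digit` and `digitValue` parsers are hypothetical helpers written for this example only.

```haskell
import Data.Char (digitToInt, isDigit)

newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

instance Functor Parser where
  fmap fn (Parser p) = Parser go where
    go st = case p st of
      Left e            -> Left e
      Right (res, str') -> Right (fn res, str')

-- Hypothetical primitive: accept a single decimal digit.
digit :: Parser Char
digit = Parser go where
  go (c:cs) | isDigit c = Right (c, cs)
  go _                  = Left "digit: expected a digit"

-- Same parser, but the result is mapped from Char to Int.
-- The remaining input and any error are left alone.
digitValue :: Parser Int
digitValue = digitToInt <$> digit
```

`parse digitValue "7up"` gives `Right (7, "up")`, and on failure the error from `digit` comes through unchanged.
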

### `Applicative`

While Functor is something that can be mapped over, Applicative defines
semantics for applying a function inside a context to something inside a
context.

The Applicative class is defined as

```haskell
class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
```

Notice how the `pure`{.haskell} and `return`{.haskell} methods are equivalent,
so we only have to implement one of them.

Let's go over this by parts.

```haskell
instance Applicative Parser where
  pure x = Parser $ \str -> Right (x, str)
```

The `pure`{.haskell} function leaves the stream untouched, and sets the result
to the given value.

The `(<*>)`{.haskell} function needs to run the left-hand parser first to get
the in-context function, then run the right-hand parser on the remaining input
and apply the function to its result.

```haskell
  (Parser p) <*> (Parser p') = Parser go where
    go st = case p st of
      Left e          -> Left e
      Right (fn, st') -> case p' st' of
        Left e'         -> Left e'
        Right (v, st'') -> Right (fn v, st'')
```
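
With both instances in place, parsers can already be sequenced without any Monad machinery. The sketch below repeats the instances so that it compiles on its own; `item` and `pair` are hypothetical names used only for illustration.

```haskell
newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

instance Functor Parser where
  fmap fn (Parser p) = Parser $ \st -> case p st of
    Left e          -> Left e
    Right (x, rest) -> Right (fn x, rest)

instance Applicative Parser where
  pure x = Parser $ \str -> Right (x, str)
  (Parser p) <*> (Parser p') = Parser go where
    go st = case p st of
      Left e          -> Left e
      Right (fn, st') -> case p' st' of
        Left e'         -> Left e'
        Right (v, st'') -> Right (fn v, st'')

-- Hypothetical primitive: pop one character off the stream.
item :: Parser Char
item = Parser go where
  go []     = Left "item: end of input"
  go (x:xs) = Right (x, xs)

-- Run `item` twice, left to right, and pair up the results.
pair :: Parser (Char, Char)
pair = (,) <$> item <*> item
```

`parse pair "abc"` gives `Right (('a','b'), "c")`. The stream is threaded from the first `item` to the second, and the first failure aborts the whole thing, so `parse pair "a"` gives `Left "item: end of input"`.
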

### `Alternative`

Since the only superclass of Alternative is Applicative, we can instance it
without a Monad instance defined. We do, however, need an import of
`Control.Applicative`{.haskell}.

```haskell
instance Alternative Parser where
  empty = Parser $ \_ -> Left "empty parser"
  (Parser p) <|> (Parser p') = Parser go where
    go st = case p st of
      Left _  -> p' st
      Right x -> Right x
```
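
An Alternative instance also gives us `some`{.haskell} and `many`{.haskell} from `Control.Applicative`{.haskell} for free. The following is a sketch under the definitions above, repeated here so the block stands alone; `letter` and `aOrB` are hypothetical parsers made up for the example.

```haskell
import Control.Applicative (Alternative (..))

newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

instance Functor Parser where
  fmap fn (Parser p) = Parser $ \st -> case p st of
    Left e          -> Left e
    Right (x, rest) -> Right (fn x, rest)

instance Applicative Parser where
  pure x = Parser $ \st -> Right (x, st)
  Parser pf <*> Parser px = Parser $ \st -> case pf st of
    Left e         -> Left e
    Right (f, st') -> case px st' of
      Left e'         -> Left e'
      Right (x, st'') -> Right (f x, st'')

instance Alternative Parser where
  empty = Parser $ \_ -> Left "empty parser"
  (Parser p) <|> (Parser p') = Parser go where
    go st = case p st of
      Left _  -> p' st     -- retry from the original stream
      Right x -> Right x

-- Hypothetical primitive: accept exactly the given character.
letter :: Char -> Parser Char
letter c = Parser go where
  go (x:xs) | x == c = Right (x, xs)
  go _               = Left ("expected " ++ show c)

-- Choice: try 'a' first, fall back to 'b'.
aOrB :: Parser Char
aOrB = letter 'a' <|> letter 'b'
```

`parse (many aOrB) "abba!"` gives `Right ("abba", "!")`, and `parse (many aOrB) "!"` succeeds with an empty result. Because the second branch is run on the original stream, this `<|>`{.haskell} always backtracks, and when both branches fail only the error of the last one is reported.
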

### `Monad`

After almost a thousand words, one would be excused for forgetting we're
implementing a _monadic_ parser combinator library. That means we need an
instance of the `Monad`{.haskell} typeclass.

Since we have an instance of Applicative, we don't need to write a separate
implementation of `return`: it is equivalent to `pure`, save for the class
constraint.

```haskell
instance Monad Parser where
  return = pure
```

The `(>>=)`{.haskell} implementation, however, needs a bit more thought. Its
type signature is

```haskell
(>>=) :: m a -> (a -> m b) -> m b
```

That means we need to extract a value from the Parser monad and apply the given
function to it, producing a new Parser that we then run on the remaining input.

```haskell
  (Parser p) >>= f = Parser go where
    go s = case p s of
      Left e        -> Left e
      Right (x, s') -> parse (f x) s'
```

While some people think that `fail`{.haskell} is not supposed to be in the
Monad typeclass, we do need an implementation for when pattern matching fails.
It is also convenient to use `fail`{.haskell} for the parsing action that
returns an error with a given message. (Since GHC 8.8, `fail`{.haskell} has in
fact moved to the separate `MonadFail`{.haskell} class, so on newer compilers
this definition belongs in an `instance MonadFail Parser`{.haskell} instead.)

```haskell
  fail m = Parser $ \_ -> Left m
```
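
What does `(>>=)`{.haskell} give us that `(<*>)`{.haskell} does not? The rest of the parse can depend on a value that was parsed earlier. Below is a minimal sketch of a length-prefixed string, with the instances repeated so the block compiles on its own. The names `item`, `failWith` and `lengthPrefixed` are hypothetical, and `failWith` is a plain function standing in for `fail`{.haskell} so that the example does not depend on which class `fail`{.haskell} lives in.

```haskell
import Control.Monad (replicateM)
import Data.Char (digitToInt, isDigit)

newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

instance Functor Parser where
  fmap fn (Parser p) = Parser $ \st -> case p st of
    Left e          -> Left e
    Right (x, rest) -> Right (fn x, rest)

instance Applicative Parser where
  pure x = Parser $ \st -> Right (x, st)
  Parser pf <*> Parser px = Parser $ \st -> case pf st of
    Left e         -> Left e
    Right (f, st') -> case px st' of
      Left e'         -> Left e'
      Right (x, st'') -> Right (f x, st'')

instance Monad Parser where
  Parser p >>= f = Parser $ \st -> case p st of
    Left e         -> Left e
    Right (x, st') -> parse (f x) st'

-- Hypothetical stand-in for `fail`: always fail with the message.
failWith :: String -> Parser a
failWith msg = Parser $ \_ -> Left msg

-- Hypothetical primitive: pop one character off the stream.
item :: Parser Char
item = Parser go where
  go []     = Left "item: end of input"
  go (x:xs) = Right (x, xs)

-- Read one digit n, then exactly n more characters.
-- How much we read next depends on the value we just parsed.
lengthPrefixed :: Parser String
lengthPrefixed = do
  d <- item
  if isDigit d
    then replicateM (digitToInt d) item
    else failWith "lengthPrefixed: expected a digit"
```

`parse lengthPrefixed "3abcde"` gives `Right ("abc", "de")`, and `parse lengthPrefixed "3ab"` fails with the error from `item`, since the stream runs out before three characters are read.
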

---

We now have a `Parser`{.haskell} monad that expresses a parsing action. But a
parser library is no good when it makes actual parsing harder rather than
easier. To make parsing easier, we define _combinators_, functions that modify
a parser in one way or another.

But first, we should get some parsing functions.

### `any`, `satisfy`

`any` is the parsing action that pops a character off the stream and returns
that. It does no further parsing at all.

```haskell
-- Note: this name clashes with Prelude's `any`, so the module
-- needs `import Prelude hiding (any)`.
any :: Parser Char
any = Parser go where
  go []     = Left "any: end of file"
  go (x:xs) = Right (x, xs)
```

`satisfy` tests the parsed value against a function of type `Char ->
Bool`{.haskell} before deciding if it's successful or a failure.

```haskell
satisfy :: (Char -> Bool) -> Parser Char
satisfy f = do
  x <- any
  if f x
    then return x
    else fail "satisfy: does not satisfy"
```

We use the `fail`{.haskell} function defined above to represent failure.

### `oneOf`, `char`

These functions are defined in terms of `satisfy`, and parse individual
characters.

```haskell
char :: Char -> Parser Char
char c = satisfy (c ==) <?> "char: expected literal " ++ [c]

oneOf :: String -> Parser Char
oneOf s = satisfy (`elem` s) <?> "oneOf: expected one of '" ++ s ++ "'"
```

### `string`

This parser parses a sequence of characters, in order.

```haskell
string :: String -> Parser String
string [] = return []
string (x:xs) = do
  _ <- char x        -- match the first character
  _ <- string xs     -- then the rest, recursively
  return $ x:xs
```
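
To give a feel for how these pieces compose into a grammar, here is a self-contained sketch that parses a bracketed, comma-separated list of natural numbers. It is written in applicative style, so it only needs the Functor, Applicative and Alternative instances, and `satisfy` is written directly instead of through `any` to keep the block short. The `number`, `sepBy` and `numberList` definitions are assumptions made for this example, loosely modelled on combinators of the same name in Parsec, and are not taken from the transcription.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

instance Functor Parser where
  fmap fn (Parser p) = Parser $ \st -> case p st of
    Left e          -> Left e
    Right (x, rest) -> Right (fn x, rest)

instance Applicative Parser where
  pure x = Parser $ \st -> Right (x, st)
  Parser pf <*> Parser px = Parser $ \st -> case pf st of
    Left e         -> Left e
    Right (f, st') -> case px st' of
      Left e'         -> Left e'
      Right (x, st'') -> Right (f x, st'')

instance Alternative Parser where
  empty = Parser $ \_ -> Left "empty parser"
  Parser p <|> Parser q = Parser $ \st -> case p st of
    Left _  -> q st
    Right x -> Right x

satisfy :: (Char -> Bool) -> Parser Char
satisfy f = Parser go where
  go (x:xs) | f x = Right (x, xs)
  go _            = Left "satisfy: does not satisfy"

char :: Char -> Parser Char
char c = satisfy (c ==)

-- One or more digits, read as an Int (`read` is safe on digits only).
number :: Parser Int
number = read <$> some (satisfy isDigit)

-- Zero or more `p`, separated by `sep`.
sepBy :: Parser a -> Parser b -> Parser [a]
sepBy p sep = ((:) <$> p <*> many (sep *> p)) <|> pure []

-- A list such as "[1,22,3]".
numberList :: Parser [Int]
numberList = char '[' *> (number `sepBy` char ',') <* char ']'
```

`parse numberList "[1,22,3] rest"` gives `Right ([1,22,3], " rest")`, and `parse numberList "[]"` gives `Right ([], "")`. A trailing comma, as in `"[1,]"`, is rejected, though only with the generic message from `satisfy`, which is where `<?>`{.haskell} would earn its keep.
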

---

And that's it! In a few hundred lines, we have built a working parser combinator
library with Functor, Applicative, Alternative, and Monad instances. While it's
not as complex or featureful as Parsec in any way, it is powerful enough to
define grammars for simple languages.

[A transcription](/static/Parser.hs) ([with syntax
highlighting](/static/Parser.hs.html)) of this file is available as runnable
Haskell. The transcription also features some extra combinators for use.