---
title: You could have invented Parsec
date: August 17, 2016 01:29 AM
synopsys: 2
---

As most of us should know, [Parsec](https://hackage.haskell.org/package/parsec)
is a relatively fast, lightweight monadic parser combinator library.

In this post I aim to show that monadic parsing is not only useful, but a simple
concept to grok.

We shall implement a simple parsing library with instances of common typeclasses
of the domain, such as Monad, Functor and Applicative, and some example
combinators to show how powerful this abstraction really is.

---

Getting the buzzwords out of the way, being _monadic_ just means that Parsers
are instances of `Monad`{.haskell}. Recall the Monad typeclass, as defined in
`Control.Monad`{.haskell},

```haskell
class Applicative m => Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b
  {- Some fields omitted -}
```

How can we fit a parser in the above constraints? To answer that, we must first
define what a parser _is_.

A naïve implementation of the `Parser`{.haskell} type would be a simple type
synonym.

```haskell
type Parser a = String -> (a, String)
```

This just defines that a parser is a function from a string to a result pair
with the parsed value and the resulting stream. This would mean that parsers are
just state transformers, and if we define it as a synonym for the existing mtl
`State`{.haskell} monad, we get the Monad, Functor and Applicative instances for
free! But alas, this will not do.

Apart from modeling the state transformation that a parser expresses, we need a
way to represent failure. You already know that `Maybe a`{.haskell} expresses
failure, so we could try something like this:

```haskell
type Parser a = String -> Maybe (a, String)
```

But, as you might have guessed, this is not the optimal representation either:
`Maybe`{.haskell} _does_ model failure, but in a way that is lacking. It can
only express that a computation was successful or that it failed, not why it
failed. We need a way to fail with an error message. That is, the
`Either`{.haskell} monad.

```haskell
type Parser e a = String -> Either e (a, String)
```

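
To make the difference from `Maybe`{.haskell} concrete, here is a minimal sketch of two parsers written directly against this synonym. The names `item` and `digitP` are made up for illustration and are not part of the library we build below.

```haskell
type Parser e a = String -> Either e (a, String)

-- Consume one character, or fail with a message at end of input.
item :: Parser String Char
item []     = Left "item: unexpected end of input"
item (c:cs) = Right (c, cs)

-- Accept a single decimal digit, reporting what was found otherwise.
digitP :: Parser String Char
digitP s = case item s of
  Left e -> Left e
  Right (c, rest)
    | c `elem` ['0'..'9'] -> Right (c, rest)
    | otherwise           -> Left ("digitP: expected a digit, got " ++ show c)
```

With `Maybe`{.haskell}, both failures would collapse into `Nothing`{.haskell}; here the caller can tell an empty input apart from a wrong character.
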
Notice how we have the `Maybe`{.haskell} and `Either`{.haskell} outside the
tuple, so that when an error happens we stop parsing immediately. We could
instead have them inside the tuple for better error reporting, but that's out of
scope for a simple blag post.

This is pretty close to the optimal representation, but there are still some
warts to address: `String`{.haskell} is a bad representation for textual
data, so ideally you'd have your own `Stream`{.haskell} class that has instances
for things such as `Text`{.haskell}, `ByteString`{.haskell} and
`String`{.haskell}.

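
We won't pursue this in the post, but as a rough sketch of what such a class could look like: the class below, its single `uncons` method and the `anyChar` primitive are all assumptions for illustration, and the `Text`{.haskell} instance relies on the `text` package being available.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.Text as T

-- Hypothetical class: the one operation a parser needs from its input.
class Stream s where
  uncons :: s -> Maybe (Char, s)

instance Stream String where
  uncons []     = Nothing
  uncons (c:cs) = Just (c, cs)

instance Stream T.Text where
  uncons = T.uncons

-- A stream-generic variant of the parser synonym.
type Parser s e a = s -> Either e (a, s)

-- A primitive that works unchanged for every Stream instance.
anyChar :: Stream s => Parser s String Char
anyChar s = case uncons s of
  Nothing        -> Left "anyChar: end of input"
  Just (c, rest) -> Right (c, rest)
```

The rest of the post sticks to plain `String`{.haskell} to keep the types short.
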
One issue, however, is more glaring: You _can't_ define typeclass instances for
a type synonym like this one! The fix, however, is simple: make
`Parser`{.haskell} a newtype.

```haskell
newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }
```

---

Now that that's out of the way, we can actually get around to instancing some
typeclasses.

Since the AMP landed in GHC 7.10 (base 4.8), the hierarchy of the Monad
typeclass is as follows:

```haskell
class Functor (m :: * -> *) where
class Functor m => Applicative m where
class Applicative m => Monad m where
```

That is, we need to implement Functor and Applicative before we can actually
implement Monad.

We shall also add an `Alternative`{.haskell} instance for expressing choice.

First we need some utility functions, such as `runParser`{.haskell}, that runs a
parser from a given stream.

```haskell
runParser :: Parser a -> String -> Either String a
runParser (Parser p) s = fst <$> p s
```

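
As a quick sanity check, here is a sketch that runs a hand-written parser both ways. The `letterA` primitive is invented for this example; everything else is as defined so far.

```haskell
newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

runParser :: Parser a -> String -> Either String a
runParser (Parser p) s = fst <$> p s

-- Illustrative primitive: succeed only on a leading 'a'.
letterA :: Parser Char
letterA = Parser go where
  go ('a':rest) = Right ('a', rest)
  go _          = Left "letterA: expected 'a'"
```

Calling `parse letterA "abc"`{.haskell} returns the value together with the leftover `"bc"`{.haskell}, while `runParser letterA "abc"`{.haskell} keeps only the value. Errors pass through `runParser`{.haskell} untouched.
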
We could also use a function for modifying error messages. For convenience, we
make this an infix operator, `<?>`{.haskell}.

```haskell
(<?>) :: Parser a -> String -> Parser a
(Parser p) <?> err = Parser go where
  go s = case p s of
    Left _  -> Left err
    Right x -> return x
infixl 2 <?>
```

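
A small sketch of what `<?>`{.haskell} buys us: the low-level message of a primitive is swapped for one that makes sense to the user. The `digit` and `port` parsers are made-up names for this example.

```haskell
newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

(<?>) :: Parser a -> String -> Parser a
(Parser p) <?> err = Parser go where
  go s = case p s of
    Left _  -> Left err
    Right x -> Right x
infixl 2 <?>

-- Illustrative primitive with a terse, low-level error message.
digit :: Parser Char
digit = Parser go where
  go (c:cs) | c `elem` ['0'..'9'] = Right (c, cs)
  go _                            = Left "no match"

-- The same parser, relabelled with a message aimed at the user.
port :: Parser Char
port = digit <?> "expected a port number"
```

On success `port` behaves exactly like `digit`; only the `Left`{.haskell} branch changes.
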
### `Functor`

Remember that Functor models something that can be mapped over (technically,
`fmap`-ed over).

We need to define semantics for `fmap` on Parsers. A sane implementation would
only map over the result, and keep errors the same. This is a homomorphism,
and follows the Functor laws.

However, since we can't modify a function in place, we need to return a new
parser that applies the given function _after_ the parsing is done.

```haskell
instance Functor Parser where
  fn `fmap` (Parser p) = Parser go where
    go st = case p st of
      Left e            -> Left e
      Right (res, str') -> Right (fn res, str')
```

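
To see the instance at work, here is a sketch that turns a parser of digit characters into a parser of numbers. The `digit` and `digitValue` names are assumptions for this example.

```haskell
import Data.Char (digitToInt, isDigit)

newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

instance Functor Parser where
  fmap fn (Parser p) = Parser go where
    go st = case p st of
      Left e            -> Left e
      Right (res, str') -> Right (fn res, str')

-- Illustrative primitive: a single decimal digit.
digit :: Parser Char
digit = Parser go where
  go (c:cs) | isDigit c = Right (c, cs)
  go _                  = Left "digit: expected a digit"

-- Map over the result only; the remaining input and errors are untouched.
digitValue :: Parser Int
digitValue = digitToInt <$> digit
```

Running `parse digitValue "7up"`{.haskell} yields the number `7` and the leftover `"up"`{.haskell}, and a failing input produces the same error as `digit` itself.
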
### `Applicative`

While Functor is something that can be mapped over, Applicative defines
semantics for applying a function inside a context to something inside a
context.

The Applicative class is defined as

```haskell
class Functor m => Applicative m where
  pure  :: a -> m a
  (<*>) :: m (a -> b) -> m a -> m b
```

Notice how the `pure`{.haskell} and the `return`{.haskell} methods are
equivalent, so we only have to implement one of them.

Let's go over this by parts.

```haskell
instance Applicative Parser where
  pure x = Parser $ \str -> Right (x, str)
```

The `pure`{.haskell} function leaves the stream untouched, and sets the result
to the given value.

The `(<*>)`{.haskell} function needs to run the left-hand parser first to get
the in-context function, then run the right-hand parser on the remaining input
and apply the function to its result.

```haskell
  (Parser p) <*> (Parser p') = Parser go where
    go st = case p st of
      Left e -> Left e
      Right (fn, st') -> case p' st' of
        Left e'         -> Left e'
        Right (v, st'') -> Right (fn v, st'')
```

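
A short sketch of applicative style in use: two parsers run in sequence and their results are combined with an ordinary function. The `item` and `twoChars` names are made up for this example, and the instances are repeated so the block stands alone.

```haskell
newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

instance Functor Parser where
  fmap fn (Parser p) = Parser $ \st -> case p st of
    Left e           -> Left e
    Right (res, st') -> Right (fn res, st')

instance Applicative Parser where
  pure x = Parser $ \st -> Right (x, st)
  (Parser p) <*> (Parser p') = Parser $ \st -> case p st of
    Left e          -> Left e
    Right (fn, st') -> case p' st' of
      Left e'         -> Left e'
      Right (v, st'') -> Right (fn v, st'')

-- Illustrative primitive: consume any one character.
item :: Parser Char
item = Parser go where
  go []     = Left "item: end of input"
  go (c:cs) = Right (c, cs)

-- Run 'item' twice, threading the stream, and pair up the results.
twoChars :: Parser (Char, Char)
twoChars = (,) <$> item <*> item
```

If either step fails, the whole parser fails with that step's error, so `parse twoChars "a"`{.haskell} reports the end of input hit by the second `item`.
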
### `Alternative`

Since the only superclass of Alternative is Applicative, we can instance it
without a Monad instance defined. We do, however, need an import of
`Control.Applicative`{.haskell}.

```haskell
instance Alternative Parser where
  empty = Parser $ \_ -> Left "empty parser"
  (Parser p) <|> (Parser p') = Parser go where
    go st = case p st of
      Left _  -> p' st
      Right x -> Right x
```

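
Besides `<|>`{.haskell}, the Alternative class hands us `many`{.haskell} and `some`{.haskell} for free. The sketch below uses them on a made-up `charP` primitive; `bit` and `bits` are likewise names invented for this example.

```haskell
import Control.Applicative (Alternative (..))

newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

instance Functor Parser where
  fmap fn (Parser p) = Parser $ \st -> case p st of
    Left e           -> Left e
    Right (res, st') -> Right (fn res, st')

instance Applicative Parser where
  pure x = Parser $ \st -> Right (x, st)
  (Parser p) <*> (Parser p') = Parser $ \st -> case p st of
    Left e          -> Left e
    Right (fn, st') -> case p' st' of
      Left e'         -> Left e'
      Right (v, st'') -> Right (fn v, st'')

instance Alternative Parser where
  empty = Parser $ \_ -> Left "empty parser"
  (Parser p) <|> (Parser p') = Parser $ \st -> case p st of
    Left _  -> p' st   -- retry from the original input
    Right x -> Right x

-- Illustrative primitive: match one specific character.
charP :: Char -> Parser Char
charP c = Parser go where
  go (x:xs) | x == c = Right (x, xs)
  go _               = Left ("expected " ++ show c)

-- Choice: try '0' first, then '1' on the same input.
bit :: Parser Char
bit = charP '0' <|> charP '1'

-- Repetition: zero or more bits.
bits :: Parser String
bits = many bit
```

Note that when both branches fail, the error reported is the one from the right-hand branch, and that `many`{.haskell} succeeds with an empty result when nothing matches, whereas `some`{.haskell} requires at least one match.
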
### `Monad`

After almost a thousand words, one would be excused for forgetting we're
implementing a _monadic_ parser combinator library. That means we need an
instance of the `Monad`{.haskell} typeclass.

Since we have an instance of Applicative, we don't need an implementation of
return: it is equivalent to `pure`, save for the class constraint.

```haskell
instance Monad Parser where
  return = pure
```

The `(>>=)`{.haskell} implementation, however, needs a bit more thought. Its
type signature is

```haskell
(>>=) :: m a -> (a -> m b) -> m b
```

That means we need to extract a value from the Parser monad and apply it to the
given function, producing a new Parser.

```haskell
  (Parser p) >>= f = Parser go where
    go s = case p s of
      Left e        -> Left e
      Right (x, s') -> parse (f x) s'
```

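
What `(>>=)`{.haskell} adds over `(<*>)`{.haskell} is that a later parser may depend on an earlier result. As a sketch, here is a length-prefixed string parser, where the first digit decides how many characters follow. The `item` and `prefixed` names are made up for this example.

```haskell
import Control.Monad (replicateM)
import Data.Char (digitToInt, isDigit)

newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

instance Functor Parser where
  fmap fn (Parser p) = Parser $ \st -> case p st of
    Left e           -> Left e
    Right (res, st') -> Right (fn res, st')

instance Applicative Parser where
  pure x = Parser $ \st -> Right (x, st)
  (Parser p) <*> (Parser p') = Parser $ \st -> case p st of
    Left e          -> Left e
    Right (fn, st') -> case p' st' of
      Left e'         -> Left e'
      Right (v, st'') -> Right (fn v, st'')

instance Monad Parser where
  (Parser p) >>= f = Parser $ \s -> case p s of
    Left e        -> Left e
    Right (x, s') -> parse (f x) s'

-- Illustrative primitive: consume any one character.
item :: Parser Char
item = Parser go where
  go []     = Left "item: end of input"
  go (c:cs) = Right (c, cs)

-- Read a digit n, then exactly n more characters.
prefixed :: Parser String
prefixed = do
  d <- item
  if isDigit d
    then replicateM (digitToInt d) item
    else Parser $ \_ -> Left "prefixed: expected a length digit"
```

For the input `"3abcde"`{.haskell} this consumes the `3`, then `"abc"`{.haskell}, and leaves `"de"`{.haskell} behind. The number of `item` calls is only known after the first character has been parsed, which is exactly what the function argument of `(>>=)`{.haskell} allows.
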
While some people think that `fail`{.haskell} is not supposed to be in the
Monad typeclass (and since GHC 8.8 it lives in the separate
`MonadFail`{.haskell} class instead), we do need an implementation for when
pattern matching fails. It is also convenient to use `fail`{.haskell} for the
parsing action that returns an error with a given message.

```haskell
-- On GHC 8.8 and later, fail is a method of MonadFail rather than Monad.
instance MonadFail Parser where
  fail m = Parser $ \_ -> Left m
```

---

We now have a `Parser`{.haskell} monad that expresses a parsing action. But a
parser library is no good when actual parsing is made harder rather than easier.
To make parsing easier, we define _combinators_, functions that modify a parser
in one way or another.

But first, we should get some parsing functions.

### `any`, `satisfy`

`any` is the parsing action that pops a character off the stream and returns
that. It does no further parsing at all.

```haskell
-- Note: this name clashes with Prelude's any, so the module needs
-- `import Prelude hiding (any)`.
any :: Parser Char
any = Parser go where
  go []     = Left "any: end of file"
  go (x:xs) = Right (x,xs)
```

`satisfy` tests the parsed value against a function of type `Char ->
Bool`{.haskell} before deciding if it's successful or a failure.

```haskell
satisfy :: (Char -> Bool) -> Parser Char
satisfy f = do
  x <- any
  if f x
    then return x
    else fail "satisfy: does not satisfy"
```

We use the `fail`{.haskell} function defined above to represent failure.

### `oneOf`, `char`

These functions are defined in terms of `satisfy`, and parse individual
characters.

```haskell
char :: Char -> Parser Char
char c = satisfy (c ==) <?> "char: expected literal " ++ [c]

oneOf :: String -> Parser Char
oneOf s = satisfy (`elem` s) <?> "oneOf: expected one of '" ++ s ++ "'"
```

### `string`

This parser parses a sequence of characters, in order.

```haskell
string :: String -> Parser String
string [] = return []
string (x:xs) = do
  char x
  string xs
  return $ x:xs
```

---

And that's it! In a few hundred lines, we have built a working parser combinator
library with Functor, Applicative, Alternative, and Monad instances. While it's
not as complex or featureful as Parsec in any way, it is powerful enough to
define grammars for simple languages.

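
As a rough illustration of that last claim, here is a sketch of a grammar for bracketed, comma-separated lists of numbers. It is condensed so that it stands alone: `satisfy` is written directly instead of via `any`, and `number`, `sepBy` and `list` are helpers invented for this example (this `sepBy` is a local definition, not Parsec's). This particular grammar happens to need only the Applicative and Alternative instances.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

newtype Parser a
  = Parser { parse :: String -> Either String (a, String) }

instance Functor Parser where
  fmap fn (Parser p) = Parser $ \st -> case p st of
    Left e           -> Left e
    Right (res, st') -> Right (fn res, st')

instance Applicative Parser where
  pure x = Parser $ \st -> Right (x, st)
  (Parser p) <*> (Parser p') = Parser $ \st -> case p st of
    Left e          -> Left e
    Right (fn, st') -> case p' st' of
      Left e'         -> Left e'
      Right (v, st'') -> Right (fn v, st'')

instance Alternative Parser where
  empty = Parser $ \_ -> Left "empty parser"
  (Parser p) <|> (Parser p') = Parser $ \st -> case p st of
    Left _  -> p' st
    Right x -> Right x

satisfy :: (Char -> Bool) -> Parser Char
satisfy f = Parser go where
  go (x:xs) | f x = Right (x, xs)
  go _            = Left "satisfy: does not satisfy"

char :: Char -> Parser Char
char c = satisfy (c ==)

-- Hypothetical helper: one or more digits, read as a number.
number :: Parser Int
number = read <$> some (satisfy isDigit)

-- Hypothetical helper: zero or more p, separated by sep.
sepBy :: Parser a -> Parser b -> Parser [a]
sepBy p sep = ((:) <$> p <*> many (sep *> p)) <|> pure []

-- Grammar: list ::= '[' (number (',' number)*)? ']'
list :: Parser [Int]
list = char '[' *> (number `sepBy` char ',') <* char ']'
```

Here `parse list "[1,22,333]!"`{.haskell} returns the three numbers and leaves `"!"`{.haskell} unconsumed, while a trailing comma as in `"[1,]"`{.haskell} is rejected because the closing bracket is expected right after the last number.
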
[A transcription](/static/Parser.hs) ([with syntax
highlighting](/static/Parser.hs.html)) of this file is available as runnable
Haskell. The transcription also features some extra combinators for use.