### Python Parser Combinator

Seeing as there aren't any parsing libraries available in any language I use besides Haskell, I wrote this small parser combinator library in Python, based on Haskell's `ReadS` type. The idea is similar to PyParsing, but the combinators provided are based on Parsec.

The parser type `P<a>` is a light wrapper around a function of type `String, Int -> [(String, Int, a)]`. Given the input string and the current index, it returns an iterator over every possible result, each a tuple of (input string, next index, parsed value). An empty iterator means the parse failed.
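To make the representation concrete, here is a sketch using plain functions of that type, without the `P` wrapper. The helpers `literal` and `either` are hypothetical names for illustration, not part of the library; `either` chains the results of both alternatives in the same way the library's `alter` does, so an ambiguous grammar yields more than one result.

```python
from itertools import chain

# A raw parser in the ReadS style: given the input string and the current
# index, yield every possible (input string, next index, parsed value) tuple.
# Yielding nothing means the parse failed.
def literal(text):
    """Hypothetical helper: match `text` exactly at the current index."""
    def parse(s, i):
        if s.startswith(text, i):
            yield (s, i + len(text), text)
    return parse

def either(p, q):
    """Hypothetical helper: all results of p, followed by all results of q."""
    return lambda s, i: chain(p(s, i), q(s, i))

ambiguous = either(literal("ab"), literal("a"))
print(list(ambiguous("abc", 0)))     # [('abc', 2, 'ab'), ('abc', 1, 'a')]
print(list(literal("x")("abc", 0)))  # []
```

Each result carries its own next index, so a following parser can continue from either position. That is what lets this style backtrack without any explicit state.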

The combinator that matches a character can be written as:

``````python
def char(c):
    def inner(s, idx):
        if idx >= len(s): return  # end of input: no results
        if s[idx] != c: return    # mismatch: no results
        yield (s, idx + 1, c)     # success: consume one character
    return P(inner)
``````
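The library's `P.char` also accepts a predicate instead of a fixed character. The same generator shape covers that case. The sketch below assumes a hypothetical minimal `P` that only stores the function, and `satisfy` is an illustrative name rather than part of the library.

```python
class P:
    """Hypothetical minimal stand-in for the library's P: it only wraps f."""
    def __init__(self, f):
        self.f = f

def satisfy(test):
    """Hypothetical helper: match one character c for which test(c) is true."""
    def inner(s, idx):
        # Fail (yield nothing) at end of input or when the test rejects s[idx].
        if idx < len(s) and test(s[idx]):
            yield (s, idx + 1, s[idx])
    return P(inner)

digit = satisfy(str.isdigit)
print(list(digit.f("7up", 0)))  # [('7up', 1, '7')]
print(list(digit.f("7up", 1)))  # []
print(list(digit.f("", 0)))     # []
```

Matching a fixed character is then just `satisfy(lambda x: x == c)`.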

Using `char`, the combinator that matches a sequence of characters could be written by combining multiple parsers:

``````python
def sequence(x, y, z):
    return P.char(x).then(P.char(y)).then(P.char(z)).replace(x + y + z)
# or alternatively...
def sequence(x, y, z):
    return P.char(x) >> P.char(y) >> P.char(z) ** (x + y + z)
# or even...
def sequence(*xs):
    return P.seq(*map(P.char, xs)) ** ''.join(xs)
# sequence('a', 'b', 'c') should match 'abc'
``````
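To see how `then`, `replace` and the operator aliases thread the index through a sequence, here is a cut-down sketch of `P` with just enough methods to run the second variant. The method bodies are readable restatements of the library's lambdas, not the library code itself. It also shows why the second variant works despite Python's precedence rules: `**` binds tighter than `>>`, so `** (x + y + z)` applies only to the last `char`, and because `>>` keeps the right-hand value, the whole sequence still produces `x + y + z`.

```python
class P:
    """Hypothetical cut-down P with just enough to run the sequence example."""
    def __init__(self, f):
        self.f = f

    @staticmethod
    def char(c):
        def inner(s, i):
            if i < len(s) and s[i] == c:
                yield (s, i + 1, c)
        return P(inner)

    def bind(self, k):
        # For every result of self, run the parser k(value) from its next index.
        return P(lambda s, i: (r for s2, j, v in self.f(s, i) for r in k(v).f(s2, j)))

    def then(self, p):
        # Run self, then p, keeping only p's value.
        return self.bind(lambda _: p)

    def replace(self, v):
        # Keep the consumed input but swap the parsed value for v.
        return P(lambda s, i: ((s2, j, v) for s2, j, _ in self.f(s, i)))

    __rshift__ = then
    __pow__ = replace

def sequence(x, y, z):
    # Parsed as: char(x) >> char(y) >> (char(z) ** (x + y + z))
    return P.char(x) >> P.char(y) >> P.char(z) ** (x + y + z)

print(list(sequence('a', 'b', 'c').f('abcd', 0)))  # [('abcd', 3, 'abc')]
print(list(sequence('a', 'b', 'c').f('abd', 0)))   # []
```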
Here is the full library:

``````python
from functools import reduce
from itertools import chain
import re

M = staticmethod

class P:
    # f :: String, Int -> [(String, Int, a)]
    __init__ = lambda self, f=None: setattr(self, 'f', f)
    # pass forward a parser, similar to PyParsing's Forward()
    forward = lambda self, f: [self, setattr(self, 'f', f)][0]

    # counterparts of the standard Haskell functions

    # pure = return
    pure = M(lambda a: P(lambda s, i: iter([(s, i, a)])))
    # bind = (>>=)
    bind = lambda self, f: P(lambda s, i: (x for xs in
        map(lambda x: f(x[2]).f(x[0], x[1]), self.f(s, i)) for x in xs))
    # fmap = (<$>)
    fmap = lambda self, f: P(lambda s, i: ((x, y, f(z)) for x, y, z in self.f(s, i)))
    # replace = ($>)
    replace = lambda self, v: P(lambda s, i: ((x, y, v) for x, y, _ in self.f(s, i)))
    # apply = (<**>): run self, then the function parser f, and apply the function
    apply = lambda self, f: self.bind(lambda x: f.bind(lambda y: P.pure(y(x))))
    # then = (>>)
    then = lambda self, p: self.bind(lambda _: p)
    # before = (<*): run self, then p, keeping self's value
    before = lambda self, p: self.bind(lambda x: p.replace(x))
    # alter = (<|>)
    # N.B. the choices are left biased:
    #      place the more likely matches on the left as an optimization
    alter = lambda self, p: P(lambda s, i: chain(self.f(s, i), p.f(s, i)))
    # many = zeroOrMore
    many = lambda self: self.some().alter(P.pure(iter([])))
    # some = oneOrMore
    some = lambda self: P.fix(lambda p: self.bind(lambda x:
        p.alter(P.pure(iter([]))).fmap(lambda ys: chain([x], ys))))
    # seq = sequence
    # N.B. the parsed value is an iterator (as with many/some),
    #      so do parser.fmap(list) to be able to index or reuse it
    seq = M(lambda *xs: reduce(lambda p, x: x.bind(lambda a: p.fmap(
        lambda b: chain([a], b))), list(xs)[::-1], P.pure(iter([]))))

    # shift the next index of every result forward by n characters
    creep = lambda self, n: P(lambda s, i: ((x, y + n, z) for x, y, z in self.f(s, i)))
    # skip trailing whitespace: zero or more chars if n=0, one or more if n=1
    lex = lambda self, n=0: self.before([P.many, P.some][n](P.char(str.isspace)))
    # possibly successful parse based on a predicate
    # f :: String, Int -> Either () (Int, a)
    #   i.e. f returns [] on failure, or [chars consumed, value] on success
    pred = M(lambda f: P(lambda s, i: iter([[(s, i + b[0], b[1])] if b else [] for b in [f(s, i)]][0])))
    # same as pred but must not be at eof
    pred1 = M(lambda f: P.pred(lambda s, i: f(s, i) if i < len(s) else []))
    # parse a single char
    # f :: Either Char (Char -> Bool)
    char = M(lambda f: P.pred1(lambda s, i: [1, s[i]] * (f if callable(f) else lambda c: f == c)(s[i])))
    # parser that always fails
    fail = M(lambda: P(lambda s, i: iter([])))
    # parse a string
    string = M(lambda s: P.seq(*map(lambda c: P.char(lambda x: x == c), s)).replace(s))
    # choose from a list of alternatives
    choice = M(lambda *xs: reduce(lambda p, x: p.alter(x), xs, P.fail()))
    # parse a regex anchored at the current index
    # N.B. this only gives either 0 or 1 result, so no backtracking if you use (x|y)
    regex = M(lambda pattern, group=None, flags=0: P.pred(lambda s, i:
        [[m.end() - i, m.group(group) if group is not None else m] if m is not None else []
         for m in [re.compile(pattern, flags).match(s, i)]][0]))
    # parse end of input
    eof = M(lambda: P.pred(lambda s, i: [0, None] * (i >= len(s))))
    # helper function to avoid having to use .forward()
    # f :: Parser a -> Parser a (builds a recursive parser from a reference to itself)
    fix = M(lambda f: [p.forward(q.f) for p in [P(None)] for q in [f(p)]][0])

    # run the parser
    # set just=True to yield only the parsed values
    # N.B. this returns a generator of all possible matches,
    #   with the first usually being the longest match.
    #   use parser.before(P.eof()) if you want the full match
    parse = lambda self, s, i=0, just=False: (x[2] if just else x for x in self.f(s, i))

    __pow__, __pos__, __invert__, __or__, __rshift__, __lshift__ = replace, some, many, alter, then, before

# Usually I'd copy-paste this condensed version when I use this library in a kata.