Programming language: Haskell

License: BSD 3-clause "New" or "Revised" License

Tags: Parsing

Latest version: v0.13.0.1

README

Earley

Go to the API documentation on Hackage.

This (Text.Earley) is a library consisting of a few main parts:

Text.Earley.Grammar

An embedded context-free grammar (CFG) domain-specific language (DSL) with semantic action specification in applicative style.

An example of a typical expression grammar working on an input tokenised into strings is the following:

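   {-# LANGUAGE RecursiveDo #-}
   import Control.Applicative
   import Data.Char (isAlpha)
   import Text.Earley

   -- Abstract syntax produced by the semantic actions below.
   data Expr
     = Add Expr Expr
     | Mul Expr Expr
     | Var String
     deriving (Eq, Ord, Show)

   -- Tokens are Strings; expected-token names are Strings as well.
   expr :: Grammar r (Prod r String String Expr)
   expr = mdo
     x1 <- rule $ Add <$> x1 <* namedToken "+" <*> x2
               <|> x2
               <?> "sum"
     x2 <- rule $ Mul <$> x2 <* namedToken "*" <*> x3
               <|> x3
               <?> "product"
     x3 <- rule $ Var <$> (satisfy ident <?> "identifier")
               <|> namedToken "(" *> x1 <* namedToken ")"
     return x1
     where
       ident (x:_) = isAlpha x
       ident _     = False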

Text.Earley.Parser

An implementation of (a modification of) the Earley parsing algorithm.

To invoke the parser on the above grammar, run e.g. the following (here using words as a deliberately naive tokeniser):

   fullParses (parser expr) $ words "a + b * ( c + d )"
   = ( [Add (Var "a") (Mul (Var "b") (Add (Var "c") (Var "d")))]
     , Report {...}
     )

Note that we get a list of all the possible parses (though in this case there is only one).
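To see more than one parse come back, the grammar has to be ambiguous. The following is a minimal sketch under that assumption; the names `ambiguous` and `parsesOf` and the cut-down `Expr` type are hypothetical and not part of the library. Because "+" is given no fixed associativity here, an input such as "a+b+c" should yield both groupings.

```haskell
{-# LANGUAGE RecursiveDo #-}
import Control.Applicative
import Data.Char (isAlpha)
import Text.Earley

-- Hypothetical AST for this sketch only.
data Expr = Add Expr Expr | Var Char
  deriving (Eq, Show)

-- Deliberately ambiguous: both operands of "+" are the same rule,
-- so "a+b+c" can be grouped to the left or to the right.
ambiguous :: Grammar r (Prod r String Char Expr)
ambiguous = mdo
  e <- rule $ Add <$> e <* token '+' <*> e
          <|> Var <$> satisfy isAlpha
  return e

-- All complete parses of the input; the Report is discarded.
parsesOf :: String -> [Expr]
parsesOf = fst . fullParses (parser ambiguous)
```

The order in which the alternatives are returned is not something this sketch relies on.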

Another invocation, which shows the error reporting capabilities (giving the last position that the parser reached and what it expected at that point), is the following:

   fullParses (parser expr) $ words "a +"
   = ( []
     , Report { position   = 2
              , expected   = ["(","identifier","product"]
              , unconsumed = []
              }
     )
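The fields of the Report can be turned into a user-facing message. The following is a minimal sketch, assuming a grammar over String tokens with String names; `parseOrExplain` and the toy `greeting` grammar are hypothetical helpers introduced for illustration, not part of the library.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.List (intercalate)
import Text.Earley

-- Return the first complete parse, or a message built from the Report.
parseOrExplain
  :: (forall r. Grammar r (Prod r String String a))
  -> [String]
  -> Either String a
parseOrExplain g ts =
  case fullParses (parser g) ts of
    (p : _, _) -> Right p
    ([], rep)  -> Left $
      "parse error after " ++ show (position rep) ++ " token(s); expected one of: "
        ++ intercalate ", " (expected rep)
        ++ case unconsumed rep of
             []      -> " (input ended)"
             (t : _) -> " (found " ++ show t ++ ")"

-- Toy grammar used only to exercise the helper.
greeting :: Grammar r (Prod r String String String)
greeting = rule $ (\a b -> a ++ " " ++ b)
  <$> namedToken "hello" <*> namedToken "world"
```

For example, `parseOrExplain greeting ["hello", "world"]` succeeds, while `parseOrExplain greeting ["hello"]` produces a `Left` describing what was expected next. Taking only the first parse is a design choice of this sketch; an ambiguous grammar may warrant inspecting the whole list.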

Text.Earley.Generator

Functionality to generate the members of the language that a grammar generates.

To get the language of a grammar given a list of allowed tokens, run e.g.:

   language (generator romanNumeralsGrammar "VIX")
   = [(0,""),(1,"I"),(5,"V"),(10,"X"),(20,"XX"),(11,"XI"),(15,"XV"),(6,"VI"),(9,"IX"),(4,"IV"),(2,"II"),(3,"III"),(19,"XIX"),(16,"XVI"),(14,"XIV"),(12,"XII"),(7,"VII"),(21,"XXI"),(25,"XXV"),(30,"XXX"),(31,"XXXI"),(35,"XXXV"),(8,"VIII"),(13,"XIII"),(17,"XVII"),(26,"XXVI"),(29,"XXIX"),(24,"XXIV"),(22,"XXII"),(18,"XVIII"),(36,"XXXVI"),(39,"XXXIX"),(34,"XXXIV"),(32,"XXXII"),(23,"XXIII"),(27,"XXVII"),(33,"XXXIII"),(28,"XXVIII"),(37,"XXXVII"),(38,"XXXVIII")]

The above example shows the language generated by a [Roman numerals grammar](examples/RomanNumerals.hs) limited to the tokens 'V', 'I', and 'X'.
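For a grammar whose language is infinite, it can be more practical to bound the length of the generated members. The following is a small sketch using `upTo` from the same module; the `nested` grammar and the name `shortMembers` are hypothetical and exist only for this illustration.

```haskell
{-# LANGUAGE RecursiveDo #-}
import Control.Applicative
import Text.Earley

-- Balanced, strictly nested parentheses; the result is the nesting depth.
nested :: Grammar r (Prod r String Char Int)
nested = mdo
  p <- rule $ pure 0
          <|> (+ 1) <$> (token '(' *> p <* token ')')
  return p

-- Members of the language built from the tokens '(' and ')',
-- restricted to short strings, each paired with its semantic result.
shortMembers :: [(Int, String)]
shortMembers = upTo 5 (generator nested "()")
```

The members here have even length, so the bound of 5 should admit the strings of length 0, 2, and 4.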

Text.Earley.Mixfix

Helper functionality for creating parsers for expressions with mixfix identifiers in the style of Agda.
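As a rough illustration, the sketch below builds a two-level operator table for `mixfixExpression`, with "+" binding looser than "*". It assumes that the table lists the lowest precedence level first and that holes in an identifier are written as `Nothing`; the `Expr` type, `arith`, and `infixOp` are hypothetical names chosen for this example.

```haskell
{-# LANGUAGE RecursiveDo #-}
import Control.Applicative
import Data.Char (isAlpha)
import Text.Earley
import Text.Earley.Mixfix

-- Hypothetical AST: an operator application records which identifier matched.
data Expr
  = Var String
  | Op (Holey String) [Expr]
  deriving (Eq, Show)

arith :: Grammar r (Prod r String String Expr)
arith = mdo
  atom <- rule $ Var <$> (satisfy (all isAlpha) <?> "identifier")
             <|> namedToken "(" *> expr <* namedToken ")"
  expr <- mixfixExpression table atom Op
  return expr
  where
    -- An infix identifier "_s_": hole, name part, hole.
    infixOp s = [Nothing, Just (namedToken s), Nothing]
    -- Assumed ordering: earlier rows bind less tightly.
    table =
      [ [(infixOp "+", LeftAssoc)]
      , [(infixOp "*", LeftAssoc)]
      ]
```

Running `fullParses (parser arith) (words "a + b * c")` should then group the multiplication under the addition.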

How do I write grammars?

As hinted at above, the grammars are written inside Grammar, which is a Monad and MonadFix. For the library to be able to tame the recursion in the grammars, we have to use the rule function whenever a production is recursive.

Whenever you would write e.g.

...
p = foo <|> bar <*> p
...

in a conventional combinator parser library, you instead write the following (the mdo notation requires GHC's RecursiveDo language extension):

grammar = mdo
  ...
  p <- rule $ foo <|> bar <*> p
  ...

Apart from making it possible to do recursion (even left-recursion), rules have an additional benefit: they control where work is shared. A rule is only ever expanded once per position in the input string, so if the same rule is encountered more than once at a position, the work is shared.
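A minimal sketch of a left-recursive rule written this way is shown below. The grammar `number` is a hypothetical example that reads a decimal number one digit at a time; the recursive reference comes first in the production, which would loop in a conventional combinator library.

```haskell
{-# LANGUAGE RecursiveDo #-}
import Control.Applicative
import Data.Char (digitToInt, isDigit)
import Text.Earley

-- Left-recursive: a number is a number followed by a digit, or a single digit.
number :: Grammar r (Prod r String Char Int)
number = mdo
  n <- rule $ (\acc d -> acc * 10 + digitToInt d) <$> n <*> satisfy isDigit
          <|> digitToInt <$> satisfy isDigit
          <?> "number"
  return n

-- All complete parses of the input as a number.
readNumber :: String -> [Int]
readNumber = fst . fullParses (parser number)
```

With this definition, `readNumber "2024"` evaluates to `[2024]`, and input containing a non-digit yields an empty list.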

Compared to parser generators and combinator libraries

This library differs from the main methods that are used to write parsers in the Haskell ecosystem:

Compared to parser generators (YACC, Happy, etc.) it requires very little pre-processing of the grammar. It also allows you to stay in the host language for both grammar and parser, i.e. there is no use of a separate tool. This also means that you are free to use the abstraction facilities of Haskell when writing a grammar. Currently the library requires a linear traversal of the grammar's rules before use, which is usually fast enough to do at run time, but precludes infinite grammars.
The grammar language is similar to that of many parser combinators (Parsec, Attoparsec, parallel parsing processes, etc.), providing an applicative interface, but the parser gracefully handles all finite CFGs, including those with left-recursion. On the other hand, its productions are not monadic, meaning that it does not support context-sensitive or infinite grammars, which are supported by many parser combinator libraries.

Note: The Grammar type is a Monad (used to provide observable sharing) but it lives a layer above productions. It cannot be used to decide what production to use depending on the result of a previous production, i.e. it does not give us monadic parsing.
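The point about using Haskell's abstraction facilities can be illustrated with an ordinary function that builds a grammar fragment. The sketch below defines a hypothetical `sepBy1` helper (not provided by the library) on top of `rule`; note that it only arranges productions and never inspects a parse result, in line with the note above.

```haskell
{-# LANGUAGE RecursiveDo #-}
import Control.Applicative
import Data.Char (isAlpha)
import Text.Earley

-- One or more occurrences of p, separated by sep.
-- A plain Haskell function that returns a grammar fragment.
sepBy1 :: Prod r e t a -> Prod r e t b -> Grammar r (Prod r e t [a])
sepBy1 p sep = mdo
  xs <- rule $ (: []) <$> p
           <|> (:) <$> p <* sep <*> xs
  return xs

-- Comma-separated single letters, collected into a String.
letters :: Grammar r (Prod r String Char String)
letters = sepBy1 (satisfy isAlpha) (token ',')
```

For instance, `fst (fullParses (parser letters) "a,b,c")` evaluates to `["abc"]`.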

The parsing algorithm

The parsing algorithm that this library uses is based on Earley's parsing algorithm. The algorithm has been modified to produce online parse results, to give good error messages, and to allow garbage collection of the item sets. Essentially, instead of storing a sequence of sets of items as in the original algorithm, the modified algorithm just stores pointers back to sets of reachable items.
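One way to observe results produced at intermediate positions is `allParses`, which pairs each result with the input position at which it was produced. The following is a small sketch with a hypothetical grammar `count` that counts leading 'a' characters; it illustrates the API only and says nothing about the internal item-set representation.

```haskell
{-# LANGUAGE RecursiveDo #-}
import Control.Applicative
import Text.Earley

-- One or more 'a's; the result is how many were read.
count :: Grammar r (Prod r String Char Int)
count = mdo
  n <- rule $ (+ 1) <$> n <* token 'a'
          <|> 1 <$ token 'a'
  return n

-- Every result together with the position at which it became available,
-- including results for proper prefixes of the input.
prefixResults :: String -> [(Int, Int)]
prefixResults = fst . allParses (parser count)
```

For the input "aaa", this should contain one result per prefix: 1 at position 1, 2 at position 2, and 3 at position 3.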

The worst-case run time performance of the Earley parsing algorithm is cubic in the length of the input, but for large classes of grammars it is linear. It should however be noted that this library will likely be slower than most parser generators and parser combinator libraries.

The parser implements an optimisation similar to that presented in Joop M.I.M. Leo's paper "A general context-free parsing algorithm running in linear time on every LR(k) grammar without using lookahead", which removes indirections in sequences of non-ambiguous backpointers between item sets.

For more in-depth information about the internals of the library, there are [implementation notes](docs/implementation.md) currently being written.