punkt alternatives and similar packages

Based on the "Natural Language Processing" category.

chatter

8.9 2.3 punkt VS chatter

A library of Natural Language Processing algorithms for Haskell.
numerals

7.9 0.0 punkt VS numerals

Convert numbers to number words

mecab

7.9 0.0 punkt VS mecab

A Haskell binding to MeCab
nerf

6.3 0.0 punkt VS nerf

Named entity recognition tool based on linear-chain CRFs
concraft-pl

6.3 0.0 punkt VS concraft-pl

A morphosyntactic tagger for Polish based on conditional random fields
cndict

6.2 0.0 punkt VS cndict

Chinese/Mandarin <-> English dictionary, Chinese lexer.
hext

5.8 0.7 punkt VS hext

DISCONTINUED. A text classification library
concraft

5.1 0.0 punkt VS concraft

A morphosyntactic disambiguation library based on constrained conditional random fields
partage

4.8 0.0 punkt VS partage

A* parser for tree adjoining grammars
minimorph

4.4 0.0 punkt VS minimorph

English spelling functions with an emphasis on simplicity. Originally by https://github.com/kowey.
tsuntsun

3.9 0.0 punkt VS tsuntsun

Interacts with tesseract to ease reading of RAW Japanese manga.
PTQ

3.7 0.0 punkt VS PTQ

An implementation of Montague's PTQ (Proper Treatment of Quantification).
numerals-base

3.4 0.0 punkt VS numerals-base

Convert numbers to number words
corenlp-parser

3.4 0.0 punkt VS corenlp-parser

Launches CoreNLP and parses the JSON output
hist-pl

2.8 0.0 punkt VS hist-pl

Programs and libraries related to the historical dictionary of Polish
polh-lexicon

2.8 0.0 punkt VS polh-lexicon

Programs and libraries related to the historical dictionary of Polish
sentiwordnet-parser

2.8 0.0 punkt VS sentiwordnet-parser

Parser for the [SentiWordNet](http://sentiwordnet.isti.cnr.it/) tab-separated file
data-named

2.4 0.0 punkt VS data-named

Named entity data layer
haskell-postal

2.4 0.0 punkt VS haskell-postal

Haskell binding for the libpostal library
crf-chain2-tiers

2.4 0.0 punkt VS crf-chain2-tiers

Second-order, tiered, constrained, linear conditional random fields
adict

2.2 0.0 punkt VS adict

Approximate dictionary searching Haskell library
phonetic-code

2.2 0.0 punkt VS phonetic-code

Phonetic codes in Haskell
penntreebank-megaparsec

1.0 0.0 punkt VS penntreebank-megaparsec

Megaparsec parsers for trees in the Penn Treebank format
concraft-hr

1.0 0.0 punkt VS concraft-hr

A part-of-speech tagger for Croatian based on the concraft library.
ENIG

1.0 0.0 punkt VS ENIG

Korean postposition particle selector
moan

1.0 0.0 punkt VS moan

Language-agnostic analyzer for positional morphosyntactic descriptors
arpa

- - punkt VS arpa

DISCONTINUED. Library for reading ARPA n-gram models

README

punkt

Multilingual unsupervised sentence tokenization with Punkt.

Usage

Note that abbreviations are detected at run time without the aid of a pre-built abbreviation list:

import Data.Text (Text, pack)
import NLP.Punkt (split_sentences)

corpus :: Text
corpus = pack "Look, Ma! The quick brown Mr. T. rex swallowed the lazy dog. \
              \It really did!"

main :: IO ()
main = mapM_ print (split_sentences corpus)

yields:

"Look, Ma!"
"The quick brown Mr. T. rex swallowed the lazy dog."
"It really did!"
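
The same function can be applied to a corpus read from standard input rather than a literal. The following is a minimal sketch, assuming `split_sentences` has the type `Text -> [Text]`, as the example above suggests; `numberSentences` is a hypothetical helper introduced here for illustration and is not part of punkt.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import NLP.Punkt (split_sentences)

-- Hypothetical helper (not part of punkt): prefix each sentence
-- with its 1-based position, e.g. "1: Look, Ma!".
numberSentences :: [T.Text] -> [T.Text]
numberSentences = zipWith label [1 :: Int ..]
  where
    label n s = T.append (T.pack (show n ++ ": ")) s

main :: IO ()
main = do
  -- Read the whole corpus at once, so abbreviation detection
  -- sees all of the text rather than one line at a time.
  corpus <- TIO.getContents
  -- Assumption: split_sentences :: Text -> [Text].
  mapM_ TIO.putStrLn (numberSentences (split_sentences corpus))
```

Saved under a file name of your choice and run with the `punkt` and `text` packages available, this should print one numbered sentence per entry for whatever text is piped in.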

References

Kiss, Tibor, and Jan Strunk. "Unsupervised multilingual sentence boundary detection." Computational Linguistics 32.4 (2006): 485-525.

TODO

- parallelize
- modularize tokenization
  - custom tokenization rules
- needs to go faster
