Programming language: Haskell
License: MIT License
punkt
Multilingual unsupervised sentence tokenization with Punkt.
Usage
Note that abbreviations are detected at run time without the aid of a pre-built abbreviation list:
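import Data.Text (Text, pack)
import NLP.Punkt (split_sentences)

-- Sample text containing abbreviations ("Mr.", "T.") that do not end a sentence.
corpus :: Text
corpus = pack "Look, Ma! The quick brown Mr. T. rex swallowed the lazy dog. \
             \It really did!"

-- Print each detected sentence on its own line.
main :: IO ()
main = mapM_ print (split_sentences corpus)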
yields:
"Look, Ma!"
"The quick brown Mr. T. rex swallowed the lazy dog."
"It really did!"
References
Kiss, Tibor, and Jan Strunk. "Unsupervised multilingual sentence boundary detection." Computational Linguistics 32.4 (2006): 485-525.
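The usage example relies on abbreviations being learned from the input text itself rather than from a fixed list. The sketch below illustrates only the rough intuition, and is not this library's implementation nor the log-likelihood method of the cited paper: a short word type that occurs several times and is always followed by a period is a plausible abbreviation. The names `periodCounts` and `likelyAbbreviations`, the length limit of 4, and the minimum count of 2 are all assumptions made for illustration. It uses `String` and `Data.Map.Strict` (from `containers`) so that it runs without the punkt package.

```haskell
import Data.Char (toLower)
import qualified Data.Map.Strict as M

-- For each lowercased word type, count how often it occurs
-- (with a trailing period, without a trailing period).
periodCounts :: String -> M.Map String (Int, Int)
periodCounts = M.fromListWith add . map classify . words
  where
    add (a, b) (c, d) = (a + c, b + d)
    classify tok
      | last tok == '.' = (map toLower (init tok), (1, 0))
      | otherwise       = (map toLower tok, (0, 1))

-- Hypothetical heuristic (not Punkt's actual scoring): a type is a likely
-- abbreviation when it is short, occurs at least twice, and never appears
-- without a trailing period.
likelyAbbreviations :: String -> [String]
likelyAbbreviations text =
  [ w
  | (w, (withDot, withoutDot)) <- M.toList (periodCounts text)
  , not (null w)
  , length w <= 4
  , withDot >= 2
  , withoutDot == 0
  ]

-- Small made-up sample text for the demonstration.
sample :: String
sample = "Mr. Smith met Mr. Jones. Dr. Lee saw the dog. The dog ran. Dr. Lee left."

main :: IO ()
main = print (likelyAbbreviations sample)
-- prints ["dr","mr"]
```

Here "dog" is rejected because it also appears without a period, and "jones", "ran", and "left" are rejected because each is followed by a period only once. A real implementation would need a statistical test rather than these fixed thresholds, since a rare word that happens to end two sentences would otherwise be misclassified.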
TODO
- parallelize
- modularize tokenization
- custom tokenization rules
- needs to go fasterer