Multilingual unsupervised sentence tokenization with Punkt.
Note that abbreviations are detected at run time without the aid of a pre-built abbreviation list:
    import Data.Text (Text, pack)
    import NLP.Punkt (split_sentences)

    corpus :: Text
    corpus = pack "Look, Ma! The quick brown Mr. T. rex swallowed the lazy dog. \
                  \It really did!"

    main :: IO ()
    main = mapM_ print (split_sentences corpus)

This prints:

    "Look, Ma!"
    "The quick brown Mr. T. rex swallowed the lazy dog."
    "It really did!"
Kiss, Tibor, and Jan Strunk. "Unsupervised multilingual sentence boundary detection." Computational Linguistics 32.4 (2006): 485-525.
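Since the method is unsupervised and no abbreviation list is shipped, abbreviation detection presumably relies on the text passed in, so feeding a whole document at once is likely to work better than splitting it line by line. The following is a minimal sketch of that usage, not part of the package itself: it reads all of standard input and prints one sentence per line. It assumes only the `split_sentences` function shown above; the whitespace normalization with `T.words`/`T.unwords` is an illustrative choice.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import NLP.Punkt (split_sentences)

-- Collapse runs of whitespace (including newlines inside a sentence)
-- into single spaces so that each sentence fits on one output line.
normalize :: T.Text -> T.Text
normalize = T.unwords . T.words

main :: IO ()
main = do
  -- Read the entire input so the tokenizer sees the whole text at once.
  input <- TIO.getContents
  mapM_ (TIO.putStrLn . normalize) (split_sentences input)
```

Assuming the file is saved as `Split.hs` (a hypothetical name) and the `punkt` and `text` packages are installed, it could be run as `runghc Split.hs < document.txt`.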
- modularize tokenization
- custom tokenization rules
- needs to go faster