Popularity: 6.3 (Growing)

Activity: 0.0 (Stable)

Stars: 15

Watchers: 4

Forks: 3

Last Commit: over 4 years ago

Monthly Downloads: 27

Programming language: Haskell

License: BSD 3-clause "New" or "Revised" License

Tags: Natural Language Processing

nerf alternatives and similar packages

Based on the "Natural Language Processing" category.

chatter

8.9 2.3 nerf VS chatter

A library of Natural Language Processing algorithms for Haskell.
numerals

7.9 0.0 nerf VS numerals

Convert numbers to number words

mecab

7.9 0.0 nerf VS mecab

A Haskell binding to MeCab
punkt

7.0 0.0 nerf VS punkt

Unsupervised multilingual sentence segmentation.
concraft-pl

6.3 0.0 nerf VS concraft-pl

A morphosyntactic tagger for Polish based on conditional random fields
cndict

6.2 0.0 nerf VS cndict

Chinese/Mandarin <-> English dictionary, Chinese lexer.
hext

5.8 0.7 nerf VS hext

a text classification library
concraft

5.1 0.0 nerf VS concraft

A morphosyntactic disambiguation library based on constrained conditional random fields
partage

4.8 0.0 nerf VS partage

A* parser for tree adjoining grammars
minimorph

4.4 0.0 nerf VS minimorph

English spelling functions with an emphasis on simplicity. Originally by https://github.com/kowey.
tsuntsun

3.9 0.0 nerf VS tsuntsun

Interacts with tesseract to ease reading of RAW Japanese manga.
PTQ

3.7 0.0 nerf VS PTQ

An implementation of Montague's PTQ (Proper Treatment of Quantification).
corenlp-parser

3.4 0.0 nerf VS corenlp-parser

Launches CoreNLP and parses the JSON output
numerals-base

3.4 0.0 nerf VS numerals-base

Convert numbers to number words
hist-pl

2.8 0.0 nerf VS hist-pl

Programs and libraries related to the historical dictionary of Polish
polh-lexicon

2.8 0.0 nerf VS polh-lexicon

Programs and libraries related to the historical dictionary of Polish
sentiwordnet-parser

2.8 0.0 nerf VS sentiwordnet-parser

Parser for the [SentiWordNet](http://sentiwordnet.isti.cnr.it/) tab-separated file
data-named

2.4 0.0 nerf VS data-named

Named entity data layer
haskell-postal

2.4 0.0 nerf VS haskell-postal

Haskell binding for the libpostal library
crf-chain2-tiers

2.4 0.0 nerf VS crf-chain2-tiers

Second-order, tiered, constrained, linear conditional random fields
adict

2.2 0.0 nerf VS adict

Approximate dictionary searching Haskell library
phonetic-code

2.2 0.0 nerf VS phonetic-code

phonetic codes in Haskell
moan

1.0 0.0 nerf VS moan

Language-agnostic analyzer for positional morphosyntactic descriptors
concraft-hr

1.0 0.0 nerf VS concraft-hr

A part-of-speech tagger for Croatian based on the concraft library.
ENIG

1.0 0.0 nerf VS ENIG

Korean postposition particle selector
penntreebank-megaparsec

1.0 0.0 nerf VS penntreebank-megaparsec

Megaparsec parsers for trees in the Penn Treebank format
arpa

- - nerf VS arpa

Library for reading ARPA n-gram models

README

Nerf

Nerf is a statistical named entity recognition (NER) tool based on linear-chain conditional random fields (CRFs). It has been adapted to recognize tree-like structures of NEs (i.e., with recursively embedded NEs) by using the joined label tagging method which -- for a particular sentence -- works as follows:

A CRF model is used to determine the most probable sequence of labels,
The extended IOB method is used to decode the sequence into a forest of NEs.

The extended IOB method also provides the inverse encoding function, which is needed during model training.
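To make the encode/decode pair concrete, here is a minimal Python sketch of one possible joined-label IOB scheme. It is not Nerf's actual implementation: the `B-`/`I-` prefixes, the `/` separator between nesting levels, and the names `encode` and `decode` are all assumptions chosen for illustration.

```python
# A forest node is either a plain token (str) or a (ne_type, children) pair.
# Assumed label layout: one "B-type" or "I-type" part per enclosing NE,
# outermost first, joined with "/"; tokens outside any NE get "O".

def encode(forest):
    """Flatten a forest of NEs into (token, joined_label) pairs."""
    out = []

    def walk(node, stack):
        # stack holds [ne_type, started] entries for the enclosing NEs.
        if isinstance(node, str):
            parts = []
            for entry in stack:
                parts.append(("I-" if entry[1] else "B-") + entry[0])
                entry[1] = True  # later tokens of this NE get "I-"
            out.append((node, "/".join(parts) if parts else "O"))
        else:
            ne_type, children = node
            stack.append([ne_type, False])
            for child in children:
                walk(child, stack)
            stack.pop()

    for node in forest:
        walk(node, [])
    return out


def decode(pairs):
    """Rebuild the forest from (token, joined_label) pairs."""
    root = []
    open_nodes = []  # currently open (ne_type, children) nodes, outermost first
    for token, label in pairs:
        parts = [] if label == "O" else label.split("/")
        # Count how many open NEs this token continues.
        keep = 0
        while (keep < len(parts) and keep < len(open_nodes)
               and parts[keep] == "I-" + open_nodes[keep][0]):
            keep += 1
        del open_nodes[keep:]  # close NEs that are not continued
        for part in parts[keep:]:
            node = (part[2:], [])  # strip the "B-" / "I-" prefix
            (open_nodes[-1][1] if open_nodes else root).append(node)
            open_nodes.append(node)
        (open_nodes[-1][1] if open_nodes else root).append(token)
    return root


forest = [("organization", ["Church", "of", "the",
                            ("deity", ["Flying", "Spaghetti", "Monster"])]),
          "."]
pairs = encode(forest)
```

In this sketch the token "Flying" receives the joined label `I-organization/B-deity`, and `decode(pairs)` returns the original forest, which is the round trip that training (encoding) and tagging (decoding) rely on.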

Installation

It is recommended to install nerf using the Haskell Tool Stack, which you will need to download and install on your machine beforehand. Then clone this repository into a local directory and use stack to install the library by running:

stack install

Data formats

The only data encoding supported by Nerf is UTF-8.

Training data

The current version of Nerf works with a simple data format in which:

Each sentence is kept in a separate line,
Named entities are represented with embedded beginning and ending tags,
Contents of individual tags represent named entity types.

For example:

<organization>Church of the <deity>Flying Spaghetti Monster</deity></organization> .

Text and label values should be escaped by prepending the \ character to each of the special characters >, <, \ and space.
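The escaping rule can be sketched in a few lines of Python. This is a hypothetical helper for preparing data, not part of Nerf; the names `escape` and `unescape` are assumptions.

```python
# The four characters that the training format treats as special.
SPECIAL = set("<>\\ ")


def escape(value):
    """Prepend a backslash to every special character in value."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)


def unescape(value):
    """Inverse of escape: drop each backslash and keep the character after it."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            ch = next(chars, "")  # a trailing lone backslash is dropped
        out.append(ch)
    return "".join(out)
```

For example, `escape("New York")` yields `New\ York`, so a multi-word value survives as a single unit, and `unescape` restores the original string.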

Have a look in the example directory for an example of a file in the appropriate format.
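For readers who want to inspect or post-process files in this format, the following Python sketch parses one line into a nested structure. It is an illustration based only on the description above, not Nerf's own parser; `parse_line` and the `(type, children)` representation are assumptions.

```python
import re

# An opening tag, a closing tag, or a run of text; "\x" is an escaped character.
TOKEN = re.compile(r"<(/?)((?:\\.|[^\\<>\s])+)>|((?:\\.|[^\\<>\s])+)")


def unescape(value):
    """Drop each escaping backslash and keep the character after it."""
    return re.sub(r"\\(.)", r"\1", value)


def parse_line(line):
    """Parse one annotated sentence into a list of tokens and (type, children) nodes."""
    root = []
    stack = [(None, root)]  # open nodes, outermost first; root has no type
    # Characters matching neither pattern (e.g. a stray unescaped "<")
    # are skipped silently in this sketch.
    for match in TOKEN.finditer(line):
        closing, tag, text = match.groups()
        if text is not None:
            stack[-1][1].append(unescape(text))
        elif closing:
            ne_type, _ = stack.pop()
            if ne_type != unescape(tag):
                raise ValueError("unexpected closing tag: " + tag)
        else:
            node = (unescape(tag), [])
            stack[-1][1].append(node)
            stack.append(node)
    if len(stack) != 1:
        raise ValueError("unclosed tag: " + stack[-1][0])
    return root
```

Applied to the example sentence above, `parse_line` returns an `organization` node whose children are the tokens "Church", "of", "the" and a nested `deity` node, followed by the top-level token ".".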

NER input data

Below is a list of data formats supported within the NER mode.

Raw text

Nerf can be used to annotate raw text with named entities. The annotated data will be presented in the same format as the training data described above. Each sentence should be supplied on a separate line -- currently, Nerf doesn't perform any sentence-level segmentation.
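Because segmentation is left to the user, input has to be split into sentences before it reaches Nerf. The sketch below shows a deliberately naive way to do this in Python; the function name and the splitting rule are assumptions, and a dedicated sentence segmenter will handle abbreviations and similar cases far better.

```python
import re


def one_sentence_per_line(text):
    """Put each sentence on its own line using a naive punctuation rule."""
    # Collapse all whitespace so that no sentence spans several lines.
    flat = " ".join(text.split())
    # Naive rule: a sentence ends at ".", "!" or "?" followed by a space.
    sentences = re.split(r"(?<=[.!?]) ", flat)
    return "\n".join(s for s in sentences if s)
```

The result can then be written to a file and passed to Nerf on standard input.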

XCES format

It is also possible to annotate data stored in the XCES format.

Training

Once you have an annotated data file train.nes (and, optionally, an evaluation data file eval.nes) that conforms to the format described above, you can train the Nerf model using the following command:

nerf train train.nes -e eval.nes -o model.bin

Run nerf train --help to learn more about the program arguments and possible training options.
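If you only have a single annotated file, one way to obtain both train.nes and eval.nes is to hold out a share of its sentences. The Python sketch below is a hypothetical helper, not part of Nerf; the 10% default and the fixed seed are arbitrary assumptions.

```python
import random


def split_lines(lines, eval_fraction=0.1, seed=0):
    """Shuffle annotated sentences and split them into (train, eval) lists."""
    sentences = [ln for ln in lines if ln.strip()]  # skip blank lines
    random.Random(seed).shuffle(sentences)          # reproducible shuffle
    n_eval = int(len(sentences) * eval_fraction)
    return sentences[n_eval:], sentences[:n_eval]


# Usage sketch (all.nes is a hypothetical file with one sentence per line):
#   with open("all.nes", encoding="utf-8") as f:
#       train, held_out = split_lines(f.read().splitlines())
#   # ...then write `train` to train.nes and `held_out` to eval.nes.
```

Since each sentence occupies exactly one line, splitting by line keeps every annotated sentence intact.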

The nerf tool can also be supplied with additional runtime system options. For example, to train the model using four threads, use:

nerf train train.nes -e eval.nes -o model.bin +RTS -N4

WARNING: Currently, the -N runtime option sometimes leads to errors in the training process and therefore should be avoided for the time being.

Dictionaries

Nerf supports a list of NE-related dictionaries:

To use a particular dictionary during NER, you have to supply it as a command line argument during the training process, for example:

nerf train train.nes --polimorf PoliMorf-0.6.1.tab

Named entity recognition

To annotate the input.txt data file using the trained model.bin model, run:

nerf ner model.bin < input.txt

Annotated data will be printed to stdout. The data formats currently supported within the NER mode have been described above. Run nerf ner --help to learn more about the additional NER arguments.

Server

Nerf also provides a client/server mode. It is handy when, for example, you need to annotate a large collection of small files. Loading a Nerf model from disk takes a considerable amount of time, which makes the tagging method described above very slow in such a setting.

To start the Nerf server, run:

nerf server model.bin

You can supply a custom port number using the --port option. For example, to run the server on port 10101, use the following command:

nerf server model.bin --port 10101

To use the server in a multi-threaded environment, you need to specify the -N RTS option. A set of options which usually yields good server performance is presented in the following example:

nerf server model.bin +RTS -N -A4M -qg1 -I0

Run nerf server --help to learn more about possible server-mode options.

The client mode works just like the tagging mode. The only difference is that, instead of supplying your client with a model, you need to specify the port number (in case you used a custom one when starting the server; otherwise, the default port number will be used).

nerf client --port 10101 < input.txt > output.nes

Run nerf client --help to learn more about the possible client-mode options.
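The "large collection of small files" scenario could be scripted roughly as follows. This Python sketch assumes a Nerf server is already running and uses only the client invocation shown above; the directory layout, the .txt and .nes extensions, and the function names are assumptions.

```python
import subprocess
from pathlib import Path


def client_command(port=None):
    """Build the nerf client command line; without a port the default is used."""
    cmd = ["nerf", "client"]
    if port is not None:
        cmd += ["--port", str(port)]
    return cmd


def annotate_dir(in_dir, out_dir, port=None):
    """Send every *.txt file in in_dir to the server, writing *.nes files to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.txt")):
        dst = out_dir / (src.stem + ".nes")
        # Equivalent to: nerf client --port PORT < src > dst
        with src.open("rb") as fin, dst.open("wb") as fout:
            subprocess.run(client_command(port), stdin=fin, stdout=fout, check=True)


# Usage sketch (hypothetical directories):
#   annotate_dir("inputs", "outputs", port=10101)
```

Each client call reuses the model already loaded by the server, so the per-file cost no longer includes reading the model from disk.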

nerf

Named entity recognition tool based on linear-chain CRFs