unicode-transforms alternatives and similar packages
Based on the "Data" category.
lens
Lenses, Folds, and Traversals - Join us on web.libera.chat #haskell-lens
semantic-source
Parsing, analyzing, and comparing source code across many languages
text
Haskell library for space- and time-efficient operations over Unicode text.
code-builder
Packages for defining APIs, running them, generating client code and documentation.
compendium-client
Mu (μ) is a purely functional framework for building micro services.
cassava
A CSV parsing and encoding library optimized for ease of use and high performance
unordered-containers
Efficient hashing-based container types
holmes
A reference library for constraint-solving with propagators and CDCL.
hashable
A class for types that can be converted to a hash value
resource-pool
A high-performance striped resource pooling implementation for Haskell
primitive
This package provides various primitive memory-related operations.
binary
Efficient, pure binary serialisation using ByteStrings in Haskell.
discrimination
Fast linear time sorting and discrimination for a large class of data types
json-autotype
Automatic Haskell type inference from JSON input
hashtables
Mutable hash tables for Haskell, in the ST monad
IORefCAS
A collection of different packages for CAS based data structures.
audiovisual
Extensible records, variants, structs, effects, tangles
dependent-sum
Dependent sums and supporting typeclasses for comparing and displaying them
reflection
Reifies arbitrary Haskell terms into types that can be reflected back into terms
dependent-map
Dependently-typed finite maps (partial dependent products)
safecopy
An extension to Data.Serialize with built-in version control
orgmode-parse
Attoparsec parser combinators for parsing org-mode structured text!
scientific
Arbitrary-precision floating-point numbers represented using scientific notation
bifunctors
Haskell 98 bifunctors, bifoldables and bitraversables
streaming
An optimized general monad transformer for streaming applications, with a simple prelude of functions
protobuf
An implementation of Google's Protocol Buffers in Haskell.
text-icu
This package provides the Haskell Data.Text.ICU library, for performing complex manipulation of Unicode text.
uuid-types
A Haskell library for creating, printing and parsing UUIDs
README
Unicode Transforms
Fast Unicode 13.0.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).
What is normalization?
Unicode characters with adornments (e.g. Á) can be represented in two different forms: as a single precomposed character (U+00C1 = Á) or as a base character followed by a combining mark (U+0041 A followed by U+0301 COMBINING ACUTE ACCENT = Á). The two are encoded as different byte sequences, yet to a human reader they look exactly the same.
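To make the difference concrete, here is a small sketch that builds both forms of Á and prints their length in code points and their UTF-8 bytes. It uses only the text and bytestring packages (both GHC boot packages); the names `precomposed`, `decomposed`, and `utf8Hex` are illustrative, not part of any library.

```haskell
-- Two encodings of "Á": one precomposed code point vs. base letter + combining mark.
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
import Numeric (showHex)

precomposed, decomposed :: T.Text
precomposed = T.pack "\x00C1"   -- U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
decomposed  = T.pack "A\x0301"  -- U+0041 'A' followed by U+0301 COMBINING ACUTE ACCENT

-- The UTF-8 bytes of a Text, rendered as lowercase hex strings.
utf8Hex :: T.Text -> [String]
utf8Hex = map (`showHex` "") . B.unpack . encodeUtf8

main :: IO ()
main = do
  print (T.length precomposed, T.length decomposed) -- (1,2) code points
  print (utf8Hex precomposed)                        -- ["c3","81"]
  print (utf8Hex decomposed)                         -- ["41","cc","81"]
  print (precomposed == decomposed)                  -- False: plain comparison differs
```

Both values render as Á, but a plain `==` (like a byte comparison) treats them as different strings.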
A plain byte comparison may report that two strings differ even though they are canonically equivalent. To compare them for equivalence, we first convert both strings to the same normalized form using the Unicode Character Database. For example:
>> :set -XOverloadedStrings
>> import Data.Text.Normalize
>> normalize NFC "\193" == normalize NFC "\65\769"
True
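The same comparison as a standalone program, sketched under the assumption that the text and unicode-transforms packages are installed. `canonicallyEqual` and `codePoints` are hypothetical helpers written for this example, not part of unicode-transforms; only `normalize` and the `NFD`/`NFC`/`NFKC` modes come from `Data.Text.Normalize`. It also contrasts the canonical forms (NFC, NFD) with the compatibility forms (NFKC, NFKD) using U+FB01 LATIN SMALL LIGATURE FI.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Assumes the text and unicode-transforms packages are available.
module Main (main) where

import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (..), normalize)

-- Hypothetical helper, not part of unicode-transforms: two strings are
-- canonically equivalent when their NFD forms are identical.
canonicallyEqual :: T.Text -> T.Text -> Bool
canonicallyEqual a b = normalize NFD a == normalize NFD b

-- Hypothetical helper: the code points of a Text, in decimal.
codePoints :: T.Text -> [Int]
codePoints = map fromEnum . T.unpack

main :: IO ()
main = do
  let composed, decomposed, ligature :: T.Text
      composed   = "\x00C1"  -- U+00C1, precomposed A with acute
      decomposed = "A\x0301" -- U+0041 followed by U+0301 combining acute
      ligature   = "\xFB01"  -- U+FB01 LATIN SMALL LIGATURE FI
  print (composed == decomposed)                -- False: different code points
  print (canonicallyEqual composed decomposed)  -- True
  print (codePoints (normalize NFD composed))   -- [65,769]
  print (codePoints (normalize NFC decomposed)) -- [193]
  -- Canonical forms keep the ligature; compatibility forms expand it to "fi".
  print (codePoints (normalize NFC ligature), codePoints (normalize NFKC ligature))
  -- ([64257],[102,105])
```

Comparing NFD forms and comparing NFC forms give the same answer for canonical equivalence; the choice between them mainly affects which representation you store.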
Performance
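Normalization performance of this package (v0.3.7) compared with the text-icu package, which uses the ICU C/C++ library (ICU4C 65.1), on macOS. Each benchmark measures the time, in milliseconds, that each package takes to normalize a text file in a given language to a given normalization form. A positive % Diff means unicode-transforms is faster. unicode-transforms outperforms ICU in 19 of the 28 benchmarks, including every decomposition (NFD, NFKD) benchmark; ICU is faster in most composition (NFC, NFKC) benchmarks, most notably Japanese and AllChars.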
Benchmark       unicode-transforms(ms) ICU(ms)   % Diff
--------------- ---------------------- ------- --------
NFKD/Korean                       7.78   37.10  +376.87
NFD/Korean                        7.86   37.06  +371.50
NFKD/Vietnamese                   6.85   12.48   +82.20
NFKD/Deutsch                      2.17    3.55   +63.30
NFKD/English                      1.71    2.78   +62.30
NFKC/Korean                       4.77    7.65   +60.28
NFD/Deutsch                       2.24    3.53   +57.41
NFD/English                       1.76    2.77   +57.32
NFC/Vietnamese                   10.66   16.63   +56.00
NFKC/Vietnamese                  10.95   16.58   +51.43
NFD/Devanagari                    6.48    8.68   +34.10
NFC/Devanagari                    6.77    8.49   +25.48
NFD/AllChars                      6.18    7.41   +19.91
NFD/Japanese                      7.80    9.20   +17.99
NFKC/Devanagari                   7.33    8.48   +15.74
NFKD/Japanese                     8.71   10.05   +15.39
NFD/Vietnamese                    5.94    6.83   +14.99
NFKD/Devanagari                   7.59    8.68   +14.27
NFKD/AllChars                     9.80   10.66    +8.82
NFKC/Deutsch                      3.21    3.18    -0.72
NFC/Korean                        4.62    4.38    -5.35
NFKC/English                      2.21    2.06    -6.88
NFC/English                       2.19    2.04    -7.21
NFKC/AllChars                    14.67    9.75   -50.51
NFC/Deutsch                       3.02    1.95   -54.39
NFKC/Japanese                    12.46    5.42  -129.93
NFC/AllChars                      9.72    3.58  -171.63
NFC/Japanese                     11.90    3.04  -292.04
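The table does not state how % Diff is computed. The rows appear consistent with the following reading, which is an inference from the numbers rather than something the package documents: the extra time taken by the slower package, as a percentage of the faster package's time, signed positive when unicode-transforms is faster. A minimal sketch of that assumed formula; `percentDiff` is a hypothetical name.

```haskell
module Main (main) where

import Text.Printf (printf)

-- Assumed reading of the "% Diff" column (inferred, not documented):
-- how much longer the slower package takes, relative to the faster one,
-- positive when unicode-transforms (ut) beats ICU (icu).
percentDiff :: Double -> Double -> Double
percentDiff ut icu
  | ut <= icu = (icu - ut) / ut * 100            -- unicode-transforms faster
  | otherwise = negate ((ut - icu) / icu * 100)  -- ICU faster

main :: IO ()
main = do
  -- Timings in milliseconds, copied from two rows of the table.
  printf "NFKD/Korean:  %+.2f\n" (percentDiff 7.78 37.10)
  printf "NFC/Japanese: %+.2f\n" (percentDiff 11.90 3.04)
```

Applied to the rounded millisecond values in the table, this reproduces the published percentages only to within rounding error, so treat it as an interpretation aid rather than the benchmark's actual definition.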
Contributing
Please use https://github.com/harendra-kumar/unicode-transforms to raise issues or send pull requests.