Popularity

9.4

Stable

Activity

0.0

Declining

Stars 154

Watchers 10

Forks 25

Last Commit over 1 year ago

Monthly Downloads: 32

Programming language: C

License: MIT License

Tags: Data

diskhash alternatives and similar packages

Based on the "Data" category.
Alternatively, view diskhash alternatives based on common mentions on social networks and blogs.

semantic-source

10.0 9.1 diskhash VS semantic-source

Parsing, analyzing, and comparing source code across many languages
lens

10.0 6.8 diskhash VS lens

Lenses, Folds, and Traversals - Join us on web.libera.chat #haskell-lens

InfluxDB - Power Real-Time Data Analytics at Scale

Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

Promo www.influxdata.com

hnix

9.9 6.8 diskhash VS hnix

A Haskell re-implementation of the Nix expression language
code-builder

9.8 0.0 diskhash VS code-builder

Packages for defining APIs, running them, generating client code and documentation.
text

9.8 8.4 diskhash VS text

Haskell library for space- and time-efficient operations over Unicode text.
massiv

9.7 5.9 diskhash VS massiv

Efficient Haskell Arrays featuring Parallel computation
compendium-client

9.7 0.0 diskhash VS compendium-client

Mu (μ) is a purely functional framework for building micro services.
cassava

9.7 4.7 diskhash VS cassava

A CSV parsing and encoding library optimized for ease of use and high performance
unordered-containers

9.7 5.0 diskhash VS unordered-containers

Efficient hashing-based container types
Frames

9.7 6.1 diskhash VS Frames

Data frames for tabular data.
holmes

9.6 0.0 diskhash VS holmes

A reference library for constraint-solving with propagators and CDCL.
alfred-margaret

9.5 3.5 diskhash VS alfred-margaret

Fast Aho-Corasick string searching
hashable

9.5 5.7 diskhash VS hashable

A class for types that can be converted to a hash value
resource-pool

9.5 0.0 diskhash VS resource-pool

A high-performance striped resource pooling implementation for Haskell
critbit

9.5 0.0 diskhash VS critbit

A Haskell implementation of crit-bit trees.
primitive

9.5 5.9 diskhash VS primitive

This package provides various primitive memory-related operations.
refined

9.5 1.5 diskhash VS refined

Refinement types with static checking
binary

9.5 4.3 diskhash VS binary

Efficient, pure binary serialisation using ByteStrings in Haskell.
caledon

9.4 0.0 diskhash VS caledon

higher order dependently typed logic programing
data-msgpack

9.4 diskhash VS data-msgpack

A Haskell implementation of MessagePack
jump

9.4 0.0 diskhash VS jump

Jump start your Haskell development
discrimination

9.4 3.1 diskhash VS discrimination

Fast linear time sorting and discrimination for a large class of data types
higgledy

9.4 2.2 diskhash VS higgledy

Higher-kinded data via generics
aeson-qq

9.4 3.6 diskhash VS aeson-qq

JSON quasiquoter for Haskell
network-msgpack-rpc

9.4 diskhash VS network-msgpack-rpc

A MessagePack-RPC Implementation
json-autotype

9.4 0.0 diskhash VS json-autotype

Automatic Haskell type inference from JSON input
hashtables

9.4 1.0 diskhash VS hashtables

Mutable hash tables for Haskell, in the ST monad
dependent-map

9.3 0.0 diskhash VS dependent-map

Dependently-typed finite maps (partial dependent products)
reflection

9.3 4.8 diskhash VS reflection

Reifies arbitrary Haskell terms into types that can be reflected back into terms
audiovisual

9.3 3.5 diskhash VS audiovisual

Extensible records, variants, structs, effects, tangles
IORefCAS

9.3 3.5 diskhash VS IORefCAS

A collection of different packages for CAS based data structures.
dependent-sum

9.3 4.3 diskhash VS dependent-sum

Dependent sums and supporting typeclasses for comparing and displaying them
cereal

9.3 0.0 diskhash VS cereal

A binary serialization library
certificate

9.3 0.0 diskhash VS certificate

Certificate and Key Reader/Writer in haskell
protobuf

9.2 2.6 diskhash VS protobuf

An implementation of Google's Protocol Buffers in Haskell.
safecopy

9.2 3.1 diskhash VS safecopy

An extension to Data.Serialize with built-in version control
orgmode-parse

9.2 0.0 diskhash VS orgmode-parse

Attoparsec parser combinators for parsing org-mode structured text!
streaming

9.2 0.0 diskhash VS streaming

An optimized general monad transformer for streaming applications, with a simple prelude of functions
rei

9.2 0.0 diskhash VS rei

Process lists easily
bifunctors

9.2 5.6 diskhash VS bifunctors

Haskell 98 bifunctors, bifoldables and bitraversables
avro

9.2 4.2 diskhash VS avro

Haskell Avro Encoding and Decoding Native Support (no RPC)
scientific

9.2 0.0 diskhash VS scientific

Arbitrary-precision floating-point numbers represented using scientific notation
MemoTrie

9.1 2.7 diskhash VS MemoTrie

Trie-based memo functions
uuid

9.1 2.9 diskhash VS uuid

A Haskell library for creating, printing and parsing UUIDs
b-tree

9.1 1.8 diskhash VS b-tree

Haskell on-disk B* tree implementation
stdio

9.1 1.8 diskhash VS stdio

Haskell Standard Input and Output
uuid-types

9.1 2.9 diskhash VS uuid-types

A Haskell library for creating, printing and parsing UUIDs
text-icu

9.1 6.1 diskhash VS text-icu

This package provides the Haskell Data.Text.ICU library, for performing complex manipulation of Unicode text.
witherable

9.1 6.4 diskhash VS witherable

Filter with effects
typerep-map

9.1 2.1 diskhash VS typerep-map

⚡️Efficient implementation of Map with types as keys

Do you think we are missing an alternative of diskhash or a related project?

Add another 'Data' Package

Popular Comparisons

README

Disk-based hashtable

A simple disk-based hash table (i.e., persistent hash table).

It is a hashtable implemented on memory-mapped disk, so that it can be loaded with a single mmap() system call and used in memory directly (being as fast as an in-memory hashtable once it is loaded from disk).

The code is in C, wrappers are provided for Python, Haskell, and C++. The wrappers follow similar APIs with variations to accommodate the language specificity. They all use the same underlying code, so you can open a hashtable created in C from Haskell, modify it within your Haskell code, and later open the result in Python.

Cross-language functionality will only work for simple types where you can control their binary representation (64-bit integers, for example).

Reading does not touch the disk representation at all and, thus, can be done on top of read-only files or using multiple threads (and different processes will share the memory: the operating system does that for you). Writing or modifying values is, however, not thread-safe.

Examples

The following examples all create a hashtable to store longs (int64_t), then set the value associated with the key "key" to 9. In the current API, the maximum size of the keys needs to be pre-specified, which is the value 15 below.

Raw C

#include <stdio.h>
#include <inttypes.h>
#include "diskhash.h"

int main(void) {
    HashTableOpts opts;
    opts.key_maxlen = 15;
    opts.object_datalen = sizeof(int64_t);
    char* err = NULL;
    HashTable* ht = dht_open("testing.dht", opts, O_RDWR|O_CREAT, &err);
    if (!ht) {
        if (!err) err = "Unknown error";
        fprintf(stderr, "Failed opening hash table: %s.\n", err);
        return 1;
    }
    long i = 9;
    dht_insert(ht, "key", &i);

    long* val = (long*) dht_lookup(ht, "key");
    printf("Looked up value: %l\n", *val);

    dht_free(ht);
    return 0;
}

The C API relies on error codes and error strings (the &err argument above). The header file has decent documentation.

Haskell

In Haskell, you have different types/functions for read-write and read-only hashtables. Read-write operations are IO operations, read-only hashtables are pure.

Read write example:

import Data.DiskHash
import Data.Int
main = do
    ht <- htOpenRW "testing.dht" 15
    htInsertRW ht "key" (9 :: Int64)
    val <- htLookupRW "key" ht
    print val

Read only example (htLookupRO is pure in this case):

import Data.DiskHash
import Data.Int
main = do
    ht <- htOpenRO "testing.dht" 15
    let val :: Int64
        val = htLookupRO "key" ht
    print val

Python

Python's interface is based on the struct module. For example, 'll' refers to a pair of 64-bit ints (longs):

import diskhash

tb = diskhash.StructHash(
    fname="testing.dht", 
    keysize=15, 
    structformat='ll',  # store pairs of longs
    mode='rw',
)
value = [1, 2]  # pair of longs
tb.insert("key", *value)
print(tb.lookup("key"))

The Python interface is currently Python 3 only. Patches to extend it to 2.7 are welcome, but it's not a priority.

C++

In C++, a simple wrapper is defined, which provides a modicum of type-safety. You use the DiskHash<T> template. Additionally, errors are reported through exceptions (both std::bad_alloc and std::runtime_error can be thrown) and not return codes.

#include <iostream>
#include <string>

#include <diskhash.hpp>

int main() {
    const int key_maxlen = 15;
    dht::DiskHash<uint64_t> ht("testing.dht", key_maxlen, dht::DHOpenRW);
    std::string line;
    uint64_t ix = 0;
    while (std::getline(std::cine, line)) {
        if (line.length() > key_maxlen) {
            std::cerr << "Key too long: '" << line << "'. Aborting.\n";
            return 2;
        }
        const bool inserted = ht.insert(line.c_str(), ix);
        if (!inserted) {
            std::cerr  << "Found repeated key '" << line << "' (ignored).\n";
        }
        ++ix;
    }
    return 0;
}

Stability

This is beta software. It is good enough that I am using it, but the API can change in the future with little warning. The binary format is versioned (the magic string encodes its version, so changes can be detected and you will get an error message in the future rather than some silent misbehaviour.

Automated unit testing ensures that basic mistakes will not go uncaught.

Limitations

You must specify the maximum key size. This can be worked around either by pre-hashing the keys (with a strong hash) or using multiple hash tables for different key sizes. Neither is currently implemented in diskhash.
You cannot delete objects. This was not a necessity for my uses, so it was not implemented. A simple implementation could be done by marking objects as "deleted" in place and recompacting when the hash table size changes or with an explicit dht_gc() call. It may also be important to add functionality to shrink hashtables so as to not waste disk space.
The algorithm is a rather naïve implementation of linear addression. It would not be hard to switch to robin hood hashing and this may indeed happen in the near future.

License: MIT

*Note that all licence references and agreements mentioned in the diskhash README section above are relevant to that project's source code only.

diskhash

Diskbased (persistent) hashtable