Monthly Downloads: 9
Programming language: Haskell
License: BSD 3-clause "New" or "Revised" License
Tags: Control, Distributed


A distributed data processing framework in pure Haskell. Inspired by Apache Spark.



This package provides a Dataset type which lets you express and execute transformations on a distributed multiset. Its API is heavily inspired by Apache Spark.
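To make the idea of transformations on a partitioned multiset concrete, here is a minimal local sketch. It is not this package's API: `Partitioned`, `pMap`, `pFilter` and `countByKey` are hypothetical names invented for illustration, and everything runs in a single process using only `base` and `containers`.

```haskell
module Main where

import Data.Char (toLower)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a distributed multiset: a list of partitions,
-- each of which would live on a separate executor in a real system.
newtype Partitioned a = Partitioned [[a]]

-- Apply a function to every element; partitions are processed independently.
pMap :: (a -> b) -> Partitioned a -> Partitioned b
pMap f (Partitioned ps) = Partitioned (map (map f) ps)

-- Keep only the elements that satisfy the predicate.
pFilter :: (a -> Bool) -> Partitioned a -> Partitioned a
pFilter p (Partitioned ps) = Partitioned (map (filter p) ps)

-- Count occurrences per key: aggregate inside each partition first, then
-- merge the partial results (the step that would need a shuffle when the
-- partitions live on different machines).
countByKey :: Ord k => Partitioned k -> Map.Map k Int
countByKey (Partitioned ps) =
  Map.unionsWith (+) [Map.fromListWith (+) [(k, 1) | k <- p] | p <- ps]

main :: IO ()
main = do
  let events = Partitioned [["Push", "fork", "push"], ["watch", "PUSH"]]
      counts = countByKey (pMap (map toLower) (pFilter (/= "fork") events))
  print (Map.toList counts) -- prints [("push",3),("watch",1)]
```

Aggregating per partition before merging keeps the amount of data exchanged between partitions small, which is the usual reason a Spark-style API separates element-wise transformations from aggregations.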

It uses pluggable Backends for spawning executors and ShuffleStores for exchanging information. See 'distributed-dataset-aws' for an implementation using AWS Lambda and S3.
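One common way to make components like these pluggable in Haskell is to model each as a record of functions. The sketch below assumes that design purely for illustration; the types `Backend` and `ShuffleStore` here, their fields `runTask`, `putChunk` and `getChunk`, and the helpers `localBackend`, `inMemoryStore` and `runAndStore` are hypothetical and are not the definitions used by this package.

```haskell
module Main where

import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical interface: how to run one serialised task somewhere.
newtype Backend = Backend
  { runTask :: String -> IO String
  }

-- Hypothetical interface: a keyed store that executors use to exchange data.
data ShuffleStore = ShuffleStore
  { putChunk :: Int -> String -> IO ()
  , getChunk :: Int -> IO (Maybe String)
  }

-- A backend that simply runs the task in the current process.
localBackend :: (String -> String) -> Backend
localBackend f = Backend {runTask = pure . f}

-- A shuffle store backed by an in-memory map.
inMemoryStore :: IO ShuffleStore
inMemoryStore = do
  ref <- newIORef Map.empty
  pure
    ShuffleStore
      { putChunk = \k v -> atomicModifyIORef' ref (\m -> (Map.insert k v m, ()))
      , getChunk = \k -> Map.lookup k <$> readIORef ref
      }

-- Code written against the two interfaces works with any implementation.
runAndStore :: Backend -> ShuffleStore -> Int -> String -> IO ()
runAndStore backend store key input = do
  out <- runTask backend input
  putChunk store key out

main :: IO ()
main = do
  store <- inMemoryStore
  runAndStore (localBackend reverse) store 0 "abc"
  getChunk store 0 >>= print -- prints Just "cba"
```

Under this design, swapping the in-process pieces for ones that call out to remote services would only change how the records are constructed, not the code that uses them.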

It also exposes a more primitive Control.Distributed.Fork module which lets you run IO actions remotely. It is especially useful when your task is embarrassingly parallel.
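As a rough local analogy for the embarrassingly parallel case, the sketch below runs independent IO actions concurrently and collects their results. It does not use Control.Distributed.Fork: lightweight threads from `base` stand in for remote executors, and `runAll` is a hypothetical helper written for this example.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Run every action on its own thread and collect the results in input order.
-- Local threads stand in for remote executors here. Exceptions are not
-- handled: if an action throws, its result is never delivered.
runAll :: [IO a] -> IO [a]
runAll actions = do
  vars <- mapM spawn actions
  mapM takeMVar vars
  where
    spawn act = do
      var <- newEmptyMVar
      _ <- forkIO (act >>= putMVar var)
      pure var

main :: IO ()
main = do
  -- Each task is independent of the others, so no coordination is needed
  -- beyond waiting for all of them to finish.
  results <- runAll [pure (sum [1 .. n]) | n <- [10, 100, 1000 :: Int]]
  print results -- prints [55,5050,500500]
```

Because the tasks share nothing, the only synchronisation point is the final collection of results, which is what makes this shape of workload a good fit for running on separate machines.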


This package provides a backend for 'distributed-dataset' using AWS services. Currently it supports running functions on AWS Lambda and using an S3 bucket as a shuffle store.


Provides Datasets that read from public open datasets. Currently it can fetch GitHub event data from GH Archive.

Running the example

  • Clone the repository:
  $ git clone https://github.com/utdemir/distributed-dataset
  $ cd distributed-dataset
  • Set up your AWS credentials, for example with the AWS CLI:
  $ aws configure
  • Create an S3 bucket to put the deployment artifact in. You can use the console or the CLI:
  $ aws s3api create-bucket --bucket my-s3-bucket
  • Build and run the example:

    • If you use Nix on Linux:
      • (Recommended) Use my binary cache on Cachix to reduce compilation times:
      $ nix-env -i cachix # or your preferred installation method
      $ cachix use utdemir
      • Then:
      $ nix run -f ./default.nix example-gh -c example-gh my-s3-bucket
    • If you use stack (requires Docker, works on Linux and macOS):
      $ stack run --docker-mount $HOME/.aws/ --docker-env HOME=$HOME example-gh my-s3-bucket


Experimental. Expect lots of missing features, bugs, instability and API changes. You will probably need to modify the source if you want to do anything serious. See issues.


I am open to contributions; any issue, PR or opinion is more than welcome.

  • To develop distributed-dataset, you can use:
    • On Linux: Nix, cabal-install or stack.
    • On macOS: stack with Docker.
  • Use ormolu to format source code.


  • You can use my binary cache on Cachix so that you don't recompile half of Hackage.
  • nix-shell will drop you into a shell with ormolu, cabal-install and steeloverseer, along with all required Haskell and system dependencies. You can use cabal new-* commands there.
  • The easiest way to get a development environment is to run sos in the top-level directory inside a nix-shell.


  • Make sure that you have Docker installed.
  • Use stack as usual; it will automatically use a Docker image.
  • Run ./make.sh stack-build before you send a PR to test different resolvers.

Related Work