Monthly Downloads: 17
Programming language: Haskell
License: BSD 3-clause "New" or "Revised" License
Tags: Control, Distributed

distributed-dataset

A distributed data processing framework in pure Haskell. Inspired by Apache Spark.

Packages

distributed-dataset

This package provides a Dataset type which lets you express and execute transformations on a distributed multiset. Its API is heavily inspired by Apache Spark.
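
To make the idea of transformations on a distributed multiset more concrete, here is a minimal local model. It is not the library's API: `Partitions`, `mapPartitions` and `countByKey` are hypothetical names invented for this sketch, plain lists stand in for partitions held by separate executors, and the only dependency assumed beyond base is the containers package.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical local model of a dataset: each inner list stands in for
-- one partition that a separate executor would hold.
type Partitions a = [[a]]

-- A per-element transformation: every partition is processed
-- independently, so no data has to move between partitions.
mapPartitions :: (a -> b) -> Partitions a -> Partitions b
mapPartitions f = map (map f)

-- An aggregation: count occurrences inside each partition first, then
-- merge the partial counts. The merge is the step where a real system
-- would need to exchange data between executors.
countByKey :: Ord k => Partitions k -> Map.Map k Int
countByKey = Map.unionsWith (+) . map countOne
  where
    countOne = foldl' (\m k -> Map.insertWith (+) k 1 m) Map.empty
```

For example, `countByKey (mapPartitions length [["a", "bb"], ["cc"]])` counts the words by length across both partitions.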

It uses pluggable Backends for spawning executors and ShuffleStores for exchanging information. See 'distributed-dataset-aws' for an implementation using AWS Lambda and S3.

It also exposes a more primitive Control.Distributed.Fork module which lets you run IO actions remotely. It is especially useful when your task is embarrassingly parallel.
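
The general shape of running independent IO actions and waiting for their results can be sketched locally with threads from base. This is only an analogy under stated assumptions: `TaskHandle`, `spawnLocal`, `awaitLocal` and `runAll` are hypothetical names, not functions from Control.Distributed.Fork, and a local thread stands in for a remote executor.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Exception (SomeException, throwIO, try)

-- Hypothetical stand-in for a handle to a running task; not the
-- library's own type.
newtype TaskHandle a = TaskHandle (MVar (Either SomeException a))

-- Start an action on another thread (standing in for a remote
-- executor) and return immediately with a handle to its result.
spawnLocal :: IO a -> IO (TaskHandle a)
spawnLocal action = do
  var <- newEmptyMVar
  _ <- forkIO (try action >>= putMVar var)
  pure (TaskHandle var)

-- Block until the task finishes, rethrowing any exception it raised.
awaitLocal :: TaskHandle a -> IO a
awaitLocal (TaskHandle var) = readMVar var >>= either throwIO pure

-- Embarrassingly parallel case: start every task first, then collect
-- the results in input order.
runAll :: [IO a] -> IO [a]
runAll tasks = mapM spawnLocal tasks >>= mapM awaitLocal
```

Because the tasks never communicate with each other, all of them can be started before any result is awaited, which is what makes this pattern a good fit for embarrassingly parallel work.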

distributed-dataset-aws

This package provides a backend for 'distributed-dataset' using AWS services. Currently it supports running functions on AWS Lambda and using an S3 bucket as a shuffle store.

distributed-dataset-opendatasets

Provides Datasets that read from public open datasets. Currently it can fetch GitHub event data from GH Archive.

Running the example

  • Clone the repository:
  $ git clone https://github.com/utdemir/distributed-dataset
  $ cd distributed-dataset
  • Configure your AWS credentials, for example with the AWS CLI:
  $ aws configure
  • Create an S3 bucket to put the deployment artifact in. You can use the console or the CLI:
  $ aws s3api create-bucket --bucket my-s3-bucket
  • Build and run the example:

    • If you use Nix on Linux:
      • (Recommended) Use my binary cache on Cachix to reduce compilation times:
        $ nix-env -i cachix # or your preferred installation method
        $ cachix use utdemir
      • Then:
        $ nix run -f ./default.nix example-gh -c example-gh my-s3-bucket

    • If you use stack (requires Docker, works on Linux and macOS):
      $ stack run --docker-mount $HOME/.aws/ --docker-env HOME=$HOME example-gh my-s3-bucket

Stability

Experimental. Expect lots of missing features, bugs, instability and API changes. You will probably need to modify the source if you want to do anything serious. See issues.

Contributing

I am open to contributions; any issue, PR or opinion is more than welcome.

  • To develop distributed-dataset, you can use:
    • On Linux: Nix, cabal-install or stack.
    • On macOS: stack with Docker.
  • Use ormolu to format source code.

Nix

  • You can use my binary cache on Cachix so that you don't recompile half of Hackage.
  • nix-shell will drop you into a shell with ormolu, cabal-install and steeloverseer, along with all required Haskell and system dependencies. You can use cabal new-* commands there.
  • The easiest way to get a development environment is to run sos in the top-level directory inside a nix-shell.

Stack

  • Make sure that you have Docker installed.
  • Use stack as usual; it will automatically use a Docker image.
  • Run ./make.sh stack-build before you send a PR to test different resolvers.

Related Work

Papers

Projects