
Monthly Downloads: 118
Programming language: Haskell
License: BSD 3-clause "New" or "Revised" License
Tags: Control     Distributed    



distributed-dataset


A distributed data processing framework in pure Haskell. Inspired by Apache Spark.

Packages

distributed-dataset

This package provides a Dataset type which lets you express and execute transformations on a distributed multiset. Its API is highly inspired by Apache Spark.
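To make the idea of transformations on a distributed multiset concrete, here is a minimal local sketch using only base and containers. It does not use the library's API: the Partitioned type and the fromList, pMap, pFilter and countByKey functions are hypothetical names chosen for illustration, and the partitions are plain lists in one process rather than data spread across executors.

```haskell
module Main where

import qualified Data.Map.Strict as M

-- A local stand-in for a distributed multiset: a list of partitions.
-- (Hypothetical type for illustration; this is not the library's Dataset.)
newtype Partitioned a = Partitioned [[a]]

-- Split a list into n partitions in round-robin order.
fromList :: Int -> [a] -> Partitioned a
fromList n xs =
  Partitioned [[x | (i, x) <- zip [0 ..] xs, i `mod` n == k] | k <- [0 .. n - 1]]

-- Element-wise transformations touch each partition independently,
-- which is what would let them run on separate executors.
pMap :: (a -> b) -> Partitioned a -> Partitioned b
pMap f (Partitioned ps) = Partitioned (map (map f) ps)

pFilter :: (a -> Bool) -> Partitioned a -> Partitioned a
pFilter p (Partitioned ps) = Partitioned (map (filter p) ps)

-- Grouped aggregation: count within each partition first, then merge the
-- partial counts. In a distributed setting the merge is the step that
-- needs data to be exchanged between executors.
countByKey :: Ord k => (a -> k) -> Partitioned a -> M.Map k Int
countByKey key (Partitioned ps) =
  M.unionsWith (+) [M.fromListWith (+) [(key x, 1) | x <- p] | p <- ps]

main :: IO ()
main = do
  let events = fromList 3 ["push", "fork", "push", "watch", "push", "fork"]
      counts = countByKey id (pFilter (/= "watch") events)
  print (M.toList counts) -- [("fork",2),("push",3)]
```

The result does not depend on how the elements are split into partitions, which is the property that makes a multiset a natural model for this kind of processing.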

It uses pluggable Backends for spawning executors and ShuffleStores for exchanging information. See 'distributed-dataset-aws' for an implementation using AWS Lambda and S3.
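The "pluggable" part can be pictured as a record of functions that callers depend on instead of a concrete service. The sketch below is an assumption-laden simplification, not the library's ShuffleStore or Backend types: Store, putChunk, getChunk, inMemoryStore and roundTrip are all hypothetical names, and the store keeps everything in memory.

```haskell
module Main where

import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as M

-- Hypothetical, simplified interface for exchanging intermediate data.
-- The library's real ShuffleStore has a different shape.
data Store = Store
  { putChunk :: Int -> String -> IO ()
  , getChunk :: Int -> IO (Maybe String)
  }

-- One possible implementation: keep every chunk in an in-memory map.
-- An S3-backed implementation would fill in the same two fields.
inMemoryStore :: IO Store
inMemoryStore = do
  ref <- newIORef M.empty
  pure
    Store
      { putChunk = \k v -> modifyIORef' ref (M.insert k v)
      , getChunk = \k -> M.lookup k <$> readIORef ref
      }

-- Code written against the interface works with any implementation.
roundTrip :: Store -> Int -> String -> IO (Maybe String)
roundTrip store k v = putChunk store k v >> getChunk store k

main :: IO ()
main = do
  store <- inMemoryStore
  roundTrip store 1 "partition-1" >>= print -- Just "partition-1"
```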

It also exposes a more primitive Control.Distributed.Fork module which lets you run IO actions remotely. It is especially useful when your task is embarrassingly parallel.
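As a rough picture of the "run many independent IO actions and collect the results" pattern, the following sketch uses local threads from base in place of remote executors. It is not the Control.Distributed.Fork API: Task, spawn, await and runAll are hypothetical names, and nothing here is serialized or sent over the network.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Exception (SomeException, evaluate, throwIO, try)

-- Hypothetical handle to a running task. A local thread plays the role
-- that a remote executor would play in the real library.
newtype Task a = Task (MVar (Either SomeException a))

-- Start an action in the background and return a handle to its result.
spawn :: IO a -> IO (Task a)
spawn act = do
  var <- newEmptyMVar
  _ <- forkIO (try act >>= putMVar var)
  pure (Task var)

-- Block until the task finishes; rethrow its exception if it failed.
await :: Task a -> IO a
await (Task var) = readMVar var >>= either throwIO pure

-- Embarrassingly parallel: start every task first, then collect results
-- in the original order.
runAll :: [IO a] -> IO [a]
runAll tasks = mapM spawn tasks >>= mapM await

main :: IO ()
main = do
  results <- runAll [evaluate (sum [1 .. n]) | n <- [10, 100, 1000 :: Int]]
  print results -- [55,5050,500500]
```

Because the tasks share no state, the only coordination needed is waiting for each result, which is why this shape suits embarrassingly parallel work.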

distributed-dataset-aws

This package provides a backend for 'distributed-dataset' using AWS services. Currently it supports running functions on AWS Lambda and using an S3 bucket as a shuffle store.

distributed-dataset-opendatasets

Provides Datasets that read from public open datasets. Currently it can fetch GitHub event data from GH Archive.

Running the example

  • Clone the repository:
  $ git clone https://github.com/utdemir/distributed-dataset
  $ cd distributed-dataset
  • Make sure your AWS credentials are set up, for example with the AWS CLI:
  $ aws configure
  • Create an S3 bucket to put the deployment artifact in. You can use the console or the CLI:
  $ aws s3api create-bucket --bucket my-s3-bucket
  • Build and run the example:

    • If you use Nix on Linux:
      • (Recommended) Use my binary cache on Cachix to reduce compilation times:
        $ $(nix-build -A cachix https://cachix.org/api/v1/install)/bin/cachix use utdemir

      • Then:
        $ $(nix-build -A example-gh)/bin/example-gh my-s3-bucket
    
    • If you use stack (requires Docker, works on Linux and MacOS):
      $ stack run --docker-mount $HOME/.aws/ --docker-env HOME=$HOME example-gh my-s3-bucket
    

Stability

Experimental. Expect lots of missing features, bugs, instability and API changes. You will probably need to modify the source if you want to do anything serious. See issues.

Contributing

I am open to contributions; any issue, PR or opinion is more than welcome.

  • To develop distributed-dataset, you can use:
    • On Linux: Nix, cabal-install or stack.
    • On MacOS: stack with docker.
  • Use ormolu to format source code.

Nix

  • You can use my binary cache on Cachix so that you don't have to recompile half of Hackage.
  • nix-shell will drop you into a shell with ormolu, cabal-install and ghcid, along with all required Haskell and system dependencies. You can use cabal new-* commands there.
  • There is a ./make.sh in the root folder with some utilities, such as formatting the source code or running ghcid. Run ./make.sh --help to see the usage.

Stack

  • Make sure that you have Docker installed.
  • Use stack as usual; it will automatically use a Docker image.
  • Run ./make.sh stack-build before you send a PR to test different resolvers.
