hadoop-streaming alternatives and similar packages
Based on the "Cloud" category.
- distributed-process-tests: Cloud Haskell core library
- api-tools: A Haskell embedded DSL for generating an API's JSON wrappers and documentation.
- push-notify: A server-side library in Haskell for sending push notifications to devices running different OS.
- push-notify-ccs: A server-side library in Haskell for sending push notifications to devices running different OS.
- courier: A message-passing library, intended for simplifying network applications.
- push-notify-apn: Send push notifications from Haskell using the new HTTP2 API.
- distributed-process-task: Cloud Haskell Task Execution Framework.
- distributed-process-systest: Testing tools and capabilities for Cloud Haskell.
- distributed-process-zookeeper: A Zookeeper backend for Cloud Haskell.
- cloud-seeder: A Haskell library for interacting with CloudFormation stacks.
- distributed-process-lifted: A generalization of distributed-process functions to a MonadProcess typeclass and standard transformer instances, using monad-control and similar techniques.
- task-distribution: A framework for distributing Haskell tasks running on HDFS data using Cloud Haskell. The goal is speedup through distribution on clusters of regular hardware. The framework provides different, simple workarounds to transport new code to other cluster nodes.
- grpc-etcd-client: Haskell etcd client using the gRPC binding.
- push-notify-general: A general library for sending/receiving push notifications through different services.
README
A simple Hadoop streaming library based on conduit, useful for writing mapper and reducer logic in Haskell and running it on AWS Elastic MapReduce, Azure HDInsight, GCP Dataproc, and so forth.
Hackage: https://hackage.haskell.org/package/hadoop-streaming
Word Count Example
See the Haddock in HadoopStreaming.Text for a simple word-count example.
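For orientation, here is roughly what the mapper half of a word count looks like as a bare Hadoop streaming program: read lines from stdin, emit tab-separated key/value pairs on stdout. This sketch deliberately does not use the library's Mapper/Reducer API; the Haddock example mentioned above shows the idiomatic, conduit-based version.

```haskell
-- Plain word-count mapper for Hadoop streaming (no hadoop-streaming API used).
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

main :: IO ()
main = do
  contents <- TIO.getContents
  -- Emit "word<TAB>1" per word; Hadoop streaming sorts and groups lines by
  -- the key before the first tab when feeding them to the reducer.
  mapM_ (\w -> TIO.putStrLn (w <> "\t1")) (T.words contents)
```

The corresponding reducer would read the sorted key/value lines and sum the counts for each word.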
A Few Things to Note
ByteString vs Text
The HadoopStreaming module provides the general Mapper and Reducer data types, whose input and output types are abstract. They are usually instantiated with either ByteString or Text.
ByteString is more suitable if the input/output needs to be decoded/encoded, for instance using the base64-bytestring library. On the other hand, Text could make more sense if decoding/encoding is not needed, or if the data is not UTF-8 encoded (see below regarding encodings). In general I'd imagine ByteString being used much more often than Text.
The HadoopStreaming.ByteString and HadoopStreaming.Text modules provide some utilities for working with ByteString and Text, respectively.
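As a minimal illustration of the encode/decode case mentioned above, the helpers below use Data.ByteString.Base64 from the base64-bytestring library to decode incoming records and encode outgoing ones. The function names are illustrative and not part of hadoop-streaming.

```haskell
module Base64Records where

import           Data.ByteString (ByteString)
import qualified Data.ByteString.Base64 as B64

-- Decode one base64-encoded input record; a Left value carries a decoding
-- error that a mapper would typically log and skip.
decodeRecord :: ByteString -> Either String ByteString
decodeRecord = B64.decode

-- Encode one output record before it is written out as a line.
encodeRecord :: ByteString -> ByteString
encodeRecord = B64.encode
```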
Encoding
It is highly recommended that your input data be UTF-8 encoded, as this is the default encoding Hadoop uses. If you must use other encodings such as UTF-16, keep in mind the following gotchas:
- It is not enough that your code can work with the encoding you choose to use:
  - By default, if any of your input files does not end with a UTF-8 representation of newline, i.e., a 0x0A byte, Hadoop streaming will add a 0x0A byte.
  - Likewise, if any line in your mapper output does not contain a UTF-8 representation of tab (0x09), Hadoop streaming will add one at the end of the line.

  This will almost certainly break your job. It may be possible to configure Hadoop streaming to use another encoding, so that the above behavior is consistent with the encoding you choose, but I don't know whether that is the case. I tried -D mapreduce.map.java.opts="-Dfile.encoding=UTF-16BE", but that doesn't seem to work.

- If you use ByteString as the input type and use Data.ByteString.hGetLine to read lines from the input, be aware that Data.ByteString.hGetLine treats 0x0A bytes as line breaks, so it doesn't work properly for non-UTF-8 encoded input. For example, in UTF-16BE and UTF-16LE, the newline character is encoded as 0x00 0x0A and 0x0A 0x00, respectively (a short demonstration follows this list).
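To make the last point concrete, here is a minimal sketch (using only the text and bytestring packages) that prints the bytes of a UTF-16-encoded newline. Only one of the two bytes is 0x0A, which is why splitting the stream on bare 0x0A bytes misaligns records.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.Text.Encoding as TE

main :: IO ()
main = do
  -- The newline character U+000A as raw bytes in each encoding.
  print (BS.unpack (TE.encodeUtf16BE "\n"))  -- [0,10]
  print (BS.unpack (TE.encodeUtf16LE "\n"))  -- [10,0]
```

Splitting a UTF-16BE stream on 0x0A therefore leaves a stray 0x00 at the start of the next "line", so Data.ByteString.hGetLine would hand the mapper misaligned records.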