Programming language: Haskell

License: BSD 3-clause "New" or "Revised" License

Tags: Data

Latest version: v0.7.0

Frames

Data Frames for Haskell

User-friendly, type-safe, runtime-efficient tooling for working with tabular data deserialized from comma-separated values (CSV) files. The type of each row is inferred from the data, which can then be streamed from disk or worked with in memory.

We provide streaming and in-memory interfaces for efficiently working with datasets that can be safely indexed by column names found in the data files themselves. This type safety of column access and manipulation is checked at compile time.

Use Cases

For a running example, we will use variations of the prestige.csv data set. Each row includes 7 columns, but we just want to compute the average ratio of income to prestige.

Clean Data

If you have CSV data where the values of each column can be classified by a single type, and ideally a header row giving each column a name, you may simply want to avoid writing out the Haskell type corresponding to each row. Frames provides TemplateHaskell machinery to infer a Haskell type for each row of your data set, thus preventing the situation where your code quietly diverges from your data.
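To make the idea of inferring a column type from data concrete, here is a minimal sketch using only base. It is not Frames's implementation: `ColType`, `inferCell`, `lub`, and `inferColumn` are hypothetical names, and the candidate types (Bool, Int, Double, with text as the fallback) are an assumption chosen for illustration. Each cell is classified by the most specific type that parses it, and a column takes the least general type that covers all of its cells.

```haskell
import Data.List (foldl')
import Text.Read (readMaybe)

-- | Hypothetical set of candidate column types for this sketch.
data ColType = TBool | TInt | TDouble | TText
  deriving (Eq, Show)

-- | Classify one cell by the most specific type that parses it.
inferCell :: String -> ColType
inferCell s
  | Just _ <- (readMaybe s :: Maybe Int)    = TInt
  | Just _ <- (readMaybe s :: Maybe Double) = TDouble
  | Just _ <- (readMaybe s :: Maybe Bool)   = TBool
  | otherwise                               = TText

-- | Least general type that can represent values of both arguments.
lub :: ColType -> ColType -> ColType
lub a b
  | a == b         = a
lub TInt TDouble   = TDouble
lub TDouble TInt   = TDouble
lub _ _            = TText

-- | Infer a type for a whole column; an empty column falls back to text.
inferColumn :: [String] -> ColType
inferColumn cells = case map inferCell cells of
  []       -> TText
  (t : ts) -> foldl' lub t ts
```

With this sketch, a column of `["8206", "11090"]` is classified as `TInt`, while mixing in a value such as `"11.44"` widens the column to `TDouble`.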

We inspect the data file at compile time (using tableTypes) to generate a collection of definitions. Then, at runtime, we load the data into column-oriented storage in memory, which inCoreAoS presents as an in-core array of structures (AoS). We're going to compute the average ratio of two columns, so we'll use the foldl library. Our fold will project the columns we want, and apply a function that divides one by the other after appropriate numeric type conversions. Here is the entirety of that program.

{-# LANGUAGE DataKinds, FlexibleContexts, QuasiQuotes, TemplateHaskell #-}
module UncurryFold where
import qualified Control.Foldl as L
import Data.Vinyl (rcast)
import Data.Vinyl.Curry (runcurryX)
import Frames

-- Data set from http://vincentarelbundock.github.io/Rdatasets/datasets.html
tableTypes "Row" "test/data/prestige.csv"

loadRows :: IO (Frame Row)
loadRows = inCoreAoS (readTable "test/data/prestige.csv")

-- | Compute the ratio of income to prestige for a record containing
-- only those fields.
ratio :: Record '[Income, Prestige] -> Double
ratio = runcurryX (\i p -> fromIntegral i / p)

averageRatio :: IO Double
averageRatio = L.fold (L.premap (ratio . rcast) avg) <$> loadRows
  where avg = (/) <$> L.sum <*> L.genericLength
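The `avg` fold above combines a sum and a length so that the data is traversed only once. To show how that works, here is a simplified, base-only stand-in for the `Fold` type from the foldl library. The names `sumF`, `genericLengthF`, and the tuple-based `averageRatio` are hypothetical and exist only for this sketch; the real program uses `Control.Foldl` and Frames records instead.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (foldl')

-- | A simplified stand-in for Control.Foldl's 'Fold': a step function,
-- an initial accumulator, and a final extraction function.
data Fold a b = forall x. Fold (x -> a -> x) x (x -> b)

instance Functor (Fold a) where
  fmap f (Fold step begin done) = Fold step begin (f . done)

-- | A strict pair, so both accumulators are forced on every step.
data Pair x y = Pair !x !y

instance Applicative (Fold a) where
  pure b = Fold (\() _ -> ()) () (const b)
  Fold stepL beginL doneL <*> Fold stepR beginR doneR =
    Fold step (Pair beginL beginR) done
    where
      -- Feed each input element to both folds in the same pass.
      step (Pair xL xR) a = Pair (stepL xL a) (stepR xR a)
      done (Pair xL xR)   = doneL xL (doneR xR)

-- | Run a fold over a list in a single strict left-to-right pass.
fold :: Fold a b -> [a] -> b
fold (Fold step begin done) = done . foldl' step begin

-- | Transform each input before it reaches the fold.
premap :: (a -> b) -> Fold b r -> Fold a r
premap f (Fold step begin done) = Fold (\x a -> step x (f a)) begin done

sumF :: Num a => Fold a a
sumF = Fold (+) 0 id

genericLengthF :: Num b => Fold a b
genericLengthF = Fold (\n _ -> n + 1) 0 id

-- | Same shape as the 'avg' definition in the program above.
average :: Fold Double Double
average = (/) <$> sumF <*> genericLengthF

-- | Average income-to-prestige ratio over hypothetical (income, prestige) pairs.
averageRatio :: [(Int, Double)] -> Double
averageRatio = fold (premap (\(i, p) -> fromIntegral i / p) average)
```

The Applicative instance is what lets `(/) <$> sumF <*> genericLengthF` share one traversal: both accumulators are advanced together, and the division happens only once at the end.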

Missing Header Row

Now consider a case where our data file lacks a header row (I deleted the first row from `prestige.csv`). We will provide our own name for the generated row type and our own column names. For the sake of demonstration, we will also specify a prefix to be added to every column-based identifier; this is particularly useful if the column names do come from a header row and you want to work with multiple CSV files, some of whose column names coincide. We customize behavior by updating whichever fields of the record produced by rowGen we care to change, then passing the result to tableTypes'.

{-# LANGUAGE DataKinds, FlexibleContexts, QuasiQuotes, TemplateHaskell #-}
module UncurryFoldNoHeader where
import qualified Control.Foldl as L
import Data.Vinyl (rcast)
import Data.Vinyl.Curry (runcurryX)
import Frames
import Frames.TH (rowGen, RowGen(..))

-- Data set from http://vincentarelbundock.github.io/Rdatasets/datasets.html
tableTypes' (rowGen "test/data/prestigeNoHeader.csv")
            { rowTypeName = "NoH"
            , columnNames = [ "Job", "Schooling", "Money", "Females"
                            , "Respect", "Census", "Category" ]
            , tablePrefix = "NoHead"}

loadRows :: IO (Frame NoH)
loadRows = inCoreAoS (readTableOpt noHParser "test/data/prestigeNoHeader.csv")

-- | Compute the ratio of money to respect for a record containing
-- only those fields.
ratio :: Record '[NoHeadMoney, NoHeadRespect] -> Double
ratio = runcurryX (\m r -> fromIntegral m / r)

averageRatio :: IO Double
averageRatio = L.fold (L.premap (ratio . rcast) avg) <$> loadRows
  where avg = (/) <$> L.sum <*> L.genericLength

Missing Data

Sometimes not every row has a value for every column. I went ahead and blanked the prestige column of every row whose type column was NA in prestige.csv. For example, the first such row now reads,

"athletes",11.44,8206,8.13,,3373,NA

We can no longer parse a Double for that row, so we will work with row types parameterized by a Maybe type constructor. We are substantially filtering our data, so we will perform this operation in a streaming fashion without ever loading the entire table into memory. Our process will be to check whether the prestige column was parsed, keeping only those rows for which it was not, then project the income column from those rows, and finally throw away Nothing elements.

{-# LANGUAGE DataKinds, FlexibleContexts, QuasiQuotes, TemplateHaskell, TypeApplications, TypeOperators #-}
module UncurryFoldPartialData where
import qualified Control.Foldl as L
import Data.Maybe (isNothing)
import Data.Vinyl.XRec (toHKD)
import Frames
import Pipes (Producer, (>->))
import qualified Pipes.Prelude as P

-- Data set from http://vincentarelbundock.github.io/Rdatasets/datasets.html
-- The prestige column has been left blank for rows whose "type" is
-- listed as "NA".
tableTypes "Row" "test/data/prestigePartial.csv"

-- | A pipes 'Producer' of our 'Row' type with a column functor of
-- 'Maybe'. That is, each element of each row may have failed to parse
-- from the CSV file.
maybeRows :: MonadSafe m => Producer (Rec (Maybe :. ElField) (RecordColumns Row)) m ()
maybeRows = readTableMaybe "test/data/prestigePartial.csv"

-- | Return the number of rows with unknown prestige, and the average
-- income of those rows.
incomeOfUnknownPrestige :: IO (Int, Double)
incomeOfUnknownPrestige =
  runSafeEffect . L.purely P.fold avg $
    maybeRows >-> P.filter prestigeUnknown >-> P.map getIncome >-> P.concat
  where avg = (\s l -> (l, s / fromIntegral l)) <$> L.sum <*> L.length
        getIncome = fmap fromIntegral . toHKD . rget @Income
        prestigeUnknown :: Rec (Maybe :. ElField) (RecordColumns Row) -> Bool
        prestigeUnknown = isNothing . toHKD . rget @Prestige
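The filter, project, and discard steps of that pipeline can also be sketched over a plain list, which may help when reading the pipes version. This is only an illustration under stated assumptions: `MaybeRow` and its fields are hypothetical stand-ins for the `Rec (Maybe :. ElField)` rows, and a lazy list takes the place of the streaming `Producer`.

```haskell
import Data.List (foldl')
import Data.Maybe (isNothing, mapMaybe)

-- | Hypothetical stand-in for a row whose columns may each have failed
-- to parse. Only the two columns used here are modelled.
data MaybeRow = MaybeRow
  { mIncome   :: Maybe Int
  , mPrestige :: Maybe Double
  }

-- | Return the number of rows with unknown prestige, and the average
-- income of those rows, accumulating sum and count in one pass.
incomeOfUnknownPrestige :: [MaybeRow] -> (Int, Double)
incomeOfUnknownPrestige rows = (n, total / fromIntegral n)
  where
    -- Keep rows whose prestige failed to parse, project income,
    -- and drop rows whose income is also missing.
    incomes = mapMaybe (fmap fromIntegral . mIncome)
                       (filter (isNothing . mPrestige) rows)
    (total, n) = foldl' step (0, 0) incomes
    -- Force both accumulators so no thunks build up.
    step (s, l) x =
      let s' = s + x
          l' = l + 1
      in s' `seq` l' `seq` (s', l')
```

As in the original program, the average is a plain division by the row count, so an input with no matching rows yields NaN rather than an error.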

Tutorial

For comparison to working with data frames in other languages, see the tutorial.

Demos

There are various demos in the repository. Be sure to run the getdata build target to download the data files used by the demos! You can also download the data files manually and put them in a data directory in the directory from which you will be running the executables.

Benchmarks

The benchmark shows several ways of dealing with data when you want to perform multiple traversals.

Another demo shows how to fuse multiple passes into one so that the full data set is never resident in memory. A Pandas version of a similar program is also provided for comparison.

This is a trivial program, but it shows that performance is comparable to Pandas, and that the memory savings of a compiled program are substantial.