Popularity: 1.7 (stable)  
Activity: 0.0 (stable)  
Stars: 0 · Watchers: 3 · Forks: 0  
Last commit: over 6 years ago  
Monthly downloads: 11  
Programming language: Haskell  
License: GNU General Public License v3.0 only  
Tags: Bioinformatics, Specific Industries  
Latest version: v0.0.1.0

MutationOrder alternatives and similar packages

Based on the "Bioinformatics" category. For each package, the two numbers are its popularity and activity scores.

hemokit

8.4 0.0 MutationOrder VS hemokit

Haskell library for the Emotiv EEG, inspired by the Emokit code
cobot

7.7 1.8 MutationOrder VS cobot

Computational biology toolkit to collaborate with researchers in constructive protein engineering


cobot-io

7.5 5.1 MutationOrder VS cobot-io

Biological data file formats and IO
hPDB

7.2 0.0 MutationOrder VS hPDB

PDB parser in Haskell
samtools

6.8 0.0 MutationOrder VS samtools

[Moved to: https://github.com/ingolia/SamTools]
RNAlien

6.6 0.0 MutationOrder VS RNAlien

RNAlien - unsupervised RNA family model construction
phybin

5.8 0.0 MutationOrder VS phybin

Binning (Newick) Phylogenetic Trees by Topology
bioinformatics-toolkit

5.7 0.0 MutationOrder VS bioinformatics-toolkit

A collection of bioinformatics algorithms
Genbank

5.6 0.0 MutationOrder VS Genbank

Genbank format tools and parser
BlastHTTP

5.3 0.0 MutationOrder VS BlastHTTP

Haskell cabal library for submission and result retrieval from the NCBI Blast REST webservice
vcf

5.3 0.0 MutationOrder VS vcf

Haskell library to handle VCF (Variant Call Format) files
cmv

5.3 0.0 MutationOrder VS cmv

Visualize HMMs, CMs and their comparisons
FormalGrammars

5.1 0.0 MutationOrder VS FormalGrammars

Context-free and linear grammars in Haskell (parsing, pretty-printing, embedded DSL)
EntrezHTTP

5.0 0.0 MutationOrder VS EntrezHTTP

Haskell cabal library for submission and result retrieval from the NCBI Entrez REST webservice
cobot-tools

5.0 3.6 MutationOrder VS cobot-tools

Biological data file formats and IO
SelectSequencesFromMSA

4.4 0.0 MutationOrder VS SelectSequencesFromMSA

Tool to select representative sequences from a multiple sequence alignment
TaxonomyTools

4.1 0.0 MutationOrder VS TaxonomyTools

Tools to process and visualize NCBI taxonomy data
Taxonomy

3.9 0.0 MutationOrder VS Taxonomy

Haskell cabal Taxonomy library containing tools, parsers, data structures and visualisation for the NCBI (National Center for Biotechnology Information) Taxonomy data sources.
ClustalParser

3.6 0.0 MutationOrder VS ClustalParser

Parse output of Clustal tools
memexml

3.4 0.0 MutationOrder VS memexml

Haskell cabal library for parsing MEME motif finder XML output
BioHMM

3.4 0.0 MutationOrder VS BioHMM

Library containing parsing and visualisation functions and data structures for Hidden Markov Models in HMMER3 format.
StockholmAlignment

3.4 0.0 MutationOrder VS StockholmAlignment

Library containing parsing and visualisation functions and data structures for the Stockholm alignment format.
ViennaRNAParser

3.4 0.0 MutationOrder VS ViennaRNAParser

Library for parsing ViennaRNA package output
Forestry

2.2 0.0 MutationOrder VS Forestry

Science and craft of forests
seqloc

1.9 0.0 MutationOrder VS seqloc

Bio.SeqLoc
ADPfusionForest

1.9 0.0 MutationOrder VS ADPfusionForest

Dynamic programming on tree and forest structures
Gene-CluEDO

1.9 0.0 MutationOrder VS Gene-CluEDO

Gene Cluster Evolution Determined Order
rank-product

1.6 0.0 MutationOrder VS rank-product

Collects the functions pertaining to finding the rank product of a data set as well as the associated p-value.
uniprot-kb

1.3 0.0 MutationOrder VS uniprot-kb

UniProt-KB format parser
mmtf

0.6 0.0 MutationOrder VS mmtf

MMTF for Haskell
bio-sequence

- - MutationOrder VS bio-sequence

Initial project template from stack


README

Determine the most likely order of mutations from one RNA sequence to another.

Maria Beatriz Walter Costa, Christian Hoener zu Siederdissen, Dan Tulpan, Peter F. Stadler, and Katja Nowick  
*Uncovering the Structural Evolution of the Human Accelerated Region 1*  
2017, submitted  
[preprint](http://www.bioinf.uni-leipzig.de/~choener/pdfs/wal-hoe-2017.pdf)

## General information

Given two RNA sequences, one ancestral and one extant, we want to determine the most likely path of evolution under different measures of fitness.

This program produces (i) the maximum-likelihood path, (ii) all end probabilities, (iii) all start-end probabilities, (iv) all edge probabilities, and (v) the maximum expected accuracy (MEA) path for these two RNA sequences.

In detail:
(i) gives the optimal path(s) for the fitness function;
(ii) gives, for each nucleotide polymorphism, the probability that this mutation was introduced last;
(iii) considers all pairs (first mutation, last mutation) and gives the probability that these two mutations begin and end the chain of mutations;
(iv) yields, for all pairs of nodes (i -> j), the probability that this edge occurs, over the whole ensemble of all possible paths;
(v) produces the path of maximal weight using the probabilities produced in (iv).
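
A brute-force sketch can make the five outputs concrete on a toy problem. It enumerates every ordering of n mutations and gives each ordering a Boltzmann-style weight. That weighting is an assumption based on the `--temperature` option described under command-line options. It then reads off (i) to (v). The step cost `toyCost` and all function names are hypothetical. An edge i -> j is read here as "mutation j directly follows mutation i". Enumeration grows as n!, so this should not be read as MutationOrder's own algorithm, which scores each step from RNA foldings.

```haskell
import Data.List (maximumBy, minimumBy, permutations)
import Data.Ord (comparing)

-- | Hypothetical step cost: the cost of applying mutation @m@ when the
-- mutations in @done@ are already applied. MutationOrder derives such costs
-- from RNA folding (see --scoretype); this sketch takes them as a parameter.
type StepCost = [Int] -> Int -> Double

-- | Toy cost for illustration only: mutation m "wants" to be step number m.
toyCost :: StepCost
toyCost done m = fromIntegral (abs (m - length done))

-- | Total cost of one ordering of the mutations 0 .. n-1.
pathCost :: StepCost -> [Int] -> Double
pathCost cost = go []
  where
    go _ [] = 0
    go done (m : ms) = cost done m + go (m : done) ms

-- | Every ordering with its normalised weight exp(-(cost - minCost) / t).
-- Brute force over all n! orderings, so only usable for tiny n (n >= 1).
ensemble :: Double -> StepCost -> Int -> [([Int], Double)]
ensemble t cost n = [ (p, w / z) | (p, w) <- weighted ]
  where
    paths = permutations [0 .. n - 1]
    cMin = minimum (map (pathCost cost) paths)
    weighted = [ (p, exp (negate (pathCost cost p - cMin) / t)) | p <- paths ]
    z = sum (map snd weighted)

-- (i) an optimal ordering
optimalPath :: StepCost -> Int -> [Int]
optimalPath cost n = minimumBy (comparing (pathCost cost)) (permutations [0 .. n - 1])

-- (ii) probability that mutation m is applied last
endProb :: [([Int], Double)] -> Int -> Double
endProb ens m = sum [ w | (p, w) <- ens, last p == m ]

-- (iii) probability that a is the first and b the last mutation
firstLastProb :: [([Int], Double)] -> Int -> Int -> Double
firstLastProb ens a b = sum [ w | (p, w) <- ens, take 1 p == [a], last p == b ]

-- (iv) probability that mutation j directly follows mutation i
edgeProb :: [([Int], Double)] -> Int -> Int -> Double
edgeProb ens i j = sum [ w | (p, w) <- ens, (i, j) `elem` zip p (drop 1 p) ]

-- (v) ordering that maximises the summed edge probabilities
meaPath :: [([Int], Double)] -> Int -> [Int]
meaPath ens n = maximumBy (comparing score) (permutations [0 .. n - 1])
  where
    score p = sum [ edgeProb ens i j | (i, j) <- zip p (drop 1 p) ]
```

Load it in GHCi and try, e.g., `optimalPath toyCost 3` or `meaPath (ensemble 1.0 toyCost 3) 3`. Lowering the temperature concentrates the ensemble weight on the optimal ordering.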

## Usage instructions

### Sequence generation

First, the sequence database needs to be created. The following assumptions are made:

- chimp_118.fa is the origin sequence.
- human_118.fa is the target sequence.
- All known mutations are to be ordered.
- One intermediate (or back-mutation) is allowed. This alone expands the sequence space from ca. 250K sequences to 83.6M sequences! Use your local compute cluster or download our precalculated data.
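
Without intermediates, each sequence on a path is the origin with some subset of the n point mutations applied. The space therefore has 2^n sequences. Since 2^18 = 262,144, this would match the "ca. 250K" above for 18 differing positions. The sketch below enumerates and counts that space for two equal-length sequences. The function names are hypothetical, and the extra intermediate or back-mutation (which yields the 83.6M figure) is not modelled.

```haskell
-- | 0-based positions at which two equal-length sequences differ.
diffPositions :: String -> String -> [Int]
diffPositions a b = [ i | (i, x, y) <- zip3 [0 ..] a b, x /= y ]

-- | Every sequence obtained from @a@ by applying some subset of the point
-- mutations that turn @a@ into @b@ (no intermediates or back-mutations).
intermediates :: String -> String -> [String]
intermediates a b = sequence (zipWith choices a b)
  where
    -- each position keeps the ancestral base or takes the extant one
    choices x y
      | x == y = [x]
      | otherwise = [x, y]

-- | Size of that space without enumerating it: 2^n for n differing positions.
countIntermediates :: String -> String -> Integer
countIntermediates a b = 2 ^ length (diffPositions a b)
```

For example, `intermediates "ACGU" "AGGA"` lists the four sequences between the two inputs. `countIntermediates` is the one to use for realistic lengths, since the list itself grows exponentially.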

The following command will prepare the working database and populate the seqs subdirectory.

    mkdir workdb
    mkdir workdb/seqs
    mkdir workdb/rnafold
    ./MutationOrder gensequences -w workdb --ancestral chimp_118.fa -e human_118.fa -g 1 --sequencelimit 100000000 --alphabet=ACGT --seqsperfile=100000

### Example usage

We assume that you have two FASTA files, chimp_118.fa and human_118.fa, but they can be named however is convenient. Each file has to contain exactly one sequence, and both sequences have to be of the same length.
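
The two input requirements can be checked before starting a long run. Below is a minimal sketch, assuming plain FASTA text in which header lines start with `>`. `parseFasta` and `checkInputs` are hypothetical helpers and are not part of MutationOrder.

```haskell
import Data.Char (isSpace)

-- | Parse FASTA text into (header, sequence) records. Blank lines are
-- ignored and whitespace inside sequence lines is dropped.
parseFasta :: String -> [(String, String)]
parseFasta = go . filter (not . all isSpace) . lines
  where
    isHeader l = take 1 l == ">"
    go (('>' : hdr) : rest) =
      let (body, more) = break isHeader rest
       in (hdr, concatMap (filter (not . isSpace)) body) : go more
    go (_ : rest) = go rest -- skip stray lines before the first header
    go [] = []

-- | The requirements above: exactly one sequence per file and equal
-- lengths. Returns the two sequences on success.
checkInputs :: String -> String -> Either String (String, String)
checkInputs originTxt targetTxt =
  case (parseFasta originTxt, parseFasta targetTxt) of
    ([(_, a)], [(_, b)])
      | length a == length b -> Right (a, b)
      | otherwise ->
          Left ("length mismatch: " ++ show (length a) ++ " vs " ++ show (length b))
    _ -> Left "each file must contain exactly one sequence"
```

In GHCi, `checkInputs <$> readFile "chimp_118.fa" <*> readFile "human_118.fa"` applies the check to the two example files.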

For testing with chimp and human, the provided chimp-human.json.gz database should be used; otherwise the initial foldings will be recalculated. All required files are available under 'Binaries' at the bottom of the page.

If you cannot or do not want to use the provided work database, run ./MutationOrder with --verbose to follow the folding steps of the initial run.

We then run

    ./MutationOrder --workdb chimp-human.json.gz --scoretype pairdistcen --onlypositive --outputprefix test chimp_118.fa human_118.fa

This will generate test.run, test-edge.eps, and a second eps file that shows the edge probabilities in MEA order.


The ```test.run``` file provides extensive output of the optimal path, the
end probabilities, the first-last probabilities, the edge probabilities, and
the MEA path. This corresponds to (i) to (v) above.

The two ```eps``` files give a graphical representation of the edge
probabilities; the ```meaorder``` file arranges them in the order of the path
of maximum expected accuracy.

The work database collects the intermediate sequences and their foldings and
is only created once. The initial run will, however, take some time: for
HAR1, it requires 1 to 4 hours depending on the machine. Further runs complete
*much* faster, within minutes for HAR1.

## Command-line options

    --help                      provides short help
    --verbose                   shows folding steps during the initial run
    -w, --workdb=ITEM           the database in which to store intermediate foldings
    -t, --temperature=NUM       annealing temperature; values close to 0 favor optimal paths (default: 1.0)
    --fillweight=FILLWEIGHT     provides logarithmic and linear fill styles for probability plots; the full style always fills the box
    --fillstyle=FILLSTYLE       normally, boxes are sized but all drawn in the same color; this also changes the opacity of the color (does not work well for eps files)
    --cooptcount=INT            how many co-optimals to count (the count in the .run file is produced differently)
    --cooptprint=INT            how many co-optimals to actually print
    --outprefix=ITEM            prefix for all output files
    --scoretype=SCORETYPE       choose 'mfe', 'centroid', 'pairdistmfe', or 'pairdistcen' for the evaluation of each mutational step
    --positivesquared           square positive energies to penalize bad moves
    --onlypositive              minimize only over penalties, not energy gains
    --equalstart                each possible mutation is selected with equal probability as the initial one
    --posscaled=NUM,NUM         with =x,y, raises every value k >= x to the power y, i.e. k becomes k^y
    --lkupfile=ITEM             developer option to seed the initial work database with known foldings (usable, but very raw and undocumented; needs 5-line rnafold output)
    --showmanual                shows this manual
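
The flags `--onlypositive`, `--positivesquared`, `--posscaled`, and `--temperature` each describe a transformation of a step score. The sketch below applies those rules to a single number under explicit assumptions. The record `ScoreOpts`, the function names, and the order in which the rules are combined are all hypothetical. The temperature is assumed to enter as a Boltzmann weight exp(-s/T). That fits "values close to 0 favor optimal paths", but this README does not confirm it.

```haskell
-- | Hypothetical mirror of the score-related flags.
data ScoreOpts = ScoreOpts
  { onlyPositive :: Bool                -- --onlypositive
  , positiveSquared :: Bool             -- --positivesquared
  , posScaled :: Maybe (Double, Double) -- --posscaled=x,y
  }

-- | Transform the raw score of one mutational step (e.g. an energy change).
-- The order clip -> square -> scale is an assumption of this sketch.
transformStep :: ScoreOpts -> Double -> Double
transformStep o = scaled . squared . clipped
  where
    -- keep only penalties, drop energy gains
    clipped s = if onlyPositive o then max 0 s else s
    -- square positive values to penalize bad moves more strongly
    squared s = if positiveSquared o && s > 0 then s * s else s
    -- raise every value k >= x to the power y
    scaled s = case posScaled o of
      Just (x, y) | s >= x -> s ** y
      _ -> s

-- | Relative weight of a path with total score @s@ at temperature @t@;
-- smaller t concentrates the weight on the best-scoring paths.
boltzmannWeight :: Double -> Double -> Double
boltzmannWeight t s = exp (negate s / t)
```

Note that `k ** y` is NaN for negative k with a non-integer y, so in this sketch a threshold x >= 0 is the safe choice.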

The allowed score types are:

    mfe           optimizes based on the minimum free energy of each intermediate sequence
    centroid      uses the energy of the centroid structure instead
    pairdistmfe   minimizes the base pair distance between the structures of consecutive intermediates, using mfe structures
    pairdistcen   minimizes the base pair distance between the structures of consecutive intermediates, using centroid structures
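
The two `pairdist*` score types compare consecutive structures by their base pair distance: the number of base pairs present in exactly one of the two structures. Below is a minimal sketch of that distance on dot-bracket strings, assuming pseudoknot-free structures. The function names are hypothetical. The mfe or centroid structures themselves would come from the folding step and are not computed here.

```haskell
-- | Base pairs (i, j), i < j, of a dot-bracket string, or Nothing if the
-- brackets are unbalanced. Characters other than '(' and ')' are unpaired.
pairTable :: String -> Maybe [(Int, Int)]
pairTable = go [] [] . zip [0 ..]
  where
    go [] acc [] = Just acc
    go _ _ [] = Nothing -- unmatched '('
    go stack acc ((i, c) : rest) = case c of
      '(' -> go (i : stack) acc rest
      ')' -> case stack of
        (j : stack') -> go stack' ((j, i) : acc) rest
        [] -> Nothing -- unmatched ')'
      _ -> go stack acc rest

-- | Base pair distance: pairs in the first structure but not the second,
-- plus pairs in the second but not the first.
bpDistance :: String -> String -> Maybe Int
bpDistance s1 s2 = do
  p1 <- pairTable s1
  p2 <- pairTable s2
  pure (length (filter (`notElem` p2) p1) + length (filter (`notElem` p1) p2))
```

For example, `bpDistance "((..))" "(....)"` is `Just 1`, because only the inner pair differs. The list-based membership test is quadratic, which is fine for short RNAs such as HAR1.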



# Installation

Pre-built binaries for Linux are available under [GitHub
releases](https://github.com/choener/MutationOrder/releases).

Instructions for building from source are at the bottom of [this
page](http://www.bioinf.uni-leipzig.de/~choener/software/MutationOrder.html).



#### Contact

Christian Hoener zu Siederdissen  
Leipzig University, Leipzig, Germany  
http://www.bioinf.uni-leipzig.de/~choener/

MutationOrder

most likely order of mutation events in RNA