FastMG: a simple, fast, and accurate maximum likelihood procedure to estimate amino acid replacement rate matrices from large data sets

Users can either use the precompiled tools on 64-bit Linux or Mac platforms or build the tools from the source code.

1. The splitting tools:
- Tree based splitting method (recommended tool):
Option 1 (use precompiled tool): Uncompress either (for Linux) or TreeBasedSplit_mac (for Mac).
Option 2 (compile the tool from the source code): Uncompress; go into the TreeBasedSplit_sourceCodes folder; run ./build (gcc version 4.4 is recommended).
- Random splitting method: Install Perl; then use the Perl script.
2. The estimating tool:
Install Perl; uncompress either (for Mac) or (for Linux); use the tool to estimate an amino acid replacement matrix from alignments.

Note that precompiled XRATE and PhyML programs are provided; however, users can also build XRATE and PhyML from the source code:
- Build XRATE from the DART package: Uncompress; go into the dart_sourceCodes folder; run ./configure; then run make xrate (gcc version 4.4 is recommended).
- Build PhyML from the PhyML package: Uncompress; go into the phyml_sourceCodes folder; run ./configure; then run make (gcc version 4.4 is recommended).

The FastMG procedure consists of two phases: first, the large original alignments are split into non-overlapping sub-alignments by one of the alignment splitting algorithms (Step 1); then the matrix is estimated by joint maximum likelihood analysis of the smaller sub-alignments instead of the large original alignments (Step 2).
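
To make the idea of non-overlapping sub-alignments concrete, here is a minimal Python sketch of a random split in which every sequence lands in exactly one group of at most k sequences. It is an illustration only, not the code of the RandomSplit script, whose actual grouping rule may differ; the function name random_split and its parameters are hypothetical.

```python
import random


def random_split(names, max_seqs, seed=None):
    """Partition sequence names into non-overlapping groups of at most max_seqs.

    Hypothetical helper for illustration; not part of the FastMG package.
    """
    if max_seqs < 1:
        raise ValueError("max_seqs must be at least 1")
    shuffled = list(names)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Consecutive slices of the shuffled list never share a sequence.
    return [shuffled[i:i + max_seqs] for i in range(0, len(shuffled), max_seqs)]


# 40 sequences with k = 16 give groups of 16, 16 and 8 sequences.
groups = random_split([f"seq{i}" for i in range(40)], max_seqs=16, seed=1)
print([len(g) for g in groups])
```

Each group would then be written out as its own sub-alignment file by keeping only the rows of the selected sequences.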

Step 1: Split alignments
- Create a folder to contain all sub-alignments, for example subAlignments.
- Use either TreeBasedSplit or RandomSplit to split the alignments.
To use the TreeBasedSplit tool (recommended):
./TreeBasedSplit -k maxNumberOfSeqs -i alignmentFileName -z subAlignmentFolder
For example:
./TreeBasedSplit -k 16 -i sampleAlignment.phylip -z subAlignments

To use the RandomSplit tool:
perl -k maxNumberOfSeqs -i alignmentFileName -z subAlignmentFolder
For example:
perl -k 16 -i sampleAlignment.phylip -z subAlignments

Note that: Input alignments must be in sequential PHYLIP format.
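
Since the splitting tools expect sequential PHYLIP input, it can help to check a file before running them. The sketch below is a minimal Python reader under stated assumptions: the first line holds the number of sequences and the alignment length, each record starts with a name separated from the residues by whitespace (the relaxed convention; strict PHYLIP uses a fixed 10-character name field instead), and a sequence may continue over several lines. The function name read_sequential_phylip is hypothetical and not part of FastMG.

```python
def read_sequential_phylip(text):
    """Parse sequential PHYLIP text into {name: sequence}, checking the header counts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    header = lines[0].split()
    if len(header) < 2 or not (header[0].isdigit() and header[1].isdigit()):
        raise ValueError("first line must give the number of sequences and sites")
    n_seqs, n_sites = int(header[0]), int(header[1])

    records = {}
    pos = 1
    for _ in range(n_seqs):
        if pos >= len(lines):
            raise ValueError("fewer sequences than the header declares")
        parts = lines[pos].split(None, 1)
        name = parts[0]
        seq = "".join(parts[1].split()) if len(parts) > 1 else ""
        # Sequential layout: keep reading lines until this sequence is complete.
        while len(seq) < n_sites and pos + 1 < len(lines):
            pos += 1
            seq += "".join(lines[pos].split())
        if len(seq) != n_sites:
            raise ValueError(f"sequence {name!r} has {len(seq)} sites, expected {n_sites}")
        if name in records:
            raise ValueError(f"duplicate sequence name {name!r}")
        records[name] = seq
        pos += 1
    if pos < len(lines):
        raise ValueError("unexpected extra lines (the file may be interleaved)")
    return records


sample = "3 12\nhuman  ACDEFGHIKLMN\nmouse  ACDEFG\nHIKLMN\nyeast  ACDE-GHIKLMN\n"
alignment = read_sequential_phylip(sample)
print(list(alignment))
```

A file that raises an error here would likely need to be converted to sequential PHYLIP format first.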

Step 2: Estimate an amino acid model using the estimate tool
- Create a folder to contain all sub-alignments, for example Alignments.
- Copy all sub-alignments into the created folder.
- Run the estimating tool as follows:
perl -a alignmentFolder -s JTT.paml -o outputMatrixFile
-a: The folder containing all sub-alignments
-s: The starting matrix file in PAML format (default: the JTT matrix in JTT.paml)
-o: The output matrix file
For example:
perl -a Alignments -s JTT.paml -o testOutputMatrix.paml

Note that: The folder containing all sub-alignments must be in the same directory as the script.
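
To inspect a matrix file such as the starting matrix or the estimated output, the following Python sketch may be useful. It assumes the layout commonly used for PAML amino acid matrices: 190 exchangeabilities given as the lower triangle of a symmetric 20 x 20 matrix, followed by 20 equilibrium frequencies, with optional free text afterwards. That the FastMG output file follows this same layout is an assumption, and the function name read_paml_matrix is hypothetical.

```python
def read_paml_matrix(text):
    """Read 190 lower-triangle exchangeabilities and 20 frequencies from PAML-style text."""
    values = []
    for token in text.split():
        try:
            values.append(float(token))
        except ValueError:
            break  # stop at the first non-numeric token (trailing notes)
        if len(values) == 210:
            break
    if len(values) < 210:
        raise ValueError(f"expected 210 numbers, found {len(values)}")

    # Fill the symmetric matrix row by row: row i holds entries for columns 0..i-1.
    exch = [[0.0] * 20 for _ in range(20)]
    k = 0
    for i in range(1, 20):
        for j in range(i):
            exch[i][j] = exch[j][i] = values[k]
            k += 1
    freqs = values[190:210]
    return exch, freqs


# Synthetic input for demonstration only: exchangeabilities 1..190, uniform frequencies.
numbers = iter(range(1, 191))
rows = "\n".join(" ".join(str(next(numbers)) for _ in range(i)) for i in range(1, 20))
demo_text = rows + "\n\n" + " ".join(["0.05"] * 20) + "\nfree text after the numbers\n"
exch, freqs = read_paml_matrix(demo_text)
print(exch[2][1], round(sum(freqs), 6))
```

A quick sanity check on a real file is that the 20 frequencies sum to approximately 1.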


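- FastMG builds on a collection of application programs, including XRATE and PhyML.

- FastMG uses the maximum likelihood method. Because it analyzes small sub-alignments jointly instead of the large original alignments, it is designed to be fast while remaining accurate on large data sets.

- The package is written in C, C++, and Perl.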

Note that: FastMG is released under the GNU General Public License version 3 (GPLv3).

Last edited Sep 9, 2014 at 10:35 AM by cuongdc, version 116