Map 454 reads onto a genome and assemble overlapping transcripts into transcript models.
The pipeline currently does not use base quality information during mapping and does not consider alternative transcripts.
To set up the pipeline in the current directory, run:
python setup.py --method=map_transcripts_454 > setup.log
Add or link FASTA files of reads into the working directory. File names must end with the suffix .fasta. The pipeline processes several files at the same time. For example:
tissue1.fasta
tissue2.fasta
tissue3.fasta
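As a quick check before running the pipeline, the read files in the working directory can be listed with a few lines of Python. This is only a sketch: the helper name find_read_files is hypothetical and not part of the pipeline, and excluding genome.fasta from the list is an assumption made here because the genome link also ends in .fasta.

```python
from pathlib import Path


def find_read_files(directory=".", exclude=("genome.fasta",)):
    """Return the sorted names of regular files ending in .fasta.

    Hypothetical helper, not part of the pipeline. Names listed in
    `exclude` are skipped; skipping genome.fasta is an assumption.
    Symbolic links are followed, so broken links are not reported.
    """
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(".fasta") and p.name not in exclude
    )


if __name__ == "__main__":
    # Print one read file per line for the current directory.
    for name in find_read_files():
        print(name)
```

An empty list here suggests that the read files are missing or do not carry the .fasta suffix.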
Link to the genome in /net/cpp-data/backup/databases/indexed_fasta and name the links genome.fasta and genome.idx. For example:
ln -s /net/cpp-data/backup/databases/indexed_fasta/hs_ncbi36_softmasked.fasta genome.fasta
ln -s /net/cpp-data/backup/databases/indexed_fasta/hs_ncbi36_softmasked.idx genome.idx
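The same two links can also be created from Python, which makes it easy to verify that both source files exist before anything is linked. This is a minimal sketch: the function name link_genome and its arguments are hypothetical and not part of the pipeline.

```python
import os


def link_genome(source_prefix, workdir="."):
    """Create genome.fasta and genome.idx symlinks in workdir.

    Hypothetical helper, not part of the pipeline. `source_prefix` is the
    path of the indexed genome without its suffix. Both source files are
    checked before any link is made, so a missing file leaves workdir
    untouched. os.symlink raises FileExistsError if a link already exists.
    """
    suffixes = (".fasta", ".idx")
    sources = [os.path.abspath(source_prefix + s) for s in suffixes]
    for source in sources:
        if not os.path.exists(source):
            raise FileNotFoundError(source)

    links = []
    for source, suffix in zip(sources, suffixes):
        link = os.path.join(workdir, "genome" + suffix)
        os.symlink(source, link)
        links.append(link)
    return links
```

For the example above, the call would be link_genome("/net/cpp-data/backup/databases/indexed_fasta/hs_ncbi36_softmasked").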
Build the gmap index by running gmap_setup. By default, gmap indices should be placed in /net/cpp-mirror/databases/gmap. Specify the location of the indices with the variable PARAM_GMAP_OPTIONS.
Note
Indices on networked disks are slow to load. For performance reasons, work with local indices.
Edit the Makefile to configure the pipeline. See Parameters below.
The following parameters can be set in the Makefile: