Map 454 reads onto a genome and assemble overlapping transcripts into transcript models.
The pipeline currently does not use base quality information during mapping and does not consider alternative transcripts.
To set up the pipeline in the current directory run:
python setup.py --method=compare_transcripts > setup.log
Link to the genome in /net/cpp-data/backup/databases/indexed_fasta and name the links genome.fasta and genome.idx. For example:
ln -s /net/cpp-data/backup/databases/indexed_fasta/hs_ncbi36_softmasked.fasta genome.fasta
ln -s /net/cpp-data/backup/databases/indexed_fasta/hs_ncbi36_softmasked.idx genome.idx
Input (required):
The pipeline includes additional information if it is present:
GO annotations for genes in the reference set. Example format is:
cell_location ENSPPYG00000000676 GO:0016020 membrane NA
Gene territories as a GTF-formatted file. An example entry is:
chr1 protein_coding exon 3979975 4199559 . - . transcript_id "ENSPPYG00000000050"; gene_id "ENSPPYG00000000050";#
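The two example records above can be split into named fields with a few lines of Python. This is a minimal sketch and not part of the pipeline: the function names are hypothetical, both files are assumed to be tab-separated, and the field names for the GO line (category, gene id, GO id, description, evidence) are an assumption inferred from the single example. The GTF columns follow the usual nine-column layout.

```python
import re


def parse_go_line(line):
    """Split one tab-separated GO annotation line into named fields.

    The field names are assumptions inferred from the example line.
    """
    category, gene_id, go_id, description, evidence = line.rstrip("\n").split("\t")
    return {
        "category": category,
        "gene_id": gene_id,
        "go_id": go_id,
        "description": description,
        "evidence": evidence,
    }


def parse_gtf_line(line):
    """Split one GTF line into its nine columns and parse the attributes."""
    fields = line.rstrip("\n").split("\t")
    contig, source, feature, start, end, score, strand, frame, attributes = fields[:9]
    # Attributes look like: key "value"; key "value";
    attrs = dict(re.findall(r'(\S+) "([^"]*)";', attributes))
    return {
        "contig": contig,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attrs,
    }
```

Anything after the last quoted attribute is ignored by the attribute pattern, so trailing characters on the line do not affect the parsed values.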
Output from the mapTranscripts454 project can be imported with a single command:
make PATH_TO_MAPPING_DIR.add-tracks
Edit the Makefile to configure the pipeline. See Parameters below.
The pipeline is controlled by running make targets. The results of the pipeline computation are stored as tab separated tables in the working directory. Most of these tables are then imported into an sqlite database called csvdb (see PARAM_DATABASE).
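Because the results end up in an sqlite database, they can be inspected with Python's standard sqlite3 module. The sketch below only lists the tables that are present; the function name is hypothetical, and the default file name csvdb assumes that PARAM_DATABASE has not been changed.

```python
import sqlite3


def list_tables(database="csvdb"):
    """Return the names of all tables in an sqlite database, sorted by name."""
    connection = sqlite3.connect(database)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Note that sqlite3.connect creates an empty database file if the given path does not exist, so call list_tables() from the working directory of the pipeline.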
Type:
make all
to run the complete pipeline.
A more complete list of targets:
The following targets aid visualization:
- ucsc-tracks-gtf
- export the segments as compressed GTF files. These can be viewed as user tracks in the UCSC genome browser.
GO analysis will compute the relative enrichment/depletion of gene sets.
Requires PARAM_FILENAME_TERRITORIES, PARAM_FILENAME_GO and PARAM_FILENAME_GOSLIM to be set.
There are two counting methods. The first method (go) assigns GO terms associated with the reference gene set to TLs and counts these. The second method (territorygo) assigns TLs to genes in the reference set and then does a GO analysis on these.
Note
The conventional GO analysis based on gene lists is the territorygo method.
Usage:
make <track>:<slice>:<subset>:<background>.<go>.<method>analysis
The fields are:
Results will be in the directory <track>:<slice>:<subset>:<background>.<go>.<method>analysis.dir.
For example:
make thoracic:known:all:thoracic.go.goanalysis
will compute the enrichment of protein coding TL in the track thoracic using all thoracic genes as the background.
The command:
make thoracic:known:all:ensembl.goslim.territorygoanalysis
will compute goslim term enrichment. The foreground set consists of genes from the reference set (ensembl) that overlap protein coding TL in the track thoracic. The background is the complete reference gene set (ensembl).
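The target names above follow a fixed pattern, so they can be assembled programmatically when many combinations need to be run. The following is a sketch only: the helper names are hypothetical, and the check on the method relies on the two counting methods named in this section (go and territorygo).

```python
GO_METHODS = ("go", "territorygo")


def go_target(track, slice_, subset, background, go, method):
    """Build a make target of the form
    <track>:<slice>:<subset>:<background>.<go>.<method>analysis
    """
    if method not in GO_METHODS:
        raise ValueError(f"unknown counting method: {method!r}")
    return f"{track}:{slice_}:{subset}:{background}.{go}.{method}analysis"


def go_result_dir(track, slice_, subset, background, go, method):
    """Return the directory in which the results of the target are stored."""
    return go_target(track, slice_, subset, background, go, method) + ".dir"
```

For example, go_target("thoracic", "known", "all", "ensembl", "goslim", "territorygo") reproduces the second command shown above, and the resulting string can be passed to make as its target.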
Annotator computes the statistical significance of enrichment/depletion of genomic features (called segments) within genomic regions (called annotations).
To run annotator analysis, two files need to be present:
All workspaces exclude contigs whose names match random.
There is a convenience target:
make annotator-workspaces
that will build all available workspaces.
Annotations are built using makefile targets.
There is a convenience target:
make annotator-annotations
that will build all available annotations.
In order to perform Annotator analyses, you run a make target:
make <track>:<slice>:<subset>:<workspace>:<workspace2>_<annotations>.annotators
The fields determine which segments are used for the enrichment analysis.
Note
Annotations, segments and the workspace need to be chosen carefully for each experiment. For example, failing to use territories for goterritory analysis will measure enrichment of segments within goterritories in general, and not necessarily relative enrichment between go territories.
The results will be in the file <track>:<slice>:<subset>:<workspace>:<workspace2>_<annotations>.annotators.
The command:
make thoracic:unknown:all:intergenic:all_unknownsets.annotators
will test unknown transcripts in the track thoracic for enrichment within the annotations unknownsets, restricted to the intergenic workspace. The command:
make thoracic:intronic:all:intronic:territories_intronicgoslimterritories.annotators
will check for enrichment of intronic transcripts from the track thoracic within intronic genomic segments that also have GO assignments (the intersection of the workspaces intronic and territories). It will label GO territories by GOslim territories.
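To see how an Annotator target decomposes into its fields, the name can be split along the pattern <track>:<slice>:<subset>:<workspace>:<workspace2>_<annotations>.annotators. The parser below is an illustrative sketch and not part of the pipeline; the class and function names are hypothetical, and it assumes that the second workspace name contains no underscore, so that the first underscore separates it from the annotations.

```python
from typing import NamedTuple

SUFFIX = ".annotators"


class AnnotatorTarget(NamedTuple):
    track: str
    slice_: str
    subset: str
    workspace: str
    workspace2: str
    annotations: str


def parse_annotator_target(target):
    """Split an Annotator make target into its six fields."""
    if not target.endswith(SUFFIX):
        raise ValueError(f"target does not end in {SUFFIX}: {target!r}")
    stem = target[: -len(SUFFIX)]
    parts = stem.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected five colon-separated parts: {target!r}")
    track, slice_, subset, workspace, rest = parts
    # Assumption: the first underscore separates workspace2 from annotations.
    workspace2, separator, annotations = rest.partition("_")
    if not separator:
        raise ValueError(f"missing '_' between workspace2 and annotations: {target!r}")
    return AnnotatorTarget(track, slice_, subset, workspace, workspace2, annotations)
```

Applied to the second command above, the parser yields the track thoracic, the workspaces intronic and territories, and the annotations intronicgoslimterritories.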
Association analysis computes the significance of finding segments close to annotations.
Type:
make annotator-distance-run
to run all association analyses.
The following parameters can be set in the Makefile: