Tool reference

This page summarizes prominent tools within the CGAT Code collection. The tools are grouped loosely by functionality.

Genomic intervals/features

<no title>
Compute overlap statistics of multiple bed files.
<no title>
Transform interval data in a bed formatted file into a fasta formatted file of sequence data.
<no title>
Convert between interval data. Convert a bed formatted file to a gff or gtf formatted file.
<no title>
Work on gff formatted files with genomic features. This tool sorts/renames feature files, reconciles chromosome names, and more.
<no title>
Filter or merge interval data in a bed formatted file.
<no title>
Compare two sets of genomic intervals and output a list of overlapping features.
<no title>
Compute summary statistics of genomic intervals.
<no title>
Annotate genomic intervals (composition, peak location, overlap, ...).
<no title>
Decompose multiple sets of genomic intervals into various intersections and unions.
<no title>
Compare multiple interval data sets. The tool computes all-vs-all pairwise overlap summaries and permits incremental updates of the similarity table.
<no title>
Convert between formats.
<no title>
Split a file in gff format into smaller files. The script ensures that overlapping intervals remain in the same file.
<no title>
This script computes the genomic coverage of intervals in a gff formatted file. The coverage is computed per feature.
<no title>
Output genomic sequences from intervals.
<no title>
Compute distributions of interval sizes, intersegmental distances and interval overlap from a list of intervals.
<no title>
Summarize features within a gff formatted file.
<no title>
Convert between formats.
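
The splitting behaviour described above (overlapping intervals remain in the same file) can be illustrated with a minimal sketch. This is not the CGAT implementation. The function name `split_keeping_overlaps`, the `(contig, start, end)` tuple layout with half-open coordinates, and the `chunk_size` parameter are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical interval representation: (contig, start, end), half-open.
Interval = Tuple[str, int, int]


def split_keeping_overlaps(
    intervals: Iterable[Interval], chunk_size: int
) -> List[List[Interval]]:
    """Group sorted intervals into chunks of at least chunk_size entries,
    starting a new chunk only where the next interval does not overlap
    anything already seen on the same contig."""
    chunks: List[List[Interval]] = []
    current: List[Interval] = []
    last_contig, last_end = None, 0
    for contig, start, end in sorted(intervals):
        overlaps = contig == last_contig and start < last_end
        # Only break the chunk at a gap between intervals.
        if current and len(current) >= chunk_size and not overlaps:
            chunks.append(current)
            current = []
        current.append((contig, start, end))
        if contig == last_contig:
            last_end = max(last_end, end)
        else:
            last_contig, last_end = contig, end
    if current:
        chunks.append(current)
    return chunks
```

A chunk may grow beyond `chunk_size` when a long run of overlapping intervals has to stay together; that is the trade-off this kind of splitting accepts.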

Gene sets

<no title>
Translate a gene set into genomic annotations such as introns, intergenic regions, regulatory domains, etc.
<no title>
Annotate transcripts in a gtf formatted file. Annotations can be in reference to a second gene set (fragments, extensions), aligned reads (coverage, intron overrun, ...) or densities.
<no title>
Annotate each base in the genome according to its use within a transcript. Outputs lists of junctions.
<no title>
Derive genomic intervals (intergenic regions, introns) from a gene set.
<no title>
Merge exons/transcripts/genes, filter transcripts/genes, rename transcripts/genes, ...
<no title>
Convert a gene set in gtf format to tabular format.
<no title>
Compare two gene sets and output lists of common and unique genes.
<no title>
Compare multiple gene sets. The tool computes all-vs-all pairwise overlap of exons, bases and genes, and permits incremental updates of the similarity table.
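
As an illustration of deriving intervals such as introns from a gene set, the following sketch computes the introns of a single transcript from its exon coordinates. It is not taken from the CGAT code. The function name `introns_from_exons` and the `(start, end)` half-open coordinate convention are assumptions, and the exons are assumed to belong to one transcript and not to overlap.

```python
from typing import List, Tuple


def introns_from_exons(exons: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the gaps between consecutive exons of one transcript.

    Exons are (start, end) pairs in half-open coordinates; the input
    does not need to be sorted.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Directly adjacent exons leave no intron between them.
        if next_start > prev_end:
            introns.append((prev_end, next_start))
    return introns
```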

Sequence data

<no title>
Interleave paired reads from two fastq files into a single fasta file.
<no title>
Build an index for a fasta file. Pre-requisite for many CGAT tools.
<no title>
Count kmer content in a set of fasta sequences.
<no title>
Compute features of sequences in fasta formatted files.
<no title>
Compare two sets of sequences. Outputs missing, identical and fragmented sequences.
<no title>
Segment sequences based on G+C content, gaps, ...
<no title>
Concatenate sequences from multiple files.
<no title>
In-silico creation of variants of protein coding sequences.
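
Counting k-mer content, as mentioned in the list above, can be sketched in a few lines. This is a generic illustration and not the CGAT tool. The function name `count_kmers` is hypothetical, sequences are assumed to be plain strings already read from a fasta file, and overlapping k-mers are counted on the forward strand only.

```python
from collections import Counter
from typing import Iterable


def count_kmers(sequences: Iterable[str], k: int) -> Counter:
    """Count all overlapping k-mers across a collection of sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        seq = seq.upper()  # treat soft-masked (lower-case) bases alike
        # Sequences shorter than k contribute nothing.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts
```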

NGS data

<no title>
Compute meta-gene profiles from aligned reads in a bam formatted file. Also accepts bed or bigwig formatted files.
<no title>
Operate on bam formatted files - filtering, stripping, setting flags.
<no title>
Convert bam formatted file of genomic alignments into genomic intervals. Permits merging of paired read data and filtering by insert-size.
<no title>
Save sequence and quality information from a bam formatted file.
<no title>
Compute read densities over a collection of intervals. Also accepts bed or bigwig formatted files.
<no title>
Compute summary statistics of a bam formatted file.
<no title>
Convert read coverage in a bam formatted file into a wiggle or bigwig formatted file.
<no title>
Compute stats on exon over-/underrun and spliced reads.
<no title>
Compute coverage of reads within multiple interval types.
<no title>
Outputs side-by-side comparison of residue level counts between multiple bam formatted files.
<no title>
Perform quality score conversion between fastq formatted files.
<no title>
Interleave paired end data.
<no title>
Output bases below quality threshold, number of N’s, quality score distribution.
<no title>
Ensure that paired read fastq formatted files are consistent after filtering on the individual files.
<no title>
Perform read-by-read comparison of two bam-files.
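
Interleaving paired-end data amounts to alternating records from the two mate files. The sketch below shows the idea on iterables of lines. It is not the CGAT implementation; the names `read_fastq` and `interleave` are hypothetical, and each fastq record is assumed to occupy exactly four lines.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, ...]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield four-line fastq records as tuples of stripped lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated fastq record")
        yield tuple(line.rstrip("\n") for line in record)


def interleave(first: Iterable[str], second: Iterable[str]) -> Iterator[Record]:
    """Alternate records from two mate files, failing on unequal lengths."""
    for a, b in zip_longest(read_fastq(first), read_fastq(second)):
        if a is None or b is None:
            raise ValueError("mate files contain different numbers of reads")
        yield a
        yield b
```

Raising on unequal record counts rather than silently truncating is a design choice here: inconsistent mate files usually point to per-file filtering that should be reconciled first.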

Variants

<no title>
Sort a vcf file.

Genomics

<no title>
Report how many residues map to the same locations, to different locations, etc.
<no title>
Output coverage statistics for a UCSC liftover chain file.
