Tool reference

This page summarizes prominent tools within the CGAT Code collection. The tools are grouped loosely by functionality.

Genomic intervals/features

<no title>: Compute overlap statistics of multiple bed files.
<no title>: Transform interval data in a bed formatted file into a fasta formatted file of sequence data.
<no title>: Convert between interval data. Convert a bed formatted file to a gff or gtf formatted file.
<no title>: Work on gff formatted files with genomic features. This tool sorts/renames feature files, reconciles chromosome names, and more.
<no title>: Filter or merge interval data in a bed formatted file.
<no title>: Compare two sets of genomic intervals and output a list of overlapping features.
<no title>: Compute summary statistics of genomic intervals.
<no title>: Annotate genomic intervals (composition, peak location, overlap, ...).
<no title>: Decompose multiple sets of genomic intervals into various intersections and unions.
<no title>: Compare multiple sets of interval data. The tool computes all-vs-all pairwise overlap summaries and permits incremental updates of the similarity table.
<no title>: Convert between formats.
<no title>: Split a file in gff format into smaller files. The script ensures that overlapping intervals remain in the same file.
<no title>: This script computes the genomic coverage of intervals in a gff formatted file. The coverage is computed per feature.
<no title>: Output genomic sequences from intervals.
<no title>: Compute distributions of interval sizes, intersegmental distances and interval overlap from a list of intervals.
<no title>: Summarize features within a gff formatted file.
<no title>: Convert between formats.
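
Several of the tools above report which intervals in one set overlap intervals in another. The sketch below illustrates that idea only; it is not the CGAT implementation. The function name `overlapping_pairs` is hypothetical, and intervals are assumed to be `(chrom, start, end)` tuples in bed-style zero-based, half-open coordinates.

```python
from collections import defaultdict


def overlapping_pairs(set_a, set_b):
    """Return all (interval_a, interval_b) pairs that overlap.

    Assumption: intervals are (chrom, start, end) tuples with
    zero-based, half-open coordinates, as in the bed format.
    """
    # Group the second set by chromosome so that only intervals on
    # the same chromosome are compared.
    by_chrom = defaultdict(list)
    for chrom, start, end in set_b:
        by_chrom[chrom].append((start, end))

    pairs = []
    for chrom, start, end in set_a:
        for other_start, other_end in by_chrom.get(chrom, ()):
            # Two half-open intervals overlap when each one starts
            # before the other one ends.
            if start < other_end and other_start < end:
                pairs.append(
                    ((chrom, start, end), (chrom, other_start, other_end))
                )
    return pairs
```

This pairwise scan is quadratic per chromosome. For large files, a sorted sweep or an interval tree would be the usual choice.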

Gene sets

<no title>: Translate a gene set into genomic annotations such as introns, intergenic regions, regulatory domains, etc.
<no title>: Annotate transcripts in a gtf formatted file. Annotations can be in reference to a second gene set (fragments, extensions), aligned reads (coverage, intron overrun, ...) or densities.
<no title>: Annotate each base in the genome according to its use within a transcript. Outputs lists of junctions.
<no title>: Derive genomic intervals (intergenic regions, introns) from a gene set.
<no title>: Merge exons/transcripts/genes, filter transcripts/genes, rename transcripts/genes, ...
<no title>: Convert a gene set in gtf format to tabular format.
<no title>: Compare two gene sets - output common and unique lists of genes.
<no title>: Compare multiple gene sets. The tool computes all-vs-all pairwise overlap of exons, bases and genes and permits incremental updates of the similarity table.
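
As an illustration of deriving intervals such as introns from a gene set, the following sketch computes the gaps between the exons of a single transcript. It is a simplified example, not the CGAT code. The function name `introns_from_exons` is hypothetical, and exons are assumed to be `(start, end)` pairs in half-open coordinates on one strand of one chromosome.

```python
def introns_from_exons(exons):
    """Return the gaps between exons as (start, end) tuples.

    Assumption: exons are (start, end) pairs with half-open
    coordinates, all belonging to the same transcript.
    """
    introns = []
    covered_end = None  # right-most position covered by exons so far
    for start, end in sorted(exons):
        # A gap between the covered region and the next exon is an intron.
        if covered_end is not None and start > covered_end:
            introns.append((covered_end, start))
        covered_end = end if covered_end is None else max(covered_end, end)
    return introns
```

Tracking the right-most covered position means that overlapping or nested exons do not produce spurious introns.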

Sequence data

<no title>: Interleave paired reads from two fastq files into a single fasta file.
<no title>: Build an index for a fasta file. Pre-requisite for many CGAT tools.
<no title>: Count kmer content in a set of fasta sequences.
<no title>: Compute features of sequences in fasta formatted files.
<no title>: Compare two sets of sequences. Outputs missing, identical and fragmented sequences.
<no title>: Segment sequences based on G+C content, gaps, ...
<no title>: Concatenate sequences from multiple files.
<no title>: In-silico creation of variants of protein coding sequences.
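
To make the k-mer counting task above concrete, here is a minimal sketch that counts k-mers across all sequences in fasta-formatted input. It is an illustration under simple assumptions, not the CGAT tool itself. The helper names `read_fasta` and `count_kmers` are hypothetical, and the input is taken as an iterable of text lines.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_kmers(lines, k):
    """Count every overlapping k-mer, ignoring case, over all sequences."""
    counts = Counter()
    for _, sequence in read_fasta(lines):
        sequence = sequence.upper()
        for i in range(len(sequence) - k + 1):
            counts[sequence[i:i + k]] += 1
    return counts
```

For a file on disk, the open file handle can be passed directly, for example `count_kmers(open("genome.fasta"), 3)`, where the file name is a placeholder.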

NGS data

<no title>: Compute meta-gene profiles from aligned reads in a bam formatted file. Also accepts bed or bigwig formatted files.
<no title>: Operate on bam formatted files - filtering, stripping, setting flags.
<no title>: Convert bam formatted file of genomic alignments into genomic intervals. Permits merging of paired read data and filtering by insert-size.
<no title>: Save sequence and quality information from a bam formatted file.
<no title>: Compute read densities over a collection of intervals. Also accepts bed or bigwig formatted files.
<no title>: Compute summary statistics of a bam formatted file.
<no title>: Convert read coverage in a bam formatted file into a wiggle or bigwig formatted file.
<no title>: Compute stats on exon over-/underrun and spliced reads.
<no title>: Compute coverage of reads within multiple interval types.
<no title>: Outputs side-by-side comparison of residue level counts between multiple bam formatted files.
<no title>: Perform quality score conversion between fastq formatted files.
<no title>: Interleave paired end data.
<no title>: Output bases below quality threshold, number of N’s, quality score distribution.
<no title>: Ensure that paired read fastq formatted files are consistent after filtering on the individual files.
<no title>: Perform read-by-read comparison of two bam-files.
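
Interleaving paired-end data, as mentioned above, can be sketched as follows. This is an illustrative example rather than the CGAT implementation. The function names `fastq_records` and `interleave` are hypothetical, and each fastq record is assumed to span exactly four lines.

```python
from itertools import islice, zip_longest


def fastq_records(lines):
    """Yield fastq records as lists of four lines without newlines.

    Assumption: every record occupies exactly four lines.
    """
    iterator = iter(lines)
    while True:
        record = list(islice(iterator, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated fastq record")
        yield [line.rstrip("\n") for line in record]


def interleave(lines1, lines2):
    """Yield lines alternating between first and second reads of each pair."""
    pairs = zip_longest(fastq_records(lines1), fastq_records(lines2))
    for first, second in pairs:
        # zip_longest pads the shorter input with None, which reveals
        # files that have lost reads on one side only.
        if first is None or second is None:
            raise ValueError("inputs contain different numbers of reads")
        yield from first
        yield from second
```

Raising an error on unequal read counts reflects the consistency requirement described above for paired read files that were filtered individually.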

Variants

<no title>: Sort a vcf file.

Genomics

<no title>: How many residues map to the same locations, how many to different locations, etc.
<no title>: Output coverage statistics for a UCSC liftover chain file.
