File formats

yaml
Language for serializing objects, used in the CGAT testing framework (YAML).
bam
Compressed binary format for storing genomic alignments (BAM).
bed
Format for storing genomic intervals (BED).
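BED coordinates are 0-based and half-open: the start is counted from 0 and the end position is excluded, so the length of an interval is simply end minus start. The following is a minimal sketch that reads only the three mandatory columns (chromosome, start, end). The names `Interval` and `parse_bed_line` are hypothetical. Optional columns and `track`/`browser` header lines are not handled.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def parse_bed_line(line: str) -> Interval:
    """Parse the three mandatory BED columns (chrom, chromStart, chromEnd)."""
    fields = line.rstrip("\n").split("\t")
    return Interval(fields[0], int(fields[1]), int(fields[2]))


iv = parse_bed_line("chr1\t100\t200\tpeak1\n")
print(iv.end - iv.start)  # interval length: 100
```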
vcf
Variant call format.
gtf
Gene transfer format (GTF). Format to store genes and transcripts.
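GTF records have nine tab-separated columns with 1-based, inclusive coordinates. The ninth column holds `key "value";` attribute pairs such as `gene_id` and `transcript_id`. The hypothetical helper `parse_gtf_attributes` below is a simplified sketch that assumes attribute values contain no semicolons.

```python
def parse_gtf_attributes(attributes: str) -> dict:
    """Parse the ninth GTF column, e.g. 'gene_id "g1"; transcript_id "t1";'."""
    result = {}
    for item in attributes.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # trailing ';' leaves an empty item
        key, _, value = item.partition(" ")
        result[key] = value.strip().strip('"')
    return result


line = 'chr1\tsource\texon\t11\t20\t.\t+\t.\tgene_id "gene1"; transcript_id "tr1";'
fields = line.split("\t")
start, end = int(fields[3]), int(fields[4])  # 1-based, inclusive
attrs = parse_gtf_attributes(fields[8])
print(attrs["gene_id"], end - start + 1)  # gene1 10
```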
gff
General feature format.
bigwig
Compressed format for displaying numerical values across genomic ranges (BIGWIG).
fasta
Text format for nucleotide or protein sequences (FASTA).
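Each FASTA record starts with a header line beginning with `>`, followed by one or more lines of sequence. The following is a minimal reader sketch that joins the sequence lines of each record. The function name `read_fasta` is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


records = list(read_fasta([">seq1", "ACGT", "TTGA", ">seq2", "GGCC"]))
print(records)  # [('seq1', 'ACGTTTGA'), ('seq2', 'GGCC')]
```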
wiggle
Format for displaying numerical values across genomic ranges (Wiggle).
psl
Genomic alignment format (PSL).
sam
Format to store genomic alignments (SAM).
gdl
gdl
tsv
Tab-separated values. Records are separated by newline characters and fields by tab characters. Lines starting with the # character are comments and are ignored. The first uncommented line should contain the column headers. For example:

# This is a comment
gene_id	length
gene1	1000
gene2	2000
# Another comment
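The rules above (skip `#` comment lines, take the first remaining line as the header) can be sketched with the standard library. `read_tsv` is a hypothetical helper, not a CGAT function.

```python
import io
from typing import Dict, Iterable, List


def read_tsv(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Skip '#' comment lines and use the first remaining line as the header."""
    header = None
    rows = []
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return rows


text = "# This is a comment\ngene_id\tlength\ngene1\t1000\ngene2\t2000\n# Another comment\n"
rows = read_tsv(io.StringIO(text))
print(rows)  # [{'gene_id': 'gene1', 'length': '1000'}, {'gene_id': 'gene2', 'length': '2000'}]
```

With pandas, `pandas.read_csv(path, sep="\t", comment="#")` gives a similar table. Note that pandas treats `#` anywhere in a line as the start of a comment, not only at the beginning of a line.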
svg
Scalable Vector Graphics, an XML-based format for vector images (SVG).
edge list
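Graph format that lists one edge per line, given as the two nodes it connects, optionally followed by further columns such as a weight.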
fastq
Sequence format that also stores a quality score for each base (FASTQ).
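A FASTQ record usually spans four lines: an `@` header, the sequence, a `+` separator and a quality string with one character per base. The following sketch assumes exactly four lines per record and the common Phred+33 encoding (quality = ASCII code minus 33). `read_fastq` is a hypothetical name.

```python
from typing import Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (identifier, sequence, Phred scores); assumes 4-line records, Phred+33."""
    for i in range(0, len(lines) - 3, 4):
        header, sequence, separator, quality = (s.rstrip("\n") for s in lines[i:i + 4])
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        # Decode each quality character to a Phred score using offset 33.
        yield header[1:], sequence, [ord(c) - 33 for c in quality]


record = next(read_fastq(["@read1", "ACGT", "+", "II#5"]))
print(record)  # ('read1', 'ACGT', [40, 40, 2, 20])
```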
sra
Sequence Read Archive format, used by NCBI to distribute raw sequencing data (SRA).
axt
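Pairwise alignment format used by the UCSC Genome Browser, for example for whole-genome alignments (axt).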
maf
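Multiple alignment format, used for example by the UCSC Genome Browser to store alignments of several genomes (MAF).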
rdf
Resource Description Framework (RDF).

Other terms

test directory
Directory that contains the test.yaml, input and reference files for testing scripts.
experiment
experiment
replicate
replicate
graph
graph
track
track
submit host
The machine from which jobs are submitted to the compute cluster.
execution host
The cluster machine on which a job is executed.
task
pass
sphinxreport
sphinxreport
query
In an alignment, the sequence that is searched for.
target
In an alignment, the sequence or database that is searched against.
code directory
pass
go
Gene Ontology, a controlled vocabulary of terms describing gene function (GO).
goslim
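A reduced version of the Gene Ontology that contains a subset of its terms and gives a broad overview of its content (GO slim).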
tss
Transcription start site
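On the forward strand the TSS is the leftmost coordinate of a transcript, and on the reverse strand it is the rightmost one. The sketch below assumes 1-based, inclusive coordinates as in GTF. With the half-open coordinates of BED, the reverse-strand TSS would be `end - 1`. The function name is hypothetical.

```python
def transcription_start_site(start: int, end: int, strand: str) -> int:
    """Return the TSS of a transcript spanning start..end (1-based, inclusive)."""
    if strand == "+":
        return start  # transcription proceeds left to right
    if strand == "-":
        return end    # transcription proceeds right to left
    raise ValueError(f"unknown strand: {strand!r}")


print(transcription_start_site(1000, 5000, "+"))  # 1000
print(transcription_start_site(1000, 5000, "-"))  # 5000
```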
production pipeline
A pipeline that performs common tasks on a certain type of data. The idea of a production pipeline is to provide common preprocessing of data and a first look. A project pipeline might then take data from one or more production pipelines to glean biological insight.
project pipeline
A pipeline that is project specific. Usually code is developed first inside a project pipeline. When it becomes generally useful, it may be refactored into a production pipeline.
stdin
Unix standard input. Most CGAT tools read data from stdin.
stdout
Unix standard output. Most CGAT tools output data to stdout.
stderr
Unix standard error. This is where errors go.
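A command-line filter in this style reads records from stdin, writes results to stdout and reports problems on stderr, which lets tools be chained in Unix pipes. The sketch below is hypothetical and is not a CGAT script. It passes tab-separated lines through and warns about lines without a tab. Passing the streams as parameters keeps the function testable.

```python
import io
import sys
from typing import TextIO


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout,
         stderr: TextIO = sys.stderr) -> int:
    """Copy tab-separated lines from stdin to stdout; report other lines on stderr."""
    errors = 0
    for lineno, line in enumerate(stdin, start=1):
        if "\t" not in line:
            # Diagnostics go to stderr so they never mix with the data on stdout.
            stderr.write(f"line {lineno}: no tab character, skipped\n")
            errors += 1
            continue
        stdout.write(line)
    # A non-zero exit status signals that some input was rejected.
    return 1 if errors else 0


# Demonstration with in-memory streams instead of the real Unix streams.
out, err = io.StringIO(), io.StringIO()
status = main(io.StringIO("a\t1\nbad\nb\t2\n"), out, err)
print(out.getvalue(), end="")
print(err.getvalue(), end="", file=sys.stderr)
```

As a script, the file would end with `sys.exit(main())`, so that a hypothetical `filter.py` could be used as `zcat input.tsv.gz | python filter.py > output.tsv`.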
loglevel
Verbosity of logging information, set with the --verbose option. A level of 0 means no logging output, 1 outputs information messages only, and 2 also outputs debugging information.
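The following sketches one way to map these verbosity levels onto Python's `logging` module. The mapping, the default level of 1 and the logger name `example_tool` are assumptions for illustration. This is not the CGAT implementation.

```python
import argparse
import logging
from typing import List, Optional

# Assumed mapping of --verbose levels to logging levels:
# 0 -> nothing is logged, 1 -> INFO and above, 2 -> DEBUG and above.
VERBOSITY_TO_LEVEL = {0: logging.CRITICAL + 1, 1: logging.INFO, 2: logging.DEBUG}


def configure_logging(argv: Optional[List[str]] = None) -> logging.Logger:
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", type=int, default=1, choices=[0, 1, 2],
                        help="logging level (0: silent, 1: info, 2: debug)")
    args = parser.parse_args(argv)
    logger = logging.getLogger("example_tool")
    logger.setLevel(VERBOSITY_TO_LEVEL[args.verbose])
    return logger


logger = configure_logging(["--verbose", "2"])
logger.debug("enabled only at --verbose 2")
```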
