Installing CGAT pipelines¶

The CGAT pipelines, scripts and libraries make several assumptions about the computing environment. This section describes how to install the code and set up your computing environment.

Downloading and installing the source code¶

To obtain the latest code, check it out from the public mercurial repository at:

hg clone http://www.cgat.org/hg/cgat/ cgat

Once checked-out, you can get the latest changes via pulling and updating:

hg pull
hg update

Some scripts contain cython code that needs to be recompiled if the script or the pysam installation has changed. To rebuild all scripts, for example after updating the repository, type:

python cgat/scripts/cgat_rebuild_extensions.py

Recompilation requires a C compiler to be installed.

Setting up the computing environment¶

The pipelines assume that Sun Grid Engine has been installed. Other queueing systems might work, but expect to be disappointed. The pipeline is started on a submit host assuming a default queue all.q. Other queues can be specified on the command line, for example:

python cgat/CGATPipelines/pipeline_<name>.py --cluster-queue=medium_jobs.q

A pipeline might start up to -p/--multiprocess processes. Preferentially, tasks are sent to the cluster, but for some tasks this is not possible. These might thus run on the submit host, so make sure it is fairly powerful.

Pipelines expects that the working directory is accessible with the same path both from the submit and the execution host.

Software requirements¶

On top of pipeline specific bioinformatics software, CGAT pipelines make use a variety of software. Unfortunately we can’t support many versions. The following table gives a list software we have currently installed:

Section	Software	Version
apps	java	jre1.6.0_26
apps	gccxml	0.9
apps	R	2.14.1
bio	alignlib	0.4.4
apps	python	2.7.1
apps	perl	5.12.3
apps	graphlib	0.1
bio	abiwtap	1.2.1
bio	bamstats	1.22
bio	batman	0.2.3
bio	bedtools	2.13.3
bio	belvu	2.16
bio	bfast	0.6.5a
bio	bioprospector	2004
bio	bowtie	0.12.7
bio	bwa	0.5.9
bio	cdhit	4.3
bio	clustalw	2.1
bio	cufflinks	1.3.0
bio	cpc	0.9-r2
bio	dialign	2.2.1
bio	ensembl	62
bio	ensembl-variation	62
bio	exonerate	2.2.0
bio	fastqc	0.9.2
bio	fastx	0.0.13
bio	gatk	1.0.5506
bio	gblocks	0.91b
bio	gcprofile	1.0
bio	gmap	2011.03.28
bio	galaxy	dist
bio	IGV	2.0.23
bio	IGVTools	1.5.12
bio	kent	1.0
bio	hmmer	3.0
bio	leotools	0.1
bio	meme	4.7.0
bio	muscle	3.8.31
bio	mappability_map	1.0
bio	ncbiblast	2.2.25+
bio	newickutils	1.3.0
bio	novoalign	2.07.11
bio	novoalignCS	1.01.11
bio	paml	4.4c
bio	picard-tools	1.48
bio	phylip	3.69
bio	polyphen	2.0.23
bio	samtools	0.1.18
bio	shrimp	2.1.1
bio	sicer	1.1
bio	sift	4.0.3
bio	simseq	72ce499
bio	soap	2.21
bio	soapsplice	1.0
bio	sratoolkit	2.1.7
bio	SpliceMap	3.3.5.2
bio	stampy	1.0.17
bio	statgen	0.1.4
bio	storm	0.1
bio	tabix	0.2.5
bio	tophat	1.4.1
bio	treebest	0.1
bio	tv	0.5
bio	vcftools	0.1.8a
bio	emboss	6.3.1
bio	velvet	1.1.04
bio	perm	0.3.5
bio	lastz	1.02.00
bio	hpeak	2.1
bio	boost	1.46.1
bio	Trinity	2012-01-25
bio	bowtie2	2.0.0-beta5
bio	tophat2	2.0.0
bio	all	1.0

What exactly is required will depend on the particular pipeline. The pipeline assumes that the executables are in the users PATH and that the rest of the environment has been set up for each tool.

Additionally, there is a list of additional software that is required that are usually shipped as a source package with the operating system. These are:

sqlite

Python libraries¶

CGAT uses python extensively and is currently developed against python 2.7.1. Python 2.6 should work as well, but some libraries present in 2.7.1 but missing in 2.6 might need to be installed. Scripts have not yet been ported to python 3.

CGAT requires the following in-house python libraries to be installed:

Library	Version	Purpose	Download
pysam	0.6.0	python bindings for samtools	hg clone https://code.google.com/p/pysam/ pysam
alignlib	0.4.5	C++ sequence alignment library with python bindings.	wget http://downloads.sourceforge.net/project/alignlib/alignlib/alignlib-0.4.5.tar.gz
sphinxreport	latest	report generator	svn checkout https://sphinx-report.googlecode.com/svn/trunk/ sphinx-report

In addition, CGAT scripts make extensive use of the following python libraries (list below might not be complete):

Library	Version	Purpose
numpy
scipy
rpy2
matplotlib
ruffus
drmaa_python

The full list of modules installed at CGAT is:

Module	Version	Method
pycairo	01/08/06	S
pygjobject	2.20.0	S
pygtk	2.16.0	S
wxPython	2.9.1.1	S
matplotlib	1	S
numpy	01/05/01	E
scipy	0.8.0	S
rpy	1.0.3	S
rpy2	02/02/00	S
networkx	1.3	E
pytables	2.2
pygccxml	1	S
pyplusplus	1	S
bx.python
pygresql	4	E
myqsl-python	01/02/03	E
biopython	1.56	E
ply	3.3	E
psyco
pyrex	0.9.9	E
cython	0.13	E
sphinx	1.0.5	E
reportlab	2.5	E
guppy	0.1.9	E
pil	01/01/07	E
threadpool	01/02/07	E
progressbar	2.3	E
virtualenv	01/05/01	E
sqlalchemy	0.6.5	E
ruffus	2.2	E
drmaa	0.4b3	E
bx.python	12/01/10	S
corebio	0.5.0	E
weblogolib	3	E
mercurial	01/07/03	E
scikits.learn	0.7.1	E
web.py	0.34	E
pandas	0.5.0	E
pybedtools	0.6	E

Method : Installation method (E = easy_install/setuptools, S = setup.py/distutils, C = CGAT)

Installing CGAT pipelines¶

Downloading and installing the source code¶

Setting up the computing environment¶

Software requirements¶

Python libraries¶

Table Of Contents

Previous topic

Next topic

This Page

Navigation

Installing CGAT pipelines¶

Downloading and installing the source code¶

Setting up the computing environment¶

Software requirements¶

Python libraries¶

Table Of Contents

Previous topic

Next topic

This Page

Quick search

Navigation