Installation instructions

The section below describes how to install the CGAT scripts. Please note that we cannot test our code on every system and configuration. If something does not work, please try a CGAT Code Clean Installation or download a copy of the CGAT Virtual Machine, which has all the software installed.

Quick installation

Pre-install dependencies

Installing CGAT is straightforward if all its dependencies are satisfied:

pip install cgat

However, CGAT depends on numerous other Python packages, some of which might require manual intervention. Please see Manual installation for a step-by-step installation approach.

Initialization

To run pipelines and code directly from a checkout of the CGAT script repository, first run the following in the repository's top-level directory:

python setup.py develop --multi-version

This compiles all the extension modules without installing anything. To use the code, add the CGAT directory to the $PYTHONPATH environment variable:

export PYTHONPATH=$PYTHONPATH:/location/to/cgat
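To confirm that the export took effect, a small check along the following lines may help. It is only a sketch: the helper name on_pythonpath is made up for illustration, and /location/to/cgat is the placeholder path from the command above, not a real location.

```python
import os


def on_pythonpath(directory, pythonpath):
    """Return True if `directory` is one of the entries in a PYTHONPATH string."""
    target = os.path.normpath(directory)
    # Empty entries appear when the variable was previously unset.
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == target for entry in entries)


if __name__ == "__main__":
    # Placeholder path taken from the text; substitute your own checkout.
    current = os.environ.get("PYTHONPATH", "")
    print(on_pythonpath("/location/to/cgat", current))
```

Paths are compared after normalization, so a trailing slash in either place does not matter.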

You might also want to run the script:

python scripts/cgat_build_extensions.py

to test whether all the scripts with associated Cython code compile cleanly.

Manual installation

CGAT requires setuptools version 1.1 or higher. If your system has no setuptools installed, or only an older version, install setuptools first:

wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | python
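Whether the installed setuptools meets the 1.1 minimum can be checked from Python before running the bootstrap script. The following is a rough sketch rather than an official check: the helper names are illustrative, and the version comparison only looks at the leading numeric components of the version string.

```python
import re

MINIMUM = (1, 1)  # minimum setuptools version stated in the text


def parse_version(text):
    """Return up to three leading numeric components as a tuple of ints."""
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(number) for number in numbers[:3])


def is_recent_enough(version_text, minimum=MINIMUM):
    """Compare a version string against the minimum, component by component."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    try:
        import setuptools
    except ImportError:
        print("setuptools is not installed")
    else:
        status = "OK" if is_recent_enough(setuptools.__version__) else "too old"
        print("setuptools", setuptools.__version__, status)
```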

CGAT depends on numerous other Python packages, not all of which might install cleanly. Here, we give more detailed instructions. When troubleshooting a CGAT installation, we generally recommend working within a virtual environment. To create a clean environment, type:

virtualenv --no-site-packages cgat-python
source cgat-python/bin/activate

Now, download the list of required packages:

wget https://raw.github.com/CGATOxford/cgat/master/requires.txt

To install the required basic packages:

pip install -r requires.txt
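If pip stops partway through, it can be useful to see which entries of requires.txt are still missing. The sketch below assumes Python 3.8 or newer (for importlib.metadata) and assumes that requires.txt uses the usual one-requirement-per-line format; the function names are illustrative and not part of CGAT.

```python
import os
import re
from importlib import metadata  # assumption: Python 3.8 or newer


def requirement_names(lines):
    """Extract bare distribution names from requirement lines."""
    names = []
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, pip options (-r, -e, ...) and direct URLs.
        if not line or line.startswith(("#", "-")) or "://" in line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_distributions(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__" and os.path.exists("requires.txt"):
    with open("requires.txt") as handle:
        for name in missing_distributions(requirement_names(handle)):
            print("not installed:", name)
```

The check only looks at whether a distribution is installed at all; it does not verify version constraints.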

bx-python also needs to be installed. The version on PyPI is out of date, so install it directly from the repository:

pip install https://bitbucket.org/james_taylor/bx-python/get/tip.tar.bz2

If all of that works, installing the CGAT tools should now be straightforward:

pip install cgat

If you continue to have problems with the installation, please try the CGAT Code Clean Installation guide or download a copy of the CGAT Virtual Machine, which has all the software installed.

Troubleshooting

Some packages require additional system-level packages to be installed. The following dependencies might cause problems:

PyGreSQL
    requires postgres-devel.
PyGTK
    not installable via setuptools; install separately.
biopython
    pip occasionally fails for biopython. If so, try installing manually.
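A quick way to see which of these three packages is actually the problem is to try importing each one. The sketch below assumes the usual import names (pg for PyGreSQL, gtk for PyGTK, Bio for biopython); the helper name unavailable is illustrative.

```python
# Assumed import names for the packages listed above.
MODULES = {"PyGreSQL": "pg", "PyGTK": "gtk", "biopython": "Bio"}


def unavailable(modules):
    """Return the package names whose module cannot be imported."""
    missing = []
    for package, module in sorted(modules.items()):
        try:
            __import__(module)
        except ImportError:
            missing.append(package)
    return missing


if __name__ == "__main__":
    for package in unavailable(MODULES):
        print("cannot import:", package)
```

Only ImportError is caught here, so a package that is installed but fails for another reason at import time will surface its own traceback.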

CGAT Code Clean Installation

In this section you will find detailed information on how to install the CGAT Code Collection and all its dependencies inside a newly created environment.

Installation instructions for the following operating systems are available:

We also provide a CGAT Virtual Machine.

Installing in Galaxy

CGAT tools can be used through the Galaxy framework. To set up the CGAT toolbox in your own Galaxy instance, use the cgat2rdf.py script.

The sequence of commands is:

  1. Install Galaxy

  2. Install CGAT

  3. Run the cgat2rdf.py script to create an XML file for inclusion into Galaxy. For example, to create a wrapper for bam2stats.py, run the following command, where <cgat-scripts> is the location of the CGAT scripts and <cgat-xml> is the location of tool XML files within Galaxy:

    python <cgat-scripts>/cgat2rdf.py --format=galaxy <cgat-scripts>/bam2stats.py > <cgat-xml>/bam2stats.xml

  4. Add an entry to tool_conf.xml for the script within the Galaxy distribution:

    <section name="CGAT Tools" id="cgat_tools">
        <tool file="<cgat-xml>/bam2stats.xml" />
    </section>
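When many wrappers are added, editing tool_conf.xml by hand becomes tedious. The sketch below shows one possible way to add the entry programmatically with the standard library. It assumes the root element of tool_conf.xml is <toolbox>; the function name add_tool and the path cgat/bam2stats.xml are illustrative only.

```python
import xml.etree.ElementTree as ET


def add_tool(root, section_id, section_name, tool_file):
    """Add a <tool> entry to the named section, creating the section if needed."""
    section = None
    for candidate in root.findall("section"):
        if candidate.get("id") == section_id:
            section = candidate
            break
    if section is None:
        section = ET.SubElement(
            root, "section", {"name": section_name, "id": section_id}
        )
    # Avoid duplicate entries when the script is run more than once.
    if not any(tool.get("file") == tool_file for tool in section.findall("tool")):
        ET.SubElement(section, "tool", {"file": tool_file})
    return section


# An empty toolbox stands in for a real tool_conf.xml here.
root = ET.fromstring("<toolbox></toolbox>")
add_tool(root, "cgat_tools", "CGAT Tools", "cgat/bam2stats.xml")
print(ET.tostring(root, encoding="unicode"))
```

For a real file, ET.parse("tool_conf.xml") followed by tree.write(...) would replace the in-memory string, ideally after making a backup copy.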

A list of Galaxy-compatible scripts is in the file galaxy.list. This file is part of the CGAT repository and can be used to create all wrappers in one go:

cat galaxy.list | \
cgat2rdf.py \
     --source-dir=<cgat-scripts> --input-regex="(.*).py" \
     --output-pattern=<cgat-xml>/%s.xml --format=galaxy
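The same batch can also be built one script at a time by repeating the single-script command from step 3, which makes it easier to see which wrapper fails. The sketch below only prints the commands it would run. It assumes galaxy.list holds one script name per line, and the directory names cgat-scripts and cgat-xml are stand-ins for the placeholders used above.

```python
import os


def wrapper_jobs(list_lines, scripts_dir, xml_dir):
    """Return (command, output_path) pairs, one per script named in the list."""
    jobs = []
    for line in list_lines:
        name = os.path.basename(line.strip())
        if not name or name.startswith("#"):
            continue
        stem = os.path.splitext(name)[0]
        # Mirrors: python cgat2rdf.py --format=galaxy <script> > <xml>
        command = [
            "python",
            os.path.join(scripts_dir, "cgat2rdf.py"),
            "--format=galaxy",
            os.path.join(scripts_dir, name),
        ]
        jobs.append((command, os.path.join(xml_dir, stem + ".xml")))
    return jobs


if __name__ == "__main__" and os.path.exists("galaxy.list"):
    with open("galaxy.list") as handle:
        # Hypothetical directory names; replace with your own locations.
        for command, output in wrapper_jobs(handle, "cgat-scripts", "cgat-xml"):
            print(" ".join(command), ">", output)
```

To execute the jobs instead of printing them, each command could be passed to subprocess.check_call with stdout pointing at the opened output file.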

Within Galaxy, CGAT scripts use samtools-formatted genomic sequences, which are located in the sam_fa_indexes Galaxy resource.