We encourage everyone who uses parts of the CGAT code collection to contribute. Contributions can take many forms: bug reports, bug fixes, new scripts and pipelines, documentation, tests, etc. All contributions are welcome.
Before adding a new script to the repository, please check that the following are true:
Using pyximport, it is (relatively) straightforward to add optimized C code to Python scripts and, for example, access pysam internals and the underlying samtools library.
To add an extension, the following need to be in place:
The main script (scripts/bam2stats.py). The important lines in this script are:
    try:
        import pyximport
        pyximport.install()
        import _bam2stats
    except ImportError:
        import CGAT._bam2stats as _bam2stats
The snippet first attempts to build and import the extension by setting up pyximport and then importing the Cython module as _bam2stats. If this fails, as is the case for installed code, it falls back to the pre-built extension (built by setup.py) in the CGAT package.
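The fallback in the snippet above is an ordinary "try the first candidate, then the next" import. A minimal sketch of that pattern follows, assuming Python 3. The helper name import_first_available is hypothetical and not part of CGAT, and the sketch leaves out the pyximport.install() step that the real script performs before the first import. The module names in the usage lines are stand-ins so that the example runs anywhere.

```python
import importlib


def import_first_available(*module_names):
    """Return the first module in module_names that imports successfully.

    Hypothetical helper, not part of CGAT: it generalizes the
    try/except ImportError fallback to any number of candidates.
    """
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as error:
            # Remember the failure and move on to the next candidate.
            last_error = error
    raise ImportError(
        "none of %r could be imported" % (module_names,)) from last_error


# Stand-in names: the first module does not exist, so the second
# (a standard-library module) is returned instead.
module = import_first_available("_missing_extension_example", "json")
print(module.__name__)  # prints "json"
```

In bam2stats.py the two candidates would be the locally built _bam2stats and the packaged CGAT._bam2stats, tried in that order.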
The Cython implementation (_bam2stats.pyx). This file imports the pysam API via:
from csamtools cimport *
This statement imports, amongst others, AlignedRead into the namespace. Speed can be gained by declaring variable types. For example, to iterate efficiently over a file, the loop variable is declared as an AlignedRead:
    # loop over samfile
    cdef AlignedRead read
    for read in samfile:
        ...
A pyxbld file providing pyximport with build information. It must supply the locations of the samtools and pysam header files from a source installation of pysam, plus the csamtools.so shared library. For example:
    def make_ext(modname, pyxfilename):
        from distutils.extension import Extension
        import pysam, os
        dirname = os.path.dirname(pysam.__file__)[:-len("pysam")]
        return Extension(name=modname,
                         sources=[pyxfilename],
                         extra_link_args=[os.path.join(dirname,
                                                       "csamtools.so")],
                         include_dirs=pysam.get_include(),
                         define_macros=pysam.get_defines())
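The path arithmetic in make_ext can be hard to read at first glance: it takes the directory of the pysam package, strips the trailing "pysam", and looks for csamtools.so in the parent directory. The sketch below reproduces only that step on a made-up path. The function name shared_library_path and the install location are hypothetical, and posixpath is used so the result is the same on every platform. Where csamtools.so actually lives depends on the pysam version and how it was installed.

```python
import posixpath


def shared_library_path(pysam_init_file, library="csamtools.so"):
    """Mirror the path arithmetic of make_ext for a given pysam/__init__.py."""
    package_dir = posixpath.dirname(pysam_init_file)  # .../site-packages/pysam
    parent_dir = package_dir[:-len("pysam")]          # .../site-packages/
    return posixpath.join(parent_dir, library)


# Hypothetical install location, used only for illustration.
example = "/opt/python/site-packages/pysam/__init__.py"
print(shared_library_path(example))
# prints "/opt/python/site-packages/csamtools.so"
```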
When the script bam2stats.py is run for the first time, pyximport compiles the Cython extension _bam2stats.pyx and makes it available to the script. Compilation requires a working C compiler and a Cython installation. Each time _bam2stats.pyx is modified, it is recompiled on the next run.
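The "recompile when modified" behaviour can be pictured as a timestamp comparison between the source file and the built extension. The following is only an illustrative sketch of such a check, not pyximport's actual implementation, whose logic may differ. The function needs_rebuild and the file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(source_path, built_path):
    """Return True if built_path is missing or older than source_path."""
    if not os.path.exists(built_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(built_path)


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "_example.pyx")
    built = os.path.join(workdir, "_example.so")

    open(source, "w").close()
    print(needs_rebuild(source, built))  # True: nothing has been built yet

    open(built, "w").close()
    os.utime(source, (1000, 1000))       # set explicit timestamps so the
    os.utime(built, (2000, 2000))        # comparison is deterministic
    print(needs_rebuild(source, built))  # False: built file is newer

    os.utime(source, (3000, 3000))       # simulate editing the source
    print(needs_rebuild(source, built))  # True: source changed after build
```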