GenomicIO.py - Subroutines for working on I/O of large genomic files

Author:
Release: $Id$
Date: December 09, 2013
Tags: Python

I tried the Biopython parser, but it was too slow for large genomic chunks.

GenomicIO.index_file(filenames, db_name)

Index one or more files.

Two new files are created: db_name.fasta and db_name.idx.
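
The on-disk layout of the two files is not described here. As a rough illustration only, the sketch below shows one way an index of this kind could be built: every record is written to db_name.fasta on a single line, and db_name.idx stores the name, the offset of the first base and the sequence length. The function names and the tab-separated index layout are assumptions made for this example, not the module's actual format.

```python
def _write_record(seq_out, index, name, parts, offset):
    """Write one record on a single line and remember where its bases start."""
    sequence = "".join(parts)
    header = ">%s\n" % name
    seq_out.write(header + sequence + "\n")
    index[name] = (offset + len(header), len(sequence))
    # Advance past header, sequence and the trailing newline.
    return offset + len(header) + len(sequence) + 1


def index_fasta_sketch(filenames, db_name):
    """Hypothetical indexer: returns {name: (offset of first base, length)}."""
    index = {}
    offset = 0
    # newline="\n" keeps character offsets equal to byte offsets for ASCII data.
    with open(db_name + ".fasta", "w", newline="\n") as seq_out:
        for filename in filenames:
            name, parts = None, []
            with open(filename) as infile:
                for line in infile:
                    line = line.strip()
                    if line.startswith(">"):
                        if name is not None:
                            offset = _write_record(seq_out, index, name, parts, offset)
                        fields = line[1:].split()
                        # Use the first word of the header as the identifier.
                        name, parts = (fields[0] if fields else "unnamed"), []
                    elif line and name is not None:
                        parts.append(line)
            if name is not None:
                offset = _write_record(seq_out, index, name, parts, offset)
    # Assumed index layout: name, offset, length (tab-separated).
    with open(db_name + ".idx", "w") as idx_out:
        for name, (start, length) in index.items():
            idx_out.write("%s\t%i\t%i\n" % (name, start, length))
    return index
```

With an index like this, a fragment can be read by seeking to offset + start and reading end - start bytes, without parsing the whole file.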

GenomicIO.index_exists(filename)

Check whether a file has been indexed.

GenomicIO.getSequence(db_name, sbjct_token, sbjct_strand, sbjct_from, sbjct_to, as_array=False, forward_coordinates=False)

Get a genomic fragment.
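
How the strand and the forward_coordinates flag interact is not spelled out here. The sketch below shows one plausible reading on an in-memory string: coordinates are 0-based and closed-open, the strand is given as "+" or "-", and minus-strand coordinates are counted from the end of the contig unless forward_coordinates is set. The function name, the strand encoding and this interpretation are assumptions for illustration, not the module's confirmed behaviour.

```python
# Translation table for complementing DNA, including N and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def get_fragment_sketch(contig, strand, start, end, forward_coordinates=False):
    """Hypothetical fragment lookup on a sequence held in memory."""
    positive = strand == "+"
    if not positive and not forward_coordinates:
        # Map reverse-strand coordinates onto the forward strand.
        start, end = len(contig) - end, len(contig) - start
    fragment = contig[start:end]
    if not positive:
        # Report minus-strand fragments as the reverse complement.
        fragment = fragment.translate(_COMPLEMENT)[::-1]
    return fragment


contig = "AAACCCGG"
plus = get_fragment_sketch(contig, "+", 0, 3)
minus = get_fragment_sketch(contig, "-", 0, 3)
minus_fwd = get_fragment_sketch(contig, "-", 0, 3, forward_coordinates=True)
```

Here plus covers the first three bases of the contig, minus covers the first three bases of its reverse complement, and minus_fwd is the reverse complement of the bases covered by plus.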

GenomicIO.splitFasta(infile, chunk_size, dir='/tmp', pattern=None)

Split a FASTA file into a set of smaller files.

If pattern is not given, random file names are chosen.
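
The unit of chunk_size and the form of pattern are not stated here. The following sketch assumes that chunk_size counts records per output file and that pattern is a %-style template receiving the chunk number; both are assumptions, as is the function name. Without a pattern it falls back to tempfile.mkstemp for random file names.

```python
import os
import tempfile


def split_fasta_sketch(infile, chunk_size, dir="/tmp", pattern=None):
    """Hypothetical splitter: at most chunk_size records per output file.

    infile is an open file-like object; returns the list of files written.
    """
    filenames = []
    out = None
    nrecords = 0
    for line in infile:
        if line.startswith(">"):
            # Start a new output file every chunk_size records.
            if nrecords % chunk_size == 0:
                if out is not None:
                    out.close()
                if pattern:
                    path = os.path.join(dir, pattern % len(filenames))
                    out = open(path, "w")
                else:
                    # No pattern given: let the system pick a random name.
                    fd, path = tempfile.mkstemp(suffix=".fasta", dir=dir)
                    out = os.fdopen(fd, "w")
                filenames.append(path)
            nrecords += 1
        # Lines before the first header are skipped.
        if out is not None:
            out.write(line)
    if out is not None:
        out.close()
    return filenames
```

A call such as split_fasta_sketch(open("genome.fasta"), 100, pattern="chunk_%i.fasta") would then write chunk_0.fasta, chunk_1.fasta and so on into /tmp; the input file name is only a placeholder.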

GenomicIO.getConverter(format)

Return a converter function that converts various coordinate schemes into 0-based, both-strand, closed-open ranges.

Converter functions take the parameters x, y, s and l: x and y are the coordinates of a sequence fragment, s is the strand (True is positive) and l is the length of the contig.
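
As an illustration of the (x, y, s, l) signature, the sketch below converts 1-based, forward-strand, closed coordinates into the target scheme. It assumes that "both strand" means minus-strand coordinates are counted from the end of the contig; the function names are hypothetical and not taken from the module.

```python
def one_forward_closed_sketch(x, y, s, l):
    """Hypothetical converter for 1-based, forward-strand, closed input.

    x, y: fragment coordinates; s: True for the positive strand;
    l: length of the contig. Returns 0-based, closed-open coordinates.
    """
    # A 1-based closed start becomes 0-based by subtracting 1;
    # the 1-based closed end equals the 0-based open end.
    x -= 1
    if not s:
        # Re-express the interval relative to the reverse strand.
        x, y = l - y, l - x
    return x, y


def zero_both_open_sketch(x, y, s, l):
    """Input already in the target scheme: nothing to convert."""
    return x, y


forward = one_forward_closed_sketch(1, 10, True, 100)
reverse = one_forward_closed_sketch(1, 10, False, 100)
```

On a contig of length 100, the first ten bases on the positive strand map to (0, 10), while the same forward-strand interval on the negative strand maps to (90, 100).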
