SaryFasta.py - index fasta files by suffix array

Subroutines for working on I/O of large genomic files.

Index a fasta file to retrieve sequences by suffix-array fragment search.

python SaryFasta.py [options] name [ files ]
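To illustrate what a suffix-array fragment search involves, here is a minimal sketch in plain Python. It is not the code used by SaryFasta.py or by sary; the function names build_suffix_array and find_fragment are hypothetical, and the naive construction is only suitable for short demonstration strings.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order."""
    # Naive construction: fine for a demonstration, far too slow for real genomes.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_fragment(text, suffix_array, fragment):
    """Return the sorted start positions at which fragment occurs in text."""
    n = len(fragment)
    # Binary search for the first suffix whose n-character prefix is >= fragment.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + n] < fragment:
            lo = mid + 1
        else:
            hi = mid
    # All matching suffixes sit next to each other in the suffix array.
    hits = []
    while lo < len(suffix_array):
        start = suffix_array[lo]
        if text[start:start + n] != fragment:
            break
        hits.append(start)
        lo += 1
    return sorted(hits)


genome = "ACGTACGTTAGC"
sa = build_suffix_array(genome)
positions = find_fragment(genome, sa, "ACGT")
```

Because the suffixes are sorted, every occurrence of a fragment is found with one binary search followed by a short scan, which is what makes this kind of index attractive for large sequence files.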

SaryFasta.getHID(sequence)

Returns a hash identifier for a sequence.
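The hashing scheme behind getHID is not described here. As a rough sketch of what a hash identifier for a sequence can look like, the following assumes an MD5 digest encoded as URL-safe base64; both choices and the name hash_identifier are assumptions for illustration only.

```python
import base64
import hashlib


def hash_identifier(sequence):
    """Return a short, printable identifier derived from a sequence (hypothetical)."""
    # Assumption: MD5 is used only as a fingerprint, not for security.
    digest = hashlib.md5(sequence.encode("ascii")).digest()
    # URL-safe base64 avoids '/' and '+'; the '=' padding is stripped.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


hid = hash_identifier("ACGTACGT")
```

Identical sequences always map to the same identifier, so such a value can be used to detect duplicate entries without comparing full sequences.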

SaryFasta.createDatabase(db, filenames, buf_size=400000000, force=False, regex_identifier=None)

Index the files in filenames to create the database db.

Two new files are created: db.fasta and db_name.idx.

buf_size: buffer size for a sary chunk.

regex_identifier: pattern to extract the identifier from the description line. If None, the part up to the first white-space character is used.
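The identifier rule described for regex_identifier can be sketched as follows. The helper extract_identifier is hypothetical and not part of SaryFasta; it also assumes that the pattern carries the identifier in its first capture group, which the documentation above does not state.

```python
import re


def extract_identifier(description, regex_identifier=None):
    """Extract an identifier from a FASTA description line (hypothetical helper)."""
    line = description.lstrip(">").strip()
    if not line:
        raise ValueError("empty description line")
    if regex_identifier is None:
        # Default rule: everything up to the first white-space character.
        return line.split(None, 1)[0]
    match = re.search(regex_identifier, line)
    if match is None:
        raise ValueError("pattern does not match: %r" % line)
    # Assumption: the identifier is the first capture group of the pattern.
    return match.group(1)


default_id = extract_identifier(">chr1 Homo sapiens chromosome 1")
custom_id = extract_identifier(">gi|12345|ref|NM_000001.1| some gene",
                               r"ref\|([^|]+)\|")
```

With the default rule the first example yields chr1; the custom pattern in the second example picks the accession between the ref| and | delimiters.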

SaryFasta.benchmarkRandomFragment(fasta, size)

Returns a random fragment of size.
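How the fragment is drawn is not specified here. A minimal sketch of drawing a random fragment of a given size, assuming the sequences are held in a plain dictionary rather than in an indexed database, might look like this; random_fragment and its return value are hypothetical.

```python
import random


def random_fragment(sequences, size, rng=random):
    """Return (identifier, start, fragment) for a random fragment of length size."""
    # Only sequences that are long enough can supply a fragment of this size.
    candidates = sorted(name for name, seq in sequences.items() if len(seq) >= size)
    if not candidates:
        raise ValueError("no sequence is at least %d residues long" % size)
    name = rng.choice(candidates)
    start = rng.randrange(len(sequences[name]) - size + 1)
    return name, start, sequences[name][start:start + size]


sequences = {"chr1": "ACGTACGTTAGC", "chr2": "TTGA"}
name, start, fragment = random_fragment(sequences, 6, rng=random.Random(0))
```

Passing a seeded random.Random instance makes the draw reproducible, which is convenient for benchmarking.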

SaryFasta.verify(reference, fasta, num_iterations, fragment_size, stdout=sys.stdout, quiet=False)

Verify two databases against each other: get a segment from one database and check for its presence in the other.
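The verification idea can be sketched without an index as below. This is an illustration under assumptions, not the module's implementation: verify_fragments is a hypothetical name, both databases are plain dictionaries, fragments are sampled from fasta and looked up in reference (the direction is assumed), and a plain substring test stands in for the suffix-array lookup.

```python
import random


def verify_fragments(reference, fasta, num_iterations, fragment_size, seed=None):
    """Return how many fragments sampled from fasta are missing from reference."""
    rng = random.Random(seed)
    names = sorted(n for n, s in fasta.items() if len(s) >= fragment_size)
    if not names:
        raise ValueError("no sequence is long enough to sample from")
    missing = 0
    for _ in range(num_iterations):
        name = rng.choice(names)
        seq = fasta[name]
        start = rng.randrange(len(seq) - fragment_size + 1)
        fragment = seq[start:start + fragment_size]
        # Substring search stands in for the suffix-array lookup here.
        if not any(fragment in ref_seq for ref_seq in reference.values()):
            missing += 1
    return missing


db_a = {"chr1": "ACGTACGTTAGC"}
db_b = {"other": "GGGGGGGGGGGG"}
same = verify_fragments(db_a, db_a, num_iterations=20, fragment_size=5, seed=1)
different = verify_fragments(db_a, db_b, num_iterations=20, fragment_size=5, seed=1)
```

A count of zero over many iterations suggests that the two databases hold the same sequence content; any non-zero count points to a discrepancy worth inspecting.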
