database
The “database” object handles the indexes of the sequence dataset in FASTA format, along with other useful information about the input dataset.
MacSyFinder needs the length of each sequence and its position in the database to compute some statistics on Hmmer hits. Additionally, for ordered datasets (db_type = ‘gembase’ or ‘ordered_replicon’), MacSyFinder builds an internal “database” from these indexes to store information about replicons: their begin and end positions, and their topology.
The begin and end positions of each replicon are computed from the sequence file, and the topology is parsed from the topology file (--topology-file, see Topology files).
MacSyFinder therefore builds an index file (with the .idx suffix) and stores it in the same directory as the sequence dataset. If this file is already present in the same folder as the input dataset, MacSyFinder uses it. Otherwise, it builds it.
The user can force MacSyFinder to rebuild these indexes with the “--idx” option on the command line.
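The reuse-or-rebuild decision described above can be sketched as follows. This is a minimal illustration, not the macsypy code: the helper names `index_path_for` and `need_to_build` are hypothetical, and the assumption that the index is named after the dataset file with `.idx` appended is ours.

```python
import os


def index_path_for(sequence_db):
    """Return the index path expected next to the sequence dataset.

    Assumption: the index is the dataset file name with '.idx' appended.
    """
    return sequence_db + ".idx"


def need_to_build(sequence_db, force=False):
    """Return True if the index must be (re)built.

    force mimics the effect of the --idx command-line option.
    """
    return force or not os.path.exists(index_path_for(sequence_db))
```

With such a helper, an existing index is reused unless `force=True` is passed, which mirrors what the “--idx” option is documented to do.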
database API reference
Indexes
- class macsypy.database.Indexes(cfg)[source]
Handle the indexes for macsyfinder:
find the indexes required by macsyfinder to compute some scores, or build them.
- __init__(cfg)[source]
The constructor retrieves the file of indexes. If the indexes are not present, or if the user asked to rebuild them (--idx), it launches the index building.
- Parameters
cfg (macsypy.config.Config object) – the configuration
- __iter__()[source]
- Raises
MacsypyError – if the indexes are not built
- Returns
an iterator over the indexes
The indexes must be built before they can be iterated over.
- __weakref__
list of weak references to the object (if defined)
- _build_my_indexes(index_dir)[source]
Build macsyfinder indexes. These indexes are stored in a file.
- The file format is the following:
the first line is the path of the sequence-db indexed
one entry per line, with each line having this format:
sequence id;sequence length;sequence rank
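The file format described above can be illustrated with a small writer and reader. This is a sketch under stated assumptions, not the macsypy implementation: `write_index` and `read_index` are hypothetical names, and the sample identifiers and path are invented for the example.

```python
from io import StringIO


def write_index(handle, sequence_db, entries):
    """Write the dataset path, then one 'id;length;rank' line per sequence."""
    handle.write(sequence_db + "\n")
    for seq_id, length, rank in entries:
        handle.write(f"{seq_id};{length};{rank}\n")


def read_index(handle):
    """Parse an index and return (sequence_db_path, [(id, length, rank), ...])."""
    sequence_db = next(handle).rstrip("\n")
    entries = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        # Split from the right so that only the last two ';' act as separators.
        seq_id, length, rank = line.rsplit(";", 2)
        entries.append((seq_id, int(length), int(rank)))
    return sequence_db, entries


# Round trip through an in-memory file (sample data is made up).
buffer = StringIO()
write_index(buffer, "/data/genomes.fasta", [("SEQ_001", 120, 1), ("SEQ_002", 250, 2)])
buffer.seek(0)
path, entries = read_index(buffer)
```

Splitting from the right is a defensive choice for the sketch: length and rank are always the last two fields, so the parse does not depend on what the sequence id contains.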
- _index_dir(build=False)[source]
Search for the directory where the indexes are read or, if build=True, stored.
- Parameters
build (bool) – if True, check that the index directory is writable
- Returns
The directory where the indexes are read or written
- Return type
str
- Raises
ValueError – if the directory specified by the --index-dir option does not exist, or if build = True and the index directory is not writable
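The checks documented for `_index_dir` can be approximated as below. This is only a sketch: `resolve_index_dir` is a hypothetical function, and falling back to the directory of the sequence dataset when no index directory is given is an assumption based on the introduction of this page.

```python
import os


def resolve_index_dir(sequence_db, index_dir=None, build=False):
    """Return the directory where indexes are read or written.

    Hypothetical stand-in for the documented behaviour of _index_dir.
    """
    if index_dir is None:
        # Assumption: default to the directory holding the sequence dataset.
        index_dir = os.path.dirname(os.path.abspath(sequence_db))
    elif not os.path.isdir(index_dir):
        raise ValueError(f"index directory '{index_dir}' does not exist")
    if build and not os.access(index_dir, os.W_OK):
        # Writing is only needed when the indexes have to be built.
        raise ValueError(f"index directory '{index_dir}' is not writable")
    return index_dir
```

The write permission is checked only when `build` is true, since reading an existing index does not require it.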
RepliconInfo
Module to handle sequences and their indexes
- class macsypy.database.RepliconInfo(topology, min, max, genes)
handle information about a replicon
- topology
The type of replicon topology: ‘linear’ or ‘circular’
- min
The position of the first gene of the replicon in the sequence dataset.
- max
The position of the last gene of the replicon in the sequence dataset.
- genes
A list of genes belonging to the replicon. Each gene is represented by a tuple (str seq_id, int length)
- genes
Alias for field number 3
- max
Alias for field number 2
- min
Alias for field number 1
- topology
Alias for field number 0
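Since RepliconInfo is a namedtuple, its fields can be read either by name or by position. The sketch below redefines an equivalent namedtuple locally so that it runs without macsypy installed; the replicon content is invented for the example.

```python
from collections import namedtuple

# Local equivalent of the documented fields, in the documented order.
RepliconInfo = namedtuple("RepliconInfo", ["topology", "min", "max", "genes"])

info = RepliconInfo(
    topology="circular",
    min=1,   # position of the first gene of the replicon
    max=3,   # position of the last gene of the replicon
    genes=[("GENE_001", 120), ("GENE_002", 250), ("GENE_003", 98)],
)

# Access by name and by field number is interchangeable.
same_topology = info.topology == info[0]
n_genes = len(info.genes)
```

Access by name is usually preferable, as it does not depend on the field order.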
RepliconDB
- class macsypy.database.RepliconDB(cfg)[source]
Stores information (topology, min, max, [genes]) for all replicons in the sequence_db. This class must be instantiated only for a sequence_db of type ‘gembase’ or ‘ordered_replicon’.
- __contains__(replicon_name)[source]
- Parameters
replicon_name (string) – the name of the replicon
- Returns
True if replicon_name is in the RepliconDB, False otherwise.
- Return type
boolean
- __getitem__(replicon_name)[source]
- Parameters
replicon_name (string) – the name of the replicon to get information on
- Returns
the RepliconInfo for the provided replicon_name
- Return type
RepliconInfo object
- Raises
KeyError – if replicon_name is not in the RepliconDB
- __init__(cfg)[source]
- Parameters
cfg (macsypy.config.Config object) – The configuration object
Note
This class can be instantiated only if the db_type is ‘gembase’ or ‘ordered_replicon’
- __weakref__
list of weak references to the object (if defined)
- _fill_gembase_min_max(topology, default_topology)[source]
For each replicon_name of a gembase dataset, it fills the internal dictionary with a namedtuple RepliconInfo
- Parameters
topology (dict) – the topologies for each replicon (parsed from the file specified with the option –topology-file)
default_topology (string) – the topology provided by the config.replicon_topology
- _fill_ordered_min_max(default_topology=None)[source]
For the replicon_name of the ordered_replicon sequence base, fill the internal dict with RepliconInfo
- Parameters
default_topology (string) – the topology provided by config.replicon_topology
- _fill_topology()[source]
Fill the internal dictionary with min and max positions for each replicon_name of the sequence_db
- get(replicon_name, default=None)[source]
- Parameters
replicon_name (string) – the name of the replicon to get information on
default (any) – the value to return if the replicon_name is not in the RepliconDB
- Returns
the RepliconInfo for replicon_name if replicon_name is in the repliconDB, else default. If default is not given, it is set to None, so that this method never raises a KeyError.
- Return type
RepliconInfo object
- iteritems()[source]
- Returns
an iterator over the (replicon_name, RepliconInfo) pairs of the RepliconDB
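The dictionary-like interface documented for RepliconDB (`__contains__`, `__getitem__`, `get`, `iteritems`) can be illustrated with a toy stand-in. `ToyRepliconDB` is hypothetical and is built from a plain dict rather than from a configuration object; it only shows how the documented methods are expected to behave, and the replicon data is invented.

```python
from collections import namedtuple

RepliconInfo = namedtuple("RepliconInfo", ["topology", "min", "max", "genes"])


class ToyRepliconDB:
    """Illustrative stand-in, not the macsypy implementation."""

    def __init__(self, replicons):
        self._db = dict(replicons)

    def __contains__(self, replicon_name):
        return replicon_name in self._db

    def __getitem__(self, replicon_name):
        # Raises KeyError if replicon_name is unknown.
        return self._db[replicon_name]

    def get(self, replicon_name, default=None):
        # Never raises: falls back to default (None unless given).
        return self._db.get(replicon_name, default)

    def iteritems(self):
        return iter(self._db.items())


db = ToyRepliconDB({
    "REPLICON_A": RepliconInfo(
        "circular", 1, 2, [("REPLICON_A_001", 120), ("REPLICON_A_002", 250)]
    ),
})

known = "REPLICON_A" in db
missing = db.get("REPLICON_B")
names = [name for name, info in db.iteritems()]
```

Use `get` when a missing replicon is an expected case, and the bracket syntax when a missing replicon should be treated as an error.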