vg
tools for working with variation graphs
#include <gapless_extender.hpp>
Public Types | |
typedef GaplessExtension::seed_type | seed_type |
typedef pair_hash_set< seed_type > | cluster_type |
Public Member Functions | |
GaplessExtender () | |
Create an empty GaplessExtender. | |
GaplessExtender (const gbwtgraph::GBWTGraph &graph, const Aligner &aligner) | |
Create a GaplessExtender using the given GBWTGraph and Aligner objects. | |
std::vector< GaplessExtension > | extend (cluster_type &cluster, const std::string &sequence, const gbwtgraph::CachedGBWTGraph *cache=nullptr, size_t max_mismatches=MAX_MISMATCHES, double overlap_threshold=OVERLAP_THRESHOLD) const |
void | unfold_haplotypes (const std::unordered_set< nid_t > &subgraph, std::vector< std::vector< handle_t >> &haplotype_paths, bdsg::HashGraph &unfolded, const gbwtgraph::CachedGBWTGraph *cache=nullptr) const |
void | transform_alignment (Alignment &aln, const std::vector< std::vector< handle_t >> &haplotype_paths) const |
Static Public Member Functions | |
static seed_type | to_seed (pos_t pos, size_t read_offset) |
Convert (graph position, read offset) to a seed. | |
static pos_t | get_pos (seed_type seed) |
Get the graph position from a seed. | |
static handle_t | get_handle (seed_type seed) |
Get the handle from a seed. | |
static size_t | get_node_offset (seed_type seed) |
Get the node offset from a seed. | |
static size_t | get_read_offset (seed_type seed) |
Get the read offset from a seed. | |
Public Attributes | |
const gbwtgraph::GBWTGraph * | graph |
const Aligner * | aligner |
Static Public Attributes | |
constexpr static size_t | MAX_MISMATCHES = 4 |
The default value for the maximum number of mismatches. | |
constexpr static double | OVERLAP_THRESHOLD = 0.8 |
A class that supports haplotype-consistent seed extension using GBWTGraph. Each seed is a pair of matching read/graph positions and each extension is a gapless alignment of an interval of the read to a haplotype. A cluster is an unordered set of distinct seeds. Seeds in the same node with the same (read_offset - node_offset) difference are considered equivalent. All seeds in a cluster should correspond to the same alignment or positions near it. GaplessExtender also needs an Aligner object for scoring the extension candidates.
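The equivalence rule for seeds can be illustrated with a minimal, self-contained sketch. `ToySeed`, `diagonal`, `equivalent`, and `distinct_seeds` are hypothetical names introduced here for illustration only; they are not part of vg and do not reflect how `seed_type` is actually stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a seed: a node identifier plus matching offsets.
// This is NOT vg's seed_type; it only illustrates the equivalence rule.
struct ToySeed {
    std::uint64_t node;       // stand-in for a graph handle
    std::size_t node_offset;  // offset within the node
    std::size_t read_offset;  // offset within the read
};

// The (read_offset - node_offset) difference as a signed value.
inline std::int64_t diagonal(const ToySeed& seed) {
    return static_cast<std::int64_t>(seed.read_offset) -
           static_cast<std::int64_t>(seed.node_offset);
}

// Two seeds are equivalent if they are in the same node and share a diagonal.
inline bool equivalent(const ToySeed& a, const ToySeed& b) {
    return a.node == b.node && diagonal(a) == diagonal(b);
}

// Keep the first seed seen from each (node, diagonal) equivalence class.
inline std::vector<ToySeed> distinct_seeds(const std::vector<ToySeed>& seeds) {
    std::set<std::pair<std::uint64_t, std::int64_t>> seen;
    std::vector<ToySeed> result;
    for (const ToySeed& seed : seeds) {
        if (seen.insert({seed.node, diagonal(seed)}).second) {
            result.push_back(seed);
        }
    }
    return result;
}
```

Under this assumption, seeds at (node offset 5, read offset 7) and (node offset 0, read offset 2) in the same node lie on the same diagonal, so a gapless extension from either one covers the other.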
vg::GaplessExtender::GaplessExtender ()
Create an empty GaplessExtender.
explicit vg::GaplessExtender::GaplessExtender (const gbwtgraph::GBWTGraph &graph, const Aligner &aligner)
Create a GaplessExtender using the given GBWTGraph and Aligner objects.
std::vector< GaplessExtension > vg::GaplessExtender::extend (cluster_type &cluster, const std::string &sequence, const gbwtgraph::CachedGBWTGraph *cache=nullptr, size_t max_mismatches=MAX_MISMATCHES, double overlap_threshold=OVERLAP_THRESHOLD) const
Find the highest-scoring extension for each seed in the cluster. If there is a full-length extension with at most max_mismatches mismatches, return the (up to two) best full-length extensions with less than overlap_threshold overlap, sorted by score in descending order. If that is not possible, trim the extensions to maximize score, sort them by read interval, and remove duplicates. Allow any number of mismatches in the initial node, at least max_mismatches mismatches in the entire extension, and at least max_mismatches / 2 mismatches on each flank. Use the provided CachedGBWTGraph or allocate a new one.
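The selection of full-length extensions described above can be sketched as follows. This is a simplified illustration under stated assumptions, not vg's implementation: `ToyExtension`, `overlap_fraction`, and `best_full_length` are hypothetical names, each read base is assumed to map to a single integer graph position, and a second extension is accepted when its overlap with the best one is at most the threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical candidate extension; NOT vg's GaplessExtension.
struct ToyExtension {
    int score;
    std::size_t mismatches;
    bool full_length;
    std::vector<std::uint64_t> positions;  // assumed: one graph position per read base
};

// Fraction of read offsets at which both extensions use the same graph position.
inline double overlap_fraction(const ToyExtension& a, const ToyExtension& b) {
    assert(a.positions.size() == b.positions.size());
    if (a.positions.empty()) { return 0.0; }
    std::size_t shared = 0;
    for (std::size_t i = 0; i < a.positions.size(); i++) {
        if (a.positions[i] == b.positions[i]) { shared++; }
    }
    return static_cast<double>(shared) / static_cast<double>(a.positions.size());
}

// Return up to two full-length extensions within the mismatch bound,
// best score first, where the second is sufficiently distinct from the first.
inline std::vector<ToyExtension> best_full_length(const std::vector<ToyExtension>& candidates,
                                                  std::size_t max_mismatches,
                                                  double overlap_threshold) {
    std::vector<ToyExtension> eligible;
    for (const ToyExtension& candidate : candidates) {
        if (candidate.full_length && candidate.mismatches <= max_mismatches) {
            eligible.push_back(candidate);
        }
    }
    // Sort by score in descending order; ties keep their input order.
    std::stable_sort(eligible.begin(), eligible.end(),
                     [](const ToyExtension& a, const ToyExtension& b) {
                         return a.score > b.score;
                     });
    std::vector<ToyExtension> result;
    for (const ToyExtension& candidate : eligible) {
        if (result.empty() || overlap_fraction(result.front(), candidate) <= overlap_threshold) {
            result.push_back(candidate);
        }
        if (result.size() == 2) { break; }
    }
    return result;
}
```

An empty result from this sketch corresponds to the fallback case in the description, where the extensions are trimmed instead; that fallback is not modeled here.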
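inline static handle_t vg::GaplessExtender::get_handle (seed_type seed)
Get the handle from a seed.
inline static size_t vg::GaplessExtender::get_node_offset (seed_type seed)
Get the node offset from a seed.
inline static pos_t vg::GaplessExtender::get_pos (seed_type seed)
Get the graph position from a seed.
inline static size_t vg::GaplessExtender::get_read_offset (seed_type seed)
Get the read offset from a seed.
inline static seed_type vg::GaplessExtender::to_seed (pos_t pos, size_t read_offset)
Convert (graph position, read offset) to a seed.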
void vg::GaplessExtender::transform_alignment (Alignment &aln, const std::vector< std::vector< handle_t >> &haplotype_paths) const
Transform an alignment to a single node of the graph produced by unfold_haplotypes() into an alignment to the corresponding path in the original graph.
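The core of such a transformation is splitting an interval of the concatenated path sequence into per-node pieces. The sketch below shows only that coordinate arithmetic; `Segment` and `split_along_path` are hypothetical names, node lengths are passed in directly rather than read from a graph, and edits within the alignment are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical piece of an alignment that falls within one node of the path.
struct Segment {
    std::size_t step;    // index of the node along the path
    std::size_t offset;  // start offset within that node
    std::size_t length;  // number of bases covered in that node
};

// Split the interval [start, start + length) of the concatenated path
// sequence into per-node segments. The interval must fit within the path.
inline std::vector<Segment> split_along_path(const std::vector<std::size_t>& node_lengths,
                                             std::size_t start, std::size_t length) {
    std::vector<Segment> result;
    std::size_t node_start = 0;  // offset of the current node in the concatenation
    for (std::size_t step = 0; step < node_lengths.size() && length > 0; step++) {
        std::size_t node_end = node_start + node_lengths[step];
        if (start < node_end) {
            std::size_t offset = start - node_start;
            std::size_t take = std::min(length, node_end - start);
            result.push_back({step, offset, take});
            start += take;
            length -= take;
        }
        node_start = node_end;
    }
    assert(length == 0);  // the interval ran past the end of the path otherwise
    return result;
}
```

For a path with node lengths 4, 3, and 5, an interval starting at offset 2 with length 7 splits into two bases of the first node, all three bases of the second, and the first two bases of the third.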
void vg::GaplessExtender::unfold_haplotypes (const std::unordered_set< nid_t > &subgraph, std::vector< std::vector< handle_t >> &haplotype_paths, bdsg::HashGraph &unfolded, const gbwtgraph::CachedGBWTGraph *cache=nullptr) const
Find the distinct local haplotypes in the given subgraph and return the corresponding paths. For each path haplotype_paths[i], the output graph will contain node 2i + 1 with sequence corresponding to the path and node 2i + 2 with the reverse complement of the sequence. Use the provided CachedGBWTGraph or allocate a new one.
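The node numbering above implies a simple mapping between path indexes and node identifiers in the unfolded graph. The helpers below are a hypothetical sketch of that arithmetic (none of these function names exist in vg), together with a plain reverse complement over the DNA alphabet, where any character other than A, C, G, or T is assumed to map to N.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Node holding the forward sequence of haplotype_paths[i].
inline std::uint64_t forward_node(std::size_t i) { return 2 * i + 1; }

// Node holding the reverse complement of haplotype_paths[i].
inline std::uint64_t reverse_node(std::size_t i) { return 2 * i + 2; }

// Recover the path index from a node identifier in the unfolded graph.
inline std::size_t path_index(std::uint64_t node_id) {
    assert(node_id >= 1);
    return static_cast<std::size_t>((node_id - 1) / 2);
}

// Even identifiers correspond to reverse-complement nodes.
inline bool is_reverse_node(std::uint64_t node_id) {
    assert(node_id >= 1);
    return node_id % 2 == 0;
}

// Reverse complement of a DNA sequence; unknown characters become N.
inline std::string reverse_complement(const std::string& sequence) {
    std::string result;
    result.reserve(sequence.size());
    for (auto iter = sequence.rbegin(); iter != sequence.rend(); ++iter) {
        switch (*iter) {
            case 'A': result.push_back('T'); break;
            case 'C': result.push_back('G'); break;
            case 'G': result.push_back('C'); break;
            case 'T': result.push_back('A'); break;
            default:  result.push_back('N'); break;
        }
    }
    return result;
}
```

With this mapping, nodes 5 and 6 both belong to haplotype_paths[2], and node 6 would carry the reverse complement of the sequence in node 5.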
const Aligner* vg::GaplessExtender::aligner |
const gbwtgraph::GBWTGraph* vg::GaplessExtender::graph |
constexpr static size_t vg::GaplessExtender::MAX_MISMATCHES = 4
The default value for the maximum number of mismatches.
constexpr static double vg::GaplessExtender::OVERLAP_THRESHOLD = 0.8
Two full-length alignments are distinct if the fraction of overlapping position pairs is at most this.