Class CollectWgsMetrics

  • Direct Known Subclasses:
    CollectRawWgsMetrics, CollectWgsMetricsWithNonZeroCoverage

    @DocumentedFeature
    public class CollectWgsMetrics
    extends CommandLineProgram
    Computes a number of metrics that are useful for evaluating coverage and performance of whole genome sequencing experiments. Two algorithms are available for these metrics: default and fast. The fast algorithm is enabled by the USE_FAST_ALGORITHM option. It works better for regions of a BAM file with coverage of at least 10 reads per locus; at lower coverage the two algorithms perform the same.
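As a rough illustration of how the fast algorithm might be switched on, the sketch below assembles arguments in Picard's KEY=VALUE command-line style from the field names documented on this page. The helper class WgsMetricsArgsSketch and the file names are hypothetical and not part of Picard. Arguments inherited from CommandLineProgram are not covered on this page and are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Picard: builds a KEY=VALUE argument list
// for CollectWgsMetrics from field names documented on this page.
class WgsMetricsArgsSketch {

    static List<String> buildArgs(String input, String output, boolean useFastAlgorithm) {
        List<String> args = new ArrayList<>();
        args.add("CollectWgsMetrics");                      // tool name
        args.add("INPUT=" + input);                         // SAM/BAM/CRAM input
        args.add("OUTPUT=" + output);                       // metrics file
        args.add("USE_FAST_ALGORITHM=" + useFastAlgorithm); // selects the fast algorithm when true
        return args;
    }

    public static void main(String[] unused) {
        // Illustrative file names only.
        System.out.println(String.join(" ", buildArgs("sample.bam", "wgs_metrics.txt", true)));
    }
}
```

The resulting list is only the tool-specific part of an invocation; how it is passed to Picard depends on the installation.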
    • Field Detail

      • INPUT

        @Argument(shortName="I",
                  doc="Input SAM/BAM/CRAM file.")
        public File INPUT
      • OUTPUT

        @Argument(shortName="O",
                  doc="Output metrics file.")
        public File OUTPUT
      • MINIMUM_MAPPING_QUALITY

        @Argument(shortName="MQ",
                  doc="Minimum mapping quality for a read to contribute coverage.")
        public int MINIMUM_MAPPING_QUALITY
      • MINIMUM_BASE_QUALITY

        @Argument(shortName="Q",
                  doc="Minimum base quality for a base to contribute coverage. N bases will be treated as having a base quality of negative infinity and will therefore be excluded from coverage regardless of the value of this parameter.")
        public int MINIMUM_BASE_QUALITY
      • COVERAGE_CAP

        @Argument(shortName="CAP",
                  doc="Treat positions with coverage exceeding this value as if they had coverage at this value (but calculate the difference for PCT_EXC_CAPPED).")
        public int COVERAGE_CAP
      • LOCUS_ACCUMULATION_CAP

        @Argument(doc="At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value (so that they will not be considered for PCT_EXC_CAPPED).  Used to keep memory consumption in check, but could create bias if set too low")
        public int LOCUS_ACCUMULATION_CAP
      • STOP_AFTER

        @Argument(doc="For debugging purposes, stop after processing this many genomic bases.")
        public long STOP_AFTER
      • INCLUDE_BQ_HISTOGRAM

        @Argument(doc="Determines whether to include the base quality histogram in the metrics file.")
        public boolean INCLUDE_BQ_HISTOGRAM
      • COUNT_UNPAIRED

        @Argument(doc="If true, count unpaired reads, and paired reads with one end unmapped")
        public boolean COUNT_UNPAIRED
      • SAMPLE_SIZE

        @Argument(doc="Sample Size used for Theoretical Het Sensitivity sampling. Default is 10000.",
                  optional=true)
        public int SAMPLE_SIZE
      • THEORETICAL_SENSITIVITY_OUTPUT

        @Argument(doc="Output for Theoretical Sensitivity metrics.",
                  optional=true)
        public File THEORETICAL_SENSITIVITY_OUTPUT
      • ALLELE_FRACTION

        @Argument(doc="Allele fraction for which to calculate theoretical sensitivity.",
                  optional=true)
        public List<Double> ALLELE_FRACTION
      • USE_FAST_ALGORITHM

        @Argument(doc="If true, fast algorithm is used.")
        public boolean USE_FAST_ALGORITHM
      • READ_LENGTH

        @Argument(doc="Average read length in the file. Default is 150.",
                  optional=true)
        public int READ_LENGTH
      • INTERVALS

        protected File INTERVALS
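The difference between COVERAGE_CAP and LOCUS_ACCUMULATION_CAP described above can be sketched with a small per-locus calculation. This is one simplified reading of the two doc strings, not Picard's implementation: the class CoverageCapSketch, the method applyCaps, and the cap values used in main are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

// Simplified per-locus model of the two caps, based only on their doc strings.
// Hypothetical code, not taken from Picard.
class CoverageCapSketch {

    /** Returns {countedDepth, excludedByCap, ignoredEntirely} for one locus. */
    static long[] applyCaps(long rawDepth, long coverageCap, long locusAccumulationCap) {
        // Reads beyond LOCUS_ACCUMULATION_CAP are never accumulated,
        // so they cannot contribute to PCT_EXC_CAPPED.
        long accumulated = Math.min(rawDepth, locusAccumulationCap);
        long ignoredEntirely = rawDepth - accumulated;

        // Coverage above COVERAGE_CAP is treated as coverage at the cap;
        // the difference is what would feed PCT_EXC_CAPPED.
        long counted = Math.min(accumulated, coverageCap);
        long excludedByCap = accumulated - counted;

        return new long[] {counted, excludedByCap, ignoredEntirely};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(applyCaps(300, 250, 100000))); // [250, 50, 0]
        System.out.println(Arrays.toString(applyCaps(300, 250, 280)));    // [250, 30, 20]
        System.out.println(Arrays.toString(applyCaps(100, 250, 100000))); // [100, 0, 0]
    }
}
```

In the second call the accumulation cap is set below the raw depth, so 20 reads vanish from the accounting altogether. That is the kind of bias the LOCUS_ACCUMULATION_CAP doc string warns about when the value is set too low.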
    • Constructor Detail

      • CollectWgsMetrics

        public CollectWgsMetrics()
    • Method Detail

      • makeIntervalArgumentCollection

        protected IntervalArgumentCollection makeIntervalArgumentCollection()
        Returns:
        An interval argument collection to be used for this tool. Subclasses can override this to provide an argument collection with alternative arguments or argument annotations.
      • getSamReader

        protected htsjdk.samtools.SamReader getSamReader()
        Gets the SamReader from which records will be examined. This will also set the header so that it is available to subsequent calls.
      • doWork

        protected int doWork()
        Description copied from class: CommandLineProgram
        Do the work after the command line has been parsed. RuntimeExceptions may be thrown by this method and are reported appropriately.
        Specified by:
        doWork in class CommandLineProgram
        Returns:
        program exit status.
      • getIntervalsToExamine

        protected htsjdk.samtools.util.IntervalList getIntervalsToExamine()
        Gets the intervals over which we will calculate metrics.
      • getSamFileHeader

        protected htsjdk.samtools.SAMFileHeader getSamFileHeader()
        This method should only be called after getSamReader() is called.
      • generateWgsMetrics

        protected WgsMetrics generateWgsMetrics(htsjdk.samtools.util.IntervalList intervals,
                                                htsjdk.samtools.util.Histogram<Integer> highQualityDepthHistogram,
                                                htsjdk.samtools.util.Histogram<Integer> unfilteredDepthHistogram,
                                                double pctExcludedByAdapter,
                                                double pctExcludedByMapq,
                                                double pctExcludedByDupes,
                                                double pctExcludedByPairing,
                                                double pctExcludedByBaseq,
                                                double pctExcludedByOverlap,
                                                double pctExcludedByCapping,
                                                double pctTotal,
                                                int coverageCap,
                                                htsjdk.samtools.util.Histogram<Integer> unfilteredBaseQHistogram,
                                                int theoreticalHetSensitivitySampleSize)
      • getBasesExcludedBy

        protected long getBasesExcludedBy(CountingFilter filter)
        If INTERVALS is specified, this will count bases beyond the interval list when the read overlaps the intervals and extends beyond the edge. Ideally INTERVALS should only include regions that have hard edges without reads that could extend beyond the boundary (such as a whole contig).
      • getLocusIterator

        protected htsjdk.samtools.util.AbstractLocusIterator getLocusIterator(htsjdk.samtools.SamReader in)
        Creates an AbstractLocusIterator implementation according to the value of USE_FAST_ALGORITHM.
        Parameters:
        in - inner SamReader
        Returns:
        An EdgeReadIterator if USE_FAST_ALGORITHM is enabled; otherwise the default algorithm is used and a SamLocusIterator is returned.
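The note on getBasesExcludedBy about reads that extend past an interval edge can be made concrete with a little arithmetic. The sketch below assumes ungapped reads and 1-based inclusive coordinates, and it assumes the excluded count covers every base of an overlapping read. The class IntervalEdgeSketch and its methods are hypothetical and do not reflect Picard's internals.

```java
// Illustrates how many bases of an overlapping read fall outside an interval.
// Hypothetical code: ungapped reads, 1-based inclusive coordinates.
class IntervalEdgeSketch {

    /** Number of read bases that lie inside the interval. */
    static long overlap(long readStart, long readEnd, long intervalStart, long intervalEnd) {
        long start = Math.max(readStart, intervalStart);
        long end = Math.min(readEnd, intervalEnd);
        return Math.max(0, end - start + 1);
    }

    /** Bases of an overlapping read that extend beyond the interval edges. */
    static long basesBeyondInterval(long readStart, long readEnd, long intervalStart, long intervalEnd) {
        long inside = overlap(readStart, readEnd, intervalStart, intervalEnd);
        if (inside == 0) {
            return 0; // a read that does not overlap the interval is not examined at all
        }
        long readLength = readEnd - readStart + 1;
        return readLength - inside;
    }

    public static void main(String[] args) {
        // A 50-base read at 90..139 against interval 100..200:
        // 40 bases fall inside, 10 extend past the left edge.
        System.out.println(basesBeyondInterval(90, 139, 100, 200)); // 10
        // A read fully inside the interval contributes no extra bases.
        System.out.println(basesBeyondInterval(150, 199, 100, 200)); // 0
    }
}
```

With whole-contig intervals no read can extend past an edge, which is why the note recommends intervals with hard edges.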