Package picard.sam
Class DuplicationMetrics
java.lang.Object
  htsjdk.samtools.metrics.MetricBase
    picard.analysis.MergeableMetricBase
      picard.sam.DuplicationMetrics
-
public class DuplicationMetrics extends MergeableMetricBase
Metrics that are calculated during the process of marking duplicates within a stream of SAMRecords.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class picard.analysis.MergeableMetricBase
MergeableMetricBase.MergeByAdding, MergeableMetricBase.MergeByAssertEquals, MergeableMetricBase.MergingIsManual, MergeableMetricBase.NoMergingIsDerived, MergeableMetricBase.NoMergingKeepsValue
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
Long ESTIMATED_LIBRARY_SIZE
The estimated number of unique molecules in the library based on PE duplication.
String LIBRARY
The library on which the duplicate marking was performed.
Double PERCENT_DUPLICATION
The fraction of mapped sequence that is marked as duplicate.
long READ_PAIR_DUPLICATES
The number of read pairs that were marked as duplicates.
long READ_PAIR_OPTICAL_DUPLICATES
The number of read pair duplicates that were caused by optical duplication.
long READ_PAIRS_EXAMINED
The number of mapped read pairs examined.
long SECONDARY_OR_SUPPLEMENTARY_RDS
The number of reads that were either secondary or supplementary.
long UNMAPPED_READS
The total number of unmapped reads examined.
long UNPAIRED_READ_DUPLICATES
The number of fragments that were marked as duplicates.
long UNPAIRED_READS_EXAMINED
The number of mapped reads examined which did not have a mapped mate pair, either because the read is unpaired, or the read is paired to an unmapped mate.
-
Constructor Summary
Constructors (Constructor, Description)
DuplicationMetrics()
-
Method Summary
Methods (Modifier and Type, Method, Description)
void calculateDerivedFields()
Fills in the ESTIMATED_LIBRARY_SIZE based on the paired read data examined where possible and the PERCENT_DUPLICATION.
void calculateDerivedMetrics()
Deprecated. Use calculateDerivedFields() instead.
htsjdk.samtools.util.Histogram<Double> calculateRoiHistogram()
Calculates a histogram using the estimateRoi method to estimate the effective yield of doing x-fold sequencing for x = 1..10.
static Long estimateLibrarySize(long readPairs, long uniqueReadPairs)
Estimates the size of a library based on the number of paired end molecules observed and the number of unique pairs observed.
static double estimateRoi(long estimatedLibrarySize, double x, long pairs, long uniquePairs)
Estimates the ROI (return on investment) that one would see if a library was sequenced to x times the observed coverage.
static void main(String[] args)
-
Methods inherited from class picard.analysis.MergeableMetricBase
canMerge, merge, merge, mergeIfCan
-
-
-
-
Field Detail
-
LIBRARY
public String LIBRARY
The library on which the duplicate marking was performed.
-
UNPAIRED_READS_EXAMINED
public long UNPAIRED_READS_EXAMINED
The number of mapped reads examined which did not have a mapped mate pair, either because the read is unpaired, or the read is paired to an unmapped mate.
-
READ_PAIRS_EXAMINED
public long READ_PAIRS_EXAMINED
The number of mapped read pairs examined. (Primary, non-supplemental)
-
SECONDARY_OR_SUPPLEMENTARY_RDS
public long SECONDARY_OR_SUPPLEMENTARY_RDS
The number of reads that were either secondary or supplementary.
-
UNMAPPED_READS
public long UNMAPPED_READS
The total number of unmapped reads examined. (Primary, non-supplemental)
-
UNPAIRED_READ_DUPLICATES
public long UNPAIRED_READ_DUPLICATES
The number of fragments that were marked as duplicates.
-
READ_PAIR_DUPLICATES
public long READ_PAIR_DUPLICATES
The number of read pairs that were marked as duplicates.
-
READ_PAIR_OPTICAL_DUPLICATES
public long READ_PAIR_OPTICAL_DUPLICATES
The number of read pair duplicates that were caused by optical duplication. Value is always < READ_PAIR_DUPLICATES, which counts all duplicates regardless of source.
-
PERCENT_DUPLICATION
public Double PERCENT_DUPLICATION
The fraction of mapped sequence that is marked as duplicate.
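As a rough illustration of how such a fraction could be derived from the count fields above, the sketch below counts each read pair as two reads. The class and method names are hypothetical, and the pair weighting is an assumption about how "mapped sequence" is counted, not a statement of how Picard computes this field.

```java
public final class DuplicationFractionSketch {

    /**
     * Hypothetical helper: duplicate reads divided by mapped reads examined,
     * counting every read pair as two reads (an assumption).
     * Returns 0.0 when no mapped reads were examined.
     */
    public static double duplicationFraction(long unpairedReadsExamined,
                                             long readPairsExamined,
                                             long unpairedReadDuplicates,
                                             long readPairDuplicates) {
        final long examined = unpairedReadsExamined + 2 * readPairsExamined;
        if (examined == 0) {
            return 0.0;
        }
        final long duplicates = unpairedReadDuplicates + 2 * readPairDuplicates;
        return (double) duplicates / examined;
    }

    public static void main(String[] args) {
        // 100 unpaired reads (10 duplicates) plus 450 pairs (45 duplicate pairs)
        System.out.println(duplicationFraction(100, 450, 10, 45));
    }
}
```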
-
ESTIMATED_LIBRARY_SIZE
public Long ESTIMATED_LIBRARY_SIZE
The estimated number of unique molecules in the library based on PE duplication.
-
-
Method Detail
-
calculateDerivedFields
public void calculateDerivedFields()
Fills in the ESTIMATED_LIBRARY_SIZE based on the paired read data examined where possible and the PERCENT_DUPLICATION.
Overrides:
calculateDerivedFields in class MergeableMetricBase
-
calculateDerivedMetrics
@Deprecated public void calculateDerivedMetrics()
Deprecated. Use calculateDerivedFields() instead.
Fills in the ESTIMATED_LIBRARY_SIZE based on the paired read data examined where possible and the PERCENT_DUPLICATION.
-
estimateLibrarySize
public static Long estimateLibrarySize(long readPairs, long uniqueReadPairs)
Estimates the size of a library based on the number of paired end molecules observed and the number of unique pairs observed.
Based on the Lander-Waterman equation, which states:
C/X = 1 - exp( -N/X )
where
X = number of distinct molecules in library
N = number of read pairs
C = number of distinct fragments observed in read pairs
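The equation has no closed-form solution for X, so it has to be solved numerically. The following is a minimal sketch that does so by bisection; the class name, method names, and iteration count are hypothetical choices for illustration, and the sketch is not presented as the code behind estimateLibrarySize.

```java
public final class LibrarySizeSketch {

    /** g(X) = C/X - 1 + exp(-N/X); its root in X satisfies the Lander-Waterman relation. */
    static double g(double x, double c, double n) {
        return c / x - 1.0 + Math.exp(-n / x);
    }

    /**
     * Hypothetical solver: finds X such that C/X = 1 - exp(-N/X) by bisection.
     * Requires 0 < uniqueReadPairs < readPairs, otherwise no finite root exists.
     */
    public static long solveLibrarySize(long readPairs, long uniqueReadPairs) {
        if (uniqueReadPairs <= 0 || uniqueReadPairs >= readPairs) {
            throw new IllegalArgumentException("need 0 < uniqueReadPairs < readPairs");
        }
        final double c = uniqueReadPairs;
        final double n = readPairs;

        double lo = c;        // g(C) = exp(-N/C) > 0, so the root lies above C
        double hi = 2.0 * c;
        while (g(hi, c, n) > 0) {
            hi *= 2.0;        // expand until g changes sign
        }
        for (int i = 0; i < 100; i++) {
            final double mid = (lo + hi) / 2.0;
            if (g(mid, c, n) > 0) {
                lo = mid;     // root is to the right of mid
            } else {
                hi = mid;     // root is at or to the left of mid
            }
        }
        return Math.round((lo + hi) / 2.0);
    }

    public static void main(String[] args) {
        // 1000 read pairs, of which 900 were unique
        System.out.println(solveLibrarySize(1000, 900));
    }
}
```

Bisection is used here because g is positive at X = C and becomes negative for large X whenever C < N, so a sign change is guaranteed and the bracket can be halved until it is tight.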
-
estimateRoi
public static double estimateRoi(long estimatedLibrarySize, double x, long pairs, long uniquePairs)
Estimates the ROI (return on investment) that one would see if a library was sequenced to x times the observed coverage.
Parameters:
estimatedLibrarySize - the estimated number of molecules in the library
x - the multiple of sequencing to be simulated (i.e. how many X sequencing)
pairs - the number of pairs observed in the actual sequencing
uniquePairs - the number of unique pairs observed in the actual sequencing
Returns:
a number z <= x such that, if you had pairs*x as your sequencing, you would expect to observe uniquePairs*z unique pairs.
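One way to read this return value is through the Lander-Waterman relation given for estimateLibrarySize: with library size X and x times as many pairs, the expected number of distinct pairs is X * (1 - exp(-x*pairs/X)), and z is that count divided by uniquePairs. The sketch below works through that arithmetic; the class and method names are hypothetical, and the formula is an assumption derived from the relation above rather than a copy of the library's code.

```java
public final class RoiSketch {

    /**
     * Hypothetical helper: expected unique pairs after x-fold sequencing,
     * expressed as a multiple of the unique pairs actually observed.
     */
    public static double roi(long estimatedLibrarySize, double x, long pairs, long uniquePairs) {
        final double expectedUnique =
                estimatedLibrarySize * (1.0 - Math.exp(-(x * pairs) / estimatedLibrarySize));
        return expectedUnique / uniquePairs;
    }

    public static void main(String[] args) {
        // Library of 1000 molecules, 500 pairs sequenced, 393 of them unique
        System.out.println(roi(1000, 1.0, 500, 393)); // close to 1: same depth, same yield
        System.out.println(roi(1000, 2.0, 500, 393)); // below 2: doubling depth gives diminishing returns
    }
}
```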
-
calculateRoiHistogram
public htsjdk.samtools.util.Histogram<Double> calculateRoiHistogram()
Calculates a histogram using the estimateRoi method to estimate the effective yield of doing x-fold sequencing for x = 1..10.
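To show the shape of such a table without depending on htsjdk, the sketch below fills a plain sorted map with one ROI estimate per multiple x = 1..10. The class name, the map in place of a Histogram, and the ROI formula (taken from the Lander-Waterman relation) are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public final class RoiTableSketch {

    /** Hypothetical ROI estimate: expected unique pairs at x-fold depth over observed unique pairs. */
    static double roi(long librarySize, double x, long pairs, long uniquePairs) {
        return librarySize * (1.0 - Math.exp(-(x * pairs) / librarySize)) / uniquePairs;
    }

    /** Builds a table mapping each multiple x in 1..10 to its estimated ROI. */
    public static Map<Double, Double> roiTable(long librarySize, long pairs, long uniquePairs) {
        final Map<Double, Double> table = new TreeMap<>();
        for (int i = 1; i <= 10; i++) {
            final double x = i;
            table.put(x, roi(librarySize, x, pairs, uniquePairs));
        }
        return table;
    }

    public static void main(String[] args) {
        // Print one line per sequencing multiple: "x<TAB>estimated yield multiple"
        roiTable(1000, 500, 393).forEach((x, z) -> System.out.println(x + "\t" + z));
    }
}
```

Each successive row grows by less than the one before it, which is the diminishing return that the histogram is meant to expose.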
-
main
public static void main(String[] args)
-
-