Class CheckFingerprint


  • @DocumentedFeature
    public class CheckFingerprint
    extends CommandLineProgram
    Checks the sample identity of the sequence/genotype data in the provided file (SAM/BAM or VCF) against a set of known genotypes in the supplied genotype file (in VCF format).

    Summary

    Computes a fingerprint (essentially, genotype information from different parts of the genome) from the supplied input file (SAM/BAM or VCF) and compares it to the expected fingerprint genotypes provided. The key output is a LOD score which represents the relative likelihood of the sequence data originating from the same sample as the genotypes vs. from a random sample.
    Two outputs are produced:
    1. A summary metrics file that gives metrics of the fingerprint matches when comparing the input to a set of genotypes for the expected sample. Metrics are reported at the single-sample level (if the input was a VCF) or at the read level (lane or index within a lane) (if the input was a SAM/BAM).
    2. A detail metrics file that contains the individual SNP/haplotype comparisons within a fingerprint comparison.
    The metrics files fill the fields of the classes FingerprintingSummaryMetrics and FingerprintingDetailMetrics. The output files may be specified individually using the SUMMARY_OUTPUT and DETAIL_OUTPUT options. Alternatively, the OUTPUT option may be used to give the base of the two output files, with the summary metrics having the file extension "fingerprinting_summary_metrics" and the detail metrics having the file extension "fingerprinting_detail_metrics".
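
    The naming rule for the OUTPUT option can be illustrated with a minimal sketch. The class FingerprintOutputNames and its methods are hypothetical and not part of Picard, and joining the base and the extension with a single dot is an assumption rather than something stated above.

```java
import java.io.File;

/** Hypothetical helper for illustration only; not part of Picard. */
final class FingerprintOutputNames {
    // Extensions as given in the OUTPUT option description.
    static final String SUMMARY_EXTENSION = "fingerprinting_summary_metrics";
    static final String DETAIL_EXTENSION = "fingerprinting_detail_metrics";

    private FingerprintOutputNames() {}

    // Assumption: the base and the extension are joined with a single dot.
    static File summaryFile(String outputBase) {
        return new File(outputBase + "." + SUMMARY_EXTENSION);
    }

    // Same assumption as summaryFile, applied to the detail metrics.
    static File detailFile(String outputBase) {
        return new File(outputBase + "." + DETAIL_EXTENSION);
    }
}
```

    Under that assumption, OUTPUT=sample_fingerprinting would correspond to sample_fingerprinting.fingerprinting_summary_metrics and sample_fingerprinting.fingerprinting_detail_metrics.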

    Example comparing a bam against known genotypes:

         java -jar picard.jar CheckFingerprint \
              INPUT=sample.bam \
              GENOTYPES=sample_genotypes.vcf \
              HAPLOTYPE_MAP=fingerprinting_haplotype_database.txt \
              OUTPUT=sample_fingerprinting
     

    Detailed Explanation

    This tool calculates a single number that reports the LOD score for the identity check between the INPUT and the GENOTYPES. A positive value indicates that the data seem to have come from the same individual; in other words, the identity checks out. The scale is logarithmic (base 10), so a LOD of 6 indicates that it is 1,000,000 times more likely that the data match the genotypes than not. A negative value indicates that the data do not match. A score that is near zero is inconclusive and can result from low coverage or non-informative genotypes.
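
    The base-10 scale can be made concrete with a short sketch. LodInterpretation is a hypothetical class, not part of Picard, and the inconclusiveBand parameter is an illustrative assumption: the text above says only that scores "near zero" are inconclusive and gives no cutoff.

```java
/** Hypothetical helper for illustration only; not part of Picard. */
final class LodInterpretation {
    private LodInterpretation() {}

    // A base-10 LOD corresponds to a likelihood ratio of 10^LOD.
    static double likelihoodRatio(double lod) {
        return Math.pow(10.0, lod);
    }

    // inconclusiveBand is an assumed, caller-chosen width around zero;
    // the documentation does not define a specific threshold.
    static String describe(double lod, double inconclusiveBand) {
        if (Math.abs(lod) <= inconclusiveBand) {
            return "inconclusive";
        }
        return lod > 0 ? "match" : "mismatch";
    }
}
```

    For example, likelihoodRatio(6.0) gives a ratio of about one million, in line with the LOD of 6 described above.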

    The identity check uses the haplotype blocks defined in the HAPLOTYPE_MAP file to gain statistical power for detecting an identity match or a sample swap by aggregating data from several SNPs in each haplotype block. This enables an identity check of samples with very low coverage (e.g. ~1x mean coverage).

    When provided a VCF, the identity check looks at the PL, GL and GT fields (in that order) and uses the first one that it finds.
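
    The field preference can be sketched as a first-match lookup. GenotypeFieldChooser is a hypothetical class that works on a plain map of FORMAT keys to values; it does not use the Picard or htsjdk API, and treating an empty value or "." as missing is an assumption made for the illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of Picard. */
final class GenotypeFieldChooser {
    // Order stated in the text: PL first, then GL, then GT.
    static final List<String> PREFERENCE = List.of("PL", "GL", "GT");

    private GenotypeFieldChooser() {}

    // Returns the first preferred key that carries a usable value.
    // Assumption: null, empty, and "." all count as "not present".
    static Optional<String> firstAvailable(Map<String, String> formatFields) {
        for (String key : PREFERENCE) {
            String value = formatFields.get(key);
            if (value != null && !value.isEmpty() && !value.equals(".")) {
                return Optional.of(key);
            }
        }
        return Optional.empty();
    }
}
```

    With a sample that carries both GT and GL but no PL, firstAvailable would select GL, because GL precedes GT in the stated order.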

    • Field Detail

      • INPUT

        @Argument(shortName="I",
                  doc="Input file SAM/BAM/CRAM or VCF.  If a VCF is used, it must have at least one sample.  If there is more than one sample in the VCF, the parameter OBSERVED_SAMPLE_ALIAS must be provided in order to indicate which sample\'s data to use.  If there are no samples in the VCF, an exception will be thrown.")
        public String INPUT
      • OBSERVED_SAMPLE_ALIAS

        @Argument(optional=true,
                  doc="If the input is a VCF, this parameter is used to select which sample\'s data in the VCF to use.")
        public String OBSERVED_SAMPLE_ALIAS
      • OUTPUT

        @Argument(shortName="O",
                  doc="The base prefix of output files to write.  The summary metrics will have the file extension \'fingerprinting_summary_metrics\' and the detail metrics will have the extension \'fingerprinting_detail_metrics\'.",
                  mutex={"SUMMARY_OUTPUT","DETAIL_OUTPUT"})
        public String OUTPUT
      • SUMMARY_OUTPUT

        @Argument(shortName="S",
                  doc="The text file to which to write summary metrics.",
                  mutex="OUTPUT")
        public File SUMMARY_OUTPUT
      • DETAIL_OUTPUT

        @Argument(shortName="D",
                  doc="The text file to which to write detail metrics.",
                  mutex="OUTPUT")
        public File DETAIL_OUTPUT
      • GENOTYPES

        @Argument(shortName="G",
                  doc="File of genotypes (VCF) to be used in comparison. May contain any number of genotypes; CheckFingerprint will use only those that are usable for fingerprinting.")
        public String GENOTYPES
      • EXPECTED_SAMPLE_ALIAS

        @Argument(shortName="SAMPLE_ALIAS",
                  optional=true,
                  doc="This parameter can be used to specify which sample\'s genotypes to use from the expected VCF file (the GENOTYPES file).  If it is not supplied, the sample name from the input (VCF or BAM read group header) will be used.")
        public String EXPECTED_SAMPLE_ALIAS
      • HAPLOTYPE_MAP

        @Argument(shortName="H",
                  doc="The file lists a set of SNPs, optionally arranged in high-LD blocks, to be used for fingerprinting. See https://software.broadinstitute.org/gatk/documentation/article?id=9526 for details.")
        public File HAPLOTYPE_MAP
      • GENOTYPE_LOD_THRESHOLD

        @Argument(shortName="LOD",
                  doc="When counting haplotypes checked and matching, count only haplotypes where the most likely haplotype achieves at least this LOD.")
        public double GENOTYPE_LOD_THRESHOLD
      • IGNORE_READ_GROUPS

        @Argument(optional=true,
                  shortName="IGNORE_RG",
                  doc="If the input is a SAM/BAM/CRAM, and this parameter is true, treat the entire input BAM as one single read group in the calculation, ignoring RG annotations, and producing a single fingerprint metric for the entire BAM.")
        public boolean IGNORE_READ_GROUPS
      • EXIT_CODE_WHEN_EXPECTED_SAMPLE_NOT_FOUND

        @Argument(doc="When the expected fingerprint sample is not found in the genotypes file, this exit code is returned.")
        public int EXIT_CODE_WHEN_EXPECTED_SAMPLE_NOT_FOUND
      • EXIT_CODE_WHEN_NO_VALID_CHECKS

        @Argument(doc="When all LOD scores are zero, exit with this value.")
        public int EXIT_CODE_WHEN_NO_VALID_CHECKS
    • Constructor Detail

      • CheckFingerprint

        public CheckFingerprint()
    • Method Detail

      • doWork

        protected int doWork()
        Description copied from class: CommandLineProgram
        Do the work after the command line has been parsed. RuntimeExceptions may be thrown by this method and are reported appropriately.
        Specified by:
        doWork in class CommandLineProgram
        Returns:
        program exit status.
      • customCommandLineValidation

        protected String[] customCommandLineValidation()
        Description copied from class: CommandLineProgram
        Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by command-line parser can be validated.
        Overrides:
        customCommandLineValidation in class CommandLineProgram
        Returns:
        null if the command line is valid. If the command line is invalid, returns an array of error messages to be written to the appropriate place.
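
        The return contract (null when valid, an array of messages otherwise) can be sketched with a standalone check on the three output options. OutputArgumentCheck is hypothetical and does not reproduce the actual override in CheckFingerprint; the rule it enforces and the message texts are assumptions based on the mutex declarations of OUTPUT, SUMMARY_OUTPUT and DETAIL_OUTPUT.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch for illustration only; not the actual Picard override. */
final class OutputArgumentCheck {
    private OutputArgumentCheck() {}

    // Returns null when the combination is acceptable, otherwise the
    // error messages, mirroring the documented return contract.
    static String[] validate(String output, String summaryOutput, String detailOutput) {
        List<String> errors = new ArrayList<>();
        boolean hasBase = output != null;
        boolean hasAnyIndividual = summaryOutput != null || detailOutput != null;
        boolean hasBothIndividual = summaryOutput != null && detailOutput != null;

        if (hasBase && hasAnyIndividual) {
            // OUTPUT is declared mutually exclusive with the other two options.
            errors.add("OUTPUT cannot be combined with SUMMARY_OUTPUT or DETAIL_OUTPUT.");
        } else if (!hasBase && !hasBothIndividual) {
            // Assumed rule: without OUTPUT, both individual files are needed.
            errors.add("Provide either OUTPUT or both SUMMARY_OUTPUT and DETAIL_OUTPUT.");
        }
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }
}
```

        Returning null rather than an empty array for the valid case follows the contract described above.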