Class DnaCoder

  • All Implemented Interfaces:
    java.io.Serializable
    Direct Known Subclasses:
    DnaQualityCoder

    public class DnaCoder
    extends Coder
    Class used to encode sequences into binary and decode them back. Note: This is a singleton class. It stores each DNA base in 2 bits: {a,c,g,t} <-> {0,1,2,3}
    Author:
    pcingola
    See Also:
    Serialized Form
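
The 2-bit mapping described above can be sketched as follows. This is a minimal illustration, not the DnaCoder source: the class name TwoBitBaseSketch is hypothetical, and accepting upper-case input and throwing on unknown characters are assumptions (the real class also offers an ignoreErrors variant whose behavior is not documented here).

```java
// Illustrative sketch only; not the DnaCoder source.
final class TwoBitBaseSketch {
    // Index = 2-bit code, value = base, following {a,c,g,t} <-> {0,1,2,3}.
    static final char[] TO_BASE = {'a', 'c', 'g', 't'};

    // Map a base to its 2-bit code. Accepting upper case is an assumption.
    static int baseToBits(char c) {
        switch (Character.toLowerCase(c)) {
            case 'a': return 0;
            case 'c': return 1;
            case 'g': return 2;
            case 't': return 3;
            default: throw new IllegalArgumentException("Unknown base: " + c);
        }
    }

    // Map a 2-bit code back to its base; only the two low bits are used.
    static char toBase(int code) {
        return TO_BASE[code & 3];
    }
}
```

With the real class, the equivalent calls would go through the singleton, for example DnaCoder.get().baseToBits('g').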
    • Method Summary

      Modifier and Type Method Description
      int basesPerWord()
      How many bases can we pack in a word
      int baseToBits​(char c)
      Encode a base using 2 bits
      int baseToBits​(char c, boolean ignoreErrors)  
      int bitsPerBase()
      How many bits do we need for each base
      void copyBases​(long[] src, int srcStart, long[] dst, int dstStart, int length)
      Copy 'length' bases from 'src' (starting from 'srcStart') to 'dst' (starting from 'dstStart')
      void copyBases​(long[] src, long[] dst, int start, int length)
      Copy 'length' bases from 'src' to 'dst' (starting from 'start')
      int decodeWord​(long word, int pos)
      Decode bits from a given position
      long encodeWord​(char base, int pos)
      Encode a base to a given position in a word
      static DnaCoder get()  
      int lastBaseinWord()
      Index of the last base coded in a word
      int length2words​(int len)
      Calculate the coded length of a sequence in 'words' (depends on coder)
      long mask​(int baseIndexInWord)
      Bitmask for a base in a word
      long replaceBase​(long code, int pos, char newBase)
      Replace the base at a given position in a coded word
      long reverseBases​(long code)
      Reverse all bases in 'code'
      int score​(long[] dst, long[] src, int srcStart, int length, int threshold)
      Calculate a 'score' for a sequence (dst) and a sub-sequence (src).
      char toBase​(int code)
      Decode a base using 2 bits
      char toBase​(long word, int pos)
      Decode a base from a given position in a word
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • debug

        public static boolean debug
      • LAST_BASE_IN_LONGWORD

        protected static final int LAST_BASE_IN_LONGWORD
        See Also:
        Constant Field Values
      • TO_BASE

        public static final char[] TO_BASE
      • MASK_BASE

        public long[] MASK_BASE
      • MASK_LOW

        public long[] MASK_LOW
      • MASK_HIGH

        public long[] MASK_HIGH
      • COUNT_DIFFS

        public int[] COUNT_DIFFS
    • Method Detail

      • basesPerWord

        public int basesPerWord()
        Description copied from class: Coder
        How many bases can we pack in a word
        Specified by:
        basesPerWord in class Coder
        Returns:
      • baseToBits

        public int baseToBits​(char c)
        Encode a base using 2 bits
        Specified by:
        baseToBits in class Coder
        Parameters:
        c - Base to encode
        Returns:
      • baseToBits

        public int baseToBits​(char c,
                              boolean ignoreErrors)
      • bitsPerBase

        public int bitsPerBase()
        Description copied from class: Coder
        How many bits do we need for each base
        Specified by:
        bitsPerBase in class Coder
        Returns:
      • copyBases

        public void copyBases​(long[] src,
                              int srcStart,
                              long[] dst,
                              int dstStart,
                              int length)
        Copy 'length' bases from 'src' (starting from 'srcStart') to 'dst' (starting from 'dstStart')
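        Parameters:
        src - Source array of coded words
        srcStart - Index of the first base to copy from 'src'
        dst - Destination array of coded words
        dstStart - Index of the first base to write in 'dst'
        length - Number of bases to copy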
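
A base-by-base copy between packed arrays might look like the sketch below. It is not the DnaCoder implementation: the class CopyBasesSketch and its helpers getCode and setCode are hypothetical, and it assumes 32 bases per 64-bit word with base 0 stored in the most significant bits.

```java
// Illustrative sketch only; not the DnaCoder source.
final class CopyBasesSketch {
    static final int BASES_PER_WORD = 32; // assumption: 64-bit word, 2 bits per base

    // Bit offset of base 'pos' inside a word (assumption: base 0 in the top bits).
    static int shift(int pos) {
        return (BASES_PER_WORD - 1 - pos) * 2;
    }

    // Read the 2-bit code of base number 'index' from a packed array.
    static int getCode(long[] words, int index) {
        return (int) ((words[index / BASES_PER_WORD] >>> shift(index % BASES_PER_WORD)) & 3L);
    }

    // Overwrite the 2-bit code of base number 'index' in a packed array.
    static void setCode(long[] words, int index, int code) {
        int w = index / BASES_PER_WORD;
        int s = shift(index % BASES_PER_WORD);
        words[w] = (words[w] & ~(3L << s)) | ((long) (code & 3) << s);
    }

    // Copy 'length' bases one at a time; simple rather than fast.
    static void copyBases(long[] src, int srcStart, long[] dst, int dstStart, int length) {
        for (int i = 0; i < length; i++) {
            setCode(dst, dstStart + i, getCode(src, srcStart + i));
        }
    }
}
```

Copying one base per iteration keeps the word-boundary handling trivial; a tuned version would likely move whole words with shifts and masks.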
      • copyBases

        public void copyBases​(long[] src,
                              long[] dst,
                              int start,
                              int length)
        Copy 'length' bases from 'src' to 'dst' (starting from 'start')
        Parameters:
        src -
        start -
        dst -
        length -
      • decodeWord

        public int decodeWord​(long word,
                              int pos)
        Decode bits from a given position
        Specified by:
        decodeWord in class Coder
        Parameters:
        word -
        pos -
        Returns:
      • encodeWord

        public long encodeWord​(char base,
                               int pos)
        Encode a base to a given position in a word
        Parameters:
        base -
        pos -
        Returns:
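
Placing a base at a position in a word, and reading it back, can be sketched as below. The class WordPackingSketch is hypothetical, and the layout is an assumption: a 64-bit long holds 32 bases, with position 0 in the two most significant bits. The actual bit order used by DnaCoder is not stated on this page.

```java
// Illustrative sketch only; not the DnaCoder source.
final class WordPackingSketch {
    static final int BITS_PER_BASE = 2;
    static final int BASES_PER_WORD = Long.SIZE / BITS_PER_BASE; // 32 (assumption)

    // Bit offset for base 'pos' (assumption: position 0 sits in the top bits).
    static int shift(int pos) {
        return (BASES_PER_WORD - 1 - pos) * BITS_PER_BASE;
    }

    // Return a word that contains only 'base', placed at position 'pos'.
    static long encodeWord(char base, int pos) {
        int code = "acgt".indexOf(base);
        if (code < 0) {
            throw new IllegalArgumentException("Unknown base: " + base);
        }
        return ((long) code) << shift(pos);
    }

    // Extract the 2-bit code stored at position 'pos' of 'word'.
    static int decodeWord(long word, int pos) {
        return (int) ((word >>> shift(pos)) & 3L);
    }
}
```

Several bases can be combined with a bitwise OR, for example encodeWord('c', 0) | encodeWord('g', 1).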
      • lastBaseinWord

        public int lastBaseinWord()
        Description copied from class: Coder
        Index of the last base coded in a word
        Specified by:
        lastBaseinWord in class Coder
        Returns:
      • length2words

        public int length2words​(int len)
        Calculate the coded length of a sequence in 'words' (depends on coder)
        Parameters:
        len -
        Returns:
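
One plausible way to compute the coded length is a ceiling division by the number of bases per word. The sketch below assumes 32 bases per 64-bit word; the class LengthToWordsSketch is hypothetical and the real method may round differently.

```java
// Illustrative sketch only; not the DnaCoder source.
final class LengthToWordsSketch {
    static final int BASES_PER_WORD = Long.SIZE / 2; // 32 (assumption)

    // Smallest number of words that can hold 'len' bases (ceiling division).
    static int length2words(int len) {
        return (len + BASES_PER_WORD - 1) / BASES_PER_WORD;
    }
}
```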
      • mask

        public long mask​(int baseIndexInWord)
        Description copied from class: Coder
        Bitmask for a base in a word
        Specified by:
        mask in class Coder
        Returns:
      • replaceBase

        public long replaceBase​(long code,
                                int pos,
                                char newBase)
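        Replace the base at position 'pos' in 'code' with 'newBase'
        Parameters:
        code - Coded word
        pos - Position of the base to replace
        newBase - Base to store at 'pos'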
        Returns:
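
Replacing a base in a packed word usually comes down to clearing its two bits with a mask and OR-ing in the new code. The sketch below shows that pattern under the same assumptions as before (32 bases per word, position 0 in the top bits); the class ReplaceBaseSketch is hypothetical and is not the DnaCoder source.

```java
// Illustrative sketch only; not the DnaCoder source.
final class ReplaceBaseSketch {
    // Bit offset for base 'pos' (assumption: position 0 sits in the top bits).
    static int shift(int pos) {
        return (31 - pos) * 2;
    }

    // Mask selecting the two bits of the base at 'pos'.
    static long mask(int pos) {
        return 3L << shift(pos);
    }

    // Clear the old base at 'pos', then write the code of 'newBase' there.
    static long replaceBase(long code, int pos, char newBase) {
        long bits = "acgt".indexOf(newBase);
        if (bits < 0) {
            throw new IllegalArgumentException("Unknown base: " + newBase);
        }
        return (code & ~mask(pos)) | (bits << shift(pos));
    }
}
```

The word is returned rather than modified in place, which matches the long return type in the signature above.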
      • reverseBases

        public long reverseBases​(long code)
        Reverse all bases in 'code'
        Parameters:
        code - Coded word whose bases are reversed
        Returns:
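
Reversing the bases of a word means reversing the order of its 2-bit groups, not of its individual bits. The sketch below assumes a full 32-base word; the class ReverseBasesSketch is hypothetical, and how the real method treats partially filled words is not stated here.

```java
// Illustrative sketch only; not the DnaCoder source.
final class ReverseBasesSketch {
    static final int BASES_PER_WORD = 32; // assumption: 64-bit word, 2 bits per base

    // Reverse the order of the 32 two-bit groups in 'code'.
    static long reverseBases(long code) {
        long result = 0L;
        for (int i = 0; i < BASES_PER_WORD; i++) {
            result = (result << 2) | (code & 3L); // append the lowest remaining base
            code >>>= 2;                          // drop it from the input
        }
        return result;
    }
}
```

Applying the function twice returns the original word, which is a convenient sanity check.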
      • score

        public int score​(long[] dst,
                         long[] src,
                         int srcStart,
                         int length,
                         int threshold)
        Calculate a 'score' for a sequence (dst) and a sub-sequence (src). The score is the number of equal bases (or zero if they differ)
        Parameters:
        dst - Destination sequence codes[]
        src - Source sequence codes[]
        srcStart - Source sub-sequence start
        length - Number of bases to compare
        threshold - Number of bases allowed to differ
        Returns:
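
The description leaves room for interpretation, so the sketch below shows only one possible reading: compare 'length' bases of dst (from its first base) against src starting at srcStart, return zero as soon as more than 'threshold' bases differ, and otherwise return the number of equal bases. The class ScoreSketch and the helpers pack and getCode are hypothetical, the packing layout (32 bases per word, base 0 in the top bits) is an assumption, and the real method may score differently.

```java
// Illustrative sketch only; one possible reading of the description above.
final class ScoreSketch {
    static final int BASES_PER_WORD = 32; // assumption: 64-bit word, 2 bits per base

    // Read the 2-bit code of base number 'index' (assumption: base 0 in the top bits).
    static int getCode(long[] words, int index) {
        int shift = (BASES_PER_WORD - 1 - index % BASES_PER_WORD) * 2;
        return (int) ((words[index / BASES_PER_WORD] >>> shift) & 3L);
    }

    // Hypothetical helper: pack a lower-case a/c/g/t string into words.
    static long[] pack(String seq) {
        long[] words = new long[(seq.length() + BASES_PER_WORD - 1) / BASES_PER_WORD];
        for (int i = 0; i < seq.length(); i++) {
            int code = "acgt".indexOf(seq.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("Unknown base: " + seq.charAt(i));
            }
            words[i / BASES_PER_WORD] |= ((long) code) << ((BASES_PER_WORD - 1 - i % BASES_PER_WORD) * 2);
        }
        return words;
    }

    // Equal-base count, or 0 once more than 'threshold' bases differ.
    static int score(long[] dst, long[] src, int srcStart, int length, int threshold) {
        int diffs = 0;
        for (int i = 0; i < length; i++) {
            if (getCode(dst, i) != getCode(src, srcStart + i)) {
                diffs++;
                if (diffs > threshold) {
                    return 0;
                }
            }
        }
        return length - diffs;
    }
}
```

Under this reading, a threshold of 0 demands an exact match, and larger thresholds tolerate that many mismatches.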
      • toBase

        public char toBase​(int code)
        Decode a base using 2 bits
        Specified by:
        toBase in class Coder
        Returns:
      • toBase

        public char toBase​(long word,
                           int pos)
        Description copied from class: Coder
        Decode a base from a given position in a word
        Specified by:
        toBase in class Coder
        Returns: