org.biojava3.core.sequence.storage
Class BitSequenceReader.BitArrayWorker<C extends Compound>

java.lang.Object
  extended by org.biojava3.core.sequence.storage.BitSequenceReader.BitArrayWorker<C>
Type Parameters:
C - The Compound to use
Direct Known Subclasses:
FourBitSequenceReader.FourBitArrayWorker, TwoBitSequenceReader.TwoBitArrayWorker
Enclosing class:
BitSequenceReader<C extends Compound>

public abstract static class BitSequenceReader.BitArrayWorker<C extends Compound>
extends Object

The logic for working with bit-encoded sequences has been separated out into this class. Developers can create bit-based data structures without first converting their data to an intermediate format, and can reuse the format without copying this code. This class behaves like a Sequence but does not implement the Sequence interface.
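A minimal, self-contained sketch of this division of labour follows. It assumes `Character` stands in for BioJava's `Compound`, a plain `int[]` is the backing store, and compounds are packed into each `int` from the least significant bits upward. `SketchBitWorker` and `SketchTwoBitWorker` are hypothetical names, and the layout is an assumption rather than BioJava's confirmed implementation. The abstract base holds the shared packing logic. A subclass supplies only the encoding-specific hooks (mask, compounds per int, lookup tables).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for BitArrayWorker: shared packing logic lives here,
// while subclasses supply the encoding-specific hooks.
abstract class SketchBitWorker<C> {
    private final int length;
    private final int[] words;
    private final List<C> indexToCompounds;
    private final Map<C, Integer> compoundsToIndex;

    SketchBitWorker(int length) {
        this.length = length;
        // Calling hooks from a constructor is only safe because the
        // subclass implementations below do not touch instance fields.
        this.indexToCompounds = generateIndexToCompounds();
        this.compoundsToIndex = generateCompoundsToIndex();
        int perInt = compoundsPerDatatype();
        this.words = new int[(length + perInt - 1) / perInt]; // round up
    }

    protected abstract byte bitMask();
    protected abstract int compoundsPerDatatype();
    protected abstract List<C> generateIndexToCompounds();
    protected abstract Map<C, Integer> generateCompoundsToIndex();

    protected int bitsPerCompound() {
        return Integer.SIZE / compoundsPerDatatype();
    }

    // Assumed layout: 1-based position p sits in words[(p - 1) / k],
    // ((p - 1) % k) * bits up from the least significant bit.
    private int shift(int position) {
        return ((position - 1) % compoundsPerDatatype()) * bitsPerCompound();
    }

    public void setCompoundAt(C compound, int position) {
        // Unboxing throws NullPointerException for compounds missing from the map
        int value = compoundsToIndex.get(compound);
        int w = (position - 1) / compoundsPerDatatype();
        int s = shift(position);
        words[w] = (words[w] & ~(bitMask() << s)) | (value << s);
    }

    public C getCompoundAt(int position) {
        int w = (position - 1) / compoundsPerDatatype();
        int value = (words[w] >>> shift(position)) & bitMask();
        return indexToCompounds.get(value);
    }

    public int getLength() {
        return length;
    }
}

// Concrete 2-bit DNA encoding with an assumed A, C, G, T -> 0..3 ordering
class SketchTwoBitWorker extends SketchBitWorker<Character> {
    SketchTwoBitWorker(String sequence) {
        super(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            setCompoundAt(sequence.charAt(i), i + 1);
        }
    }

    @Override protected byte bitMask() { return 3; }               // 0b11
    @Override protected int compoundsPerDatatype() { return 16; }  // 32 / 2

    @Override protected List<Character> generateIndexToCompounds() {
        return List.of('A', 'C', 'G', 'T');
    }

    @Override protected Map<Character, Integer> generateCompoundsToIndex() {
        Map<Character, Integer> map = new HashMap<>();
        List<Character> order = generateIndexToCompounds();
        for (int i = 0; i < order.size(); i++) {
            map.put(order.get(i), i);
        }
        return map;
    }
}
```

For example, `new SketchTwoBitWorker("ACGTTGCAACGTTGCAAC")` packs 18 bases into two ints, and `getCompoundAt(17)` reads from the second one.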

Author:
ayates

Field Summary
static int BYTES_PER_INT
           
 
Constructor Summary
BitSequenceReader.BitArrayWorker(CompoundSet<C> compoundSet, int length)
           
BitSequenceReader.BitArrayWorker(CompoundSet<C> compoundSet, int[] sequence)
           
BitSequenceReader.BitArrayWorker(Sequence<C> sequence)
           
BitSequenceReader.BitArrayWorker(String sequence, CompoundSet<C> compoundSet)
           
 
Method Summary
protected abstract  byte bitMask()
          This method should return the bit mask to be used to extract the bits you are interested in working with.
protected  int bitsPerCompound()
          Returns how many bits are used to represent a compound, e.g. 2 if using 2bit encoding.
protected abstract  int compoundsPerDatatype()
          Should return the maximum number of compounds that can be encoded per int.
 boolean equals(Object o)
           
protected abstract  Map<C,Integer> generateCompoundsToIndex()
          Returns the value each compound is encoded as in the backing bit storage, e.g. in 2bit storage the value 0 is encoded as 00 (in binary).
protected abstract  List<C> generateIndexToCompounds()
          Should return the inverse of the mapping returned by generateCompoundsToIndex(), i.e. a compound that maps to 1 there must appear at position 1 here.
 C getCompoundAt(int position)
          Returns the compound at the specified biological index
 CompoundSet<C> getCompoundSet()
          Returns the compound set backing this store
protected  Map<C,Integer> getCompoundsToIndexLookup()
          Returns a map which converts from compound to an integer representation
protected  List<C> getIndexToCompoundsLookup()
          Returns a list of compounds whose index positions are used to translate from the byte representation back into a compound.
 int getLength()
           
 int hashCode()
           
 void populate(Sequence<C> sequence)
          Loops through the Compounds in a Sequence and passes each one on to setCompoundAt(Compound, int)
 void populate(String sequence)
          Loops through the chars in a String and passes each one on to setCompoundAt(char, int)
protected  byte processUnknownCompound(C compound, int position)
          Since bit encoding supports only a finite number of bases, when processing a sequence you are likely to encounter a compound that the encoding does not cover, e.g. N in a 2bit sequence.
 int seqArraySize(int length)
           
 void setCompoundAt(char base, int position)
          Converts from char to Compound and sets it at the given biological index
 void setCompoundAt(C compound, int position)
          Sets the compound at the specified biological index
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BYTES_PER_INT

public static final int BYTES_PER_INT
See Also:
Constant Field Values
Constructor Detail

BitSequenceReader.BitArrayWorker

public BitSequenceReader.BitArrayWorker(Sequence<C> sequence)

BitSequenceReader.BitArrayWorker

public BitSequenceReader.BitArrayWorker(String sequence,
                                        CompoundSet<C> compoundSet)

BitSequenceReader.BitArrayWorker

public BitSequenceReader.BitArrayWorker(CompoundSet<C> compoundSet,
                                        int length)

BitSequenceReader.BitArrayWorker

public BitSequenceReader.BitArrayWorker(CompoundSet<C> compoundSet,
                                        int[] sequence)
Method Detail

bitMask

protected abstract byte bitMask()
This method should return the bit mask to be used to extract the bits you are interested in working with. See the concrete implementations for how to create these.
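As a rough illustration (not BioJava's code): for an encoding of `b` bits per compound, a mask with the lowest `b` bits set is `(1 << b) - 1`, i.e. `0b11` for 2-bit and `0b1111` for 4-bit storage. Shifting a packed `int` right and ANDing with the mask isolates one compound's bits. `MaskSketch`, `maskFor` and `extract` are hypothetical names.

```java
// Hypothetical helper showing how a per-compound bit mask can be derived and applied.
final class MaskSketch {
    private MaskSketch() {}

    // Lowest `bits` bits set, e.g. 2 -> 0b11 (3), 4 -> 0b1111 (15).
    // Only valid for 1..7 bits: a byte cannot hold an 8-bit mask without going negative.
    static byte maskFor(int bits) {
        return (byte) ((1 << bits) - 1);
    }

    // Move the slot that starts `shift` bits up down to bit 0, then keep only its bits
    static int extract(int packed, int shift, byte mask) {
        return (packed >>> shift) & mask;
    }
}
```

Because `bitMask()` returns a `byte`, the 2-bit and 4-bit masks fit comfortably, while anything of 8 bits or more would sign-extend when promoted to `int`.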


compoundsPerDatatype

protected abstract int compoundsPerDatatype()
Should return the maximum number of compounds that can be encoded per int.
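Because the backing type is a 32-bit `int`, the number of compounds per int follows from the bits per compound by integer division. A sketch of that arithmetic, with the hypothetical helper `PackingSketch.compoundsPerInt`:

```java
// Hypothetical helper: how many fixed-width compounds fit into one 32-bit int.
final class PackingSketch {
    private PackingSketch() {}

    static int compoundsPerInt(int bitsPerCompound) {
        if (bitsPerCompound <= 0 || bitsPerCompound > Integer.SIZE) {
            throw new IllegalArgumentException("bits per compound must be 1..32");
        }
        return Integer.SIZE / bitsPerCompound; // 2 bits -> 16, 4 bits -> 8
    }
}
```

Widths that do not divide 32 leave bits unused: a 3-bit encoding fits 10 compounds per int with 2 bits to spare.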


generateIndexToCompounds

protected abstract List<C> generateIndexToCompounds()
Should return the inverse of the mapping returned by generateCompoundsToIndex(), i.e. if compound C maps to 1 in generateCompoundsToIndex(), then C should be found here at position 1.
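The two lookups must be exact inverses. The sketch below builds an index-to-compound list from a compound-to-index map and checks the round trip. It uses `Character` in place of a real `Compound` type. `InverseLookupSketch`, `invert` and `isInverse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class InverseLookupSketch {
    private InverseLookupSketch() {}

    // Place each compound at the list position equal to its encoded value.
    // Assumes the values are exactly 0..n-1; set() throws on any gap or overflow.
    static <C> List<C> invert(Map<C, Integer> compoundsToIndex) {
        List<C> indexToCompounds =
            new ArrayList<>(Collections.<C>nCopies(compoundsToIndex.size(), null));
        for (Map.Entry<C, Integer> e : compoundsToIndex.entrySet()) {
            indexToCompounds.set(e.getValue(), e.getKey());
        }
        return indexToCompounds;
    }

    // True when every compound is found at the position its value points to
    static <C> boolean isInverse(Map<C, Integer> toIndex, List<C> toCompound) {
        for (Map.Entry<C, Integer> e : toIndex.entrySet()) {
            if (!e.getKey().equals(toCompound.get(e.getValue()))) {
                return false;
            }
        }
        return toIndex.size() == toCompound.size();
    }
}
```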


generateCompoundsToIndex

protected abstract Map<C,Integer> generateCompoundsToIndex()
Returns the value each compound is encoded as in the backing bit storage, e.g. in 2bit storage the value 0 is encoded as 00 (in binary).
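For 2-bit DNA storage, one plausible mapping assigns each base a value from 0 to 3, and the value's 2-bit binary form is what gets stored. The A, C, G, T ordering is an assumption; check the actual `TwoBitArrayWorker` for the ordering it uses. The sketch below builds such a map with `Character` keys and renders each value as a zero-padded binary string. `TwoBitEncodingSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TwoBitEncodingSketch {
    private TwoBitEncodingSketch() {}

    // Assumed ordering A, C, G, T -> 0, 1, 2, 3
    static Map<Character, Integer> compoundsToIndex() {
        Map<Character, Integer> map = new LinkedHashMap<>();
        String bases = "ACGT";
        for (int i = 0; i < bases.length(); i++) {
            map.put(bases.charAt(i), i);
        }
        return map;
    }

    // Render an encoded value as it sits in `bits` bits of storage, e.g. 0 -> "00"
    static String toBits(int value, int bits) {
        String raw = Integer.toBinaryString(value);
        StringBuilder padded = new StringBuilder();
        for (int i = raw.length(); i < bits; i++) {
            padded.append('0');
        }
        return padded.append(raw).toString();
    }
}
```

Under this ordering the stored patterns are A = 00, C = 01, G = 10 and T = 11.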


bitsPerCompound

protected int bitsPerCompound()
Returns how many bits are used to represent a compound, e.g. 2 if using 2bit encoding.
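The bits-per-compound value is what turns a biological index into a location in the backing `int[]`. The sketch below assumes 1-based positions, and assumes compounds fill each `int` from the least significant bits upward. Both the layout and the `SlotSketch` name are assumptions; this page does not state the layout.

```java
// Hypothetical: where does 1-based position p live in a packed int[]?
final class SlotSketch {
    private SlotSketch() {}

    static int compoundsPerInt(int bitsPerCompound) {
        return Integer.SIZE / bitsPerCompound;
    }

    // Index of the int that holds position p
    static int arrayIndex(int position, int bitsPerCompound) {
        return (position - 1) / compoundsPerInt(bitsPerCompound);
    }

    // Bit offset of position p inside that int (assumed LSB-first layout)
    static int shift(int position, int bitsPerCompound) {
        return ((position - 1) % compoundsPerInt(bitsPerCompound)) * bitsPerCompound;
    }
}
```

With 2-bit encoding, position 16 is the last slot of the first int (offset 30) and position 17 starts the second int at offset 0.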


seqArraySize

public int seqArraySize(int length)
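This page gives no description for `seqArraySize(int length)`. Its name and signature suggest it returns how many `int`s are needed to hold `length` compounds, but that reading is an assumption. A sketch using integer ceiling division, with the hypothetical class `ArraySizeSketch`:

```java
final class ArraySizeSketch {
    private ArraySizeSketch() {}

    // Number of ints needed for `length` compounds, rounding up so a
    // partially filled final int is still allocated
    static int seqArraySize(int length, int compoundsPerInt) {
        return (length + compoundsPerInt - 1) / compoundsPerInt;
    }
}
```

The `length + k - 1` form can overflow for lengths close to `Integer.MAX_VALUE`, which is unlikely to matter for realistic sequences.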

populate

public void populate(Sequence<C> sequence)
Loops through the Compounds in a Sequence and passes each one on to setCompoundAt(Compound, int)


populate

public void populate(String sequence)
Loops through the chars in a String and passes each one on to setCompoundAt(char, int)
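Both `populate` overloads amount to a loop that hands each element to `setCompoundAt` with its biological (1-based) index. A sketch of the `String` variant follows. The hypothetical `PositionSink` interface stands in for the worker's `setCompoundAt(char, int)`, and `PopulateSketch` is likewise hypothetical.

```java
// Hypothetical stand-in for the worker's setCompoundAt(char, int)
interface PositionSink {
    void setCompoundAt(char base, int position);
}

final class PopulateSketch {
    private PopulateSketch() {}

    // String indices are 0-based; biological positions start at 1
    static void populate(String sequence, PositionSink sink) {
        for (int i = 0; i < sequence.length(); i++) {
            sink.setCompoundAt(sequence.charAt(i), i + 1);
        }
    }
}
```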


setCompoundAt

public void setCompoundAt(char base,
                          int position)
Converts from char to Compound and sets it at the given biological index


setCompoundAt

public void setCompoundAt(C compound,
                          int position)
Sets the compound at the specified biological index
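Overwriting a compound in packed storage takes two steps: clear the slot's existing bits, then OR in the new value. Skipping the clear corrupts the slot when a position is written twice. A sketch on a raw `int[]`, assuming 1-based positions, an LSB-first layout and fewer than 32 bits per compound; `SetSketch` is a hypothetical name.

```java
final class SetSketch {
    private SetSketch() {}

    // Write an already-encoded value (e.g. 0..3 for 2-bit) at 1-based `position`
    static void set(int[] words, int position, int value, int bitsPerCompound) {
        int perInt = Integer.SIZE / bitsPerCompound;
        int mask = (1 << bitsPerCompound) - 1;
        int index = (position - 1) / perInt;
        int shift = ((position - 1) % perInt) * bitsPerCompound;
        words[index] &= ~(mask << shift);        // clear the old bits
        words[index] |= (value & mask) << shift; // store the new bits
    }
}
```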


getCompoundAt

public C getCompoundAt(int position)
Returns the compound at the specified biological index
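Reading reverses the write. The sketch locates the int, shifts the slot down to bit 0, masks it, and translates the number back to a compound through the index-to-compounds list. It uses `Character` compounds. The name `GetSketch`, the 1-based positions and the LSB-first layout are assumptions.

```java
import java.util.List;

final class GetSketch {
    private GetSketch() {}

    static <C> C get(int[] words, int position, int bitsPerCompound, List<C> indexToCompounds) {
        int perInt = Integer.SIZE / bitsPerCompound;
        int mask = (1 << bitsPerCompound) - 1;
        int index = (position - 1) / perInt;
        int shift = ((position - 1) % perInt) * bitsPerCompound;
        int value = (words[index] >>> shift) & mask; // slot moved to bit 0, then isolated
        return indexToCompounds.get(value);
    }
}
```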


processUnknownCompound

protected byte processUnknownCompound(C compound,
                                      int position)
                               throws IllegalStateException
Since bit encoding supports only a finite number of bases, when processing a sequence you are likely to encounter a compound that the encoding does not cover, e.g. N in a 2bit sequence. You can override this method to convert the unknown base into one you can process, or to store the locations of unknown bases for post-processing in your subclass.

Parameters:
compound - The compound to process
position - The biological index of the compound
Returns:
Byte representation of the compound
Throws:
IllegalStateException - Thrown whenever this (default) implementation is invoked
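The description above suggests overriding this hook either to substitute a storable base or to record where unknown bases occurred. The sketch below combines both ideas in one hypothetical subclass and uses `Character` compounds. `StrictEncoder` only imitates the documented default of throwing `IllegalStateException`. Substituting `A` for unknown bases is an illustrative choice, not a recommendation. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Imitates the documented default: an unknown compound is an error
class StrictEncoder {
    protected byte processUnknownCompound(Character compound, int position) {
        throw new IllegalStateException(
            "Cannot encode " + compound + " at position " + position);
    }

    byte encode(char base, int position) {
        switch (base) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return processUnknownCompound(base, position);
        }
    }
}

// Stores unknown bases as A (00) but remembers where they were for post-processing
class RecordingEncoder extends StrictEncoder {
    final List<Integer> unknownPositions = new ArrayList<>();

    @Override
    protected byte processUnknownCompound(Character compound, int position) {
        unknownPositions.add(position);
        return 0;
    }
}
```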

getIndexToCompoundsLookup

protected List<C> getIndexToCompoundsLookup()
Returns a list of compounds whose index positions are used to translate from the byte representation back into a compound.


getCompoundsToIndexLookup

protected Map<C,Integer> getCompoundsToIndexLookup()
Returns a map which converts from compound to an integer representation
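One plausible way to back these two lookup methods is to call the `generate*` hooks once and cache the results, so subclasses build their tables only on first use. This page does not say whether BioJava caches this way, so the sketch below is hypothetical. For brevity, `CachedLookups` derives the map from the list instead of calling a separate hook. That also guarantees the two lookups agree. The lazy initialisation shown is not thread-safe.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

abstract class CachedLookups<C> {
    private List<C> indexToCompounds;
    private Map<C, Integer> compoundsToIndex;
    int generateCalls = 0; // counts hook invocations, to show the caching

    protected abstract List<C> generateIndexToCompounds();

    protected List<C> getIndexToCompoundsLookup() {
        if (indexToCompounds == null) {
            generateCalls++;
            indexToCompounds = generateIndexToCompounds();
        }
        return indexToCompounds;
    }

    // Built from the cached list so the two lookups cannot disagree
    protected Map<C, Integer> getCompoundsToIndexLookup() {
        if (compoundsToIndex == null) {
            Map<C, Integer> map = new HashMap<>();
            List<C> list = getIndexToCompoundsLookup();
            for (int i = 0; i < list.size(); i++) {
                map.put(list.get(i), i);
            }
            compoundsToIndex = map;
        }
        return compoundsToIndex;
    }
}
```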


getCompoundSet

public CompoundSet<C> getCompoundSet()
Returns the compound set backing this store


getLength

public int getLength()

hashCode

public int hashCode()
Overrides:
hashCode in class Object

equals

public boolean equals(Object o)
Overrides:
equals in class Object