org.biojava3.core.sequence.template
Class SequenceMixin

java.lang.Object
  extended by org.biojava3.core.sequence.template.SequenceMixin

public class SequenceMixin
extends Object

Provides a set of static methods to be used as static imports when they are needed across multiple Sequence implementations but inheritance gets in the way. It also provides a place for utility methods that apply either to a single class of Sequence, e.g. a NucleotideCompound Sequence, or to any Sequence, e.g. getComposition(Sequence) or getDistribution(Sequence). All of these methods assume that the Iterable interface offered by each Sequence implementation yields every compound that the implementation allows you to see. Since a Sequence should know nothing about its backing store (apart from delegating calls to it), this assumption should hold.

Author:
ayates

Nested Class Summary
static class SequenceMixin.SequenceIterator<C extends Compound>
          A basic sequence iterator which iterates over the given Sequence by biological index.
 
Constructor Summary
SequenceMixin()
           
 
Method Summary
static
<C extends Compound>
String
checksum(Sequence<C> sequence)
          Performs a simple CRC64 checksum on any given sequence.
static int countAT(Sequence<NucleotideCompound> sequence)
          Returns the count of AT in the given sequence
static
<C extends Compound>
int
countCompounds(Sequence<C> sequence, C... compounds)
          For the given varargs of compounds this method counts the number of times those compounds appear in the given sequence.
static int countGC(Sequence<NucleotideCompound> sequence)
          Returns the count of GC in the given sequence
static
<C extends Compound>
Iterator<C>
createIterator(Sequence<C> sequence)
          Creates a simple sequence iterator which moves through a sequence going from 1 to the length of the Sequence.
static
<C extends Compound>
SequenceView<C>
createSubSequence(Sequence<C> sequence, int start, int end)
          Creates a simple sub sequence view delimited by the given start and end.
static
<C extends Compound>
Map<C,Integer>
getComposition(Sequence<C> sequence)
          Does a linear scan over the given Sequence and records the number of times each base appears.
static
<C extends Compound>
Map<C,Double>
getDistribution(Sequence<C> sequence)
          Analogous to getComposition(Sequence) but returns the distribution of that Compound over the given sequence.
static
<C extends Compound>
int
indexOf(Sequence<C> sequence, C compound)
          Performs a linear search of the given Sequence for the given compound.
static
<C extends Compound>
SequenceView<C>
inverse(Sequence<C> sequence)
          A method which attempts to do the right thing when it comes to a reverse or reverse complement.
static
<C extends Compound>
int
lastIndexOf(Sequence<C> sequence, C compound)
          Performs a reversed linear search of the given Sequence by wrapping it in a ReversedSequenceView and passing it into indexOf(Sequence, Compound).
static
<C extends Compound>
List<SequenceView<C>>
nonOverlappingKmers(Sequence<C> sequence, int kmer)
          Produces non-overlapping k-mers of the specified size.
static
<C extends Compound>
List<SequenceView<C>>
overlappingKmers(Sequence<C> sequence, int kmer)
          Used to generate overlapping k-mers of the specified size.
static
<C extends Compound>
boolean
sequenceEquality(Sequence<C> source, Sequence<C> target)
          A case-sensitive manner of comparing two sequence objects together.
static
<C extends Compound>
boolean
sequenceEqualityIgnoreCase(Sequence<C> source, Sequence<C> target)
          A case-insensitive manner of comparing two sequence objects together.
static
<C extends Compound>
Sequence<C>
shuffle(Sequence<C> sequence)
          Implements sequence shuffling by first materializing the given Sequence into a List, applying Collections.shuffle(List) and then returning the shuffled elements in a new instance of SequenceBackingStore which behaves as a Sequence.
static
<C extends Compound>
List<C>
toList(Sequence<C> sequence)
          For the given Sequence this will return a List filled with the Compounds of that Sequence.
static
<C extends Compound>
String
toString(Sequence<C> sequence)
          Shortcut to toStringBuilder(org.biojava3.core.sequence.template.Sequence) which calls toString() on the resulting object.
static
<C extends Compound>
StringBuilder
toStringBuilder(Sequence<C> sequence)
          For the given Sequence this will return a StringBuilder object filled with the results of Compound#toString().
static
<C extends Compound>
void
write(Appendable appendable, Sequence<C> sequence)
          Used as a way of sending a Sequence to a writer without the cost of converting to a full length String and then writing the data out
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SequenceMixin

public SequenceMixin()
Method Detail

countCompounds

public static <C extends Compound> int countCompounds(Sequence<C> sequence,
                                                      C... compounds)
For the given varargs of compounds this method counts the number of times those compounds appear in the given sequence.

Type Parameters:
C - The type of compound we are looking for
Parameters:
sequence - The Sequence to perform the count on
compounds - The compounds to look for
Returns:
The number of times the given compounds appear in this Sequence
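A minimal plain-Java sketch of the counting loop, assuming the sequence can be walked through its Iterable view as the class description states. Characters stand in for Compound objects and the class name CountCompoundsSketch is hypothetical; this is an illustration, not BioJava's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CountCompoundsSketch {
    // Counts how many elements of the sequence match any of the given compounds.
    // Iterable<T> stands in for Sequence<C> (an assumption for illustration).
    @SafeVarargs
    static <T> int countCompounds(Iterable<T> sequence, T... compounds) {
        // A set gives constant-time membership tests for each visited element
        Set<T> wanted = new HashSet<>(Arrays.asList(compounds));
        int count = 0;
        for (T current : sequence) {
            if (wanted.contains(current)) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: turns a String into a list of Character "compounds"
    static List<Character> toChars(String s) {
        List<Character> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            out.add(c);
        }
        return out;
    }
}
```

Because the wanted compounds are collected into a set, listing the same compound twice in the varargs does not count its occurrences twice in this sketch.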

countGC

public static int countGC(Sequence<NucleotideCompound> sequence)
Returns the count of GC in the given sequence

Parameters:
sequence - The NucleotideCompound Sequence to perform the GC analysis on
Returns:
The number of GC compounds in the sequence

countAT

public static int countAT(Sequence<NucleotideCompound> sequence)
Returns the count of AT in the given sequence

Parameters:
sequence - The NucleotideCompound Sequence to perform the AT analysis on
Returns:
The number of AT compounds in the sequence
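Both countGC and countAT amount to counting a fixed pair of bases. The sketch below does this over a plain CharSequence and assumes that upper- and lower-case bases should both be counted; the documentation above does not say how case is handled, and BaseCountSketch is a hypothetical name.

```java
final class BaseCountSketch {
    // Counts positions whose base is one of the two given upper-case letters.
    // Case-insensitive matching is an assumption of this sketch.
    private static int countPair(CharSequence seq, char first, char second) {
        int count = 0;
        for (int i = 0; i < seq.length(); i++) {
            char base = Character.toUpperCase(seq.charAt(i));
            if (base == first || base == second) {
                count++;
            }
        }
        return count;
    }

    static int countGC(CharSequence seq) {
        return countPair(seq, 'G', 'C');
    }

    static int countAT(CharSequence seq) {
        return countPair(seq, 'A', 'T');
    }
}
```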

getDistribution

public static <C extends Compound> Map<C,Double> getDistribution(Sequence<C> sequence)
Analogous to getComposition(Sequence) but returns the distribution of that Compound over the given sequence.

Type Parameters:
C - The type of compound to look for
Parameters:
sequence - The sequence to look over
Returns:
The decimal fraction of each compound in the given sequence. Any compound not in the Map will return a fraction of 0.

getComposition

public static <C extends Compound> Map<C,Integer> getComposition(Sequence<C> sequence)
Does a linear scan over the given Sequence and records the number of times each base appears. The returned map will return 0 if a compound is asked for and the Map has no record of it.

Type Parameters:
C - The type of compound to look for
Parameters:
sequence - The sequence to look over
Returns:
Counts for the instances of all compounds in the sequence
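getComposition and getDistribution can be sketched together, since the distribution is the composition divided by the total count. This plain-Java version uses an Iterable as a stand-in for Sequence and a HashMap for the result; CompositionSketch is a hypothetical name. Unlike the documented maps, a plain HashMap returns null for absent keys, so callers of this sketch should look up with getOrDefault(key, 0).

```java
import java.util.HashMap;
import java.util.Map;

final class CompositionSketch {
    // Linear scan recording how often each element occurs
    static <T> Map<T, Integer> getComposition(Iterable<T> sequence) {
        Map<T, Integer> counts = new HashMap<>();
        for (T current : sequence) {
            counts.merge(current, 1, Integer::sum);
        }
        return counts;
    }

    // Converts the counts into fractions of the total number of elements
    static <T> Map<T, Double> getDistribution(Iterable<T> sequence) {
        Map<T, Integer> counts = getComposition(sequence);
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<T, Double> fractions = new HashMap<>();
        // An empty sequence yields an empty map, so there is no division by zero
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            fractions.put(e.getKey(), e.getValue() / (double) total);
        }
        return fractions;
    }
}
```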

write

public static <C extends Compound> void write(Appendable appendable,
                                              Sequence<C> sequence)
                  throws IOException
Used as a way of sending a Sequence to a writer without the cost of converting to a full length String and then writing the data out.

Type Parameters:
C - Type of compound
Parameters:
appendable - The Appendable (e.g. a Writer) to send data to
sequence - The sequence to write out
Throws:
IOException - Thrown if we encounter a problem
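A sketch of the streaming idea: each compound's string form is appended as it is visited, so no full-length String is built first. Strings stand in for compounds and WriteSketch is a hypothetical name; the IOException comes from Appendable.append, which is declared to throw it.

```java
import java.io.IOException;

final class WriteSketch {
    // Streams each element's string form straight into the Appendable
    static <T> void write(Appendable appendable, Iterable<T> sequence) throws IOException {
        for (T compound : sequence) {
            appendable.append(compound.toString());
        }
    }
}
```

Any Writer (for example a BufferedWriter around a file) is an Appendable, so the same loop can write to disk without holding the whole sequence as a String.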

toStringBuilder

public static <C extends Compound> StringBuilder toStringBuilder(Sequence<C> sequence)
For the given Sequence this will return a StringBuilder object filled with the results of Compound#toString(). Does not use write(java.lang.Appendable, org.biojava3.core.sequence.template.Sequence) because of its IOException signature.


toString

public static <C extends Compound> String toString(Sequence<C> sequence)
Shortcut to toStringBuilder(org.biojava3.core.sequence.template.Sequence) which calls toString() on the resulting object.
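Because StringBuilder.append(String) does not declare IOException, the same loop as write can be written without a checked exception, which is the reason given above for not delegating to write. A minimal sketch over an Iterable stand-in, with the hypothetical class name ToStringSketch:

```java
final class ToStringSketch {
    // Builds the concatenated string form without any checked exception
    static <T> StringBuilder toStringBuilder(Iterable<T> sequence) {
        StringBuilder sb = new StringBuilder();
        for (T compound : sequence) {
            sb.append(compound.toString());
        }
        return sb;
    }

    // Shortcut mirroring toString(Sequence): delegate, then call toString()
    static <T> String toString(Iterable<T> sequence) {
        return toStringBuilder(sequence).toString();
    }
}
```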


toList

public static <C extends Compound> List<C> toList(Sequence<C> sequence)
For the given Sequence this will return a List filled with the Compounds of that Sequence.


indexOf

public static <C extends Compound> int indexOf(Sequence<C> sequence,
                                               C compound)
Performs a linear search of the given Sequence for the given compound. Once we find the compound we return the position.


lastIndexOf

public static <C extends Compound> int lastIndexOf(Sequence<C> sequence,
                                                   C compound)
Performs a reversed linear search of the given Sequence by wrapping it in a ReversedSequenceView and passing it into indexOf(Sequence, Compound). The index returned by that search is then inverted back into forward coordinates.
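A sketch of both searches over a List, assuming 1-based biological indexing (as used by createIterator) and assuming 0 is returned when the compound is absent; the documentation above does not state the not-found value, so NOT_FOUND is a hypothetical constant. For simplicity the sketch reverses a copy of the list, whereas the documented method wraps the sequence in a ReversedSequenceView. A position r in the reversed sequence maps back to length - r + 1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class IndexOfSketch {
    // Hypothetical not-found marker; the documented API does not state one
    static final int NOT_FOUND = 0;

    // Linear search returning the 1-based position of the first match
    static <T> int indexOf(List<T> sequence, T compound) {
        int index = 1;
        for (T current : sequence) {
            if (current.equals(compound)) {
                return index;
            }
            index++;
        }
        return NOT_FOUND;
    }

    // Searches a reversed copy, then inverts the index back to forward coordinates
    static <T> int lastIndexOf(List<T> sequence, T compound) {
        List<T> reversed = new ArrayList<>(sequence);
        Collections.reverse(reversed);
        int reversedIndex = indexOf(reversed, compound);
        if (reversedIndex == NOT_FOUND) {
            return NOT_FOUND;
        }
        return sequence.size() - reversedIndex + 1;
    }
}
```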


createIterator

public static <C extends Compound> Iterator<C> createIterator(Sequence<C> sequence)
Creates a simple sequence iterator which moves through a sequence going from 1 to the length of the Sequence. Modification of the Sequence is not allowed.
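A sketch of an iterator that walks positions 1 to the length of the sequence and refuses modification. A List stands in for the Sequence, and SequenceIteratorSketch is a hypothetical name rather than the nested SequenceMixin.SequenceIterator class.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class SequenceIteratorSketch<T> implements Iterator<T> {
    private final List<T> sequence;
    private int position = 1; // next 1-based biological index to return

    SequenceIteratorSketch(List<T> sequence) {
        this.sequence = sequence;
    }

    @Override
    public boolean hasNext() {
        return position <= sequence.size();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Convert the 1-based biological index to the List's 0-based index
        T compound = sequence.get(position - 1);
        position++;
        return compound;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("Modification of the sequence is not allowed");
    }
}
```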


createSubSequence

public static <C extends Compound> SequenceView<C> createSubSequence(Sequence<C> sequence,
                                                                     int start,
                                                                     int end)
Creates a simple sub sequence view delimited by the given start and end.
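A sketch of a sub-sequence view, assuming start and end are 1-based and inclusive (the usual biological convention; the documentation above does not say). The view stores only the parent and the offsets and reads through to the parent, so no compounds are copied. SubSequenceViewSketch is a hypothetical name built on java.util.AbstractList.

```java
import java.util.AbstractList;
import java.util.List;

final class SubSequenceViewSketch<T> extends AbstractList<T> {
    private final List<T> parent;
    private final int start; // 1-based, inclusive
    private final int end;   // 1-based, inclusive

    SubSequenceViewSketch(List<T> parent, int start, int end) {
        if (start < 1 || start > end || end > parent.size()) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        this.parent = parent;
        this.start = start;
        this.end = end;
    }

    @Override
    public T get(int index) {
        // index follows the 0-based List contract; map it onto the parent
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return parent.get(start - 1 + index);
    }

    @Override
    public int size() {
        return end - start + 1;
    }
}
```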


shuffle

public static <C extends Compound> Sequence<C> shuffle(Sequence<C> sequence)
Implements sequence shuffling by first materializing the given Sequence into a List, applying Collections.shuffle(List) and then returning the shuffled elements in a new instance of SequenceBackingStore which behaves as a Sequence.
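A sketch of the materialise-then-shuffle approach. The documented method calls Collections.shuffle(List) and wraps the result in a SequenceBackingStore; this version returns a plain List and takes an extra Random parameter (a hypothetical addition) so that results are reproducible. The input is never modified because only the copy is shuffled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ShuffleSketch {
    // Materialises the sequence into a new list, then shuffles that copy
    static <T> List<T> shuffle(Iterable<T> sequence, Random random) {
        List<T> copy = new ArrayList<>();
        for (T compound : sequence) {
            copy.add(compound);
        }
        Collections.shuffle(copy, random);
        return copy;
    }
}
```

Shuffling preserves composition exactly, which is why shuffled sequences are commonly used as a background when asking whether an observed pattern depends on order rather than content.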


checksum

public static <C extends Compound> String checksum(Sequence<C> sequence)
Performs a simple CRC64 checksum on any given sequence.
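CRC64 has several variants and the documentation above does not say which one is used. The sketch below implements the table-driven CRC64 used for SWISS-PROT/UniProt sequence checksums (reflected polynomial 0xD800000000000000, initial value 0), assuming that is the intended flavour. The upper-case 16-digit hex output and the name Crc64Sketch are also assumptions.

```java
import java.nio.charset.StandardCharsets;

final class Crc64Sketch {
    // Reflected form of the ISO 3309 CRC64 polynomial x^64 + x^4 + x^3 + x + 1
    private static final long POLY64_REFLECTED = 0xD800000000000000L;
    private static final long[] TABLE = new long[256];

    static {
        // Precompute the CRC contribution of every possible byte value
        for (int i = 0; i < 256; i++) {
            long part = i;
            for (int bit = 0; bit < 8; bit++) {
                if ((part & 1L) != 0) {
                    part = (part >>> 1) ^ POLY64_REFLECTED;
                } else {
                    part >>>= 1;
                }
            }
            TABLE[i] = part;
        }
    }

    // Returns the CRC64 of the sequence's string form as 16 upper-case hex digits
    static String checksum(CharSequence sequence) {
        long crc = 0L;
        byte[] bytes = sequence.toString().getBytes(StandardCharsets.US_ASCII);
        for (byte b : bytes) {
            crc = TABLE[(int) ((crc ^ b) & 0xFF)] ^ (crc >>> 8);
        }
        // %X on a long prints the unsigned two's-complement value
        return String.format("%016X", crc);
    }
}
```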


nonOverlappingKmers

public static <C extends Compound> List<SequenceView<C>> nonOverlappingKmers(Sequence<C> sequence,
                                                                             int kmer)
Produces non-overlapping k-mers of the specified size, e.g. with a k-mer size of 3, ATGTGA returns two views: ATG and TGA.

Type Parameters:
C - Compound to use
Parameters:
sequence - Sequence to build from
kmer - Kmer size
Returns:
The list of non-overlapping K-mers
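A sketch of non-overlapping k-mer extraction that steps through the sequence k positions at a time. It returns String copies rather than SequenceView objects, and it drops a trailing fragment shorter than k; that remainder handling is an assumption, since the documentation above only shows an input whose length is a multiple of k. NonOverlappingKmerSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class NonOverlappingKmerSketch {
    // Splits the sequence into consecutive, non-overlapping windows of length k
    static List<String> nonOverlappingKmers(String sequence, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<String> kmers = new ArrayList<>();
        // Step by k; a trailing fragment shorter than k is dropped (assumption)
        for (int start = 0; start + k <= sequence.length(); start += k) {
            kmers.add(sequence.substring(start, start + k));
        }
        return kmers;
    }
}
```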

overlappingKmers

public static <C extends Compound> List<SequenceView<C>> overlappingKmers(Sequence<C> sequence,
                                                                          int kmer)
Used to generate overlapping k-mers of the specified size, e.g. with a k-mer size of 3, ATGTA gives rise to ATG, TGT and GTA.

Type Parameters:
C - Compound to use
Parameters:
sequence - Sequence to build from
kmer - Kmer size
Returns:
The list of overlapping K-mers
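A sketch of overlapping k-mer extraction as a sliding window that advances one position at a time, giving length - k + 1 k-mers. Like the non-overlapping case, it returns String copies rather than SequenceView objects, and OverlappingKmerSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class OverlappingKmerSketch {
    // Slides a window of length k across the sequence one position at a time
    static List<String> overlappingKmers(String sequence, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<String> kmers = new ArrayList<>();
        for (int start = 0; start + k <= sequence.length(); start++) {
            kmers.add(sequence.substring(start, start + k));
        }
        return kmers;
    }
}
```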

inverse

public static <C extends Compound> SequenceView<C> inverse(Sequence<C> sequence)
A method which attempts to do the right thing when it comes to a reverse or reverse complement.

Type Parameters:
C - The type of compound
Parameters:
sequence - The input sequence
Returns:
The inverted sequence which is optionally complemented
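A sketch for upper-case DNA over plain strings, assuming "the right thing" means reverse-complementing when the alphabet supports complements and plain reversal otherwise. The complementable flag and the A/T/G/C table are hypothetical stand-ins; the documented method works this out from the sequence itself.

```java
final class InverseSketch {
    // Watson-Crick complement for upper-case DNA bases; other characters pass through
    private static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'G': return 'C';
            case 'C': return 'G';
            default: return base;
        }
    }

    // Reverses the sequence and, if the (hypothetical) flag is set, complements each base
    static String inverse(String sequence, boolean complementable) {
        StringBuilder out = new StringBuilder(sequence.length());
        for (int i = sequence.length() - 1; i >= 0; i--) {
            char base = sequence.charAt(i);
            out.append(complementable ? complement(base) : base);
        }
        return out.toString();
    }
}
```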

sequenceEqualityIgnoreCase

public static <C extends Compound> boolean sequenceEqualityIgnoreCase(Sequence<C> source,
                                                                      Sequence<C> target)
A case-insensitive manner of comparing two sequence objects together. Sequences that differ in length or in the compound sets they use are rejected immediately, and the comparison bails out the moment a mismatch is found. Cost to run is linear in the length of the Sequence.

Type Parameters:
C - The type of compound
Parameters:
source - Source sequence to assess
target - Target sequence to assess
Returns:
Boolean indicating if the sequences matched ignoring case

sequenceEquality

public static <C extends Compound> boolean sequenceEquality(Sequence<C> source,
                                                            Sequence<C> target)
A case-sensitive manner of comparing two sequence objects together. Sequences that differ in length or in the compound sets they use are rejected immediately, and the comparison bails out the moment a mismatch is found. Cost to run is linear in the length of the Sequence.

Type Parameters:
C - The type of compound
Parameters:
source - Source sequence to assess
target - Target sequence to assess
Returns:
Boolean indicating if the sequences matched
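Both equality methods can be sketched as one loop with a case flag: a cheap length check first, then a linear scan that stops at the first mismatch. Strings stand in for sequences, the compound-set check is omitted because plain strings have no compound set, and SequenceEqualitySketch is a hypothetical name.

```java
final class SequenceEqualitySketch {
    static boolean sequenceEquality(String source, String target) {
        return compare(source, target, false);
    }

    static boolean sequenceEqualityIgnoreCase(String source, String target) {
        return compare(source, target, true);
    }

    private static boolean compare(String source, String target, boolean ignoreCase) {
        // Sequences of different lengths can never match
        if (source.length() != target.length()) {
            return false;
        }
        for (int i = 0; i < source.length(); i++) {
            char a = source.charAt(i);
            char b = target.charAt(i);
            boolean same = ignoreCase
                    ? Character.toUpperCase(a) == Character.toUpperCase(b)
                    : a == b;
            // Bail out at the first mismatch
            if (!same) {
                return false;
            }
        }
        return true;
    }
}
```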