org.biojava.bio.structure.io
Class StructureSequenceMatcher

java.lang.Object
  extended by org.biojava.bio.structure.io.StructureSequenceMatcher

public class StructureSequenceMatcher
extends Object

A utility class with methods for matching ProteinSequences with Structures.

Author:
Spencer Bliven

Constructor Summary
StructureSequenceMatcher()
           
 
Method Summary
static ProteinSequence getProteinSequenceForStructure(Structure struct, Map<Integer,Group> groupIndexPosition)
          Generates a ProteinSequence corresponding to the sequence of struct, and maintains a mapping from the sequence back to the original groups.
static ResidueNumber[] matchSequenceToStructure(ProteinSequence seq, Structure struct)
          Given a sequence and the corresponding Structure, get the ResidueNumber for each residue in the sequence.
static ProteinSequence removeGaps(ProteinSequence gapped)
          Removes all gaps ('-') from a protein sequence.
static <T> T[][] removeGaps(T[][] gapped)
          Creates a new array consisting of all columns of gapped where no row contains a null value.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StructureSequenceMatcher

public StructureSequenceMatcher()
Method Detail

getProteinSequenceForStructure

public static ProteinSequence getProteinSequenceForStructure(Structure struct,
                                                             Map<Integer,Group> groupIndexPosition)
Generates a ProteinSequence corresponding to the sequence of struct, and maintains a mapping from the sequence back to the original groups. Chains are appended to one another. 'X' is used for heteroatoms.

Parameters:
struct - Input structure
groupIndexPosition - An empty map, which will be populated with (residue index in returned ProteinSequence) -> (Group within struct)
Returns:
A ProteinSequence with the full sequence of struct. Chains are concatenated in the order in which they appear in the input structure.
See Also:
SeqRes2AtomAligner.getFullAtomSequence(List, Map), which does the heavy lifting.
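
The concatenation and index bookkeeping described above can be illustrated without BioJava. The following is only a minimal sketch under stated assumptions: `Residue`, `ChainConcatSketch`, and `fullSequence` are hypothetical stand-ins invented for this example, not BioJava types, and the real method delegates to SeqRes2AtomAligner rather than to code like this.

```java
import java.util.List;
import java.util.Map;

class ChainConcatSketch {

    /** Hypothetical stand-in for a structure group: a one-letter code plus a heteroatom flag. */
    static final class Residue {
        final char code;
        final boolean hetero;

        Residue(char code, boolean hetero) {
            this.code = code;
            this.hetero = hetero;
        }
    }

    /**
     * Appends all chains to one sequence string and fills indexToResidue with
     * (position in the returned string) -> (residue that produced that position).
     * The caller is expected to pass an empty map.
     */
    static String fullSequence(List<List<Residue>> chains, Map<Integer, Residue> indexToResidue) {
        StringBuilder sequence = new StringBuilder();
        for (List<Residue> chain : chains) {
            for (Residue residue : chain) {
                // The current length is the index the next appended character will get.
                indexToResidue.put(sequence.length(), residue);
                // Heteroatoms are written as 'X', as the description above states.
                sequence.append(residue.hetero ? 'X' : residue.code);
            }
        }
        return sequence.toString();
    }
}
```

Because the map is keyed by position in the concatenated sequence, a position in the returned sequence can be traced back to its originating group regardless of which chain it came from.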

matchSequenceToStructure

public static ResidueNumber[] matchSequenceToStructure(ProteinSequence seq,
                                                       Structure struct)
Given a sequence and the corresponding Structure, get the ResidueNumber for each residue in the sequence.

Smith-Waterman alignment is used to match the sequences. Residues that are in the sequence but not the structure, or that are mismatched between sequence and structure, will have a null entry in the returned array, while residues in the structure but not the sequence are ignored with a warning.

Parameters:
seq - The protein sequence. Should match the sequence of struct very closely.
struct - The corresponding protein structure
Returns:
An array of ResidueNumbers of the same length as seq, containing either the corresponding residue or null.
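
The null-or-residue contract can be illustrated with a small sketch that starts from an already computed pairwise alignment. It does not perform the Smith-Waterman step itself. Everything here is an assumption made for illustration: the class and method names are hypothetical, residue numbers are modelled as plain strings rather than ResidueNumber objects, and both alignment rows are assumed to cover their full sequences, with '-' marking gaps.

```java
class AlignmentMappingSketch {

    /**
     * Maps each residue of the sequence to a structure residue number, or to null.
     *
     * @param alignedSeq    sequence row of a pairwise alignment, '-' for gaps
     * @param alignedStruct structure row of the same alignment, '-' for gaps
     * @param structNumbers residue numbers of the structure residues, in order
     * @return one entry per sequence residue: the matched residue number, or null
     */
    static String[] mapToResidueNumbers(String alignedSeq, String alignedStruct,
                                        String[] structNumbers) {
        if (alignedSeq.length() != alignedStruct.length()) {
            throw new IllegalArgumentException("Alignment rows must have equal length");
        }
        int seqLength = alignedSeq.replace("-", "").length();
        String[] result = new String[seqLength]; // entries default to null

        int seqPos = 0;    // index into the ungapped sequence
        int structPos = 0; // index into structNumbers
        for (int col = 0; col < alignedSeq.length(); col++) {
            char s = alignedSeq.charAt(col);
            char t = alignedStruct.charAt(col);
            if (s != '-' && t != '-') {
                // Aligned pair: keep the residue number only for an exact match.
                if (s == t) {
                    result[seqPos] = structNumbers[structPos];
                }
                seqPos++;
                structPos++;
            } else if (s != '-') {
                seqPos++;    // residue only in the sequence: entry stays null
            } else if (t != '-') {
                structPos++; // residue only in the structure: skipped
            }
        }
        return result;
    }
}
```

The returned array always has one slot per sequence residue, so callers can index it with sequence positions directly and only need to check for null.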

removeGaps

public static ProteinSequence removeGaps(ProteinSequence gapped)
Removes all gaps ('-') from a protein sequence.

Parameters:
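gapped - A protein sequence that may contain gap characters ('-')
Returns:
The sequence with all gap characters removed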
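
As a rough illustration of gap removal, the sketch below works on a plain String rather than a ProteinSequence. The class and method names are hypothetical, and this is not presented as the library's implementation.

```java
class GapStripSketch {

    /** Returns the residues of a gapped sequence string with every '-' removed. */
    static String stripGaps(String gapped) {
        StringBuilder ungapped = new StringBuilder(gapped.length());
        for (int i = 0; i < gapped.length(); i++) {
            char residue = gapped.charAt(i);
            if (residue != '-') {
                ungapped.append(residue);
            }
        }
        return ungapped.toString();
    }
}
```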

removeGaps

public static <T> T[][] removeGaps(T[][] gapped)
Creates a new array consisting of all columns of gapped where no row contains a null value. Here, "row" refers to the first index and "column" to the second, e.g. gapped[row][column].

Parameters:
gapped - A rectangular matrix containing null to mark gaps
Returns:
A new array without the columns that contain nulls
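
The column-wise behaviour can be sketched in plain Java as follows. This is a minimal illustration assuming a rectangular input, with hypothetical class and method names. It is not the library's code.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

class GapColumnSketch {

    /** Keeps only the columns of a rectangular matrix in which no row holds null. */
    @SuppressWarnings("unchecked")
    static <T> T[][] removeGapColumns(T[][] gapped) {
        if (gapped == null || gapped.length == 0) {
            return gapped;
        }
        int columns = gapped[0].length;

        // Collect the indices of columns that are gap-free in every row.
        List<Integer> kept = new ArrayList<>();
        for (int col = 0; col < columns; col++) {
            boolean gapFree = true;
            for (T[] row : gapped) {
                if (row[col] == null) {
                    gapFree = false;
                    break;
                }
            }
            if (gapFree) {
                kept.add(col);
            }
        }

        // Allocate a result with the same runtime element type as the input.
        Class<?> elementType = gapped.getClass().getComponentType().getComponentType();
        T[][] ungapped = (T[][]) Array.newInstance(elementType, gapped.length, kept.size());
        for (int row = 0; row < gapped.length; row++) {
            for (int i = 0; i < kept.size(); i++) {
                ungapped[row][i] = gapped[row][kept.get(i)];
            }
        }
        return ungapped;
    }
}
```

A single null anywhere in a column removes that column from every row, so the output stays rectangular and the remaining columns stay aligned across rows.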