org.biojava.bio.structure.align.util
Class AlignmentTools

java.lang.Object
  extended by org.biojava.bio.structure.align.util.AlignmentTools

public class AlignmentTools
extends Object

Some utility methods for analyzing and manipulating AFPChains.

Author:
Spencer Bliven

Nested Class Summary
static class AlignmentTools.IdentityMap<K>
          A Map<K,V> can be viewed as a function from K to V; this class represents the identity function, mapping each key to itself.
 
Constructor Summary
AlignmentTools()
           
 
Method Summary
static Map<Integer,Integer> alignmentAsMap(AFPChain afpChain)
          Creates a Map specifying the alignment as a mapping between residue indices of protein 1 and residue indices of protein 2.
static AFPChain createAFPChain(Atom[] ca1, Atom[] ca2, ResidueNumber[] aligned1, ResidueNumber[] aligned2)
          Fundamentally, an alignment is just a list of aligned residues in each protein.
static List<List<List<Integer>>> getOptAlnAsList(AFPChain afpChain)
          Retrieves the optimum alignment from an AFPChain and returns it as a java collection.
static int getSymmetryOrder(AFPChain afpChain, int maxSymmetry, float minimumMetricChange)
          Guesses the order of symmetry in an alignment.
static int getSymmetryOrder(Map<Integer,Integer> alignment, int maxSymmetry, float minimumMetricChange)
          Helper for getSymmetryOrder(Map, Map, int, float) with a true identity function (X->X).
static int getSymmetryOrder(Map<Integer,Integer> alignment, Map<Integer,Integer> identity, int maxSymmetry, float minimumMetricChange)
          Tries to detect symmetry in an alignment.
static Map<Integer,Integer> guessSequentialAlignment(Map<Integer,Integer> alignment, boolean inverseAlignment)
          Takes a potentially non-sequential alignment and guesses a sequential version of it.
static boolean isSequentialAlignment(AFPChain afpChain, boolean checkWithinBlocks)
          Checks that the alignment given by afpChain is sequential.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AlignmentTools

public AlignmentTools()
Method Detail

isSequentialAlignment

public static boolean isSequentialAlignment(AFPChain afpChain,
                                            boolean checkWithinBlocks)
Checks that the alignment given by afpChain is sequential. This means that the residue indices of both proteins increase monotonically as a function of the alignment position (i.e., both proteins are sorted). This returns false for circularly permuted alignments and other non-topological alignments. It also returns false when the alignment itself is sequential but is not stored in the afpChain in sorted order.

Since algorithms that create non-sequential alignments split the alignment into multiple blocks, some computation time can be saved by checking sequentiality only at block boundaries. Setting checkWithinBlocks to true makes this method slower, but also detects AFPChains with non-sequential blocks.

Note that this method should give the same results as AFPChain.isSequentialAlignment(). However, the AFPChain version relies on the StructureAlignment algorithm setting this property correctly, which is unfortunately not always the case.

Parameters:
afpChain - An alignment
checkWithinBlocks - Indicates whether individual blocks should be checked for sequentiality
Returns:
True if the alignment is sequential.
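The check described above can be illustrated on the [block][structure][position] layout that AFPChain.getOptAln() uses. This is a minimal sketch, not BioJava's implementation: it assumes plain int arrays sized exactly to each block (the arrays inside a real AFPChain may be longer), and the class name SequentialCheckSketch is hypothetical. Without checkWithinBlocks it only compares each block's first residue with the previous block's last residue.

```java
public class SequentialCheckSketch {

    /**
     * Returns true if the residue indices of both structures increase strictly
     * with alignment position. optAln is indexed [block][structure][position];
     * every inner array is assumed to be sized exactly to its block length.
     */
    static boolean isSequential(int[][][] optAln, boolean checkWithinBlocks) {
        for (int str = 0; str < 2; str++) {
            int prevEnd = Integer.MIN_VALUE; // last residue of the previous non-empty block
            for (int[][] block : optAln) {
                int[] res = block[str];
                if (res.length == 0) continue;
                // Block boundary: this block must start after the previous one ended
                if (res[0] <= prevEnd) return false;
                if (checkWithinBlocks) {
                    // Slower path: verify ordering inside the block as well
                    for (int i = 1; i < res.length; i++) {
                        if (res[i] <= res[i - 1]) return false;
                    }
                }
                prevEnd = res[res.length - 1];
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two blocks, sequential in both structures
        int[][][] seq = { { {1, 2, 3}, {5, 6, 7} }, { {8, 9}, {10, 11} } };
        // Circular permutation: the second block of structure 2 precedes the first
        int[][][] cp = { { {1, 2, 3}, {20, 21, 22} }, { {8, 9}, {2, 3} } };
        // Boundaries are sequential, but the first block is disordered internally
        int[][][] inner = { { {1, 3, 2}, {5, 6, 7} }, { {8, 9}, {10, 11} } };

        System.out.println(isSequential(seq, true));    // true
        System.out.println(isSequential(cp, false));    // false
        System.out.println(isSequential(inner, false)); // true (only boundaries checked)
        System.out.println(isSequential(inner, true));  // false
    }
}
```

The last two calls show the trade-off behind checkWithinBlocks: the boundary-only check is cheaper but misses disorder inside a block.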

alignmentAsMap

public static Map<Integer,Integer> alignmentAsMap(AFPChain afpChain)
                                           throws StructureException
Creates a Map specifying the alignment as a mapping between residue indices of protein 1 and residue indices of protein 2.

For example,

 1234
 5678
becomes
 1->5
 2->6
 3->7
 4->8

Parameters:
afpChain - An alignment
Returns:
A mapping from aligned residues of protein 1 to their partners in protein 2.
Throws:
StructureException - If afpChain is not one-to-one
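A minimal sketch of this conversion, assuming the alignment is given as two parallel arrays of residue indices rather than an AFPChain. The class name is hypothetical, and IllegalArgumentException stands in for StructureException so that the example is self-contained.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class AlignmentAsMapSketch {

    /**
     * Builds a map residue1 -> residue2 from parallel arrays of aligned residue
     * indices. Throws if a residue of either protein appears more than once,
     * i.e. if the alignment is not one-to-one.
     */
    static Map<Integer, Integer> alignmentAsMap(int[] res1, int[] res2) {
        if (res1.length != res2.length) {
            throw new IllegalArgumentException("Aligned arrays differ in length");
        }
        Map<Integer, Integer> map = new TreeMap<>(); // sorted keys for readable output
        Set<Integer> seenTargets = new HashSet<>();
        for (int i = 0; i < res1.length; i++) {
            if (map.containsKey(res1[i]) || !seenTargets.add(res2[i])) {
                throw new IllegalArgumentException("Alignment is not one-to-one at position " + i);
            }
            map.put(res1[i], res2[i]);
        }
        return map;
    }

    public static void main(String[] args) {
        // The 1234 / 5678 example from the description above
        System.out.println(alignmentAsMap(new int[] {1, 2, 3, 4}, new int[] {5, 6, 7, 8}));
        // prints {1=5, 2=6, 3=7, 4=8}
    }
}
```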

getSymmetryOrder

public static int getSymmetryOrder(Map<Integer,Integer> alignment,
                                   int maxSymmetry,
                                   float minimumMetricChange)
Helper for getSymmetryOrder(Map, Map, int, float) with a true identity function (X->X).

This method should only be used in cases where the two proteins aligned have identical numbering, as for self-alignments. See getSymmetryOrder(AFPChain, int, float) for a way to guess the sequential correspondence between two proteins.

Parameters:
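alignment - The alignment to test for symmetry. Both proteins must share the same residue numbering.
maxSymmetry - Maximum symmetry to consider
minimumMetricChange - Percent decrease in root mean squared offsets required to declare symmetry
Returns:
The order of symmetry of alignment, or -1 if no order <= maxSymmetry is found.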

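Because the identity function has an unbounded domain, it can be passed as a Map without storing any entries. The sketch below shows one way to do this, in the spirit of the nested class AlignmentTools.IdentityMap<K>; BioJava's actual class may differ in details such as what entrySet() and containsKey() return, and the enclosing class name is hypothetical.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

public class IdentityMapSketch {

    /**
     * A read-only Map view of the identity function: get(k) returns k.
     * entrySet() is empty because no entries are stored, so only get()
     * and containsKey() are meaningful.
     */
    static class IdentityMap<K> extends AbstractMap<K, K> {
        @SuppressWarnings("unchecked")
        @Override
        public K get(Object key) {
            return (K) key; // every key maps to itself
        }

        @Override
        public boolean containsKey(Object key) {
            return true; // the identity is defined everywhere
        }

        @Override
        public Set<Map.Entry<K, K>> entrySet() {
            return Collections.emptySet();
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> id = new IdentityMap<>();
        System.out.println(id.get(42)); // 42
        System.out.println(id.size());  // 0, since nothing is stored
    }
}
```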
getSymmetryOrder

public static int getSymmetryOrder(Map<Integer,Integer> alignment,
                                   Map<Integer,Integer> identity,
                                   int maxSymmetry,
                                   float minimumMetricChange)
Tries to detect symmetry in an alignment.

Conceptually, an alignment is a function f:A->B between two sets of integers. The function may have simple topology (meaning that if two elements of A are close, then their images in B will also be close), or may have more complex topology (such as a circular permutation). This function checks alignment against a reference function identity, which should have simple topology. It then tries to determine the symmetry order of alignment relative to identity, up to a maximum order of maxSymmetry.

Details
Considers the offset (in number of residues) by which a residue moves after undergoing n alternating transforms by alignment and identity. If n corresponds to the intrinsic order of the alignment, this offset will be small. This algorithm tries increasing values of n and looks for abrupt decreases in the root mean squared offset. If none are found at n<=maxSymmetry, the alignment is reported as non-symmetric.

Parameters:
alignment - The alignment to test for symmetry
identity - An alignment with simple topology which approximates the sequential relationship between the two proteins. Should map in the reverse direction from alignment.
maxSymmetry - Maximum symmetry to consider. High values increase the calculation time and can lead to overfitting.
minimumMetricChange - Percent decrease in root mean squared offsets in order to declare symmetry. 0.4f seems to work well for CeSymm.
Returns:
The order of symmetry of alignment, or -1 if no order <= maxSymmetry is found.
See Also:
getSymmetryOrder(Map, int, float) - for a simple identity function
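The procedure under Details can be sketched as follows. This is an illustration under stated assumptions, not BioJava's code: the acceptance test (an RMS offset at least minimumMetricChange, read as a fraction, below the lowest value seen for smaller n) is an assumed reading of "abrupt decrease", and the class name is hypothetical. The first usage builds a twelve-residue self-alignment from three repeats of four residues, so three applications return every residue to its start.

```java
import java.util.HashMap;
import java.util.Map;

public class SymmetryOrderSketch {

    /**
     * For n = 1..maxSymmetry, pushes each residue x through n rounds of
     * (alignment, then identity) and computes the root mean squared offset
     * between x and its final position, over residues that stay defined.
     * Returns the first n >= 2 whose RMS drops by at least the fraction
     * minimumMetricChange below every earlier value, or -1 if none does.
     */
    static int symmetryOrder(Map<Integer, Integer> alignment, Map<Integer, Integer> identity,
                             int maxSymmetry, float minimumMetricChange) {
        double bestRms = Double.POSITIVE_INFINITY; // lowest RMS seen for smaller n
        for (int n = 1; n <= maxSymmetry; n++) {
            double sumSq = 0;
            int count = 0;
            for (Integer start : alignment.keySet()) {
                Integer pos = start;
                // n alternating transforms; stop early if a residue falls out of the maps
                for (int step = 0; step < n && pos != null; step++) {
                    Integer mid = alignment.get(pos);
                    pos = (mid == null) ? null : identity.get(mid);
                }
                if (pos != null) {
                    double offset = pos - start;
                    sumSq += offset * offset;
                    count++;
                }
            }
            if (count == 0) break; // no residue survives n transforms
            double rms = Math.sqrt(sumSq / count);
            if (n > 1 && rms < (1 - minimumMetricChange) * bestRms) {
                return n; // abrupt decrease: n is taken as the symmetry order
            }
            bestRms = Math.min(bestRms, rms);
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cyclic = new HashMap<>();
        Map<Integer, Integer> identity = new HashMap<>();
        for (int i = 0; i < 12; i++) {
            cyclic.put(i, (i + 4) % 12); // repeat k aligns to repeat k+1
            identity.put(i, i);          // self-alignment: same numbering
        }
        System.out.println(symmetryOrder(cyclic, identity, 8, 0.4f)); // 3

        Map<Integer, Integer> shift = new HashMap<>();
        for (int i = 0; i < 10; i++) shift.put(i, i + 1); // simple topology, no repeats
        System.out.println(symmetryOrder(shift, identity, 8, 0.4f)); // -1
    }
}
```

For the cyclic example the RMS offset is about 5.66 at n=1 and n=2 and exactly 0 at n=3, which is the abrupt drop the method looks for; for the shift example the offset only grows with n.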

getSymmetryOrder

public static int getSymmetryOrder(AFPChain afpChain,
                                   int maxSymmetry,
                                   float minimumMetricChange)
                            throws StructureException
Guesses the order of symmetry in an alignment.

Uses getSymmetryOrder(Map alignment, Map identity, int, float) to determine the symmetry order. For the identity alignment, it sorts the aligned residues of each protein sequentially, then defines the i-th residues of each protein to be equivalent.

Throws:
StructureException

guessSequentialAlignment

public static Map<Integer,Integer> guessSequentialAlignment(Map<Integer,Integer> alignment,
                                                            boolean inverseAlignment)
Takes a potentially non-sequential alignment and guesses a sequential version of it. Residues from each structure are sorted sequentially and then compared directly.

The results of this method are consistent with what one might expect from an identity function, and are therefore useful with getSymmetryOrder(Map, Map identity, int, float).

Example:

A non-sequential alignment, represented schematically as
 12456789
 78912345
would result in a map
 12456789
 12345789

Parameters:
alignment - The non-sequential input alignment
inverseAlignment - If false, map from structure1 to structure2. If true, generate the inverse of that map.
Returns:
A mapping from sequential residues of one protein to those of the other
Throws:
IllegalArgumentException - if the input alignment is not one-to-one.
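A minimal sketch of the sort-and-pair step, assuming the input is a plain Map<Integer,Integer>; the class name is hypothetical and this is not BioJava's source. Applied to the example above, it reproduces the map shown there.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SequentialGuessSketch {

    /** Sorts the residues of each structure and pairs them by rank. */
    static Map<Integer, Integer> guessSequential(Map<Integer, Integer> alignment, boolean inverse) {
        // Keys are unique by construction; values must be unique too
        if (new HashSet<>(alignment.values()).size() != alignment.size()) {
            throw new IllegalArgumentException("Alignment is not one-to-one");
        }
        List<Integer> res1 = new ArrayList<>(alignment.keySet());
        List<Integer> res2 = new ArrayList<>(alignment.values());
        Collections.sort(res1);
        Collections.sort(res2);
        Map<Integer, Integer> guess = new TreeMap<>();
        for (int i = 0; i < res1.size(); i++) {
            // The i-th residue of each structure is declared equivalent
            if (inverse) {
                guess.put(res2.get(i), res1.get(i));
            } else {
                guess.put(res1.get(i), res2.get(i));
            }
        }
        return guess;
    }

    public static void main(String[] args) {
        int[] r1 = {1, 2, 4, 5, 6, 7, 8, 9};
        int[] r2 = {7, 8, 9, 1, 2, 3, 4, 5};
        Map<Integer, Integer> aln = new TreeMap<>();
        for (int i = 0; i < r1.length; i++) aln.put(r1[i], r2[i]);
        System.out.println(guessSequential(aln, false));
        // prints {1=1, 2=2, 4=3, 5=4, 6=5, 7=7, 8=8, 9=9}
    }
}
```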

getOptAlnAsList

public static List<List<List<Integer>>> getOptAlnAsList(AFPChain afpChain)
Retrieves the optimal alignment from an AFPChain and returns it as a Java collection. The result is indexed in the same way as AFPChain.getOptAln(), but has the correct size().
 List<List<List<Integer>>> aln = getOptAlnAsList(afpChain);
 int residue = aln.get(blockNum).get(structureNum).get(pos); // structureNum is 0 or 1

Parameters:
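afpChain - An alignment
Returns:
The optimal alignment as nested lists, indexed by block, structure (0 or 1), and alignment position.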

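The conversion can be sketched as copying each block into exactly-sized lists. This assumes the raw arrays follow the [block][structure][position] layout of AFPChain.getOptAln() and that a separate per-block length array (such as the one returned by AFPChain.getOptLen(), assumed here) says how many positions are filled. The class name and parameters are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class OptAlnAsListSketch {

    /** Copies the first blockLens[b] positions of each block into exactly-sized lists. */
    static List<List<List<Integer>>> optAlnAsList(int[][][] optAln, int[] blockLens) {
        List<List<List<Integer>>> blocks = new ArrayList<>(optAln.length);
        for (int b = 0; b < optAln.length; b++) {
            List<List<Integer>> block = new ArrayList<>(2);
            for (int str = 0; str < 2; str++) { // structure 0, then structure 1
                List<Integer> positions = new ArrayList<>(blockLens[b]);
                for (int pos = 0; pos < blockLens[b]; pos++) {
                    positions.add(optAln[b][str][pos]);
                }
                block.add(positions);
            }
            blocks.add(block);
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Inner arrays over-allocated to length 4; only 3 positions are used
        int[][][] optAln = { { {1, 2, 3, 0}, {5, 6, 7, 0} } };
        int[] blockLens = {3};
        List<List<List<Integer>>> aln = optAlnAsList(optAln, blockLens);
        System.out.println(aln.get(0).get(1));        // [5, 6, 7]
        System.out.println(aln.get(0).get(0).size()); // 3
    }
}
```

Trimming to the filled length is what gives the lists a meaningful size(), unlike the padded arrays.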
createAFPChain

public static AFPChain createAFPChain(Atom[] ca1,
                                      Atom[] ca2,
                                      ResidueNumber[] aligned1,
                                      ResidueNumber[] aligned2)
Fundamentally, an alignment is just a list of aligned residues in each protein. This method converts two lists of ResidueNumbers into an AFPChain.

Parameters:
ca1 - CA atoms of the first protein
ca2 - CA atoms of the second protein
aligned1 - A list of aligned residues from the first protein
aligned2 - A list of aligned residues from the second protein. Must be the same length as aligned1.
Returns:
An AFPChain representing the alignment. Many properties may be null or set to default values.
Throws:
IllegalArgumentException - if aligned1 and aligned2 have different lengths
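A central step of such a conversion is locating each aligned residue in the corresponding CA array. The sketch below illustrates only that step, and it is hypothetical throughout: strings such as "A:42" stand in for ResidueNumber, arrays of those strings stand in for Atom[], and throwing on an unknown residue is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CreateAlignmentSketch {

    /**
     * Converts parallel lists of aligned residue identifiers into parallel index
     * arrays into each structure's CA array (row 0: protein 1, row 1: protein 2).
     */
    static int[][] toIndices(String[] caIds1, String[] caIds2, String[] aligned1, String[] aligned2) {
        if (aligned1.length != aligned2.length) {
            throw new IllegalArgumentException("aligned1 and aligned2 have different lengths");
        }
        Map<String, Integer> index1 = positionsOf(caIds1);
        Map<String, Integer> index2 = positionsOf(caIds2);
        int[][] result = new int[2][aligned1.length];
        for (int i = 0; i < aligned1.length; i++) {
            Integer p1 = index1.get(aligned1[i]);
            Integer p2 = index2.get(aligned2[i]);
            if (p1 == null || p2 == null) {
                throw new IllegalArgumentException("Residue not found at alignment position " + i);
            }
            result[0][i] = p1;
            result[1][i] = p2;
        }
        return result;
    }

    /** Maps each residue identifier to its position in the CA array. */
    private static Map<String, Integer> positionsOf(String[] ids) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < ids.length; i++) m.put(ids[i], i);
        return m;
    }

    public static void main(String[] args) {
        String[] ca1 = {"A:10", "A:11", "A:12", "A:13"};
        String[] ca2 = {"B:1", "B:2", "B:3"};
        int[][] idx = toIndices(ca1, ca2, new String[] {"A:11", "A:13"}, new String[] {"B:1", "B:3"});
        System.out.println(Arrays.toString(idx[0])); // [1, 3]
        System.out.println(Arrays.toString(idx[1])); // [0, 2]
    }
}
```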