org.biojava3.aaproperties
Interface IPeptideProperties

All Known Implementing Classes:
PeptidePropertiesImpl

public interface IPeptideProperties

An interface for computing basic physico-chemical properties of protein sequences.
The following properties can be computed:

Molecular weight
Absorbance
Extinction coefficient
Instability index
Aliphatic index
Average hydropathy value
Isoelectric point
Net charge at pH 7
Composition of a specified amino acid
Composition of the 20 standard amino acids

Version:
2011.05.09
Author:
kohchuanhock
See Also:
PeptideProperties

Method Summary
 Map<AminoAcidCompound,Double> getAAComposition(ProteinSequence sequence)
          Returns the composition of the 20 standard amino acids in the sequence.
 double getAbsorbance(ProteinSequence sequence, boolean assumeCysReduced)
          Returns the absorbance (optical density) of sequence.
 double getApliphaticIndex(ProteinSequence sequence)
          Returns the aliphatic index of sequence.
 double getAvgHydropathy(ProteinSequence sequence)
          Returns the average hydropathy value of sequence.
 double getEnrichment(ProteinSequence sequence, AminoAcidCompound aminoAcidCode)
          Returns the composition of the specified amino acid in the sequence.
 double getExtinctionCoefficient(ProteinSequence sequence, boolean assumeCysReduced)
          Returns the extinction coefficient of sequence.
 double getInstabilityIndex(ProteinSequence sequence)
          Returns the instability index of sequence.
 double getIsoelectricPoint(ProteinSequence seuqence)
          Returns the isoelectric point of sequence using the default (Expasy) pKa values.
 double getIsoelectricPoint(ProteinSequence sequence, boolean useExpasyValues)
          Returns the isoelectric point of sequence.
 double getMolecularWeight(ProteinSequence sequence)
          Returns the molecular weight of sequence.
 double getMolecularWeight(ProteinSequence sequence, File aminoAcidCompositionFile)
          Returns the molecular weight of sequence.
 double getMolecularWeight(ProteinSequence sequence, File elementMassFile, File aminoAcidCompositionFile)
          Returns the molecular weight of sequence.
 double getMolecularWeightBasedOnXML(ProteinSequence sequence, AminoAcidCompositionTable aminoAcidCompositionTable)
          Returns the molecular weight of sequence.
 double getNetCharge(ProteinSequence sequence)
          Returns the net charge of sequence at pH 7 using the default (Expasy) pKa values.
 double getNetCharge(ProteinSequence sequence, boolean useExpasyValues)
          Returns the net charge of sequence at pH 7.
 double getNetCharge(ProteinSequence sequence, boolean useExpasyValues, double pHPoint)
          Returns the net charge of sequence at the given pH.
 AminoAcidCompositionTable obtainAminoAcidCompositionTable(File aminoAcidCompositionFile)
          Initializes an amino acid composition table from the input XML file and returns it for use in future calls to IPeptideProperties.getMolecularWeightBasedOnXML(ProteinSequence, AminoAcidCompositionTable).
 AminoAcidCompositionTable obtainAminoAcidCompositionTable(File elementMassFile, File aminoAcidCompositionFile)
          Initializes an amino acid composition table from the input XML files and returns it for use in future calls to IPeptideProperties.getMolecularWeightBasedOnXML(ProteinSequence, AminoAcidCompositionTable).
 

Method Detail

getMolecularWeight

double getMolecularWeight(ProteinSequence sequence)
Returns the molecular weight of sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. This method sums the molecular weight of each amino acid in the sequence. The molecular weights used are those documented here.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
Returns:
the total molecular weight of sequence plus the weight of one water molecule
See Also:
ProteinSequence
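The summation described above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not BioJava's implementation: the class `MolecularWeightSketch` is hypothetical, and its table holds only three average residue masses (in daltons) chosen for the example, which may differ slightly from the values BioJava uses.

```java
import java.util.Map;

final class MolecularWeightSketch {
    // Average residue masses in Da for a few amino acids (assumed, illustrative subset only).
    private static final Map<Character, Double> RESIDUE_MASS = Map.of(
            'G', 57.0519,
            'A', 71.0788,
            'S', 87.0782);

    // Average mass of one water molecule, added once for the free N- and C-termini.
    private static final double WATER = 18.01524;

    static double molecularWeight(String sequence) {
        double total = WATER;
        for (char residue : sequence.toCharArray()) {
            Double mass = RESIDUE_MASS.get(residue);
            if (mass == null) {
                // Residues outside the table are rejected, mirroring the non-ambiguous precondition.
                throw new IllegalArgumentException("Unsupported residue: " + residue);
            }
            total += mass;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(molecularWeight("GAS"));
    }
}
```

One water molecule is added per chain because each peptide bond releases one water: a chain of n residues keeps exactly one water beyond its n residue masses.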

getMolecularWeight

double getMolecularWeight(ProteinSequence sequence,
                          File aminoAcidCompositionFile)
                          throws JAXBException,
                                 FileNotFoundException
Returns the molecular weight of sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. This method sums the molecular weight of each amino acid in the sequence. Molecular weights are based on the input files, which must be XML files that conform to the defined schema. Note that this method assumes that the ElementMass.xml file can be found in the default location.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
aminoAcidCompositionFile - XML file that details the composition of amino acids
Returns:
the total molecular weight of sequence plus the weight of one water molecule
Throws:
JAXBException - thrown if unable to properly parse either the default ElementMass.xml file or aminoAcidCompositionFile
FileNotFoundException - thrown if either the default ElementMass.xml file or aminoAcidCompositionFile is not found

getMolecularWeight

double getMolecularWeight(ProteinSequence sequence,
                          File elementMassFile,
                          File aminoAcidCompositionFile)
                          throws JAXBException,
                                 FileNotFoundException
Returns the molecular weight of sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. This method sums the molecular weight of each amino acid in the sequence. Molecular weights are based on the input files, which must be XML files that conform to the defined schema.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
elementMassFile - XML file that details the mass of each element and isotope
aminoAcidCompositionFile - XML file that details the composition of amino acids
Returns:
the total molecular weight of sequence plus the weight of one water molecule
Throws:
JAXBException - thrown if unable to properly parse either elementMassFile or aminoAcidCompositionFile
FileNotFoundException - thrown if either elementMassFile or aminoAcidCompositionFile is not found

getMolecularWeightBasedOnXML

double getMolecularWeightBasedOnXML(ProteinSequence sequence,
                                    AminoAcidCompositionTable aminoAcidCompositionTable)
Returns the molecular weight of sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. This method sums the molecular weight of each amino acid in the sequence. Molecular weights are taken from the given AminoAcidCompositionTable, which must have been built from XML input files that conform to the defined schema.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
aminoAcidCompositionTable - an amino acid composition table obtained by calling IPeptideProperties.obtainAminoAcidCompositionTable
Returns:
the total molecular weight of sequence plus the weight of one water molecule

obtainAminoAcidCompositionTable

AminoAcidCompositionTable obtainAminoAcidCompositionTable(File aminoAcidCompositionFile)
                                                          throws JAXBException,
                                                                 FileNotFoundException
Initializes an amino acid composition table from the input XML file and returns it for use in future calls to IPeptideProperties.getMolecularWeightBasedOnXML(ProteinSequence, AminoAcidCompositionTable). Note that ElementMass.xml is assumed to be available in the default location.

Parameters:
aminoAcidCompositionFile - XML file that details the composition of amino acids
Returns:
the initialized amino acid composition table
Throws:
JAXBException - thrown if unable to properly parse either the default ElementMass.xml file or aminoAcidCompositionFile
FileNotFoundException - thrown if either the default ElementMass.xml file or aminoAcidCompositionFile is not found

obtainAminoAcidCompositionTable

AminoAcidCompositionTable obtainAminoAcidCompositionTable(File elementMassFile,
                                                          File aminoAcidCompositionFile)
                                                          throws JAXBException,
                                                                 FileNotFoundException
Initializes an amino acid composition table from the input XML files and returns it for use in future calls to IPeptideProperties.getMolecularWeightBasedOnXML(ProteinSequence, AminoAcidCompositionTable).

Parameters:
elementMassFile - XML file that details the mass of each element and isotope
aminoAcidCompositionFile - XML file that details the composition of amino acids
Returns:
the initialized amino acid composition table
Throws:
JAXBException - thrown if unable to properly parse either elementMassFile or aminoAcidCompositionFile
FileNotFoundException - thrown if either elementMassFile or aminoAcidCompositionFile are not found
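The element mass file and the amino acid composition file separate two concerns: the atomic masses, and how many atoms of each element a residue contains. The sketch below assumes both have already been parsed into maps (no XML or JAXB handling is shown) and mirrors the obtain-once, reuse-many pattern: the constructor plays the role of obtainAminoAcidCompositionTable, and molecularWeight plays the role of getMolecularWeightBasedOnXML. The class name, the map layout and the approximate average atomic masses in main are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class CompositionTableSketch {
    private final Map<Character, Double> residueMass = new HashMap<>();
    private final double waterMass;

    // "Obtain" step, done once: residue masses are precomputed from element masses and atom counts.
    CompositionTableSketch(Map<String, Double> elementMass,
                           Map<Character, Map<String, Integer>> residueComposition) {
        for (Map.Entry<Character, Map<String, Integer>> e : residueComposition.entrySet()) {
            residueMass.put(e.getKey(), massOf(e.getValue(), elementMass));
        }
        waterMass = massOf(Map.of("H", 2, "O", 1), elementMass);
    }

    // Mass of a composition = sum over elements of (atom count * element mass).
    private static double massOf(Map<String, Integer> composition, Map<String, Double> elementMass) {
        double mass = 0.0;
        for (Map.Entry<String, Integer> atom : composition.entrySet()) {
            Double m = elementMass.get(atom.getKey());
            if (m == null) throw new IllegalArgumentException("Unknown element: " + atom.getKey());
            mass += atom.getValue() * m;
        }
        return mass;
    }

    // "Based on table" step, reused for many sequences: plain map lookups, no file access.
    double molecularWeight(String sequence) {
        double total = waterMass;
        for (char residue : sequence.toCharArray()) {
            Double mass = residueMass.get(residue);
            if (mass == null) throw new IllegalArgumentException("Unsupported residue: " + residue);
            total += mass;
        }
        return total;
    }

    public static void main(String[] args) {
        // Glycine residue (C2H3NO, i.e. glycine minus the water lost in a peptide bond).
        CompositionTableSketch table = new CompositionTableSketch(
                Map.of("C", 12.0107, "H", 1.00794, "N", 14.0067, "O", 15.9994),
                Map.of('G', Map.of("C", 2, "H", 3, "N", 1, "O", 1)));
        for (String peptide : new String[] {"G", "GG", "GGG"}) {
            System.out.println(peptide + " " + table.molecularWeight(peptide));
        }
    }
}
```

Because residue masses are computed in the constructor, each molecularWeight call costs only one map lookup per residue, which is the benefit of obtaining the table once.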

getExtinctionCoefficient

double getExtinctionCoefficient(ProteinSequence sequence,
                                boolean assumeCysReduced)
Returns the extinction coefficient of sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. The extinction coefficient indicates how much light a protein absorbs at a certain wavelength. An estimate of this coefficient is useful when following a protein with a spectrophotometer during purification. The computation of the extinction coefficient follows the documentation here.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
assumeCysReduced - true if Cys are assumed to be reduced and false if Cys are assumed to form cystines
Returns:
the extinction coefficient of sequence
See Also:
ProteinSequence
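Assuming the linked documentation is the ExPASy ProtParam description (the link itself is not preserved here), the coefficient at 280 nm in water is a weighted count of Trp, Tyr and cystines: E = 5500*nW + 1490*nY + 125*nCystine, in M^-1 cm^-1. The hypothetical helper below sketches that formula, counting each pair of Cys as one cystine unless Cys are assumed reduced.

```java
final class ExtinctionCoefficientSketch {
    // Molar extinction coefficients at 280 nm, in M^-1 cm^-1 (ProtParam values; assumed to match).
    private static final int EXT_TRP = 5500;
    private static final int EXT_TYR = 1490;
    private static final int EXT_CYSTINE = 125;

    static double extinctionCoefficient(String sequence, boolean assumeCysReduced) {
        int trp = 0, tyr = 0, cys = 0;
        for (char residue : sequence.toCharArray()) {
            switch (residue) {
                case 'W': trp++; break;
                case 'Y': tyr++; break;
                case 'C': cys++; break;
                default: break;
            }
        }
        // Reduced Cys contribute nothing; otherwise every pair of Cys forms one cystine.
        int cystines = assumeCysReduced ? 0 : cys / 2;
        return trp * EXT_TRP + tyr * EXT_TYR + cystines * EXT_CYSTINE;
    }

    public static void main(String[] args) {
        System.out.println(extinctionCoefficient("MWYCCK", false));
        System.out.println(extinctionCoefficient("MWYCCK", true));
    }
}
```

For "WYCC" the two settings give 7115 (one cystine) and 6990 (all Cys reduced).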

getAbsorbance

double getAbsorbance(ProteinSequence sequence,
                     boolean assumeCysReduced)
Returns the absorbance (optical density) of sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. The computation of absorbance (optical density) follows the documentation here.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
assumeCysReduced - true if Cys are assumed to be reduced and false if Cys are assumed to form cystines
Returns:
the absorbance (optical density) of sequence
See Also:
ProteinSequence
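If the definition matches the ProtParam one (an assumption, since the link is lost), the absorbance of a 0.1% (1 g/L) solution at 280 nm over a 1 cm path is the molar extinction coefficient divided by the molecular weight. The hypothetical helper below takes both values as inputs rather than computing them from the sequence.

```java
final class AbsorbanceSketch {
    // Absorbance of a 1 g/L solution, 1 cm path:
    // extinction coefficient (M^-1 cm^-1) / molecular weight (g/mol) gives L g^-1 cm^-1.
    static double absorbance(double extinctionCoefficient, double molecularWeight) {
        if (molecularWeight <= 0) {
            throw new IllegalArgumentException("Molecular weight must be positive");
        }
        return extinctionCoefficient / molecularWeight;
    }

    public static void main(String[] args) {
        // Hypothetical protein with one Trp and one Tyr (5500 + 1490) and a weight of 12000 Da.
        System.out.println(absorbance(5500 + 1490, 12000.0));
    }
}
```

For example, E = 5500 M^-1 cm^-1 and a molecular weight of 11000 Da give an absorbance of 0.5.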

getInstabilityIndex

double getInstabilityIndex(ProteinSequence sequence)
Returns the instability index of sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. The instability index provides an estimate of the stability of your protein in a test tube. The computation of the instability index follows the documentation here. A protein whose instability index is smaller than 40 is predicted to be stable; a value above 40 predicts that the protein may be unstable.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
Returns:
the instability index of sequence
See Also:
ProteinSequence

getApliphaticIndex

double getApliphaticIndex(ProteinSequence sequence)
Returns the aliphatic index of sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. The aliphatic index of a protein is defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine). It may be regarded as a positive factor for the increase of thermostability of globular proteins. The computation of the aliphatic index follows the documentation here.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
Returns:
the aliphatic index of sequence
See Also:
ProteinSequence
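The ProtParam documentation gives the aliphatic index (Ikai, 1980) as AI = X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu)), where X are mole percents and a = 2.9 and b = 3.9 are the relative side-chain volumes of Val and of Ile/Leu compared with Ala. Assuming BioJava follows that definition, a minimal sketch looks like this; the class name is hypothetical.

```java
final class AliphaticIndexSketch {
    // Relative side-chain volume coefficients of Val (a) and Ile/Leu (b) versus Ala.
    private static final double A = 2.9;
    private static final double B = 3.9;

    static double aliphaticIndex(String sequence) {
        if (sequence.isEmpty()) {
            throw new IllegalArgumentException("Empty sequence");
        }
        int ala = 0, val = 0, ileLeu = 0;
        for (char residue : sequence.toCharArray()) {
            if (residue == 'A') ala++;
            else if (residue == 'V') val++;
            else if (residue == 'I' || residue == 'L') ileLeu++;
        }
        // Weighted counts converted to mole percent (multiply by 100, divide by length).
        return 100.0 * (ala + A * val + B * ileLeu) / sequence.length();
    }

    public static void main(String[] args) {
        System.out.println(aliphaticIndex("MKVLAAGIL"));
    }
}
```

For "AAVV", Ala and Val are each 50 mole percent, so the index is 50 + 2.9 * 50 = 195.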

getAvgHydropathy

double getAvgHydropathy(ProteinSequence sequence)
Returns the average hydropathy value of sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. The average value for a sequence is calculated as the sum of the hydropathy values of all the amino acids, divided by the number of residues in the sequence. Hydropathy values are based on Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
Returns:
the average hydropathy value of sequence
See Also:
ProteinSequence
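The averaging described above can be sketched as follows, using the published Kyte and Doolittle (1982) scale. The class and method names are hypothetical, and characters outside the 20 standard residues are rejected to reflect the non-ambiguous precondition.

```java
final class HydropathySketch {
    private static final String RESIDUES = "ARNDCQEGHILKMFPSTWYV";
    // Kyte & Doolittle (1982) hydropathy values, in the same order as RESIDUES.
    private static final double[] KYTE_DOOLITTLE = {
        1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
        3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2
    };

    static double averageHydropathy(String sequence) {
        if (sequence.isEmpty()) {
            throw new IllegalArgumentException("Empty sequence");
        }
        double sum = 0.0;
        for (char residue : sequence.toCharArray()) {
            int i = RESIDUES.indexOf(residue);
            if (i < 0) {
                throw new IllegalArgumentException("Unsupported residue: " + residue);
            }
            sum += KYTE_DOOLITTLE[i];
        }
        // Average over the number of residues.
        return sum / sequence.length();
    }

    public static void main(String[] args) {
        System.out.println(averageHydropathy("MKVLAAGIL"));
    }
}
```

For "AIV" the result is (1.8 + 4.5 + 4.2) / 3 = 3.5; positive averages indicate an overall hydrophobic sequence.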

getIsoelectricPoint

double getIsoelectricPoint(ProteinSequence sequence,
                           boolean useExpasyValues)
Returns the isoelectric point of sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. The isoelectric point is the pH at which the protein carries no net electrical charge. The isoelectric point is computed using the approach stated here. The pKa values used are either those used by Expasy, which reference "Electrophoresis 1994, 15, 529-539", or those from A. Lehninger, Principles of Biochemistry, 4th Edition (2005), Chapter 3, page 78, Table 3-1.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
useExpasyValues - whether to use Expasy values (Default) or Innovagen values
Returns:
the isoelectric point of sequence
See Also:
ProteinSequence
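Because a protein's net charge decreases monotonically as pH rises, the isoelectric point can be found by bisection on the net-charge function. The sketch below assumes a Henderson-Hasselbalch charge model and uses illustrative pKa values that are NOT the Expasy or Lehninger sets mentioned above; the class name and constants are hypothetical.

```java
final class IsoelectricPointSketch {
    // Illustrative pKa values (assumed); BioJava uses the Expasy or Lehninger sets instead.
    private static final double PKA_N_TERM = 8.6, PKA_C_TERM = 3.6;
    private static final double PKA_K = 10.8, PKA_R = 12.5, PKA_H = 6.5;
    private static final double PKA_D = 3.9, PKA_E = 4.1, PKA_C = 8.5, PKA_Y = 10.1;

    // Fraction of a basic group that is protonated (+1) at the given pH.
    private static double positive(double pKa, double pH) {
        return 1.0 / (1.0 + Math.pow(10.0, pH - pKa));
    }

    // Fraction of an acidic group that is deprotonated (-1) at the given pH.
    private static double negative(double pKa, double pH) {
        return 1.0 / (1.0 + Math.pow(10.0, pKa - pH));
    }

    static double netCharge(String sequence, double pH) {
        double charge = positive(PKA_N_TERM, pH) - negative(PKA_C_TERM, pH);
        for (char residue : sequence.toCharArray()) {
            switch (residue) {
                case 'K': charge += positive(PKA_K, pH); break;
                case 'R': charge += positive(PKA_R, pH); break;
                case 'H': charge += positive(PKA_H, pH); break;
                case 'D': charge -= negative(PKA_D, pH); break;
                case 'E': charge -= negative(PKA_E, pH); break;
                case 'C': charge -= negative(PKA_C, pH); break;
                case 'Y': charge -= negative(PKA_Y, pH); break;
                default: break;
            }
        }
        return charge;
    }

    // Bisection on [0, 14]: net charge is positive below the pI and negative above it.
    static double isoelectricPoint(String sequence) {
        double low = 0.0, high = 14.0;
        while (high - low > 1e-4) {
            double mid = (low + high) / 2.0;
            if (netCharge(sequence, mid) > 0) {
                low = mid;   // still positive: the pI lies at a higher pH
            } else {
                high = mid;
            }
        }
        return (low + high) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(isoelectricPoint("MKVLDEHY"));
    }
}
```

With these constants, a peptide with no ionizable side chains, such as "GG", has its pI midway between the terminal pKa values, (8.6 + 3.6) / 2 = 6.1.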

getIsoelectricPoint

double getIsoelectricPoint(ProteinSequence seuqence)
Returns the isoelectric point of sequence using the default (Expasy) pKa values.

getNetCharge

double getNetCharge(ProteinSequence sequence,
                    boolean useExpasyValues,
                    double pHPoint)
Returns the net charge of sequence at the given pH (pHPoint). The sequence argument must be a protein sequence consisting of only non-ambiguous characters. The net charge will be computed using the approach stated in

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
useExpasyValues - whether to use Expasy values (Default) or Innovagen values
pHPoint - the pH value to use for computation of the net charge. Defaults to 7.
Returns:
the net charge of sequence at given pHPoint
See Also:
ProteinSequence
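Assuming the usual Henderson-Hasselbalch model (the referenced approach is not named on this page), the net charge at a given pH counts the protonated fraction of each basic group and subtracts the deprotonated fraction of each acidic group. Here n is the number of groups of each kind, the pKa values come from the selected set, and the listed ionizable groups are a common choice rather than a confirmed detail of this implementation:

```latex
Z(\mathrm{pH}) = \sum_{i \in \{\mathrm{N\text{-}term},\, \mathrm{K},\, \mathrm{R},\, \mathrm{H}\}} \frac{n_i}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_{a,i}}}
  - \sum_{j \in \{\mathrm{C\text{-}term},\, \mathrm{D},\, \mathrm{E},\, \mathrm{C},\, \mathrm{Y}\}} \frac{n_j}{1 + 10^{\,\mathrm{p}K_{a,j} - \mathrm{pH}}}
```

The isoelectric point is then the pH at which Z(pH) = 0.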

getNetCharge

double getNetCharge(ProteinSequence sequence,
                    boolean useExpasyValues)
Returns the net charge of sequence at pH 7.

getNetCharge

double getNetCharge(ProteinSequence sequence)
Returns the net charge of sequence at pH 7 using the default (Expasy) pKa values.

getEnrichment

double getEnrichment(ProteinSequence sequence,
                     AminoAcidCompound aminoAcidCode)
Returns the composition of the specified amino acid in the sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. The aminoAcidCode must be a non-ambiguous character. The composition of an amino acid is the total number of its occurrences divided by the total length of the sequence.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
aminoAcidCode - the code of the amino acid to compute
Returns:
the composition of the specified amino acid in the sequence
See Also:
ProteinSequence, AminoAcidCompound
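The ratio described above can be sketched with a hypothetical helper that takes the residue as a plain char rather than an AminoAcidCompound, to keep the example self-contained.

```java
final class EnrichmentSketch {
    // Fraction of positions in the sequence occupied by the given residue (0.0 to 1.0).
    static double enrichment(String sequence, char aminoAcid) {
        if (sequence.isEmpty()) {
            throw new IllegalArgumentException("Empty sequence");
        }
        long count = sequence.chars().filter(c -> c == aminoAcid).count();
        return (double) count / sequence.length();
    }

    public static void main(String[] args) {
        System.out.println(enrichment("MKVLAAGIL", 'A'));
    }
}
```

For "AAG" and 'A', two of three positions match, giving about 0.667.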

getAAComposition

Map<AminoAcidCompound,Double> getAAComposition(ProteinSequence sequence)
Returns the composition of the 20 standard amino acids in the sequence. The sequence argument must be a protein sequence consisting of only non-ambiguous characters. The composition of an amino acid is the total number of its occurrences divided by the total length of the sequence.

Parameters:
sequence - a protein sequence consisting of non-ambiguous characters only
Returns:
the composition of the 20 standard amino acids in the sequence
See Also:
ProteinSequence, AminoAcidCompound
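A self-contained sketch of the full composition map follows. It assumes, for illustration, that all 20 standard residues appear as keys (with 0.0 for residues absent from the sequence) and uses Character keys instead of AminoAcidCompound; the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CompositionSketch {
    private static final String STANDARD = "ACDEFGHIKLMNPQRSTVWY";

    static Map<Character, Double> composition(String sequence) {
        if (sequence.isEmpty()) {
            throw new IllegalArgumentException("Empty sequence");
        }
        // Start every standard residue at zero so absent residues still appear in the result.
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char aa : STANDARD.toCharArray()) {
            counts.put(aa, 0);
        }
        for (char residue : sequence.toCharArray()) {
            if (!counts.containsKey(residue)) {
                throw new IllegalArgumentException("Non-standard residue: " + residue);
            }
            counts.merge(residue, 1, Integer::sum);
        }
        // Convert counts to fractions of the sequence length.
        Map<Character, Double> result = new LinkedHashMap<>();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) sequence.length());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(composition("MKVLAAGIL"));
    }
}
```

Because every residue is counted exactly once, the 20 values sum to 1.0 up to floating-point rounding.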