org.biojava3.core.sequence.loader
Class UniprotProxySequenceReader<C extends Compound>

java.lang.Object
  extended by org.biojava3.core.sequence.loader.UniprotProxySequenceReader<C>
Type Parameters:
C - the type of Compound held by the sequence
All Implemented Interfaces:
Iterable<C>, DatabaseReferenceInterface, FeaturesKeyWordInterface, Accessioned, ProxySequenceReader<C>, Sequence<C>, SequenceReader<C>

public class UniprotProxySequenceReader<C extends Compound>
extends Object
implements ProxySequenceReader<C>, FeaturesKeyWordInterface, DatabaseReferenceInterface

Pass in a UniProt ID; when this ProxySequenceReader is passed to a ProteinSequence, it retrieves the sequence data and the other data elements that UniProt associates with that protein. This is an example of how to map external databases of proteins and features to the BioJava3 ProteinSequence. It is important to call setUniprotDirectoryCache so that downloaded XML files are cached and do not need to be reloaded each time. This class does not manage the cache.
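
The cache-first behaviour described above can be sketched as follows. This is a minimal illustration, not BioJava's implementation: the class XmlCacheSketch, the one-file-per-accession naming scheme (accession + ".xml"), and the fetcher function standing in for the network call are all assumptions made for the example. Note that, as with the real class, nothing here expires or cleans up cached files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

/** Hypothetical cache-first loader; not part of BioJava. */
class XmlCacheSketch {
    /** Counts how often the (simulated) remote fetch was actually needed. */
    static int fetchCount = 0;

    /**
     * Returns the XML for an accession, reading it from cacheDir when a
     * cached copy exists and calling the fetcher (then caching) otherwise.
     */
    static String load(Path cacheDir, String accession, Function<String, String> fetcher) {
        // Assumed naming scheme: one file per accession, e.g. P12345.xml
        Path cached = cacheDir.resolve(accession + ".xml");
        try {
            if (Files.exists(cached)) {
                return new String(Files.readAllBytes(cached), StandardCharsets.UTF_8);
            }
            String xml = fetcher.apply(accession); // stands in for the download
            fetchCount++;
            Files.createDirectories(cacheDir);
            Files.write(cached, xml.getBytes(StandardCharsets.UTF_8));
            return xml;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling load twice with the same directory and accession invokes the fetcher only on the first call; the second call is served from the file on disk.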


Constructor Summary
UniprotProxySequenceReader(String accession, CompoundSet<C> compoundSet)
          The UniProt ID is used to retrieve the UniProt XML, which is then parsed as a DOM object so that all of the protein's data is available.
 
Method Summary
 int countCompounds(C... compounds)
          Returns the number of times we found a compound in the Sequence
 AccessionID getAccession()
          Returns the AccessionID this location is currently bound with
 List<C> getAsList()
          Returns the Sequence as a List of compounds
 C getCompoundAt(int position)
          Returns the Compound at the given biological index
 CompoundSet<C> getCompoundSet()
          Gets the compound set used to back this Sequence
 LinkedHashMap<String,ArrayList<DBReferenceInfo>> getDatabaseReferences()
          The Uniprot mappings to other database identifiers for this sequence
 String getGeneName()
          Get the gene name associated with this sequence.
 int getIndexOf(C compound)
          Scans through the Sequence looking for the first occurrence of the given compound
 SequenceView<C> getInverse()
          Does the right thing to get the inverse of the current Sequence.
 ArrayList<String> getKeyWords()
          Pulls the UniProt keywords, a mixed bag of words associated with this sequence
 int getLastIndexOf(C compound)
          Scans through the Sequence looking for the last occurrence of the given compound
 int getLength()
          The sequence length
 String getOrganismName()
          Get the organism name assigned to this sequence
 String getSequenceAsString()
          Returns the String representation of the Sequence
 String getSequenceAsString(Integer bioBegin, Integer bioEnd, Strand strand)
           
 SequenceView<C> getSubSequence(Integer bioBegin, Integer bioEnd)
          Returns the portion of the sequence between the given positions.
static String getUniprotbaseURL()
          The current UniProt URL, exposed to deal with caching issues.
static String getUniprotDirectoryCache()
          Local directory used to cache downloaded XML files
 Iterator<C> iterator()
           
static void main(String[] args)
           
 void setCompoundSet(CompoundSet<C> compoundSet)
           
 void setContents(String sequence)
          Once the sequence is retrieved, set the contents and make sure everything is valid
static void setUniprotbaseURL(String aUniprotbaseURL)
           
static void setUniprotDirectoryCache(String aUniprotDirectoryCache)
           
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

UniprotProxySequenceReader

public UniprotProxySequenceReader(String accession,
                                  CompoundSet<C> compoundSet)
                           throws Exception
The UniProt ID is used to retrieve the UniProt XML, which is then parsed as a DOM object so that all of the protein's data is available. An exception is thrown if an error occurs, for example because of a bad UniProt ID or a network error.

Parameters:
accession - the UniProt ID of the entry to retrieve
compoundSet - the compound set used to back the sequence
Throws:
Exception
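
As a rough illustration of the "parse the XML as a DOM object" step, the sketch below uses the JDK's own DOM parser to pull the sequence text out of a small XML string. The class DomSequenceSketch, the method extractSequence, and the simplified document layout (a single sequence element) are assumptions for the example; real UniProt XML is considerably richer, and this is not the code BioJava uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper; not part of BioJava. */
class DomSequenceSketch {
    /** Parses the XML into a DOM tree and returns the text of the first sequence element. */
    static String extractSequence(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed or unreadable XML, e.g. an error page returned for a bad ID
            throw new IllegalArgumentException("could not parse XML", e);
        }
        NodeList nodes = doc.getElementsByTagName("sequence");
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("no sequence element found");
        }
        // Sequence text is often wrapped across lines; strip all whitespace
        return nodes.item(0).getTextContent().replaceAll("\\s+", "");
    }
}
```

Both failure modes surface as exceptions here, which mirrors the constructor's contract that a bad ID or a failed retrieval results in an exception rather than an empty sequence.
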
Method Detail

setCompoundSet

public void setCompoundSet(CompoundSet<C> compoundSet)
Specified by:
setCompoundSet in interface SequenceReader<C extends Compound>

setContents

public void setContents(String sequence)
Once the sequence is retrieved, set the contents and make sure everything is valid

Specified by:
setContents in interface SequenceReader<C extends Compound>
Parameters:
sequence - the sequence contents to set

getLength

public int getLength()
The sequence length

Specified by:
getLength in interface Sequence<C extends Compound>
Returns:
the length of the sequence

getCompoundAt

public C getCompoundAt(int position)
Description copied from interface: Sequence
Returns the Compound at the given biological index

Specified by:
getCompoundAt in interface Sequence<C extends Compound>
Parameters:
position - the biological index of the compound
Returns:
the Compound at the given biological index

getIndexOf

public int getIndexOf(C compound)
Description copied from interface: Sequence
Scans through the Sequence looking for the first occurrence of the given compound

Specified by:
getIndexOf in interface Sequence<C extends Compound>
Parameters:
compound - the compound to look for
Returns:
the index of the first occurrence of the given compound

getLastIndexOf

public int getLastIndexOf(C compound)
Description copied from interface: Sequence
Scans through the Sequence looking for the last occurrence of the given compound

Specified by:
getLastIndexOf in interface Sequence<C extends Compound>
Parameters:
compound - the compound to look for
Returns:
the index of the last occurrence of the given compound

toString

public String toString()
Overrides:
toString in class Object
Returns:

getSequenceAsString

public String getSequenceAsString()
Description copied from interface: Sequence
Returns the String representation of the Sequence

Specified by:
getSequenceAsString in interface Sequence<C extends Compound>
Returns:
the String representation of the Sequence

getAsList

public List<C> getAsList()
Description copied from interface: Sequence
Returns the Sequence as a List of compounds

Specified by:
getAsList in interface Sequence<C extends Compound>
Returns:
the Sequence as a List of compounds

getInverse

public SequenceView<C> getInverse()
Description copied from interface: Sequence
Does the right thing to get the inverse of the current Sequence. This means reversing the Sequence and, where applicable, also complementing it.

Specified by:
getInverse in interface Sequence<C extends Compound>
Returns:
a view of the inverse of the current Sequence
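
To make "reverse and optionally complement" concrete, here is a small sketch that works on plain strings rather than on BioJava Sequence objects. The class InverseSketch and its methods are hypothetical names for this example, and the complement table covers only the four unambiguous DNA bases. For a protein sequence, which has no complement, only the reversal applies.

```java
/** Hypothetical helper; not part of BioJava. */
class InverseSketch {
    /** Reverses the sequence and, if requested, complements each DNA base. */
    static String inverse(String sequence, boolean complement) {
        StringBuilder out = new StringBuilder(sequence.length());
        // Walk the sequence from the last position to the first
        for (int i = sequence.length() - 1; i >= 0; i--) {
            char c = sequence.charAt(i);
            out.append(complement ? complementOf(c) : c);
        }
        return out.toString();
    }

    /** Watson-Crick complement of a single upper-case DNA base. */
    static char complementOf(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'G': return 'C';
            case 'C': return 'G';
            default: throw new IllegalArgumentException("not a DNA base: " + base);
        }
    }
}
```

With these definitions, inverse("MKTA", false) gives "ATKM" (reversal only), while inverse("ATGC", true) gives "GCAT" (reverse complement).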

getSequenceAsString

public String getSequenceAsString(Integer bioBegin,
                                  Integer bioEnd,
                                  Strand strand)
Parameters:
bioBegin -
bioEnd -
strand -
Returns:

getSubSequence

public SequenceView<C> getSubSequence(Integer bioBegin,
                                      Integer bioEnd)
Description copied from interface: Sequence
Returns the portion of the sequence between the given positions. Positions are indexed from 1.

Specified by:
getSubSequence in interface Sequence<C extends Compound>
Parameters:
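bioBegin - the start position, indexed from 1
bioEnd - the end position, indexed from 1
Returns:
a view of the requested portion of the sequence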
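
Because positions are indexed from 1, they do not line up with Java's 0-based String indices. The sketch below shows the conversion on a plain string. The class BioIndexSketch is a hypothetical name, and treating bioEnd as inclusive is an assumption made for this example; the text above states only that indexing starts at 1.

```java
/** Hypothetical helper; not part of BioJava. */
class BioIndexSketch {
    /**
     * Returns the residues from bioBegin to bioEnd, both 1-based.
     * bioEnd is treated as inclusive (an assumption of this sketch).
     */
    static String subSequence(String sequence, int bioBegin, int bioEnd) {
        if (bioBegin < 1 || bioEnd > sequence.length() || bioBegin > bioEnd) {
            throw new IndexOutOfBoundsException(
                    "invalid range " + bioBegin + ".." + bioEnd
                    + " for length " + sequence.length());
        }
        // 1-based inclusive start becomes 0-based start; the inclusive
        // 1-based end equals the exclusive 0-based end, so it is unchanged
        return sequence.substring(bioBegin - 1, bioEnd);
    }
}
```

For the sequence "MKTAYIAK", subSequence(sequence, 2, 4) yields "KTA", and subSequence(sequence, 1, 1) yields "M".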

iterator

public Iterator<C> iterator()
Specified by:
iterator in interface Iterable<C extends Compound>
Returns:

getCompoundSet

public CompoundSet<C> getCompoundSet()
Description copied from interface: Sequence
Gets the compound set used to back this Sequence

Specified by:
getCompoundSet in interface Sequence<C extends Compound>
Returns:
the compound set used to back this Sequence

getAccession

public AccessionID getAccession()
Description copied from interface: Accessioned
Returns the AccessionID this location is currently bound with

Specified by:
getAccession in interface Accessioned
Returns:
the AccessionID currently bound to this reader

countCompounds

public int countCompounds(C... compounds)
Description copied from interface: Sequence
Returns the number of times we found a compound in the Sequence

Specified by:
countCompounds in interface Sequence<C extends Compound>
Parameters:
compounds - the compounds to count
Returns:
the number of times the given compounds were found in the Sequence
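
The varargs signature means several compounds can be counted in one call. A minimal sketch of that idea on a plain string follows; the class CountSketch is a hypothetical name, and summing the occurrences of all listed compounds into a single total is an assumption about the intended semantics, not a statement about BioJava's implementation.

```java
/** Hypothetical helper; not part of BioJava. */
class CountSketch {
    /** Counts the positions in the sequence that hold any of the given compounds. */
    static int countCompounds(String sequence, char... compounds) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char residue = sequence.charAt(i);
            for (char compound : compounds) {
                if (residue == compound) {
                    count++;
                    break; // count each position once, even if a compound is listed twice
                }
            }
        }
        return count;
    }
}
```

For the sequence "MKTAYIAK", countCompounds(sequence, 'A', 'K') returns 4: two alanines plus two lysines.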

getUniprotbaseURL

public static String getUniprotbaseURL()
The current UniProt URL, exposed to deal with caching issues. www.uniprot.org is load balanced, but you can access pir.uniprot.org directly.

Returns:
the uniprotbaseURL

setUniprotbaseURL

public static void setUniprotbaseURL(String aUniprotbaseURL)
Parameters:
aUniprotbaseURL - the uniprotbaseURL to set

getUniprotDirectoryCache

public static String getUniprotDirectoryCache()
Local directory used to cache downloaded XML files

Returns:
the uniprotDirectoryCache

setUniprotDirectoryCache

public static void setUniprotDirectoryCache(String aUniprotDirectoryCache)
Parameters:
aUniprotDirectoryCache - the uniprotDirectoryCache to set

main

public static void main(String[] args)

getGeneName

public String getGeneName()
                   throws Exception
Get the gene name associated with this sequence.

Returns:
the gene name associated with this sequence
Throws:
Exception

getOrganismName

public String getOrganismName()
                       throws Exception
Get the organism name assigned to this sequence

Returns:
the organism name assigned to this sequence
Throws:
Exception

getKeyWords

public ArrayList<String> getKeyWords()
                              throws Exception
Pulls the UniProt keywords, a mixed bag of words associated with this sequence

Specified by:
getKeyWords in interface FeaturesKeyWordInterface
Returns:
the UniProt keywords associated with this sequence
Throws:
Exception

getDatabaseReferences

public LinkedHashMap<String,ArrayList<DBReferenceInfo>> getDatabaseReferences()
                                                                       throws Exception
The Uniprot mappings to other database identifiers for this sequence

Specified by:
getDatabaseReferences in interface DatabaseReferenceInterface
Returns:
the UniProt mappings to other database identifiers for this sequence
Throws:
Exception