org.biojava.bio.structure.align.util
Class AtomCache

java.lang.Object
  extended by org.biojava.bio.structure.align.util.AtomCache

public class AtomCache
extends Object

A utility class that provides easy access to Structure objects. If you are running a script that frequently reuses the same PDB structures, the AtomCache keeps an in-memory cache of the files for quicker access. The cache is a soft cache: it will not cause out-of-memory exceptions, because the Java virtual machine garbage-collects the cached data when it needs to free up space. The AtomCache is thread-safe.
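To illustrate what "soft cache" means here, the following is a minimal sketch in plain Java. It is not the AtomCache implementation: the class SoftCacheSketch, its get method and the loader argument are hypothetical names, and the sketch only assumes that cached values are held through java.lang.ref.SoftReference.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical illustration of a soft cache; not the BioJava implementation.
class SoftCacheSketch<K, V> {
    // Values are held through SoftReferences, so the garbage collector may
    // reclaim them when the JVM runs low on memory.
    private final ConcurrentMap<K, SoftReference<V>> entries = new ConcurrentHashMap<>();

    // Returns the cached value, or loads and caches it if it is absent
    // or has already been collected.
    V get(K key, Function<K, V> loader) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Two threads may both reach this point and load the same key;
            // the map stays consistent, at the cost of duplicated work.
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCacheSketch<String, String> cache = new SoftCacheSketch<>();
        // The loader stands in for an expensive parse of a PDB file.
        String first = cache.get("1TIM", id -> "parsed " + id);
        String second = cache.get("1TIM", id -> "parsed " + id);
        System.out.println(first + " / same instance: " + (first == second));
    }
}
```

As long as the caller holds a strong reference to a value, it stays in the cache; once only the soft reference remains, the JVM is free to drop it and the next request reloads it.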

Since:
3.0
Author:
Andreas Prlic, Spencer Bliven, Peter Rose

Field Summary
static String CHAIN_NR_SYMBOL
           
static String CHAIN_SPLIT_SYMBOL
           
static String PDP_DOMAIN_IDENTIFIER
           
protected  PDPProvider pdpprovider
           
static Pattern scopIDregex
           
static String UNDERSCORE
           
 
Constructor Summary
AtomCache()
          Default AtomCache constructor.
AtomCache(String pdbFilePath, boolean isSplit)
          Creates an instance of an AtomCache that points to a particular path in the file system.
AtomCache(UserConfiguration config)
          Creates a new AtomCache object based on the provided UserConfiguration.
 
Method Summary
 Atom[] getAtoms(String name)
          Returns the CA atoms for the provided name.
 Atom[] getAtoms(String name, boolean clone)
          Deprecated. Does the same as getAtoms(String).
 Structure getBiologicalAssembly(String pdbId, int bioAssemblyId, boolean bioAssemblyFallback)
          Loads the biological assembly for a given PDB ID and bioAssemblyId.
 Structure getBiologicalUnit(String pdbId)
          Loads the default biological unit (*.pdb1.gz) file.
 FileParsingParameters getFileParsingParams()
           
 String getPath()
          Get the path that is used to cache PDB files.
 PDPProvider getPdpprovider()
           
 ScopDatabase getScopInstallation()
           
 Structure getStructure(String name)
          Request a Structure based on a name.
 Structure getStructureForDomain(ScopDomain domain)
          Returns the representation of a ScopDomain as a BioJava Structure object.
 boolean isAutoFetch()
          Does the cache automatically download files that are missing from the local installation from the PDB FTP site?
 boolean isFetchCurrent()
          N.B. This feature only takes effect if the structure was not found locally and autoFetch is set to true.
 boolean isFetchFileEvenIfObsolete()
          Forces the cache to fetch the file if its status is OBSOLETE.
 boolean isSplit()
          Is the organization of files within the directory split, as on the PDB FTP servers, or are all files contained in one directory?
 boolean isStrictSCOP()
          Reports whether strict scop naming will be enforced, or whether this AtomCache should try to guess some simple variants on scop domains.
 void notifyShutdown()
          Send a signal to the cache that the system is shutting down.
 void setAutoFetch(boolean autoFetch)
          Does the cache automatically download files that are missing from the local installation from the PDB FTP site?
 void setFetchCurrent(boolean fetchNewestCurrent)
          If enabled, the reader searches for the newest possible PDB ID if it is not present in the local installation.
 void setFetchFileEvenIfObsolete(boolean fetchFileEvenIfObsolete)
          N.B. This feature only takes effect if the structure was not found locally and autoFetch is set to true.
 void setFileParsingParams(FileParsingParameters params)
           
 void setPath(String path)
          Set the path that is used to cache PDB files.
 void setPdpprovider(PDPProvider pdpprovider)
           
 void setSplit(boolean isSplit)
          Is the organization of files within the directory split, as on the PDB FTP servers, or are all files contained in one directory?
 void setStrictSCOP(boolean strictSCOP)
          When strictSCOP is enabled, SCOP domain identifiers (e.g. 'd1gbga_') are matched literally to the SCOP database.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CHAIN_NR_SYMBOL

public static final String CHAIN_NR_SYMBOL
See Also:
Constant Field Values

UNDERSCORE

public static final String UNDERSCORE
See Also:
Constant Field Values

CHAIN_SPLIT_SYMBOL

public static final String CHAIN_SPLIT_SYMBOL
See Also:
Constant Field Values

pdpprovider

protected PDPProvider pdpprovider

PDP_DOMAIN_IDENTIFIER

public static final String PDP_DOMAIN_IDENTIFIER
See Also:
Constant Field Values

scopIDregex

public static final Pattern scopIDregex
Constructor Detail

AtomCache

public AtomCache()
Default AtomCache constructor. Usually stores files in a temp directory, but this can be overridden by setting the PDB_DIR variable at runtime.

See Also:
UserConfiguration.UserConfiguration()

AtomCache

public AtomCache(String pdbFilePath,
                 boolean isSplit)
Creates an instance of an AtomCache that points to a particular path in the file system.

Parameters:
pdbFilePath - a directory in the file system to use as a location to cache files.
isSplit - a flag to indicate if the directory organisation is "split" as on the PDB ftp servers, or if all files are contained in one directory.
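The difference between the two directory organisations can be sketched as follows. This is an illustration only, not the AtomCache code: it assumes the convention of the PDB FTP "divided" layout, where each file sits in a subdirectory named after the two middle characters of the PDB ID, and it assumes the file name pattern pdbXXXX.ent.gz. The class SplitLayoutSketch and the method locate are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper; the file naming below is an assumption.
class SplitLayoutSketch {

    // Computes where the file for a PDB ID would be expected in the cache directory.
    static Path locate(String cacheDir, String pdbId, boolean isSplit) {
        String id = pdbId.toLowerCase(Locale.ROOT);
        String fileName = "pdb" + id + ".ent.gz"; // assumed naming pattern
        if (isSplit) {
            // Split layout: subdirectory named after characters 2 and 3 of the ID.
            String subDir = id.substring(1, 3);
            return Paths.get(cacheDir, subDir, fileName);
        }
        // Flat layout: all files directly in the cache directory.
        return Paths.get(cacheDir, fileName);
    }

    public static void main(String[] args) {
        System.out.println(locate("/tmp/pdb", "4HHB", true));
        System.out.println(locate("/tmp/pdb", "4HHB", false));
    }
}
```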

AtomCache

public AtomCache(UserConfiguration config)
Creates a new AtomCache object based on the provided UserConfiguration.

Parameters:
config - the UserConfiguration to use for this cache.
Method Detail

getPath

public String getPath()
Get the path that is used to cache PDB files.

Returns:
path to a directory

setPath

public void setPath(String path)
Set the path that is used to cache PDB files.

Parameters:
path - to a directory

isSplit

public boolean isSplit()
Is the organization of files within the directory split, as on the PDB FTP servers, or are all files contained in one directory?

Returns:
flag

setSplit

public void setSplit(boolean isSplit)
Is the organization of files within the directory split, as on the PDB FTP servers, or are all files contained in one directory?

Parameters:
isSplit - flag

isAutoFetch

public boolean isAutoFetch()
Does the cache automatically download files that are missing from the local installation from the PDB FTP site?

Returns:
flag

setAutoFetch

public void setAutoFetch(boolean autoFetch)
Does the cache automatically download files that are missing from the local installation from the PDB FTP site?

Parameters:
autoFetch - flag

setFetchFileEvenIfObsolete

public void setFetchFileEvenIfObsolete(boolean fetchFileEvenIfObsolete)
N.B. This feature only takes effect if the structure was not found locally and autoFetch is set to true.

Parameters:
fetchFileEvenIfObsolete - the fetchFileEvenIfObsolete to set

isFetchFileEvenIfObsolete

public boolean isFetchFileEvenIfObsolete()
Forces the cache to fetch the file if its status is OBSOLETE. This feature has a higher priority than setFetchCurrent(boolean).
N.B. This feature only takes effect if the structure was not found locally and autoFetch is set to true.

Returns:
the fetchFileEvenIfObsolete
Since:
3.0.2
See Also:
fetchCurrent

setFetchCurrent

public void setFetchCurrent(boolean fetchNewestCurrent)
If enabled, the reader searches for the newest possible PDB ID if it is not present in the local installation. The setFetchFileEvenIfObsolete(boolean) function has a higher priority than this function.
N.B. This feature only takes effect if the structure was not found locally and autoFetch is set to true.

Parameters:
fetchNewestCurrent - the fetchCurrent value to set
Since:
3.0.2
See Also:
setFetchFileEvenIfObsolete(boolean)
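The interplay of autoFetch, fetchFileEvenIfObsolete and fetchCurrent described above can be summarised as a small decision function. This is only a sketch of the documented precedence for an entry that turns out to be obsolete, not the code AtomCache runs; the class FetchPolicySketch, the Action values and the behaviour chosen for the remaining cases are assumptions made for illustration.

```java
// Hypothetical summary of the documented flag precedence.
class FetchPolicySketch {

    // Possible outcomes; the names are illustrative, not part of BioJava.
    enum Action {
        USE_LOCAL_FILE,            // structure already present locally
        NO_DOWNLOAD,               // missing locally and autoFetch is off
        FETCH_OBSOLETE_ENTRY,      // download the obsolete entry itself
        FETCH_CURRENT_REPLACEMENT, // download the newest PDB ID instead
        FETCH_REQUESTED_ID         // plain download, no special handling
    }

    static Action decide(boolean foundLocally, boolean autoFetch,
                         boolean fetchFileEvenIfObsolete, boolean fetchCurrent) {
        if (foundLocally) {
            return Action.USE_LOCAL_FILE; // the fetch flags are not consulted
        }
        if (!autoFetch) {
            return Action.NO_DOWNLOAD; // both fetch flags require autoFetch
        }
        if (fetchFileEvenIfObsolete) {
            return Action.FETCH_OBSOLETE_ENTRY; // higher priority than fetchCurrent
        }
        if (fetchCurrent) {
            return Action.FETCH_CURRENT_REPLACEMENT;
        }
        return Action.FETCH_REQUESTED_ID;
    }

    public static void main(String[] args) {
        // Both flags set: fetchFileEvenIfObsolete wins.
        System.out.println(decide(false, true, true, true));
    }
}
```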

isFetchCurrent

public boolean isFetchCurrent()
N.B. This feature only takes effect if the structure was not found locally and autoFetch is set to true.

Returns:
the fetchCurrent

isStrictSCOP

public boolean isStrictSCOP()
Reports whether strict scop naming will be enforced, or whether this AtomCache should try to guess some simple variants on scop domains.

Returns:
true if scop names should be used strictly with no guessing

setStrictSCOP

public void setStrictSCOP(boolean strictSCOP)
When strictSCOP is enabled, SCOP domain identifiers (e.g. 'd1gbga_') are matched literally to the SCOP database. When disabled, some simple mistakes are corrected automatically. For instance, the invalid identifier 'd1gbg__' would be corrected to 'd1gbga_' automatically.

Parameters:
strictSCOP - Indicates whether strict scop names should be used.

getStructureForDomain

public Structure getStructureForDomain(ScopDomain domain)
                                throws IOException,
                                       StructureException
Returns the representation of a ScopDomain as a BioJava Structure object.

Parameters:
domain - a scop domain
Returns:
a Structure object.
Throws:
IOException
StructureException

getAtoms

public Atom[] getAtoms(String name)
                throws IOException,
                       StructureException
Returns the CA atoms for the provided name. See getStructure(String) for supported naming conventions.

Parameters:
name - the name of the structure; see getStructure(String) for supported naming conventions
Returns:
an array of Atoms.
Throws:
IOException
StructureException

getAtoms

public Atom[] getAtoms(String name,
                       boolean clone)
                throws IOException,
                       StructureException
Deprecated. Does the same as getAtoms(String).

Returns the CA atoms for the provided name. See getStructure(String) for supported naming conventions.

Parameters:
name - the name of the structure; see getStructure(String) for supported naming conventions
clone - flag to make sure that the atoms are cloned
Returns:
an array of Atoms.
Throws:
IOException
StructureException

getStructure

public Structure getStructure(String name)
                       throws IOException,
                              StructureException
Request a Structure based on a name.
                Formal specification for how to specify the name:

                name     := pdbID
                               | pdbID '.' chainID
                               | pdbID '.' range
                               | scopID
                range         := '('? range (',' range)? ')'?
                               | chainID
                               | chainID '_' resNum '-' resNum
                pdbID         := [0-9][a-zA-Z0-9]{3}
                chainID       := [a-zA-Z0-9]
                scopID        := 'd' pdbID [a-z_][0-9_]
                resNum        := [-+]?[0-9]+[A-Za-z]?


                Example structures:
                1TIM     #whole structure
                4HHB.C     #single chain
                4GCR.A_1-83     #one domain, by residue number
                3AA0.A,B     #two chains treated as one structure
                d2bq6a1     #scop domain
                
With the additional set of rules:

Parameters:
name - the name of the structure, following the specification above
Returns:
a Structure object, or null if name appears improperly formatted (e.g. too short)
Throws:
IOException - The PDB file cannot be cached due to IO errors
StructureException - The name appeared valid but did not correspond to a structure. Also thrown by some submethods upon errors, e.g. for poorly formatted subranges.
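The name specification above can be checked with regular expressions before a name is passed to getStructure(String). The following sketch is one reading of that grammar, not the parser used by AtomCache: it treats a range as a comma-separated list of chain or chain_resNum-resNum items with optional parentheses, and the class StructureNameSketch, the Kind enum and the classify method are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical classifier for structure names, built from the grammar above.
class StructureNameSketch {

    enum Kind { PDB_ID, CHAIN, RANGE, SCOP_ID, INVALID }

    // Building blocks copied from the formal specification.
    private static final String PDB_ID = "[0-9][a-zA-Z0-9]{3}";
    private static final String CHAIN_ID = "[a-zA-Z0-9]";
    private static final String RES_NUM = "[-+]?[0-9]+[A-Za-z]?";

    // One range item: a chain, optionally followed by '_' resNum '-' resNum.
    private static final String ITEM =
            CHAIN_ID + "(?:_" + RES_NUM + "-" + RES_NUM + ")?";
    // Assumed reading of 'range': comma-separated items, parentheses optional.
    private static final String RANGE_LIST =
            "\\(?" + ITEM + "(?:," + ITEM + ")*\\)?";

    private static final Pattern WHOLE = Pattern.compile(PDB_ID);
    private static final Pattern CHAIN = Pattern.compile(PDB_ID + "\\." + CHAIN_ID);
    private static final Pattern RANGE = Pattern.compile(PDB_ID + "\\." + RANGE_LIST);
    private static final Pattern SCOP = Pattern.compile("d" + PDB_ID + "[a-z_][0-9_]");

    // Reports which alternative of the grammar the whole name matches.
    static Kind classify(String name) {
        if (name == null) return Kind.INVALID;
        if (WHOLE.matcher(name).matches()) return Kind.PDB_ID;
        if (CHAIN.matcher(name).matches()) return Kind.CHAIN;
        if (RANGE.matcher(name).matches()) return Kind.RANGE;
        if (SCOP.matcher(name).matches()) return Kind.SCOP_ID;
        return Kind.INVALID;
    }

    public static void main(String[] args) {
        String[] examples = {"1TIM", "4HHB.C", "4GCR.A_1-83", "3AA0.A,B", "d2bq6a1"};
        for (String example : examples) {
            System.out.println(example + " -> " + classify(example));
        }
    }
}
```

A name classified as INVALID here corresponds to the case in which getStructure(String) is documented to return null; a syntactically valid name can still lead to a StructureException if no matching structure exists.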

getScopInstallation

public ScopDatabase getScopInstallation()

getFileParsingParams

public FileParsingParameters getFileParsingParams()

setFileParsingParams

public void setFileParsingParams(FileParsingParameters params)

getBiologicalUnit

public Structure getBiologicalUnit(String pdbId)
                            throws StructureException,
                                   IOException
Loads the default biological unit (*.pdb1.gz) file. If it is not available, the original PDB file will be loaded instead, e.g. for NMR structures, where the original file is also the biological assembly.

Parameters:
pdbId - the PDB ID
Returns:
a structure object
Throws:
IOException
StructureException
Since:
3.2

getBiologicalAssembly

public Structure getBiologicalAssembly(String pdbId,
                                       int bioAssemblyId,
                                       boolean bioAssemblyFallback)
                                throws StructureException,
                                       IOException
Loads the biological assembly for a given PDB ID and bioAssemblyId. If a bioAssemblyId > 0 is specified, the corresponding biological assembly file will be loaded. Note that the number of available biological unit files varies: many entries have no biological assembly specified (e.g. NMR structures), many entries have only one biological assembly (bioAssemblyId=1), and a few structures have multiple biological assemblies. Set bioAssemblyFallback to true to download the original PDB file when a biological assembly file is not available.

Parameters:
pdbId - the PDB ID
bioAssemblyId - the ID of the biological assembly
bioAssemblyFallback - if true, try reading original PDB file in case the biological assembly file is not available
Returns:
a structure object
Throws:
IOException
StructureException
Since:
3.2

notifyShutdown

public void notifyShutdown()
Send a signal to the cache that the system is shutting down. Notifies underlying SerializableCache instances to flush themselves...


getPdpprovider

public PDPProvider getPdpprovider()

setPdpprovider

public void setPdpprovider(PDPProvider pdpprovider)