Class MMPServices


  • public class MMPServices
    extends java.lang.Object
    • Constructor Summary

      Constructors 
      Constructor Description
      MMPServices()  
    • Method Summary

      Modifier and Type Method Description
      java.util.List<java.lang.String> getCategoryNames​(java.lang.String datasetName)
      Returns the list of categories for each available numeric field
      java.util.List<java.lang.String> getChemicalSpace​(java.lang.String datasetName, java.lang.String[] keys, java.lang.String value, java.lang.String dataField)
      Gets the chemical space for a specific data set
      java.lang.String getChemicalSpaceDWAR​(java.lang.String datasetName, java.lang.String idCode, java.lang.String[] keys, java.lang.String dataField)
      Generates the DWAR file of the Chemical Space for a specific data set and a specific 'key'
      int getChemicalSpaceSize​(java.lang.String datasetName, java.lang.String key)
      Gets the size of the chemical space for a specific data set and a single 'key'
      int getChemicalSpaceSize​(java.lang.String datasetName, java.lang.String[] keys)
      Gets the size of the chemical space for a specific data set and an array of 'keys'
      java.util.List<java.lang.String> getDataFields​(java.lang.String datasetName)
      Returns a list of available (numerical) data fields for a specific data set
      java.lang.String getDatasetInformations​(java.util.ArrayList<java.lang.String> datasetNames)
      Returns general information about the given data sets
      java.lang.String getIDCodeFromMolName​(java.lang.String datasetName, java.lang.String molName)
      Returns the idCode of a molecule from its name
      java.util.List<java.lang.String> getLongDataFields​(java.lang.String datasetName)
      Returns a list of long field names for each available numeric field
      java.lang.String getMMPsDWAR​(java.lang.String datasetName, java.lang.String idCode, java.lang.String[] keys, java.lang.String value1, java.lang.String value2, int replacementSize, java.util.List<java.lang.String> properties)
      Generates the DWAR file of Matched Molecular Pairs for a specific data set and specific transformation
      java.util.List<java.lang.String> getPercentiles5​(java.lang.String datasetName)
      Returns the list of the 5% percentiles for each available numeric field
      java.util.List<java.lang.String> getPercentiles95​(java.lang.String datasetName)
      Returns the list of the 95% percentiles for each available numeric field
      java.lang.String getTransformationsDWAR​(java.lang.String datasetName, java.lang.String idCode, java.lang.String[] keys, java.lang.String value1, int minAtoms, int maxAtoms, int environmentSize, java.util.List<java.lang.String> properties)
      Generates the DWAR file of the Transformations for a specific data set
      java.lang.String getTransformationsJSON​(java.lang.String datasetName, java.lang.String idCode, java.lang.String[] keys, java.lang.String value1, int minAtoms, int maxAtoms, java.lang.String sortBy)
      Generates the main JSON string for a seeded 'value'
      int getTransformationsSize​(java.lang.String datasetName, java.lang.String value1, int minAtoms, int maxAtoms)
      Gets the number of transformations for a specific data set, seed 'value' and deltas of heavy atoms
      java.util.List<java.lang.String[]> getTransformationsTable​(java.lang.String datasetName, java.lang.String[] keys, java.lang.String value1, int minAtoms, int maxAtoms)
      Returns a list of transformations
      java.lang.String readMMPFile​(java.io.BufferedReader br, boolean verbose)
      Reads a new MMP File
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • MMPServices

        public MMPServices()
                    throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • readMMPFile

        public java.lang.String readMMPFile​(java.io.BufferedReader br,
                                            boolean verbose)
                                     throws java.io.IOException,
                                            java.lang.Exception
        Reads a new MMP File
        Parameters:
        br - BufferedReader over the content of the MMP file
        verbose - Whether to produce verbose output while reading
        Returns:
        short name of the data set
        Throws:
        java.io.IOException
        java.lang.Exception
      • getChemicalSpaceSize

        public int getChemicalSpaceSize​(java.lang.String datasetName,
                                        java.lang.String key)
        Gets the size of the chemical space for a specific data set and a single 'key'
        Parameters:
        datasetName - Short name of the data set
        key - idCode of the 'key' (constant part of the molecule)
        Returns:
        Size of the chemical space, -1 if the data set does not exist
      • getChemicalSpaceSize

        public int getChemicalSpaceSize​(java.lang.String datasetName,
                                        java.lang.String[] keys)
        Gets the size of the chemical space for a specific data set and an array of 'keys'
        Parameters:
        datasetName - Short name of the data set
        keys - Array of 'keys' idCodes (constant part of the molecule, one for single cut, two for double cuts)
        Returns:
        Size of the chemical space, -1 if the data set does not exist
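Both overloads return -1 when the data set does not exist, so a caller may want to keep that case apart from a genuine size of 0. The following is a minimal sketch that relies only on the documented return contract; `ChemicalSpaceSizes` and `toOptionalSize` are hypothetical names and not part of MMPServices.

```java
import java.util.OptionalInt;

public class ChemicalSpaceSizes {
    /** Maps the documented -1 sentinel to an empty OptionalInt. */
    public static OptionalInt toOptionalSize(int rawSize) {
        // -1 is documented as "data set does not exist";
        // any other value is passed through unchanged
        if (rawSize == -1) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(rawSize);
    }

    public static void main(String[] args) {
        // Stand-in values for what getChemicalSpaceSize might return
        System.out.println(toOptionalSize(-1).isPresent());
        System.out.println(toOptionalSize(12).isPresent());
    }
}
```

Wrapping the result this way forces the calling code to handle the missing data set explicitly instead of comparing against a magic number.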
      • getChemicalSpace

        public java.util.List<java.lang.String> getChemicalSpace​(java.lang.String datasetName,
                                                                 java.lang.String[] keys,
                                                                 java.lang.String value,
                                                                 java.lang.String dataField)
        Gets the chemical space for a specific data set
        Parameters:
        datasetName - Short name of the data set
        keys - Array of 'keys' idCodes (constant part of the molecule)
        value - 'value' idCode (variable part of the molecule - not used yet). Can be null
        dataField - Name of the data field for which data should be returned. Can be null
        Returns:
        a List of tab-delimited [idCodes, moleculeName, data] entries
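The returned entries are tab-delimited strings, so they usually need to be split before use. The sketch below shows one way to do that, assuming three columns in the documented order and assuming the data column may be empty or absent when `dataField` is null. `ChemicalSpaceEntries`, `Entry` and `parse` are hypothetical names, and the sample strings are placeholders rather than real idCodes.

```java
import java.util.ArrayList;
import java.util.List;

public class ChemicalSpaceEntries {
    /** One parsed entry; data is null when the third column is empty or missing. */
    public static final class Entry {
        public final String idCode;
        public final String moleculeName;
        public final String data;

        Entry(String idCode, String moleculeName, String data) {
            this.idCode = idCode;
            this.moleculeName = moleculeName;
            this.data = data;
        }
    }

    public static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            // A limit of -1 keeps trailing empty columns
            String[] cols = line.split("\t", -1);
            if (cols.length < 2) {
                continue; // skip lines that do not match the assumed layout
            }
            String data = cols.length > 2 && !cols[2].isEmpty() ? cols[2] : null;
            entries.add(new Entry(cols[0], cols[1], data));
        }
        return entries;
    }

    public static void main(String[] args) {
        // Placeholder input standing in for the result of getChemicalSpace
        List<String> sample = List.of("idCodeA\tmol-1\t7.2", "idCodeB\tmol-2\t");
        for (Entry e : parse(sample)) {
            System.out.println(e.moleculeName + " -> " + e.data);
        }
    }
}
```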
      • getMMPsDWAR

        public java.lang.String getMMPsDWAR​(java.lang.String datasetName,
                                            java.lang.String idCode,
                                            java.lang.String[] keys,
                                            java.lang.String value1,
                                            java.lang.String value2,
                                            int replacementSize,
                                            java.util.List<java.lang.String> properties)
        Generates the DWAR file of Matched Molecular Pairs for a specific data set and specific transformation
        Parameters:
        datasetName - Short name of the data set
        idCode - idCode of the seed molecule
        keys - Array of 'keys' idCodes (constant part of the molecule)
        value1 - seeded 'value' idCode (variable part of the molecule)
        value2 - target 'value' idCode (transformation)
        replacementSize - Difference in number of heavy atoms between seed and target fragments
        properties - List of data fields for which data should be returned
        Returns:
        a String containing the content of the whole DWAR file
      • getChemicalSpaceDWAR

        public java.lang.String getChemicalSpaceDWAR​(java.lang.String datasetName,
                                                     java.lang.String idCode,
                                                     java.lang.String[] keys,
                                                     java.lang.String dataField)
        Generates the DWAR file of the Chemical Space for a specific data set and a specific 'key'
        Parameters:
        datasetName - Short name of the data set
        idCode - idCode of the seed molecule
        keys - Array of 'keys' idCodes (constant part of the molecule)
        dataField - Name of the data field for which data should be returned. Can be null.
        Returns:
        a String containing the content of the whole DWAR file
      • getTransformationsSize

        public int getTransformationsSize​(java.lang.String datasetName,
                                          java.lang.String value1,
                                          int minAtoms,
                                          int maxAtoms)
        Gets the number of transformations for a specific data set, seed 'value' and deltas of heavy atoms
        Parameters:
        datasetName - Short name of the data set
        value1 - idCode of the seed 'value' (variable part of the molecule)
        minAtoms - minimal delta number of heavy atoms (compared to the seed fragment)
        maxAtoms - maximal delta number of heavy atoms (compared to the seed fragment)
        Returns:
        the number of transformations, -1 if the data set does not exist
      • getTransformationsTable

        public java.util.List<java.lang.String[]> getTransformationsTable​(java.lang.String datasetName,
                                                                          java.lang.String[] keys,
                                                                          java.lang.String value1,
                                                                          int minAtoms,
                                                                          int maxAtoms)
        Returns a list of transformations
        Parameters:
        datasetName - Short name of the data set
        keys - Array of 'keys' idCodes (constant part of the molecule)
        value1 - seeded 'value' idCode (variable part of the molecule)
        minAtoms - minimal delta number of heavy atoms (compared to the seed fragment)
        maxAtoms - maximal delta number of heavy atoms (compared to the seed fragment)
        Returns:
        List of String arrays ([seed, target, number of examples, transformed molecule exists])
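Each row is a String array, so ordering or filtering the table means parsing individual columns. As an illustration, the sketch below sorts rows by the number of examples, assuming that column 2 holds a plain integer string. `TransformationRows` and `sortByExamples` are hypothetical names, and the sample rows are placeholders.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TransformationRows {
    /** Returns a copy of the rows sorted by the example count in column 2, highest first. */
    public static List<String[]> sortByExamples(List<String[]> rows) {
        List<String[]> sorted = new ArrayList<>(rows);
        // Assumption: column 2 ("number of examples") parses as an int
        sorted.sort(Comparator.comparingInt(
                (String[] row) -> Integer.parseInt(row[2].trim())).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Placeholder rows in the documented column order:
        // [seed, target, number of examples, transformed molecule exists]
        List<String[]> rows = List.of(
                new String[] {"seedA", "targetB", "3", "false"},
                new String[] {"seedA", "targetC", "11", "true"});
        for (String[] row : sortByExamples(rows)) {
            System.out.println(row[1] + ": " + row[2] + " examples");
        }
    }
}
```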
      • getTransformationsJSON

        public java.lang.String getTransformationsJSON​(java.lang.String datasetName,
                                                       java.lang.String idCode,
                                                       java.lang.String[] keys,
                                                       java.lang.String value1,
                                                       int minAtoms,
                                                       int maxAtoms,
                                                       java.lang.String sortBy)
        Generates the main JSON string for a seeded 'value'
        Parameters:
        datasetName - Short name of the data set
        idCode - idCode of the whole seed molecule
        keys - Array of 'keys' idCodes (constant part of the molecule)
        value1 - seeded 'value' idCode (variable part of the molecule)
        minAtoms - minimal delta number of heavy atoms (compared to the seed fragment)
        maxAtoms - maximal delta number of heavy atoms (compared to the seed fragment)
        sortBy - SORT_BY_NUMBER_OF_EXAMPLES or SORT_BY_SIMILARITY
        Returns:
        a JSON string with all data
      • getTransformationsDWAR

        public java.lang.String getTransformationsDWAR​(java.lang.String datasetName,
                                                       java.lang.String idCode,
                                                       java.lang.String[] keys,
                                                       java.lang.String value1,
                                                       int minAtoms,
                                                       int maxAtoms,
                                                       int environmentSize,
                                                       java.util.List<java.lang.String> properties)
        Generates the DWAR file of the Transformations for a specific data set
        Parameters:
        datasetName - Short name of the data set
        idCode - idCode of the whole seed molecule
        keys - Array of 'keys' idCodes (constant part of the molecule)
        value1 - seeded 'value' idCode (variable part of the molecule)
        minAtoms - minimal delta number of heavy atoms (compared to the seed fragment)
        maxAtoms - maximal delta number of heavy atoms (compared to the seed fragment)
        environmentSize - Size of the local environment (0-5)
        properties - List of data fields for which data should be returned
        Returns:
        a String containing the content of the whole DWAR file
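The DWAR methods return the whole file content as a String, so persisting the result is left to the caller. The sketch below writes such a string to disk using only the standard library. `DwarFiles` and `save` are hypothetical names, and UTF-8 is an assumption because the documentation does not state an encoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DwarFiles {
    /** Writes DWAR content to the given path and returns that path. */
    public static Path save(String dwarContent, Path target) {
        try {
            // Assumption: UTF-8 is an acceptable encoding for the DWAR content
            return Files.write(target, dwarContent.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("transformations", ".dwar");
        // Placeholder text standing in for the result of getTransformationsDWAR
        save("placeholder DWAR content", target);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```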
      • getIDCodeFromMolName

        public java.lang.String getIDCodeFromMolName​(java.lang.String datasetName,
                                                     java.lang.String molName)
        Returns the idCode of a molecule from its name
        Parameters:
        datasetName - Short name of the data set
        molName - Molecule name
        Returns:
        idCode string
      • getDataFields

        public java.util.List<java.lang.String> getDataFields​(java.lang.String datasetName)
        Returns a list of available (numerical) data fields for a specific data set
        Parameters:
        datasetName - Short name of the data set
        Returns:
        List of field names
      • getLongDataFields

        public java.util.List<java.lang.String> getLongDataFields​(java.lang.String datasetName)
        Returns a list of long field names for each available numeric field
        Parameters:
        datasetName - Short name of the data set
        Returns:
        List of long field names (or short names if long names are not available)
      • getCategoryNames

        public java.util.List<java.lang.String> getCategoryNames​(java.lang.String datasetName)
        Returns the list of categories for each available numeric field
        Parameters:
        datasetName - Short name of the data set
        Returns:
        List of categories, or 'other' if no categories are available
      • getPercentiles5

        public java.util.List<java.lang.String> getPercentiles5​(java.lang.String datasetName)
        Returns the list of the 5% percentiles for each available numeric field
        Parameters:
        datasetName - Short name of the data set
        Returns:
        List of 5% percentiles
      • getPercentiles95

        public java.util.List<java.lang.String> getPercentiles95​(java.lang.String datasetName)
        Returns the list of the 95% percentiles for each available numeric field
        Parameters:
        datasetName - Short name of the data set
        Returns:
        List of 95% percentiles
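getDataFields, getPercentiles5 and getPercentiles95 each return one entry per numeric field, which suggests the three lists can be combined by index. The sketch below does this under two assumptions that the documentation does not confirm: the lists share the same order, and each percentile string parses as a double. `PercentileRanges` and `combine` are hypothetical names, and the field names in `main` are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PercentileRanges {
    /** Maps each field name to a two-element array {5% percentile, 95% percentile}. */
    public static Map<String, double[]> combine(
            List<String> fields, List<String> p5, List<String> p95) {
        if (fields.size() != p5.size() || fields.size() != p95.size()) {
            throw new IllegalArgumentException("Lists must have the same length");
        }
        // LinkedHashMap keeps the field order of the input list
        Map<String, double[]> ranges = new LinkedHashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            ranges.put(fields.get(i), new double[] {
                    Double.parseDouble(p5.get(i)), Double.parseDouble(p95.get(i))});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Placeholder values standing in for the three service calls
        Map<String, double[]> ranges = combine(
                List.of("fieldA", "fieldB"), List.of("0.5", "10"), List.of("4.5", "90"));
        ranges.forEach((field, range) ->
                System.out.println(field + ": " + range[0] + " to " + range[1]));
    }
}
```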
      • getDatasetInformations

        public java.lang.String getDatasetInformations​(java.util.ArrayList<java.lang.String> datasetNames)
        Returns general information about the given data sets
        Parameters:
        datasetNames - Ordered list of data set names; required to ensure that the order is identical to the one in the settings file
        Returns:
        Tab-delimited [short data set name, number of molecules, data generation date, one random molecule name]