Package com.actelion.research.chem.io
Class DWARFileParser
- java.lang.Object
-
- com.actelion.research.chem.io.CompoundFileParser
-
- com.actelion.research.chem.io.DWARFileParser
-
- All Implemented Interfaces:
DescriptorConstants, CompoundTableConstants
public class DWARFileParser extends CompoundFileParser implements DescriptorConstants, CompoundTableConstants
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
class
DWARFileParser.SpecialField
-
Field Summary
Fields
Modifier and Type Field Description
static int
MODE_BUFFER_HEAD_AND_TAIL
static int
MODE_COORDINATES_PREFER_2D
static int
MODE_COORDINATES_PREFER_3D
static int
MODE_COORDINATES_REQUIRE_2D
static int
MODE_COORDINATES_REQUIRE_3D
static int
MODE_EXTRACT_DETAILS
-
Fields inherited from class com.actelion.research.chem.io.CompoundFileParser
mReader
-
Fields inherited from interface com.actelion.research.chem.io.CompoundTableConstants
cAllowLogModeForNegativeOrZeroValues, cAutoStartMacro, cColumnName, cColumnNameRowList, cColumnProperty, cColumnProperty3DFragmentSplit, cColumnPropertyBinBase, cColumnPropertyBinIsDate, cColumnPropertyBinIsLog, cColumnPropertyBinSize, cColumnPropertyCalculated, cColumnPropertyCommentDepartment, cColumnPropertyCommentUploadStatus, cColumnPropertyCyclicDataMax, cColumnPropertyDataMax, cColumnPropertyDataMin, cColumnPropertyDescriptorVersion, cColumnPropertyDetailCount, cColumnPropertyDetailName, cColumnPropertyDetailSeparator, cColumnPropertyDetailSource, cColumnPropertyDetailType, cColumnPropertyDisplayGroup, cColumnPropertyEnd, cColumnPropertyFormula, cColumnPropertyGroupName, cColumnPropertyImagePath, cColumnPropertyIsClusterNo, cColumnPropertyIsDisplayable, cColumnPropertyIsFragment, cColumnPropertyLaunchAllowMultiple, cColumnPropertyLaunchCommand, cColumnPropertyLaunchCount, cColumnPropertyLaunchDecoration, cColumnPropertyLaunchName, cColumnPropertyLaunchOption, cColumnPropertyLookupCount, cColumnPropertyLookupDetailURL, cColumnPropertyLookupEncode, cColumnPropertyLookupFilter, cColumnPropertyLookupFilterRemoveMinus, cColumnPropertyLookupName, cColumnPropertyLookupURL, cColumnPropertyNaturalLigand, cColumnPropertyOpenExternalName, cColumnPropertyOpenExternalPath, cColumnPropertyOrbitType, cColumnPropertyParentColumn, cColumnPropertyProteinCavity, cColumnPropertyReactionPart, cColumnPropertyReferencedColumn, cColumnPropertyReferenceStrengthColumn, cColumnPropertyReferenceType, cColumnPropertyReferenceTypeRedundant, cColumnPropertyReferenceTypeTopDown, cColumnPropertyRelatedCatalystColumn, cColumnPropertyRelatedIdentifierColumn, cColumnPropertyShowNaturalLigand, cColumnPropertySpecialType, cColumnPropertyStart, cColumnPropertySuperpose, cColumnPropertySuperposeAlign, cColumnPropertySuperposeMolecule, cColumnPropertyUseThumbNail, cColumnRelationTypes, cColumnType2DCoordinates, cColumnType3DCoordinates, cColumnTypeAtomColorInfo, cColumnTypeFlagColors, 
cColumnTypeIDCode, cColumnTypeNegRecImage, cColumnTypeReactionMapping, cColumnTypeReactionObjects, cColumnTypeRXNCode, cColumnTypeTransformation, cColumnUnassignedCode, cColumnUnassignedItemText, cDataDependentPropertiesEnd, cDataDependentPropertiesStart, cDataTypeAutomatic, cDataTypeCode, cDataTypeDate, cDataTypeFloat, cDataTypeInteger, cDataTypeString, cDataTypeText, cDefaultDetailSeparator, cDetailDataEnd, cDetailDataStart, cDetailID, cDetailIndexSeparator, cEntrySeparator, cEntrySeparatorBytes, cExtensionNameFileExplanation, cExtensionNameMacroList, cFileExplanationEnd, cFileExplanationStart, cFlagColor, cHitlistData, cHitlistDataEnd, cHitlistDataStart, cHitlistName, cLineSeparator, cLineSeparatorByte, cMacroListEnd, cMacroListStart, cMaxDateOrDoubleCategoryCount, cMaxTextCategoryCount, cNativeFileCreated, cNativeFileHeaderEnd, cNativeFileHeaderStart, cNativeFileRowCount, cNativeFileVersion, cParentSpecialColumnTypes, cPropertiesEnd, cPropertiesStart, cRangeNotAvailable, cRangeSeparation, cReactionHiliteModeCode, cReactionHiliteModeMapping, cReactionHiliteModeNone, cReactionHiliteModeReactionCenter, cReactionHiliteModeText, cReactionPartDelimiter, cReactionPartProducts, cReactionPartReactants, cReactionPartReaction, cStructureHiliteModeCode, cStructureHiliteModeCurrentRow, cStructureHiliteModeFilter, cStructureHiliteModeNone, cStructureHiliteModeText, cSummaryModeCode, cSummaryModeMaximum, cSummaryModeMean, cSummaryModeMedian, cSummaryModeMinimum, cSummaryModeNormal, cSummaryModeSum, cSummaryModeText, cSuperposeAlignValueShape, cSuperposeValueReferenceRow, cTemplateTagName, cTextExclusionTypeContains, cTextExclusionTypeEndsWith, cTextExclusionTypeEquals, cTextExclusionTypeRegEx, cTextExclusionTypeStartsWith, cTextMultipleCategories, cViewConfigTagName, cViewNameEnd, cViewNameStart, NEWLINE_REGEX, NEWLINE_STRING, TAB_STRING
-
Fields inherited from interface com.actelion.research.chem.descriptor.DescriptorConstants
DESCRIPTOR_BINARY_SKELETONSPHERES, DESCRIPTOR_CenteredSkeletonFragments, DESCRIPTOR_EXTENDED_LIST, DESCRIPTOR_FFP512, DESCRIPTOR_Flexophore, DESCRIPTOR_FULL_FRAGMENT_SET, DESCRIPTOR_HashedCFp, DESCRIPTOR_IntegerVector, DESCRIPTOR_LIST, DESCRIPTOR_MAX_COMMON_SUBSTRUCT, DESCRIPTOR_OrganicFunctionalGroups, DESCRIPTOR_PFP512, DESCRIPTOR_PhysicoChemicalProperties, DESCRIPTOR_PTREE, DESCRIPTOR_ReactionFP, DESCRIPTOR_ShapeAlign, DESCRIPTOR_ShapeAlignSingleConf, DESCRIPTOR_SkeletonSpheres, DESCRIPTOR_SUBSTRUCT_QUERY_IN_BASE, DESCRIPTOR_TopoPPHistDist, DESCRIPTOR_TYPE_MOLECULE, DESCRIPTOR_TYPE_REACTION, DESCRIPTOR_TYPE_UNKNOWN
-
-
Constructor Summary
Constructors
Constructor Description
DWARFileParser(java.io.File file)
Constructs a DWARFileParser from a File with coordinate mode MODE_COORDINATES_PREFER_2D.
DWARFileParser(java.io.File file, int mode)
Constructs a DWARFileParser from a File with the specified coordinate mode.
DWARFileParser(java.io.Reader reader)
Constructs a DWARFileParser from a Reader with coordinate mode MODE_COORDINATES_PREFER_2D.
DWARFileParser(java.io.Reader reader, int mode)
Constructs a DWARFileParser from a Reader with the specified coordinate mode.
DWARFileParser(java.lang.String fileName)
Constructs a DWARFileParser from a file name with coordinate mode MODE_COORDINATES_PREFER_2D.
DWARFileParser(java.lang.String fileName, int mode)
Constructs a DWARFileParser from a file name with the specified coordinate mode.
-
Method Summary
Modifier and Type Method Description
protected boolean
advanceToNext()
Don't call this method directly.
int
getChildFieldIndex(java.lang.String parentColumnName, java.lang.String childType)
java.util.Properties
getColumnProperties(java.lang.String columnName)
Returns the original column properties of any source column by column name.
java.lang.String
getCoordinates()
Returns encoded atom coordinates according to the defined mode.
java.lang.String
getCoordinates2D()
java.lang.String
getCoordinates3D()
java.lang.Object
getDescriptor(java.lang.String shortName)
If the file source contains encoded descriptors, then override this method to save the calculation time.
java.util.HashMap<java.lang.String,byte[]>
getDetails()
If the mode contains MODE_EXTRACT_DETAILS, this method returns a map of all embedded detail objects of the DWAR file.
java.lang.String
getFieldData(int no)
Returns the cell content of the current row.
java.lang.String[]
getFieldNames()
Compiles all column names that contain alpha-numerical information.
java.util.ArrayList<java.lang.String>
getHeadOrTail()
If the mode contains MODE_BUFFER_HEAD_AND_TAIL, this method returns a list of all header/footer rows of the DWAR file.
java.lang.String
getIDCode()
Either this method and getCoordinates() or getMolecule() must be overridden.
java.lang.String
getIndex()
java.lang.String
getMoleculeName()
java.lang.String
getRow()
Returns the entire line containing all row data.
int
getRowCount()
Depending on the data source, returns the total row count or -1 if unknown.
java.lang.String
getSpecialFieldData(int fieldIndex)
int
getSpecialFieldIndex(java.lang.String columnName)
java.util.TreeMap<java.lang.String,DWARFileParser.SpecialField>
getSpecialFieldMap()
Returns a columnName->SpecialField map of all non-alphanumerical columns.
java.lang.String
getStructureCoordinates3DColumnName()
boolean
hasStructureCoordinates()
boolean
hasStructureCoordinates2D()
boolean
hasStructureCoordinates3D()
boolean
hasStructures()
If you don't read any records after calling this method, don't forget to call close() to close the underlying file.
-
Methods inherited from class com.actelion.research.chem.io.CompoundFileParser
close, createParser, getDescriptorHandlerFactory, getFieldIndex, getMolecule, isOpen, next, setDescriptorHandlerFactory
-
-
-
-
Field Detail
-
MODE_COORDINATES_PREFER_2D
public static final int MODE_COORDINATES_PREFER_2D
- See Also:
- Constant Field Values
-
MODE_COORDINATES_PREFER_3D
public static final int MODE_COORDINATES_PREFER_3D
- See Also:
- Constant Field Values
-
MODE_COORDINATES_REQUIRE_2D
public static final int MODE_COORDINATES_REQUIRE_2D
- See Also:
- Constant Field Values
-
MODE_COORDINATES_REQUIRE_3D
public static final int MODE_COORDINATES_REQUIRE_3D
- See Also:
- Constant Field Values
-
MODE_BUFFER_HEAD_AND_TAIL
public static final int MODE_BUFFER_HEAD_AND_TAIL
- See Also:
- Constant Field Values
-
MODE_EXTRACT_DETAILS
public static final int MODE_EXTRACT_DETAILS
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
DWARFileParser
public DWARFileParser(java.lang.String fileName)
Constructs a DWARFileParser from a file name with coordinate mode MODE_COORDINATES_PREFER_2D.
- Parameters:
fileName
-
DWARFileParser
public DWARFileParser(java.io.File file)
Constructs a DWARFileParser from a File with coordinate mode MODE_COORDINATES_PREFER_2D.
- Parameters:
file
-
DWARFileParser
public DWARFileParser(java.io.Reader reader)
Constructs a DWARFileParser from a Reader with coordinate mode MODE_COORDINATES_PREFER_2D.
- Parameters:
reader
-
DWARFileParser
public DWARFileParser(java.lang.String fileName, int mode)
Constructs a DWARFileParser from a file name with the specified coordinate mode.
- Parameters:
fileName
mode - one of the four MODE_COORDINATES_... modes
-
DWARFileParser
public DWARFileParser(java.io.File file, int mode)
Constructs a DWARFileParser from a File with the specified coordinate mode.
- Parameters:
file
mode - one of the four MODE_COORDINATES_... modes
-
DWARFileParser
public DWARFileParser(java.io.Reader reader, int mode)
Constructs a DWARFileParser from a Reader with the specified coordinate mode.
- Parameters:
reader
mode - one of the four MODE_COORDINATES_... modes
-
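The mode parameter is documented as one of the four coordinate modes, yet getHeadOrTail() and getDetails() speak of the mode "containing" MODE_BUFFER_HEAD_AND_TAIL or MODE_EXTRACT_DETAILS. One plausible reading, which this page does not confirm, is that those two constants are bit flags combined with a coordinate mode by bitwise OR. The sketch below illustrates only that reading. All constant values and helper names in it are made up for illustration; the real values are those listed under Constant Field Values.

```java
public class ModeFlagsSketch {
    // Hypothetical values, NOT taken from the library.
    public static final int COORDS_PREFER_3D = 2;      // stand-in for MODE_COORDINATES_PREFER_3D
    public static final int COORDS_MASK = 7;           // hypothetical mask covering the coordinate modes
    public static final int BUFFER_HEAD_AND_TAIL = 8;  // stand-in for MODE_BUFFER_HEAD_AND_TAIL
    public static final int EXTRACT_DETAILS = 16;      // stand-in for MODE_EXTRACT_DETAILS

    /** True if the given flag bit is set in mode. */
    public static boolean contains(int mode, int flag) {
        return (mode & flag) != 0;
    }

    /** Strips the flag bits and returns only the coordinate mode. */
    public static int coordinateMode(int mode) {
        return mode & COORDS_MASK;
    }

    public static void main(String[] args) {
        // Assumed usage pattern: one coordinate mode OR-ed with optional flags.
        int mode = COORDS_PREFER_3D | EXTRACT_DETAILS;
        System.out.println("extract details: " + contains(mode, EXTRACT_DETAILS));
        System.out.println("buffer head/tail: " + contains(mode, BUFFER_HEAD_AND_TAIL));
        System.out.println("coordinate mode: " + coordinateMode(mode));
    }
}
```

If the library does follow this scheme, a call might look like new DWARFileParser(file, DWARFileParser.MODE_COORDINATES_PREFER_3D | DWARFileParser.MODE_EXTRACT_DETAILS); check the constant values before relying on that.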
-
Method Detail
-
hasStructures
public boolean hasStructures()
If you don't read any records after calling this method, don't forget to call close() to close the underlying file.
- Returns:
- whether the file contains chemical structures
-
hasStructureCoordinates
public boolean hasStructureCoordinates()
- Returns:
- whether the file contains chemical structures with explicit atom coordinates
-
hasStructureCoordinates2D
public boolean hasStructureCoordinates2D()
- Returns:
- whether the file contains chemical structures with explicit 2D atom coordinates
-
hasStructureCoordinates3D
public boolean hasStructureCoordinates3D()
- Returns:
- whether the file contains chemical structures with explicit 3D atom coordinates
-
getStructureCoordinates3DColumnName
public java.lang.String getStructureCoordinates3DColumnName()
-
getFieldNames
public java.lang.String[] getFieldNames()
Description copied from class: CompoundFileParser
Compiles all column names that contain alpha-numerical information. Columns containing chemistry objects, coordinates or descriptors don't appear in the list.
- Specified by:
getFieldNames in class CompoundFileParser
- Returns:
- column name array in the order of appearance
-
getSpecialFieldIndex
public int getSpecialFieldIndex(java.lang.String columnName)
- Parameters:
columnName
- Returns:
- field index for special fields, e.g. to be used for getSpecialFieldData()
-
getChildFieldIndex
public int getChildFieldIndex(java.lang.String parentColumnName, java.lang.String childType)
- Parameters:
parentColumnName
childType
- Returns:
- field index for special fields, e.g. to be used for getSpecialFieldData()
-
getRowCount
public int getRowCount()
Description copied from class: CompoundFileParser
Depending on the data source, returns the total row count or -1 if unknown.
- Specified by:
getRowCount in class CompoundFileParser
- Returns:
- number of rows or -1
-
getHeadOrTail
public java.util.ArrayList<java.lang.String> getHeadOrTail()
If the mode contains MODE_BUFFER_HEAD_AND_TAIL, this method returns a list of all header/footer rows of the DWAR file. When called before all rows have been read, it returns the header lines, including column properties and the column title line. When called after all rows have been read, it returns all lines after the data table, i.e. the runtime properties.
- Returns:
- list of header or footer lines
-
getDetails
public java.util.HashMap<java.lang.String,byte[]> getDetails()
If the mode contains MODE_EXTRACT_DETAILS, this method returns a map of all embedded detail objects of the DWAR file. This method must not be called before all rows have been read.
- Returns:
- map of embedded detail objects
-
getRow
public java.lang.String getRow()
Returns the entire line containing all row data.
- Returns:
- the entire line of row data
-
advanceToNext
protected boolean advanceToNext()
Description copied from class: CompoundFileParser
Don't call this method directly. Use next() instead.
- Specified by:
advanceToNext in class CompoundFileParser
- Returns:
- false if there is no next row
-
getIDCode
public java.lang.String getIDCode()
Description copied from class: CompoundFileParser
Either this method and getCoordinates() or getMolecule() must be overridden.
- Overrides:
getIDCode in class CompoundFileParser
- Returns:
- the row content of the first column containing chemical structures
-
getCoordinates
public java.lang.String getCoordinates()
Returns encoded atom coordinates according to the defined mode. If the compound file does not contain atom coordinates, null is returned. If the mode is one of MODE_COORDINATES_REQUIRE... and the required coordinate dimensionality (2D or 3D) is not available, null is returned. If the mode is one of MODE_COORDINATES_PREFER... and the preferred coordinate dimensionality (2D or 3D) is not available, coordinates in the other dimensionality are returned.
- Overrides:
getCoordinates in class CompoundFileParser
- Returns:
- idcoords of first chemical structure column of the current row
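The PREFER/REQUIRE rules described for getCoordinates() can be sketched as a small decision function. This is a self-contained illustration of the documented behaviour, not the library's implementation; the class, enum and method names are hypothetical, and the coordinate strings are placeholders rather than real idcoords.

```java
public class CoordinateModeSketch {
    /** Hypothetical stand-ins for the four MODE_COORDINATES_... constants. */
    public enum Mode { PREFER_2D, PREFER_3D, REQUIRE_2D, REQUIRE_3D }

    /**
     * Picks the coordinates to return for one row.
     * coords2D / coords3D are null when the row has no coordinates of that kind.
     */
    public static String resolve(Mode mode, String coords2D, String coords3D) {
        switch (mode) {
            case REQUIRE_2D:
                return coords2D;                       // null if 2D is missing
            case REQUIRE_3D:
                return coords3D;                       // null if 3D is missing
            case PREFER_2D:
                return coords2D != null ? coords2D : coords3D;  // fall back to 3D
            case PREFER_3D:
                return coords3D != null ? coords3D : coords2D;  // fall back to 2D
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // A row that only carries 2D coordinates.
        System.out.println(resolve(Mode.PREFER_3D, "coords-2d", null));
        System.out.println(resolve(Mode.REQUIRE_3D, "coords-2d", null));
    }
}
```

With only 2D coordinates present, the PREFER_3D case falls back to the 2D string while REQUIRE_3D yields null, which matches the two fallback rules stated above.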
-
getCoordinates2D
public java.lang.String getCoordinates2D()
-
getCoordinates3D
public java.lang.String getCoordinates3D()
-
getMoleculeName
public java.lang.String getMoleculeName()
- Specified by:
getMoleculeName in class CompoundFileParser
- Returns:
- name/id of (primary) chemical structure of the current row
-
getDescriptor
public java.lang.Object getDescriptor(java.lang.String shortName)
Description copied from class: CompoundFileParser
If the file source contains encoded descriptors, then override this method to save the calculation time.
- Overrides:
getDescriptor in class CompoundFileParser
- Returns:
- descriptor as int[] or whatever the descriptor's binary format is
-
getIndex
public java.lang.String getIndex()
- Returns:
- the String encoded FragFp descriptor of the first column containing chemical structures
-
getFieldData
public java.lang.String getFieldData(int no)
Description copied from class: CompoundFileParser
Returns the cell content of the current row. Multi-line cell entries are separated by a '\n' character.
- Specified by:
getFieldData in class CompoundFileParser
- Parameters:
no - index into the alpha-numerical columns only, in the order of getFieldNames()
- Returns:
- cell content of the given column in the current row
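A typical read loop combines getFieldNames(), the inherited next() and close(), and getFieldData(int). The sketch below shows that calling pattern without requiring the library on the classpath: RowSource is a hypothetical interface that lists only the four methods the loop uses, and inMemory() is a made-up stand-in for a parser, so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RowLoopSketch {
    /** Hypothetical interface naming only the parser methods this loop relies on. */
    public interface RowSource {
        String[] getFieldNames();
        boolean next();
        String getFieldData(int no);
        void close();
    }

    /** Reads every row into "name=value" strings, one inner list per row. */
    public static List<List<String>> readAll(RowSource source) {
        List<List<String>> rows = new ArrayList<>();
        String[] names = source.getFieldNames();
        try {
            while (source.next()) {               // advance via next(), never advanceToNext()
                List<String> row = new ArrayList<>();
                for (int i = 0; i < names.length; i++) {
                    // index i refers to the same column in getFieldNames() and getFieldData()
                    row.add(names[i] + "=" + source.getFieldData(i));
                }
                rows.add(row);
            }
        } finally {
            source.close();                       // release the underlying reader
        }
        return rows;
    }

    /** In-memory stand-in so the sketch runs without a DWAR file. */
    public static RowSource inMemory(String[] names, List<String[]> data) {
        Iterator<String[]> it = data.iterator();
        return new RowSource() {
            private String[] current;
            public String[] getFieldNames() { return names; }
            public boolean next() {
                if (!it.hasNext()) return false;
                current = it.next();
                return true;
            }
            public String getFieldData(int no) { return current[no]; }
            public void close() { }
        };
    }

    public static void main(String[] args) {
        RowSource source = inMemory(
                new String[] {"Name", "MW"},
                Arrays.asList(new String[] {"aspirin", "180.16"},
                              new String[] {"caffeine", "194.19"}));
        System.out.println(readAll(source));
    }
}
```

With the library available, the same loop body should work directly on a DWARFileParser instance, since all four methods appear in this class's method summary or its inherited-methods list.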
-
getSpecialFieldMap
public java.util.TreeMap<java.lang.String,DWARFileParser.SpecialField> getSpecialFieldMap()
Returns a columnName->SpecialField map of all non-alphanumerical columns. SpecialField.type is either one of the types defined in CompoundTableConstants (cColumnTypeIDCode, cColumnTypeRXNCode, cColumnType2DCoordinates, cColumnType3DCoordinates, cColumnTypeAtomColorInfo) or a descriptor shortName.
- Returns:
- special fields
-
getSpecialFieldData
public java.lang.String getSpecialFieldData(int fieldIndex)
- Parameters:
fieldIndex - available from the special-field TreeMap via getSpecialFieldMap().get(columnName).fieldIndex
- Returns:
- String encoded data content of special field, e.g. idcode
-
getColumnProperties
public java.util.Properties getColumnProperties(java.lang.String columnName)
Returns the original column properties of any source column by column name.
- Parameters:
columnName
- Returns:
- the original column properties
-
-