Package weka.core.converters
Class CSVLoader
java.lang.Object
  weka.core.converters.AbstractLoader
    weka.core.converters.AbstractFileLoader
      weka.core.converters.CSVLoader
All Implemented Interfaces:
java.io.Serializable, BatchConverter, FileSourcedConverter, Loader, EnvironmentHandler, OptionHandler, RevisionHandler
public class CSVLoader extends AbstractFileLoader implements BatchConverter, OptionHandler
Reads a source that is in comma-separated or tab-separated format. Assumes that the first row in the file determines the number and names of the attributes.

Valid options are:
-N <range> The range of attributes to force type to be NOMINAL. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-S <range> The range of attributes to force type to be STRING. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-D <range> The range of attributes to force type to be DATE. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-format <date format> The date formatting string to use to parse date values. (default: "yyyy-MM-dd'T'HH:mm:ss")
-M <str> The string representing a missing value. (default: ?)
-E <enclosures> The enclosure character(s) to use for strings. Specify as a comma-separated list (e.g. ",'). (default: '"')
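The -N, -S and -D options share one range syntax. Items are 1-based attribute positions or the keywords 'first' and 'last'. Commas separate items, and '-' joins the two ends of a span.

The sketch below is for illustration only. It uses the standard library to expand such a string into 0-based attribute indices. The class RangeSketch is hypothetical. Weka handles ranges with its own weka.core.Range class, and this sketch does not reproduce features of that class such as inverted selections.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the range syntax used by -N, -S and -D.
// Not weka.core.Range: inverted selections and other extras are not supported.
final class RangeSketch {

    // Converts one 1-based token ("first", "last" or a number) to a 0-based index.
    private static int toIndex(String token, int numAttributes) {
        String t = token.trim();
        if (t.equalsIgnoreCase("first")) {
            return 0;
        }
        if (t.equalsIgnoreCase("last")) {
            return numAttributes - 1;
        }
        int oneBased = Integer.parseInt(t);
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Attribute index out of range: " + t);
        }
        return oneBased - 1;
    }

    // Expands e.g. "1,4,5-27,50-last" into sorted, de-duplicated 0-based indices.
    static List<Integer> parse(String range, int numAttributes) {
        boolean[] selected = new boolean[numAttributes];
        for (String part : range.split(",")) {
            if (part.trim().isEmpty()) {
                continue; // tolerate empty items such as a trailing comma
            }
            int dash = part.indexOf('-');
            if (dash < 0) {
                selected[toIndex(part, numAttributes)] = true;
            } else {
                int from = toIndex(part.substring(0, dash), numAttributes);
                int to = toIndex(part.substring(dash + 1), numAttributes);
                for (int i = from; i <= to; i++) {
                    selected[i] = true;
                }
            }
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numAttributes; i++) {
            if (selected[i]) {
                indices.add(i);
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        // Six attributes: "first-2,5-last" selects positions 1, 2, 5 and 6.
        System.out.println(parse("first-2,5-last", 6)); // [0, 1, 4, 5]
    }
}
```

The index for 'last' depends on how many attributes the file has. This means a range string can only be resolved after the header row has been read.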
Version: $Revision: 10372 $
Author: Mark Hall (mhall@cs.waikato.ac.nz)
See Also: Loader, Serialized Form
Field Summary

Modifier and Type         Field            Description
static java.lang.String   FILE_EXTENSION   the file extension.

Fields inherited from class weka.core.converters.AbstractFileLoader
FILE_EXTENSION_COMPRESSED

Fields inherited from interface weka.core.converters.Loader
BATCH, INCREMENTAL, NONE
Constructor Summary

Constructor    Description
CSVLoader()    Default constructor.
Method Summary

java.lang.String dateAttributesTipText()
    Returns the tip text for this property.
java.lang.String dateFormatTipText()
    Returns the tip text for this property.
java.lang.String enclosureCharactersTipText()
    Returns the tip text for this property.
Instances getDataSet()
    Return the full data set.
java.lang.String getDateAttributes()
    Returns the current attribute range to be forced to type date.
java.lang.String getDateFormat()
    Get the format to use for parsing date values.
java.lang.String getEnclosureCharacters()
    Get the character(s) to use/recognize as string enclosures.
java.lang.String getFileDescription()
    Returns a description of the file type.
java.lang.String getFileExtension()
    Get the file extension used for CSV files.
java.lang.String[] getFileExtensions()
    Gets all the file extensions used for this type of file.
java.lang.String getMissingValue()
    Returns the current placeholder for missing values.
Instance getNextInstance(Instances structure)
    CSVLoader is unable to process a data set incrementally.
java.lang.String getNominalAttributes()
    Returns the current attribute range to be forced to type nominal.
java.lang.String[] getOptions()
    Gets the current settings of the loader.
java.lang.String getRevision()
    Returns the revision string.
java.lang.String getStringAttributes()
    Returns the current attribute range to be forced to type string.
Instances getStructure()
    Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.
java.lang.String globalInfo()
    Returns a string describing this loader.
java.util.Enumeration listOptions()
    Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
    Main method.
java.lang.String missingValueTipText()
    Returns the tip text for this property.
java.lang.String nominalAttributesTipText()
    Returns the tip text for this property.
void reset()
    Resets the Loader ready to read a new data set or the same data set again.
void setDateAttributes(java.lang.String value)
    Set the attribute range to be forced to type date.
void setDateFormat(java.lang.String value)
    Set the format to use for parsing date values.
void setEnclosureCharacters(java.lang.String enclosure)
    Set the character(s) to use/recognize as string enclosures.
void setMissingValue(java.lang.String value)
    Sets the placeholder for missing values.
void setNominalAttributes(java.lang.String value)
    Sets the attribute range to be forced to type nominal.
void setOptions(java.lang.String[] options)
    Parses a given list of options.
void setSource(java.io.File file)
    Resets the Loader object and sets the source of the data set to be the supplied File object.
void setSource(java.io.InputStream input)
    Resets the Loader object and sets the source of the data set to be the supplied Stream object.
void setStringAttributes(java.lang.String value)
    Sets the attribute range to be forced to type string.
java.lang.String stringAttributesTipText()
    Returns the tip text for this property.
Methods inherited from class weka.core.converters.AbstractFileLoader
getUseRelativePath, retrieveFile, runFileLoader, setEnvironment, setFile, setUseRelativePath, useRelativePathTipText

Methods inherited from class weka.core.converters.AbstractLoader
setRetrieval

Method Detail

getFileExtension
public java.lang.String getFileExtension()
Get the file extension used for CSV files.
Specified by: getFileExtension in interface FileSourcedConverter
Returns: the file extension

getFileDescription
public java.lang.String getFileDescription()
Returns a description of the file type.
Specified by: getFileDescription in interface FileSourcedConverter
Returns: a short file description

getFileExtensions
public java.lang.String[] getFileExtensions()
Gets all the file extensions used for this type of file.
Specified by: getFileExtensions in interface FileSourcedConverter
Returns: the file extensions

globalInfo
public java.lang.String globalInfo()
Returns a string describing this loader.
Returns: a description of the loader suitable for displaying in the Explorer/Experimenter GUI

listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
Specified by: listOptions in interface OptionHandler
Returns: an enumeration of all the available options.

setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a given list of options. Valid options are:
-N <range> The range of attributes to force type to be NOMINAL. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-S <range> The range of attributes to force type to be STRING. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-D <range> The range of attributes to force type to be DATE. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-format <date format> The date formatting string to use to parse date values. (default: "yyyy-MM-dd'T'HH:mm:ss")
-M <str> The string representing a missing value. (default: ?)
-E <enclosures> The enclosure character(s) to use for strings. Specify as a comma-separated list (e.g. ",'). (default: '"')
Specified by: setOptions in interface OptionHandler
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported

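setOptions receives an array of "-flag value" pairs such as {"-N", "first-3", "-M", "NA"}. The sketch below shows one way to parse that convention and one way to report a leftover, unsupported option.

OptionSketch and its methods are hypothetical. They are not the code CSVLoader uses; Weka option handlers typically rely on helpers in weka.core.Utils, such as Utils.getOption. The blank-out-consumed-entries approach is an assumption chosen for clarity.

```java
// Hypothetical sketch of "-flag value" option parsing; not Weka's implementation.
final class OptionSketch {

    // Returns the value that follows "-" + name, or "" when the flag is absent.
    // Consumed entries are blanked so that unused options can be detected later.
    static String getOption(String name, String[] options) {
        String flag = "-" + name;
        for (int i = 0; i < options.length; i++) {
            if (flag.equals(options[i])) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("No value given for " + flag);
                }
                String value = options[i + 1];
                options[i] = "";
                options[i + 1] = "";
                return value;
            }
        }
        return "";
    }

    // Fails on any entry that was not consumed, i.e. an unsupported option.
    static void checkAllUsed(String[] options) {
        for (String option : options) {
            if (!option.isEmpty()) {
                throw new IllegalArgumentException("Unsupported option: " + option);
            }
        }
    }

    public static void main(String[] args) {
        String[] options = {"-N", "first-3", "-M", "NA", "-format", "yyyy-MM-dd"};
        String nominal = getOption("N", options);      // "first-3"
        String missing = getOption("M", options);      // "NA"
        String format = getOption("format", options);  // "yyyy-MM-dd"
        String dates = getOption("D", options);        // "" means: keep the default
        checkAllUsed(options);
        System.out.println(nominal + " | " + missing + " | " + format + " | '" + dates + "'");
    }
}
```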
getOptions
public java.lang.String[] getOptions()
Gets the current settings of the loader.
Specified by: getOptions in interface OptionHandler
Returns: an array of strings suitable for passing to setOptions

setNominalAttributes
public void setNominalAttributes(java.lang.String value)
Sets the attribute range to be forced to type nominal.
Parameters: value - the range

getNominalAttributes
public java.lang.String getNominalAttributes()
Returns the current attribute range to be forced to type nominal.
Returns: the range

nominalAttributesTipText
public java.lang.String nominalAttributesTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setStringAttributes
public void setStringAttributes(java.lang.String value)
Sets the attribute range to be forced to type string.
Parameters: value - the range

getStringAttributes
public java.lang.String getStringAttributes()
Returns the current attribute range to be forced to type string.
Returns: the range

stringAttributesTipText
public java.lang.String stringAttributesTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setDateAttributes
public void setDateAttributes(java.lang.String value)
Set the attribute range to be forced to type date.
Parameters: value - the range

getDateAttributes
public java.lang.String getDateAttributes()
Returns the current attribute range to be forced to type date.
Returns: the range

dateAttributesTipText
public java.lang.String dateAttributesTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setDateFormat
public void setDateFormat(java.lang.String value)
Set the format to use for parsing date values.
Parameters: value - the format to use.

getDateFormat
public java.lang.String getDateFormat()
Get the format to use for parsing date values.
Returns: the format to use for parsing date values.

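The date format follows the pattern syntax of java.text.SimpleDateFormat. In the default "yyyy-MM-dd'T'HH:mm:ss", the quoted 'T' is literal text, so the pattern matches values such as 2014-03-01T12:30:00.

The sketch below parses a value with such a pattern and returns milliseconds since the epoch. Weka date attributes hold their values as numeric timestamps of this kind.

The class name is hypothetical, and so are two choices made for reproducibility: strict (non-lenient) parsing and a fixed UTC time zone. The loader itself may parse more leniently and use the default time zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch of parsing a date value with a SimpleDateFormat pattern.
final class DateFormatSketch {

    static final String DEFAULT_PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

    // Parses value with pattern and returns milliseconds since 1970-01-01T00:00:00 UTC.
    static long parseToMillis(String value, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setLenient(false);                        // reject e.g. month 13
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: fixed zone
        try {
            return format.parse(value).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("'" + value + "' does not match " + pattern, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseToMillis("2014-03-01T12:30:00", DEFAULT_PATTERN));
        // A custom pattern, as passed with -format yyyy-MM-dd
        System.out.println(parseToMillis("2014-03-01", "yyyy-MM-dd"));
    }
}
```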
dateFormatTipText
public java.lang.String dateFormatTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the Explorer/Experimenter GUI

enclosureCharactersTipText
public java.lang.String enclosureCharactersTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setEnclosureCharacters
public void setEnclosureCharacters(java.lang.String enclosure)
Set the character(s) to use/recognize as string enclosures.
Parameters: enclosure - the characters to use as string enclosures

getEnclosureCharacters
public java.lang.String getEnclosureCharacters()
Get the character(s) to use/recognize as string enclosures.
Returns: the characters to use as string enclosures

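Enclosure characters let a value contain the separator. With the default enclosure ("), the field "b,c" stays a single value instead of being split at the comma.

The hypothetical EnclosureSketch below illustrates that rule for one line. It accepts the enclosure list in the same comma-separated form as -E. It is a simplification and not the loader's own tokenizer. Escaped or doubled enclosure characters are not handled, and neither are values that span several lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-line splitter illustrating string enclosures.
final class EnclosureSketch {

    // Turns an -E style list such as "\",'" into the enclosure characters.
    // Assumption: the comma only separates entries and is never an enclosure itself.
    static String enclosureChars(String spec) {
        return spec.replace(",", "");
    }

    // Splits line on separator, except where the separator occurs inside an
    // enclosed value. The enclosure characters themselves are dropped.
    static List<String> split(String line, char separator, String enclosures) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char open = 0; // enclosure currently open, or 0 when outside any enclosure
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (open != 0) {
                if (ch == open) {
                    open = 0; // matching closing enclosure
                } else {
                    current.append(ch);
                }
            } else if (enclosures.indexOf(ch) >= 0) {
                open = ch; // opening enclosure
            } else if (ch == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String enclosures = enclosureChars("\",'");
        System.out.println(split("a,\"b,c\",'d'", ',', enclosures)); // [a, b,c, d]
        // Tab-separated input works the same way with '\t' as the separator.
        System.out.println(split("x\t\"y\tz\"", '\t', enclosures));
    }
}
```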
setMissingValue
public void setMissingValue(java.lang.String value)
Sets the placeholder for missing values.
Parameters: value - the placeholder

getMissingValue
public java.lang.String getMissingValue()
Returns the current placeholder for missing values.
Returns: the placeholder

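A field equal to the missing-value placeholder is read as "missing" rather than as data. The placeholder is "?" by default and can be changed with -M. Weka represents missing numeric values internally as Double.NaN.

The hypothetical MissingValueSketch below mimics that behavior for one numeric column. It is not the loader's own code.

```java
import java.util.Arrays;

// Hypothetical sketch: placeholder tokens become Double.NaN ("missing").
final class MissingValueSketch {

    // Converts the tokens of one numeric column to doubles; tokens equal to the
    // placeholder (after trimming) become Double.NaN.
    static double[] toNumbers(String[] tokens, String placeholder) {
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i].trim();
            values[i] = token.equals(placeholder) ? Double.NaN : Double.parseDouble(token);
        }
        return values;
    }

    public static void main(String[] args) {
        // Default placeholder "?"; with -M NA the call would pass "NA" instead.
        System.out.println(Arrays.toString(toNumbers(new String[] {"1.5", "?", "3"}, "?"))); // [1.5, NaN, 3.0]
    }
}
```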
missingValueTipText
public java.lang.String missingValueTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setSource
public void setSource(java.io.InputStream input) throws java.io.IOException
Resets the Loader object and sets the source of the data set to be the supplied Stream object.
Specified by: setSource in interface Loader
Overrides: setSource in class AbstractLoader
Parameters: input - the input stream
Throws: java.io.IOException - if an error occurs

setSource
public void setSource(java.io.File file) throws java.io.IOException
Resets the Loader object and sets the source of the data set to be the supplied File object.
Specified by: setSource in interface Loader
Overrides: setSource in class AbstractFileLoader
Parameters: file - the source file.
Throws: java.io.IOException - if an error occurs

getStructure
public Instances getStructure() throws java.io.IOException
Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.
Specified by: getStructure in interface Loader
Specified by: getStructure in class AbstractLoader
Returns: the structure of the data set as an empty set of Instances
Throws: java.io.IOException - if an error occurs

getDataSet
public Instances getDataSet() throws java.io.IOException
Return the full data set. If the structure hasn't yet been determined by a call to getStructure, then this method should do so before processing the rest of the data set.
Specified by: getDataSet in interface Loader
Specified by: getDataSet in class AbstractLoader
Returns: the full data set as a set of Instances
Throws: java.io.IOException - if there is no source or parsing fails

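The first row supplies the attribute names. CSV carries no type information, so attribute types have to be inferred from the values, unless they are forced with -N, -S and -D.

The sketch below shows a simplified form of that inference. A column whose non-missing values all parse as numbers is treated as numeric; any other column is treated as nominal, with its distinct values as labels.

StructureSketch is hypothetical. The loader's actual rules may differ, including how the forced ranges take precedence and how nominal labels are ordered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified header inference for already-split CSV rows.
final class StructureSketch {

    // One inferred attribute: its name from the header row and, for nominal
    // attributes, the distinct values in the order they were first seen.
    static final class Column {
        final String name;
        final boolean numeric;
        final List<String> labels;

        Column(String name, boolean numeric, List<String> labels) {
            this.name = name;
            this.numeric = numeric;
            this.labels = labels;
        }

        @Override
        public String toString() {
            return name + (numeric ? " NUMERIC" : " " + labels);
        }
    }

    private static boolean isNumber(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // rows.get(0) is the header row; every row must have the same field count.
    static List<Column> infer(List<String[]> rows, String missing) {
        String[] header = rows.get(0);
        for (int r = 1; r < rows.size(); r++) {
            if (rows.get(r).length != header.length) {
                throw new IllegalArgumentException("Row " + r + " has " + rows.get(r).length
                        + " fields, expected " + header.length);
            }
        }
        List<Column> columns = new ArrayList<>();
        for (int c = 0; c < header.length; c++) {
            boolean numeric = true;
            Set<String> labels = new LinkedHashSet<>();
            for (int r = 1; r < rows.size(); r++) {
                String value = rows.get(r)[c].trim();
                if (value.equals(missing)) {
                    continue; // missing values do not influence the type
                }
                labels.add(value);
                numeric &= isNumber(value);
            }
            List<String> nominalLabels = numeric ? new ArrayList<String>() : new ArrayList<String>(labels);
            columns.add(new Column(header[c].trim(), numeric, nominalLabels));
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"age", "colour"},
                new String[] {"31", "red"},
                new String[] {"?", "blue"},
                new String[] {"27.5", "red"});
        System.out.println(infer(rows, "?")); // [age NUMERIC, colour [red, blue]]
    }
}
```

This sketch has to see every value before it can settle a column's type. That is consistent with the loader supporting batch retrieval only.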
getNextInstance
public Instance getNextInstance(Instances structure) throws java.io.IOException
CSVLoader is unable to process a data set incrementally.
Specified by: getNextInstance in interface Loader
Specified by: getNextInstance in class AbstractLoader
Parameters: structure - ignored
Returns: never returns without throwing an exception
Throws: java.io.IOException - always. CSVLoader is unable to process a data set incrementally.

reset
public void reset() throws java.io.IOException
Resets the Loader ready to read a new data set or the same data set again.
Specified by: reset in interface Loader
Overrides: reset in class AbstractFileLoader
Throws: java.io.IOException - if something goes wrong

getRevision
public java.lang.String getRevision()
Returns the revision string.
Specified by: getRevision in interface RevisionHandler
Returns: the revision

main
public static void main(java.lang.String[] args)
Main method.
Parameters: args - should contain the name of an input file.