Package weka.attributeSelection
Class LatentSemanticAnalysis
java.lang.Object
  weka.attributeSelection.ASEvaluation
    weka.attributeSelection.UnsupervisedAttributeEvaluator
      weka.attributeSelection.LatentSemanticAnalysis
-
- All Implemented Interfaces:
- java.io.Serializable, AttributeEvaluator, AttributeTransformer, CapabilitiesHandler, OptionHandler, RevisionHandler
public class LatentSemanticAnalysis extends UnsupervisedAttributeEvaluator implements AttributeTransformer, OptionHandler
Performs latent semantic analysis and transformation of the data. Use in conjunction with a Ranker search. A low-rank approximation of the full data is found by specifying the number of singular values to use. The dataset may be transformed to give the relation of either the attributes or the instances (default) to the concept space created by the transformation.
Valid options are:
-N Normalize input data.
-R Rank approximation used in LSA. May be actual number of LSA attributes to include (if greater than 1) or a proportion of total singular values to account for (if between 0 and 1). A value less than or equal to zero means use all latent variables. (default = 0.95)
-A Maximum number of attributes to include in transformed attribute names. (-1 = include all)
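The three readings of -R described above can be sketched in plain Java. This is a minimal illustration, not Weka's implementation: the class RankSketch and the method resolveRank are hypothetical names. The sketch assumes that the singular values arrive sorted in descending order, that the proportion is measured over squared singular values (in line with the merit that evaluateAttribute reports), that a value of exactly 1 counts as a number of attributes, and that a count larger than the number of singular values falls back to using all of them.

```java
class RankSketch {

    /**
     * Hypothetical helper: turns a rank setting into the number of latent
     * variables to keep. singularValues must be sorted in descending order.
     */
    static int resolveRank(double rank, double[] singularValues) {
        int available = singularValues.length;

        // Zero or negative: use all latent variables.
        // Assumption: a count above the available number is clamped as well.
        if (rank <= 0 || rank > available) {
            return available;
        }

        // Assumption: 1 or more is read as an explicit count.
        if (rank >= 1.0) {
            return (int) rank;
        }

        // Between 0 and 1: keep the smallest prefix whose share of the
        // total squared singular values reaches the requested proportion.
        double total = 0.0;
        for (double s : singularValues) {
            total += s * s;
        }
        if (total == 0.0) {
            return available;
        }
        double running = 0.0;
        for (int i = 0; i < available; i++) {
            running += singularValues[i] * singularValues[i];
            if (running / total >= rank) {
                return i + 1;
            }
        }
        return available;
    }

    public static void main(String[] args) {
        double[] singularValues = {4.0, 2.0, 1.0, 0.5}; // made-up values
        System.out.println(resolveRank(0.95, singularValues)); // proportion
        System.out.println(resolveRank(2.0, singularValues));  // explicit count
        System.out.println(resolveRank(0.0, singularValues));  // all
    }
}
```

With these made-up values the squared singular values are 16, 4, 1 and 0.25, so a proportion of 0.95 is first reached after three latent variables.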
- Version:
- $Revision: 11821 $
- Author:
- Amri Napolitano
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructor and Description
LatentSemanticAnalysis()
-
Method Summary
Modifier and Type, Method, and Description
void buildEvaluator(Instances data)
Initializes the singular values/vectors and performs the analysis
Instance convertInstance(Instance instance)
Transform an instance in original (unnormalized) format
double evaluateAttribute(int att)
Evaluates the merit of a transformed attribute.
Capabilities getCapabilities()
Returns the capabilities of this evaluator.
int getMaximumAttributeNames()
Gets maximum number of attributes to include in transformed attribute names.
boolean getNormalize()
Gets whether or not input data is to be normalized
java.lang.String[] getOptions()
Gets the current settings of LatentSemanticAnalysis
double getRank()
Gets the desired matrix rank (or coverage proportion) for feature-space reduction
java.lang.String getRevision()
Returns the revision string.
java.lang.String globalInfo()
Returns a string describing this attribute transformer
java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
Main method for testing this class
java.lang.String maximumAttributeNamesTipText()
Returns the tip text for this property
java.lang.String normalizeTipText()
Returns the tip text for this property
java.lang.String rankTipText()
Returns the tip text for this property
void setMaximumAttributeNames(int newMaxAttributes)
Sets maximum number of attributes to include in transformed attribute names.
void setNormalize(boolean newNormalize)
Set whether input data will be normalized.
void setOptions(java.lang.String[] options)
Parses a given list of options.
void setRank(double newRank)
Sets the desired matrix rank (or coverage proportion) for feature-space reduction
java.lang.String toString()
Returns a description of this attribute transformer
Instances transformedData(Instances data)
Transform the supplied data set (assumed to be the same format as the training data)
Instances transformedHeader()
Returns just the header for the transformed data (i.e. an empty set of instances).
-
Methods inherited from class weka.attributeSelection.ASEvaluation
clean, forName, makeCopies, postProcess
-
Method Detail
-
globalInfo
public java.lang.String globalInfo()
Returns a string describing this attribute transformer
- Returns:
- a description of the evaluator suitable for displaying in the explorer/experimenter gui
-
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
- listOptions in interface OptionHandler
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a given list of options. Valid options are:
-N Normalize input data.
-R Rank approximation used in LSA. May be actual number of LSA attributes to include (if greater than 1) or a proportion of total singular values to account for (if between 0 and 1). A value less than or equal to zero means use all latent variables. (default = 0.95)
-A Maximum number of attributes to include in transformed attribute names. (-1 = include all)
- Specified by:
- setOptions in interface OptionHandler
- Parameters:
- options - the list of options as an array of strings
- Throws:
- java.lang.Exception - if an option is not supported
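To illustrate how the three flags map onto the evaluator's properties, the following stand-alone sketch parses an option array of the shape that setOptions expects. It does not use Weka: the class OptionSketch, its fields, and its parse method are hypothetical. The defaults shown for -N (off) and -A (-1) are placeholders chosen for the sketch; only the 0.95 default for -R comes from the option description above.

```java
class OptionSketch {
    boolean normalize = false;   // -N (assumed default: off)
    double rank = 0.95;          // -R (default taken from the option description)
    int maxAttributeNames = -1;  // -A (placeholder default; -1 = include all)

    /** Hypothetical parser for an array such as {"-N", "-R", "0.8", "-A", "3"}. */
    static OptionSketch parse(String[] options) {
        OptionSketch settings = new OptionSketch();
        for (int i = 0; i < options.length; i++) {
            switch (options[i]) {
                case "-N":
                    // Flag without a value.
                    settings.normalize = true;
                    break;
                case "-R":
                    settings.rank = Double.parseDouble(valueOf(options, ++i, "-R"));
                    break;
                case "-A":
                    settings.maxAttributeNames = Integer.parseInt(valueOf(options, ++i, "-A"));
                    break;
                default:
                    // Mirrors the documented contract that unsupported options fail.
                    throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
        return settings;
    }

    /** Returns the value following a flag, or fails if the array ends early. */
    private static String valueOf(String[] options, int index, String flag) {
        if (index >= options.length) {
            throw new IllegalArgumentException(flag + " requires a value");
        }
        return options[index];
    }

    public static void main(String[] args) {
        OptionSketch settings = parse(new String[] {"-N", "-R", "0.8", "-A", "3"});
        System.out.println(settings.normalize + " " + settings.rank + " " + settings.maxAttributeNames);
    }
}
```

Note that -R and -A each consume the next array element as their value, while -N stands alone.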
-
normalizeTipText
public java.lang.String normalizeTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setNormalize
public void setNormalize(boolean newNormalize)
Set whether input data will be normalized.
- Parameters:
- newNormalize - true if input data is to be normalized
-
getNormalize
public boolean getNormalize()
Gets whether or not input data is to be normalized
- Returns:
- true if input data is to be normalized
-
rankTipText
public java.lang.String rankTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setRank
public void setRank(double newRank)
Sets the desired matrix rank (or coverage proportion) for feature-space reduction
- Parameters:
- newRank - the desired rank (or coverage) for feature-space reduction
-
getRank
public double getRank()
Gets the desired matrix rank (or coverage proportion) for feature-space reduction
- Returns:
- the rank (or coverage) for feature-space reduction
-
maximumAttributeNamesTipText
public java.lang.String maximumAttributeNamesTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setMaximumAttributeNames
public void setMaximumAttributeNames(int newMaxAttributes)
Sets maximum number of attributes to include in transformed attribute names.
- Parameters:
- newMaxAttributes - the maximum number of attributes
-
getMaximumAttributeNames
public int getMaximumAttributeNames()
Gets maximum number of attributes to include in transformed attribute names.
- Returns:
- the maximum number of attributes
-
getOptions
public java.lang.String[] getOptions()
Gets the current settings of LatentSemanticAnalysis
- Specified by:
- getOptions in interface OptionHandler
- Returns:
- an array of strings suitable for passing to setOptions()
-
getCapabilities
public Capabilities getCapabilities()
Returns the capabilities of this evaluator.
- Specified by:
- getCapabilities in interface CapabilitiesHandler
- Overrides:
- getCapabilities in class ASEvaluation
- Returns:
- the capabilities of this evaluator
- See Also:
- Capabilities
-
buildEvaluator
public void buildEvaluator(Instances data) throws java.lang.Exception
Initializes the singular values/vectors and performs the analysis
- Specified by:
- buildEvaluator in class ASEvaluation
- Parameters:
- data - the instances to analyse/transform
- Throws:
- java.lang.Exception - if analysis fails
-
transformedHeader
public Instances transformedHeader() throws java.lang.Exception
Returns just the header for the transformed data (i.e. an empty set of instances). This is so that AttributeSelection can determine the structure of the transformed data without actually having to get all the transformed data through transformedData().
- Specified by:
- transformedHeader in interface AttributeTransformer
- Returns:
- the header of the transformed data.
- Throws:
- java.lang.Exception - if the header of the transformed data can't be determined.
-
transformedData
public Instances transformedData(Instances data) throws java.lang.Exception
Transform the supplied data set (assumed to be the same format as the training data)
- Specified by:
- transformedData in interface AttributeTransformer
- Returns:
- the transformed training data
- Throws:
- java.lang.Exception - if transformed data can't be returned
-
evaluateAttribute
public double evaluateAttribute(int att) throws java.lang.Exception
Evaluates the merit of a transformed attribute. This is defined to be the square of the singular value for the latent variable corresponding to the transformed attribute.
- Specified by:
- evaluateAttribute in interface AttributeEvaluator
- Parameters:
- att - the attribute to be evaluated
- Returns:
- the merit of a transformed attribute
- Throws:
- java.lang.Exception - if attribute can't be evaluated
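Because the merit is the square of a singular value, and singular values are conventionally non-negative and listed in descending order, the merits of the transformed attributes come out in descending order as well. The sketch below shows this with made-up singular values. MeritSketch and its merits method are hypothetical names and are not part of Weka.

```java
class MeritSketch {

    /** Hypothetical helper: squares each singular value to obtain the merits. */
    static double[] merits(double[] singularValues) {
        double[] result = new double[singularValues.length];
        for (int att = 0; att < singularValues.length; att++) {
            // Merit of transformed attribute att = (singular value of att)^2
            result[att] = singularValues[att] * singularValues[att];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] singularValues = {3.0, 2.0, 0.5}; // made-up, descending
        for (double merit : merits(singularValues)) {
            System.out.println(merit);
        }
    }
}
```

For the values 3, 2 and 0.5 the merits are 9, 4 and 0.25, so a search that ranks attributes by merit would keep the latent variables in their original order.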
-
convertInstance
public Instance convertInstance(Instance instance) throws java.lang.Exception
Transform an instance in original (unnormalized) format
- Specified by:
- convertInstance in interface AttributeTransformer
- Parameters:
- instance - an instance in the original (unnormalized) format
- Returns:
- a transformed instance
- Throws:
- java.lang.Exception - if instance can't be transformed
-
toString
public java.lang.String toString()
Returns a description of this attribute transformer
- Overrides:
- toString in class java.lang.Object
- Returns:
- a String describing this attribute transformer
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
- getRevision in interface RevisionHandler
- Overrides:
- getRevision in class ASEvaluation
- Returns:
- the revision
-
main
public static void main(java.lang.String[] argv)
Main method for testing this class
- Parameters:
- argv - should contain the command line arguments to the evaluator/transformer (see AttributeSelection)
-
-