Package smile.data

Class Dataset<E>

  • Type Parameters:
    E - the type of data objects.
    All Implemented Interfaces:
    java.lang.Iterable<Datum<E>>
    Direct Known Subclasses:
    AttributeDataset

    public class Dataset<E>
    extends java.lang.Object
    implements java.lang.Iterable<Datum<E>>
    A set of data objects.
    Author:
    Haifeng Li
    • Constructor Summary

      Constructors 
      Constructor Description
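      Dataset()
      Constructs an empty dataset without a response variable.
      Dataset(java.lang.String name)
      Constructs an empty dataset with the given name.
      Dataset(java.lang.String name, Attribute response)
      Constructs an empty dataset with the given name and response attribute.
      Dataset(Attribute response)
      Constructs an empty dataset with the given response attribute.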
    • Method Summary

      Modifier and Type Method Description
      Datum<E> add(E x)
      Adds a data object to the dataset.
      Datum<E> add(E x, double y)
      Adds a data object with a real-valued response to the dataset.
      Datum<E> add(E x, double y, double weight)
      Adds a weighted data object with a real-valued response to the dataset.
      Datum<E> add(E x, int y)
      Adds a data object with a class label to the dataset.
      Datum<E> add(E x, int y, double weight)
      Adds a weighted data object with a class label to the dataset.
      Datum<E> add(Datum<E> x)
      Adds a datum item to the dataset.
      java.util.List<Datum<E>> data()
      Returns the data set.
      Datum<E> get(int i)
      Returns the element at the specified position in this dataset.
      java.lang.String getDescription()
      Returns the detailed dataset description.
      java.lang.String getName()
      Returns the dataset name.
      java.util.Iterator<Datum<E>> iterator()
      Returns an iterator over the elements in this dataset in proper sequence.
      int[] labels()
      Returns the class labels.
      Datum<E> remove(int i)
      Removes the element at the specified position in this dataset.
      AttributeVector response()
      Returns the response attribute vector.
      Attribute responseAttribute()
      Returns the attribute of the response variable.
      void setDescription(java.lang.String description)
      Sets the detailed dataset description.
      void setName(java.lang.String name)
      Sets the dataset name.
      int size()
      Returns the number of data items in the dataset.
      double[] toArray(double[] a)
      Returns an array containing the response values of the elements in this dataset in proper sequence (from first to last element).
      int[] toArray(int[] a)
      Returns an array containing the class labels of the elements in this dataset in proper sequence (from first to last element).
      E[] toArray(E[] a)
      Returns an array containing all of the elements in this dataset in proper sequence (from first to last element); the runtime type of the returned array is that of the specified array.
      java.lang.String[] toArray(java.lang.String[] a)
      Returns an array containing the string names of the elements in this dataset in proper sequence (from first to last element).
      java.sql.Timestamp[] toArray(java.sql.Timestamp[] a)
      Returns an array containing the timestamps of the elements in this dataset in proper sequence (from first to last element).
      double[] y()
      Returns the response values.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.lang.Iterable

        forEach, spliterator
    • Field Detail

      • DATASET_HAS_NO_RESPONSE

        protected static final java.lang.String DATASET_HAS_NO_RESPONSE
        See Also:
        Constant Field Values
      • RESPONSE_NOT_NOMINAL

        protected static final java.lang.String RESPONSE_NOT_NOMINAL
        See Also:
        Constant Field Values
      • RESPONSE_NOT_NUMERIC

        protected static final java.lang.String RESPONSE_NOT_NUMERIC
        See Also:
        Constant Field Values
      • name

        protected java.lang.String name
        The name of the dataset.
      • description

        protected java.lang.String description
        The optional detailed description of the dataset.
      • response

        protected Attribute response
        The attribute of the response variable. null means there is no response variable.
      • data

        protected java.util.List<Datum<E>> data
        The data objects.
    • Constructor Detail

      • Dataset

        public Dataset()
        Constructs an empty dataset without a response variable.
      • Dataset

        public Dataset(java.lang.String name)
        Constructs an empty dataset with the given name.
        Parameters:
        name - the name of the dataset.
      • Dataset

        public Dataset(Attribute response)
        Constructs an empty dataset with the given response attribute.
        Parameters:
        response - the attribute type of the response variable.
      • Dataset

        public Dataset(java.lang.String name,
                       Attribute response)
        Constructs an empty dataset with the given name and response attribute.
        Parameters:
        name - the name of the dataset.
        response - the attribute type of the response variable.
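The response attribute passed to these constructors is what separates a classification dataset from a regression dataset. The protected constants DATASET_HAS_NO_RESPONSE, RESPONSE_NOT_NOMINAL and RESPONSE_NOT_NUMERIC suggest that the int and double add overloads are validated against it, although this page does not show those checks. The following self-contained sketch illustrates that idea under that assumption; ResponseKind and ResponseGatedDataset are hypothetical stand-ins for Attribute and Dataset, and the exception messages are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the response Attribute; Smile's Attribute types are richer.
enum ResponseKind { NOMINAL, NUMERIC }

// Minimal sketch, not Smile's implementation: the response kind chosen at
// construction time decides which add overload is accepted.
class ResponseGatedDataset<E> {
    private final String name;
    private final ResponseKind response; // null means no response variable
    private final List<E> xs = new ArrayList<>();
    private final List<Double> ys = new ArrayList<>();

    ResponseGatedDataset(String name, ResponseKind response) {
        this.name = name;
        this.response = response;
    }

    String getName() {
        return name;
    }

    // Class label: accepted only when the response is nominal (assumed check).
    void add(E x, int y) {
        requireResponse(ResponseKind.NOMINAL, "response is not nominal");
        xs.add(x);
        ys.add((double) y);
    }

    // Real-valued response: accepted only when the response is numeric (assumed check).
    void add(E x, double y) {
        requireResponse(ResponseKind.NUMERIC, "response is not numeric");
        xs.add(x);
        ys.add(y);
    }

    int size() {
        return xs.size();
    }

    // Throws before anything is stored, so a rejected datum never enters the dataset.
    private void requireResponse(ResponseKind expected, String message) {
        if (response == null) {
            throw new IllegalArgumentException("dataset has no response");
        }
        if (response != expected) {
            throw new IllegalArgumentException(message);
        }
    }
}
```

In the sketch, Java overload resolution picks add(E, int) for an int argument and add(E, double) for a double argument, so writing a class label as `2.0` instead of `2` is caught at run time rather than silently stored.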
    • Method Detail

      • getName

        public java.lang.String getName()
        Returns the dataset name.
      • setName

        public void setName(java.lang.String name)
        Sets the dataset name.
      • setDescription

        public void setDescription(java.lang.String description)
        Sets the detailed dataset description.
      • getDescription

        public java.lang.String getDescription()
        Returns the detailed dataset description.
      • responseAttribute

        public Attribute responseAttribute()
        Returns the attribute of the response variable. null means no response variable in this dataset.
        Returns:
        the attribute of the response variable. null means no response variable in this dataset.
      • response

        public AttributeVector response()
        Returns the response attribute vector. null means no response variable in this dataset.
        Returns:
        the response attribute vector. null means no response variable in this dataset.
      • size

        public int size()
        Returns the number of data items in the dataset.
      • data

        public java.util.List<Datum<E>> data()
        Returns the list of data items in this dataset.
      • add

        public Datum<E> add(Datum<E> x)
        Adds a datum item to the dataset.
        Parameters:
        x - a datum item.
        Returns:
        the added datum item.
      • add

        public Datum<E> add(E x)
        Adds a data object to the dataset.
        Parameters:
        x - the data object.
        Returns:
        the added datum item.
      • add

        public Datum<E> add(E x,
                            int y)
        Adds a data object with a class label to the dataset.
        Parameters:
        x - the data object.
        y - the class label of the datum.
        Returns:
        the added datum item.
      • add

        public Datum<E> add(E x,
                            int y,
                            double weight)
        Adds a weighted data object with a class label to the dataset.
        Parameters:
        x - the data object.
        y - the class label of the datum.
        weight - the weight of the datum. The meaning of the weight depends on the application and the learning algorithm. Although there are no explicit requirements on the weights, they should in general be positive.
        Returns:
        the added datum item.
      • add

        public Datum<E> add(E x,
                            double y)
        Adds a data object with a real-valued response to the dataset.
        Parameters:
        x - the data object.
        y - the real-valued response for regression.
        Returns:
        the added datum item.
      • add

        public Datum<E> add(E x,
                            double y,
                            double weight)
        Adds a weighted data object with a real-valued response to the dataset.
        Parameters:
        x - the data object.
        y - the real-valued response for regression.
        weight - the weight of the datum. The meaning of the weight depends on the application and the learning algorithm. Although there are no explicit requirements on the weights, they should in general be positive.
        Returns:
        the added datum item.
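The weight parameter of the two weighted add overloads is deliberately left open. As one illustration of how a learning algorithm might consume it, the sketch below computes a weighted mean of real-valued responses. WeightedDatum and WeightedResponses are hypothetical classes, not Smile's Datum, and rejecting non-positive weights is a choice made for this example, since the documentation only says weights should in general be positive.

```java
import java.util.List;

// Hypothetical minimal datum: input x, real-valued response y, and a weight.
final class WeightedDatum<E> {
    final E x;
    final double y;
    final double weight;

    WeightedDatum(E x, double y, double weight) {
        this.x = x;
        this.y = y;
        this.weight = weight;
    }
}

final class WeightedResponses {
    private WeightedResponses() {}

    // One possible use of weights: importance factors in a weighted mean of responses.
    static <E> double weightedMean(List<WeightedDatum<E>> data) {
        if (data.isEmpty()) {
            throw new IllegalArgumentException("no data: the weighted mean is undefined");
        }
        double sum = 0.0;
        double totalWeight = 0.0;
        for (WeightedDatum<E> d : data) {
            if (!(d.weight > 0)) { // rejects zero, negative and NaN weights
                throw new IllegalArgumentException("weight should be positive: " + d.weight);
            }
            sum += d.weight * d.y;
            totalWeight += d.weight;
        }
        return sum / totalWeight;
    }
}
```

With weights 1 and 2 on responses 1.0 and 4.0, the weighted mean is (1.0 + 2 * 4.0) / 3 = 3.0, whereas the unweighted mean would be 2.5.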
      • remove

        public Datum<E> remove(int i)
        Removes the element at the specified position in this dataset.
        Parameters:
        i - the index of the element to be removed.
        Returns:
        the element previously at the specified position.
      • get

        public Datum<E> get(int i)
        Returns the element at the specified position in this dataset.
        Parameters:
        i - the index of the element to be returned.
        Returns:
        the element at the specified position in this dataset.
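Because the data objects are held in a java.util.List (the protected data field above), get and remove presumably follow List semantics: an invalid index throws IndexOutOfBoundsException, and a removal shifts every later element one position to the left. This page does not state that delegation, so treat the IndexedDataset class below as a hypothetical sketch of those semantics rather than Smile's code; the removeAt helper is likewise invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: positional access delegating to a backing java.util.List,
// like the protected List<Datum<E>> data field documented above.
class IndexedDataset<E> {
    private final List<E> data = new ArrayList<>();

    E add(E x) {
        data.add(x);
        return x;
    }

    // Throws IndexOutOfBoundsException unless 0 <= i < size(), as List.get does.
    E get(int i) {
        return data.get(i);
    }

    // Removes the element at i; every later element moves one position to the left.
    E remove(int i) {
        return data.remove(i);
    }

    int size() {
        return data.size();
    }

    // Removes several positions (assumed distinct) from the highest index down,
    // so earlier removals do not shift positions that are still pending.
    void removeAt(int... indices) {
        int[] sorted = indices.clone();
        Arrays.sort(sorted);
        for (int k = sorted.length - 1; k >= 0; k--) {
            data.remove(sorted[k]);
        }
    }
}
```

Removing in ascending order instead would skip elements, because each removal renumbers everything after it.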
      • iterator

        public java.util.Iterator<Datum<E>> iterator()
        Returns an iterator over the elements in this dataset in proper sequence.
        Specified by:
        iterator in interface java.lang.Iterable<Datum<E>>
        Returns:
        an iterator over the elements in this dataset in proper sequence
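Since Dataset<E> implements java.lang.Iterable<Datum<E>>, a dataset can be traversed with the enhanced for statement or with the inherited forEach method listed in the method summary. The sketch below shows both on a hypothetical IterableDataset whose Item class stands in for Datum<E>; only the Iterable contract is taken from this page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical element type standing in for Datum<E>: an input x and a response y.
final class Item<E> {
    final E x;
    final double y;

    Item(E x, double y) {
        this.x = x;
        this.y = y;
    }
}

// Sketch of a dataset that, like Dataset<E>, is Iterable over its elements.
class IterableDataset<E> implements Iterable<Item<E>> {
    private final List<Item<E>> data = new ArrayList<>();

    void add(E x, double y) {
        data.add(new Item<>(x, y));
    }

    @Override
    public Iterator<Item<E>> iterator() {
        return data.iterator(); // visits items in insertion order
    }
}

class IterationDemo {
    // The enhanced for statement works on any Iterable.
    static <E> double sumResponses(IterableDataset<E> ds) {
        double sum = 0.0;
        for (Item<E> item : ds) {
            sum += item.y;
        }
        return sum;
    }

    // Iterable.forEach is a default method, so it comes for free with iterator().
    static <E> List<E> inputs(IterableDataset<E> ds) {
        List<E> xs = new ArrayList<>();
        ds.forEach(item -> xs.add(item.x));
        return xs;
    }
}
```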
      • y

        public double[] y()
        Returns the response values.
      • labels

        public int[] labels()
        Returns the class labels.
      • toArray

        public E[] toArray(E[] a)
        Returns an array containing all of the elements in this dataset in proper sequence (from first to last element); the runtime type of the returned array is that of the specified array. If the dataset fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the runtime type of the specified array and the size of this dataset.

        If the dataset fits in the specified array with room to spare (i.e., the array has more elements than the dataset), the element in the array immediately following the end of the dataset is set to null.

        Parameters:
        a - the array into which the elements of this dataset are to be stored, if it is big enough; otherwise, a new array of the same runtime type is allocated for this purpose.
        Returns:
        an array containing the elements of this dataset.
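The contract described above is the same one java.util.Collection.toArray(T[]) follows: reuse the caller's array when it is large enough, otherwise allocate one with the same runtime component type, and write null right after the last element when there is room to spare. Here is a minimal sketch of that contract on a hypothetical ArrayDataset class, using java.lang.reflect.Array to create the typed array; it is not Smile's source.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented toArray(E[] a) contract.
class ArrayDataset<E> {
    private final List<E> items = new ArrayList<>();

    void add(E x) {
        items.add(x);
    }

    @SuppressWarnings("unchecked")
    E[] toArray(E[] a) {
        int n = items.size();
        if (a.length < n) {
            // Too small: allocate a new array with the runtime component type of a.
            a = (E[]) Array.newInstance(a.getClass().getComponentType(), n);
        }
        for (int i = 0; i < n; i++) {
            a[i] = items.get(i);
        }
        if (a.length > n) {
            a[n] = null; // end-of-data marker; slots after it are left untouched
        }
        return a;
    }
}
```

Passing `new String[0]` is a common way to get an exactly sized, correctly typed array. Creating the array as `(E[]) new Object[n]` instead would compile, but the caller would get a ClassCastException when assigning the result to a String[].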
      • toArray

        public int[] toArray(int[] a)
        Returns an array containing the class labels of the elements in this dataset in proper sequence (from first to last element). Unknown labels will be saved as Integer.MIN_VALUE. If the dataset fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the size of this dataset.

        If the dataset fits in the specified array with room to spare (i.e., the array has more elements than the dataset), the element in the array immediately following the end of the dataset is set to Integer.MIN_VALUE.

        Parameters:
        a - the array into which the class labels of this dataset are to be stored, if it is big enough; otherwise, a new array is allocated for this purpose.
        Returns:
        an array containing the class labels of this dataset.
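Integer.MIN_VALUE plays two roles here: it marks an unknown label, and it terminates the data when the array has room to spare. The sketch below uses a hypothetical LabelDataset that stores unknown labels as null (this page does not say how Smile represents them internally). It shows why the array alone cannot tell the two roles apart, so the dataset's size() is the reliable bound.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of toArray(int[] a) for class labels. Unknown labels are stored
// as null here; Smile's internal representation is not shown on this page.
class LabelDataset {
    private final List<Integer> labels = new ArrayList<>();

    void add(Integer label) { // null means the label is unknown
        labels.add(label);
    }

    int size() {
        return labels.size();
    }

    int[] toArray(int[] a) {
        int n = labels.size();
        if (a.length < n) {
            a = new int[n]; // too small: allocate exactly size() elements
        }
        for (int i = 0; i < n; i++) {
            Integer y = labels.get(i);
            a[i] = (y == null) ? Integer.MIN_VALUE : y; // unknown labels become MIN_VALUE
        }
        if (a.length > n) {
            a[n] = Integer.MIN_VALUE; // end-of-data marker, same value as "unknown"
        }
        return a;
    }
}
```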
      • toArray

        public double[] toArray(double[] a)
        Returns an array containing the response variable of the elements in this dataset in proper sequence (from first to last element). If the dataset fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the size of this dataset.

        If the dataset fits in the specified array with room to spare (i.e., the array has more elements than the dataset), the element in the array immediately following the end of the dataset is set to Double.NaN.

        Parameters:
        a - the array into which the response values of this dataset are to be stored, if it is big enough; otherwise, a new array is allocated for this purpose.
        Returns:
        an array containing the response values of this dataset.
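The same pattern applies to real-valued responses, with Double.NaN as the end marker. One practical consequence is that the marker cannot be detected with ==, because NaN compares unequal to everything, itself included; Double.isNaN is required. The hypothetical ResponseDataset sketch below also reuses a preallocated buffer, which the documented contract allows. The countBeforeMarker helper is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the toArray(double[] a) contract for real-valued responses.
class ResponseDataset {
    private final List<Double> ys = new ArrayList<>();

    void add(double y) {
        ys.add(y);
    }

    double[] toArray(double[] a) {
        int n = ys.size();
        if (a.length < n) {
            a = new double[n]; // too small: allocate exactly size() elements
        }
        for (int i = 0; i < n; i++) {
            a[i] = ys.get(i);
        }
        if (a.length > n) {
            a[n] = Double.NaN; // end-of-data marker
        }
        return a;
    }

    // Scans for the NaN marker with Double.isNaN (a[i] == Double.NaN is always false).
    // A genuine NaN response would also stop the scan, so size() stays the reliable count.
    static int countBeforeMarker(double[] a) {
        int i = 0;
        while (i < a.length && !Double.isNaN(a[i])) {
            i++;
        }
        return i;
    }
}
```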
      • toArray

        public java.lang.String[] toArray(java.lang.String[] a)
        Returns an array containing the string names of the elements in this dataset in proper sequence (from first to last element). If the dataset fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the size of this dataset.

        If the dataset fits in the specified array with room to spare (i.e., the array has more elements than the dataset), the element in the array immediately following the end of the dataset is set to null.

        Parameters:
        a - the array into which the string names of the elements in this dataset are to be stored, if it is big enough; otherwise, a new array is allocated for this purpose.
        Returns:
        an array containing the string names of the elements in this dataset.
      • toArray

        public java.sql.Timestamp[] toArray(java.sql.Timestamp[] a)
        Returns an array containing the timestamps of the elements in this dataset in proper sequence (from first to last element). If the dataset fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the size of this dataset.

        If the dataset fits in the specified array with room to spare (i.e., the array has more elements than the dataset), the element in the array immediately following the end of the dataset is set to null.

        Parameters:
        a - the array into which the timestamps of the elements in this dataset are to be stored, if it is big enough; otherwise, a new array is allocated for this purpose.
        Returns:
        an array containing the timestamps of the elements in this dataset.
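The String[] and Timestamp[] overloads follow the same reuse-or-allocate contract with null as the end marker, and the argument's static type selects which per-datum field is extracted. This sketch assumes each datum carries an optional name and timestamp, as these overloads imply, but TaggedDatum and TaggedDataset are hypothetical classes, not Smile's Datum and Dataset.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

// Hypothetical datum carrying an optional name and timestamp (either may be null).
final class TaggedDatum {
    final double[] x;
    final String name;
    final Timestamp timestamp;

    TaggedDatum(double[] x, String name, Timestamp timestamp) {
        this.x = x;
        this.name = name;
        this.timestamp = timestamp;
    }
}

// Sketch of the two overloads: same contract, different field, null as the end marker.
class TaggedDataset {
    private final List<TaggedDatum> data = new ArrayList<>();

    void add(TaggedDatum d) {
        data.add(d);
    }

    String[] toArray(String[] a) {
        int n = data.size();
        if (a.length < n) a = new String[n];
        for (int i = 0; i < n; i++) a[i] = data.get(i).name;
        if (a.length > n) a[n] = null; // end-of-data marker
        return a;
    }

    Timestamp[] toArray(Timestamp[] a) {
        int n = data.size();
        if (a.length < n) a = new Timestamp[n];
        for (int i = 0; i < n; i++) a[i] = data.get(i).timestamp;
        if (a.length > n) a[n] = null; // end-of-data marker
        return a;
    }
}
```

Because an unnamed or untimed datum also yields null, a null entry does not by itself mark the end of the data; bounding loops by size() avoids the ambiguity.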