Package smile.util

Class SmileUtils


  • public class SmileUtils
    extends java.lang.Object
    Utility functions for sorting variables and for learning Gaussian radial basis functions from data.
    Author:
    Haifeng Li
    • Method Detail

      • sort

        public static int[][] sort(Attribute[] attributes,
                                   double[][] x)
        Sorts each variable and returns the indices of its values in ascending order. Only numeric attributes are sorted. Note that the original array is NOT altered.
        Parameters:
        attributes - the attributes of the variables. Only variables with numeric attributes are sorted.
        x - a set of variables to be sorted. Each row is an instance. Each column is a variable.
        Returns:
        the indices of the values of each variable in ascending order
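
        The index sort described above can be illustrated with a minimal, self-contained sketch. It is not the SmileUtils implementation: the class name ColumnArgsortSketch and the boolean[] flags (a stand-in for Attribute[]) are hypothetical, and it assumes the result holds one index array per column, with null for columns that are not numeric.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative sketch only; not the SmileUtils implementation. */
final class ColumnArgsortSketch {
    /**
     * For each numeric column j, returns the row indices that put
     * column j in ascending order. Non-numeric columns stay null.
     * The boolean[] flags are a hypothetical stand-in for Attribute[].
     */
    static int[][] sort(boolean[] numeric, double[][] x) {
        int n = x.length;
        int p = numeric.length;
        int[][] index = new int[p][];
        for (int j = 0; j < p; j++) {
            if (!numeric[j]) {
                continue; // skip non-numeric columns
            }
            final int col = j;
            Integer[] order = new Integer[n];
            for (int i = 0; i < n; i++) {
                order[i] = i;
            }
            // Sort the row indices by the value in this column; x is only read.
            Arrays.sort(order, Comparator.comparingDouble(i -> x[i][col]));
            index[j] = Arrays.stream(order).mapToInt(Integer::intValue).toArray();
        }
        return index;
    }
}
```

        Reading x[index[j][0]][j], x[index[j][1]][j], and so on then visits column j in ascending order while x itself keeps its original row order.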
      • learnGaussianRadialBasis

        public static GaussianRadialBasis learnGaussianRadialBasis(double[][] x,
                                                                   double[][] centers)
        Learns a Gaussian RBF function and its centers from data. The centers are chosen as the centroids of K-Means. Let dmax be the maximum distance between the chosen centers; the standard deviation (i.e. width) of the Gaussian radial basis function is then dmax / sqrt(2*k), where k is the number of centers. This choice would be close to the optimal solution if the data were uniformly distributed in the input space, leading to a uniform distribution of centroids.
        Parameters:
        x - the training dataset.
        centers - an array to store the centers on output. Its length is used as k of K-Means.
        Returns:
        a Gaussian RBF function with its parameter learned from data.
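
        As a rough illustration of the width rule dmax / sqrt(2*k), the sketch below computes the shared width from centers that are already known. It assumes Euclidean distance and skips the K-Means step entirely; the class and method names are hypothetical and are not part of SmileUtils.

```java
/** Illustrative sketch only; not the SmileUtils implementation. */
final class GlobalWidthSketch {
    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns dmax / sqrt(2*k), where dmax is the largest pairwise center distance. */
    static double width(double[][] centers) {
        int k = centers.length;
        double dmax = 0.0;
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < i; j++) {
                dmax = Math.max(dmax, distance(centers[i], centers[j]));
            }
        }
        return dmax / Math.sqrt(2.0 * k);
    }
}
```

        For two centers at (0, 0) and (3, 4), dmax is 5 and k is 2, so the width is 5 / sqrt(4) = 2.5.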
      • learnGaussianRadialBasis

        public static GaussianRadialBasis[] learnGaussianRadialBasis(double[][] x,
                                                                     double[][] centers,
                                                                     int p)
        Learns Gaussian RBF functions and their centers from data. The centers are chosen as the centroids of K-Means. The standard deviation (i.e. width) of each Gaussian radial basis function is estimated by the p-nearest neighbors heuristic, where the neighbors are taken among the centers, not among all samples. A suggested value for p is 2.
        Parameters:
        x - the training dataset.
        centers - an array to store the centers on output. Its length is used as k of K-Means.
        p - the number of nearest neighbors of each center used to estimate the width of the Gaussian RBF functions.
        Returns:
        Gaussian RBF functions, one per center, with parameters learned from data.
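
        The description does not spell out how the p nearest neighbors are combined. The sketch below shows one common reading, labeled here as an assumption: the width of each center is the mean Euclidean distance to its p nearest other centers. The class and method names are hypothetical, and the centers are taken as given rather than learned by K-Means.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the SmileUtils implementation. */
final class NearestCenterWidthSketch {
    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Assumed heuristic: width[i] is the mean distance from center i
     * to its p nearest other centers.
     */
    static double[] widths(double[][] centers, int p) {
        int k = centers.length;
        if (p < 1 || p >= k) {
            throw new IllegalArgumentException("p must be in [1, k-1]");
        }
        double[] sigma = new double[k];
        double[] dist = new double[k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                dist[j] = distance(centers[i], centers[j]);
            }
            Arrays.sort(dist); // dist[0] is the zero distance of center i to itself
            double sum = 0.0;
            for (int j = 1; j <= p; j++) {
                sum += dist[j];
            }
            sigma[i] = sum / p;
        }
        return sigma;
    }
}
```

        With one-dimensional centers at 0, 1, 3 and 7 and p = 2, this sketch gives widths 2.0, 1.5, 2.5 and 5.0, so isolated centers receive wider basis functions.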
      • learnGaussianRadialBasis

        public static GaussianRadialBasis[] learnGaussianRadialBasis(double[][] x,
                                                                     double[][] centers,
                                                                     double r)
        Learns Gaussian RBF functions and their centers from data. The centers are chosen as the centroids of K-Means. The standard deviation (i.e. width) of each Gaussian radial basis function is estimated as the width of its cluster multiplied by a given scaling parameter r.
        Parameters:
        x - the training dataset.
        centers - an array to store the centers on output. Its length is used as k of K-Means.
        r - the scaling parameter.
        Returns:
        Gaussian RBF functions, one per center, with parameters learned from data.
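
        The text does not define "the width of each cluster". The sketch below assumes it is the root-mean-square Euclidean distance from the members of a cluster to its center, and scales it by r. The cluster labels are passed in directly instead of being produced by K-Means, and all names are hypothetical rather than part of SmileUtils.

```java
/** Illustrative sketch only; not the SmileUtils implementation. */
final class ClusterWidthSketch {
    /**
     * Assumed definition: width[c] = r * sqrt(mean squared distance of the
     * members of cluster c to centers[c]). label[i] is the cluster of x[i].
     * An empty cluster gets width 0 in this sketch.
     */
    static double[] widths(double[][] x, int[] label, double[][] centers, double r) {
        int k = centers.length;
        double[] sumSq = new double[k];
        int[] size = new int[k];
        for (int i = 0; i < x.length; i++) {
            int c = label[i];
            for (int d = 0; d < x[i].length; d++) {
                double diff = x[i][d] - centers[c][d];
                sumSq[c] += diff * diff;
            }
            size[c]++;
        }
        double[] sigma = new double[k];
        for (int c = 0; c < k; c++) {
            if (size[c] > 0) {
                sigma[c] = r * Math.sqrt(sumSq[c] / size[c]);
            }
        }
        return sigma;
    }
}
```

        A larger r makes neighboring basis functions overlap more. A practical implementation would also need a fallback for empty or single-point clusters, where this estimate collapses to zero.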
      • learnGaussianRadialBasis

        public static <T> GaussianRadialBasis learnGaussianRadialBasis(T[] x,
                                                                       T[] centers,
                                                                       Metric<T> distance)
        Learns a Gaussian RBF function and its centers from data. The centers are chosen as the medoids of CLARANS. Let dmax be the maximum distance between the chosen centers; the standard deviation (i.e. width) of the Gaussian radial basis function is then dmax / sqrt(2*k), where k is the number of centers. In this way, the radial basis functions are neither too peaked nor too flat. This choice would be close to the optimal solution if the data were uniformly distributed in the input space, leading to a uniform distribution of medoids.
        Parameters:
        x - the training dataset.
        centers - an array to store the centers on output. Its length is used as k of CLARANS.
        distance - the distance functor.
        Returns:
        a Gaussian RBF function with its parameter learned from data.
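
        Because this overload works on arbitrary objects, the same dmax / sqrt(2*k) rule only needs a pairwise distance. The sketch below uses java.util.function.ToDoubleBiFunction as a hypothetical stand-in for Metric<T> and takes the medoids as given instead of running CLARANS; the class name is likewise an assumption.

```java
import java.util.function.ToDoubleBiFunction;

/** Illustrative sketch only; not the SmileUtils implementation. */
final class MetricWidthSketch {
    /**
     * Returns dmax / sqrt(2*k) for centers of any type T, where dmax is
     * the largest pairwise distance under the supplied distance function.
     * ToDoubleBiFunction is a stand-in for the library's Metric<T>.
     */
    static <T> double width(T[] centers, ToDoubleBiFunction<T, T> distance) {
        int k = centers.length;
        double dmax = 0.0;
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < i; j++) {
                dmax = Math.max(dmax, distance.applyAsDouble(centers[i], centers[j]));
            }
        }
        return dmax / Math.sqrt(2.0 * k);
    }
}
```

        For example, with the strings "a", "abcd" and "ab" as centers and the absolute difference of their lengths as a toy distance, dmax is 3 and k is 3, so the width is 3 / sqrt(6).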
      • learnGaussianRadialBasis

        public static <T> GaussianRadialBasis[] learnGaussianRadialBasis(T[] x,
                                                                         T[] centers,
                                                                         Metric<T> distance,
                                                                         int p)
        Learns Gaussian RBF functions and their centers from data. The centers are chosen as the medoids of CLARANS. The standard deviation (i.e. width) of each Gaussian radial basis function is estimated by the p-nearest neighbors heuristic, where the neighbors are taken among the centers, not among all samples. A suggested value for p is 2.
        Parameters:
        x - the training dataset.
        centers - an array to store the centers on output. Its length is used as k of CLARANS.
        distance - the distance functor.
        p - the number of nearest neighbors of each center used to estimate the width of the Gaussian RBF functions.
        Returns:
        Gaussian RBF functions, one per center, with parameters learned from data.
      • learnGaussianRadialBasis

        public static <T> GaussianRadialBasis[] learnGaussianRadialBasis(T[] x,
                                                                         T[] centers,
                                                                         Metric<T> distance,
                                                                         double r)
        Learns Gaussian RBF functions and their centers from data. The centers are chosen as the medoids of CLARANS. The standard deviation (i.e. width) of each Gaussian radial basis function is estimated as the width of its cluster multiplied by a given scaling parameter r.
        Parameters:
        x - the training dataset.
        centers - an array to store the centers on output. Its length is used as k of CLARANS.
        distance - the distance functor.
        r - the scaling parameter.
        Returns:
        Gaussian RBF functions, one per center, with parameters learned from data.