Class CharUtilities


  • public class CharUtilities
    extends java.lang.Object
    This class provides utilities to distinguish various kinds of Unicode whitespace and to get character widths in a given FontState.
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      protected CharUtilities()
      Utility class: Constructor prevents instantiating when subclassed.
    • Method Summary

      Modifier and Type Method Description
      static java.lang.String charToNCRef(int c)
      Convert a single Unicode scalar value to an XML numeric character reference.
      static int classOf(int c)
      Return the appropriate CharClass constant for the type of the passed character.
      static java.lang.Iterable<java.lang.Integer> codepointsIter(java.lang.CharSequence s)
      Creates an iterable over the code points of a CharSequence.
      static java.lang.Iterable<java.lang.Integer> codepointsIter(java.lang.CharSequence s, int beginIndex, int endIndex)
      Creates an iterable over the code points of a range of a CharSequence.
      static boolean containsSurrogatePairAt(java.lang.CharSequence chars, int index)
      Tells whether there is a surrogate pair starting at the given index in the CharSequence.
      static java.lang.String format(int c)
      Format a character for debugging output: the result is prefixed with "0x" and left-padded with '0' to 4 or 6 hex digits, depending on whether the character is in the BMP.
      static int incrementIfNonBMP(int codePoint)
      Returns 1 if codePoint is not in the BMP, 0 otherwise.
      static boolean isAdjustableSpace(int c)
      Method to determine if the character is an adjustable space.
      static boolean isAlphabetic(int c)
      Indicates whether a character is classified as "Alphabetic" by the Unicode standard.
      static boolean isAnySpace(int c)
      Determines if the character represents any kind of space.
      static boolean isBmpCodePoint(int codePoint)
      Determine whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP).
      static boolean isBreakableSpace(int c)
      Helper method to determine if the character is a space with normal behavior.
      static boolean isExplicitBreak(int c)
      Indicates whether the given character is an explicit break character.
      static boolean isFixedWidthSpace(int c)
      Method to determine if the character is a (breakable) fixed-width space.
      static boolean isNonBreakableSpace(int c)
      Method to determine if the character is a non-breaking space.
      static boolean isSameSequence(java.lang.CharSequence cs1, java.lang.CharSequence cs2)
      Determine if two character sequences contain the same characters.
      static boolean isSurrogatePair(char ch)
      Determine if the given character is part of a surrogate pair.
      static boolean isZeroWidthSpace(int c)
      Method to determine if the character is a zero-width space.
      static java.lang.String padLeft(java.lang.String s, int width, char pad)
      Pad string s on the left to the given width using the padding character pad.
      static java.lang.String toNCRefs(java.lang.String s)
      Convert a string to a sequence of ASCII characters or XML numeric character references.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • CharUtilities

        protected CharUtilities()
        Utility class: Constructor prevents instantiating when subclassed.
    • Method Detail

      • classOf

        public static int classOf(int c)
        Return the appropriate CharClass constant for the type of the passed character.
        Parameters:
        c - character to inspect
        Returns:
        the determined character class
      • isBreakableSpace

        public static boolean isBreakableSpace(int c)
        Helper method to determine if the character is a space with normal behavior. Normal behavior means that it's not non-breaking.
        Parameters:
        c - character to inspect
        Returns:
        True if the character is a normal space
      • isZeroWidthSpace

        public static boolean isZeroWidthSpace(int c)
        Method to determine if the character is a zero-width space.
        Parameters:
        c - the character to check
        Returns:
        true if the character is a zero-width space
      • isFixedWidthSpace

        public static boolean isFixedWidthSpace(int c)
        Method to determine if the character is a (breakable) fixed-width space.
        Parameters:
        c - the character to check
        Returns:
        true if the character is a fixed-width space
      • isNonBreakableSpace

        public static boolean isNonBreakableSpace(int c)
        Method to determine if the character is a nonbreaking space.
        Parameters:
        c - character to check
        Returns:
        True if the character is a non-breaking space
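The exact set of characters this method accepts is not listed on this page. As a hedged illustration only, the sketch below assumes the three Unicode no-break space characters U+00A0, U+2007 and U+202F, which are exactly the space characters that java.lang.Character.isWhitespace excludes while Character.isSpaceChar accepts them. The real method may accept a different set; the class name NbspSketch is hypothetical.

```java
/** Hypothetical sketch: classifies the three Unicode no-break space characters. */
class NbspSketch {
    static final int NBSP = 0x00A0;          // NO-BREAK SPACE
    static final int FIGURE_SPACE = 0x2007;  // FIGURE SPACE (non-breaking)
    static final int NARROW_NBSP = 0x202F;   // NARROW NO-BREAK SPACE

    // Assumption: only these three code points count as non-breaking spaces.
    static boolean isNonBreakableSpace(int c) {
        return c == NBSP || c == FIGURE_SPACE || c == NARROW_NBSP;
    }

    public static void main(String[] args) {
        // java.lang.Character draws the same line: space characters, but not whitespace.
        for (int c : new int[] {NBSP, FIGURE_SPACE, NARROW_NBSP}) {
            // each line prints: true true false
            System.out.println(isNonBreakableSpace(c) + " "
                    + Character.isSpaceChar(c) + " " + Character.isWhitespace(c));
        }
    }
}
```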
      • isAdjustableSpace

        public static boolean isAdjustableSpace(int c)
        Method to determine if the character is an adjustable space.
        Parameters:
        c - character to check
        Returns:
        True if the character is adjustable
      • isAnySpace

        public static boolean isAnySpace(int c)
        Determines if the character represents any kind of space.
        Parameters:
        c - character to check
        Returns:
        True if the character represents any kind of space
      • isAlphabetic

        public static boolean isAlphabetic(int c)
        Indicates whether a character is classified as "Alphabetic" by the Unicode standard.
        Parameters:
        c - the character
        Returns:
        true if the character is "Alphabetic"
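Since Java 1.7, java.lang.Character.isAlphabetic(int) tests the Unicode "Alphabetic" property (letters, letter numbers such as Roman numerals, and Other_Alphabetic characters), which is broader than Character.isLetter. A minimal sketch of an equivalent check can simply delegate to it; whether this class does so internally is not stated here, and the name AlphabeticSketch is hypothetical.

```java
/** Hypothetical sketch: Unicode "Alphabetic" check via the JDK. */
class AlphabeticSketch {
    // Delegates to the JDK's implementation of the Unicode "Alphabetic" property.
    static boolean isAlphabetic(int c) {
        return Character.isAlphabetic(c);
    }

    public static void main(String[] args) {
        System.out.println(isAlphabetic('A'));     // Latin letter: true
        System.out.println(isAlphabetic(0x2160));  // ROMAN NUMERAL ONE (letter number): true
        System.out.println(Character.isLetter(0x2160)); // not a letter: false
        System.out.println(isAlphabetic('7'));     // digit: false
    }
}
```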
      • isExplicitBreak

        public static boolean isExplicitBreak(int c)
        Indicates whether the given character is an explicit break character.
        Parameters:
        c - the character to check
        Returns:
        true if the character represents an explicit break
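The page does not say which characters count as explicit breaks. A hedged sketch, assuming the mandatory-break classes of the Unicode line breaking algorithm (UAX #14 classes BK, CR, LF and NL), could look like this; the actual character set may differ, and ExplicitBreakSketch is a hypothetical name.

```java
/** Hypothetical sketch: explicit (mandatory) break characters. */
class ExplicitBreakSketch {
    // Assumption: the UAX #14 mandatory-break classes BK, CR, LF and NL are explicit breaks.
    static boolean isExplicitBreak(int c) {
        switch (c) {
            case 0x000A: // LINE FEED (LF)
            case 0x000B: // LINE TABULATION (BK)
            case 0x000C: // FORM FEED (BK)
            case 0x000D: // CARRIAGE RETURN (CR)
            case 0x0085: // NEXT LINE (NL)
            case 0x2028: // LINE SEPARATOR (BK)
            case 0x2029: // PARAGRAPH SEPARATOR (BK)
                return true;
            default:
                return false;
        }
    }
}
```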
      • charToNCRef

        public static java.lang.String charToNCRef(int c)
        Convert a single Unicode scalar value to a hexadecimal XML numeric character reference. Four hex digits are used for characters in the BMP and six digits for characters outside it.
        Parameters:
        c - a Unicode scalar value
        Returns:
        a string representing a numeric character reference
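The four-versus-six digit rule lines up with hexadecimal widths: every BMP character fits in four hex digits (up to FFFF) and every supplementary character in six (up to 10FFFF). A minimal sketch of the conversion, assuming the "&#x...;" form with uppercase digits (the exact digit case is an assumption), is shown below; NcrSketch is a hypothetical name.

```java
/** Hypothetical sketch: one code point to a hexadecimal XML character reference. */
class NcrSketch {
    // "&#x" + hex digits + ";", zero-padded to 4 digits in the BMP, 6 outside it.
    // Assumption: uppercase hex digits.
    static String charToNCRef(int c) {
        return String.format(c > 0xFFFF ? "&#x%06X;" : "&#x%04X;", c);
    }

    public static void main(String[] args) {
        System.out.println(charToNCRef('A'));      // &#x0041;
        System.out.println(charToNCRef(0x1F600));  // &#x01F600;
    }
}
```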
      • toNCRefs

        public static java.lang.String toNCRefs(java.lang.String s)
        Convert a string to a sequence of ASCII or XML numeric character references.
        Parameters:
        s - a Java string (encoded in UTF-16)
        Returns:
        a string representing a sequence of numeric character references or ASCII characters
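Because the input is UTF-16, a supplementary character arrives as two chars and should become one reference, not two. The sketch below walks the string by code point; which ASCII characters are passed through unchanged is not stated on this page, so it assumes printable ASCII (U+0020 to U+007E) is copied and everything else, including control characters, is converted. ToNcrsSketch is a hypothetical name.

```java
/** Hypothetical sketch: string to ASCII plus XML numeric character references. */
class ToNcrsSketch {
    static String charToNCRef(int c) {
        return String.format(c > 0xFFFF ? "&#x%06X;" : "&#x%04X;", c);
    }

    // Assumption: printable ASCII (U+0020..U+007E) is copied through unchanged;
    // every other code point becomes a numeric character reference.
    static String toNCRefs(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int c = s.codePointAt(i);  // joins a surrogate pair into one code point
            if (c >= 0x20 && c <= 0x7E) {
                sb.append((char) c);
            } else {
                sb.append(charToNCRef(c));
            }
            i += Character.charCount(c);  // advance 1 or 2 chars
        }
        return sb.toString();
    }
}
```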
      • padLeft

        public static java.lang.String padLeft(java.lang.String s,
                                               int width,
                                               char pad)
        Pad string s on the left to the given width using the padding character pad.
        Parameters:
        s - string to pad
        width - width of the field to pad to
        pad - character to use for padding
        Returns:
        padded string
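A minimal sketch of left padding follows. The page does not say what happens when s is already wider than width; the sketch assumes such strings are returned unchanged rather than truncated. PadLeftSketch is a hypothetical name.

```java
/** Hypothetical sketch: left-pad a string to a fixed width. */
class PadLeftSketch {
    // Prepends `pad` until the result is `width` chars long.
    // Assumption: strings already at least `width` long are returned unchanged.
    static String padLeft(String s, int width, char pad) {
        StringBuilder sb = new StringBuilder(Math.max(width, s.length()));
        for (int i = s.length(); i < width; i++) {
            sb.append(pad);
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println(padLeft("7F", 4, '0'));    // 007F
        System.out.println(padLeft("12345", 3, '0')); // 12345 (no truncation)
    }
}
```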
      • format

        public static java.lang.String format(int c)
        Format a character for debugging output: the result is prefixed with "0x" and left-padded with '0' to 4 or 6 hex digits, depending on whether the character is in the BMP.
        Parameters:
        c - character code
        Returns:
        formatted character string
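The debugging format combines the BMP test with zero padding. A short sketch, assuming uppercase hex digits (the digit case is not stated here), is shown below; FormatSketch is a hypothetical name.

```java
/** Hypothetical sketch: "0x" + zero-padded hex code point for debug output. */
class FormatSketch {
    // 4 hex digits for BMP characters, 6 for supplementary ones.
    // Assumption: uppercase hex digits.
    static String format(int c) {
        String hex = Integer.toHexString(c).toUpperCase();
        int width = (c > 0xFFFF) ? 6 : 4;
        StringBuilder sb = new StringBuilder("0x");
        for (int i = hex.length(); i < width; i++) {
            sb.append('0');  // left-pad with '0'
        }
        return sb.append(hex).toString();
    }

    public static void main(String[] args) {
        System.out.println(format('A'));      // 0x0041
        System.out.println(format(0x1F600));  // 0x01F600
    }
}
```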
      • isSameSequence

        public static boolean isSameSequence(java.lang.CharSequence cs1,
                                             java.lang.CharSequence cs2)
        Determine if two character sequences contain the same characters.
        Parameters:
        cs1 - first character sequence
        cs2 - second character sequence
        Returns:
        true if both sequences have the same length and the same characters in the same order
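A content comparison like this is useful because CharSequence implementations do not compare equal across types: "abc".equals(new StringBuilder("abc")) is false. The sketch below compares length first and then each UTF-16 char; it assumes non-null arguments, and SameSequenceSketch is a hypothetical name. For a String receiver, String.contentEquals(CharSequence) performs a comparable check.

```java
/** Hypothetical sketch: content equality for arbitrary CharSequences. */
class SameSequenceSketch {
    // Assumption: both arguments are non-null.
    static boolean isSameSequence(CharSequence cs1, CharSequence cs2) {
        if (cs1.length() != cs2.length()) {
            return false;  // different lengths can never match
        }
        for (int i = 0; i < cs1.length(); i++) {
            if (cs1.charAt(i) != cs2.charAt(i)) {
                return false;  // first differing char decides
            }
        }
        return true;
    }
}
```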
      • isBmpCodePoint

        public static boolean isBmpCodePoint(int codePoint)
        Determine whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP). Such code points can be represented using a single char.
        Parameters:
        codePoint - the character (Unicode code point) to be tested
        Returns:
        true if the specified code point is between Character#MIN_VALUE and Character#MAX_VALUE inclusive; false otherwise
        See Also:
        Character.isBmpCodePoint(int), available from Java 1.7
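The range test is small enough to write out. One compact form, sketched below, uses an unsigned right shift: only values from 0x0000 to 0xFFFF leave zero after shifting out the low 16 bits, and negative inputs fail because their high bits are set. The result should agree with Character.isBmpCodePoint(int) on Java 1.7 and later; BmpSketch is a hypothetical name.

```java
/** Hypothetical sketch: BMP membership via an unsigned shift. */
class BmpSketch {
    // True exactly when 0x0000 <= codePoint <= 0xFFFF (fits in one char).
    static boolean isBmpCodePoint(int codePoint) {
        return codePoint >>> 16 == 0;
    }

    public static void main(String[] args) {
        for (int cp : new int[] {'A', 0xFFFF, 0x10000, -1}) {
            // the sketch and the JDK method agree on each input
            System.out.println(isBmpCodePoint(cp) + " " + Character.isBmpCodePoint(cp));
        }
    }
}
```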
      • incrementIfNonBMP

        public static int incrementIfNonBMP(int codePoint)
        Returns 1 if codePoint is not in the BMP, 0 otherwise. This function is particularly useful in for loops over strings where, in the presence of surrogate pairs, you need to skip one iteration.
        Parameters:
        codePoint - the code point to test
        Returns:
        1 if codePoint > 0xFFFF, 0 otherwise
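The loop pattern described above can be sketched as follows: the loop advances one char at a time, and after reading a supplementary code point the extra increment skips its low surrogate so it is not visited twice. The helper countCodePoints and the class name CodePointLoopSketch are hypothetical, used only to show the pattern; its result should match String.codePointCount.

```java
/** Hypothetical sketch: skipping the second half of surrogate pairs in a char loop. */
class CodePointLoopSketch {
    static int incrementIfNonBMP(int codePoint) {
        return (codePoint > 0xFFFF) ? 1 : 0;
    }

    // Hypothetical helper: counts code points by walking UTF-16 chars.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            int c = s.codePointAt(i);  // full code point, even for a surrogate pair
            count++;
            i += incrementIfNonBMP(c); // skip the low surrogate of a pair
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";              // 'a', U+1F600, 'b': 4 chars
        System.out.println(countCodePoints(s));   // 3
    }
}
```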
      • isSurrogatePair

        public static boolean isSurrogatePair(char ch)
        Determine if the given character is part of a surrogate pair.
        Parameters:
        ch - character to be checked
        Returns:
        true if ch is a high surrogate or a low surrogate
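Despite its name, this method inspects a single char: it reports whether that char is either half of a pair (high surrogates U+D800 to U+DBFF, low surrogates U+DC00 to U+DFFF). A minimal equivalent using java.lang.Character is sketched below; on Java 1.7 and later, Character.isSurrogate(char) gives the same answer. SurrogateSketch is a hypothetical name.

```java
/** Hypothetical sketch: is this char one half of a surrogate pair? */
class SurrogateSketch {
    // High: U+D800..U+DBFF; low: U+DC00..U+DFFF.
    static boolean isSurrogatePair(char ch) {
        return Character.isHighSurrogate(ch) || Character.isLowSurrogate(ch);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00";  // 'a' followed by U+1F600 as two chars
        for (int i = 0; i < s.length(); i++) {
            System.out.println(isSurrogatePair(s.charAt(i))); // false, true, true
        }
    }
}
```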
      • containsSurrogatePairAt

        public static boolean containsSurrogatePairAt(java.lang.CharSequence chars,
                                                      int index)
        Tells whether there is a surrogate pair starting at the given index in the CharSequence. If the character at index is a high surrogate, the character at index+1 is checked to be a low surrogate. If a malformed surrogate pair is encountered, an IllegalArgumentException is thrown.
         high surrogate [0xD800 - 0xDBFF]
         low surrogate [0xDC00 - 0xDFFF]
         
        Parameters:
        chars - CharSequence to check
        index - index in the CharSequence at which to start the check
        Returns:
        true if there is a well-formed surrogate pair at index
        Throws:
        java.lang.IllegalArgumentException - if a malformed surrogate pair is encountered
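The check described above can be sketched as follows: a high surrogate must be followed by a low surrogate, a non-surrogate simply yields false, and an unpaired surrogate raises IllegalArgumentException. The handling of a low surrogate found at index (treated here as malformed) and the error messages are assumptions; SurrogatePairAtSketch is a hypothetical name.

```java
/** Hypothetical sketch: well-formed surrogate pair at a given index? */
class SurrogatePairAtSketch {
    static boolean containsSurrogatePairAt(CharSequence chars, int index) {
        char ch = chars.charAt(index);
        if (Character.isHighSurrogate(ch)) {
            // A high surrogate must be followed by a low surrogate.
            if (index + 1 < chars.length() && Character.isLowSurrogate(chars.charAt(index + 1))) {
                return true;
            }
            throw new IllegalArgumentException("unpaired high surrogate at index " + index);
        }
        if (Character.isLowSurrogate(ch)) {
            // Assumption: a low surrogate at index is also treated as malformed.
            throw new IllegalArgumentException("unpaired low surrogate at index " + index);
        }
        return false;  // ordinary BMP character
    }
}
```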
      • codepointsIter

        public static java.lang.Iterable<java.lang.Integer> codepointsIter(java.lang.CharSequence s)
        Creates an iterable over the code points of a CharSequence.
        Parameters:
        s - CharSequence to iterate over
        Returns:
        codepoint iterator for the given CharSequence.
        See Also:
        codepointsIter(CharSequence, int, int)
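Returning an Iterable rather than an Iterator lets callers write a plain for-each loop over code points. On Java 8 and later, one compact way to get the same effect is to wrap CharSequence.codePoints() in a lambda, as sketched below; each call to iterator() starts a fresh stream, so the Iterable can be reused. This is an illustrative alternative, not necessarily how the class implements it, and CodepointsIterSketch is a hypothetical name.

```java
/** Hypothetical sketch: for-each over code points using CharSequence.codePoints(). */
class CodepointsIterSketch {
    // IntStream.iterator() returns a PrimitiveIterator.OfInt, which is an Iterator<Integer>.
    static Iterable<Integer> codepointsIter(CharSequence s) {
        return () -> s.codePoints().iterator();
    }

    public static void main(String[] args) {
        for (int cp : codepointsIter("a\uD83D\uDE00")) {
            System.out.println(Integer.toHexString(cp)); // prints 61, then 1f600
        }
    }
}
```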
      • codepointsIter

        public static java.lang.Iterable<java.lang.Integer> codepointsIter(java.lang.CharSequence s,
                                                                           int beginIndex,
                                                                           int endIndex)
        Creates an iterable over the code points of a range of a CharSequence.
        Parameters:
        s - CharSequence to iterate over
        beginIndex - lower bound of the range
        endIndex - upper bound of the range
        Returns:
        codepoint iterator for the given sub-CharSequence.
        See Also:
        Bug JDK-5003547
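JDK-5003547 concerns iterating over the code points of a string. A hand-rolled iterator over a range can be sketched as below. The page does not state whether endIndex is inclusive; the sketch assumes it is exclusive, as in CharSequence.subSequence, and it never reads past endIndex, so a pair split by the range boundary yields the lone high surrogate. The bounds check and its exception type are also assumptions, and CodepointsRangeSketch is a hypothetical name.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: iterate code points in s[beginIndex, endIndex). */
class CodepointsRangeSketch {
    static Iterable<Integer> codepointsIter(CharSequence s, int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > s.length() || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException(beginIndex + ".." + endIndex);
        }
        return () -> new Iterator<Integer>() {
            private int next = beginIndex;  // index of the next unread char

            @Override
            public boolean hasNext() {
                return next < endIndex;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                char hi = s.charAt(next++);
                // Combine a pair only if its low half also lies inside the range.
                if (Character.isHighSurrogate(hi) && next < endIndex
                        && Character.isLowSurrogate(s.charAt(next))) {
                    return Character.toCodePoint(hi, s.charAt(next++));
                }
                return (int) hi;
            }
        };
    }
}
```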