Class UnicodeCompressor
java.lang.Object
    com.ibm.icu.text.UnicodeCompressor
public final class UnicodeCompressor extends Object
A compression engine implementing the Standard Compression Scheme for Unicode (SCSU) as outlined in Unicode Technical Report #6.
The SCSU works by using dynamically positioned windows, each consisting of 128 consecutive Unicode characters. During compression, characters within a window are encoded in the compressed stream as the bytes 0x80 - 0xFF. The SCSU provides transparency for the characters (bytes) between U+0000 - U+00FF. Its output approximates the storage size of traditional character sets: for example, 1 byte per character for ASCII or Latin-1 text, and 2 bytes per character for CJK ideographs.
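The window arithmetic described above can be illustrated with a short sketch. It is not taken from the ICU4J sources: the class name, the method names, and the two constants are hypothetical, and the sketch assumes that the 128 characters of a window occupy the byte values 0x80 through 0xFF, as Unicode Technical Report #6 specifies for single-byte mode.

```java
// Illustrative only: shows how a character inside a 128-character window
// could map to a single byte. Not the ICU4J implementation.
final class ScsuWindowSketch {

    // Hypothetical names for this sketch.
    static final int WINDOW_SIZE = 128;       // characters per window
    static final int WINDOW_BYTE_BASE = 0x80; // first byte value used for window characters

    /**
     * Returns the byte value (0x80..0xFF) that stands for c in the window
     * starting at windowOffset, or -1 if c lies outside that window.
     */
    static int encodeInWindow(char c, int windowOffset) {
        int delta = c - windowOffset;
        if (delta < 0 || delta >= WINDOW_SIZE) {
            return -1; // not representable in this window
        }
        return WINDOW_BYTE_BASE + delta;
    }

    /** Inverse mapping: recovers the character from a window byte value. */
    static char decodeFromWindow(int byteValue, int windowOffset) {
        if (byteValue < WINDOW_BYTE_BASE || byteValue > 0xFF) {
            throw new IllegalArgumentException("not a window byte: " + byteValue);
        }
        return (char) (windowOffset + (byteValue - WINDOW_BYTE_BASE));
    }
}
```

For a window assumed to start at U+0400, `encodeInWindow((char) 0x0411, 0x0400)` yields 0x91, and `decodeFromWindow(0x91, 0x0400)` returns U+0411 again. A character outside the window, such as U+0041 or U+0480, yields -1, which is the point at which a real encoder would have to switch or define a window.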
USAGE
The static methods on UnicodeCompressor may be used in a straightforward manner to compress simple strings:
    String s = ... ; // get string from somewhere
    byte [] compressed = UnicodeCompressor.compress(s);
The static methods have a fairly large memory footprint. For finer-grained control over memory usage, UnicodeCompressor offers more powerful APIs allowing iterative compression:
    // Compress an array "chars" of length "len" using a buffer of 512 bytes
    // to the OutputStream "out"

    UnicodeCompressor myCompressor = new UnicodeCompressor();
    final int BUFSIZE = 512;
    byte [] byteBuffer = new byte [ BUFSIZE ];
    int bytesWritten = 0;
    int [] unicharsRead = new int [1];
    int totalCharsCompressed = 0;
    int totalBytesWritten = 0;

    do {
        // do the compression
        bytesWritten = myCompressor.compress(chars, totalCharsCompressed,
                                             len, unicharsRead,
                                             byteBuffer, 0, BUFSIZE);

        // do something with the current set of bytes
        out.write(byteBuffer, 0, bytesWritten);

        // update the no. of characters compressed
        totalCharsCompressed += unicharsRead[0];

        // update the no. of bytes written
        totalBytesWritten += bytesWritten;

    } while(totalCharsCompressed < len);

    myCompressor.reset(); // reuse compressor
- Author:
- Stephen F. Booth
- See Also:
UnicodeDecompressor
-
-
Field Summary
Modifier and Type  Field                   Description
static int         ARMENIANINDEX
static int         COMPRESSIONOFFSET
static int         GREEKINDEX
static int         HALFWIDTHKATAKANAINDEX
static int         HIRAGANAINDEX
static int         INVALIDCHAR
static int         INVALIDWINDOW
static int         IPAEXTENSIONINDEX
static int         KATAKANAINDEX
static int         LATININDEX
static int         MAXINDEX
static int         NUMSTATICWINDOWS
static int         NUMWINDOWS
static int         RESERVEDINDEX
static int         SCHANGE0
static int         SCHANGE1
static int         SCHANGE2
static int         SCHANGE3
static int         SCHANGE4
static int         SCHANGE5
static int         SCHANGE6
static int         SCHANGE7
static int         SCHANGEU
static int         SDEFINE0
static int         SDEFINE1
static int         SDEFINE2
static int         SDEFINE3
static int         SDEFINE4
static int         SDEFINE5
static int         SDEFINE6
static int         SDEFINE7
static int         SDEFINEX
static int         SINGLEBYTEMODE
static int[]       sOffsets                Static compression window offsets
static int[]       sOffsetTable            For window offset mapping
static int         SQUOTE0
static int         SQUOTE1
static int         SQUOTE2
static int         SQUOTE3
static int         SQUOTE4
static int         SQUOTE5
static int         SQUOTE6
static int         SQUOTE7
static int         SQUOTEU
static int         SRESERVED
static int         UCHANGE0
static int         UCHANGE1
static int         UCHANGE2
static int         UCHANGE3
static int         UCHANGE4
static int         UCHANGE5
static int         UCHANGE6
static int         UCHANGE7
static int         UDEFINE0
static int         UDEFINE1
static int         UDEFINE2
static int         UDEFINE3
static int         UDEFINE4
static int         UDEFINE5
static int         UDEFINE6
static int         UDEFINE7
static int         UDEFINEX
static int         UNICODEMODE
static int         UQUOTEU
static int         URESERVED
-
Constructor Summary
Constructor          Description
UnicodeCompressor()  Create a UnicodeCompressor.
-
Method Summary
Modifier and Type  Method
                   Description
static byte[]      compress(char[] buffer, int start, int limit)
                   Compress a Unicode character array into a byte array.
int                compress(char[] charBuffer, int charBufferStart, int charBufferLimit, int[] charsRead, byte[] byteBuffer, int byteBufferStart, int byteBufferLimit)
                   Compress a Unicode character array into a byte array.
static byte[]      compress(String buffer)
                   Compress a string into a byte array.
void               reset()
                   Reset the compressor to its initial state.
-
-
-
Field Detail
-
COMPRESSIONOFFSET
public static final int COMPRESSIONOFFSET
- See Also:
- Constant Field Values
-
NUMWINDOWS
public static final int NUMWINDOWS
- See Also:
- Constant Field Values
-
NUMSTATICWINDOWS
public static final int NUMSTATICWINDOWS
- See Also:
- Constant Field Values
-
INVALIDWINDOW
public static final int INVALIDWINDOW
- See Also:
- Constant Field Values
-
INVALIDCHAR
public static final int INVALIDCHAR
- See Also:
- Constant Field Values
-
SINGLEBYTEMODE
public static final int SINGLEBYTEMODE
- See Also:
- Constant Field Values
-
UNICODEMODE
public static final int UNICODEMODE
- See Also:
- Constant Field Values
-
MAXINDEX
public static final int MAXINDEX
- See Also:
- Constant Field Values
-
RESERVEDINDEX
public static final int RESERVEDINDEX
- See Also:
- Constant Field Values
-
LATININDEX
public static final int LATININDEX
- See Also:
- Constant Field Values
-
IPAEXTENSIONINDEX
public static final int IPAEXTENSIONINDEX
- See Also:
- Constant Field Values
-
GREEKINDEX
public static final int GREEKINDEX
- See Also:
- Constant Field Values
-
ARMENIANINDEX
public static final int ARMENIANINDEX
- See Also:
- Constant Field Values
-
HIRAGANAINDEX
public static final int HIRAGANAINDEX
- See Also:
- Constant Field Values
-
KATAKANAINDEX
public static final int KATAKANAINDEX
- See Also:
- Constant Field Values
-
HALFWIDTHKATAKANAINDEX
public static final int HALFWIDTHKATAKANAINDEX
- See Also:
- Constant Field Values
-
SDEFINEX
public static final int SDEFINEX
- See Also:
- Constant Field Values
-
SRESERVED
public static final int SRESERVED
- See Also:
- Constant Field Values
-
SQUOTEU
public static final int SQUOTEU
- See Also:
- Constant Field Values
-
SCHANGEU
public static final int SCHANGEU
- See Also:
- Constant Field Values
-
SQUOTE0
public static final int SQUOTE0
- See Also:
- Constant Field Values
-
SQUOTE1
public static final int SQUOTE1
- See Also:
- Constant Field Values
-
SQUOTE2
public static final int SQUOTE2
- See Also:
- Constant Field Values
-
SQUOTE3
public static final int SQUOTE3
- See Also:
- Constant Field Values
-
SQUOTE4
public static final int SQUOTE4
- See Also:
- Constant Field Values
-
SQUOTE5
public static final int SQUOTE5
- See Also:
- Constant Field Values
-
SQUOTE6
public static final int SQUOTE6
- See Also:
- Constant Field Values
-
SQUOTE7
public static final int SQUOTE7
- See Also:
- Constant Field Values
-
SCHANGE0
public static final int SCHANGE0
- See Also:
- Constant Field Values
-
SCHANGE1
public static final int SCHANGE1
- See Also:
- Constant Field Values
-
SCHANGE2
public static final int SCHANGE2
- See Also:
- Constant Field Values
-
SCHANGE3
public static final int SCHANGE3
- See Also:
- Constant Field Values
-
SCHANGE4
public static final int SCHANGE4
- See Also:
- Constant Field Values
-
SCHANGE5
public static final int SCHANGE5
- See Also:
- Constant Field Values
-
SCHANGE6
public static final int SCHANGE6
- See Also:
- Constant Field Values
-
SCHANGE7
public static final int SCHANGE7
- See Also:
- Constant Field Values
-
SDEFINE0
public static final int SDEFINE0
- See Also:
- Constant Field Values
-
SDEFINE1
public static final int SDEFINE1
- See Also:
- Constant Field Values
-
SDEFINE2
public static final int SDEFINE2
- See Also:
- Constant Field Values
-
SDEFINE3
public static final int SDEFINE3
- See Also:
- Constant Field Values
-
SDEFINE4
public static final int SDEFINE4
- See Also:
- Constant Field Values
-
SDEFINE5
public static final int SDEFINE5
- See Also:
- Constant Field Values
-
SDEFINE6
public static final int SDEFINE6
- See Also:
- Constant Field Values
-
SDEFINE7
public static final int SDEFINE7
- See Also:
- Constant Field Values
-
UCHANGE0
public static final int UCHANGE0
- See Also:
- Constant Field Values
-
UCHANGE1
public static final int UCHANGE1
- See Also:
- Constant Field Values
-
UCHANGE2
public static final int UCHANGE2
- See Also:
- Constant Field Values
-
UCHANGE3
public static final int UCHANGE3
- See Also:
- Constant Field Values
-
UCHANGE4
public static final int UCHANGE4
- See Also:
- Constant Field Values
-
UCHANGE5
public static final int UCHANGE5
- See Also:
- Constant Field Values
-
UCHANGE6
public static final int UCHANGE6
- See Also:
- Constant Field Values
-
UCHANGE7
public static final int UCHANGE7
- See Also:
- Constant Field Values
-
UDEFINE0
public static final int UDEFINE0
- See Also:
- Constant Field Values
-
UDEFINE1
public static final int UDEFINE1
- See Also:
- Constant Field Values
-
UDEFINE2
public static final int UDEFINE2
- See Also:
- Constant Field Values
-
UDEFINE3
public static final int UDEFINE3
- See Also:
- Constant Field Values
-
UDEFINE4
public static final int UDEFINE4
- See Also:
- Constant Field Values
-
UDEFINE5
public static final int UDEFINE5
- See Also:
- Constant Field Values
-
UDEFINE6
public static final int UDEFINE6
- See Also:
- Constant Field Values
-
UDEFINE7
public static final int UDEFINE7
- See Also:
- Constant Field Values
-
UQUOTEU
public static final int UQUOTEU
- See Also:
- Constant Field Values
-
UDEFINEX
public static final int UDEFINEX
- See Also:
- Constant Field Values
-
URESERVED
public static final int URESERVED
- See Also:
- Constant Field Values
-
sOffsetTable
public static final int[] sOffsetTable
For window offset mapping
-
sOffsets
public static final int[] sOffsets
Static compression window offsets
-
-
Constructor Detail
-
UnicodeCompressor
public UnicodeCompressor()
Create a UnicodeCompressor. Sets all windows to their default values.
- See Also:
- reset()
-
-
Method Detail
-
compress
public static byte[] compress(String buffer)
Compress a string into a byte array.
- Parameters:
- buffer - The string to compress.
- Returns:
- A byte array containing the compressed characters.
- See Also:
- compress(char[], int, int)
-
compress
public static byte[] compress(char[] buffer, int start, int limit)
Compress a Unicode character array into a byte array.
- Parameters:
- buffer - The character buffer to compress.
- start - The start of the character run to compress.
- limit - The limit of the character run to compress.
- Returns:
- A byte array containing the compressed characters.
- See Also:
- compress(String)
-
compress
public int compress(char[] charBuffer, int charBufferStart, int charBufferLimit, int[] charsRead, byte[] byteBuffer, int byteBufferStart, int byteBufferLimit)
Compress a Unicode character array into a byte array. This method consumes only as much input as can be completely written to the output buffer.
- Parameters:
- charBuffer - The character buffer to compress.
- charBufferStart - The start of the character run to compress.
- charBufferLimit - The limit of the character run to compress.
- charsRead - A one-element array. If not null, on return its element holds the number of characters read from charBuffer.
- byteBuffer - A buffer to receive the compressed data. This buffer must be at least four bytes in size.
- byteBufferStart - The starting offset at which to write compressed data.
- byteBufferLimit - The limiting offset for writing compressed data.
- Returns:
- The number of bytes written to byteBuffer.
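The contract described above (input is consumed only when it can be fully written, charsRead reports progress, and the output buffer holds at least four bytes) suggests a driver loop along the following lines. This is a minimal sketch, not ICU4J code: ChunkCompressor is a hypothetical interface that mirrors the documented instance signature, and LATIN1_STAND_IN is a placeholder that copies Latin-1 characters as single bytes so the loop can run without the library. The stand-in does not perform SCSU compression.

```java
import java.io.ByteArrayOutputStream;

final class ChunkedCompressSketch {

    /** Hypothetical interface mirroring the seven-argument compress(...) documented above. */
    interface ChunkCompressor {
        int compress(char[] charBuffer, int charBufferStart, int charBufferLimit,
                     int[] charsRead,
                     byte[] byteBuffer, int byteBufferStart, int byteBufferLimit);
    }

    /** Placeholder so the sketch runs on its own: one byte per Latin-1 character. Not SCSU. */
    static final ChunkCompressor LATIN1_STAND_IN =
        (chars, start, limit, read, out, outStart, outLimit) -> {
            int in = start;
            int o = outStart;
            // Consume a character only if its output byte fits in the buffer.
            while (in < limit && o < outLimit && chars[in] <= 0xFF) {
                out[o++] = (byte) chars[in++];
            }
            if (read != null) {
                read[0] = in - start; // number of characters consumed
            }
            return o - outStart;      // number of bytes written
        };

    /** Feeds chars through the compressor in chunks of at most bufSize output bytes. */
    static byte[] compressAll(ChunkCompressor compressor, char[] chars, int bufSize) {
        if (bufSize < 4) {
            throw new IllegalArgumentException("output buffer must hold at least four bytes");
        }
        byte[] buf = new byte[bufSize];
        int[] charsRead = new int[1];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int consumed = 0;
        while (consumed < chars.length) {
            int written = compressor.compress(chars, consumed, chars.length,
                                              charsRead, buf, 0, bufSize);
            // Guard against spinning forever if a call makes no progress.
            if (written == 0 && charsRead[0] == 0) {
                throw new IllegalStateException("no progress at index " + consumed);
            }
            sink.write(buf, 0, written);
            consumed += charsRead[0];
        }
        return sink.toByteArray();
    }
}
```

Calling `compressAll(LATIN1_STAND_IN, "hello world".toCharArray(), 4)` drives three calls that write 4, 4, and 3 bytes. With ICU4J on the classpath, a method reference to the instance method (for example `myCompressor::compress`) would be expected to fit the same interface, since its parameter list matches the signature documented above. The no-progress guard is a defensive choice of this sketch, not something the documentation requires.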
-
reset
public void reset()
Reset the compressor to its initial state.
-
-