Package org.w3c.tidy

Class EncodingUtils


  • public final class EncodingUtils
    extends java.lang.Object
    Version:
    $Revision: 622 $ ($Author: fgiust $)
    Author:
    Fabrizio Giustina
    • Field Detail

      • UNICODE_BOM_BE

        public static final int UNICODE_BOM_BE
        The big-endian (default) Unicode BOM.
        See Also:
        Constant Field Values
      • UNICODE_BOM

        public static final int UNICODE_BOM
        The default (big-endian) Unicode BOM.
        See Also:
        Constant Field Values
      • UNICODE_BOM_LE

        public static final int UNICODE_BOM_LE
        The little-endian Unicode BOM.
        See Also:
        Constant Field Values
      • UNICODE_BOM_UTF8

        public static final int UNICODE_BOM_UTF8
        The UTF-8 Unicode BOM.
        See Also:
        Constant Field Values
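
As an illustration of how BOM constants like these are typically used, here is a minimal sketch of BOM sniffing on the first bytes of a stream. The class `BomSketch`, the method `detect`, and the constant values are assumptions made for this example (they follow the standard byte sequences FE FF, FF FE and EF BB BF); the values actually used by `EncodingUtils` are listed under Constant Field Values.

```java
final class BomSketch {
    // Assumed values for illustration only, based on the standard BOM byte sequences.
    static final int BOM_BE = 0xFEFF;     // bytes FE FF
    static final int BOM_LE = 0xFFFE;     // bytes FF FE
    static final int BOM_UTF8 = 0xEFBBBF; // bytes EF BB BF

    /** Returns the encoding name implied by a leading BOM, or null if none is found. */
    static String detect(byte[] in) {
        if (in.length >= 3) {
            // Pack the first three bytes into one int and compare with the UTF-8 BOM.
            int three = ((in[0] & 0xFF) << 16) | ((in[1] & 0xFF) << 8) | (in[2] & 0xFF);
            if (three == BOM_UTF8) {
                return "UTF-8";
            }
        }
        if (in.length >= 2) {
            // Pack the first two bytes and compare with the UTF-16 BOMs.
            int two = ((in[0] & 0xFF) << 8) | (in[1] & 0xFF);
            if (two == BOM_BE) {
                return "UTF-16BE";
            }
            if (two == BOM_LE) {
                return "UTF-16LE";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 'h', 0};
        System.out.println(detect(sample));
    }
}
```

The UTF-8 check runs first because it needs three bytes; the two UTF-16 checks only need two.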
      • FSM_ASCII

        public static final int FSM_ASCII
        States for ISO 2022. A document in an ISO-2022-based encoding uses ESC sequences called "designators" to switch character sets. The designators defined and used in ISO-2022-JP are:
          "ESC" + "(" + ? for ISO646 variants;
          "ESC" + "$" + ? and "ESC" + "$" + "(" + ? for multibyte character sets.
        This constant is the ASCII state.
        See Also:
        Constant Field Values
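
The designator handling described above can be sketched as a small finite-state machine. This is an illustrative sketch, not the class's implementation: the class `Iso2022Sketch`, its `State` enum and every state name other than the ASCII state are hypothetical, and so is the rule that `ESC ( B` and `ESC ( J` lead back to the ASCII state while any other final byte leads to a non-ASCII state.

```java
final class Iso2022Sketch {
    // Hypothetical state names; only an ASCII state is documented above.
    enum State { ASCII, ESC, ESC_DOLLAR, ESC_DOLLAR_PAREN, ESC_PAREN, NON_ASCII }

    /** Advances the state machine by one input byte. */
    static State step(State s, int c) {
        switch (s) {
            case ASCII:
            case NON_ASCII:
                // Outside a designator, only ESC changes the state.
                return c == 0x1B ? State.ESC : s;
            case ESC:
                if (c == '$') {
                    return State.ESC_DOLLAR;
                }
                if (c == '(') {
                    return State.ESC_PAREN;
                }
                return State.ASCII; // assumption: unknown sequences fall back to ASCII
            case ESC_DOLLAR:
                // "ESC $ (" continues the designator; "ESC $ ?" selects a multibyte set.
                return c == '(' ? State.ESC_DOLLAR_PAREN : State.NON_ASCII;
            case ESC_DOLLAR_PAREN:
                return State.NON_ASCII; // "ESC $ ( ?" selects a multibyte set
            case ESC_PAREN:
                // "ESC ( ?" selects an ISO646 variant; B and J are treated as ASCII here.
                return (c == 'B' || c == 'J') ? State.ASCII : State.NON_ASCII;
            default:
                throw new IllegalStateException("unknown state: " + s);
        }
    }

    /** Runs the machine over a whole string, starting in the ASCII state. */
    static State run(String input) {
        State s = State.ASCII;
        for (int i = 0; i < input.length(); i++) {
            s = step(s, input.charAt(i));
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(run("\u001B$B"));
    }
}
```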
      • MAX_UTF8_FROM_UCS4

        public static final int MAX_UTF8_FROM_UCS4
        Max UTF-8 valid char value.
        See Also:
        Constant Field Values
      • MAX_UTF16_FROM_UCS4

        public static final int MAX_UTF16_FROM_UCS4
        Max UTF-16 value.
        See Also:
        Constant Field Values
      • LOW_UTF16_SURROGATE

        public static final int LOW_UTF16_SURROGATE
        UTF-16 low surrogate.
        See Also:
        Constant Field Values
      • UTF16_SURROGATES_BEGIN

        public static final int UTF16_SURROGATES_BEGIN
        UTF-16 surrogates begin.
        See Also:
        Constant Field Values
      • UTF16_LOW_SURROGATE_BEGIN

        public static final int UTF16_LOW_SURROGATE_BEGIN
        UTF-16 surrogate pair areas: low surrogates begin.
        See Also:
        Constant Field Values
      • UTF16_LOW_SURROGATE_END

        public static final int UTF16_LOW_SURROGATE_END
        UTF-16 surrogate pair areas: low surrogates end.
        See Also:
        Constant Field Values
      • UTF16_HIGH_SURROGATE_BEGIN

        public static final int UTF16_HIGH_SURROGATE_BEGIN
        UTF-16 surrogate pair areas: high surrogates begin.
        See Also:
        Constant Field Values
      • UTF16_HIGH_SURROGATE_END

        public static final int UTF16_HIGH_SURROGATE_END
        UTF-16 surrogate pair areas: high surrogates end.
        See Also:
        Constant Field Values
      • HIGH_UTF16_SURROGATE

        public static final int HIGH_UTF16_SURROGATE
        UTF-16 high surrogate.
        See Also:
        Constant Field Values
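
To show how surrogate range constants of this kind are usually applied, here is a minimal sketch of UTF-16 pair arithmetic. The class `SurrogateSketch` and its constants are assumptions for this example and use the ranges from the Unicode standard. They are named "lead" and "trail" so that nothing is assumed about which range this class calls low or high; check Constant Field Values for the actual numbers.

```java
final class SurrogateSketch {
    // Standard Unicode values, assumed here for illustration.
    static final int SUPPLEMENTARY_BEGIN = 0x10000; // first code point that needs a pair
    static final int MAX_CODE_POINT = 0x10FFFF;     // largest value UTF-16 can encode
    static final int LEAD_BEGIN = 0xD800;
    static final int LEAD_END = 0xDBFF;
    static final int TRAIL_BEGIN = 0xDC00;
    static final int TRAIL_END = 0xDFFF;

    /** Combines a lead/trail surrogate pair into a single code point. */
    static int combine(int lead, int trail) {
        if (lead < LEAD_BEGIN || lead > LEAD_END || trail < TRAIL_BEGIN || trail > TRAIL_END) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        // Each surrogate carries 10 bits of the value above 0x10000.
        return ((lead - LEAD_BEGIN) << 10) + (trail - TRAIL_BEGIN) + SUPPLEMENTARY_BEGIN;
    }

    /** Splits a supplementary code point into {lead, trail}. */
    static int[] split(int codePoint) {
        if (codePoint < SUPPLEMENTARY_BEGIN || codePoint > MAX_CODE_POINT) {
            throw new IllegalArgumentException("code point does not need a surrogate pair");
        }
        int v = codePoint - SUPPLEMENTARY_BEGIN;
        return new int[] {LEAD_BEGIN + (v >> 10), TRAIL_BEGIN + (v & 0x3FF)};
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(combine(0xD83D, 0xDE00)));
    }
}
```

The result of `combine` agrees with `java.lang.Character.toCodePoint` for valid pairs, which is a convenient cross-check.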
    • Method Detail

      • decodeWin1252

        protected static int decodeWin1252(int c)
        Function for conversion from Windows-1252 to Unicode.
        Parameters:
        c - char to decode
        Returns:
        decoded char
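
Because `decodeWin1252` is protected, it cannot be called from outside the package. As a rough sketch of what a conversion of this kind involves, the example below maps the 0x80 to 0x9F range through a lookup table and passes every other value through unchanged. The class `Win1252Sketch` and the method `decode` are hypothetical, and returning U+FFFD for the five unassigned bytes is an assumption of this example; the actual method may handle them differently.

```java
final class Win1252Sketch {
    // Unicode code points for Windows-1252 bytes 0x80..0x9F.
    // 0xFFFD marks bytes with no assigned character (assumption of this sketch).
    private static final int[] HIGH = {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
    };

    /** Maps one Windows-1252 byte value (0..255) to a Unicode code point. */
    static int decode(int c) {
        if (c >= 0x80 && c <= 0x9F) {
            return HIGH[c - 0x80];
        }
        // Outside 0x80..0x9F, Windows-1252 coincides with ISO-8859-1 and Unicode.
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(decode(0x80)));
    }
}
```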
      • decodeMacRoman

        protected static int decodeMacRoman(int c)
        Function to convert from MacRoman to Unicode.
        Parameters:
        c - char to decode
        Returns:
        decoded char