Class InputStreamSource

  • All Implemented Interfaces:
    java.io.Closeable, java.io.Serializable, java.lang.AutoCloseable, java.lang.Readable

    public class InputStreamSource
    extends Source
    A source of characters based on an InputStream, such as one obtained from a URLConnection.
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int BUFFER_SIZE
      An initial buffer size.
      protected char[] mBuffer
      The characters read so far.
      protected java.lang.String mEncoding
      The character set in use.
      protected int mLevel
      The number of valid characters in the buffer.
      protected int mMark
      The bookmark.
      protected int mOffset
      The offset of the next character returned by read().
      protected java.io.InputStreamReader mReader
      The converter from bytes to characters.
      protected java.io.InputStream mStream
      The stream of bytes.
      • Fields inherited from class org.htmlparser.lexer.Source

        EOF
      • Fields inherited from class java.io.Reader

        lock
    • Constructor Summary

      Constructors 
      Constructor Description
      InputStreamSource​(java.io.InputStream stream)
      Create a source of characters using the default character set.
      InputStreamSource​(java.io.InputStream stream, java.lang.String charset)
      Create a source of characters.
      InputStreamSource​(java.io.InputStream stream, java.lang.String charset, int size)
      Create a source of characters.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      int available()
      Get the number of available characters.
      void close()
      Does nothing.
      void destroy()
      Close the source.
      protected void fill​(int min)
      Fetch more characters from the underlying reader.
      char getCharacter​(int offset)
      Retrieve a character again.
      void getCharacters​(char[] array, int offset, int start, int end)
      Retrieve characters again.
      void getCharacters​(java.lang.StringBuffer buffer, int offset, int length)
      Append characters already read into a StringBuffer.
      java.lang.String getEncoding()
      Get the encoding being used to convert characters.
      java.io.InputStream getStream()
      Get the input stream being used.
      java.lang.String getString​(int offset, int length)
      Retrieve a string.
      void mark​(int readAheadLimit)
      Mark the present position in the source.
      boolean markSupported()
      Tell whether this source supports the mark() operation.
      int offset()
      Get the position (in characters).
      int read()
      Read a single character.
      int read​(char[] cbuf)
      Read characters into an array.
      int read​(char[] cbuf, int off, int len)
      Read characters into a portion of an array.
      boolean ready()
      Tell whether this source is ready to be read.
      void reset()
      Reset the source.
      void setEncoding​(java.lang.String character_set)
      Begins reading from the source with the given character set.
      long skip​(long n)
      Skip characters.
      void unread()
      Undo the read of a single character.
      • Methods inherited from class java.io.Reader

        nullReader, read, transferTo
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • BUFFER_SIZE

        public static int BUFFER_SIZE
        An initial buffer size. Has a default value of 16384.
      • mStream

        protected transient java.io.InputStream mStream
        The stream of bytes. Set to null when the source is closed.
      • mEncoding

        protected java.lang.String mEncoding
        The character set in use.
      • mReader

        protected transient java.io.InputStreamReader mReader
        The converter from bytes to characters.
      • mBuffer

        protected char[] mBuffer
        The characters read so far.
      • mLevel

        protected int mLevel
        The number of valid characters in the buffer.
      • mOffset

        protected int mOffset
        The offset of the next character returned by read().
      • mMark

        protected int mMark
        The bookmark.
    • Constructor Detail

      • InputStreamSource

        public InputStreamSource​(java.io.InputStream stream)
                          throws java.io.UnsupportedEncodingException
        Create a source of characters using the default character set.
        Parameters:
        stream - The stream of bytes to use.
        Throws:
        java.io.UnsupportedEncodingException - If the default character set is unsupported.
      • InputStreamSource

        public InputStreamSource​(java.io.InputStream stream,
                                 java.lang.String charset)
                          throws java.io.UnsupportedEncodingException
        Create a source of characters.
        Parameters:
        stream - The stream of bytes to use.
        charset - The character set used in encoding the stream.
        Throws:
        java.io.UnsupportedEncodingException - If the character set is unsupported.
      • InputStreamSource

        public InputStreamSource​(java.io.InputStream stream,
                                 java.lang.String charset,
                                 int size)
                          throws java.io.UnsupportedEncodingException
        Create a source of characters.
        Parameters:
        stream - The stream of bytes to use.
        charset - The character set used in encoding the stream.
        size - The initial character buffer size.
        Throws:
        java.io.UnsupportedEncodingException - If the character set is unsupported.
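The charset parameter is a character set name, and the field summary shows that bytes are converted by a java.io.InputStreamReader. The following is a minimal sketch using only JDK classes (not InputStreamSource itself) of how a charset name drives decoding and how an unknown name surfaces as an UnsupportedEncodingException. The class CharsetNameSketch and its helpers are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;

class CharsetNameSketch {
    /** Decode all bytes with the named charset (hypothetical helper). */
    static String decode(byte[] bytes, String charsetName) {
        StringBuilder out = new StringBuilder();
        // InputStreamReader(InputStream, String) rejects unknown charset names
        // with UnsupportedEncodingException, a subclass of IOException.
        try (InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(bytes), charsetName)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** True if the JDK does not support the given charset name. */
    static boolean isUnsupported(String charsetName) {
        try {
            decode(new byte[0], charsetName);
            return false;
        } catch (UncheckedIOException e) {
            return e.getCause() instanceof UnsupportedEncodingException;
        }
    }
}
```

Checking the charset name up front, as isUnsupported does, is one way to avoid handling the exception at construction time.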
    • Method Detail

      • getStream

        public java.io.InputStream getStream()
        Get the input stream being used.
        Returns:
        The current input stream.
      • getEncoding

        public java.lang.String getEncoding()
        Get the encoding being used to convert characters.
        Specified by:
        getEncoding in class Source
        Returns:
        The current encoding.
      • setEncoding

        public void setEncoding​(java.lang.String character_set)
                         throws ParserException
        Begins reading from the source with the given character set. If the current encoding is the same as the requested encoding, this method is a no-op. Otherwise, all characters subsequently read from this source are decoded using the given character set.

        If characters have already been consumed from this source, extra work is needed to obtain this result. Since a Reader cannot be dynamically altered to use a different character set, the underlying stream is reset, a new Source is constructed, and the characters read so far are compared with the newly read characters up to the current position. If a difference is encountered, or some other problem occurs, an exception is thrown.

        Specified by:
        setEncoding in class Source
        Parameters:
        character_set - The character set to use to convert bytes into characters.
        Throws:
        ParserException - If a character mismatch occurs between characters already provided and those that would have been returned had the new character set been in effect from the beginning. An exception is also thrown if the underlying stream cannot be reset.
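The comparison described for setEncoding can be sketched with plain JDK classes. This is an illustration of the idea under stated assumptions, not the library's implementation: a byte array stands in for the resettable stream, and the class ReDecodeSketch and method prefixSurvives are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

class ReDecodeSketch {
    /**
     * Re-decodes the bytes from the beginning with newCharset and checks
     * that the result reproduces the characters already consumed.
     */
    static boolean prefixSurvives(byte[] bytes, String consumed, String newCharset) {
        try (Reader reader =
                new InputStreamReader(new ByteArrayInputStream(bytes), newCharset)) {
            for (int i = 0; i < consumed.length(); i++) {
                int c = reader.read();
                if (c != consumed.charAt(i)) {
                    return false; // mismatch, or the input ended early
                }
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, a switch from ISO-8859-1 to UTF-8 succeeds while only ASCII characters have been consumed, and fails once a multi-byte sequence has already been delivered as separate characters.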
      • fill

        protected void fill​(int min)
                     throws java.io.IOException
        Fetch more characters from the underlying reader. Has no effect if the underlying reader has been drained.
        Parameters:
        min - The minimum to read.
        Throws:
        java.io.IOException - If the underlying reader read() throws one.
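How a fill step might interact with a character buffer and a level counter can be sketched as follows. This is a guess at the general shape, not the actual implementation: the growth policy is an assumption, and FillSketch, buffer and level are hypothetical stand-ins for the class and its mBuffer and mLevel fields.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;

class FillSketch {
    char[] buffer;        // plays the role of mBuffer
    int level = 0;        // plays the role of mLevel: valid characters in buffer
    final Reader reader;  // plays the role of mReader

    FillSketch(Reader reader, int size) {
        this.reader = reader;
        this.buffer = new char[size];
    }

    /** Attempts to add at least min characters; has no effect at end of input. */
    void fill(int min) {
        try {
            if (buffer.length - level < min) {
                // Grow so that min more characters fit (assumed doubling policy).
                buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, level + min));
            }
            int read = reader.read(buffer, level, buffer.length - level);
            if (read > 0) {
                level += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```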
      • close

        public void close()
                   throws java.io.IOException
        Does nothing. It's supposed to close the source, but use destroy() instead.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Specified by:
        close in class Source
        Throws:
        java.io.IOException - Not used; this method does nothing.
        See Also:
        destroy()
      • read

        public int read()
                 throws java.io.IOException
        Read a single character. This method will block until a character is available, an I/O error occurs, or the end of the stream is reached.
        Specified by:
        read in class Source
        Returns:
        The character read, as an integer in the range 0 to 65535 (0x00-0xffff), or EOF if the end of the stream has been reached.
        Throws:
        java.io.IOException - If an I/O error occurs.
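Because this class is a java.io.Reader, the usual read-until-EOF loop applies. The sketch below uses a plain Reader as a stand-in and assumes EOF has the conventional Reader value of -1, which this page does not state. ReadLoopSketch and drain are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class ReadLoopSketch {
    static final int EOF = -1; // assumed to match the inherited EOF constant

    /** Reads single characters until EOF and returns them as a string. */
    static String drain(Reader source) {
        StringBuilder out = new StringBuilder();
        try {
            int c;
            // Keep the result as an int until it has been compared with EOF.
            while ((c = source.read()) != EOF) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Casting to char before the EOF comparison would turn -1 into the character 0xffff, which is why the loop holds the value in an int.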
      • read

        public int read​(char[] cbuf,
                        int off,
                        int len)
                 throws java.io.IOException
        Read characters into a portion of an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
        Specified by:
        read in class Source
        Parameters:
        cbuf - Destination buffer
        off - Offset at which to start storing characters
        len - Maximum number of characters to read
        Returns:
        The number of characters read, or EOF if the end of the stream has been reached.
        Throws:
        java.io.IOException - If an I/O error occurs.
      • read

        public int read​(char[] cbuf)
                 throws java.io.IOException
        Read characters into an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
        Specified by:
        read in class Source
        Parameters:
        cbuf - Destination buffer.
        Returns:
        The number of characters read, or EOF if the end of the stream has been reached.
        Throws:
        java.io.IOException - If an I/O error occurs.
      • reset

        public void reset()
                   throws java.lang.IllegalStateException
        Reset the source. Repositions the read point to begin at zero.
        Specified by:
        reset in class Source
        Throws:
        java.lang.IllegalStateException - If the source has been closed.
      • markSupported

        public boolean markSupported()
        Tell whether this source supports the mark() operation.
        Specified by:
        markSupported in class Source
        Returns:
        true.
      • mark

        public void mark​(int readAheadLimit)
                  throws java.io.IOException
        Mark the present position in the source. Subsequent calls to reset() will attempt to reposition the source to this point.
        Specified by:
        mark in class Source
        Parameters:
        readAheadLimit - Not used.
        Throws:
        java.io.IOException - If the source is closed.
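A common use of mark() followed by reset() is to look ahead and then rewind. The sketch below shows that pattern against a generic java.io.Reader stand-in. It does not exercise InputStreamSource itself, and MarkResetSketch and peek are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class MarkResetSketch {
    /** Returns up to count upcoming characters without consuming them. */
    static String peek(Reader source, int count) {
        try {
            // The limit matters for some JDK readers; per the text above,
            // InputStreamSource does not use it.
            source.mark(count);
            char[] buf = new char[count];
            int n = source.read(buf, 0, count);
            source.reset(); // rewind to the marked position
            return n <= 0 ? "" : new String(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```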
      • ready

        public boolean ready()
                      throws java.io.IOException
        Tell whether this source is ready to be read.
        Specified by:
        ready in class Source
        Returns:
        true if the next read() is guaranteed not to block for input, false otherwise. Note that returning false does not guarantee that the next read will block.
        Throws:
        java.io.IOException - If the source is closed.
      • skip

        public long skip​(long n)
                  throws java.io.IOException,
                         java.lang.IllegalArgumentException
        Skip characters. This method will block until some characters are available, an I/O error occurs, or the end of the stream is reached. Note: n is treated as an int.
        Specified by:
        skip in class Source
        Parameters:
        n - The number of characters to skip.
        Returns:
        The number of characters actually skipped.
        Throws:
        java.lang.IllegalArgumentException - If n is negative.
        java.io.IOException - If an I/O error occurs.
      • unread

        public void unread()
                    throws java.io.IOException
        Undo the read of a single character.
        Specified by:
        unread in class Source
        Throws:
        java.io.IOException - If the source is closed or no characters have been read.
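The effect of undoing a single read can be illustrated with the JDK's PushbackReader, used here only as an analogy. Unlike unread() above, PushbackReader.unread takes the character to push back as an argument. UnreadSketch and readUnreadRead are hypothetical names.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class UnreadSketch {
    /** Reads a character, pushes it back, reads again; returns both results. */
    static int[] readUnreadRead(String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("nothing to read");
        }
        try (PushbackReader reader = new PushbackReader(new StringReader(text))) {
            int first = reader.read();
            reader.unread(first);      // step back by one character
            int again = reader.read(); // same character is delivered again
            return new int[] { first, again };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```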
      • getCharacter

        public char getCharacter​(int offset)
                          throws java.io.IOException
        Retrieve a character again.
        Specified by:
        getCharacter in class Source
        Parameters:
        offset - The offset of the character.
        Returns:
        The character at offset.
        Throws:
        java.io.IOException - If the offset is beyond offset() or the source is closed.
      • getCharacters

        public void getCharacters​(char[] array,
                                  int offset,
                                  int start,
                                  int end)
                           throws java.io.IOException
        Retrieve characters again.
        Specified by:
        getCharacters in class Source
        Parameters:
        array - The array of characters.
        offset - The starting position in the array where characters are to be placed.
        start - The starting position, zero based.
        end - The ending position (exclusive, i.e. the character at the ending position is not included), zero based.
        Throws:
        java.io.IOException - If the start or end is beyond offset() or the source is closed.
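The start and end (exclusive) convention, with a separate offset into the destination array, has the same shape as the JDK's String.getChars. The sketch below uses a String to stand in for the characters already read. GetCharactersSketch and copyRange are hypothetical names.

```java
class GetCharactersSketch {
    /**
     * Copies characters start..end-1 of alreadyRead into a new array of the
     * given length, placing the first copied character at array[offset].
     */
    static char[] copyRange(String alreadyRead, int arrayLength,
                            int offset, int start, int end) {
        char[] array = new char[arrayLength];
        // String.getChars(srcBegin, srcEnd, dst, dstBegin): srcEnd is exclusive.
        alreadyRead.getChars(start, end, array, offset);
        return array;
    }
}
```

The number of characters copied is end minus start, so the destination needs at least offset + (end - start) elements.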
      • getString

        public java.lang.String getString​(int offset,
                                          int length)
                                   throws java.io.IOException
        Retrieve a string.
        Specified by:
        getString in class Source
        Parameters:
        offset - The offset of the first character.
        length - The number of characters to retrieve.
        Returns:
        A string containing the length characters at offset.
        Throws:
        java.io.IOException - If the offset or (offset + length) is beyond offset() or the source is closed.
      • getCharacters

        public void getCharacters​(java.lang.StringBuffer buffer,
                                  int offset,
                                  int length)
                           throws java.io.IOException
        Append characters already read into a StringBuffer.
        Specified by:
        getCharacters in class Source
        Parameters:
        buffer - The buffer to append to.
        offset - The offset of the first character.
        length - The number of characters to retrieve.
        Throws:
        java.io.IOException - If the offset or (offset + length) is beyond offset() or the source is closed.
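getString and the StringBuffer variant of getCharacters take an offset and a length rather than a start and an exclusive end. A minimal sketch of that convention over a plain char array follows. The array stands in for the characters read so far, the bounds checks and IOException of the real methods are omitted, and OffsetLengthSketch is a hypothetical name.

```java
class OffsetLengthSketch {
    /** Returns length characters beginning at offset as a new string. */
    static String getString(char[] readSoFar, int offset, int length) {
        return new String(readSoFar, offset, length);
    }

    /** Appends length characters beginning at offset to an existing buffer. */
    static void getCharacters(char[] readSoFar, StringBuffer buffer,
                              int offset, int length) {
        buffer.append(readSoFar, offset, length);
    }
}
```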
      • destroy

        public void destroy()
                     throws java.io.IOException
        Close the source. Once a source has been closed, further read, ready, mark, reset, skip, unread, getCharacter or getString invocations will throw an IOException. Closing a previously-closed source, however, has no effect.
        Specified by:
        destroy in class Source
        Throws:
        java.io.IOException - If an I/O error occurs.
      • offset

        public int offset()
        Get the position (in characters).
        Specified by:
        offset in class Source
        Returns:
        The number of characters that have already been read, or EOF if the source is closed.
      • available

        public int available()
        Get the number of available characters.
        Specified by:
        available in class Source
        Returns:
        The number of characters that can be read without blocking or zero if the source is closed.