Interface Tokenizer

  • All Known Implementing Classes:
    TokenizerImpl

    public interface Tokenizer
    Chops a string or text file into Token instances.
    • Method Summary

      Modifier and Type Method Description
      java.lang.String getErrorDescription()
      If hasErrors returns true, returns a description of the error encountered.
      Token getNextToken()
      Returns the next token.
      boolean hasErrors()
      Returns true if there were errors while reading tokens.
      boolean hasMoreTokens()
      Returns true if there are more tokens, false otherwise.
      boolean isBreak()
      Determines if the current token should start a new sentence.
      void setInputReader(java.io.Reader reader)
      Sets the input reader.
      void setInputText(java.lang.String textToTokenize)
      Sets the text to be tokenized by this tokenizer.
      void setPostpunctuationSymbols(java.lang.String symbols)
      Sets the postpunctuation symbols of this Tokenizer to the given symbols.
      void setPrepunctuationSymbols(java.lang.String symbols)
      Sets the prepunctuation symbols of this Tokenizer to the given symbols.
      void setSingleCharSymbols(java.lang.String symbols)
      Sets the single character symbols of this Tokenizer to the given symbols.
      void setWhitespaceSymbols(java.lang.String symbols)
      Sets the whitespace symbols of this Tokenizer to the given symbols.
    • Method Detail

      • setInputText

        void setInputText(java.lang.String textToTokenize)
        Sets the text to be tokenized by this tokenizer.
        Parameters:
        textToTokenize - the text to tokenize
      • setInputReader

        void setInputReader(java.io.Reader reader)
        Sets the reader from which this tokenizer reads the text to tokenize.
        Parameters:
        reader - the input source
      • getNextToken

        Token getNextToken()
        Returns the next token.
        Returns:
        the next token if it exists; otherwise null
      • hasMoreTokens

        boolean hasMoreTokens()
        Returns true if there are more tokens, false otherwise.
        Returns:
        true if there are more tokens; otherwise false
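The hasMoreTokens and getNextToken methods are typically used together in a read loop. The following is a minimal sketch of that pattern, not the library's own code. So that it compiles on its own, it redeclares a hypothetical subset of the interface, uses a stand-in Token class that models only the word text (the getWord accessor is an assumption), and supplies a toy WhitespaceTokenizer that is unrelated to TokenizerImpl.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizerLoopSketch {

    /** Stand-in for the library's Token type; only the word text is modeled. */
    static final class Token {
        private final String word;
        Token(String word) { this.word = word; }
        String getWord() { return word; }
    }

    /** Hypothetical subset of the Tokenizer interface, redeclared for this sketch. */
    interface Tokenizer {
        void setInputText(String textToTokenize);
        Token getNextToken();
        boolean hasMoreTokens();
    }

    /** Toy implementation that splits on whitespace only. Not TokenizerImpl. */
    static final class WhitespaceTokenizer implements Tokenizer {
        private String[] words = new String[0];
        private int next = 0;

        @Override
        public void setInputText(String textToTokenize) {
            String trimmed = textToTokenize == null ? "" : textToTokenize.trim();
            words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            next = 0;
        }

        @Override
        public Token getNextToken() {
            // Follows the documented contract: null once the input is exhausted.
            return hasMoreTokens() ? new Token(words[next++]) : null;
        }

        @Override
        public boolean hasMoreTokens() {
            return next < words.length;
        }
    }

    /** Drains a tokenizer with the hasMoreTokens/getNextToken pair. */
    static List<String> readAll(Tokenizer tokenizer) {
        List<String> out = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            Token token = tokenizer.getNextToken();
            if (token == null) {
                break; // defensive: getNextToken is documented to return null at the end
            }
            out.add(token.getWord());
        }
        return out;
    }

    public static void main(String[] args) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setInputText("one two three");
        System.out.println(readAll(tokenizer)); // prints [one, two, three]
    }
}
```

Checking hasMoreTokens before each call keeps the loop free of null handling in the common case; the extra null check only guards against an implementation that runs out of input between the two calls.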
      • hasErrors

        boolean hasErrors()
        Returns true if there were errors while reading tokens.
        Returns:
        true if there were errors; otherwise false
      • getErrorDescription

        java.lang.String getErrorDescription()
        If hasErrors returns true, returns a description of the error encountered. Otherwise returns null.
        Returns:
        a description of the last error that occurred, or null if hasErrors returns false
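Because the reading methods declare no checked exceptions, a caller would normally consult hasErrors after the read loop rather than wrap it in a try block. The sketch below illustrates that pattern under stated assumptions: the interface subset is redeclared locally, tokens are modeled as plain String values for brevity (the real getNextToken returns Token), and ReaderTokenizer and FailingReader are hypothetical helpers, not part of the library.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class TokenizerErrorSketch {

    /** Hypothetical subset of the Tokenizer interface; tokens are Strings here. */
    interface Tokenizer {
        void setInputReader(Reader reader);
        String getNextToken();
        boolean hasMoreTokens();
        boolean hasErrors();
        String getErrorDescription();
    }

    /** Toy tokenizer that records I/O failures instead of throwing them. */
    static final class ReaderTokenizer implements Tokenizer {
        private Reader reader;
        private int lookahead = -1; // next unread character, or -1 at end of input
        private String error;       // null until a read fails

        @Override
        public void setInputReader(Reader reader) {
            this.reader = reader;
            this.error = null;
            advance();
        }

        private void advance() {
            try {
                lookahead = reader.read();
            } catch (IOException e) {
                error = e.getMessage(); // remember the failure for getErrorDescription
                lookahead = -1;         // and treat the input as exhausted
            }
        }

        @Override
        public boolean hasMoreTokens() {
            while (lookahead != -1 && Character.isWhitespace(lookahead)) {
                advance(); // skip whitespace between tokens
            }
            return lookahead != -1;
        }

        @Override
        public String getNextToken() {
            if (!hasMoreTokens()) {
                return null;
            }
            StringBuilder sb = new StringBuilder();
            while (lookahead != -1 && !Character.isWhitespace(lookahead)) {
                sb.append((char) lookahead);
                advance();
            }
            return sb.toString();
        }

        @Override
        public boolean hasErrors() { return error != null; }

        @Override
        public String getErrorDescription() { return error; }
    }

    /** Hypothetical reader whose every read fails, used to trigger the error path. */
    static final class FailingReader extends Reader {
        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            throw new IOException("simulated read failure");
        }

        @Override
        public void close() { }
    }

    /** Reads every token and returns how many were produced. */
    static int countTokens(Tokenizer tokenizer) {
        int count = 0;
        while (tokenizer.hasMoreTokens() && tokenizer.getNextToken() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Tokenizer ok = new ReaderTokenizer();
        ok.setInputReader(new StringReader("a b c"));
        System.out.println(countTokens(ok) + " tokens, errors: " + ok.hasErrors());

        Tokenizer bad = new ReaderTokenizer();
        bad.setInputReader(new FailingReader());
        countTokens(bad);
        if (bad.hasErrors()) {
            System.out.println("error: " + bad.getErrorDescription());
        }
    }
}
```

In this sketch a failed read simply ends the token stream, so an empty or truncated result is only distinguishable from a genuinely short input by calling hasErrors afterwards.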
      • setWhitespaceSymbols

        void setWhitespaceSymbols(java.lang.String symbols)
        Sets the whitespace symbols of this Tokenizer to the given symbols.
        Parameters:
        symbols - the whitespace symbols
      • setSingleCharSymbols

        void setSingleCharSymbols(java.lang.String symbols)
        Sets the single character symbols of this Tokenizer to the given symbols.
        Parameters:
        symbols - the single character symbols
      • setPrepunctuationSymbols

        void setPrepunctuationSymbols(java.lang.String symbols)
        Sets the prepunctuation symbols of this Tokenizer to the given symbols.
        Parameters:
        symbols - the prepunctuation symbols
      • setPostpunctuationSymbols

        void setPostpunctuationSymbols(java.lang.String symbols)
        Sets the postpunctuation symbols of this Tokenizer to the given symbols.
        Parameters:
        symbols - the postpunctuation symbols
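The symbol setters each take a String whose characters form a set. The page does not spell out how an implementation applies those sets, so the following is only one plausible reading, offered as a sketch: characters from the prepunctuation set are peeled off the front of a whitespace-delimited chunk, and characters from the postpunctuation set are peeled off the back, leaving the word in between. The split helper and the symbol strings passed to it are hypothetical and are not defaults taken from the library.

```java
public class PunctuationSketch {

    /**
     * Splits one whitespace-delimited chunk into
     * {prepunctuation, word, postpunctuation} using the given symbol sets.
     */
    static String[] split(String chunk, String prepunctuationSymbols, String postpunctuationSymbols) {
        // Advance past leading characters that belong to the prepunctuation set.
        int start = 0;
        while (start < chunk.length()
                && prepunctuationSymbols.indexOf(chunk.charAt(start)) >= 0) {
            start++;
        }
        // Back up over trailing characters that belong to the postpunctuation set.
        int end = chunk.length();
        while (end > start
                && postpunctuationSymbols.indexOf(chunk.charAt(end - 1)) >= 0) {
            end--;
        }
        return new String[] {
            chunk.substring(0, start),
            chunk.substring(start, end),
            chunk.substring(end)
        };
    }

    public static void main(String[] args) {
        // Illustrative symbol sets only; an application would choose its own.
        String[] parts = split("(hello),", "\"'`({[", "\"'`.,:;!?)}]");
        System.out.println(String.join("|", parts)); // prints (|hello|),
    }
}
```

Representing each set as a String keeps membership tests to a single indexOf call, which matches the String parameters the setters accept.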
      • isBreak

        boolean isBreak()
        Determines if the current token should start a new sentence.
        Returns:
        true if a new sentence should be started; otherwise false