Class LexiconImpl
- java.lang.Object
  - com.sun.speech.freetts.lexicon.LexiconImpl
- All Implemented Interfaces: Lexicon
- Direct Known Subclasses: CMULexicon
public abstract class LexiconImpl extends java.lang.Object implements Lexicon
Provides an implementation of a Lexicon. This implementation reads from either a plain ASCII file or a binary file. When reading from an ASCII file, you can specify when each input line is tokenized: load, lookup, or never. If you specify 'load', the entire file is parsed when it is loaded. If you specify 'lookup', the file is loaded, but the parsing of each line is delayed until the line is first referenced, and the parsed form is then saved. If you specify 'never', lines are parsed each time they are referenced. The default is 'never'. To specify the load type, set the system property as follows:
-Dcom.sun.speech.freetts.lexicon.LexTokenize=load
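As a rough illustration, the three settings could map onto the two protected flags as sketched here. This assumes the property is read with the documented default of 'never'. The class `TokenizeModeSketch` and its methods are hypothetical and are not part of FreeTTS.

```java
// Hypothetical sketch, not FreeTTS code: maps the tokenize setting to two flags.
final class TokenizeModeSketch {
    // Property name taken from the class description above.
    static final String PROPERTY = "com.sun.speech.freetts.lexicon.LexTokenize";

    /** Reads the system property, falling back to the documented default "never". */
    static String currentMode() {
        return System.getProperty(PROPERTY, "never");
    }

    /** True only for the 'load' setting: parse every line while loading. */
    static boolean tokenizeOnLoad(String mode) {
        return "load".equals(mode);
    }

    /** True only for the 'lookup' setting: parse a line when first referenced. */
    static boolean tokenizeOnLookup(String mode) {
        return "lookup".equals(mode);
    }

    public static void main(String[] args) {
        String mode = currentMode();
        System.out.println("mode=" + mode
                + " tokenizeOnLoad=" + tokenizeOnLoad(mode)
                + " tokenizeOnLookup=" + tokenizeOnLookup(mode));
    }
}
```

With 'never', both flags are false, which matches the description of parsing lines on every reference.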
If a binary file is used, you can also specify whether the new IO package is used. The new IO package was introduced in JDK 1.4 and can greatly improve the speed of loading files. To enable new IO, use the following system property (it is enabled by default):
-Dcom.sun.speech.freetts.useNewIO=true
The implementation also allows users to define their own addenda, which are used in addition to the system addenda. If the user defines their own addenda, its values are added to the system addenda, overriding any existing elements in the system addenda. To define user addenda, set the following property:
-Dcom.sun.speech.freetts.lexicon.userAddenda=<URLToUserAddenda>
Where <URLToUserAddenda> is a URL pointing to an ASCII file containing addenda entries.
[[[TODO: support multiple homographs with the same part of speech.]]]
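The override rule for user addenda can be pictured as a plain map merge. The sketch below is only an illustration, assuming both addenda are keyed by word and hold a phone string. The class `AddendaMergeSketch`, its method, and the sample entries are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not FreeTTS code: user addenda layered over system addenda.
final class AddendaMergeSketch {
    /** Returns a new map holding the system entries with user entries on top. */
    static Map<String, String> merge(Map<String, String> system,
                                     Map<String, String> user) {
        Map<String, String> merged = new HashMap<>(system);
        merged.putAll(user); // a user entry replaces a system entry with the same key
        return merged;
    }

    public static void main(String[] args) {
        // Made-up sample entries, for illustration only.
        Map<String, String> system = new HashMap<>();
        system.put("foo", "f uw");
        system.put("bar", "b aa r");

        Map<String, String> user = new HashMap<>();
        user.put("foo", "f ow");

        System.out.println(merge(system, user));
    }
}
```

Entries that exist only in the system addenda survive the merge. Entries defined in both take the user's value.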
-
-
Field Summary
Fields
protected boolean tokenizeOnLoad
If true, the phone string is replaced with the phone array in the hashmap when the phone array is loaded.
protected boolean tokenizeOnLookup
If true, the phone string is replaced with the phone array in the hashmap when the phone array is first looked up.
-
Constructor Summary
Constructors
LexiconImpl()
Class constructor for an empty Lexicon.
LexiconImpl(java.net.URL compiledURL, java.net.URL addendaURL, java.net.URL letterToSoundURL, boolean binary)
Creates a new LexiconImpl by reading from the given URLs.
-
Method Summary
Methods
void addAddendum(java.lang.String word, java.lang.String partOfSpeech, java.lang.String[] phones)
Adds a word to the addenda.
boolean compare(LexiconImpl other)
Tests to see if this lexicon is identical to the other for debugging purposes.
protected java.util.Map createLexicon(java.io.InputStream is, boolean binary, int estimatedSize)
Reads the given input stream as lexicon data and returns the results in a Map.
void dumpBinary(java.lang.String path)
Dumps this lexicon (just the compiled form).
protected static java.lang.String fixPartOfSpeech(java.lang.String partOfSpeech)
Fixes the part of speech if it is null.
protected java.lang.String[] getPhones(java.lang.String phones)
Turns the phone String into a String[], using " " as the delimiter.
java.lang.String[] getPhones(java.lang.String word, java.lang.String partOfSpeech)
Gets the phone list for a given word.
java.lang.String[] getPhones(java.lang.String word, java.lang.String partOfSpeech, boolean useLTS)
Gets the phone list for a given word.
protected java.lang.String[] getPhones(java.util.Map lexicon, java.lang.String wordAndPartOfSpeech)
Gets a phone list for a word from a given lexicon.
protected java.lang.String[] getPhones(java.util.Map lexicon, java.lang.String word, java.lang.String partOfSpeech)
Gets a phone list for a word from a given lexicon.
boolean isLoaded()
Determines if this lexicon is loaded.
void load()
Loads the data for this lexicon.
protected java.util.Map loadTextLexicon(java.io.InputStream is, int estimatedSize)
Reads the given input stream as text lexicon data and returns the results in a Map.
protected void parseAndAdd(java.util.Map lexicon, java.lang.String line)
Creates a word from the given input line and adds it to the lexicon.
void removeAddendum(java.lang.String word, java.lang.String partOfSpeech)
Removes a word from the addenda.
protected void setLexiconParameters(java.net.URL compiledURL, java.net.URL addendaURL, java.net.URL letterToSoundURL, boolean binary)
Sets the lexicon parameters.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.sun.speech.freetts.lexicon.Lexicon
isSyllableBoundary
-
-
-
-
Field Detail
-
tokenizeOnLoad
protected boolean tokenizeOnLoad
If true, the phone string is replaced with the phone array in the hashmap when the phone array is loaded. The side effects of this are quicker lookups, but more memory usage and a longer startup time.
-
tokenizeOnLookup
protected boolean tokenizeOnLookup
If true, the phone string is replaced with the phone array in the hashmap when the phone array is first looked up. Set by cmufilelex.tokenize=lookup.
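The 'lookup' behaviour described for this field amounts to lazy tokenization with caching. The following is a minimal sketch of that idea, assuming a map whose values start as phone strings and are swapped for arrays on first access. `LazyTokenizeSketch` and its methods are hypothetical names, and the sample entry is made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not FreeTTS code: tokenize an entry on first lookup only.
final class LazyTokenizeSketch {
    // Each value is either a raw phone String or an already split String[].
    private final Map<String, Object> entries = new HashMap<>();

    /** Stores the raw, unparsed phone string for a key. */
    void put(String key, String phoneString) {
        entries.put(key, phoneString);
    }

    /** Returns the phones for a key, splitting and caching them on first use. */
    String[] lookup(String key) {
        Object value = entries.get(key);
        if (value instanceof String) {
            String[] phones = ((String) value).trim().split(" +");
            entries.put(key, phones); // replace the string with the parsed form
            return phones;
        }
        return (String[]) value; // already tokenized, or null when absent
    }

    /** Reports whether the entry currently holds the parsed array form. */
    boolean isTokenized(String key) {
        return entries.get(key) instanceof String[];
    }

    public static void main(String[] args) {
        LazyTokenizeSketch lexicon = new LazyTokenizeSketch();
        lexicon.put("hello0", "hh ax l ow"); // made-up sample entry
        System.out.println(lexicon.isTokenized("hello0"));
        System.out.println(Arrays.toString(lexicon.lookup("hello0")));
        System.out.println(lexicon.isTokenized("hello0"));
    }
}
```

The trade-off is the one the field descriptions hint at: startup stays cheap, and only entries that are actually used pay the parsing and memory cost.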
-
-
Constructor Detail
-
LexiconImpl
public LexiconImpl(java.net.URL compiledURL, java.net.URL addendaURL, java.net.URL letterToSoundURL, boolean binary)
Creates a new LexiconImpl by reading from the given URLs.
- Parameters:
compiledURL - a URL pointing to the compiled lexicon
addendaURL - a URL pointing to lexicon addenda
letterToSoundURL - a URL pointing to the LetterToSound to use if a word cannot be found in the compiled form or the addenda
binary - if true, the input streams are binary; otherwise, they are text.
-
LexiconImpl
public LexiconImpl()
Class constructor for an empty Lexicon.
-
-
Method Detail
-
setLexiconParameters
protected void setLexiconParameters(java.net.URL compiledURL, java.net.URL addendaURL, java.net.URL letterToSoundURL, boolean binary)
Sets the lexicon parameters.
- Parameters:
compiledURL - a URL pointing to the compiled lexicon
addendaURL - a URL pointing to lexicon addenda
letterToSoundURL - a URL pointing to the LetterToSound to use
binary - if true, the input streams are binary; otherwise, they are text.
-
isLoaded
public boolean isLoaded()
Determines if this lexicon is loaded.
-
load
public void load() throws java.io.IOException
Loads the data for this lexicon.
-
createLexicon
protected java.util.Map createLexicon(java.io.InputStream is, boolean binary, int estimatedSize) throws java.io.IOException
Reads the given input stream as lexicon data and returns the results in a Map.
- Parameters:
is - the input stream
binary - if true, the data is binary
estimatedSize - the estimated size of the lexicon
- Throws:
java.io.IOException - if errors are encountered while reading the data
-
loadTextLexicon
protected java.util.Map loadTextLexicon(java.io.InputStream is, int estimatedSize) throws java.io.IOException
Reads the given input stream as text lexicon data and returns the results in a Map.
- Parameters:
is - the input stream
estimatedSize - the estimated number of entries of the lexicon
- Throws:
java.io.IOException - if errors are encountered while reading the data
-
parseAndAdd
protected void parseAndAdd(java.util.Map lexicon, java.lang.String line)
Creates a word from the given input line and adds it to the lexicon.
- Parameters:
lexicon - the lexicon
line - the input text
-
getPhones
public java.lang.String[] getPhones(java.lang.String word, java.lang.String partOfSpeech)
Gets the phone list for a given word. If a phone list cannot be found, returns null. The format is lexicon dependent. If the part of speech does not matter, pass in null.
-
getPhones
public java.lang.String[] getPhones(java.lang.String word, java.lang.String partOfSpeech, boolean useLTS)
Gets the phone list for a given word. If a phone list cannot be found, null is returned. The partOfSpeech is implementation dependent, but null always matches.
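One way to picture the role of `useLTS` is as a final fallback after the stored entries have been checked. The sketch below assumes the addenda are consulted before the compiled lexicon and that letter-to-sound rules run only when both miss. That ordering, the class `LookupFallbackSketch`, and the stand-in letter-to-sound function are all assumptions for illustration, not confirmed FreeTTS behaviour.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not FreeTTS code: lookup with an optional rule-based fallback.
final class LookupFallbackSketch {
    static String[] getPhones(Map<String, String[]> addenda,
                              Map<String, String[]> compiled,
                              Function<String, String[]> letterToSound,
                              String word,
                              boolean useLTS) {
        String[] phones = addenda.get(word);      // assumed to be checked first
        if (phones == null) {
            phones = compiled.get(word);          // then the compiled lexicon
        }
        if (phones == null && useLTS) {
            phones = letterToSound.apply(word);   // rules only when allowed
        }
        return phones;                            // may still be null
    }

    public static void main(String[] args) {
        Map<String, String[]> addenda = new HashMap<>();
        Map<String, String[]> compiled = new HashMap<>();
        compiled.put("hello", new String[] {"hh", "ax", "l", "ow"}); // made-up entry

        // Stand-in rule set: one "phone" per letter, purely for demonstration.
        Function<String, String[]> lts = w -> w.split("");

        System.out.println(Arrays.toString(getPhones(addenda, compiled, lts, "hello", true)));
        System.out.println(Arrays.toString(getPhones(addenda, compiled, lts, "xyz", true)));
        System.out.println(Arrays.toString(getPhones(addenda, compiled, lts, "xyz", false)));
    }
}
```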
-
getPhones
protected java.lang.String[] getPhones(java.util.Map lexicon, java.lang.String word, java.lang.String partOfSpeech)
Gets a phone list for a word from a given lexicon. If a phone list cannot be found, returns null. The format is lexicon dependent. If the part of speech does not matter, pass in null.
- Parameters:
lexicon - the lexicon
word - the word to find
partOfSpeech - the part of speech
- Returns:
the list of phones for word, or null
-
getPhones
protected java.lang.String[] getPhones(java.util.Map lexicon, java.lang.String wordAndPartOfSpeech)
Gets a phone list for a word from a given lexicon. If a phone list cannot be found, returns null.
- Parameters:
lexicon - the lexicon
wordAndPartOfSpeech - word and part of speech concatenated together
- Returns:
the list of phones for word, or null
-
getPhones
protected java.lang.String[] getPhones(java.lang.String phones)
Turns the phone String into a String[], using " " as the delimiter.
- Parameters:
phones - the phones
- Returns:
the phones split into an array
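A minimal sketch of this kind of split, using only the JDK, is shown here. It assumes `java.util.StringTokenizer` semantics, where runs of spaces produce no empty tokens. Whether the actual method behaves the same way for repeated spaces is not stated in this page. `PhoneSplitSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch, not FreeTTS code: split a phone string on spaces.
final class PhoneSplitSketch {
    /** Splits the phone string into an array, using " " as the delimiter. */
    static String[] splitPhones(String phones) {
        StringTokenizer tokenizer = new StringTokenizer(phones, " ");
        List<String> result = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Made-up sample phone string.
        System.out.println(Arrays.toString(splitPhones("hh ax l ow")));
    }
}
```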
-
addAddendum
public void addAddendum(java.lang.String word, java.lang.String partOfSpeech, java.lang.String[] phones)
Adds a word to the addenda.
- Specified by:
addAddendum in interface Lexicon
- Parameters:
word - the word to add
partOfSpeech - the part of speech
phones - the phones for the word
-
removeAddendum
public void removeAddendum(java.lang.String word, java.lang.String partOfSpeech)
Removes a word from the addenda.
- Specified by:
removeAddendum in interface Lexicon
- Parameters:
word - the word to remove
partOfSpeech - the part of speech
-
dumpBinary
public void dumpBinary(java.lang.String path)
Dumps this lexicon (just the compiled form). The lexicon is dumped to two binary files, PATH_compiled.bin and PATH_addenda.bin.
- Parameters:
path - the root path to dump it to
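To make the naming concrete, the sketch below derives the two output file names from a root path. It assumes that PATH in the description stands for the `path` argument used verbatim as a prefix. `DumpPathSketch` and the example root path are hypothetical.

```java
// Hypothetical sketch, not FreeTTS code: derive the two dump file names.
final class DumpPathSketch {
    /** Name of the compiled-lexicon dump for the given root path. */
    static String compiledFile(String path) {
        return path + "_compiled.bin";
    }

    /** Name of the addenda dump for the given root path. */
    static String addendaFile(String path) {
        return path + "_addenda.bin";
    }

    public static void main(String[] args) {
        String root = "mylex"; // made-up root path
        System.out.println(compiledFile(root)); // prints mylex_compiled.bin
        System.out.println(addendaFile(root));  // prints mylex_addenda.bin
    }
}
```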
-
compare
public boolean compare(LexiconImpl other)
Tests to see if this lexicon is identical to the other for debugging purposes.
- Parameters:
other - the other lexicon to compare to
- Returns:
true if lexicons are identical
-
fixPartOfSpeech
protected static java.lang.String fixPartOfSpeech(java.lang.String partOfSpeech)
Fixes the part of speech if it is null. The default representation of a null part of speech is the number "0".
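The null handling, together with the `wordAndPartOfSpeech` parameter described earlier, suggests a lookup key made of the word followed by its part of speech. The sketch below illustrates that reading. The simple concatenation in `key` is an assumption, and `PartOfSpeechKeySketch` is a hypothetical class that only mirrors the documented default of "0".

```java
// Hypothetical sketch, not FreeTTS code: normalize a null part of speech.
final class PartOfSpeechKeySketch {
    /** Returns "0" for a null part of speech, otherwise the value unchanged. */
    static String fixPartOfSpeech(String partOfSpeech) {
        return partOfSpeech == null ? "0" : partOfSpeech;
    }

    /** Builds a lookup key by appending the part of speech to the word (assumed format). */
    static String key(String word, String partOfSpeech) {
        return word + fixPartOfSpeech(partOfSpeech);
    }

    public static void main(String[] args) {
        System.out.println(key("hello", null)); // prints hello0
        System.out.println(key("lives", "n"));  // prints livesn
    }
}
```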
-
-