Package edu.msu.cme.rdp.classifier
Class TrainingInfo
- java.lang.Object
-
- edu.msu.cme.rdp.classifier.TrainingInfo
-
public class TrainingInfo extends java.lang.Object
TrainingInfo holds all the training information and the taxonomy hierarchy information.
-
Constructor Summary
Constructor and Description
TrainingInfo()
Creates a new TrainingInfo.
-
Method Summary
Modifier and Type, Method, Description
Classifier createClassifier()
Creates a new Classifier if all the training information has been loaded; throws an exception otherwise.
void createGenusWordProbList(java.io.Reader reader)
Reads in the index of the genus tree node and the conditional probability that the genus contains a word.
void createLogWordPriorArr(java.io.Reader reader)
Reads in the log value of the word prior probability and saves it to an array LogWordPriorArr.
void createProbIndexArr(java.io.Reader reader)
Reads in the start index of the conditional probability of each genus and saves it to an array wordConditionalProbIndexArr.
void createTree(java.io.Reader reader)
Reads in the tree information from a reader and creates all the HierarchyTrees.
void generateWordPairDiffArr(int[] word, int beginIndex)
For a given word w1 and its reverse complement word w2, calculates the difference between the log word prior of w1 and that of w2 and saves it to an array.
HierarchyTree getGenusNodebyIndex(int i)
Returns a genus node from the genusNodeList at the specified position.
int getGenusNodeListSize()
Returns the number of genus nodes.
HierarchyVersion getHierarchyInfo()
Returns the info of the taxonomy hierarchy from the training file.
java.lang.String getHierarchyVersion()
Returns the version of the taxonomical hierarchy.
float getLogLeaveCount(int i)
Returns the log value of (number of leaves + 1) of a genus.
float getLogWordPrior(int wordIndex)
Returns the log value of the prior probability of a word.
HierarchyTree getRootTree()
Returns the root of the trees.
int getStartIndex(int wordIndex)
Returns the start index, in the array, of the GenusWordConditionalProb entries for the specified wordIndex.
int getStopIndex(int wordIndex)
Returns the stop index, in the array, of the GenusWordConditionalProb entries for the specified wordIndex.
java.lang.String getTrainRank()
Returns the rank the classifier was trained on.
GenusWordConditionalProb getWordConditionalProbObject(int posIndex)
Returns the GenusWordConditionalProb at the specified position in the genusIndex_wordConditionalProbList.
float getWordPairPriorDiff(int wordIndex)
Returns the difference between the log word prior of the given word and that of its reverse complement word.
boolean isSeqReversed(int[] wordIndexArr, int wordCount)
boolean isSeqReversed(ClassifierSequence seq)
Returns true if the sequence is in reverse orientation.
-
Method Detail
-
createTree
public void createTree(java.io.Reader reader) throws java.io.IOException, TrainingDataException
Reads in the tree information from a reader and creates all the HierarchyTrees. Note: the tree information has to be read after at least one of the other three files, because the version information needs to be set first.
- Throws:
java.io.IOException
TrainingDataException
-
createLogWordPriorArr
public void createLogWordPriorArr(java.io.Reader reader) throws java.io.IOException, TrainingDataException
Reads in the log value of the word prior probability and saves it to an array LogWordPriorArr.
- Throws:
java.io.IOException
TrainingDataException
-
generateWordPairDiffArr
public void generateWordPairDiffArr(int[] word, int beginIndex)
For a given word w1 and the reverse complement word w2, calculates the difference between the log word prior of w1 and w2 and saves to an array. Repeats for every possible word of size 8.
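The word-pair difference described above can be illustrated with a minimal sketch. It is not the library's implementation: it assumes each word of size 8 is packed into an integer index with 2 bits per base (A=0, C=1, G=2, T=3, first base in the highest bits), and it loops over all indices rather than using the (int[] word, int beginIndex) form of the real method. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the RDP Classifier source. */
final class WordPairDiffSketch {
    static final int WORD_SIZE = 8;                      // word size stated in the documentation
    static final int NUM_WORDS = 1 << (2 * WORD_SIZE);   // 4^8 = 65536 possible words

    /**
     * Reverse complement of a packed word index.
     * Assumed encoding: 2 bits per base, A=0, C=1, G=2, T=3, so the complement of b is 3 - b.
     */
    static int reverseComplement(int wordIndex) {
        int rc = 0;
        for (int i = 0; i < WORD_SIZE; i++) {
            int base = wordIndex & 3;       // last remaining base of the word
            rc = (rc << 2) | (3 - base);    // complement it and append, which reverses the order
            wordIndex >>>= 2;
        }
        return rc;
    }

    /** diff[w] = logWordPrior[w] - logWordPrior[reverseComplement(w)] for every word w. */
    static float[] wordPairDiffs(float[] logWordPrior) {
        float[] diff = new float[logWordPrior.length];
        for (int w = 0; w < diff.length; w++) {
            diff[w] = logWordPrior[w] - logWordPrior[reverseComplement(w)];
        }
        return diff;
    }

    public static void main(String[] args) {
        float[] logPrior = new float[NUM_WORDS];
        Arrays.fill(logPrior, -5.0f);   // made-up log priors
        logPrior[0] = -1.0f;            // pretend AAAAAAAA is much more common than TTTTTTTT
        float[] diff = wordPairDiffs(logPrior);
        System.out.println("AAAAAAAA: " + diff[0]);
        System.out.println("TTTTTTTT: " + diff[NUM_WORDS - 1]);
    }
}
```

Under this encoding the differences are antisymmetric: the value for a word is the negative of the value for its reverse complement, and a word that is its own reverse complement gets 0.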
-
createGenusWordProbList
public void createGenusWordProbList(java.io.Reader reader) throws java.io.IOException, TrainingDataException
Reads in the index of the genus tree node and the conditional probability that the genus contains a word. Saves the data into a list genus_wordConditionalProbList.
- Throws:
java.io.IOException
TrainingDataException
-
createProbIndexArr
public void createProbIndexArr(java.io.Reader reader) throws java.io.IOException, TrainingDataException
Reads in the start index of the conditional probability of each genus and saves it to an array wordConditionalProbIndexArr.
- Throws:
java.io.IOException
TrainingDataException
-
createClassifier
public Classifier createClassifier()
Creates a new Classifier if all the training information has been loaded; throws an exception otherwise.
-
getRootTree
public HierarchyTree getRootTree()
Returns the root of the trees.
-
getTrainRank
public java.lang.String getTrainRank()
- Returns:
- the rank the classifier was trained on
-
getGenusNodeListSize
public int getGenusNodeListSize()
Returns the number of the genus nodes.
-
getGenusNodebyIndex
public HierarchyTree getGenusNodebyIndex(int i)
Returns a genus node from the genusNodeList at the specified position.
-
getLogWordPrior
public float getLogWordPrior(int wordIndex)
Returns the log value of the prior probability of a word.
-
getWordPairPriorDiff
public float getWordPairPriorDiff(int wordIndex)
Returns the difference between the log word prior of the given word and that of its reverse complement word.
-
getLogLeaveCount
public float getLogLeaveCount(int i)
Returns the log value of (number of leaves + 1) of a genus.
-
getStartIndex
public int getStartIndex(int wordIndex)
Returns the start index, in the array, of the GenusWordConditionalProb entries for the specified wordIndex.
-
getStopIndex
public int getStopIndex(int wordIndex)
Returns the stop index, in the array, of the GenusWordConditionalProb entries for the specified wordIndex.
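The page does not say how the start and stop indices relate to the entry list. One plausible reading, sketched below, is a flat list of per-genus entries grouped by word, plus an offset array with one extra trailing element. The sketch assumes the stop index is exclusive and equal to the next word's start index. All names and data are hypothetical and are not taken from the library.

```java
import java.util.Arrays;

/** Illustrative sketch of a flat entry list addressed by per-word offsets (assumed layout). */
final class WordIndexLayoutSketch {
    // Flat entries grouped by word: entry p says genus GENUS_INDEX[p] contains the word
    // with conditional probability PROB[p]. Values are made up.
    static final int[] GENUS_INDEX = {0, 2, 1, 0, 1, 2};
    static final float[] PROB = {0.9f, 0.1f, 0.5f, 0.3f, 0.3f, 0.4f};

    // START[w] is the offset of the first entry for word w.
    // The extra last element marks the end of the final word's entries.
    static final int[] START = {0, 2, 3, 3, 6};

    static int startIndex(int wordIndex) {
        return START[wordIndex];
    }

    // Assumption: the stop index is exclusive and equals the next word's start.
    static int stopIndex(int wordIndex) {
        return START[wordIndex + 1];
    }

    /** Genus indices of all entries recorded for the given word. */
    static int[] genusesContaining(int wordIndex) {
        return Arrays.copyOfRange(GENUS_INDEX, startIndex(wordIndex), stopIndex(wordIndex));
    }

    public static void main(String[] args) {
        for (int w = 0; w < START.length - 1; w++) {
            System.out.println("word " + w + " -> genera " + Arrays.toString(genusesContaining(w)));
        }
    }
}
```

With this layout a word that occurs in no genus simply has equal start and stop indices, so no per-word list objects are needed.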
-
getWordConditionalProbObject
public GenusWordConditionalProb getWordConditionalProbObject(int posIndex)
Returns the GenusWordConditionalProb at the specified position in the genusIndex_wordConditionalProbList.
-
getHierarchyVersion
public java.lang.String getHierarchyVersion()
Returns the version of the taxonomical hierarchy.
-
getHierarchyInfo
public HierarchyVersion getHierarchyInfo()
Returns the info of the taxonomy hierarchy from the training file.
-
isSeqReversed
public boolean isSeqReversed(ClassifierSequence seq) throws java.io.IOException
Returns true if the sequence is in reverse orientation. Sums the differences between all the overlapping words from the query sequence and the reverse complements of those words. If the sum is less than zero, the query sequence is in reverse orientation.
- Throws:
java.io.IOException
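The orientation test described above comes down to a sum and a sign check. The sketch below is a minimal illustration, not the library's code. The parameter names mirror the isSeqReversed(int[] wordIndexArr, int wordCount) overload, while the wordPairPriorDiff table passed as a third argument, the class name, and the toy four-word table are assumptions made for the example.

```java
/** Illustrative sketch of the reverse-orientation check; not the RDP Classifier source. */
final class OrientationSketch {
    /**
     * Sums the word-pair prior differences of the first wordCount words
     * and reports reverse orientation when the sum is negative.
     */
    static boolean isReversed(int[] wordIndexArr, int wordCount, float[] wordPairPriorDiff) {
        float sum = 0f;
        for (int i = 0; i < wordCount; i++) {
            sum += wordPairPriorDiff[wordIndexArr[i]];
        }
        return sum < 0f;
    }

    public static void main(String[] args) {
        // Toy table for four hypothetical words; words 0/3 and 1/2 are treated as
        // reverse-complement pairs, so their differences have opposite signs.
        float[] diff = {1.5f, -0.5f, 0.5f, -1.5f};

        int[] forwardWords = {0, 2, 0};   // mostly words that are likelier than their reverse complements
        int[] reversedWords = {3, 1, 3};  // the corresponding reverse-complement words

        System.out.println(isReversed(forwardWords, forwardWords.length, diff));
        System.out.println(isReversed(reversedWords, reversedWords.length, diff));
    }
}
```

In the class itself, the per-word difference would presumably come from getWordPairPriorDiff(int wordIndex) rather than from an array argument.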
-
isSeqReversed
public boolean isSeqReversed(int[] wordIndexArr, int wordCount)
-