Interface Categorizer
-
- All Known Implementing Classes:
CategorizerImpl
public interface Categorizer
The Categorizer interface recognizes the category of a sentence based on the trained model in use. The data used to train the model comes from the file categories_[lang].txt. The file is dynamic: the admin can download it, edit it, and save it back to the resources folder.
-
Nested Class Summary
Nested Classes
Modifier and Type | Interface | Description
static class | Categorizer.Language | Enumeration class with two values, Categorizer.Language.EN and Categorizer.Language.FR.
-
Method Summary
All Methods | Static Methods | Instance Methods | Abstract Methods
Modifier and Type | Method | Description
CategorizerImpl.CategoryResult | getCategory(String[] tokens, Categorizer.Language lang, String sentence) | Gets the category of a sentence that was tokenized with the Tokenizer class.
static CategorizerImpl | getInstance() | Gets the instance of the class; if the instance is null, calls the constructor to create a new one.
opennlp.tools.doccat.DoccatModel | train(String lang) | Takes a long time to execute, since it prepares the training model for the categorizer from the training data provided in the CategorizerImpl() constructor. For optimization purposes, a singleton is used.
-
Method Detail
-
getInstance
static CategorizerImpl getInstance()
Gets the instance of the class. If the instance is null, it calls the constructor to create a new instance.
- Returns:
- the Categorizer instance.
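The lazy creation described for getInstance() can be sketched as follows. This is a minimal illustration, not the actual CategorizerImpl code: the class name LazyCategorizerSketch, the constructorCalls counter, and the use of synchronized are all assumptions made for the example.

```java
// Hypothetical stand-in for CategorizerImpl; the real class is not shown on this page.
final class LazyCategorizerSketch {
    // Holds the single shared instance; null until first requested.
    private static LazyCategorizerSketch instance;

    // Counts constructor runs, only to make the single-creation behavior visible.
    static int constructorCalls = 0;

    private LazyCategorizerSketch() {
        // Expensive setup (such as model training) would happen here.
        constructorCalls++;
    }

    // Assumption: synchronized is used so that two threads cannot both create an instance.
    static synchronized LazyCategorizerSketch getInstance() {
        if (instance == null) {
            instance = new LazyCategorizerSketch();
        }
        return instance;
    }

    public static void main(String[] args) {
        LazyCategorizerSketch first = getInstance();
        LazyCategorizerSketch second = getInstance();
        System.out.println("same instance: " + (first == second));
        System.out.println("constructor calls: " + constructorCalls);
    }
}
```

Because the constructor runs only on the first call, any costly preparation done there is paid once per JVM rather than once per request.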
-
train
opennlp.tools.doccat.DoccatModel train(String lang)
This method takes a long time to execute, since it prepares the training model for the categorizer from the training data provided in the CategorizerImpl() constructor. For optimization purposes, a singleton is used.
- Parameters:
lang - a string representing the language in which the model will be trained. It also identifies the file used to train the model.
- Returns:
- an instance of DoccatModel, which is the trained model.
- See Also:
ObjectStream, DoccatModel, TrainingParameters, DoccatFactory, DocumentSampleStream, DocumentSample, DocumentCategorizerME, MarkableFileInputStreamFactory, PlainTextByLineStream
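To make the link between lang and the training file concrete, here is a small sketch that uses only the Java standard library. It assumes that lang is a lowercase code such as "en" or "fr", and that each line of categories_[lang].txt follows the layout read by OpenNLP's DocumentSampleStream: the category first, then the whitespace-separated text. The class TrainingLineSketch, its methods, and the sample line are hypothetical and do not come from the project.

```java
import java.util.Arrays;

// Hypothetical helper; it illustrates the file naming and line layout only.
final class TrainingLineSketch {

    // Assumption: lang is a lowercase language code such as "en" or "fr".
    static String trainingFileName(String lang) {
        return "categories_" + lang + ".txt";
    }

    // Splits a training line on whitespace and checks that a category and some text are present.
    private static String[] split(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected a category followed by text: " + line);
        }
        return parts;
    }

    // The first whitespace-separated field is treated as the category label.
    static String category(String line) {
        return split(line)[0];
    }

    // Everything after the category is treated as the sample's tokens.
    static String[] tokens(String line) {
        String[] parts = split(line);
        return Arrays.copyOfRange(parts, 1, parts.length);
    }

    public static void main(String[] args) {
        String line = "greeting hello how are you"; // made-up sample line
        System.out.println(trainingFileName("en"));
        System.out.println(category(line) + " -> " + Arrays.toString(tokens(line)));
    }
}
```

An admin editing the training file would, under these assumptions, add one such line per example sentence.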
-
getCategory
CategorizerImpl.CategoryResult getCategory(String[] tokens, Categorizer.Language lang, String sentence)
Gets the category of a sentence that was tokenized with the Tokenizer class.
- Parameters:
tokens - an array of tokens retrieved from the tokenizer
lang - the Categorizer.Language that the tokens are in
sentence - the sentence (a single sentence from the user's input) from which the tokens were retrieved
- Returns:
- an instance of CategorizerImpl.CategoryResult representing the result to be displayed to the user.
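This page does not describe how the category is chosen or what CategoryResult contains. One common approach with document categorizers is to score every known category and keep the highest-scoring one. The sketch below shows only that selection step; the class BestCategorySketch, its method, and the category names are hypothetical and are not part of the Categorizer API.

```java
// Hypothetical helper; it does not reflect the actual CategorizerImpl logic.
final class BestCategorySketch {

    // Returns the category whose score is highest (the first one wins on ties).
    static String bestCategory(String[] categories, double[] scores) {
        if (categories.length == 0 || categories.length != scores.length) {
            throw new IllegalArgumentException("categories and scores must be non-empty and equal in length");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return categories[best];
    }

    public static void main(String[] args) {
        String[] categories = {"greeting", "billing", "support"}; // made-up labels
        double[] scores = {0.15, 0.70, 0.15};                     // made-up scores
        System.out.println(bestCategory(categories, scores));
    }
}
```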