We use LanguageTool for lemmatising, through DKPro.
DKPro calls LanguageTool as follows:
// Let LanguageTool analyze the tokens
List<AnalyzedTokenReadings> rawTaggedTokens = lang.getTagger().tag(tokenText);
AnalyzedSentence as = new AnalyzedSentence(rawTaggedTokens.toArray(new AnalyzedTokenReadings[rawTaggedTokens.size()]));
as = lang.getDisambiguator().disambiguate(as);
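Each AnalyzedTokenReadings returned by the tagger can hold more than one candidate reading for a token, so the lemma that comes out depends on which reading is picked. Below is a minimal sketch of picking a lemma by POS tag. It uses hypothetical stand-in classes (`Reading`, `LemmaPicker`) rather than LanguageTool's own types, and the readings in `main` are illustrative only, not actual LanguageTool output.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one candidate reading (lemma plus POS tag).
final class Reading {
    final String lemma;
    final String posTag;

    Reading(String lemma, String posTag) {
        this.lemma = lemma;
        this.posTag = posTag;
    }
}

final class LemmaPicker {
    // Return the lemma of the first reading whose POS tag starts with
    // posPrefix; otherwise the first non-null lemma; otherwise the surface form.
    static String pickLemma(String surface, List<Reading> readings, String posPrefix) {
        for (Reading r : readings) {
            if (r.lemma != null && r.posTag != null && r.posTag.startsWith(posPrefix)) {
                return r.lemma;
            }
        }
        for (Reading r : readings) {
            if (r.lemma != null) {
                return r.lemma;
            }
        }
        return surface;
    }

    public static void main(String[] args) {
        // Illustrative readings only (assumed, not taken from LanguageTool).
        List<Reading> lower = Arrays.asList(
                new Reading("lowe", "NN"),
                new Reading("low", "JJR"),
                new Reading("lower", "VB"));
        System.out.println(pickLemma("lower", lower, "JJ")); // prints "low"
    }
}
```

The POS prefix would have to come from a POS tagger elsewhere in the pipeline, which is also an assumption here. If no reading matches the prefix, the sketch falls back to the first reading, which would reproduce the "lowe" result for the illustrative data.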
We have repeatedly seen “lower” lemmatised incorrectly as “lowe”, and “species” as “specie”. “Lowe” isn't a valid word, and “specie” has a very different meaning from “species”.
“There was a significant increase in phagocytic Activity of WBC as indicated by the lower PI in AD rats compared to that of control and sham-operated rats in both the 15 and 21-day studies.”
The following analysed tokens are returned for “lower”:
“PG activate microglia by binding to their EP receptor. Activated microglia release ROS, reactive nitrogen species and neurotoxic cytokines which cause secondary neurodegeneration resulting in the increased number of plaques, as observed in the 21-day study.”
The following analysed tokens are returned for “species”:
Is there a logical explanation for why “specie” and “lowe” are returned as lemmas, or does this point to a bug in LanguageTool?