Getting the base form of words from corrected input

Hello,
I have the following problem: I would like to get the base forms of the words returned by LanguageTool's correction. I'm using morfologik for this, but some words are not present in its dictionary. Is it possible to get the base forms directly from LanguageTool during the correction process?

I'm not sure I understand your question. Could you give an example? If a word isn't in the internal dictionary, LT won't know anything about it…

Take a sentence like “Ala ma kóta”. I need to correct it first because it may contain mistakes, so I'm using LT for that.
The next step is to obtain the base words (lemmas) for text classification; for this I'm using the external morfologik library. At this stage morfologik does not recognize some basic words, e.g. “nie”, but LT has no problem processing them (it either corrects them or simply skips them).
At this point I'm just curious whether LT extracts base forms for its own correction purposes, so I could use them instead of my morfologik step :)
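LT reports problems rather than rewriting the text, so the correction step in this pipeline means splicing the suggested replacements back into the sentence yourself. The following is a minimal, stdlib-only sketch of that splice. It assumes each LT match has already been reduced to a hypothetical `Fix` record holding a start offset, an end offset and the chosen replacement. In LT those values would come from `RuleMatch.getFromPos()`, `RuleMatch.getToPos()` and, for example, the first entry of `RuleMatch.getSuggestedReplacements()`. Always taking the first suggestion is an assumption of this sketch, not something the thread prescribes.

```java
import java.util.ArrayList;
import java.util.List;

public class ApplyFixes {

  /** Hypothetical stand-in for one LT match: replace text[start, end) with replacement. */
  public record Fix(int start, int end, String replacement) {}

  /** Applies non-overlapping fixes right to left so the offsets of the remaining fixes stay valid. */
  public static String apply(String text, List<Fix> fixes) {
    List<Fix> sorted = new ArrayList<>(fixes);
    sorted.sort((a, b) -> Integer.compare(b.start(), a.start())); // rightmost fix first
    StringBuilder sb = new StringBuilder(text);
    int limit = text.length(); // start of the leftmost fix applied so far
    for (Fix f : sorted) {
      if (f.end() > limit) {
        continue; // overlaps a fix that was already applied: skip it
      }
      sb.replace(f.start(), f.end(), f.replacement());
      limit = f.start();
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // "kóta" occupies offsets 7..11 in the example sentence from the thread
    String fixed = apply("Ala ma kóta", List.of(new Fix(7, 11, "kota")));
    System.out.println(fixed); // Ala ma kota
  }
}
```

Working from right to left means each replacement only shifts text that has already been handled, and overlapping fixes are dropped rather than guessed at.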

Hope this helps :)

You can use the Text Analysis page on the LanguageTool website to see LT's internal analysis. LT also uses morfologik to find the base forms of words. However, the fact that LT can process a word doesn't always mean it knows the base form, since an error detection rule may work without it.

OK, many thanks for the help :)

I found that there is a function, “getAnalyzedSentence”, that returns basic information about the analyzed string. What I'm going to do is “analyze” the corrected sentence to obtain the lemmas and then process them further. What do you think of this idea?
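A sketch of that approach, assuming LanguageTool's Polish module (for example the `language-pl` Maven artifact) is on the classpath. It splits the text into sentences with `sentenceTokenize()`, analyzes each one with `getAnalyzedSentence()`, and collects the lemma of every reading. Where LT has no lemma for a token, which can happen as noted earlier in the thread, it falls back to the lowercased word. That fallback, keeping every distinct lemma of an ambiguous token, and the `LemmaExtractor` name are choices made for this example, not LT behavior.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

import org.languagetool.AnalyzedSentence;
import org.languagetool.AnalyzedToken;
import org.languagetool.AnalyzedTokenReadings;
import org.languagetool.JLanguageTool;
import org.languagetool.Languages;

// Illustrative helper, not part of LanguageTool itself.
public class LemmaExtractor {

  /** Lemmas of all words in text; falls back to the lowercased word when LT knows no lemma. */
  public static List<String> lemmas(JLanguageTool lt, String text) throws IOException {
    List<String> result = new ArrayList<>();
    for (String sentenceText : lt.sentenceTokenize(text)) {
      AnalyzedSentence sentence = lt.getAnalyzedSentence(sentenceText);
      for (AnalyzedTokenReadings readings : sentence.getTokensWithoutWhitespace()) {
        if (readings.isSentenceStart()) {
          continue; // LT prepends an artificial sentence-start token
        }
        // Ambiguous forms can have several readings; keep each distinct lemma once.
        Set<String> tokenLemmas = new LinkedHashSet<>();
        for (AnalyzedToken reading : readings.getReadings()) {
          if (reading.getLemma() != null) {
            tokenLemmas.add(reading.getLemma());
          }
        }
        if (tokenLemmas.isEmpty()) {
          // Assumption for this sketch: use the word itself when no base form is known.
          tokenLemmas.add(readings.getToken().toLowerCase());
        }
        result.addAll(tokenLemmas);
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    JLanguageTool lt = new JLanguageTool(Languages.getLanguageForShortCode("pl"));
    System.out.println(lemmas(lt, "Ala nie ma kota."));
  }
}
```

Passing it the corrected sentence yields a list of base forms to feed the classifier. The exact lemmas depend on the LT version and its Polish dictionary, so no output is shown here.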

That's a valid approach. You could also use getTagger() of your language class (e.g. “new English().getTagger()”) and then call tag(), but the result should be the same.
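For comparison, here is a sketch of the tagger route under the same assumption (Polish module on the classpath). It tokenizes with the language's word tokenizer, drops the whitespace tokens, and passes the remaining words to `tag()`. `TaggerLemmas` and `tagWords` are illustrative names, not part of LT.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.languagetool.AnalyzedToken;
import org.languagetool.AnalyzedTokenReadings;
import org.languagetool.Language;
import org.languagetool.Languages;

// Illustrative helper, not part of LanguageTool itself.
public class TaggerLemmas {

  /** Tags the words of text directly with the language's tagger. */
  public static List<AnalyzedTokenReadings> tagWords(Language lang, String text) throws IOException {
    List<String> words = new ArrayList<>();
    for (String token : lang.getWordTokenizer().tokenize(text)) {
      if (!token.trim().isEmpty()) {
        words.add(token); // the word tokenizer also returns whitespace tokens; drop them
      }
    }
    return lang.getTagger().tag(words);
  }

  public static void main(String[] args) throws IOException {
    Language polish = Languages.getLanguageForShortCode("pl");
    for (AnalyzedTokenReadings readings : tagWords(polish, "Ala nie ma kota")) {
      List<String> lemmas = new ArrayList<>();
      for (AnalyzedToken reading : readings.getReadings()) {
        lemmas.add(reading.getLemma()); // null when the dictionary has no base form
      }
      System.out.println(readings.getToken() + " -> " + lemmas);
    }
  }
}
```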

Thanks :)