Getting the base form of a word from corrected input

(Tomek) #1

I have the following problem: I would like to get the base forms of the words returned by LanguageTool's correction. I'm using morfologik for this, but some words are not present there. Is it possible to get the base forms directly from LanguageTool during the correction process?

(Daniel Naber) #2

I'm not sure I understand your question, could you maybe give an example? If a word isn't in the internal dictionary, LT won't know anything about it...

(Tomek) #3

I have a sentence -> "Ala ma kóta". It may contain mistakes, so I use LT to correct it.
The next step is to obtain the base forms (lemmas) for text classification -> here I'm using the external morfologik library. At this step morfologik doesn't recognize some basic words, e.g. "nie", but LT has no problem processing them (correcting them or just skipping them).
At the moment I'm just curious whether LT extracts base forms for its own correction purposes, so that I can use it instead of my morfologik step :smile:

Hope this helps :smile:

(Daniel Naber) #4

You can use to see the internal analysis of LT. LT also uses morfologik for finding the base forms of words. But the fact that LT can process a word doesn't always mean it knows the base form, because an error detection rule may work without the base form.

(Tomek) #5

Ok, many thanks for the help :smile:

I have found there is a function "getAnalyzedSentence" that returns basic info about the analyzed string. What I'm going to do is "analyze" the corrected sentence to obtain lemmas and then process them further. What do you think of this idea?

(Daniel Naber) #6

That's a valid approach. You could also use getTagger() of your language class (e.g. "new English().getTagger()") and then call tag() but the result should be the same.
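
Since a token can be processed without its base form being known, the lemma step probably needs a fallback. The following is only a minimal sketch of that post-processing, not LanguageTool code: `Reading` and `lemmasOrSurface` are hypothetical stand-ins for whatever the analysis returns (in LT, `AnalyzedToken.getLemma()` can return null), and the sample data, including "nie" having no lemma, is assumed for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LemmaFallback {

    /** Hypothetical stand-in for one analyzed token: surface form plus optional lemma. */
    static final class Reading {
        final String token;
        final String lemma; // null when no base form is known for this token

        Reading(String token, String lemma) {
            this.token = token;
            this.lemma = lemma;
        }
    }

    /** Returns one entry per token: the lemma if known, otherwise the surface form. */
    static List<String> lemmasOrSurface(List<Reading> readings) {
        List<String> result = new ArrayList<>();
        for (Reading r : readings) {
            result.add(r.lemma != null ? r.lemma : r.token);
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative data only; real readings would come from the analysis step.
        List<Reading> analyzed = Arrays.asList(
                new Reading("Ala", "Ala"),
                new Reading("kota", "kot"),
                new Reading("nie", null)); // assumed here to have no base form
        System.out.println(lemmasOrSurface(analyzed));
    }
}
```

Keeping the surface form for unknown words means the classifier still sees every token. Whether that is better than dropping such tokens depends on the classification task.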

(Tomek) #7

Thanks :smile: