Hi. LanguageTool is great! Thanks to everyone who works on it.
I’m wondering if it’s possible for LanguageTool to tell me all the inflected forms of a word? Alternatively, can it tell me the root of a word, or whether a word is an inflected form of some base word?
For example, in Spanish, can LanguageTool tell me all the conjugated forms of the verb “ir”? Or that “voy” is an inflected form of the verb “ir”?
The same question goes for English: can I get the inflected forms of the verb “to look”? Or find out that “foxes” is an inflected form of the noun “fox”?
LT does both internally, at least for a lot of languages, including Spanish and English. But there’s no way to easily access that information in the user interface. If you need the data, you can export it to text files as described on the LanguageTool Wiki page “Developing a tagger dictionary”.
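Once the data is exported to text, listing every inflected form of a base word comes down to grouping lines by lemma. Here is a minimal sketch, assuming the export has one entry per line with three tab-separated columns (inflected form, lemma, POS tag). The class name LemmaIndex and the sample entries are made up for illustration, so check the column order against your actual export:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LemmaIndex {
    // Groups "form<TAB>lemma<TAB>tag" lines by lemma (assumed column order).
    static Map<String, List<String>> byLemma(List<String> lines) {
        Map<String, List<String>> index = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.split("\t");
            if (cols.length < 3) {
                continue; // skip blank or malformed lines
            }
            index.computeIfAbsent(cols[1], k -> new ArrayList<>())
                 .add(cols[0] + "/" + cols[2]);
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up sample entries; a real export has many thousands of lines.
        List<String> sample = List.of(
            "voy\tir\tVMIP1S0",
            "vas\tir\tVMIP2S0",
            "foxes\tfox\tNNS",
            "fox\tfox\tNN");
        byLemma(sample).forEach((lemma, forms) -> System.out.println(lemma + " -> " + forms));
    }
}
```

Looking up a lemma in the resulting map gives its inflected forms together with their tags. The reverse question (which lemma does “voy” belong to) could be answered by indexing on the first column instead.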
If you use the command line with the -v (verbose) option,
LanguageTool tells you the POS tags and lemma of each token,
as well as which disambiguator rules kick in. Example:
$ echo "The foxes" | java -jar languagetool-standalone/target/LanguageTool-2.7-SNAPSHOT/LanguageTool-2.7-SNAPSHOT/languagetool-commandline.jar -c utf-8 -l en-US -v
Expected text language: English (US)
Working on STDIN...
1108 rules activated for language English (US)
<S> The[the/DT,B-NP-plural] foxes[fox/NNS,fox/VBZ,</S>,E-NP-plural]<P/>
Disambiguator log:
Time: 2338ms for 1 sentences (0.4 sentences/sec)
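If you want to post-process that verbose output, the token line can be picked apart with a regular expression. This is only a sketch based on the sample line above: it assumes each token is printed as surface[reading,reading,...] and that lemma/POS readings are the items containing a slash. The format is not a documented interface, the class name VerboseLineParser is made up, and tokens that themselves contain brackets or commas are not handled:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VerboseLineParser {
    // Surface form, then a bracketed comma-separated list of readings.
    private static final Pattern TOKEN = Pattern.compile("([^\\s\\[\\]]+)\\[([^\\]]*)\\]");

    // Returns one [surface, lemma, POS tag] triple per lemma/POS reading.
    static List<List<String>> parse(String line) {
        List<List<String>> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            for (String item : m.group(2).split(",")) {
                int slash = item.indexOf('/');
                // Skip chunk tags (no slash) and markers such as </S>.
                if (slash <= 0 || item.startsWith("<")) {
                    continue;
                }
                result.add(List.of(m.group(1), item.substring(0, slash), item.substring(slash + 1)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "<S> The[the/DT,B-NP-plural] foxes[fox/NNS,fox/VBZ,</S>,E-NP-plural]<P/>";
        for (List<String> r : parse(line)) {
            System.out.println(r.get(0) + "\t" + r.get(1) + "\t" + r.get(2));
        }
    }
}
```

For the sample line this yields three readings: “The” as the/DT, and “foxes” as both fox/NNS and fox/VBZ.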
Dominique, sorry for the late reply. What you wrote was helpful, but I really wanted to do it through Java code, and it seems I can with something like this…
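JLanguageTool testTool = new JLanguageTool(language);
try
{
    AnalyzedSentence sentence = testTool.getAnalyzedSentence("The dog went running through the park.");
    AnalyzedTokenReadings[] tokens = sentence.getTokensWithoutWhitespace();
    for (AnalyzedTokenReadings token : tokens)
    {
        List<AnalyzedToken> aTokenList = token.getReadings();
        for (AnalyzedToken reading : aTokenList)
            System.out.println(reading.getPOSTag() + " : " + reading.getLemma()); // assumed completion: the post is cut off here
    }
}
catch (IOException e)
{
    e.printStackTrace();
}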
SENT_START :
DT : the
NN : dog
VBD : go
JJ : running
NN:U : running
VBG : run
IN : through
JJ : through
RP : through
DT : the
NN : park
. : .
SENT_END : .
This works for me somewhat, but what I really want is to get all inflected forms for some token.
Thanks! I can certainly use Synthesizers for some things. It would be nice if I could get the POS tag for each of the inflected forms returned by synthesize(). Is there any way to do that? Also, why do some languages (e.g. Portuguese) not have a Synthesizer available?
There’s no direct way, but instead of getting all tags at once with “.*” you can call synthesize() once per tag, so you know which tag produced each form. All known tags for English are in this file: ./languagetool-language-modules/en/target/classes/org/languagetool/resource/en/english_tags.txt (similar for other languages).
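The tag-by-tag idea can be sketched as a loop that queries one tag at a time and records which forms came back for it. To keep the sketch runnable on its own, the synthesizer is replaced by a hypothetical stand-in function backed by a small made-up table. In a real program that function would delegate to the language's synthesizer, and the tag list would be read from english_tags.txt. The class name FormsByTag is also made up:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class FormsByTag {
    // Queries the synthesizer once per tag and keeps only tags that yield forms.
    static Map<String, List<String>> formsByTag(String lemma, List<String> tags,
            BiFunction<String, String, List<String>> synthesize) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String tag : tags) {
            List<String> forms = synthesize.apply(lemma, tag);
            if (!forms.isEmpty()) {
                result.put(tag, forms);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in data, not LanguageTool output.
        Map<String, List<String>> fake = Map.of(
            "look/VBZ", List.of("looks"),
            "look/VBD", List.of("looked"),
            "look/VBG", List.of("looking"));
        List<String> tags = List.of("NNS", "VBD", "VBG", "VBZ");
        Map<String, List<String>> forms = formsByTag("look", tags,
            (lemma, tag) -> fake.getOrDefault(lemma + "/" + tag, List.of()));
        forms.forEach((tag, f) -> System.out.println(tag + " : " + f));
    }
}
```

The cost is one lookup per tag instead of a single “.*” query, but the result tells you which tag each inflected form belongs to.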
Some languages have no synthesizer because their maintainers (if there is a maintainer) haven’t added one yet.