
Content of multiwords.txt file


#1

What should the content of the file “multiwords.txt” be?

If a language has grammatical cases (Serbian has seven), must I list each multiword in all seven cases so that the disambiguator can handle them properly? Or is there a different approach?

Thanks for help


#2

Is anyone willing to help?


(jaumeortola) #3

If you want to use multiwords.txt, the answer is yes: you must write seven lines, one for each case.

Alternatively, you can write a rule in disambiguation.xml. Depending on what you want, one or two rules could be enough.

Give us an example, and we can try to write those rules here.


#4

Thank you @jaumeortola for replying. Here is an example:

In Serbian, the word „црвена“ is an adjective meaning “red”, and the word „звезда“ is a common noun meaning “star”. Together, however, they form the proper noun „Црвена Звезда“ (“Red Star”), a football club in Belgrade.

I want „Црвена Звезда“ to be tagged as a proper noun in all seven cases of Serbian. Hence my question: must I write something like this in multiwords.txt:

Црвена Звезда personal_noun_nominative_tag
Црвене Звезде personal_noun_genitive_tag

Црвеној Звезди personal_noun_locative_tag
?
Thanks again for help.


(jaumeortola) #5

Yes, this seems to be the best solution. Using disambiguation.xml wouldn’t be a good solution.
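
If typing the lines by hand gets tedious, they could be generated. The following is only a minimal sketch and not part of LanguageTool: the class `MultiwordLines` and its `toLines` method are hypothetical names, the tags are the placeholders from the question above, and it assumes that multiwords.txt expects one tab-separated form and tag per line. Only the three forms quoted in this thread are listed; the remaining cases would be added the same way.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class MultiwordLines {

    // Builds one "form<TAB>tag" line per inflected form, in insertion order.
    // Assumption: multiwords.txt uses a tab between the expression and its tag.
    static String toLines(Map<String, String> formToTag) {
        return formToTag.entrySet().stream()
                .map(e -> e.getKey() + "\t" + e.getValue())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Only the forms quoted in the thread; the tag names are placeholders.
        Map<String, String> forms = new LinkedHashMap<>();
        forms.put("Црвена Звезда", "personal_noun_nominative_tag");
        forms.put("Црвене Звезде", "personal_noun_genitive_tag");
        forms.put("Црвеној Звезди", "personal_noun_locative_tag");
        System.out.println(toLines(forms));
    }
}
```

The output can be pasted into multiwords.txt once the placeholder tags are replaced with the real tags of the Serbian tagset.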

If you need to do this frequently with a lot of expressions, I would consider writing a specialized tagger method in Java.
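
As a rough illustration of that idea, here is a sketch of what such a method might look like. It is an assumption, not LanguageTool's actual tagger API: `ParadigmMultiwordTagger`, `CaseEndings`, `addExpression` and `tag` are all hypothetical names. It expands a pair of stems through a reusable table of case endings and then tags adjacent tokens by lookup. The endings are read off the three forms quoted in this thread, so the table is deliberately incomplete, and the tags are still placeholders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParadigmMultiwordTagger {

    // Hypothetical paradigm row: adjective ending, noun ending, placeholder tag.
    static final class CaseEndings {
        final String adjEnding;
        final String nounEnding;
        final String tag;

        CaseEndings(String adjEnding, String nounEnding, String tag) {
            this.adjEnding = adjEnding;
            this.nounEnding = nounEnding;
            this.tag = tag;
        }
    }

    // Inflected two-word form -> tag, filled by addExpression.
    private final Map<String, String> formToTag = new HashMap<>();

    // Expands one adjective + noun expression into every case in the paradigm.
    void addExpression(String adjStem, String nounStem, List<CaseEndings> paradigm) {
        for (CaseEndings c : paradigm) {
            String form = adjStem + c.adjEnding + " " + nounStem + c.nounEnding;
            formToTag.put(form, c.tag);
        }
    }

    // Returns one tag per token; null marks tokens outside any known multiword.
    List<String> tag(List<String> tokens) {
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            tags.add(null);
        }
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String found = formToTag.get(tokens.get(i) + " " + tokens.get(i + 1));
            if (found != null) {
                tags.set(i, found);
                tags.set(i + 1, found);
                i++; // do not start a new match on the second word
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // Endings taken from the forms quoted in the thread; incomplete on purpose.
        List<CaseEndings> paradigm = List.of(
                new CaseEndings("а", "а", "personal_noun_nominative_tag"),
                new CaseEndings("е", "е", "personal_noun_genitive_tag"),
                new CaseEndings("ој", "и", "personal_noun_locative_tag"));

        ParadigmMultiwordTagger tagger = new ParadigmMultiwordTagger();
        tagger.addExpression("Црвен", "Звезд", paradigm);
        System.out.println(tagger.tag(List.of("Црвене", "Звезде")));
    }
}
```

The point of the design is that the endings table is written once and reused for every expression that inflects the same way, so a new expression costs one `addExpression` call instead of seven lines. Expressions with irregular forms would still need explicit entries.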