LanguageTool 2022+


2022 has begun, and I expect it to be a year of change for open source.

I will dedicate all my knowledge to open source, so that in 2023 it will be the primary choice for users.

In 2022, I will be revising all Portuguese grammar rules and adding new rules to LanguageTool.

If you have suggestions for rules, or know of missing or incorrect dictionary words, don’t hesitate to let me know.

I have developed an advanced linguistic tool, Proofing Tool GUI (PTG), which also has basic support for LanguageTool:

It has a tab named “LanguageTool”, which is still experimental but already useful, for example for getting to know the structure of grammar.xml files.

Also, in the last two menus in PTG, you can sort and remove duplicates from wordlists/text files.

For example, if you have a section of proper names in spelling.txt and added.txt, you can open the file in a text editor, cut those words and paste them into the sort gadget of PTG. Then cut the sorted output, paste it into the “delete duplicates” gadget of PTG, and finally paste the result back into the original added/spelling text file.

It is as simple as this.
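
For anyone who prefers a script, the same sort-and-deduplicate steps can be sketched in a few lines of Python. This is only a rough illustration, not how PTG does it: the function name `sort_and_dedupe` is made up for this example, it treats every non-empty line as one entry, and it sorts case-insensitively without any Portuguese-aware collation of accented letters.

```python
def sort_and_dedupe(lines):
    """Return the unique, non-empty entries of `lines`, sorted case-insensitively.

    Hypothetical helper: each line is treated as one wordlist entry.
    """
    seen = set()
    unique = []
    for line in lines:
        word = line.strip()  # drop surrounding whitespace and newlines
        if word and word not in seen:
            seen.add(word)
            unique.append(word)
    # Sort by the case-folded form first, then by the original spelling,
    # so "Porto" and "porto" end up next to each other in a stable order.
    return sorted(unique, key=lambda w: (w.casefold(), w))


if __name__ == "__main__":
    names = ["Lisboa", "Porto", "Lisboa", "", "Ana", "porto"]
    print(sort_and_dedupe(names))  # ['Ana', 'Lisboa', 'Porto', 'porto']
```

To use it on a real file, read the lines of the proper-names section, pass them to the function, and write the result back in place of the old section.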

“The future is now!”

Kind regards,
Marco A.G.Pinto

Hi @marcoagpinto, great! I am certainly willing to contribute and help however I can. I have some ideas for rules, which we can discuss soon. For now, January is my vacation month, but after that I’ll be ready to work.

I, too, have ideas for tons of rules, and I have also started revising the current ones.

See the document:
Rules Structure 20220101.odt

I am repositioning the rules, revising, improving and cleaning them, improving the message suggestions, etc.

As each rule is ready, I highlight it in a pinkish or orange colour.

I will need some help figuring out the proper place for tons of them (see the first page, which shows the structure of the grammar.xml).


Kind regards,

I hope you had nice holidays.

I have spent the last week working on a rule.

Today I need to release the British speller and won’t dedicate time to LanguageTool.

Attached is the current status of the grammar.xml as an .odt (OpenDocument format) file for Writer.
Rules Structure 20220202.odt

Hi @marcoagpinto holidays were great!
Now, back to work, there is really a lot to do. Where to start? I am interested in the nos/nós and esta/está distinctions, but if there is something more specific or urgent to do, tell me and I can start checking it.

What is wrong with “esta/está”?

I just need to adjust its first rule because it stopped recognising one of the sentences I had as an example.

Then I found out that I had another “esta/está” rule, so I need to check if the one released today detects the same as the older one.

The “nos/nós” gives tons of false positives.

I also wanted to improve one or two of the rules we worked on together months ago. I wrote some notes on how to fix false positives or improve detection; I just need to find what I wrote and the rules.

Also, sometimes I am bad at naming rules, so I labelled them: “Verbo + Adjetivo + Substantivo → Verbo blah blah blah”, now I want to add at the start of the label: “Simplificar: blah blah blah” :slight_smile:

So much to do… :slight_smile:

I need to rewrite most of the older rules to make them more accurate and flexible, since some of them were too narrow in detecting mistakes.

That is why I was waiting for 2022, so that I can mark the rules as “1-JAN-2022+” versions, in which I have applied all my past knowledge.


Ahhhh… here are the results of the “esta/está” rule released today:

I have uploaded the .txt file in a comment.

There are some false positives because the sentences contain typos, such as missing accents, or spelling errors, which cause issues.

Ahhhh… Daniel Naber changed the forum settings so that .txt files can be attached.

Here are the results of this rule:
afternew76.txt

I found the results pretty impressive! Some errors come from LT itself: in ‘esta postagem’, ‘postagem’ isn’t receiving any tag; in ‘esta sim’, ‘sim’ is also tagged as a noun. I haven’t gone through the entire file yet, but I can do that and send you other cases like these.


I have just added postagem + postagens:

Regarding “sim”, Priberam says it is also a noun:

OK! As for ‘sim’, yes, it can be a noun, like many other words, and the LT tagger shows all the tags and treats them equally. If only there were a way to have a preferred tag… ‘Sim’ is an adverb 99.9% of the time and only rarely a noun. This would solve a lot of problems, including ones with lemmas, where the same situation arises: some words map to two or more lemmas when only one of them is commonly used.
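
To make the “preferred tag” idea concrete, here is a toy sketch in Python. It is not LanguageTool code and not how its tagger or disambiguator works: the table `TAG_PRIORS`, the function `preferred_tag` and the tag names `ADV`/`NOUN` are all invented for illustration, and the numbers simply restate the informal 99.9% estimate.

```python
# Hypothetical per-word tag preferences; the values are illustrative only
# and do not come from any real corpus or from LanguageTool's data.
TAG_PRIORS = {
    "sim": {"ADV": 0.999, "NOUN": 0.001},
}


def preferred_tag(word, candidate_tags, priors=TAG_PRIORS):
    """Pick the candidate tag with the highest preference for `word`.

    `candidate_tags` must be a non-empty list. Words without an entry in
    `priors` fall back to the first candidate, i.e. the current behaviour
    of treating all tags equally is left untouched for them.
    """
    word_priors = priors.get(word.lower(), {})
    return max(candidate_tags, key=lambda tag: word_priors.get(tag, 0.0))


if __name__ == "__main__":
    print(preferred_tag("sim", ["NOUN", "ADV"]))    # ADV
    print(preferred_tag("casa", ["NOUN", "VERB"]))  # NOUN (no preference known)
```

The same lookup could be applied to lemmas instead of tags; a rule could then match the preferred reading first and consider the rare one only when the context requires it.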

The disambiguator’s task is to state which word is which.

Only @jaumeortola knows how to code the disambiguator; I don’t have a clue.

I only changed it once and broke the build.

