2022 has begun, and I expect it to be a year of change for open source.
I will dedicate all my knowledge to open source, so that in 2023 it will be the primary choice for users.
In 2022, I will be revising all Portuguese grammar rules and adding new rules to LanguageTool.
If you have suggestions for rules, or know of missing/incorrect dictionary words, don’t hesitate to tell me.
I have developed an advanced linguistic tool, Proofing Tool GUI (PTG), which also has basic support for LanguageTool: https://proofingtoolgui.org
It has a tab named “LanguageTool”, which is still experimental but already useful, for example for learning the structure of grammar.xml files.
Also, in the last two menus in PTG, you can sort and remove duplicates from wordlists/text files.
For example, if spelling.txt and added.txt each have a section with proper names, you can open the file in a text editor, cut the words from it and paste them into the sort gadget of PTG. Then cut the sorted words from that gadget and paste them into the “delete duplicates” part of PTG. Finally, paste the result back into the original added/spelling text file.
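For anyone who prefers a script, here is a minimal Python sketch of the same sort-then-deduplicate step. It is not how PTG does it internally; the function name `sort_and_dedupe` is made up for illustration, and it uses Python's default code-point ordering, which may differ from the order PTG produces.

```python
def sort_and_dedupe(words):
    """Return the unique, non-empty entries in sorted order.

    Assumption: entries are compared case-sensitively after trimming
    surrounding whitespace, and sorted by Unicode code point.
    """
    unique = {w.strip() for w in words if w.strip()}
    return sorted(unique)


# Example: a block of proper names cut from a word list.
names = ["Maria", "João", "Ana", "Maria", "", "Ana "]
print(sort_and_dedupe(names))
```

The returned list can then be pasted back into the original added/spelling text file, one word per line.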
Hi @marcoagpinto, great! I am certainly willing to contribute and help however I can. I have some ideas for rules that we can discuss soon. For now, January is my vacation month, but after that I’ll be ready to work.
Hi @marcoagpinto, the holidays were great!
Now, back to work. There is really a lot to do. Where to start? I am interested in the nos/nós distinction and in esta/está, but if there is something more specific or urgent to do, tell me and I can start checking it.
I just need to adjust its first rule because it stopped recognising one of the sentences I had as an example.
Then I found out that I had another “esta/está” rule, so I need to check if the one released today detects the same as the older one.
The “nos/nós” rule gives tons of false positives.
I also wanted to improve one or two of the rules we worked on together months ago. I wrote some notes on how to fix false positives or improve detection; I just need to find those notes and the rules.
Also, I am sometimes bad at naming rules. I labelled them like “Verbo + Adjetivo + Substantivo → Verbo blah blah blah”, and now I want to add a prefix at the start of the label: “Simplificar: blah blah blah”.
So much to do…
I need to rewrite most of the older rules to make them more accurate and flexible, since some of them were too narrow in detecting mistakes.
That is why I was waiting for 2022, so that I can add to the rules that they are “1-JAN-2022+” versions where I have applied all my past knowledge.
I found the results pretty impressive! Some errors are due to LT itself: in ‘esta postagem’, ‘postagem’ isn’t receiving any tag; in ‘esta sim’, ‘sim’ is also tagged as a noun. I haven’t gone through the entire file yet, but I can do that and send you other cases like these.
Ok! As for ‘sim’, yes, it can be a noun, like many other words, and the LT tagger shows all tags and considers them equally. If only there were a way to mark a preferred tag… ‘sim’ is an adverb 99.9% of the time and only rarely a noun. This would solve a lot of problems, including ones with lemmas, which have the same kind of situation: some words map to two or more lemmas when only one of them is commonly used.
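To make the “preferred tag” idea concrete, here is a rough Python sketch. It is only an illustration of the idea, not LT's API: the function `preferred_reading`, the tag names `ADVERB` and `NOUN`, and the weights are all hypothetical placeholders (the 0.999 simply mirrors the 99.9% estimate above).

```python
def preferred_reading(readings):
    """Pick the (lemma, tag, weight) reading with the highest weight.

    Assumption: each reading carries a relative-frequency weight,
    which the tagger discussed above does not provide.
    """
    return max(readings, key=lambda reading: reading[2])


# Hypothetical readings for "sim": (lemma, tag, relative frequency).
sim_readings = [
    ("sim", "ADVERB", 0.999),
    ("sim", "NOUN", 0.001),
]

lemma, tag, _ = preferred_reading(sim_readings)
print(lemma, tag)
```

The same selection would apply to lemmas: when a word maps to several lemmas, a weight per lemma would let a rule prefer the commonly used one while still keeping the rarer readings available.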