Released. I packaged it as an integrated extension with the hyphenator, the thesaurus and both dictionary versions. Try it here:
It changes roughly 30k dictionary entries, so some may need fixing. Extensive review is required before integration in LO, but I believe that within a week we can integrate it in LanguageTool for wide usage testing.
It would have to be set as a pt-AO locale to avoid reporting conflicts. The pre-AO version is also more conservative: it is limited to improvements in gender and number variations.
Good enough?
Any news on this, or on LibreOffice add-on testing?
NOTE: All feedback is welcome. This thread is not an “internal discussion” so all users testing the new dictionary version are welcome to provide constructive criticism.
Sorry… I spent the day coding to fix some issues in the PhD project (software).
Moments ago I installed the OXT in LO 5.3 and here are most of the words that appear as typos in the thesis (most probably already suggested by me to Minho University) (notice that I used M$ Word 2016 pre-AO to write it):
Jorge Canelhas
co-orientado
Teresa Baptista
Isabel Pita
SeaMonkey
Edma
Ortins de Bettencourt
Assembly Z80
shareware
freeware
Commodore
PCs
hobby
para Windows, Linux e Mac
beta-tester
desenvolvedor + desenvolvedores
Hunspell
SeaMonkey
far-se-á
disruptir
hacking
autoproclamada
insurgências
sobrelotados
botnets
zombies
multiagente
polímata
XYZ
subnacionais
Osama Bin Laden
destabilizando
Sadam Hussein
franchisings
proibitivos
hackers
cracks
crackers
Khadafi
sobrelotadas
raides aéreos
Análise Bayesiana
grupal
interorganizacional
know-how
sememas
interrelacionadas
difundível
ataques DoS/DDoS
interconectividade
interconectados
SPAM (check if it can be written in lowercase)
token + tokens
hash
interrelacionam
experienciámos
cluster
inspiracional
feedback
robots + robot
pseudo-inteligência
co-fundador
superinteligência
pseudocódigo
GUI (graphical interface)
pixéis (not sure whether it takes an accent)
semiautónomos
impassáveis
aleatoriedade
circunjacente
impassável
passável
baseamo-nos
ID
gadget + gadgets
recomputado
DirectX
Pentium (processor)
SSD (disk)
CPU
rastreamento
background
screenshot
lossless
co-senos
subopções
reset
AutoCAD
pixelização
Excel
cache
browser + browsers
Bin Laden
Here are most of the words that appear as typos… again, notice that some may be pre-AO.
As you can see, even though I didn’t want to, I had to use M$ Office because the pt_PT speller doesn’t recognise tons of words.
@tiagosantos
I also have tons of false-positive grammar suggestions reported by LanguageTool, which I will list when I have some more free time.
Thanks for dedicating all this effort and time to the projects.
Well, those are words that are not included, not actual dictionary errors. The review has to focus on false negatives, i.e. incorrect word derivations that the dictionary accepts.
Using your examples: if ‘rastreamenta’ (feminine of rastreamento) were considered correct, that would be a dictionary error.
Moreover, most words are foreign words or proper names (foreign or rare). See our replacement tables or the VOP for more information. If you really require them, you can add them to spelling.txt here on LT. Most are barbarisms that you should avoid, and that have no place in a proofreader. See:
Sure. Just try to show the “regular one”. We have daily regression testing, so we can avoid odd test cases like the ones in:
agreement issues with proper names, and false negatives in brands.
Dedicate your time to finishing your PhD. I already have people looking into it. Thanks.
I cannot reproduce this issue. Try ticking the category first and see if it works. I believe this will be enough.
However, if that doesn’t solve it, this has to be a bug in the Java implementation, not something specific to Portuguese.
In this case, the best way is to file a general bug report on GitHub. It might be useful later.
Could you create a rule for “demais” and “de mais”?
I will give you an example of the rings translation document:
“Os anéis ajustáveis são tamanho único. Se, após ajustares a dentição ao limite, os anéis continuarem pequenos demais, remove então cuidadosamente a barra junto do lado liso com uma lâmina afiada e pule a junta com uma lima para as unhas (vê o diagrama).”
I tried using “de mais” and “demais” and I get no suggestion.
I tried Priberam’s FLIP online and it doesn’t make any suggestion either.
PS: You were right regarding the capitalization rules. I have been able to remove the hits on month names and such.
Many thanks for this rule, Marco.
The logic is almost correct. The problem is that inflected='yes' only applies to the word written inside the token element; it does not apply to the element's attributes, so it has no effect on postag.
This: <token inflected='yes' postag="VMN0000"></token> should become <token postag_regexp='yes' postag="V.+"/>.
I believe this will do it. To improve it further, you can also generalise the noun. For example, in the first subrule, replace <token postag="NCMP000"></token> with <token postag_regexp='yes' postag="N.MP.+"/> and it will also work with diminutives, augmentatives, superlatives and proper nouns.
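To see why the regex versions are broader, here is a small Python sketch that mimics the tag matching outside LanguageTool. It assumes that postag_regexp='yes' matches the pattern against the whole tag; the tag list is hypothetical and only illustrates the shape of the tagset (the trailing D and A stand for diminutive and augmentative forms), so check the real tags in the dictionary before relying on it.

```python
import re

def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if the regex covers the whole POS tag.

    Stand-in for postag_regexp='yes'; whole-tag matching is an assumption.
    """
    return re.fullmatch(pattern, tag) is not None

# Hypothetical tags, for illustration only.
tags = ["NCMP000", "NCMP00D", "NCMP00A", "NPMP000", "NCFP000", "VMN0000", "VMIP3S0"]

# Literal tag: only the plain masculine plural common noun matches.
exact_hits = [t for t in tags if tag_matches("NCMP000", t)]

# Generalised noun: any noun type, masculine plural, any suffix.
noun_hits = [t for t in tags if tag_matches("N.MP.+", t)]

# Generalised verb: every verb tag, not just the infinitive.
verb_hits = [t for t in tags if tag_matches("V.+", t)]

print(exact_hits)
print(noun_hits)
print(verb_hits)
```

The feminine plural tag is left out by "N.MP.+" on purpose, so the generalised pattern keeps the gender and number constraint of the original subrule.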
By the way, I was able to add one case from the ‘de mais’ ‘demais’ rule, but I believe that is a rare confusion case. Check git for details.
Test whether it works as you expected and, if you wish, I will push it to the repo after testing.
That is a great one. There is already a group that addresses non-sexist redundancies, but I believe it doesn’t address that situation. That rule would be a great complement.