False positives in hyphenation have been a complaint since:
Unless there is a way to selectively disable a list of terms in post-reform-compounds.txt (for pt-BR), you need to revert all changes to it since 82083f9.
While checking the new code in grammar.xml, I get an exception in one of your rules:
Running pattern rule tests for Portuguese… The Portuguese rule: HIFENIZADOR_VERBOS_1[1] (exception in token [1]), token [1], contains “como|para|casa” that is not marked as regular expression but probably is one.
I did the test after implementing Yakov’s “há n tempo atrás” rule improvement.
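For anyone who wants to catch this kind of warning before running the full test suite, here is a minimal sketch of a similar check. It is not LanguageTool's own test code: the function name, the list of "suspicious" characters and the sample XML are my assumptions, and the sample ignores rule groups. It flags any token or exception whose text looks like a regular expression but does not carry regexp="yes".

```python
import re
import xml.etree.ElementTree as ET

# Characters that usually signal a regular expression.
# This list is an assumption, not the one LanguageTool uses.
REGEX_HINT = re.compile(r"[|\[\]()?*+]")


def find_unmarked_regex(xml_text):
    """Return (rule_id, tag, text) for tokens/exceptions that look like
    regular expressions but are not marked with regexp="yes"."""
    root = ET.fromstring(xml_text)
    hits = []
    for rule in root.iter("rule"):
        for elem in rule.iter():
            if elem.tag not in ("token", "exception"):
                continue
            text = (elem.text or "").strip()
            if text and REGEX_HINT.search(text) and elem.get("regexp") != "yes":
                hits.append((rule.get("id"), elem.tag, text))
    return hits


# Hypothetical, simplified rule that reproduces the situation in the warning.
SAMPLE = """
<rules>
  <rule id="HIFENIZADOR_VERBOS_1">
    <pattern>
      <token>faz<exception>como|para|casa</exception></token>
      <token regexp="yes">o|a</token>
    </pattern>
  </rule>
</rules>
"""

for rule_id, tag, text in find_unmarked_regex(SAMPLE):
    print(f"{rule_id}: <{tag}> contains '{text}' without regexp=\"yes\"")
```

In the rule itself, the fix would presumably be to add regexp="yes" to the exception that holds “como|para|casa”.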
@Daniel
If, in the past, simple unfitness could be an acceptable excuse, it is now quite obvious to anyone following this that Marco is intentionally hindering the project.
Sorry to keep pulling you into this, but I believe there is a need for arbitration here.
Removing all the two-term verbal-form compounds isn’t the answer.
We shall see with today’s diff if the purge I did solved the problem.
And the Brazilian guys usually have the second part of the verb before the verb, so I believe my fix will work.
@dnaber @tiagosantos
I am not hindering the project; I just want it to work as well and as accurately as possible. There is no point in having tons of rules if they produce tons of false positives.
This is not about what I want.
I have nothing against mass additions to the files. Had you done such work in the past, I might not be here in the first place.
But your solution is not “fixable” because it excludes users of Brazilian Portuguese.
I am from Portugal and speak Portuguese. Brazilian users also have CoGrOO and Lightproof, both solutions that I had considered porting from scratch to European Portuguese.
Despite all that, common language assets must be developed to accommodate both language variants, and the Brazilian Portuguese user base is 20 times larger than the one from Portugal.
You might have noticed that I created a directory for pt-PT. It is not active yet, but it will be once I figure out how. After that, solutions like yours can become more palatable, if done right.
Please let me avoid this type of Monty Pythonesque situation in the future.
That commit is from the 5th of November.
If we consider the detections introduced by the HÁ-ATRÁS rule, there were 43 hits in the regression test of 5th November. But they don’t change the results, since they only change suggestions and whitespace in the pattern.
So… 113 matches whose origin you do not even know, and you still think that adding 120k random words is a good solution? It seems to me that it is not reviewable even by you, since you cannot find the errors it introduces, and I believe it is not up to me to find them, is it?
I told you before, and I reaffirm:
Put your name in the file, if you want to, but revert your changes.
They do not add anything that is not already done, in an easier-to-maintain way, by a rule.
Moreover, if you find any way to compress my rules into fewer rules, please tell me and I will do it too.
@tiagosantos
Tiago, I am very stressed and can’t think properly right now.
Could you please revert my changes in the 120k compounds then?
In January I will try to add compounds to it, but only words and not verbs.
I will have to analyse the words in the pt_PT speller to see which ones are not verbs and it will take a long time.
Right now I need to dedicate more time to the PhD project since next week I will be a few days in the North with the PhD coordinator and won’t have the chance to do much.
Tiago, I noticed the other day that one of the redundancy rules you added, “hemorragia de sangue”, had already been added by me months ago, which means there are two rules for the same thing.
Sure, Marco. I will just replace the word list and leave the header as you saw fit.
I am certain you will add many more words in due time. The verbs are already covered.
This is an extra, so focus on the PhD. No need to do much. Just consistent improvement over time. We will get there.
No worries. I will review coverage and comment out the redundant one.