[pt] Some false positives in my thesis - 2016-12-16

Tiago, I am not sure when I will be able to do it myself, so feel free to change or improve it as you wish.

🙂

Kind regards,

Please also keep in mind that there’s a feature freeze starting tomorrow, i.e. no major changes should be committed after that, to make sure the release is stable.

@marcoagpinto
Many thanks. I will see what I can do in the meantime. If you see a controversial change, just let me know.

@dnaber
I am aware of that date. I had planned to review the XML parts during that period. That is one of the reasons I have been making massive commits lately.
I plan to standardize how rules are displayed, group rules (so the options are easier to navigate and make more sense), improve some rule names, standardize the usage of the message and marker tags, and improve categorization. If I see ways of improving existing rules by reducing false positives, I believe that would also fit the tasks for this period, so I would do it as well.
I was also planning on updating the morphological dictionary, while keeping the content of the correction files added.txt and removed.txt. Since I will diff the version in use against the new version from FreeLing, it will be possible to evaluate the impact of this change. Is that viable?

This should be enough.

I leave it up to you to decide what’s a bug fix or a low-risk change. Just keep in mind that we don’t need to put everything into this release; the next release is only three months away. And we should really avoid bad bugs, because making a bug fix release like 3.6.1 is quite a lot of work for me, and I’d like to avoid it.

Thank you for the vote of confidence.
I consider all these changes just the “polishing” part. I am not expecting regressions, but during this period I will be more rigorous with the changes I push. The “strategy” I used for this release was:

  • make big changes first;
  • allow testing, complaints or bug reports to arise;
  • adjust accordingly;
  • and finally, tighten everything up and polish loose ends. This is the part I reserved for the ‘feature freeze’ since, literally, no new features will be implemented.
  • I will stop making any type of changes 2 days before the release. This should guard against any odd regression, like tabs instead of spaces, a lost signal that blocks a section, changes in automated tests, etc.

@marcoagpinto

I have finished the FreeLing fork and updated the binaries. I began testing and, so far, all seems good. If you have the opportunity, test the files in FreeLing/LanguageTool/pt on the master branch of TiagoSantos81/FreeLing on GitHub.
You can check the commit history to review the changes and the source file manipulations.

The human-readable word list used by the new POS dictionary and synthesizer is in portuguese.dict.txt. You can use gitk’s DAG view to see the diffs between commits, since I made a base dump of the dictionary currently used in LT for comparison.

Unless you raise any relevant issues with this new file or amendments are needed, I will push this version on Saturday. Note that I have adapted the FreeLing tags so that we do not have to change LT rules. All build and rule tests pass when the dict files from commit af6711d are used in LT.


Good work, @tiagosantos

🙂

I can only test this in practice once there is a nightly build with the changes; then I can open my thesis again and, this time, scroll through all 291 pages.

🙂

OK. Then I will push the changes tomorrow. I already pushed string and rule group improvements today, and I need to confirm whether there are any regressions. This change sets the deadline for the morphological dictionary review to Saturday the 24th.

Off topic: I suggest using the LT browser add-on; it would have spotted this error (them/then) 🙂

It would. The plugin is really awesome! Added to the TODO list. 🙂

@marcoagpinto

The dictionary update and the required changes to LT have been pushed. The next nightly build should include them for you to test. I recommend making a dump of the dictionary so that you can double-check its contents.

@tiagosantos

The following words give a false positive:
“UNIVERSIDADE TRÁS-OS-MONTES E ALTO DOURO”

It suggests changing “trás” to “traz”.

Fixed. In these cases, the best way is to add a restricted antipattern to the affected rule. For one example, see:
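The linked example is not reproduced here. As a rough illustration of the idea only, the toy model below shows how an antipattern suppresses a rule match that falls inside a known-good span such as the place name “Trás-os-Montes”. Real LanguageTool rules are written in XML, not Python; the `RULE`, `ANTIPATTERN` and `check` names are hypothetical, and the rule is deliberately oversimplified (it flags every “trás”, ignoring context).

```python
import re

# Hypothetical toy rule (not LanguageTool code): flag "trás", suggest "traz".
RULE = re.compile(r"\btrás\b", re.IGNORECASE)
# Hypothetical antipattern: the place name "Trás-os-Montes" is correct as-is.
ANTIPATTERN = re.compile(r"\btrás-os-montes\b", re.IGNORECASE)


def check(text: str) -> list[tuple[int, str]]:
    """Return (offset, suggestion) pairs for matches not covered by the antipattern."""
    # Collect the character spans protected by the antipattern.
    protected = [m.span() for m in ANTIPATTERN.finditer(text)]
    hits = []
    for m in RULE.finditer(text):
        start, end = m.span()
        # Skip any rule match lying entirely inside a protected span.
        if any(p_start <= start and end <= p_end for p_start, p_end in protected):
            continue
        hits.append((start, "traz"))
    return hits


print(check("UNIVERSIDADE TRÁS-OS-MONTES E ALTO DOURO"))
print(check("Ele trás o livro"))
```

Restricting the antipattern to the exact place name keeps the rule active everywhere else, so a real error like “Ele trás o livro” is still reported.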

@tiagosantos
@dnaber

LanguageTool complains that “DE” doesn’t start in uppercase (see screenshot).

Can it be fixed before the official release date?

Thanks!

The release will be tomorrow morning, i.e. in a few hours, so please stop changing stuff…

This is related to the generic uppercase rule, so this false positive has been around for at least the last couple of years.
Anyway, as Daniel said, it is already too late to fix anything, except something that breaks the build. In that sense, everything seems perfect and ready for release.

Odd. I tested the latest daily build, and I cannot reproduce that issue.
There are no false positives for any capitalized words, including that specific title. Everything seems to be working as intended in that rule.
Please upload a document with that false positive for analysis, and register this issue in the GitHub bug tracker as a reminder for later work.

thesis_cover_bug_20161228.zip (29.1 KB)

@tiagosantos

Here it is. Can you confirm the issue?

Thanks!

Kind regards,

You have “upper-case” set in the character format, and LT doesn’t have this formatting information. What LT sees is “de”.

In this case, the proper thing to do is to use a line break (Shift+Enter) after “Universidade” and after “de”, instead of a paragraph break.

Thank you, Jaume.