No suggestions for typos in French dictionary

Hello!
We’re trying the PoC for the French dictionary and, although typos are highlighted as such, we get no suggestions to replace them.
We’ve tried the same with Spanish and English and it works fine.

Is there any problem with the French tool?

Thanks

Hi Patricia, this is a known limitation for French. Maybe @Dominique_PELLE can comment if he has plans to fix it.

Regards
Daniel

Thanks, Daniel.
I will try to contact Dominique.

P.

For spell checking, some languages use FSA dictionaries and others use Hunspell. French uses the Hunspell dictionary from the www.dicollecte.org project. Unfortunately, spelling suggestions with Hunspell were deemed too slow to enable. I did not want to switch to FSA, because the quality of spell checking with Hunspell was better than with FSA. The quality of spell checking and the speed of LanguageTool matter more (to me at least) than suggestions. I don’t remember now what the limitations of FSA were. I think it may be that the Hunspell ICONV directive was not supported, among perhaps other things.

Now that I think about it, I wonder whether it would be possible to propose spelling suggestions lazily, when the user requests them for a single word, instead of finding suggestions for all typos as LanguageTool currently does. It would be much faster. It would not be applicable to the command-line version of LanguageTool, but lazy suggestions would make sense in theory for the LibreOffice plugin or for the online web demo. I don’t know how doable this is in practice.
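
A minimal sketch of the lazy idea in plain Java, not LanguageTool code: typos are flagged right away, and the expensive suggestion lookup runs only when suggestions are requested for one word. The names LazyMatch and LazyDemo, and the suggester function standing in for a slow Hunspell lookup, are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** A flagged typo whose suggestions are computed only on first request. */
final class LazyMatch {
    private final String word;
    // Hypothetical stand-in for a slow suggestion lookup (e.g. Hunspell).
    private final Function<String, List<String>> suggester;
    private List<String> cached; // stays null until suggestions are requested

    LazyMatch(String word, Function<String, List<String>> suggester) {
        this.word = word;
        this.suggester = suggester;
    }

    String word() {
        return word;
    }

    /** Runs the suggester once, then serves the cached result. */
    List<String> suggestions() {
        if (cached == null) {
            cached = suggester.apply(word);
        }
        return cached;
    }
}

final class LazyDemo {
    /** Flags every token missing from the dictionary without computing suggestions. */
    static List<LazyMatch> check(String text, Set<String> dictionary,
                                 Function<String, List<String>> suggester) {
        List<LazyMatch> matches = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty() && !dictionary.contains(token)) {
                matches.add(new LazyMatch(token, suggester));
            }
        }
        return matches;
    }
}
```

With this shape, checking a text full of typos costs no suggestion lookups at all; a client such as a web demo would call suggestions() only for the word the user clicks on.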

Dominique

I don’t think lazy suggestions are possible in LibreOffice, as we have to return an object with all information at once (error positions, suggestions, etc.). But we had the quality problem for German too, mostly because German has compounds. We solved it by using a combination of both Hunspell and FSA. Anyway, the better approach would be to use FSA only. Maybe the situation has improved: Morfologik had a new release just yesterday.

Apparently, there is still no improvement on this. Is there a way to enable suggestions for French despite the time it may take? Or should I fork the project?

Hunspell 1.6.0 announced many performance optimizations on the suggest() function.
This was the only relevant change since 2011 (the last LanguageTool Hunspell libs update), so a hunspell-native-libs update may yield good results and allow suggestions for languages that are concerned about performance.

2017-09-03: Hunspell 1.6.2 release:

  • Library changes: no. Same as 1.6.1.
  • Command line tool:
    • Added German translation
    • Fixed bug with wrong output encoding, not respecting system locale.

2017-03-25: Hunspell 1.6.1 release:

  • Library changes:
    • Performance improvements in suggest()
    • Fixes regressions for Hungarian related to compounding.
    • Fixes regressions for Korean related to ICONV.
  • Command line tool:
    • Added Tajik translation
    • Fix regarding searching of OOo dicts installed in user folder
  • Manpages:
    • Fix microsoft-cp1251 to cp1251. Dicts should not use the first.
    • Typos.

2016-12-22: Hunspell 1.6.0 release:

  • Library changes:
    • Performance improvement in ngsuggest(), suggestions should be faster.
    • Revert MAXWORDLEN to 100 as in 1.3.3 for performance reasons.
    • MAXWORDLEN can be set during build time with -D defines.
    • Fix crash when word with 102 consecutive X is spelled.
  • Command line tool:
    • -D shows all loaded dictionaries instead of only the first.
    • -D properly lists all available dictionaries on Windows.

In dicollecte, ICONV is only used to make Hunspell aware of typographical symbols. It is a feature that is hardly used by LanguageTool users.

I disagree. I would rather have a good spell checker than suggestions.

Now, does anybody know how much faster spelling suggestions have become
with the latest Hunspell? I suppose we should upgrade Hunspell in LanguageTool.
And if it’s fast enough, we can consider re-enabling spelling suggestions.

Or should we have an option to enable spelling suggestions?

I don’t have hard numbers, but I think the changes didn’t help that much. I guess you can easily try this on the command line, without any LT integration, by just sending a text with many misspellings to hunspell (or a German text when hunspell is expecting French text, etc.).

An option in the API would be really really great! I think that this way LanguageTool would be able to adapt to all situations.

The easiest way to have an option would probably be to just have two rules, one turned off by default. Then to turn on the other rule, you’d need to turn it on and also turn off the rule that’s on by default. This way we could use the turn on/off parameters that the API already has.

It would be a bit awkward that both rules could then be enabled at once (spelling with and without suggestions).
Well, if both rules are enabled, LT could internally enable only the rule for spelling with suggestions.
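
The "both enabled" case could be resolved along these lines. This is only a sketch under assumptions: the class name SpellingRuleResolver is made up, the two rule IDs are hypothetical placeholders, and nothing here is existing LanguageTool code.

```java
import java.util.HashSet;
import java.util.Set;

final class SpellingRuleResolver {
    // Hypothetical rule IDs; neither rule exists in LanguageTool yet.
    static final String WITH = "SPELLING_RULE_WITH_SUGGESTIONS";
    static final String WITHOUT = "SPELLING_RULE_WITHOUT_SUGGESTIONS";

    /**
     * Returns the rules that should actually run. If both spelling rules
     * are enabled, the one with suggestions wins and the other is dropped.
     */
    static Set<String> resolve(Set<String> enabledRules) {
        Set<String> active = new HashSet<>(enabledRules);
        if (active.contains(WITH)) {
            active.remove(WITHOUT);
        }
        return active;
    }
}
```

This way a user who only turns on the rule with suggestions, without turning off the default one, would still not get every typo reported twice.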

What about JLanguageTool.enableSuggestions(), or JLanguageTool.setSuggestionsEnabled(boolean enable)?

Most languages have suggestions enabled and I don’t think anybody wants to turn them off. So having a special case for French isn’t so elegant. That’s why I suggested the “2 rules approach” above. Not elegant either, but at least it’s doable with the code we already have.

I see. It makes sense. OK. So, JLanguageTool.disableRule("SUGGESTIONS"), something like that?

Something like this - but we still need to implement both rules first:

// Both rule IDs are hypothetical; neither rule exists yet.
JLanguageTool lt = new JLanguageTool(new French());
lt.disableRule("SPELLING_RULE_WITHOUT_SUGGESTIONS");
lt.enableRule("SPELLING_RULE_WITH_SUGGESTIONS");

And it would still be a special case for French. I’d suggest going the same way as for (almost) all the other languages, namely using Morfologik for getting suggestions. But it still needs to be done, and I don’t have time for that now.