Suggesting words for the dictionary


(Michael) #1

Is there a reason why I can suggest words for inclusion into the dictionary when checking German text but not when I am checking text in English?


(Lodewijk Arie van Brienen) #2

Two possibilities:
A. It is an experimental feature.
B. The English dictionary is believed complete enough not to need it.

I hope it is the former.


(Daniel Naber) #3

We have about 30 suggestions or so per day for German, so we'd probably have the same amount for English. We'd need a native speaker to work through the suggestions regularly. I'm not sure if I have asked @Mike_Unwalla before...


(Tiago F. Santos) #4

C. This feature requires a maintainer to vet the suggestions, and that is a very time-consuming task.
As such, it is activated manually for each language, provided there is a maintainer capable of and willing to triage suggestions. Unfortunately, there is no such thing as a 'complete enough' dictionary for extant languages.


(Lodewijk Arie van Brienen) #5

What I mean by 'complete enough' is that the number of missing words is so small that few users ever have problems with it.
And those who do can easily report them via the forum without being lost in a sea of similar posts.

One of the determining factors for the effectiveness of a feature is the ratio of reduced to introduced overhead.
It's possible that with English the ratio falls primarily on introduced overhead (e.g. 1:100),
while with German it falls primarily on reduced overhead (e.g. 100:1).


(Tiago F. Santos) #6

So, adding a new word to the dictionary introduces more overhead than adding a recognition pattern for borderline false positives?
If double standards are not on the table, I guess we won't see more of that kind of reporting in the future.


(Mike Unwalla) #7

You haven't.

I cannot commit to a daily check. Some weeks, I spend many hours on LT, some weeks none. In principle, I'm happy to review the suggestions from users. But, I will then have less time to fix the other suggestions/corrections that users send.

A page on which users can supply suggestions is a good idea. I don't know how the system works for German. This is what I would like to see for English:

For each suggestion, a user should supply a minimum of one example sentence and a reputable source (Webster's, Longman, Shorter Oxford, ...). Ideally, on the page, have a status indicator for each term (submitted, accepted, rejected). For the rejected, give a reason. (This is similar to the method used for dealing with comments during the development of an ISO standard. It shows that we take comments seriously.)

Sort the suggestions alphabetically.

Do not let a user suggest a word that previously was suggested. Just give a message 'previously suggested' and link to the existing suggestion (which will show its status).
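
The workflow proposed above could be sketched roughly as follows in Python. This is only an illustration: the names `Status`, `Suggestion` and `SuggestionBoard` are hypothetical and do not describe how LanguageTool actually stores suggestions. It covers the required example sentence and source, the status indicator, the reason for a rejection, the 'previously suggested' check, and the alphabetical listing.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    word: str
    example: str                      # at least one example sentence
    source: str                       # a reputable dictionary, e.g. "Longman"
    status: Status = Status.SUBMITTED
    reason: str = ""                  # filled in when a suggestion is rejected


class SuggestionBoard:
    """Hypothetical in-memory store for dictionary suggestions."""

    def __init__(self):
        self._by_word = {}

    def submit(self, word, example, source):
        """Return (suggestion, is_new). A repeat returns the existing entry."""
        if not example.strip() or not source.strip():
            raise ValueError("an example sentence and a source are required")
        key = word.strip()
        if key in self._by_word:
            # 'previously suggested': hand back the entry with its status
            return self._by_word[key], False
        suggestion = Suggestion(key, example, source)
        self._by_word[key] = suggestion
        return suggestion, True

    def accept(self, word):
        self._by_word[word.strip()].status = Status.ACCEPTED

    def reject(self, word, reason):
        if not reason.strip():
            raise ValueError("a rejection needs a reason")
        entry = self._by_word[word.strip()]
        entry.status = Status.REJECTED
        entry.reason = reason

    def listing(self):
        """All suggestions, sorted alphabetically, ignoring case."""
        return sorted(self._by_word.values(), key=lambda s: s.word.lower())
```

Submitting the same word twice returns the first entry together with `False`, so a page could show 'previously suggested' and link to the existing status instead of creating a duplicate.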


(Michael) #8

The current system for German is much simpler than your suggestion. If a word is not recognised by the spell checker, you can submit it via the context menu. There is no way to submit sources, comments or examples, and there is no feedback for the user.


(Lodewijk Arie van Brienen) #9

I was talking about possibilities, not certainties.
And overhead can have (and often has) multiple layers:
A. Overhead for the user.
B. Overhead for the programmer/maintainer/crew.
C. Overhead for the system.

A stable system has at least a stable distribution of overhead
(and this is, most likely, not a case of good for the goose, good for the gander)


(Tiago F. Santos) #10

This makes perfect sense in for-profit software development. I fully understand the law of diminishing returns.
Just like most LGPL/GPL-based projects, this is an incremental, community-based project.

I believe there is no formal long-term development plan, but loosely speaking, the Agile model is a better fit for how these projects usually work in practice:

Daily regression tests and such...


(Lodewijk Arie van Brienen) #11

You're thinking of financial overhead, while I was talking about the much broader resource overhead.

E.g.: What is a 'feature' worth if it requires 24 hours of maintenance for one minute of use?
Especially when there are other means with better maintenance-to-use ratios?


(Tiago F. Santos) #12

I am thinking of all aspects.
But let me indulge you by focusing on that point.

If there is nothing else that can be reasonably done to improve the dictionary, with a reasonable expenditure of human resources, that means that LanguageTool already has a feature that has not been superseded by any other ventures that have to spend financial resources to achieve it.
That would mean that it is already the segment leader on that feature.

Controversial, but quite flattering. I suppose.

Using the specific dictionary case:
Each forum reader spends x minutes posting their custom ignore list.
What is the estimated differential in development time, total user time (=reporting minus time saved searching for confirmation of unknown spellings) and system time processing the extended dictionaries?


#13

Is there a reason why human resources are even needed? As long as you require sources such as Webster's, Longman or Shorter Oxford, you might as well automate the whole thing: when the user enters the URL, the system would just parse the page.


(Lodewijk Arie van Brienen) #14

A. Without human intervention such automation can make really stupid decisions.

B. Humans (in general) have a much better track-record for explaining why something was rejected.

C. (cont. B) With a fully computer-automated system, false rejections would be much harder to overturn.


#15

That is the case when dealing with variables (especially in other places that accept user input). But if you limit yourself to select reputable sources, there is no stupid decision to make; everything is already pre-vetted for you.

There would also be no rejections in that case. Because if you validate the url with javascript, the only rejection the user will get is "Invalid URL" and that would be instant.
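
A rough sketch of that kind of check, written here in Python rather than JavaScript; the same logic could run in the browser. The host names in `TRUSTED_HOSTS` are placeholders and `is_trusted_source` is a hypothetical helper, not part of LanguageTool. Note that it only checks that the URL points at an allowed site; it does not confirm that the page actually lists the word.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; these hosts are placeholders, not a vetted list.
TRUSTED_HOSTS = {"dictionary.example.org", "lexicon.example.com"}


def is_trusted_source(url: str) -> bool:
    """True if url is http(s) and its host is on the allowlist."""
    try:
        parts = urlparse(url.strip())
        host = (parts.hostname or "").lower()
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. a broken IPv6 host
        return False
    if parts.scheme not in ("http", "https"):
        return False
    return host in TRUSTED_HOSTS
```

Anything for which this returns `False` would get the instant "Invalid URL" message.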


(Lodewijk Arie van Brienen) #16

That's not the kind of rejection I meant.
The kind I meant often ends with: REQUEST DENIED.

When you have the 'luck' that the automated system gives a reason, it often is:
A. too generic (EG: there was an error in the data ... where was the error? What kind?)
or B. more cryptic than a GURU MEDITATION.


(hi pandas) #17

Hi!!! I'm Hi pandas and I can include words into the dictionary in English, but you should try spelling a word wrong, and then seeing if Language Tool fixes it!


(Jan Schreiber) #18

Now what? Please describe the behavior you want to see in more detail.
Thanks for your report! User feedback is very important for us.