I have been coding a feature into Proofing Tool GUI that will make it possible to take dictionaries and extract POS data from them into a .txt file, ready to copy and paste into added.txt.
If a noun can be both countable and uncountable, it is NN:UN. Example: oil.
But, I read the explanation in tagset.txt again: NN Noun, singular or mass: bicycle, earthquake, zipper
I think the text is not correct. I think it should be: Noun, singular count noun: bicycle, earthquake, zipper
@danielnaber, @tiff, can you clarify the meaning of NN? Is the explanation in tagset.txt correct?
@marco, I looked at GB_uncountable_addedtxt_20200525.txt and GB_uncountable_spellingtxt_20200525.txt, which are in GB_uncountable_20200525.zip.
I do not understand what you are doing. What does “adding morphologic information to the GB speller” mean? What information do you add and where do you add it?
File GB_uncountable_addedtxt_20200525.txt contains POS information in the same structure that we use in added.txt. But, added.txt is for all variants of English, not only for en-GB.
Why is it necessary to add POS or spelling for actinium? The POS is in LT and is not recently added (the POS is in LT 4.8). LT 4.8 correctly gives no spelling warning for actinium in any of the language variants.
This screenshot of LT 4.8 shows that most words in GB_uncountable_spellingtxt_20200525.txt do not give a spelling warning for BrE:
The Tagger Result dialog shows that some of the words already have the postag NN:U.
I have been the maintainer of the British speller for ~7 years.
For several months, while searching for possessives and plurals among the words already in the dictionary (and new ones), I have been adding extra information to the words in the .dic, such as “Noun: Uncountable blah blah”, based on Wiktionary.
Now I have coded a feature into my Hunspell tool “Proofing Tool GUI” that allows extracting words with defined POS information.
For the zip above, I simply set a “source POS” (.dic) and a “target POS” (.txt - LanguageTool) in Proofing Tool GUI and extracted the words in both added.txt and spelling.txt format.
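To make the idea concrete, here is a minimal sketch of that kind of extraction. It is not the actual Proofing Tool GUI code. It assumes a hypothetical annotation in which each .dic entry carries a Hunspell-style `po:` field (for example `po:noun_uncountable`), and it assumes that added.txt lines are tab-separated as form, lemma, postag. The labels in `POS_MAP` and the function names are made up for illustration.

```python
# Hypothetical mapping from a "source POS" label in the .dic
# to a "target POS" (LanguageTool postag). Labels are assumptions.
POS_MAP = {
    "noun_uncountable": "NN:U",
    "noun_both": "NN:UN",
}


def parse_dic_line(line):
    """Return (word, source_pos) for one .dic entry, or None if it has no po: field."""
    fields = line.strip().split()
    if not fields:
        return None
    word = fields[0].split("/", 1)[0]  # drop affix flags such as "/S"
    for field in fields[1:]:
        if field.startswith("po:"):
            return word, field[3:]
    return None  # also covers the word-count line at the top of a .dic


def extract(dic_lines, source_pos):
    """Build added.txt-style and spelling.txt-style lines for one source POS."""
    target_pos = POS_MAP[source_pos]
    words = set()
    for line in dic_lines:
        parsed = parse_dic_line(line)
        if parsed and parsed[1] == source_pos:
            words.add(parsed[0])
    spelling = sorted(words)
    # Assumed added.txt layout: form <TAB> lemma <TAB> postag
    added = [f"{w}\t{w}\t{target_pos}" for w in spelling]
    return added, spelling


# Tiny made-up .dic excerpt; the first line is the entry count.
dic = [
    "3",
    "actinium po:noun_uncountable",
    "oil/S po:noun_both",
    "zipper/S po:noun",
]
added_lines, spelling_lines = extract(dic, "noun_uncountable")
print("\n".join(added_lines))
```

The two returned lists correspond to the two files in the zip: one in added.txt format and one in spelling.txt format (one word per line).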
For each word that gave an error for BrE, I checked the word on www.lexico.com and www.merriam-webster.com. If I thought that the word was applicable to all variants of English, I added it to spelling.txt. If I wasn’t sure, I added it to spelling_en-GB.txt.
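The routing rule above can be sketched as follows. This only illustrates the decision; the dictionary lookups were done by hand. The function name, the `decisions` structure and the placeholder words are hypothetical.

```python
def route(decisions):
    """Group words by target file.

    decisions maps word -> True  (judged applicable to all variants)
                        -> False or None (not sure, so keep it en-GB only)
    """
    files = {"spelling.txt": [], "spelling_en-GB.txt": []}
    for word, all_variants in sorted(decisions.items()):
        target = "spelling.txt" if all_variants is True else "spelling_en-GB.txt"
        files[target].append(word)
    return files


# Placeholder words, not real decisions from the review.
routed = route({"word_a": True, "word_b": None, "word_c": False})
for name, words in routed.items():
    print(name, words)
```

Anything not positively judged as applicable to all variants falls back to the en-GB file, which matches the cautious default described above.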
I didn’t add ‘benzedrine’, because that is derived from a proper noun. (LT spell check suggests Benzedrine.) When we have a rule for benzedrine/Benzedrine, we can add the lower-case spelling.