Ignore hashtags, user mentions

LanguageTool now ignores URLs and email addresses (i.e. they are not underlined as spelling mistakes). Would it be useful to do the same with hashtags (#hashtag) and user mentions (@username)? What do you think?

This issue was mentioned some time ago here:

Yes, also see:

I think having a global disambiguation.xml (that applies to all languages) sounds like a nice solution.

I agree, it sounds nice. So we have: file names, domain names (like languagetool.org), hashtags and user mentions.

I’m wondering whether we will have false negatives in some languages. For example, a domain name like “aaaa.es” could be an error, although an unusual one (a missing space plus missing capitalization). But “es” is a common word in several languages.

I created these rules in Catalan in order to test them: [ca] ignore filenames, hashtags, user mentions... · languagetool-org/languagetool@d9f5070 · GitHub

They seem to work well, without problems.

A global disambiguation.xml file would be easier to maintain (and can be used for other rules, proper nouns, etc.). But it will not allow fine-tuning for each language, so it has to contain only rules fully acceptable to all languages.

I can try to implement the global file in October.
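To make the idea concrete, here is a minimal Python sketch of what such language-independent "ignore" rules would have to recognise. It is not how disambiguation.xml works internally; the patterns and the names `IGNORE_PATTERNS`, `ignorable_kind` and `tokens_to_check` are assumptions made up for illustration.

```python
import re

# Assumed patterns for illustration only; these are NOT LanguageTool's real rules.
IGNORE_PATTERNS = {
    "hashtag": re.compile(r"#\w+"),
    "mention": re.compile(r"@\w+"),
    # Covers both domain names (languagetool.org) and simple file names (notes.txt).
    "dotted_name": re.compile(r"(?:[a-z0-9-]+\.)+[a-z]{2,}", re.IGNORECASE),
}


def ignorable_kind(token):
    """Return the kind of ignorable token, or None if it should be spell-checked."""
    for kind, pattern in IGNORE_PATTERNS.items():
        if pattern.fullmatch(token):
            return kind
    return None


def tokens_to_check(text):
    """Split on whitespace, strip surrounding punctuation, drop ignorable tokens."""
    words = (raw.strip(".,;:!?()\"'") for raw in text.split())
    return [w for w in words if w and ignorable_kind(w) is None]
```

Note that `ignorable_kind("aaaa.es")` returns `"dotted_name"`, which is exactly the false-negative risk mentioned above: a missing space before "es" would be silently skipped.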

One thing I see a lot is incorrect tags that consist of two parts: #hash tag or #hash-tag. (In Dutch, the - is a valid word character, but not a valid tag character for Twitter…)
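The hyphenated case could probably be detected mechanically. A rough Python sketch, assuming the tag really does end at the hyphen (the `#hash tag` case with a space cannot be told apart from ordinary text this way); `SPLIT_TAG` and `split_hashtags` are hypothetical names:

```python
import re

# Hypothetical check: a hashtag directly followed by "-word" was probably
# meant as a single tag, assuming the platform ends the tag at the hyphen.
SPLIT_TAG = re.compile(r"#(\w+)-(\w+)")


def split_hashtags(text):
    """Return (as_written, tag_as_parsed) pairs for hyphenated hashtags."""
    return [(m.group(0), "#" + m.group(1)) for m in SPLIT_TAG.finditer(text)]
```

For example, `split_hashtags("Lees meer: #hash-tag en #hashtag")` reports only `("#hash-tag", "#hash")` and leaves the well-formed tag alone.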

Warming up this old thread.

If I run the sentence “Hallo @sprache, #sprache ist wichtig!” with language “de”, it does not find any issues. But if I use “auto” it complains about “sprache” in each case. Is this intended?

curl -X POST --header 'Content-Type: application/x-www-form-urlencoded' --header 'Accept: application/json' -d 'text=Hallo%20%40sprache%2C%20%23sprache%20ist%20wichtig!&language=auto&enabledOnly=false' 'https://api.languagetoolplus.com/v2/check'

With language=de you don’t get spelling errors. You need language=de-DE (or auto) to get them. That is on purpose.
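For comparing the two language codes side by side, the same request can be built in Python. This is only a sketch using the standard library: the endpoint and form fields are taken from the curl command above, while `API_URL` and `build_check_request` are names invented here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint copied from the curl command in this thread.
API_URL = "https://api.languagetoolplus.com/v2/check"


def build_check_request(text, language):
    """Build (but do not send) the form-encoded POST request for one check."""
    body = urlencode({"text": text, "language": language, "enabledOnly": "false"})
    return Request(
        API_URL,
        data=body.encode("utf-8"),  # a request with a body is sent as POST
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; building one request with `language="de"` and one with `language="de-DE"` makes the difference described above easy to reproduce.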

That means that hashtags and mentions are not ignored in German. (But they are ignored somewhat in the web page languagetool.org, aren’t they? @tiff)

Should we implement the global disambiguation.xml we talked about? @dnaber

Ah thx.

Interesting that it doesn’t alert me that the language “de” doesn’t exist and that I should use “de-DE”. But indeed if I use “de-DE” I get the same feedback as for “auto”.

Awaiting feedback on whether this will be handled via the global disambiguation.xml or whether I should handle it on my side.

That’s nice! But I wonder if there could be a list of more popular addresses to correct in these cases. So, if I type user@gamil.com it wouldn’t be ignored, as there would be an exception for gmail; same for Google, Microsoft, Apple, and so on - and even langaugetool.org, I mean languagetool.org.
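Something like the following Python sketch is what I have in mind. It uses `difflib.get_close_matches` from the standard library; the domain list, the 0.8 cutoff and the name `suggest_domain` are just assumptions for illustration, not an existing LanguageTool feature.

```python
import difflib

# Illustrative list only; a real one would need to be curated.
POPULAR_DOMAINS = ["gmail.com", "google.com", "microsoft.com", "apple.com", "languagetool.org"]


def suggest_domain(address, cutoff=0.8):
    """Suggest a popular domain for a near-miss, or None if nothing is close."""
    # Works for both "user@domain" and a bare "domain".
    domain = address.rsplit("@", 1)[-1].lower()
    if domain in POPULAR_DOMAINS:
        return None  # spelled correctly, nothing to suggest
    matches = difflib.get_close_matches(domain, POPULAR_DOMAINS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this, `suggest_domain("user@gamil.com")` gives `"gmail.com"` and `suggest_domain("langaugetool.org")` gives `"languagetool.org"`, while a correctly spelled `user@gmail.com` is left alone.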

That’s a good idea. Thank you. I have opened an issue: [all languages] find errors in popular emails or web addresses · Issue #5890 · languagetool-org/languagetool · GitHub

Semi-related: I guess in hashtags the main things to ignore are case and spaces being replaced with dashes or omitted entirely. But if I write #safethewales … I would still want it to tell me that it should potentially be #savesthewhales.

Do you mean #save_the_whales?

https://www.hashtags.org/featured/what-characters-can-a-hashtag-include/

Oops, yes :slight_smile:

Maybe, when looking into this, there is also a way to ignore :hug: and the like?
