Porting Language Tool to Python

Hi, I’m keen on porting some of the Language Tool functionality to Python (open sourced, probably MIT license).

More specifically, I’d like to match on tokens based on their text, POS tag, etc., given the rules in grammar.xml. I was also going to translate grammar.xml to grammar.yml for readability. There will be some enhancements, like adding dependency tags to match on (so possibly another grammar.yml).

I wasn’t sure what is permitted under your license. Would it be fine to copy grammar.xml to my repo (with the LT license) and have a README that links to the GitHub repo?

Frankly, as the developer of LT, I’m not a big fan of this idea. Like many open-source projects, LT could be so much better if we had more contributors, but we don’t. If the very limited resources we have get spread over even more versions of LT, I don’t think that would help improve LT. Also, we offer a very easy-to-use HTTP server that can be started with a single command and returns JSON, which can be used from any programming language.
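To illustrate the point about the JSON server: LT’s `/v2/check` endpoint returns a list of matches with offsets and suggested replacements. A minimal sketch of consuming such a response from Python, using a hardcoded sample in the shape that endpoint returns (the text, offsets, and message below are illustrative, not real server output):

```python
import json

# A response in the general shape returned by LanguageTool's /v2/check
# endpoint (fields abbreviated; a live server would produce this from a
# POST with `text` and `language` parameters).
sample_response = json.loads("""
{
  "matches": [
    {
      "message": "Did you mean \\"as follows\\"?",
      "offset": 34,
      "length": 9,
      "replacements": [{"value": "as follows"}]
    }
  ]
}
""")

def apply_corrections(text: str, matches: list) -> str:
    """Apply each match's first replacement, right to left so earlier offsets stay valid."""
    for m in sorted(matches, key=lambda m: m["offset"], reverse=True):
        if m["replacements"]:
            start, end = m["offset"], m["offset"] + m["length"]
            text = text[:start] + m["replacements"][0]["value"] + text[end:]
    return text

text = "We can elaborate this distinction as follow."
print(apply_corrections(text, sample_response["matches"]))
# prints: We can elaborate this distinction as follows.
```

Applying matches right to left is the important detail: replacing left to right would shift the offsets of every later match.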

Another aspect is that LT relies on quite a few external libraries, and they probably won’t all exist for Python, so those features will be missing. We also have complex Java-based rules that will either need to be ported or will simply be missing. The same goes for the statistics-based detection of errors and the ongoing work on neural networks.

Of course the license allows you to take the LT files and use them in your project, no matter what language you use. Using .yml instead of .xml might be a good idea anyway (also for the Java version).

I understand.

Though I’m thinking of porting only the grammar.xml, not the entire program.

For example, the NLP package spaCy has a Matcher API. I could write some rules for matching on grammar myself, but I can avoid the cold-start problem by porting over the LT rules.

So it’s less an LT port and more a grammar-matching package for spaCy, greatly augmented with LT rules (grammar.xml). There will be differences too; e.g., spaCy and LT expose different sets of token properties to match on.

I agree that if it’s just about using LT from Python, using the server would be best.

If there’s interest I could contribute a yml translation script.
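A translation script would mostly be XML parsing plus a YAML dump. A minimal sketch with the standard library, using a simplified fragment in the general shape of an LT grammar.xml rule (real rules carry many more attributes and elements, and the `AS_FOLLOW` rule shown is made up for illustration):

```python
import xml.etree.ElementTree as ET

# Simplified, made-up fragment in the general shape of a grammar.xml rule.
GRAMMAR_XML = """
<rules>
  <rule id="AS_FOLLOW" name="as follow (as follows)">
    <pattern>
      <token>as</token>
      <token>follow</token>
    </pattern>
    <message>Did you mean 'as follows'?</message>
  </rule>
</rules>
"""

def rules_to_dict(xml_text: str) -> dict:
    """Turn rule elements into a plain dict, ready to dump with e.g. yaml.safe_dump."""
    root = ET.fromstring(xml_text)
    out = {}
    for rule in root.iter("rule"):
        tokens = [
            {"text": tok.text, **tok.attrib}
            for tok in rule.find("pattern").iter("token")
        ]
        out[rule.get("id")] = {
            "name": rule.get("name"),
            "message": rule.findtext("message"),
            "pattern": tokens,
        }
    return out

print(rules_to_dict(GRAMMAR_XML))
```

Serializing the resulting dict with PyYAML’s `yaml.safe_dump` would then give the grammar.yml output.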

Also, I didn’t know about the neural network feature. Looks good.

Could you provide an example of what a rule in YAML would look like?

Here’s something I’m working on (for the “as follow” rule):

typos:
  as_follow_as_follows:
    corrections:
      - Do you mean "as follows"?
    description: ~
    examples:
      - We can elaborate this distinction as follow.
    patterns:
      # direct string match
      - as follow
      # match on a list of token dictionaries
      - - LOWER: as
          POS: ADP
        - LOWER: follow
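The token dictionaries above follow spaCy’s Matcher convention (`LOWER`, `POS` keys). A minimal pure-Python sketch of the matching logic they imply, run over hand-tagged `(text, POS)` pairs rather than a real spaCy pipeline (the tags are assumed, as a tagger might produce them):

```python
# The second pattern from the YAML above, as spaCy-style token dictionaries.
pattern = [{"LOWER": "as", "POS": "ADP"}, {"LOWER": "follow"}]

# (text, POS) pairs; tags here are assumptions, not real tagger output.
tokens = [
    ("We", "PRON"), ("can", "AUX"), ("elaborate", "VERB"),
    ("this", "DET"), ("distinction", "NOUN"), ("as", "ADP"), ("follow", "VERB"),
]

def token_matches(spec: dict, text: str, pos: str) -> bool:
    """Check one token against one pattern dictionary."""
    return all(
        (text.lower() == value) if key == "LOWER" else (pos == value)
        for key, value in spec.items()
        if key in ("LOWER", "POS")
    )

def find_matches(pattern, tokens):
    """Return start indices where the pattern matches consecutive tokens."""
    hits = []
    for i in range(len(tokens) - len(pattern) + 1):
        if all(token_matches(spec, *tokens[i + j]) for j, spec in enumerate(pattern)):
            hits.append(i)
    return hits

print(find_matches(pattern, tokens))  # → [5]
```

With spaCy itself, the same pattern could be registered via `Matcher.add` and run over a `Doc` instead of a hand-built token list.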

Not to resurrect a dead thread, but I am interested in this as well. My Python skills are much better than my Java, and I think that is true for many in the NLP community. Also, it seems that at least some parts of LT, such as the word2vec component, are done in Python anyway.

Given the timing of this question and its contents, I assume that OP is the author of the spacy-grammar extension, which seems like a good foundation.


Fidus Writer is a Python program that uses LanguageTool through the API. But it’s a pain to package due to the Java dependency. It’s not quite problematic enough to do much about, but if a Python version becomes available some day, I’m sure we’ll use that instead.

Sorry to revive this yet again but I made something to address these problems:

NLPRule is a library to parse and run LanguageTool rules from grammar.xml and disambiguation.xml. Currently, all disambiguation rules and about 80% of the grammar rules for English and German are supported.

It is written in Rust for speed but has bindings for Python.

I’m actively maintaining (and probably enhancing) it. I’m not quite happy with the speed yet, but I wouldn’t be surprised if it is already faster than LanguageTool.