
Porting Language Tool to Python


#1

Hi, I'm keen on porting some of the LanguageTool functionality to Python (open source, probably under the MIT license).

More specifically, I'd like to match on tokens based on their text, POS tag, etc., using the rules in grammar.xml. I was also going to translate grammar.xml to grammar.yml for readability. There will be some enhancements, like adding dependency tags to match on (so possibly another grammar.yml).
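A minimal sketch of what matching on token text and POS tag could look like. It assumes tokens are plain dicts with spaCy-style attribute names (`TEXT`, `LOWER`, `POS`) and that a pattern is either a plain phrase or a list of per-token attribute dicts. `normalize`, `token_matches` and `match_pattern` are hypothetical helpers, not part of LT or spaCy.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical representation: a token is a dict of attributes such as
# TEXT, LOWER and POS (names borrowed from spaCy's Matcher).
Token = Dict[str, str]
TokenSpec = Dict[str, str]
Pattern = Union[str, List[TokenSpec]]


def normalize(pattern: Pattern) -> List[TokenSpec]:
    """Expand a plain phrase into one case-insensitive LOWER spec per word."""
    if isinstance(pattern, str):
        return [{"LOWER": word.lower()} for word in pattern.split()]
    return pattern


def token_matches(token: Token, spec: TokenSpec) -> bool:
    """A token matches when every attribute in the spec has the same value."""
    return all(token.get(attr) == value for attr, value in spec.items())


def match_pattern(tokens: List[Token], pattern: Pattern) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets of every contiguous match."""
    specs = normalize(pattern)
    if not specs:
        return []
    n = len(specs)
    return [
        (start, start + n)
        for start in range(len(tokens) - n + 1)
        if all(token_matches(t, s) for t, s in zip(tokens[start:start + n], specs))
    ]


# Example sentence with hand-assigned POS tags (no tagger involved).
words = [("We", "PRON"), ("can", "AUX"), ("elaborate", "VERB"),
         ("this", "DET"), ("distinction", "NOUN"), ("as", "ADP"),
         ("follow", "VERB"), (".", "PUNCT")]
tokens = [{"TEXT": w, "LOWER": w.lower(), "POS": p} for w, p in words]

print(match_pattern(tokens, "as follow"))  # [(5, 7)]
print(match_pattern(tokens, [{"LOWER": "as", "POS": "ADP"}, {"LOWER": "follow"}]))  # [(5, 7)]
```

A real spaCy Matcher would run over a tagged `Doc` directly; this stand-in only illustrates the attribute-by-attribute matching logic.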

I wasn't sure what your license permits. Would it be fine to copy grammar.xml to my repo (with the LT license) and have a README that links to the GitHub repo?


(Daniel Naber) #2

Frankly, as the developer of LT, I'm not a big fan of this idea. Like many open source projects, LT could be so much better if we had more contributors. But we don't. If our very limited resources get spread over even more versions of LT, I don't think that would help improve it. Also, we offer a very easy-to-use HTTP server that can be started with a single command and returns JSON, which can be used from any programming language.
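Using that HTTP server from Python only needs the standard library. A sketch, assuming a local server on port 8081 and the `/v2/check` endpoint with `text` and `language` form fields; the response fields used here (`matches`, `message`, `offset`, `length`, `replacements`) should be checked against the LT HTTP API documentation. `check` and `summarize` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

# Assumption: a LanguageTool server running locally on port 8081.
LT_URL = "http://localhost:8081/v2/check"


def check(text: str, language: str = "en-US") -> Dict[str, Any]:
    """POST the text as form data to the LT server and decode the JSON reply."""
    data = urllib.parse.urlencode({"text": text, "language": language}).encode("utf-8")
    with urllib.request.urlopen(LT_URL, data=data, timeout=10) as resp:
        return json.load(resp)


def summarize(result: Dict[str, Any], text: str) -> List[Dict[str, Any]]:
    """Reduce each match to the flagged substring, its message and suggestions."""
    out = []
    for m in result.get("matches", []):
        start, length = m["offset"], m["length"]
        out.append({
            "span": text[start:start + length],
            "message": m["message"],
            "suggestions": [r["value"] for r in m.get("replacements", [])],
        })
    return out
```

With a server running, `summarize(check(t), t)` returns one entry per flagged span. The sketch assumes the JSON offsets index directly into the submitted string.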

Another aspect is that LT relies on quite a few external libraries, and probably not all of them exist for Python, so those features would be missing. We also have complex Java-based rules, which would either need to be ported or would simply be missing. Plus statistics-based error detection. Plus ongoing work on neural networks.

Of course the license allows you to take the LT files and use them in your project, no matter what language you use. Using .yml instead of .xml might be a good idea anyway (also for the Java version).


#3

I understand.

Though I'm thinking of porting only the grammar.xml, not the entire program.

For example, the NLP package spaCy has a matcher API. I could write my own grammar-matching rules, but by porting over the LT rules I can avoid the cold-start problem.

So it's less an LT port than a grammar-matching package for spaCy, greatly augmented with LT rules (grammar.xml). There will be differences too; e.g., spaCy and LT expose different sets of token properties to match on.
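One way to bridge the different property sets is to translate each LT `<token>` into a spaCy Matcher token dict. A rough sketch, assuming LT's English `postag` values are Penn Treebank tags comparable with spaCy's fine-grained `TAG` attribute, and that LT's default case-insensitive matching maps onto spaCy's `LOWER` (or a `(?i)` regex). It covers only the `regexp`, `postag` and `postag_regexp` attributes; `lt_token_to_spacy` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def full_match(regex: str) -> str:
    """Anchor a regex: LT matches whole tokens, spaCy's REGEX only searches."""
    return f"^(?:{regex})$"


def lt_token_to_spacy(token: ET.Element) -> Dict[str, Any]:
    """Translate one LT <token> into a spaCy Matcher token dict (sketch)."""
    spec: Dict[str, Any] = {}
    text = (token.text or "").strip()
    if text:
        if token.get("regexp") == "yes":
            # (?i) mimics LT's default case-insensitive token matching
            spec["TEXT"] = {"REGEX": "(?i)" + full_match(text)}
        else:
            spec["LOWER"] = text.lower()
    postag = token.get("postag")
    if postag:
        if token.get("postag_regexp") == "yes":
            spec["TAG"] = {"REGEX": full_match(postag)}
        else:
            spec["TAG"] = postag
    # An empty dict matches any token in spaCy, like an empty LT <token/>.
    return spec


# Illustrative pattern, written for this example rather than taken from LT.
pattern = ET.fromstring(
    '<pattern>'
    '<token postag="IN">as</token>'
    '<token postag="VB.*" postag_regexp="yes">follow</token>'
    '</pattern>'
)
spacy_pattern = [lt_token_to_spacy(t) for t in pattern.findall("token")]
print(spacy_pattern)
```

With spaCy v3 installed, the result could be registered via `matcher.add("AS_FOLLOW", [spacy_pattern])` on a `Matcher(nlp.vocab)`. Properties LT has but spaCy lacks (and vice versa, e.g. dependency labels) would still need manual handling.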

I agree that if it's just about using LT from Python, the server would be best.

If there's interest I could contribute a yml translation script.
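Such a translation script could start from `xml.etree.ElementTree`. A minimal sketch, assuming the common grammar.xml shape (`<rule id=… name=…>` with a `<pattern>` of `<token>` children, a `<message>` containing `<suggestion>`, and `<example>` elements) and ignoring rule groups, entities, exceptions and regex translation. The output keys are illustrative, and `rule_to_dict` is a hypothetical helper.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

# Hand-written rule in the usual grammar.xml shape (not copied from LT).
RULE_XML = """<rule id="AS_FOLLOW_AS_FOLLOWS" name="as follow (as follows)">
  <pattern>
    <token>as</token>
    <token>follow</token>
  </pattern>
  <message>Did you mean <suggestion>as follows</suggestion>?</message>
  <example correction="as follows">We can elaborate this distinction <marker>as follow</marker>.</example>
</rule>"""


def element_text(el: ET.Element) -> str:
    """Concatenate all text in an element, including nested tags."""
    return "".join(el.itertext()).strip()


def rule_to_dict(rule: ET.Element) -> Dict[str, Any]:
    """Flatten one <rule> into a yml-friendly dict (keys are illustrative)."""
    tokens: List[Dict[str, str]] = []
    pattern = rule.find("pattern")
    if pattern is not None:
        for tok in pattern.iter("token"):
            spec: Dict[str, str] = {}
            text = (tok.text or "").strip()
            if text:
                # regexp="yes" tokens are not translated here, only lowercased
                spec["LOWER"] = text.lower()
            if tok.get("postag"):
                spec["TAG"] = tok.get("postag")
            tokens.append(spec)
    message = rule.find("message")
    return {
        "id": rule.get("id"),
        "description": rule.get("name"),
        "message": element_text(message) if message is not None else None,
        "corrections": [element_text(s) for s in rule.iter("suggestion")],
        "examples": [element_text(e) for e in rule.findall("example")],
        "patterns": [tokens],
    }


# JSON is also valid YAML 1.2; with PyYAML, yaml.safe_dump(...) works too.
print(json.dumps(rule_to_dict(ET.fromstring(RULE_XML)), indent=2))
```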

Also, I didn't know about the neural network feature. Looks good.


(Daniel Naber) #4

Could you provide an example of what a rule in yml would look like?


#5

Here's something I'm working on (for the "as follow" rule).

typos: 
  as_follow_as_follows: 
    corrections: 
      - Do you mean "as follows"?
    description: ~
    examples: 
      - We can elaborate this distinction as follow.
    patterns: 
      # direct string match
      - as follow
      # match on list of dictionaries
      - 
        - 
          LOWER: as
          POS: ADP
        - 
          LOWER: follow