Need Help


(Mility) #1

Hi Daniel,
You said that this improves error detection for some words that are easily confused. I'm very interested in the technical details and have read http://wiki.languagetool.org/finding-errors-using-big-data, but I'm still confused. I found that EnglishConfusionProbabilityRule does not work in the stand-alone version. How can I detect those confused words in the stand-alone version using the big data?

Also, I cannot open the rule editor page today. What's wrong with it?

Thanks
Regards
Mility


(Daniel Naber) #2

The confusion rule currently only works in server mode and for the command-line version (http://wiki.languagetool.org/command-line-options, see option "--languagemodel").


(PeterLawrence) #3

Yes, the standalone version does not support NGrams by default.
However, I found that it works if you "hack the code" and add the following (inserting the correct path to your NGram database):

try {
    langTool.activateLanguageModelRules(new File("\\PathToNGramDirectory"));
    System.out.println("language Model enabled - NGramStats");
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

I've added this code to the showOptions() routine, since I don't want it enabled by default.
Hence, the NGram rule is only enabled when I open the options dialog.

In version 2.8 I integrated enabling the NGram database fully into the settings options; however, I haven't migrated the code to version 3 yet.


(PeterLawrence) #4

I'm wondering whether it might be worth integrating the ngram functionality into the core of the LanguageTool code. I feel it could be useful if some of the ngram functionality could be used in conjunction with the standard Java/XML rules.
For example, if a rule identifies a possible error, it could use the confusion probability rule to compare the original text with any possible alternatives identified. However, for this to be useful, you'll probably need a bit more than the 3-gram version.
An example application could be detecting a missing definite/indefinite article (determiner).


(Mility) #5

Thanks, I agree with you. If we could achieve this goal, we could detect more grammar errors.


(Daniel Naber) #6

I think what you're suggesting is to sort the rule's suggestions with the ngram probabilities, is that correct? Yes, that would be useful.
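
A minimal sketch of sorting a rule's suggestions by ngram probability might look like the following. The class name, method name, and probability values are made up for illustration, and the probabilities are passed in as a plain map; this is not LanguageTool's actual API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of LanguageTool.
class SuggestionSorter {

    /**
     * Returns a copy of the suggestions ordered from most to least probable.
     * Suggestions without a known probability are treated as 0.0.
     */
    static List<String> sortByProbability(List<String> suggestions,
                                          Map<String, Double> probability) {
        List<String> sorted = new ArrayList<>(suggestions);
        sorted.sort(Comparator.comparingDouble(
                (String s) -> probability.getOrDefault(s, 0.0)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up probabilities, for illustration only.
        Map<String, Double> p = Map.of("their", 0.62, "there", 0.30, "they're", 0.08);
        System.out.println(sortByProbability(List.of("there", "they're", "their"), p));
    }
}
```

The sort is stable, so suggestions with equal probability keep the order in which the rule produced them.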


(PeterLawrence) #7

Yes, one option would be to sort the rule's suggestions.
However, with a rule that might generate incorrect suggestions, you could reject the rule if the suggested correction had a low ngram probability.

For example:
Did you mean the ?

Accept the rule if the 3-gram probability is more than, say, 0.4, or if the difference in probability between the original text and the suggested correction is greater than a specified value.
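
The acceptance test described here could be sketched roughly as follows. The class and parameter names are hypothetical, the probabilities are assumed to come from some ngram lookup that is not shown, and only the 0.4 threshold comes from the post above; the gain threshold is left as a parameter because no value was specified.

```java
// Hypothetical helper, not part of LanguageTool.
class NgramAcceptance {

    /**
     * Accepts a suggested correction if its ngram probability is above minProb,
     * or if it is more probable than the original text by more than minGain.
     */
    static boolean acceptSuggestion(double originalProb, double suggestionProb,
                                    double minProb, double minGain) {
        boolean probableEnough = suggestionProb > minProb;
        boolean clearlyBetter = (suggestionProb - originalProb) > minGain;
        return probableEnough || clearlyBetter;
    }

    public static void main(String[] args) {
        // 0.4 is the threshold from the post; 0.2 is an arbitrary example gain.
        System.out.println(acceptSuggestion(0.10, 0.50, 0.4, 0.2));
        System.out.println(acceptSuggestion(0.30, 0.35, 0.4, 0.2));
    }
}
```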


(PeterLawrence) #8

Hi, I just spotted that you made it possible to configure the ngram directory via the GUI back in June.
One comment: if you move your ngram folder, it's a little tricky to open the GUI, so I've added a try/catch block around the activateLanguageModelRules call in reloadLanguageTool:

if (config.getNgramDirectory() != null) {
    try {
        languageTool.activateLanguageModelRules(config.getNgramDirectory());
    } catch (IOException e) {
        JOptionPane.showMessageDialog(null, "IO error while loading ngram database.\n" + e.getMessage());
    } catch (RuntimeException e) {
        JOptionPane.showMessageDialog(null, "Error while loading ngram database.\n" + e.getMessage());
    }
}


(Daniel Naber) #9

Thanks, that makes sense. Could you commit that change?


(PeterLawrence) #10

OK, I will look into it tomorrow. I assume you mean send a pull request.


(Daniel Naber) #11

You can commit this directly, no need for a pull request. You might actually catch "Exception" instead of the more specific subclasses.
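
As a rough sketch of the single-catch variant: the NgramActivator interface and tryActivate method below are hypothetical stand-ins so the example is self-contained, and the message is returned rather than shown in a dialog. The real code would call activateLanguageModelRules and JOptionPane as in the snippet above.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch, not the code that was committed.
class NgramLoadSketch {

    /** Stand-in for the call that activates the language model rules. */
    interface NgramActivator {
        void activate(File ngramDir) throws IOException;
    }

    /** Returns null on success, or an error message the GUI could display. */
    static String tryActivate(NgramActivator activator, File ngramDir) {
        if (ngramDir == null) {
            return null; // nothing configured, nothing to load
        }
        try {
            activator.activate(ngramDir);
            return null;
        } catch (Exception e) {
            // One handler covers both IOException and RuntimeException.
            return "Error while loading ngram database.\n" + e.getMessage();
        }
    }
}
```

Catching Exception keeps the GUI usable whichever way loading fails, at the cost of no longer distinguishing I/O problems in the message.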


(Mility) #12

<rule id="PREFER_TO_VBG" name="prefer to vbg(vb)">
  <pattern>
    <token>prefer</token>
    <token>to</token>
    <marker>
      <token postag='VBG'></token>
    </marker>
  </pattern>
  <message>Did you mean <suggestion><match no="3" postag="VB"/></suggestion>?</message>
  <example correction=''>Some other people prefer to <marker>changing</marker> job.</example>
  <example>Some other people prefer to change job.</example>
</rule>

	<!-- English rule, 2015-07-08 -->

<rule id="SOME_NEW_FIND" name="some new find(finds)">
  <pattern>
    <token regexp='yes'>some</token>
    <token>new</token>
    <marker>
      <token>find</token>
    </marker>
  </pattern>
  <message>Did you mean <suggestion><match no="3" postag="NNS"/></suggestion>?</message>
  <example correction=''>They can create some new <marker>find</marker> in theories and factories and achieve success in this job.</example>
  <example>They can create some new finds in theories and factories and achieve success in this job.</example>
</rule>

(PeterLawrence) #13

I've pushed my changes to master.
Sorry about the initial formatting mistake. I didn't spot that Eclipse automatically changes the indentation on pasting. Note to self: change Eclipse's default indentation settings next time.


(Daniel Naber) #14

Thanks, I've added the first rule. About the second one, I'm not so sure. First, it's very specific. Second, I get a lot of matches for "some new find" in Google, and I don't think they are all wrong.