Need Help

Mility · June 10, 2015, 2:44am

Hi Daniel,
You said that it improves error detection of some words that are easily confused. I’m very interested in the technical details and have read Finding errors using Big Data - LanguageTool Wiki, but I’m still confused. I found that EnglishConfusionProbabilityRule does not work in the stand-alone version. I want to know how to detect those confused words in the stand-alone version using the big data.

Also, I cannot open the rule editor page today. What’s wrong with it?

Thanks
Regards
Mility

dnaber · June 10, 2015, 7:57am

The confusion rule currently only works in server mode and in the command-line version (Command-Line Options - LanguageTool Wiki, see option “--languagemodel”).

PeterLawrence · June 10, 2015, 9:10am

Yes, the standalone version does not support n-grams by default.
However, I found that it works if you “hack the code” and add the following (inserting the correct path to your NGram database):

try {
    langTool.activateLanguageModelRules(new File("\\PathToNGramDirectory"));
    System.out.println("language Model enabled - NGramStats");
} catch (IOException e) {
    // Loading failed: print the stack trace and leave the n-gram rules disabled.
    e.printStackTrace();
}

I’ve added this code to the showOptions() routine, since I don’t want it enabled by default.
Hence, the NGram rule is only enabled when I go to the option dialogue box.

In version 2.8 I fully integrated enabling the NGram database into the settings options; however, I have not yet migrated the code to version 3.

PeterLawrence · June 10, 2015, 1:39pm

Just wondering whether it might be worth integrating the ngram functionality into the core of the LanguageTool code. I feel it could be useful if some of the n-gram functionality could be used in conjunction with the standard Java/XML rules.
For example, if a rule identifies a possible error, it could use the confusion probability rule to compare the original text with any possible alternatives identified. However, for this to be useful you’ll probably need a bit more than the 3-gram version.
An example application could be detecting a missing definite/indefinite article (determiner).

Mility · June 10, 2015, 1:58pm

Thanks, I agree with you. If we could achieve this goal, we could detect more grammar errors.

dnaber · June 10, 2015, 4:56pm

I think what you’re suggesting is to sort the rule’s suggestions with the ngram probabilities, is that correct? Yes, that would be useful.
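
A minimal sketch of what sorting suggestions by ngram probability could look like. The class `SuggestionRanker`, its method, and the probability values are hypothetical and not part of LanguageTool; in practice the scores would have to come from the ngram data. It assumes Java 9 or later for `List.of` and `Map.of`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of LanguageTool: orders suggestions by an n-gram score. */
final class SuggestionRanker {

  private SuggestionRanker() {}

  /**
   * Returns a new list with the most probable suggestion first.
   * Suggestions missing from the map are treated as probability 0.0.
   */
  static List<String> sortByProbability(List<String> suggestions,
                                        Map<String, Double> probabilities) {
    List<String> sorted = new ArrayList<>(suggestions);
    // List.sort is stable, so equally scored suggestions keep the rule's original order.
    sorted.sort(Comparator.comparingDouble(
        (String s) -> probabilities.getOrDefault(s, 0.0)).reversed());
    return sorted;
  }

  public static void main(String[] args) {
    // Made-up probabilities, for illustration only.
    Map<String, Double> probs = Map.of("their", 0.62, "there", 0.05, "they're", 0.20);
    System.out.println(sortByProbability(List.of("there", "they're", "their"), probs));
  }
}
```

Returning a new list leaves the rule's own suggestion order untouched, so the ranking stays an optional post-processing step.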

PeterLawrence · June 10, 2015, 5:51pm

Yes, one option would be to sort the rule’s suggestions.
However, for a rule that might generate incorrect suggestions, you could reject the rule match if the suggested correction had a low ngram probability.

for example
Did you mean the ?

Accept the rule if the 3-gram probability is more than, say, 0.4, or if the difference in probability between the original text and the suggested correction is greater than a specified value.
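
The acceptance test described above can be sketched as follows. `NgramMatchFilter` and its parameters are hypothetical names, not LanguageTool API; 0.4 is the value floated above, while the minimum gain of 0.2 is an arbitrary assumption for the example.

```java
/** Hypothetical filter, not part of LanguageTool: decides whether to keep a rule match. */
final class NgramMatchFilter {

  private final double minProbability; // e.g. 0.4, the value suggested in the thread
  private final double minGain;        // required improvement over the original text (assumed)

  NgramMatchFilter(double minProbability, double minGain) {
    this.minProbability = minProbability;
    this.minGain = minGain;
  }

  /**
   * Accepts the match if the suggestion is probable enough on its own,
   * or if it is clearly more probable than the original text.
   */
  boolean accept(double originalProbability, double suggestionProbability) {
    boolean likelyEnough = suggestionProbability > minProbability;
    boolean clearlyBetter = suggestionProbability - originalProbability > minGain;
    return likelyEnough || clearlyBetter;
  }

  public static void main(String[] args) {
    NgramMatchFilter filter = new NgramMatchFilter(0.4, 0.2);
    System.out.println(filter.accept(0.30, 0.45)); // suggestion above the 0.4 threshold
    System.out.println(filter.accept(0.25, 0.30)); // neither condition holds
  }
}
```

Both thresholds would need tuning against real ngram data; the numbers here only illustrate the two conditions.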

PeterLawrence · July 7, 2015, 11:55am

Hi, I just spotted that you made it possible to configure the ngram directory via the GUI back in June.
One comment: if you move your ngram folder, it’s a little tricky to open the GUI, so I’ve added a try/catch block around the activateLanguageModelRules call in reloadLanguageTool:

if (config.getNgramDirectory() != null) {
  try {
    languageTool.activateLanguageModelRules(config.getNgramDirectory());
  } catch (IOException e) {
    JOptionPane.showMessageDialog(null, "IO error while loading ngram database.\n" + e.getMessage());
  } catch (RuntimeException e) {
    JOptionPane.showMessageDialog(null, "Error while loading ngram database.\n" + e.getMessage());
  }
}

dnaber · July 7, 2015, 4:55pm

Thanks, that makes sense. Could you commit that change?

PeterLawrence · July 7, 2015, 9:28pm

OK, I will look into it tomorrow. I assume you mean send a pull request.

dnaber · July 7, 2015, 9:44pm

You can commit this directly, no need for a pull request. You might actually catch “Exception” instead of the more specific subclasses.
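
A minimal sketch of the single-handler idea, kept free of LanguageTool and Swing so it runs on its own. `SafeLoader` and `LoadAction` are hypothetical stand-ins: `LoadAction` represents a call such as activateLanguageModelRules, and the returned message is what a dialog could display.

```java
import java.io.IOException;
import java.util.Optional;

/** Hypothetical wrapper, not LanguageTool code: runs a loading step and reports any failure. */
final class SafeLoader {

  /** Stand-in for a call that may throw checked or unchecked exceptions. */
  @FunctionalInterface
  interface LoadAction {
    void run() throws Exception;
  }

  private SafeLoader() {}

  /** Returns an error message if loading failed, or an empty Optional on success. */
  static Optional<String> tryLoad(LoadAction action) {
    try {
      action.run();
      return Optional.empty();
    } catch (Exception e) {
      // One handler covers IOException and RuntimeException alike.
      return Optional.of("Error while loading ngram database.\n" + e.getMessage());
    }
  }

  public static void main(String[] args) {
    Optional<String> error = tryLoad(() -> {
      throw new IOException("directory not found");
    });
    error.ifPresent(System.out::println);
  }
}
```

Catching `Exception` loses the distinction between I/O and other failures in the message, which seems acceptable here because both cases were reported to the user in the same way.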

Mility · July 8, 2015, 2:31am

<rule id="PREFER_TO_VBG" name="prefer to vbg(vb)">
  <pattern>
    <token>prefer</token>
    <token>to</token>
    <marker>
      <token postag='VBG'></token>
    </marker>
  </pattern>
  <message>Did you mean <suggestion><match no="3" postag="VB"/></suggestion>?</message>
  <example correction=''>Some other people prefer to <marker>changing</marker> job.</example>
  <example>Some other people prefer to change job.</example>
</rule>

	<!-- English rule, 2015-07-08 -->
	
<rule id="SOME_NEW_FIND" name="some new find(finds)">
  <pattern>
    <token regexp='yes'>some</token>
    <token>new</token>
    <marker>
      <token>find</token>
    </marker>
  </pattern>
  <message>Did you mean <suggestion><match no="3" postag="NNS"/></suggestion>?</message>
  <example correction=''>They can create some new <marker>find</marker> in theories and factories and achieve success in this job.</example>
  <example>They can create some new finds in theories and factories and achieve success in this job.</example>
</rule>

PeterLawrence · July 8, 2015, 12:14pm

I’ve pushed my changes to master.
Sorry about the initial formatting mistake; I didn’t spot that Eclipse automatically changes the indentation on pasting. Note to self: change Eclipse’s default indentation settings next time.

dnaber · July 10, 2015, 12:05pm

Thanks, I’ve added the first rule. About the second one, I’m not so sure. First, it’s very specific. Second, I get a lot of matches for “some new find” in Google, and I don’t think they are all wrong.