I think that we need a policy about whether to disambiguate incorrect grammar. Consider the following sentences:
- Incorrect grammar: Those machines can squashed.
- Correct grammar: Those machines can the tomatoes very quickly.
- Correct grammar: Those machines can squashed tomatoes; we sell the unsquashed tomatoes at a premium price.
In all cases, E_NP_VBP[1] gives the postag VBP to the word ‘can’.
For the first sentence, do we parse ‘can’ as VBP, or do we say that the sentence is incorrect and that parsing ‘can’ as VBP therefore does not make sense?
My preference is not to parse incorrect grammar. During the past few months, whenever I found a disambiguation that applied to text with incorrect grammar, I tried to change the disambiguator so that it no longer disambiguates that text. In general, is this a sensible strategy?
I wrote ‘in general’, because rulegroup DID_BASEFORM rule 1 finds this incorrect sentence:
- A proposed northern bypass of Birmingham will designated as I-422.
It finds the sentence because disambiguation WILL_MD gives the postag MD to ‘will’ even though the sentence is grammatically incorrect.
I have not changed WILL_MD, because if I do, DID_BASEFORM rule 1 no longer finds the incorrect text. Is the fact that a grammar rule would stop finding incorrect text a good reason not to change the disambiguation?
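To make the dependency concrete, here is a toy sketch in Python. It is not how WILL_MD or DID_BASEFORM are actually written: the lexicon, the tags for ‘will’ and ‘designated’, and both functions are made up for illustration. It only shows that a rule keyed on an unambiguous MD stops matching once the disambiguation is withheld.

```python
# Toy model only: the lexicon, tags, and both functions are hypothetical
# stand-ins, not the real WILL_MD disambiguation or DID_BASEFORM rule.

def tag(tokens, disambiguate_will):
    """Return (word, postag) pairs using a tiny hand-written lexicon."""
    lexicon = {"will": "NN|MD", "designated": "VBN"}
    tagged = []
    for word in tokens:
        postag = lexicon.get(word, "X")  # "X" = not modelled here
        if word == "will" and disambiguate_will:
            postag = "MD"  # stand-in for what WILL_MD does
        tagged.append((word, postag))
    return tagged

def modal_without_base_form(tagged):
    """Flag an unambiguous MD directly followed by a non-base verb form."""
    hits = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "MD" and t2 in {"VBN", "VBD", "VBZ", "VBG"}:
            hits.append((w1, w2))
    return hits

sentence = "A proposed northern bypass of Birmingham will designated as I-422".split()
with_disambiguation = modal_without_base_form(tag(sentence, True))
without_disambiguation = modal_without_base_form(tag(sentence, False))
```

In this sketch the rule fires on ‘will designated’ only when the disambiguation is applied; when ‘will’ keeps its ambiguous reading, the rule finds nothing.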
What I am trying to say (and clarify in my mind) is that if we apply postags to incorrect text, then (despite the counter-example of ‘will designated’) how can we expect the grammar rules that use postags to give a correct analysis of incorrect text?