Hi
Would it be possible to allow for several tags
in the same rule?
It seems that we can only give one.
I'd like to be able to use several (at least 2):
* one to make sure that part of the regexp matches a postag
* another one to make sure that part of the regexp does not match a postag
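To illustrate the two conditions above, here is a minimal sketch in Python, not in the grammar.xml syntax. The function name `postag_accepted`, the patterns, and the Penn Treebank style tags are all assumptions made for the example; it only shows one positive and one negated postag check being combined.

```python
import re

def postag_accepted(postag, must_match, must_not_match):
    """Return True if postag matches must_match and does not match must_not_match.

    Both patterns are hypothetical regular expressions applied to the whole tag.
    """
    positive = re.fullmatch(must_match, postag) is not None
    negative = re.fullmatch(must_not_match, postag) is not None
    return positive and not negative

# Assumed example: accept any noun tag except proper nouns.
for tag in ["NN", "NNS", "NNP", "VB"]:
    print(tag, postag_accepted(tag, r"NN.*", r"NNP.*"))
```

With these assumed patterns, `NN` and `NNS` are accepted, while `NNP` (excluded by the negated pattern) and `VB` (no positive match) are rejected.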
I tried
On 2015-10-10 06:16, Dominique Pellé wrote:
> I'm not sure I understand how it would work for users.
My idea was that it would work automatically. But you're right that
users might also paste text with line breaks, and my idea of having a
parsing or normalization (when reading the input)
On 2015-10-11 11:58, Dominique Pellé wrote:
> Would be possible to allow for several tags
> in the same rule?
I don't think it's very difficult. I could put it on my TODO list, but I
cannot make any promises about when I have time for this.
Regards
Daniel
Hi
Consider this very simple rule in the English grammar.xml:
egg
yoke
The rule works fine if the 2 words are separated by
spaces, tabs or newlines. However, it does not
work when the 2 words are separated by a non-breaking
space (U+00A0). I wonder why.
With a
I would agree with that. I had to add U+00A0 in a lot of places, including
the sentence tokenizer, the word tokenizer and some rules for Ukrainian. But
for text analysis it's pretty much the same as a normal space, so it
would make sense to handle this at a common level (early in the
process).
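
As a rough illustration of handling this early, here is a minimal Python sketch. It is not the project's actual tokenizer: `normalize_spaces` and `tokenize` are hypothetical helpers, and the tokenizer is deliberately naive (it splits only on space, tab and newline) to show why a non-breaking space can keep two words glued together unless the input is normalized first.

```python
import re

NBSP = "\u00a0"  # non-breaking space, U+00A0

def normalize_spaces(text):
    """Replace every non-breaking space with a regular space."""
    return text.replace(NBSP, " ")

def tokenize(text):
    """Naive word tokenizer (assumption): splits only on space, tab and newline."""
    return [t for t in re.split(r"[ \t\n]+", text) if t]

raw = "egg" + NBSP + "yoke"

# Without normalization the NBSP is not a separator, so there is one token.
print(len(tokenize(raw)))
# After normalization the two words are split as usual.
print(tokenize(normalize_spaces(raw)))
```

Doing the replacement once, when the input is read, would avoid having to list U+00A0 separately in each tokenizer and rule.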
Thanks
Andriy