Re: [Apertium-stuff] Transfer rules and chunks was: Hi , I am newbie !

2012-04-17 Thread Francis Tyers
On Mon, 16 Apr 2012 at 14:46 +0200, Per Tunedal wrote: Hi, I have read the article too. I noted two interesting features: No-one has responded to this, so I will -- but I'm not the most adequate person to respond. 1. ... apertium-transfer-tools implements an

[Apertium-stuff] soft hyphens and tokenisation

2012-04-17 Thread Kevin Brubeck Unhammer
Hi, I notice that soft/hidden hyphens (&#173;) can split words, e.g. in Jesper­sen there's a soft hyphen between n and t, but it should be analysed as one word. I've noticed this a lot in web pages, I guess a lot of news sites and such use programs that hyphenate using that character. The

Re: [Apertium-stuff] soft hyphens and tokenisation

2012-04-17 Thread Kevin Brubeck Unhammer
Kevin Brubeck Unhammer unham...@fsfe.org writes: Hi, I notice that soft/hidden hyphens (&#173;) can split words, e.g. in Jesper­sen there's a soft hyphen between n and t, but it should be analysed as one
Wops, between r and s!
word. I've noticed this a lot in web pages, I guess a lot

Re: [Apertium-stuff] soft hyphens and tokenisation

2012-04-17 Thread Jimmy O'Regan
On 17 April 2012 14:51, Kevin Brubeck Unhammer unham...@fsfe.org wrote: Hi, I notice that soft/hidden hyphens (&#173;) can split words, e.g. in Jesper­sen there's a soft hyphen between n and t, but it should be analysed as one word. I've noticed this a lot in web pages, I guess a lot of
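
For readers who want to reproduce the splitting described in this thread, here is a minimal Python sketch. It is not Apertium code and does not show how Apertium tokenises input: naive_tokens is a hypothetical stand-in tokeniser, and strip_soft_hyphens is one possible workaround (dropping U+00AD before analysis), assumed here purely for illustration.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN, written &#173; in HTML


def naive_tokens(text):
    """Hypothetical stand-in tokeniser: each run of word characters is a token."""
    # U+00AD is a format character, not a word character, so it ends a token.
    return re.findall(r"\w+", text)


def strip_soft_hyphens(text):
    """Assumed workaround: remove soft hyphens before tokenising."""
    return text.replace(SOFT_HYPHEN, "")


# The example word from the thread, with the soft hyphen between r and s.
word = "Jesper" + SOFT_HYPHEN + "sen"

print(naive_tokens(word))                      # the word is split in two
print(naive_tokens(strip_soft_hyphens(word)))  # the word stays whole
```

Stripping the character loses the hyphenation hint, so a real pipeline would probably need to restore or ignore it on output; that part is not sketched here.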