Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Stevan Bajić
On Thu, 19 Nov 2009 12:23:17 +0100 Alexander Prinsier wrote: > On 11/19/2009 12:11 PM, Paul Cockings wrote: > > This sounds like a topic for 4.0 something or later. > > Yeah it's not for the current release :) No worries ;) > Paul is hot getting 3.9.0 out. :) btw: When do we consider 3.9.0 to

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Stevan Bajić
On Thu, 19 Nov 2009 11:11:13 + Paul Cockings wrote: > Alexander Prinsier wrote: > > > > Well Chinese people just list all western languages as spam... Not many > > people speak both categories. Anyway, at least that's how I do it now, > > and how many people could probably do it :) > > > >

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Stevan Bajić
On Thu, 19 Nov 2009 12:05:50 +0100 Alexander Prinsier wrote: > Well Chinese people just list all western languages as spam... Not many > people speak both categories. Anyway, at least that's how I do it now, > and how many people could probably do it :) > > So ok, dspam is used in Asia :) When

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Alexander Prinsier
On 11/19/2009 12:11 PM, Paul Cockings wrote: > This sounds like a topic for 4.0 something or later. Yeah it's not for the current release :) No worries ;) Alexander

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Paul Cockings
Alexander Prinsier wrote: Well Chinese people just list all western languages as spam... Not many people speak both categories. Anyway, at least that's how I do it now, and how many people could probably do it :) So ok, dspam is used in Asia :) When I find time I'll take a look at ICU. Alex

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Alexander Prinsier
On 11/19/2009 11:58 AM, Stevan Bajić wrote: >>> Alexander. What would you say about adding ICU and that character handling >>> into DSPAM? You seem to be capable to do it. Would be a nice thing to do. I >>> would not mind if you would take that task :) >> >> I think the performance penalty would

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Stevan Bajić
On Thu, 19 Nov 2009 11:12:33 +0100 Alexander Prinsier wrote: > On 11/19/2009 01:45 AM, Stevan Bajić wrote: > There's also IBM's ICU (open source library) if you need something > heavier. > > > Alexander. What would you say about adding ICU and that character handling > > into DSPAM?

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Alexander Prinsier
On 11/19/2009 01:45 AM, Stevan Bajić wrote: There's also IBM's ICU (open source library) if you need something heavier. > Alexander. What would you say about adding ICU and that character handling > into DSPAM? You seem to be capable to do it. Would be a nice thing to do. I > would not mind

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Thu, 19 Nov 2009 01:08:38 +0100 Alexander Prinsier wrote: > >> There's also IBM's ICU (open source library) if you need something heavier. > >> Alexander. What would you say about adding ICU and that character handling into DSPAM? You seem to be capable to do it. Would be a nice thing to do.

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Thu, 19 Nov 2009 01:08:38 +0100 Alexander Prinsier wrote: > On 11/18/2009 10:30 PM, Stevan Bajić wrote: > >> So you mean, you can break cyrillic/slavic at spaces too like Western > >> languages? So then it'll work? You just break everything you know at > >> spaces, and what you don't know, lik

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Alexander Prinsier
On 11/18/2009 10:30 PM, Stevan Bajić wrote: >> So you mean, you can break cyrillic/slavic at spaces too like Western >> languages? So then it'll work? You just break everything you know at >> spaces, and what you don't know, like Chinese, at UTF32 code points. >> > No. I did not say that. You said:
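
For readers following the thread, the idea being debated above (split at spaces where the script uses them, split per code point where it does not) can be sketched roughly as below. This is only an illustration, not DSPAM code: the function names `is_cjk` and `tokenize` and the chosen code point ranges are assumptions made for the example, and the ranges are far from a complete list of scripts written without spaces.

```python
def is_cjk(ch):
    """Rough check for scripts written without spaces.

    Assumption for this sketch: only CJK Unified Ideographs,
    Extension A, Hiragana and Katakana are covered.
    """
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Extension A
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
    )


def tokenize(text):
    """Split at whitespace, but emit each CJK code point as its own token."""
    tokens = []
    word = []
    for ch in text:
        if is_cjk(ch):
            # Flush any pending space-delimited word first.
            if word:
                tokens.append("".join(word))
                word = []
            tokens.append(ch)
        elif ch.isspace():
            if word:
                tokens.append("".join(word))
                word = []
        else:
            word.append(ch)
    if word:
        tokens.append("".join(word))
    return tokens


if __name__ == "__main__":
    print(tokenize("Stevan Bajić 中文 spam"))
```

Note that a Slavic name such as "Bajić" stays one token here because the split is decided per character, not per message. Working on decoded code points rather than fixed 4-byte chunks also avoids cutting a multi-byte character in half.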

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Wed, 18 Nov 2009 22:18:40 +0100 Alexander Prinsier wrote: > On 11/18/2009 09:53 PM, Stevan Bajić wrote: > >> Then do what you used to do for Western languages: tokenize using spaces > >> as separators. For other languages split every 4 bytes. > >> > > Not going to work. You see my name? It's s

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Wed, 18 Nov 2009 15:16:44 -0600 Kenneth Marshall wrote: > On Wed, Nov 18, 2009 at 10:09:16PM +0100, Stevan Bajić wrote: > > On Wed, 18 Nov 2009 14:29:53 -0600 > > Kenneth Marshall wrote: > > > > > > Alexander > > > > > > > I thought that UTF8, UTF-16 and UTF-32 can represent all the charac

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Alexander Prinsier
On 11/18/2009 09:53 PM, Stevan Bajić wrote: >> Then do what you used to do for Western languages: tokenize using spaces >> as separators. For other languages split every 4 bytes. >> > Not going to work. You see my name? It's slavic. And I am able to write in > other languages (9 of them) and in ot

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Kenneth Marshall
On Wed, Nov 18, 2009 at 10:09:16PM +0100, Stevan Bajić wrote: > On Wed, 18 Nov 2009 14:29:53 -0600 > Kenneth Marshall wrote: > > > > Alexander > > > > > I thought that UTF8, UTF-16 and UTF-32 can represent all the characters. > > In that case, why wouldn't you use the UTF8 equivalent? At the least it > > would save space. At the le

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Wed, 18 Nov 2009 14:29:53 -0600 Kenneth Marshall wrote: > > Alexander > > > I thought that UTF8, UTF-16 and UTF-32 can represent all the characters. > In that case, why wouldn't you use the UTF8 equivalent? At the least it > would save space. > * UTF-16 and UTF-32 are not widely used. * Wron
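
As a side note on the "would save space" point quoted above, a small sketch can show how many bytes one character takes in each encoding. The helper name `encoded_sizes` is made up for this example, and the little-endian codec variants are used only so that no byte order mark is counted.

```python
def encoded_sizes(text):
    """Return the encoded length of `text` in bytes for three Unicode encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le: no BOM in the count
        "utf-32": len(text.encode("utf-32-le")),  # -le: no BOM in the count
    }


if __name__ == "__main__":
    for sample in ("a", "ć", "中"):
        print(sample, encoded_sizes(sample))
```

UTF-8 is smallest for ASCII text, while for CJK text UTF-16 is usually more compact than UTF-8. UTF-32 is always four bytes per code point, which is what makes fixed-width splitting simple at the cost of storage.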

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Wed, 18 Nov 2009 21:20:58 +0100 Alexander Prinsier wrote: > Hello, > Hallo Alexander, > I'm separating the discussion about handling non-Western languages here. > > One solution, which is what is used by for example xml parsers, and > other kinds of software which want to do the right thi

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Kenneth Marshall
On Wed, Nov 18, 2009 at 09:20:58PM +0100, Alexander Prinsier wrote: > Hello, > > I'm separating the discussion about handling non-Western languages here. > > One solution, which is what is used by for example xml parsers, and > other kinds of software which want to do the right thing (tm) at all