Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Stevan Bajić
On Thu, 19 Nov 2009 12:23:17 +0100 Alexander Prinsier wrote: > On 11/19/2009 12:11 PM, Paul Cockings wrote: > > This sounds like a topic for 4.0 something or later. > > Yeah it's not for the current release :) No worries ;) > Paul is hot getting 3.9.0 out. :) btw: When do we consider 3.9.0 to

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Stevan Bajić
On Thu, 19 Nov 2009 11:11:13 + Paul Cockings wrote: > Alexander Prinsier wrote: > > > > Well Chinese people just list all western languages as spam... Not many > > people speak both categories. Anyway, at least that's how I do it now, > > and how many people could probably do it :) > > > >

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Stevan Bajić
On Thu, 19 Nov 2009 12:05:50 +0100 Alexander Prinsier wrote: > Well Chinese people just list all western languages as spam... Not many > people speak both categories. Anyway, at least that's how I do it now, > and how many people could probably do it :) > > So ok, dspam is used in Asia :) When

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Alexander Prinsier
On 11/19/2009 12:11 PM, Paul Cockings wrote: > This sounds like a topic for 4.0 something or later. Yeah it's not for the current release :) No worries ;) Alexander

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Paul Cockings
Alexander Prinsier wrote: Well Chinese people just list all western languages as spam... Not many people speak both categories. Anyway, at least that's how I do it now, and how many people could probably do it :) So ok, dspam is used in Asia :) When I find time I'll take a look at ICU. Alex

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Alexander Prinsier
On 11/19/2009 11:58 AM, Stevan Bajić wrote: >>> Alexander. What would you say about adding ICU and that character handling >>> into DSPAM? You seem to be capable to do it. Would be a nice thing to do. I >>> would not mind if you would take that task :) >> >> I think the performance penalty would

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Stevan Bajić
On Thu, 19 Nov 2009 11:12:33 +0100 Alexander Prinsier wrote: > On 11/19/2009 01:45 AM, Stevan Bajić wrote: > There's also IBM's ICU (open source library) if you need something > heavier. > > > Alexander. What would you say about adding ICU and that character handling > > into DSPAM?

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-19 Thread Alexander Prinsier
On 11/19/2009 01:45 AM, Stevan Bajić wrote: There's also IBM's ICU (open source library) if you need something heavier. > Alexander. What would you say about adding ICU and that character handling > into DSPAM? You seem to be capable to do it. Would be a nice thing to do. I > would not mind

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Thu, 19 Nov 2009 01:08:38 +0100 Alexander Prinsier wrote: > >> There's also IBM's ICU (open source library) if you need something heavier. > >> Alexander. What would you say about adding ICU and that character handling into DSPAM? You seem to be capable to do it. Would be a nice thing to do.

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Thu, 19 Nov 2009 01:08:38 +0100 Alexander Prinsier wrote: > On 11/18/2009 10:30 PM, Stevan Bajić wrote: > >> So you mean, you can break cyrillic/slavic at spaces too like Western > >> languages? So then it'll work? You just break everything you know at > >> spaces, and what you don't know, lik

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Alexander Prinsier
On 11/18/2009 10:30 PM, Stevan Bajić wrote: >> So you mean, you can break cyrillic/slavic at spaces too like Western >> languages? So then it'll work? You just break everything you know at >> spaces, and what you don't know, like Chinese, at UTF32 code points. >> > No. I did not say that. You said:
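
For readers following the thread, the idea being debated in the message above (split at spaces where a script uses spaces, and fall back to one token per code point for scripts such as Chinese) might look roughly like the sketch below. It is written in Python purely for illustration; it is not DSPAM code. The names `is_cjk` and `tokenize` are hypothetical, and the code point ranges are an assumption chosen for the example, not a complete list of scripts written without spaces.

```python
def is_cjk(ch):
    """Rough check for characters treated as one token each.

    The ranges below are an assumption for this sketch and are
    deliberately incomplete.
    """
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
    )


def tokenize(text):
    """Split on whitespace, but emit every CJK character as its own token."""
    tokens = []
    word = []

    def flush():
        # Emit the word collected so far, if any.
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if is_cjk(ch):
            flush()
            tokens.append(ch)
        elif ch.isspace():
            flush()
        else:
            word.append(ch)
    flush()
    return tokens
```

Latin and Cyrillic text is split at spaces here because both scripts use them, while each ideograph becomes a separate token. Whether per-character tokens are useful to a statistical filter is exactly what the thread is arguing about.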

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Wed, 18 Nov 2009 22:18:40 +0100 Alexander Prinsier wrote: > On 11/18/2009 09:53 PM, Stevan Bajić wrote: > >> Then do what you used to do for Western languages: tokenize using spaces > >> as separators. For other languages split every 4 bytes. > >> > > Not going to work. You see my name? It's s

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Wed, 18 Nov 2009 15:16:44 -0600 Kenneth Marshall wrote: > On Wed, Nov 18, 2009 at 10:09:16PM +0100, Stevan Bajić wrote: > > On Wed, 18 Nov 2009 14:29:53 -0600 > > Kenneth Marshall wrote: > > > > > > Alexander > > > > > > > I thought that UTF8, UTF-16 and UTF-32 can represent all the charac

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Alexander Prinsier
On 11/18/2009 09:53 PM, Stevan Bajić wrote: >> Then do what you used to do for Western languages: tokenize using spaces >> as separators. For other languages split every 4 bytes. >> > Not going to work. You see my name? It's slavic. And I am able to write in > other languages (9 of them) and in ot

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Kenneth Marshall
On Wed, Nov 18, 2009 at 10:09:16PM +0100, Stevan Bajić wrote: > On Wed, 18 Nov 2009 14:29:53 -0600 > Kenneth Marshall wrote: > > > > Alexander > > > > > I thought that UTF8, UTF-16 and UTF-32 can represent all the characters. > > In that case, why wouldn't you use the UTF8 equivalent? At the le

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Wed, 18 Nov 2009 14:29:53 -0600 Kenneth Marshall wrote: > > Alexander > > > I thought that UTF8, UTF-16 and UTF-32 can represent all the characters. > In that case, why wouldn't you use the UTF8 equivalent? At the least it > would save space. > * UTF-16 and UTF-32 are not widely used. * Wron

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Stevan Bajić
On Wed, 18 Nov 2009 21:20:58 +0100 Alexander Prinsier wrote: > Hello, > Hallo Alexander, > I'm separating the discussion about handling non-Western languages here. > > One solution, which is what is used by for example xml parsers, and > other kinds of software which want to do the right thi

Re: [Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Kenneth Marshall
On Wed, Nov 18, 2009 at 09:20:58PM +0100, Alexander Prinsier wrote: > Hello, > > I'm separating the discussion about handling non-Western languages here. > > One solution, which is what is used by for example xml parsers, and > other kinds of software which want to do the right thing (tm) at all

[Dspam-devel] Quick comment on non Western languages

2009-11-18 Thread Alexander Prinsier
Hello, I'm separating the discussion about handling non-Western languages here. One solution, which is what is used by for example xml parsers, and other kinds of software which want to do the right thing (tm) at all costs, is: Read in the message, using its encoding-type. Html, Xml, but also
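
The first step named in the message above, reading the message using its declared encoding, can be illustrated with a small sketch. This one assumes Python's standard `email` package rather than anything in DSPAM, and the function name `text_parts_as_unicode` is hypothetical. It only shows the decode-to-Unicode step; the rest of the proposal is cut off in this archive view and is not reconstructed here.

```python
from email import message_from_bytes


def text_parts_as_unicode(raw):
    """Return the text/* parts of a raw message as decoded Unicode strings."""
    msg = message_from_bytes(raw)
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and attachments
        # decode=True undoes base64 / quoted-printable and returns bytes
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Fall back to US-ASCII when no charset is declared (an assumption)
        charset = part.get_content_charset() or "us-ascii"
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: keep the bytes readable instead of failing
            texts.append(payload.decode("latin-1"))
    return texts
```

Once every part is a Unicode string, a tokenizer no longer has to care whether the sender used KOI8-R, Big5 or UTF-8.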