Re: Language detection in TextCat

2009-12-07 Thread Marc Perkel
Matt Kettler wrote: Marc Perkel wrote: I'm wondering if the language detection in TextCat can be improved. Here's the situation. It appears that TextCat was designed to be inclusive. You list the languages you want and it returns many possibilities so as not to trigger unwanted

Re: Language detection in TextCat

2009-12-07 Thread LuKreme
On 7-Dec-2009, at 09:55, Marc Perkel wrote: "Any chance someone might be interested in a radical redesign? I think language exclusion would be an extremely effective spam deterrent, as email in a language you don't speak is definitely spam." Erm… not necessarily. As a general rule, this might

Re: Language detection in TextCat

2009-12-07 Thread Martin Gregorie
On Mon, 2009-12-07 at 08:55 -0800, Marc Perkel wrote: Except for very short messages, I would think that if you spell-checked the message in several languages and found that 80% of it was spelled correctly in one of them, you have a match. You wouldn't have to check every language, just start with some common
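
A rough sketch of the spell-check idea quoted above, for illustration only. It is not how TextCat works (TextCat compares n-gram profiles); the tiny word lists, the function name detect_language and the default language order are all made up here, and only the 80% figure comes from the message. Languages are tried in the order given, so the common ones can go first and the loop stops at the first match.

```python
import re

# Hypothetical toy word lists; a real check would load full dictionaries.
WORDLISTS = {
    "en": {"the", "cat", "sat", "on", "mat", "and", "is", "a"},
    "de": {"die", "katze", "sitzt", "auf", "der", "matte", "und", "ist"},
}


def detect_language(text, languages=("en", "de"), threshold=0.8):
    """Return the first language whose word list covers at least
    `threshold` of the words in `text`, or None if none does."""
    # Letters only: digits and punctuation never count as words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return None
    for lang in languages:  # most common languages first
        known = sum(1 for w in words if w in WORDLISTS[lang])
        if known / len(words) >= threshold:
            return lang  # stop at the first match
    return None
```

Very short messages would still be unreliable with this approach, as the message itself notes.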

Re: Language detection in TextCat

2009-12-07 Thread Matus UHLAR - fantomas
Please, could you configure your MUA to quote, instead of colouring? HTML mail sucks. On 07.12.09 08:55, Marc Perkel wrote: Any chance someone might be interested in a radical redesign? I think language exclusion would be an extremely effective spam deterrent as email in a language

Re: Language detection in TextCat

2009-12-07 Thread Marc Perkel
Martin Gregorie wrote: On Mon, 2009-12-07 at 08:55 -0800, Marc Perkel wrote: Except for very short messages, I would think that if you spell-checked the message in several languages and found that 80% of it was spelled correctly in one of them, you have a match. You wouldn't have to check every

RE: Language detection in TextCat

2009-12-07 Thread R-Elists
This should be fairly easy to do: configure SA with the language(s) you will accept and the ratio of misspellings to total words that you'll accept as meaning 'unwanted language' after numbers and HTML tags have been excluded from the check. Apply the test to the whole body of a
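
The test described above might look something like the sketch below. This is only an assumption about how such a check could be written, not an existing SpamAssassin option or plugin: the names misspelling_ratio and is_unwanted_language, the max_ratio default of 0.2 and the regex-based tag stripping are all hypothetical. HTML tags and numbers are dropped before counting, and the ratio is misspelled words over total words.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")     # crude HTML tag matcher (assumption)
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters; digits are skipped


def misspelling_ratio(body, dictionary):
    """Fraction of words in `body` that are not in `dictionary`,
    after HTML tags and numbers have been excluded."""
    text = TAG_RE.sub(" ", body)
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0  # nothing to judge, so do not flag the message
    misspelled = sum(1 for w in words if w not in dictionary)
    return misspelled / len(words)


def is_unwanted_language(body, dictionary, max_ratio=0.2):
    """True when the misspelling ratio exceeds the configured limit."""
    return misspelling_ratio(body, dictionary) > max_ratio
```

With a dictionary for each accepted language, a message would count as unwanted only if it exceeded the limit for every one of them.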

Re: Language detection in TextCat

2009-12-06 Thread Matt Kettler
Marc Perkel wrote: I'm wondering if the language detection in TextCat can be improved. Here's the situation. It appears that TextCat was designed to be inclusive. You list the languages you want and it returns many possibilities so as not to trigger unwanted falsely. What I'm doing is

Re: Language detection in TextCat

2009-12-06 Thread Henrik K
On Sun, Dec 06, 2009 at 11:49:25PM -0500, Matt Kettler wrote: Marc Perkel wrote: I'm wondering if the language detection in TextCat can be improved. Here's the situation. It appears that TextCat was designed to be inclusive. You list the languages you want and it returns many

Re: Language detection in TextCat

2009-12-06 Thread Matus UHLAR - fantomas
On 06.12.09 11:39, Marc Perkel wrote: I'm wondering if the language detection in TextCat can be improved. Here's the situation. It appears that TextCat was designed to be inclusive. You list the languages you want and it returns many possibilities so as not to trigger unwanted falsely.