On 4/12/2010 4:58 PM, Martin Gregorie wrote:
I had quite a bit to do with phone numbers en masse a while back. My
initial reaction is that it's not easy: not only do phone numbers vary in
length between locales, but even such things as the 'international
dialing' and non-local-call prefixes vary from country to country.
That is certainly true of phone numbers in general, but I suspect it's not
true of cell phone numbers used for text-to-email. I don't have any non-US
examples to verify against, but it really wouldn't make sense for
providers to use international dialing codes in this case...at least not
a huge variety at any rate. I'm hoping that those in the non-US
community can contribute opinions. Maybe this problem isn't as complex
as it initially sounds.
On 4/12/2010 5:57 PM, Ted Mittelstaedt wrote:
The fundamental flaw
here is in the assumption that an all-number mailbox user ID is
virtually certain to be spam. It is not. Clearly, the default score
assignment to that rule is too high.
That could certainly be true, and it may turn out that running the proposed
tests just isn't worth the CPU cycles. Only a test against the corpus
will say with any degree of certainty. Sadly, I don't have the Perl
skills to make that judgment, hence my appeal to the community for
ideas, opinions, and possible code to test the theory.
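
To make the corpus check a bit more concrete, here is a rough sketch in
Python rather than Perl. It is only an illustration, not an actual
SpamAssassin rule or plugin: the function names are made up, the example
addresses use placeholder domains, and the 7-to-15-digit range is my own
assumption about what an "all-number" mailbox looks like. It just counts
how often an all-digit local part shows up among spam senders versus ham
senders.

```python
import re
from email.utils import parseaddr

# Assumption: an "all-number" mailbox is a local part of 7 to 15 ASCII digits.
ALL_DIGITS = re.compile(r"[0-9]{7,15}")


def is_numeric_mailbox(from_header):
    """Return True if the sender's local part consists only of digits."""
    _name, addr = parseaddr(from_header)
    local, sep, _domain = addr.rpartition("@")
    # No "@" means there is no usable address to test.
    return bool(sep) and ALL_DIGITS.fullmatch(local) is not None


def hit_rates(spam_senders, ham_senders):
    """Return the fraction of spam and of ham senders the check would hit."""
    def rate(senders):
        senders = list(senders)
        if not senders:
            return 0.0
        return sum(is_numeric_mailbox(s) for s in senders) / len(senders)

    return rate(spam_senders), rate(ham_senders)
```

If the ham rate comes out anywhere near the spam rate on a real corpus,
that would support the point that the rule's default score is too high.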
/Jason