On 4/12/2010 4:58 PM, Martin Gregorie wrote:
I had quite a bit to do with phone numbers en masse a while back. My
initial reaction is that it's not easy: not only do phone numbers vary in
length between locales, but even such things as the 'international
dialing' and non-local-call prefix vary from country to country.
That is certainly true of phone numbers in general, but I suspect it isn't for cell phone numbers used as text-to-email addresses. I don't have any non-US examples to verify against, but it really wouldn't make sense for providers to use international dialing codes in this case, or at least not a huge variety of them. I'm hoping that those in the non-US community can contribute opinions. Maybe this problem isn't as complex as it initially sounds.

On 4/12/2010 5:57 PM, Ted Mittelstaedt wrote:
The fundamental flaw here is in the assumption that an all-number mailbox user ID is virtually certain to be spam. It is not. Clearly, the default score assignment to that rule is too high.

That could certainly be true, and it may turn out that the proposed tests just aren't worth the CPU cycles. Only a test against the corpus will say with any degree of certainty. Sadly, I don't have the Perl skills to make that judgment, hence my appeal to the community for ideas, opinions, and possible code to test the theory.
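
To make the idea concrete, here is a rough sketch (in Python rather than Perl) of the kind of check a corpus run could tally. Everything in it is an assumption for illustration: the gateway domains are just a few US examples that are not taken from this thread, the 10-digit pattern only covers US-style numbers, and classify_sender and tally are hypothetical names, not anything in SpamAssassin.

```python
import re
from collections import Counter

# Assumption: a few illustrative US text-to-email gateway domains.
# A real test would need a maintained list, including non-US providers.
SMS_GATEWAY_DOMAINS = {"vtext.com", "txt.att.net", "tmomail.net"}

# Assumption: US-style numbers only, 10 digits with an optional leading 1.
US_NUMBER = re.compile(r"1?[0-9]{10}")
ALL_DIGITS = re.compile(r"[0-9]+")


def classify_sender(address):
    """Return 'sms_gateway', 'numeric_other', or 'non_numeric'."""
    local, sep, domain = address.strip().rpartition("@")
    # No "@" at all, or a local part that is not purely digits.
    if not sep or not ALL_DIGITS.fullmatch(local):
        return "non_numeric"
    # All-digit local part at a known gateway, shaped like a phone number.
    if domain.lower() in SMS_GATEWAY_DOMAINS and US_NUMBER.fullmatch(local):
        return "sms_gateway"
    # All-digit local part anywhere else.
    return "numeric_other"


def tally(addresses):
    """Count how many sender addresses fall into each class."""
    return Counter(classify_sender(a) for a in addresses)
```

Running tally over the sender addresses of the ham and spam halves of a corpus separately would show whether all-digit senders at gateway domains behave differently from all-digit senders elsewhere, which is the question the score for that rule depends on.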

/Jason
