> From: Dan Barker [mailto:[EMAIL PROTECTED]
>
> Giampaolo: I hope you succeed.
>
> I've given up hope on convincing folks (Mapquest in particular) that radius
> searches can be indexed. You needn't pull the lat/long of every single entry
> to run the distance function, and then discard the ones too far away. You
> can index on LAT and LONG and structure the query such that only the
> "possible" lat/long values need the distance function (and the rest of the
> record fetched) evaluated.
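
A minimal sketch of the indexed radius search Dan describes, assuming a hypothetical `places` table with `lat` and `lon` columns in SQLite. The table, its rows, and the function names are illustrative only and do not come from the thread. A bounding box computed from the radius lets the index discard most rows, so the distance function only runs on the "possible" candidates. The sketch ignores the wrap at the 180° meridian.

```python
import math
import sqlite3

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def build_demo_db():
    """Create a small in-memory table (hypothetical schema) indexed on lat/lon."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL)")
    conn.execute("CREATE INDEX idx_places_lat_lon ON places (lat, lon)")
    conn.executemany(
        "INSERT INTO places VALUES (?, ?, ?)",
        [
            ("Washington", 38.9072, -77.0369),
            ("Baltimore", 39.2904, -76.6122),
            ("New York", 40.7128, -74.0060),
        ],
    )
    return conn


def radius_search(conn, lat, lon, radius_km):
    """Return names within radius_km of (lat, lon)."""
    r = radius_km / EARTH_RADIUS_KM          # angular radius in radians
    dlat = math.degrees(r)
    cos_lat = math.cos(math.radians(lat))
    sin_r = math.sin(r)
    # Longitude half-width of the box; fall back to all longitudes near a pole.
    dlon = math.degrees(math.asin(sin_r / cos_lat)) if cos_lat > sin_r else 180.0

    # Step 1: cheap, index-friendly range filter on the bounding box.
    rows = conn.execute(
        "SELECT name, lat, lon FROM places "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
        (lat - dlat, lat + dlat, lon - dlon, lon + dlon),
    )
    # Step 2: exact distance only for the rows that survived the box.
    return [
        name
        for name, plat, plon in rows
        if haversine_km(lat, lon, plat, plon) <= radius_km
    ]


if __name__ == "__main__":
    db = build_demo_db()
    print(radius_search(db, 38.9072, -77.0369, 100.0))
```

The box is a superset of the circle, so the second step is still needed to drop the corner cases, but it runs on a handful of rows rather than the whole table.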

Right.

> Just because it's two orders of magnitude more efficient doesn't make
> anybody listen.
>
> Same conversation, different universe!

You mean that it is probably a concept too far away from the origin of
someone's comprehensibility space? :)

giampaolo

> Dan
>
> -----Original Message-----
> From: Giampaolo Tomassoni [mailto:[EMAIL PROTECTED]
> Sent: Monday, January 08, 2007 2:00 PM
> To: [EMAIL PROTECTED]; users@spamassassin.apache.org
> Subject: RE: [Devel-spam] FuzzyOcr 3.5.1 released
>
> > From: Andy Dills [mailto:[EMAIL PROTECTED]
> >
> > ...omissis...
> >
> > > I understand that the "order" keyword in select is potentially
> > > expensive, but necessary because matches occur generally towards the
> > > most recent entries, thus increasing the possibility of a match
> > > earlier on. When your hash count is in the thousands, earlier matches
> > > mean fewer queries to the database, and potentially faster results.
> >
> > It's not just the order directive, it's the iteration throughout the
> > entire database.
> >
> > Consider when the database grows to >50k records. For a new image that
> > doesn't have a hash, that's 50k records that must be sorted then sent
> > from the DB server to the mail server, then all 50k records must be
> > checked against the hash before we decide that we haven't seen this
> > image before. That just isn't a workable algorithm. If iteration
> > throughout the entire database is a requirement, hashing is a
> > performance hit rather than a performance gain.
> >
> > A better solution might be a separate daemon that holds the hashes in
> > memory, to which you submit the hash being considered.
>
> Other ways could be the ones depicted in my recent post (Message-ID:
> <[EMAIL PROTECTED]>), in which close images
> are basically clustered together thanks to a surrogate index.
>
> giampaolo
>
> > Honestly, I have been extremely impressed with having hashing turned
> > completely off.
> >
> > Andy
> >
> > ---
> > Andy Dills
> > Xecunet, Inc.
> > www.xecu.net
> > 301-682-9972
> > ---
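
The two suggestions quoted above, holding the hashes in memory and clustering close images through a surrogate index, can be sketched together as follows. Everything here is an assumption made for illustration and is not how FuzzyOcr stores or compares hashes: a hash is taken to be a tuple of integers, two hashes are "close" when every component differs by at most `tol`, and the first component serves as the surrogate key. Keeping the keys sorted means a lookup only compares the candidates whose key falls inside the tolerance window instead of iterating over the whole store.

```python
import bisect


class HashIndex:
    """In-memory store of image hashes, sorted by a surrogate key.

    Hypothetical model: a hash is a tuple of ints and the surrogate key
    is its first component.
    """

    def __init__(self):
        self._keys = []    # sorted surrogate keys
        self._hashes = []  # hashes, kept in the same order as _keys

    def add(self, h):
        key = h[0]
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._hashes.insert(i, h)

    def find_close(self, h, tol):
        """Return a stored hash within tol of h on every component, or None."""
        # Only entries whose key lies in [h[0] - tol, h[0] + tol] can match.
        lo = bisect.bisect_left(self._keys, h[0] - tol)
        hi = bisect.bisect_right(self._keys, h[0] + tol)
        for cand in self._hashes[lo:hi]:
            if len(cand) == len(h) and all(
                abs(a - b) <= tol for a, b in zip(cand, h)
            ):
                return cand
        return None


if __name__ == "__main__":
    index = HashIndex()
    for known in [(10, 20, 30), (50, 60, 70), (11, 21, 29)]:
        index.add(known)
    print(index.find_close((10, 21, 30), tol=1))
    print(index.find_close((100, 0, 0), tol=1))
```

A daemon would wrap such an index behind a socket so the mail server submits one hash and gets back a match or a miss, rather than pulling every record across the wire.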