>
>
> Do you still think there would be such a drastic difference in a lower
> density situation?
>

I think if you benchmark it there will be a crossover point. It would be
difficult to measure in the low-millisecond range, though.
If I were a betting person, I'd bet it would be low.


> Yeah I have looked into hilbert curve a little myself.  Do you think its an
> approach worth investigating? or will it add more complexity?


It depends. It solves the MBB (minimum bounding box) problem effectively,
without a doubt, but it doesn't help with polygons and, again, it doesn't
solve the distance calculation. It's a lot of complexity for 4ms, but that
may be important to some folks for whom distance isn't an issue.
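
For anyone who hasn't looked at the Hilbert curve, here is a minimal Python
sketch of the standard mapping from a grid cell to its position along the
curve. The grid size and the name hilbert_index are illustrative assumptions
only; this is not code from locallucene or Solr, and mapping lat/lon onto
the grid is left out.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    distance along the Hilbert curve. Hypothetical helper name."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes s*s cells; pick the quadrant's rank.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-square is in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Cells that are neighbours along the curve are neighbours on the grid,
# which is why a bounding box tends to map to a few contiguous index ranges.
order = sorted(
    ((hilbert_index(4, x, y), (x, y)) for x in range(4) for y in range(4))
)
```

A bounding box lookup would then become a handful of range queries over the
curve index, which is where the claimed speedup would come from.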


> I went down the road of doing the calculations in parallel, and
> addressing the cost of actual calculations themselves.  This has been
> pretty
> effective, but I am very interested in this new projection idea?


Threaded distance calculations get you into a pickle with the Solr leaf
index readers. There may be a base offset available today, but back in June
there wasn't, so you had to ensure the distance / doc id calculations were
performed in serial with the index readers, and that they were aware of the
previous index readers' max docs in order to determine the 'true' doc id.
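
To make the doc id bookkeeping concrete, here is a minimal Python sketch,
assuming each leaf reader reports a max doc count and numbers its own docs
from zero. The function names and the sample numbers are hypothetical; this
is not the Lucene or Solr API. Once the base offsets are known up front,
the per-segment work no longer has to run in serial.

```python
from concurrent.futures import ThreadPoolExecutor


def doc_bases(max_docs):
    """Base offset of each segment: the sum of max docs of all
    previous segments (hypothetical helper)."""
    bases, running = [], 0
    for max_doc in max_docs:
        bases.append(running)
        running += max_doc
    return bases


def to_global(base, local_hits):
    """Turn (local doc id, distance) pairs into (true doc id, distance)."""
    return [(base + doc, dist) for doc, dist in local_hits]


def merge_segments(max_docs, hits_per_segment):
    """Resolve true doc ids for every segment in parallel."""
    bases = doc_bases(max_docs)
    with ThreadPoolExecutor() as pool:
        # map() yields results in segment order, so output stays sorted.
        parts = pool.map(to_global, bases, hits_per_segment)
        return [hit for part in parts for hit in part]


# Three hypothetical segments with 1000, 250 and 4000 docs.
hits = merge_segments(
    [1000, 250, 4000],
    [[(3, 1.5)], [(0, 0.2), (249, 9.0)], [(7, 4.1)]],
)
```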

The new projection stuff is all coming down the pipeline as soon as I get a
couple of free weekends to tidy up the code.
The first part includes the polygon searching stuff I've been working on.
I have a Lucene implementation of it, but haven't had time to make a nice
wrapper for it in Solr. I'm hoping to see what happens with FunctionQueries
and pseudo fields as well.

The distance calculation is going to be based on the cartesian coordinates,
with refinement to use heavier calculations only when needed.
A first pass shows dramatic performance improvements, but it requires the
right projection to work.
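
As a rough illustration of the "cheap first, heavy only when needed" idea,
here is a Python sketch. It is NOT the new projection: it assumes a plain
equirectangular approximation for the cheap pass and haversine for the
refinement, and the 5% margin is an arbitrary illustrative value. All names
are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def planar_km(lat1, lon1, lat2, lon2):
    """Cheap flat-plane approximation (equirectangular).
    Does not handle the antimeridian; fine for short distances."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)


def haversine_km(lat1, lon1, lat2, lon2):
    """Heavier great-circle distance."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, point, radius_km, margin=0.05):
    """Accept or reject on the cheap distance when it is clearly inside
    or outside the radius; refine only in the band near the boundary."""
    d = planar_km(center[0], center[1], point[0], point[1])
    if d < radius_km * (1 - margin):
        return True
    if d > radius_km * (1 + margin):
        return False
    return haversine_km(center[0], center[1], point[0], point[1]) <= radius_km
```

With ~3,000 hits per result set, only those that land in the narrow band
around the radius would pay for the heavier calculation.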

I'm also going to include tools to visualize this stuff on a map. That's
been one of the things that's slowed me down a lot in the past: it was hard
to verify what I was doing, but recent Google Maps contributions and its
community have made that a lot easier.



On Tue, Dec 29, 2009 at 11:47 AM, Chris Male <gento...@gmail.com> wrote:

> On Tue, Dec 29, 2009 at 8:41 PM, patrick o'leary <pj...@pjaol.com> wrote:
>
> > Afraid I just took a sample set of data that was available to me at my
> > last job, and ran the test.
> > It kind of matched my expectations in terms of locallucene at the time,
> > and what Ure predicted for Trie.
> >
>
> Do you still think there would be such a drastic difference in a lower
> density situation?
>
>
> >
> > To give you an idea of its performance in production, the bounding box
> > retrieval for a single solr core of about 3 million docs
> > on a dual core 2.3GHz server with I think 8GB of RAM, was about 8 - 12ms
> > avg. And had ~ 3,000 results per result set.
> >
> > The slow part for geo search was always the distance calculation, not the
> > bounding box retrieval.
> > I've seen feedback where hilbert curve is meant to be faster again by
> > an average of 40%, so say 4-6 ms for bounding box retrieval.
> >
>
> Yeah I have looked into hilbert curve a little myself.  Do you think its an
> approach worth investigating? or will it add more complexity?
>
>
> > But that still doesn't solve the long haul of distance calculations,
> > which has been one of my focuses recently with a new projection and
> > distance calculation based upon that projection.
> >
> >
> Tell us more! Yeah I also ran into the cost of the distance calculations,
> which is why I went down the road of doing the calculations in parallel,
> and addressing the cost of actual calculations themselves.  This has been
> pretty effective, but I am very interested in this new projection idea?
>
>
> >
> >
> > On Tue, Dec 29, 2009 at 11:31 AM, Chris Male <gento...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I had never done any experiments comparing them, that was what I was
> > > hoping was going to be explored more and it seems you have done that.
> > > Do you have more statistics by chance?  Does the difference (which is
> > > pretty dramatic) stay a constant ratio as you change the density
> > > and/or distances?
> > >
> > > On Tue, Dec 29, 2009 at 8:25 PM, patrick o'leary <pj...@pjaol.com>
> > wrote:
> > >
> > > > Hmm, so it's faster to do 2 range searches than use the
> > > > TermEnumerator to find maybe 4-6 individual CartesianTier id's?
> > > >
> > > > I had similar approaches in the past like 2 years ago, that just
> > > > weren't fast enough, and I've even published comparisons with Trie
> > > > data types, and find CartesianTier id's
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/SOLR-773?focusedCommentId=12708605&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708605
> > > >
> > > > The speed of Trie matched what Ure's expectations were, about 100ms,
> > > > but Cartesian is just 12ms.
> > > >
> > > > The custom code, well you'd have to have custom code to figure out
> > > > the bounding box from a point, unless you want the user to figure
> > > > that out?
> > > > And the Cartesian stuff is pretty small, its underlying structure
> > > > can / and now does use Trie (simply because it's the only numeric
> > > > field cache interface common between lucene and solr).
> > > >
> > > > P
> > > >
> > > >
> > > > On Tue, Dec 29, 2009 at 11:11 AM, Chris Male (JIRA) <j...@apache.org> wrote:
> > > >
> > > > >
> > > > >    [
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795112#action_12795112
> > > > ]
> > > > >
> > > > > Chris Male commented on SOLR-1586:
> > > > > ----------------------------------
> > > > >
> > > > > Ah yes sorry TrieFields.  I don't see searching 2 fields as a
> > > > > downside since that's just an implementation detail like the
> > > > > Spatial Tile (which requires you to have up to 15 fields).
> > > > > Assuming you can use the Point FieldType to index an x and y
> > > > > field, then it just becomes another option like Spatial Tile.
> > > > > The fact they are supported out of box is part of the attraction,
> > > > > as it would reduce how much custom code has to be maintained.
> > > > >
> > > > > > Create Spatial Point FieldTypes
> > > > > > -------------------------------
> > > > > >
> > > > > >                 Key: SOLR-1586
> > > > > >                 URL:
> > https://issues.apache.org/jira/browse/SOLR-1586
> > > > > >             Project: Solr
> > > > > >          Issue Type: Improvement
> > > > > >            Reporter: Grant Ingersoll
> > > > > >            Assignee: Grant Ingersoll
> > > > > >            Priority: Minor
> > > > > >             Fix For: 1.5
> > > > > >
> > > > > >         Attachments: examplegeopointdoc.patch.txt,
> > > > > SOLR-1586-geohash.patch,
> > > > SOLR-1586.Mattmann.112209.geopointonly.patch.txt,
> > > > > SOLR-1586.Mattmann.112209.geopointonly.patch.txt,
> > > > > SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt,
> > > > > SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt,
> > > > > SOLR-1586.Mattmann.112509.geopointandgeohash.patch.txt,
> > > > > SOLR-1586.Mattmann.120709.geohashonly.patch.txt,
> > > > > SOLR-1586.Mattmann.121209.geohash.outarr.patch.txt,
> > > > > SOLR-1586.Mattmann.121209.geohash.outstr.patch.txt,
> > > > > SOLR-1586.Mattmann.122609.patch.txt, SOLR-1586.patch,
> SOLR-1586.patch
> > > > > >
> > > > > >
> > > > > > Per SOLR-773, create field types that hide the details of
> > > > > > creating tiers, geohash and lat/lon fields.
> > > > > > Fields should take in lat/lon points in a single form, as in:
> > > > > > <field name="foo">lat lon</field>
> > > > >
> > > > > --
> > > > > This message is automatically generated by JIRA.
> > > > > -
> > > > > You can reply to this email to add a comment to the issue online.
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Chris Male | Software Developer | JTeam BV.| www.jteam.nl
> > >
> >
>
>
>
> --
> Chris Male | Software Developer | JTeam BV.| www.jteam.nl
>
