On Thu, Feb 16, 2012 at 3:37 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Everybody start from daily bounce, but end up with UPDATED_AT column and
> delta updates , just consider urgent content fix usecase. Don't think it's
> worth to rely on daily bounce as a cornerstone of architecture.
>

I'd be happy to avoid it, for all the obvious reasons.

I do know that the performance of this type of service tends to be not that
great (as in "700 to 5000 msec"), and there should be ways to make it several
times faster than that.


> you can use grid of coordinates to reduce their entropy


I don't understand this statement. Can you elaborate, please?

Since my bounding boxes are small, one [premature optimization] idea could
be to divide Earth into 2x2 degree overlapping tiles at 1 degree step in
both directions (such that any bounding box fits within at least one of
them, and any location belongs to 4 of them), then use tileId=X as a cached
filter and geofilt as a post-filter. Is that along the lines of what you
are talking about?
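
To make the tiling idea concrete, here is a minimal sketch of the tile arithmetic, assuming tiles are identified by their south-west corner in whole degrees. The `tile_id` naming scheme and all function names are hypothetical, not anything Solr provides. The longitude is wrapped so that tiles straddling the date line get a single ID; boxes crossing the date line are assumed to be passed with `max_lon` above 180.

```python
import math

TILE_SIZE = 2  # degrees per tile side
STEP = 1       # degrees between neighbouring tile origins


def tile_id(lon0: int, lat0: int) -> str:
    """Hypothetical ID: south-west corner, longitude wrapped into [-180, 180)."""
    wrapped = (lon0 + 180) % 360 - 180
    return f"{wrapped}_{lat0}"


def tiles_for_point(lon: float, lat: float) -> list[str]:
    """The 4 overlapping tiles containing a point (stored at index time)."""
    i, j = math.floor(lon), math.floor(lat)
    # A 2-degree tile starting at i or i-1 covers the cell [i, i+1).
    return [tile_id(i - dx, j - dy) for dx in (0, 1) for dy in (0, 1)]


def tile_for_bbox(min_lon: float, min_lat: float,
                  max_lon: float, max_lat: float) -> str:
    """One tile that fully contains a box no larger than STEP in each direction."""
    if max_lon - min_lon > STEP or max_lat - min_lat > STEP:
        raise ValueError("bounding box exceeds 1 degree; it may not fit in one tile")
    # The tile starting at floor(min) spans 2 degrees, so it reaches past max.
    return tile_id(math.floor(min_lon), math.floor(min_lat))
```

Each document would carry the four IDs from `tiles_for_point` in a multivalued field, and a query would filter on the single ID from `tile_for_bbox` before applying the exact distance check.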


> > from what little I understand about
> > Lucene internals, caching of filters probably doesn't make sense either.
> But solr does it http://wiki.apache.org/solr/SolrCaching#filterCache
> <http://yonik.wordpress.com/2012/02/10/advanced-filter-caching-in-solr/>
>

I didn't realize that multiple fq's in the same query were applied
independently, as set intersections. In that case, the non-geography filters
should be cached (and added to the prewarming routine, I guess) even though
they are usually far less specific than the bounding box. Makes sense.
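
As a rough illustration of that setup, the sketch below builds request parameters with one `fq` per filter, so each cacheable filter gets its own filterCache entry, while the spatial filter is marked `cache=false` with a high `cost`, as described in the advanced filter caching post linked above. The field names `category`, `tileId` and `location`, and the `build_params` helper, are assumptions for illustration. Whether geofilt actually runs as a post-filter depends on the Solr version and field type, so treat this as something to verify in the spike.

```python
from urllib.parse import urlencode


def build_params(text, category, tile, lat, lon, radius_km):
    """Build Solr query parameters as (name, value) pairs; fq may repeat."""
    geo = (f"{{!geofilt cache=false cost=100 "
           f"sfield=location pt={lat},{lon} d={radius_km}}}")
    return [
        ("q", text),
        ("fq", f"category:{category}"),  # cached, reusable across requests
        ("fq", f"tileId:{tile}"),        # cached, coarse spatial pre-filter
        ("fq", geo),                     # not cached, evaluated last
    ]


# Example: encode the parameters into a query string for /select.
query_string = urlencode(build_params("pizza", "restaurant", "9_47", 48.3, 10.2, 5))
```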


> > 1. Search server is an internal service that uses embedded Solr for the
> > indexing part. RAMDirectoryFactory as index storage.
> Bad idea. It's purposed mostly for tests, the closest purposed for
> production analogue is
> org.apache.lucene.store.instantiated.InstantiatedIndex
>
...

> AFAIK the state of art is use file directory (MMAP or whatever), rely on
> Linux file system RAM cache.
>

OK, I may as well start the spike from this angle, too. By the way, this is
precisely the kind of advice I was hoping for. Thanks a lot.

> > 5. All Solr caching is switched off.
>
> But why?
>

Because (a) I shouldn't need to cache documents if they are all in memory
anyway; (b) query caching will have an abysmal hit ratio because of the
spatial component; and (c) I misunderstood how query filters work. So now I'm
thinking of a FastLFU filter cache for the non-geo filters.


> Btw, if you need multivalue geofield pls vote for SOLR-2155
>
Our data has one lon/lat pair per entity... so no, I don't need it. Or at
least haven't figured out that I do yet. :)

-- 
Alexey Verkhovsky
http://alex-verkhovsky.blogspot.com/
CruiseControl.rb [http://cruisecontrolrb.thoughtworks.com]
