Re: How real-time are Solr/Lucene queries?

Amit Nithian Tue, 25 May 2010 23:25:14 -0700

This is an interesting discussion and I have a few questions:
1) My apologies but I haven't been following the NRT patch beyond what was
presented at a meetup some months back and the wiki but what is the status
of it in Solr?
2) What are typical/accepted definitions of "Real Time" vs "Near Real Time"?
==> Related to Grant's points earlier.
3) I can understand POSTing a document to a server and then turning around
and searching for it on the same server, but what about a replicated
environment? How do you prevent caches from being blown away and constantly
re-warmed (and the performance degradation that follows)? I could set Solr
replication to run at an interval of one minute or less, but then the caches
would be regenerated on every cycle, which I suspect is not cheap.
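
To make the concern in (3) concrete, here is a toy Java sketch of what I mean.
It is not how Solr's caches are implemented; the class and method names are
made up for illustration, and "clearing the map" only stands in for a new
searcher being opened after each replication cycle.

```java
import java.util.HashMap;
import java.util.Map;

final class WarmingCostSketch {

    /**
     * Replays a stream of query ids against a simple map-backed cache and
     * counts the misses. If flushEvery is greater than 0, the cache is
     * emptied before every flushEvery-th query (hypothetical stand-in for
     * a replication cycle invalidating the caches).
     */
    static int countMisses(int[] queryIds, int flushEvery) {
        Map<Integer, String> cache = new HashMap<>();
        int misses = 0;
        for (int i = 0; i < queryIds.length; i++) {
            if (flushEvery > 0 && i > 0 && i % flushEvery == 0) {
                cache.clear(); // caches start cold again
            }
            if (!cache.containsKey(queryIds[i])) {
                misses++; // would have to hit the index here
                cache.put(queryIds[i], "result-" + queryIds[i]);
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        int[] queries = {1, 2, 1, 2, 1, 2, 1, 2};
        System.out.println("never flushed:  " + countMisses(queries, 0));
        System.out.println("flush every 4:  " + countMisses(queries, 4));
        System.out.println("flush every 2:  " + countMisses(queries, 2));
    }
}
```

The more often the cache is emptied relative to the query traffic, the closer
every query gets to being a miss, which is the degradation I am worried about.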


Note: I am curious about this from a typical web-based application
perspective, as opposed to the embedded/desktop application context
described earlier in this thread.

Thanks!
Amit

On Tue, May 25, 2010 at 10:28 AM, Thomas J. Buhr
<t...@superstringmedia.com> wrote:

> My documents are all quite small if not downright tiny, and there is not much
> analysis to do. I plan to mainly use Solr for indexing application
> configuration data which there is a lot of and I have all pre-formatted.
> Since it is a music application there are many score templates, scale and
> rhythm strings, notation symbol skins, etc. Then there are slightly more
> usual things to index like application help pages and tutorials.
>
> In terms of queries per second there will be a lot being fired by our
> painter. In our application data is flowing into a painter who in turn
> delegates specific painting tasks to renderer objects. These renderer
> objects then make many queries extremely fast to the embedded Solr indexes
> for data they need, such as layout and style values.
>
> Believe me there is a lot of detailed data involved in music notation and
> abstracting it into configurations in the form of index documents is a good
> way to manage such data. Further, the data in the form of documents works as
> a form of plugin so that alternate configurations for different notation
> types can be added to the index. Then via simple search it is possible to
> dial up a certain set of documents which contain all the details of a given
> notation. Meanwhile the renderer objects remain generic and are just
> reconfigured with the different indexed configuration documents.
>
> Will making many fast queries from renderers to an embedded local Solr
> index slow my painting down?
>
> Thom
>
>
> On 2010-05-25, at 6:09 AM, Grant Ingersoll wrote:
>
> > How many docs are in the batch you are pulling down?  How many
> docs/second do you expect on the index size?  How big are the docs?  What do
> you expect in terms of queries per second?  How fast do new documents need
> to be available on the local server?  How much analysis do you have to do?
>  Also, define Real Time.  You'd be surprised at the number of people I talk
> to who think they need Real Time, but then when you ask them questions like
> I just did, they don't really need it.  I've seen Solr turn around new docs
> in as little as 30 seconds on commodity hardware w/o any special engineering
> effort and I've seen it faster than that with some engineering effort.  That
> isn't necessarily possible for every application, but...
> >
> > Despite the other suggestions, what you describe still looks feasible to
> me in Solr, pending the questions above (and some followups).
> >
> >
> > On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:
> >
> >> Thanks for the new information. It's really great to see so many options
> for Lucene.
> >>
> >> In my scenario there are the following pieces:
> >>
> >> 1 - A local Java client with an embedded Solr instance and its own local
> index/s.
> >> 2 - A remote server running Solr with index/s that are more like a
> repository that local clients query for extra goodies.
> >> 3 - The client is also a JXTA node so it can share indexes or documents
> too.
> >> 4 - There is no browser involved whatsoever.
> >>
> >> My music composing application is a local client that uses
> configurations which would become many different document types. A subset of
> these configurations will be bundled with the application and then many more
> would be made available via a server/s running Solr.
> >>
> >> I would not expect the queries which would be made from within the local
> client to be returned in real-time. I would only expect such queries to be
> made in reasonable time and returned to the client. The client would have
> its local Lucene index system (embedded Solr using SolrJ) which would be
> updated with the results of the query made to the Solr instance running on
> the remote server.
> >>
> >> Then the user on the client would issue queries to the local Lucene
> index/s to obtain results which are used to set up contexts for different
> aspects of the client. For example: an activated context for musical scales
> and rhythms used for creating musical notes, an activated context for
> rendering with layout and style information for different music symbol
> renderer types.
> >>
> >> I'm not yet sure but it may be best to make queries against the local
> Lucene index/s and then convert the results into some context objects, maybe
> an array or map (I'd like to learn more about how query results can be
> returned as arrays or maps as well). Then the tools and renderers which
> require the information in the contexts would do any real-time lookup
> directly from the context objects not the local or remote Lucene or Solr
> index/s. The local client is also a JXTA node so it can share its own
> index/s with fellow peers.
> >>
> >> This is how I envision this happening with my limited knowledge of
> Lucene/Solr at this time. What are your thoughts on the feasibility of such
> a scenario?
> >>
> >> I'm just reading through the Solr reference PDF now and looking over the
> Solr admin application. Looking at the Schema.xml it seems to be field not
> document oriented. From my point of view I think in terms of configuration
> types which would be documents. In the schema it seems like only fields are
> defined and it does not matter which configuration/document they belong to?
> I guess this is fine as long as the indexing takes into account my unique
> document types and I can search for them as a whole as well, not only for
> specific values across a set of indexed documents.
> >>
> >> Also, does the schema allow me to index certain documents into specific
> indexes or are they all just bunched together? I'd rather have unique
> indexes for specific document types. I've just read about multiple cores
> running under one Solr instance, is this the only way to support multiple
> indexes?
> >>
> >> I'm thinking of ordering the Lucene in Action v2 book which is due this
> month and also the Solr 1.4 book. Before I do I just need to understand a
> few things which is why I'm writing such a long message :-)
> >>
> >> Thom
> >>
> >>
> >> On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
> >>
> >>> Further to earlier note re Lucandra.  I note that Cassandra, which
> Lucandra backs onto,  is 'eventually consistent',  so given your real-time
> requirements,  you may want to review this in the first instance, if
> Lucandra is of interest.
> >>>
> >>> On 21 May 2010, at 06:12, Walter Underwood wrote:
> >>>
> >>>> Solr is a very good engine, but it is not real-time. You can turn off
> the caches and reduce the delays, but it is fundamentally not real-time.
> >>>>
> >>>> I work at MarkLogic, and we have a real-time transactional search
> engine (and repository). If you are curious, contact me directly.
> >>>>
> >>>> I do like Solr for lots of applications -- I chose it when I was at
> Netflix.
> >>>>
> >>>> wunder
> >>>>
> >>>> On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:
> >>>>
> >>>>> Hello Solr,
> >>>>>
> >>>>> Solr looks like an excellent API and it's nice to have a tutorial that
> makes it easy to discover the basics of what Solr does, I'm impressed. I can
> see plenty of potential uses of Solr/Lucene and I'm interested now in just
> how real-time the queries made to an index can be?
> >>>>>
> >>>>> For example, in my application I have time ordered data being
> processed by a paint method in real-time. Each piece of data is identified
> and its associated renderer is invoked. The Java2D renderer would then
> lookup any layout and style values it requires to render the current data it
> has received from the layout and style indexes. What I'm wondering is if
> this lookup which would be a Lucene search will be fast enough?
> >>>>>
> >>>>> Would it be best to make Lucene queries for the relevant layout and
> style values required by the renderers ahead of rendering time and have the
> query results placed into the most performant collection (map/array) so
> renderer lookup would be as fast as possible? Or can Lucene handle many
> individual lookup queries fast enough so rendering is quick?
> >>>>>
> >>>>> Best regards from Canada,
> >>>>>
> >>>>> Thom
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
> >
> >
>
>
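
For what it's worth, the "query ahead of rendering time and keep the results
in a map" idea from Thom's first message could look roughly like the sketch
below. It is only a sketch under stated assumptions: the lookup function
stands in for whatever Solr/Lucene query is actually used, and RenderContext,
preload and the key names are hypothetical, not part of SolrJ or Lucene.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Immutable snapshot of layout/style values, loaded once before painting. */
final class RenderContext {

    private final Map<String, String> values;

    private RenderContext(Map<String, String> values) {
        this.values = values;
    }

    /**
     * Runs indexLookup once per key and stores the non-null results.
     * indexLookup is an assumption: a placeholder for a real index query.
     */
    static RenderContext preload(List<String> keys,
                                 Function<String, String> indexLookup) {
        Map<String, String> loaded = new HashMap<>();
        for (String key : keys) {
            String value = indexLookup.apply(key);
            if (value != null) {
                loaded.put(key, value);
            }
        }
        return new RenderContext(Collections.unmodifiableMap(loaded));
    }

    /** Plain in-memory lookup; never touches the index. */
    String get(String key, String fallback) {
        return values.getOrDefault(key, fallback);
    }
}
```

The renderers would then call get() inside the paint loop, so the index is
only queried when a context is activated or rebuilt, not once per painted
symbol.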
