Re: How real-time are Solr/Lucene queries?

Jason Rutherglen Tue, 25 May 2010 09:52:05 -0700

The main issue is faceting, which is currently inefficient for the
realtime use case because facets are computed over the entire set of
segments/readers, so they have to be rebuilt whenever a new segment
appears.  Field caches in Lucene are per segment, so they don't have
this problem.
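
To make the difference concrete, here is a minimal sketch in plain Java. It does
not use any Lucene or Solr API: Segment, WholeIndexCounts and PerSegmentCounts
are hypothetical stand-ins invented for illustration. It only shows why a cache
keyed per segment rescans less than one built over the whole reader set when a
small new segment is added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SegmentCacheSketch {
    /** Hypothetical stand-in for an immutable index segment: an id plus one field value per doc. */
    static final class Segment {
        final String id;
        final List<String> fieldValues;
        Segment(String id, List<String> fieldValues) {
            this.id = id;
            this.fieldValues = fieldValues;
        }
    }

    /** Value counts cached for the whole segment set; any change to the set forces a full rescan. */
    static final class WholeIndexCounts {
        private List<String> cachedFor;
        private Map<String, Integer> cached;
        int docsScanned = 0;

        Map<String, Integer> counts(List<Segment> segments) {
            List<String> ids = new ArrayList<>();
            for (Segment s : segments) ids.add(s.id);
            if (!ids.equals(cachedFor)) {
                Map<String, Integer> m = new HashMap<>();
                for (Segment s : segments) {
                    for (String v : s.fieldValues) {
                        docsScanned++;
                        m.merge(v, 1, Integer::sum);
                    }
                }
                cached = m;
                cachedFor = ids;
            }
            return cached;
        }
    }

    /** Value counts cached per segment; only segments not seen before are scanned. */
    static final class PerSegmentCounts {
        private final Map<String, Map<String, Integer>> cache = new HashMap<>();
        int docsScanned = 0;

        Map<String, Integer> counts(List<Segment> segments) {
            Map<String, Integer> total = new HashMap<>();
            for (Segment s : segments) {
                Map<String, Integer> perSegment = cache.computeIfAbsent(s.id, k -> scan(s));
                perSegment.forEach((value, n) -> total.merge(value, n, Integer::sum));
            }
            return total;
        }

        private Map<String, Integer> scan(Segment s) {
            Map<String, Integer> m = new HashMap<>();
            for (String v : s.fieldValues) {
                docsScanned++;
                m.merge(v, 1, Integer::sum);
            }
            return m;
        }
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment("a", Arrays.asList("rock", "jazz", "rock")));
        WholeIndexCounts whole = new WholeIndexCounts();
        PerSegmentCounts perSegment = new PerSegmentCounts();
        whole.counts(segments);
        perSegment.counts(segments);

        // A small new segment arrives, as it would after a near-realtime update.
        segments.add(new Segment("b", Arrays.asList("jazz")));
        whole.counts(segments);
        perSegment.counts(segments);

        System.out.println("whole-index docs scanned: " + whole.docsScanned);
        System.out.println("per-segment docs scanned: " + perSegment.docsScanned);
    }
}
```

Both variants return the same counts; the only difference in this toy model is
how much work the second call repeats.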


On Tue, May 25, 2010 at 4:09 AM, Grant Ingersoll <gsing...@apache.org> wrote:
> How many docs are in the batch you are pulling down?  How many docs/second do 
> you expect on the index size?  How big are the docs?  What do you expect in 
> terms of queries per second?  How fast do new documents need to be available 
> on the local server?  How much analysis do you have to do?  Also, define Real 
> Time.  You'd be surprised at the number of people I talk to who think they 
> need Real Time, but then when you ask them questions like I just did, they 
> don't really need it.  I've seen Solr turn around new docs in as little as 30 
> seconds on commodity hardware w/o any special engineering effort and I've 
> seen it faster than that with some engineering effort.  That isn't 
> necessarily possible for every application, but...
>
> Despite the other suggestions, what you describe still looks feasible to me 
> in Solr, pending the questions above (and some followups).
>
>
> On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:
>
>> Thanks for the new information. It's really great to see so many options for 
>> Lucene.
>>
>> In my scenario there are the following pieces:
>>
>> 1 - A local Java client with an embedded Solr instance and its own local 
>> index/s.
>> 2 - A remote server running Solr with index/s that are more like a 
>> repository that local clients query for extra goodies.
>> 3 - The client is also a JXTA node so it can share indexes or documents too.
>> 4 - There is no browser involved whatsoever.
>>
>> My music composing application is a local client that uses configurations 
>> which would become many different document types. A subset of these 
>> configurations will be bundled with the application and then many more would 
>> be made available via a server/s running Solr.
>>
>> I would not expect the queries which would be made from within the local 
>> client to be returned in real-time. I would only expect such queries to be 
>> made in reasonable time and returned to the client. The client would have 
>> its local Lucene index system (embedded Solr using SolrJ) which would be 
>> updated with the results of the query made to the Solr instance running on 
>> the remote server.
>>
>> Then the user on the client would issue queries to the local Lucene index/s 
>> to obtain results which are used to set up contexts for different aspects of 
>> the client. For example: an activated context for musical scales and rhythms 
>> used for creating musical notes, an activated context for rendering with 
>> layout and style information for different music symbol renderer types.
>>
>> I'm not yet sure but it may be best to make queries against the local Lucene 
>> index/s and then convert the results into some context objects, maybe an 
>> array or map (I'd like to learn more about how query results can be returned 
>> as arrays or maps as well). Then the tools and renderers which require the 
>> information in the contexts would do any real-time lookup directly from the 
>> context objects not the local or remote Lucene or Solr index/s. The local 
>> client is also a JXTA node so it can share its own index/s with fellow peers.
>>
>> This is how I envision this happening with my limited knowledge of 
>> Lucene/Solr at this time. What are your thoughts on the feasibility of such 
>> a scenario?
>>
>> I'm just reading through the Solr reference PDF now and looking over the 
>> Solr admin application. Looking at the Schema.xml it seems to be field not 
>> document oriented. From my point of view I think in terms of configuration 
>> types which would be documents. In the schema it seems like only fields are 
>> defined and it does not matter which configuration/document they belong to? 
>> I guess this is fine as long as the indexing takes into account my unique 
>> document types and I can search for them as a whole as well, not only for 
>> specific values across a set of indexed documents.
>>
>> Also, does the schema allow me to index certain documents into specific 
>> indexes or are they all just bunched together? I'd rather have unique 
>> indexes for specific document types. I've just read about multiple cores 
>> running under one Solr instance, is this the only way to support multiple 
>> indexes?
>>
>> I'm thinking of ordering the Lucene in Action v2 book which is due this 
>> month and also the Solr 1.4 book. Before I do I just need to understand a 
>> few things which is why I'm writing such a long message :-)
>>
>> Thom
>>
>>
>> On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
>>
>>> Further to my earlier note re Lucandra: I note that Cassandra, which Lucandra 
>>> backs onto, is 'eventually consistent', so given your real-time 
>>> requirements, you may want to review this in the first instance, if 
>>> Lucandra is of interest.
>>>
>>> On 21 May 2010, at 06:12, Walter Underwood wrote:
>>>
>>>> Solr is a very good engine, but it is not real-time. You can turn off the 
>>>> caches and reduce the delays, but it is fundamentally not real-time.
>>>>
>>>> I work at MarkLogic, and we have a real-time transactional search engine 
>>>> (and repository). If you are curious, contact me directly.
>>>>
>>>> I do like Solr for lots of applications -- I chose it when I was at 
>>>> Netflix.
>>>>
>>>> wunder
>>>>
>>>> On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:
>>>>
>>>>> Hello Solr,
>>>>>
>>>>> Solr looks like an excellent API and it's nice to have a tutorial that 
>>>>> makes it easy to discover the basics of what Solr does; I'm impressed. I 
>>>>> can see plenty of potential uses of Solr/Lucene and I'm interested now in 
>>>>> just how real-time the queries made to an index can be?
>>>>>
>>>>> For example, in my application I have time ordered data being processed 
>>>>> by a paint method in real-time. Each piece of data is identified and its 
>>>>> associated renderer is invoked. The Java2D renderer would then lookup any 
>>>>> layout and style values it requires to render the current data it has 
>>>>> received from the layout and style indexes. What I'm wondering is if this 
>>>>> lookup which would be a Lucene search will be fast enough?
>>>>>
>>>>> Would it be best to make Lucene queries for the relevant layout and style 
>>>>> values required by the renderers ahead of rendering time and have the 
>>>>> query results placed into the most performant collection (map/array) so 
>>>>> renderer lookup would be as fast as possible? Or can Lucene handle many 
>>>>> individual lookup queries fast enough so rendering is quick?
>>>>>
>>>>> Best regards from Canada,
>>>>>
>>>>> Thom
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene: 
> http://www.lucidimagination.com/search
>
>
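
On the original question quoted above (run the lookups ahead of rendering time
and keep the results in a map, or query the index from inside the paint
method), the first option can be sketched roughly as follows. This is only an
illustration under stated assumptions: Style, preload and the fake index are
hypothetical names, and the Function passed in stands for whatever search
against the local index actually returns layout and style values. No Solr or
Lucene call is shown.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class RenderContextSketch {
    /** Hypothetical style record; the fields are illustrative only. */
    static final class Style {
        final String color;
        final int lineWidth;
        Style(String color, int lineWidth) {
            this.color = color;
            this.lineWidth = lineWidth;
        }
    }

    /**
     * Runs the (comparatively slow) index lookup once per renderer type,
     * ahead of rendering, and keeps the hits in a plain map.
     * Types with no result are simply left out of the context.
     */
    static Map<String, Style> preload(List<String> rendererTypes,
                                      Function<String, Style> indexLookup) {
        Map<String, Style> context = new HashMap<>();
        for (String type : rendererTypes) {
            Style style = indexLookup.apply(type);
            if (style != null) {
                context.put(type, style);
            }
        }
        return context;
    }

    public static void main(String[] args) {
        // Stand-in for a search against the local index (assumption, not a real API).
        Map<String, Style> fakeIndex = new HashMap<>();
        fakeIndex.put("note", new Style("black", 1));
        fakeIndex.put("clef", new Style("blue", 2));

        Map<String, Style> context =
                preload(Arrays.asList("note", "clef", "rest"), fakeIndex::get);

        // The paint loop only touches the in-memory map, never the index.
        for (String type : new String[] {"note", "clef", "note"}) {
            Style style = context.get(type);
            System.out.println(type + " -> " + style.color + ", width " + style.lineWidth);
        }
    }
}
```

With this shape the index is consulted once per renderer type when a context is
activated, and each lookup during painting is a single map access.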
