Re: How real-time are Solr/Lucene queries?

Grant Ingersoll Tue, 25 May 2010 04:10:26 -0700

How many docs are in the batch you are pulling down?  How many docs/second do 
you expect on the index size?  How big are the docs?  What do you expect in 
terms of queries per second?  How fast do new documents need to be available on 
the local server?  How much analysis do you have to do?  Also, define Real 
Time.  You'd be surprised at the number of people I talk to who think they need 
Real Time, but then when you ask them questions like I just did, they don't 
really need it.  I've seen Solr turn around new docs in as little as 30 seconds 
on commodity hardware w/o any special engineering effort and I've seen it 
faster than that with some engineering effort.  That isn't necessarily possible 
for every application, but...


Despite the other suggestions, what you describe still looks feasible to me in 
Solr, pending the questions above (and some followups).
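
The quoted follow-up below asks whether query results should be placed into a map or array ahead of rendering time so renderers never search during painting. A minimal sketch of that pre-fetch idea follows. It assumes each result document can be treated as a map of field name to value; the results are stubbed as plain java.util.Map objects rather than real SolrJ types, and the class name RenderContextSketch, the method buildContext, and the field names "id" and "color" are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RenderContextSketch {

    /**
     * Builds an in-memory lookup table from pre-fetched query results.
     * Each result is modelled as a map of field name to value, keyed in
     * the returned context by the value of keyField. Documents that lack
     * the key field are skipped.
     */
    static Map<String, Map<String, Object>> buildContext(
            List<Map<String, Object>> results, String keyField) {
        Map<String, Map<String, Object>> context = new HashMap<>();
        for (Map<String, Object> doc : results) {
            Object key = doc.get(keyField);
            if (key != null) {
                context.put(key.toString(), doc);
            }
        }
        return context;
    }

    public static void main(String[] args) {
        // Stand-ins for documents a search would return before rendering starts.
        Map<String, Object> noteStyle = new HashMap<>();
        noteStyle.put("id", "note");
        noteStyle.put("color", "black");

        Map<String, Object> restStyle = new HashMap<>();
        restStyle.put("id", "rest");
        restStyle.put("color", "gray");

        Map<String, Map<String, Object>> context =
                buildContext(Arrays.asList(noteStyle, restStyle), "id");

        // Inside the paint loop this is a hash lookup; no search is issued.
        System.out.println(context.get("note").get("color"));
    }
}
```

With this shape the search cost is paid once when a context is activated, and the per-symbol lookup during painting does not depend on index size or query latency.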


On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:

> Thanks for the new information. It's really great to see so many options for 
> Lucene.
> 
> In my scenario there are the following pieces:
> 
> 1 - A local Java client with an embedded Solr instance and its own local 
> index/s.
> 2 - A remote server running Solr with index/s that are more like a repository 
> that local clients query for extra goodies.
> 3 - The client is also a JXTA node so it can share indexes or documents too.
> 4 - There is no browser involved whatsoever.
> 
> My music composing application is a local client that uses configurations 
> which would become many different document types. A subset of these 
> configurations will be bundled with the application and then many more would 
> be made available via a server/s running Solr.
> 
> I would not expect the queries which would be made from within the local 
> client to be returned in real-time. I would only expect such queries to be 
> made in reasonable time and returned to the client. The client would have its 
> local Lucene index system (embedded Solr using SolrJ) which would be updated 
> with the results of the query made to the Solr instance running on the remote 
> server.
> 
> Then the user on the client would issue queries to the local Lucene index/s 
> to obtain results which are used to set up contexts for different aspects of 
> the client. For example: an activated context for musical scales and rhythms 
> used for creating musical notes, an activated context for rendering with 
> layout and style information for different music symbol renderer types.
> 
> I'm not yet sure but it may be best to make queries against the local Lucene 
> index/s and then convert the results into some context objects, maybe an 
> array or map (I'd like to learn more about how query results can be returned 
> as arrays or maps as well). Then the tools and renderers which require the 
> information in the contexts would do any real-time lookup directly from the 
> context objects not the local or remote Lucene or Solr index/s. The local 
> client is also a JXTA node so it can share its own index/s with fellow peers.
> 
> This is how I envision this happening with my limited knowledge of 
> Lucene/Solr at this time. What are your thoughts on the feasibility of such a 
> scenario?
> 
> I'm just reading through the Solr reference PDF now and looking over the Solr 
> admin application. Looking at the schema.xml, it seems to be field-oriented, 
> not document-oriented. From my point of view I think in terms of configuration 
> types which would be documents. In the schema it seems like only fields are 
> defined and it does not matter which configuration/document they belong to? I 
> guess this is fine as long as the indexing takes into account my unique 
> document types and I can search for them as a whole as well, not only for 
> specific values across a set of indexed documents. 
> 
> Also, does the schema allow me to index certain documents into specific 
> indexes or are they all just bunched together? I'd rather have unique indexes 
> for specific document types. I've just read about multiple cores running 
> under one Solr instance, is this the only way to support multiple indexes?
> 
> I'm thinking of ordering the Lucene in Action v2 book which is due this month 
> and also the Solr 1.4 book. Before I do I just need to understand a few 
> things which is why I'm writing such a long message :-)
> 
> Thom
> 
> 
> On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
> 
>> Further to earlier note re Lucandra.  I note that Cassandra, which Lucandra 
>> backs onto,  is 'eventually consistent',  so given your real-time 
>> requirements,  you may want to review this in the first instance, if 
>> Lucandra is of interest.
>> 
>> On 21 May 2010, at 06:12, Walter Underwood wrote:
>> 
>>> Solr is a very good engine, but it is not real-time. You can turn off the 
>>> caches and reduce the delays, but it is fundamentally not real-time.
>>> 
>>> I work at MarkLogic, and we have a real-time transactional search engine 
>>> (and repository). If you are curious, contact me directly.
>>> 
>>> I do like Solr for lots of applications -- I chose it when I was at Netflix.
>>> 
>>> wunder
>>> 
>>> On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:
>>> 
>>>> Hello Solr,
>>>> 
>>>> Solr looks like an excellent API and it's nice to have a tutorial that 
>>>> makes it easy to discover the basics of what Solr does, I'm impressed. I 
>>>> can see plenty of potential uses of Solr/Lucene and I'm interested now in 
>>>> just how real-time the queries made to an index can be?
>>>> 
>>>> For example, in my application I have time ordered data being processed by 
>>>> a paint method in real-time. Each piece of data is identified and its 
>>>> associated renderer is invoked. The Java2D renderer would then look up any 
>>>> layout and style values it requires to render the current data it has 
>>>> received from the layout and style indexes. What I'm wondering is if this 
>>>> lookup which would be a Lucene search will be fast enough?
>>>> 
>>>> Would it be best to make Lucene queries for the relevant layout and style 
>>>> values required by the renderers ahead of rendering time and have the 
>>>> query results placed into the most performant collection (map/array) so 
>>>> renderer lookup would be as fast as possible? Or can Lucene handle many 
>>>> individual lookup queries fast enough so rendering is quick?
>>>> 
>>>> Best regards from Canada,
>>>> 
>>>> Thom
>>>> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search
