Re: How real-time are Solr/Lucene queries?

Peter Karich Sun, 23 May 2010 13:02:05 -0700

Hi Thomas,

> A question that remains is this, is it better to use the core Lucene
> API in my local client for the work it does locally with indexes or is
> it okay to use embedded Solr with SolrJ?


That's a very good question. Hopefully the experts can answer this for us.
I will use SolrJ instead of Lucene because of [1], and because I think the
explanation in [2] is a bit misleading: as I understand it, only the
EmbeddedServer part of SolrJ is deprecated, not the whole API, which can
still be used e.g. via CommonsHttpSolrServer
<http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.html>.


But I do not know this for sure.
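
On the related point from earlier in the thread (doing the real-time lookups
from context objects rather than from the index itself), here is a minimal
sketch of that idea in plain Java. It makes no Lucene or Solr calls:
StyleHit, RenderContext and RenderContextDemo are hypothetical names made up
for this example, and StyleHit only stands in for whatever a real query
would return. Treat it as an illustration under those assumptions, not as
the recommended way to do it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one search hit: a symbol type plus its stored
// fields. In a real client these values would come from a query result.
final class StyleHit {
    final String symbolType;
    final Map<String, String> fields;

    StyleHit(String symbolType, Map<String, String> fields) {
        this.symbolType = symbolType;
        this.fields = fields;
    }
}

// Hypothetical context object that renderers read from at paint time.
final class RenderContext {
    private final Map<String, Map<String, String>> stylesBySymbol = new HashMap<>();

    // Build the context once, ahead of rendering, from a batch of hits.
    static RenderContext fromHits(List<StyleHit> hits) {
        RenderContext ctx = new RenderContext();
        for (StyleHit hit : hits) {
            // Copy the fields so later changes to the hit do not leak in.
            ctx.stylesBySymbol.put(hit.symbolType, new HashMap<>(hit.fields));
        }
        return ctx;
    }

    // In-memory lookup for use inside the paint loop; no index access here.
    String style(String symbolType, String key, String fallback) {
        Map<String, String> fields = stylesBySymbol.get(symbolType);
        if (fields == null) {
            return fallback;
        }
        return fields.getOrDefault(key, fallback);
    }
}

final class RenderContextDemo {
    public static void main(String[] args) {
        // Pretend these fields were returned by a query made before rendering.
        Map<String, String> noteFields = new HashMap<>();
        noteFields.put("color", "black");
        noteFields.put("stemLength", "3.5");

        List<StyleHit> hits = new ArrayList<>();
        hits.add(new StyleHit("note", noteFields));

        RenderContext ctx = RenderContext.fromHits(hits);
        System.out.println(ctx.style("note", "color", "gray")); // known symbol
        System.out.println(ctx.style("rest", "color", "gray")); // falls back
    }
}
```

The only point is that the query runs once, before rendering starts, and the
paint loop then reads from a HashMap instead of searching per symbol.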

Regards,
Peter.

[1]
http://stackoverflow.com/questions/2856427/situations-to-prefer-apache-lucene-over-solr

[2]
http://wiki.apache.org/solr/EmbeddedSolr

> While Solr is optimized for the server aspects I'm not sure if it is the best 
> option for the client side of things?
>
> Thom
>
>
> On 2010-05-23, at 7:36 AM, Peter Karich wrote:
>
>   
>> Hi,
>>
>> just as a side note as I did not read the link in your conversation:
>>
>> http://wiki.apache.org/lucene-java/NearRealtimeSearch (I just stumbled
>> over this as I am interested in this feature too)
>>
>> Regards,
>> Peter.
>>
>>     
>>> Thanks for the new information. It's really great to see so many options for 
>>> Lucene.
>>>
>>> In my scenario there are the following pieces:
>>>
>>> 1 - A local Java client with an embedded Solr instance and its own local 
>>> index/s.
>>> 2 - A remote server running Solr with index/s that are more like a 
>>> repository that local clients query for extra goodies.
>>> 3 - The client is also a JXTA node so it can share indexes or documents too.
>>> 4 - There is no browser involved whatsoever.
>>>
>>> My music composing application is a local client that uses configurations 
>>> which would become many different document types. A subset of these 
>>> configurations will be bundled with the application and then many more 
>>> would be made available via a server/s running Solr.
>>>
>>> I would not expect the queries which would be made from within the local 
>>> client to be returned in real-time. I would only expect such queries to be 
>>> made in reasonable time and returned to the client. The client would have 
>>> its local Lucene index system (embedded Solr using SolrJ) which would be 
>>> updated with the results of the query made to the Solr instance running on 
>>> the remote server.
>>>
>>> Then the user on the client would issue queries to the local Lucene index/s 
>>> to obtain results which are used to set up contexts for different aspects of 
>>> the client. For example: an activated context for musical scales and 
>>> rhythms used for creating musical notes, an activated context for rendering 
>>> with layout and style information for different music symbol renderer types.
>>>
>>> I'm not yet sure but it may be best to make queries against the local 
>>> Lucene index/s and then convert the results into some context objects, 
>>> maybe an array or map (I'd like to learn more about how query results can 
>>> be returned as arrays or maps as well). Then the tools and renderers which 
>>> require the information in the contexts would do any real-time lookup 
>>> directly from the context objects not the local or remote Lucene or Solr 
>>> index/s. The local client is also a JXTA node so it can share its own 
>>> index/s with fellow peers.
>>>
>>> This is how I envision this happening with my limited knowledge of 
>>> Lucene/Solr at this time. What are your thoughts on the feasibility of such 
>>> a scenario?
>>>
>>> I'm just reading through the Solr reference PDF now and looking over the 
>>> Solr admin application. Looking at schema.xml, it seems to be field-oriented, not 
>>> document-oriented. From my point of view I think in terms of configuration 
>>> types which would be documents. In the schema it seems like only fields are 
>>> defined and it does not matter which configuration/document they belong to? 
>>> I guess this is fine as long as the indexing takes into account my unique 
>>> document types and I can search for them as a whole as well, not only for 
>>> specific values across a set of indexed documents. 
>>>
>>> Also, does the schema allow me to index certain documents into specific 
>>> indexes or are they all just bunched together? I'd rather have unique 
>>> indexes for specific document types. I've just read about multiple cores 
>>> running under one Solr instance, is this the only way to support multiple 
>>> indexes?
>>>
>>> I'm thinking of ordering the Lucene in Action v2 book which is due this 
>>> month and also the Solr 1.4 book. Before I do I just need to understand a 
>>> few things which is why I'm writing such a long message :-)
>>>
>>> Thom
>>>
>>>
>>> On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
>>>
>>>
>>>       
>>>> Further to earlier note re Lucandra.  I note that Cassandra, which 
>>>> Lucandra backs onto,  is 'eventually consistent',  so given your real-time 
>>>> requirements,  you may want to review this in the first instance, if 
>>>> Lucandra is of interest.
>>>>
>>>> On 21 May 2010, at 06:12, Walter Underwood wrote:
>>>>
>>>>
>>>>         
>>>>> Solr is a very good engine, but it is not real-time. You can turn off the 
>>>>> caches and reduce the delays, but it is fundamentally not real-time.
>>>>>
>>>>> I work at MarkLogic, and we have a real-time transactional search engine 
>>>>> (and repository). If you are curious, contact me directly.
>>>>>
>>>>> I do like Solr for lots of applications -- I chose it when I was at 
>>>>> Netflix.
>>>>>
>>>>> wunder
>>>>>
>>>>> On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> Hello Solr,
>>>>>>
>>>>>> Solr looks like an excellent API and it's nice to have a tutorial that 
>>>>>> makes it easy to discover the basics of what Solr does, I'm impressed. I 
>>>>>> can see plenty of potential uses of Solr/Lucene and I'm interested now 
>>>>>> in just how real-time the queries made to an index can be?
>>>>>>
>>>>>> For example, in my application I have time ordered data being processed 
>>>>>> by a paint method in real-time. Each piece of data is identified and its 
>>>>>> associated renderer is invoked. The Java2D renderer would then look up 
>>>>>> any layout and style values it requires to render the current data it 
>>>>>> has received from the layout and style indexes. What I'm wondering is if 
>>>>>> this lookup which would be a Lucene search will be fast enough?
>>>>>>
>>>>>> Would it be best to make Lucene queries for the relevant layout and 
>>>>>> style values required by the renderers ahead of rendering time and have 
>>>>>> the query results placed into the most performant collection (map/array) 
>>>>>> so renderer lookup would be as fast as possible? Or can Lucene handle 
>>>>>> many individual lookup queries fast enough so rendering is quick?
>>>>>>
>>>>>> Best regards from Canada,
>>>>>>
>>>>>> Thom
>>>>>>             
>>
>>     
>
>   


-- 
Free your timetabling!
http://timefinder.sourceforge.net/
