The main issue arises if you're using facets, which are currently inefficient for the realtime use case because they're built over the entire set of segment readers rather than per segment. Field caches in Lucene are per segment and so don't have this problem.
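
To illustrate the difference in cost, here is a toy sketch in plain Java. It does not use any Lucene or Solr classes: `Segment`, `warmPerSegment`, and `buildTopLevel` are made-up names, and a "segment" is just a list of field values. The assumption is that segments are immutable once written, so whatever was cached for an old segment is still valid after a reopen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: hypothetical names, no Lucene/Solr API involved. */
public class PerSegmentCacheSketch {

    /** Hypothetical stand-in for an immutable index segment. */
    static final class Segment {
        final String name;
        final List<String> values;

        Segment(String name, List<String> values) {
            this.name = name;
            this.values = values;
        }
    }

    private final Map<String, String[]> cache = new HashMap<>();
    private int valuesRead = 0; // how many field values we had to read so far

    int valuesRead() {
        return valuesRead;
    }

    /** Per-segment approach: only segments not seen before are loaded. */
    void warmPerSegment(List<Segment> segments) {
        for (Segment s : segments) {
            cache.computeIfAbsent(s.name, k -> load(s));
        }
    }

    /** Top-level approach: one structure over all segments, rebuilt on every reopen. */
    String[] buildTopLevel(List<Segment> segments) {
        List<String> all = new ArrayList<>();
        for (Segment s : segments) {
            all.addAll(Arrays.asList(load(s)));
        }
        return all.toArray(new String[0]);
    }

    private String[] load(Segment s) {
        valuesRead += s.values.size();
        return s.values.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Segment a = new Segment("_0", Arrays.asList("rock", "jazz", "rock"));
        Segment b = new Segment("_1", Arrays.asList("folk")); // small realtime add

        PerSegmentCacheSketch perSegment = new PerSegmentCacheSketch();
        perSegment.warmPerSegment(Arrays.asList(a));
        perSegment.warmPerSegment(Arrays.asList(a, b)); // reopen: only _1 is loaded

        PerSegmentCacheSketch topLevel = new PerSegmentCacheSketch();
        topLevel.buildTopLevel(Arrays.asList(a));
        topLevel.buildTopLevel(Arrays.asList(a, b)); // reopen: everything is reread

        System.out.println("per-segment reads: " + perSegment.valuesRead());
        System.out.println("top-level reads:   " + topLevel.valuesRead());
    }
}
```

With these inputs the per-segment variant reads 4 values across the two opens and the top-level variant reads 7. The gap grows with index size, since the top-level cost per reopen is proportional to the whole index rather than to the newly added documents.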

On Tue, May 25, 2010 at 4:09 AM, Grant Ingersoll <gsing...@apache.org> wrote:
> How many docs are in the batch you are pulling down? How many docs/second do
> you expect on the index size? How big are the docs? What do you expect in
> terms of queries per second? How fast do new documents need to be available
> on the local server? How much analysis do you have to do? Also, define Real
> Time. You'd be surprised at the number of people I talk to who think they
> need Real Time, but then when you ask them questions like I just did, they
> don't really need it. I've seen Solr turn around new docs in as little as 30
> seconds on commodity hardware w/o any special engineering effort and I've
> seen it faster than that with some engineering effort. That isn't
> necessarily possible for every application, but...
>
> Despite the other suggestions, what you describe still looks feasible to me
> in Solr, pending the questions above (and some followups).
>
>
> On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:
>
>> Thanks for the new information. It's really great to see so many options for
>> Lucene.
>>
>> In my scenario there are the following pieces:
>>
>> 1 - A local Java client with an embedded Solr instance and its own local
>> index/s.
>> 2 - A remote server running Solr with index/s that are more like a
>> repository that local clients query for extra goodies.
>> 3 - The client is also a JXTA node so it can share indexes or documents too.
>> 4 - There is no browser involved whatsoever.
>>
>> My music composing application is a local client that uses configurations
>> which would become many different document types. A subset of these
>> configurations will be bundled with the application and then many more would
>> be made available via a server/s running Solr.
>>
>> I would not expect the queries which would be made from within the local
>> client to be returned in real-time. I would only expect such queries to be
>> made in reasonable time and returned to the client. The client would have
>> its local Lucene index system (embedded Solr using SolrJ) which would be
>> updated with the results of the query made to the Solr instance running on
>> the remote server.
>>
>> Then the user on the client would issue queries to the local Lucene index/s
>> to obtain results which are used to set up contexts for different aspects of
>> the client. For example: an activated context for musical scales and rhythms
>> used for creating musical notes, an activated context for rendering with
>> layout and style information for different music symbol renderer types.
>>
>> I'm not yet sure but it may be best to make queries against the local Lucene
>> index/s and then convert the results into some context objects, maybe an
>> array or map (I'd like to learn more about how query results can be returned
>> as arrays or maps as well). Then the tools and renderers which require the
>> information in the contexts would do any real-time lookup directly from the
>> context objects not the local or remote Lucene or Solr index/s. The local
>> client is also a JXTA node so it can share its own index/s with fellow peers.
>>
>> This is how I envision this happening with my limited knowledge of
>> Lucene/Solr at this time. What are your thoughts on the feasibility of such
>> a scenario?
>>
>> I'm just reading through the Solr reference PDF now and looking over the
>> Solr admin application. Looking at the Schema.xml it seems to be field not
>> document oriented. From my point of view I think in terms of configuration
>> types which would be documents. In the schema it seems like only fields are
>> defined and it does not matter which configuration/document they belong to?
>> I guess this is fine as long as the indexing takes into account my unique
>> document types and I can search for them as a whole as well, not only for
>> specific values across a set of indexed documents.
>>
>> Also, does the schema allow me to index certain documents into specific
>> indexes or are they all just bunched together? I'd rather have unique
>> indexes for specific document types. I've just read about multiple cores
>> running under one Solr instance, is this the only way to support multiple
>> indexes?
>>
>> I'm thinking of ordering the Lucene in Action v2 book which is due this
>> month and also the Solr 1.4 book. Before I do I just need to understand a
>> few things which is why I'm writing such a long message :-)
>>
>> Thom
>>
>>
>> On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
>>
>>> Further to earlier note re Lucandra. I note that Cassandra, which Lucandra
>>> backs onto, is 'eventually consistent', so given your real-time
>>> requirements, you may want to review this in the first instance, if
>>> Lucandra is of interest.
>>>
>>> On 21 May 2010, at 06:12, Walter Underwood wrote:
>>>
>>>> Solr is a very good engine, but it is not real-time. You can turn off the
>>>> caches and reduce the delays, but it is fundamentally not real-time.
>>>>
>>>> I work at MarkLogic, and we have a real-time transactional search engine
>>>> (and repository). If you are curious, contact me directly.
>>>>
>>>> I do like Solr for lots of applications -- I chose it when I was at
>>>> Netflix.
>>>>
>>>> wunder
>>>>
>>>> On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:
>>>>
>>>>> Hello Solr,
>>>>>
>>>>> Solr looks like an excellent API and it's nice to have a tutorial that
>>>>> makes it easy to discover the basics of what Solr does, I'm impressed. I
>>>>> can see plenty of potential uses of Solr/Lucene and I'm interested now in
>>>>> just how real-time the queries made to an index can be?
>>>>>
>>>>> For example, in my application I have time ordered data being processed
>>>>> by a paint method in real-time. Each piece of data is identified and its
>>>>> associated renderer is invoked. The Java2D renderer would then look up any
>>>>> layout and style values it requires to render the current data it has
>>>>> received from the layout and style indexes. What I'm wondering is if this
>>>>> lookup which would be a Lucene search will be fast enough?
>>>>>
>>>>> Would it be best to make Lucene queries for the relevant layout and style
>>>>> values required by the renderers ahead of rendering time and have the
>>>>> query results placed into the most performant collection (map/array) so
>>>>> renderer lookup would be as fast as possible? Or can Lucene handle many
>>>>> individual lookup queries fast enough so rendering is quick?
>>>>>
>>>>> Best regards from Canada,
>>>>>
>>>>> Thom
>>>>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search