This is an interesting discussion and I have a few questions:

1) My apologies, but I haven't been following the NRT patch beyond the wiki and what was presented at a meetup some months back. What is its status in Solr?
2) What are the typical/accepted definitions of "Real Time" vs. "Near Real Time"? (This relates to Grant's points earlier.)
3) I can understand POSTing a document to a server and then turning around and searching for it on the same server, but what about a replicated environment? How do you prevent caches from being blown away and constantly re-warmed (hence performance degradation)? I could set Solr replication to run once per minute or less, but then the caches would be regenerated each minute, which I suspect is not cheap.
Note: I am curious about this from a typical web-based application perspective, as opposed to the embedded/desktop-application context described earlier in this thread.

Thanks!
Amit

On Tue, May 25, 2010 at 10:28 AM, Thomas J. Buhr <t...@superstringmedia.com> wrote:

> My documents are all quite small, if not downright tiny, so there is not much analysis to do. I plan to mainly use Solr for indexing application configuration data, which there is a lot of and which I have all preformatted. Since it is a music application there are many score templates, scale and rhythm strings, notation symbol skins, etc. Then there are slightly more usual things to index like application help pages and tutorials.
>
> In terms of queries per second there will be a lot being fired by our painter. In our application data is flowing into a painter who in turn delegates specific painting tasks to renderer objects. These renderer objects then make many queries extremely fast to the embedded Solr indexes for data they need, such as layout and style values.
>
> Believe me, there is a lot of detailed data involved in music notation, and abstracting it into configurations in the form of index documents is a good way to manage such data. Further, the data in the form of documents work as a form of plugins so that alternate configurations for different notation types can be added to the index. Then via simple search it is possible to dial up a certain set of documents which contain all the details of a given notation. Meanwhile the renderer objects remain generic and are just reconfigured with the different indexed configuration documents.
>
> Will making many fast queries from renderers to an embedded local Solr index slow my painting down?
>
> Thom
>
> On 2010-05-25, at 6:09 AM, Grant Ingersoll wrote:
>
> > How many docs are in the batch you are pulling down? How many docs/second do you expect on the index size? How big are the docs?
> > What do you expect in terms of queries per second? How fast do new documents need to be available on the local server? How much analysis do you have to do? Also, define Real Time. You'd be surprised at the number of people I talk to who think they need Real Time, but then when you ask them questions like I just did, they don't really need it. I've seen Solr turn around new docs in as little as 30 seconds on commodity hardware w/o any special engineering effort and I've seen it faster than that with some engineering effort. That isn't necessarily possible for every application, but...
> >
> > Despite the other suggestions, what you describe still looks feasible to me in Solr, pending the questions above (and some followups).
> >
> > On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:
> >
> >> Thanks for the new information. It's really great to see so many options for Lucene.
> >>
> >> In my scenario there are the following pieces:
> >>
> >> 1 - A local Java client with an embedded Solr instance and its own local index/s.
> >> 2 - A remote server running Solr with index/s that are more like a repository that local clients query for extra goodies.
> >> 3 - The client is also a JXTA node so it can share indexes or documents too.
> >> 4 - There is no browser involved whatsoever.
> >>
> >> My music composing application is a local client that uses configurations which would become many different document types. A subset of these configurations will be bundled with the application and then many more would be made available via a server/s running Solr.
> >>
> >> I would not expect the queries which would be made from within the local client to be returned in real-time. I would only expect such queries to be made in reasonable time and returned to the client.
> >> The client would have its local Lucene index system (embedded Solr using SolrJ) which would be updated with the results of the query made to the Solr instance running on the remote server.
> >>
> >> Then the user on the client would issue queries to the local Lucene index/s to obtain results which are used to set up contexts for different aspects of the client. For example: an activated context for musical scales and rhythms used for creating musical notes, an activated context for rendering with layout and style information for different music symbol renderer types.
> >>
> >> I'm not yet sure, but it may be best to make queries against the local Lucene index/s and then convert the results into some context objects, maybe an array or map (I'd like to learn more about how query results can be returned as arrays or maps as well). Then the tools and renderers which require the information in the contexts would do any real-time lookup directly from the context objects, not the local or remote Lucene or Solr index/s. The local client is also a JXTA node so it can share its own index/s with fellow peers.
> >>
> >> This is how I envision this happening with my limited knowledge of Lucene/Solr at this time. What are your thoughts on the feasibility of such a scenario?
> >>
> >> I'm just reading through the Solr reference PDF now and looking over the Solr admin application. Looking at the schema.xml, it seems to be field-oriented, not document-oriented. From my point of view I think in terms of configuration types which would be documents. In the schema it seems like only fields are defined and it does not matter which configuration/document they belong to? I guess this is fine as long as the indexing takes into account my unique document types and I can search for them as a whole as well, not only for specific values across a set of indexed documents.
> >>
> >> Also, does the schema allow me to index certain documents into specific indexes or are they all just bunched together? I'd rather have unique indexes for specific document types. I've just read about multiple cores running under one Solr instance; is this the only way to support multiple indexes?
> >>
> >> I'm thinking of ordering the Lucene in Action v2 book which is due this month and also the Solr 1.4 book. Before I do I just need to understand a few things, which is why I'm writing such a long message :-)
> >>
> >> Thom
> >>
> >> On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
> >>
> >>> Further to earlier note re Lucandra. I note that Cassandra, which Lucandra backs onto, is 'eventually consistent', so given your real-time requirements, you may want to review this in the first instance, if Lucandra is of interest.
> >>>
> >>> On 21 May 2010, at 06:12, Walter Underwood wrote:
> >>>
> >>>> Solr is a very good engine, but it is not real-time. You can turn off the caches and reduce the delays, but it is fundamentally not real-time.
> >>>>
> >>>> I work at MarkLogic, and we have a real-time transactional search engine (and repository). If you are curious, contact me directly.
> >>>>
> >>>> I do like Solr for lots of applications -- I chose it when I was at Netflix.
> >>>>
> >>>> wunder
> >>>>
> >>>> On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:
> >>>>
> >>>>> Hello Solr,
> >>>>>
> >>>>> Solr looks like an excellent API and it's nice to have a tutorial that makes it easy to discover the basics of what Solr does, I'm impressed. I can see plenty of potential uses of Solr/Lucene and I'm interested now in just how real-time the queries made to an index can be?
> >>>>>
> >>>>> For example, in my application I have time ordered data being processed by a paint method in real-time. Each piece of data is identified and its associated renderer is invoked.
> >>>>> The Java2D renderer would then look up any layout and style values it requires to render the current data it has received from the layout and style indexes. What I'm wondering is if this lookup, which would be a Lucene search, will be fast enough?
> >>>>>
> >>>>> Would it be best to make Lucene queries for the relevant layout and style values required by the renderers ahead of rendering time and have the query results placed into the most performant collection (map/array) so renderer lookup would be as fast as possible? Or can Lucene handle many individual lookup queries fast enough so rendering is quick?
> >>>>>
> >>>>> Best regards from Canada,
> >>>>>
> >>>>> Thom
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
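
As a footnote to the question at the bottom of the thread (querying the index on every render call versus loading the values into a map ahead of rendering time), here is a minimal Java sketch of the prefetch approach. It is only an illustration under stated assumptions: the class name `StyleContext`, the key names, and the `indexLookup` function are all hypothetical, and the lookup function merely stands in for whatever Lucene/Solr query the application would really issue.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical prefetch context: load style values once, then serve lookups from memory. */
public class StyleContext {
    private final Map<String, String> values = new HashMap<>();

    /**
     * Runs the (assumed) index lookup once per key, ahead of rendering time.
     * indexLookup is a stand-in for a real Lucene/Solr query, not an actual API.
     */
    public StyleContext(List<String> keys, Function<String, String> indexLookup) {
        for (String key : keys) {
            values.put(key, indexLookup.apply(key));
        }
    }

    /** Called by renderers while painting; this is a plain in-memory map read. */
    public String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        // Stand-in for the index; in the real application this would be a search.
        Map<String, String> fakeIndex = Map.of("notehead.width", "12", "stem.length", "35");
        StyleContext ctx =
                new StyleContext(List.of("notehead.width", "stem.length"), fakeIndex::get);
        System.out.println(ctx.get("stem.length"));
    }
}
```

With this shape the search cost is paid once per key when the context is built, and the renderers never touch the index during painting. The trade-off is that the context has to be rebuilt whenever the underlying documents change.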