This is related to something I must have only daydreamed (dreamt?) about, but never actually mentioned on solr-dev. My feeling is that we are moving Solr toward being a more general web service that can host various NLP and ML components, and no longer only does IR/Lucene. We see that in a few of the patches Grant is cooking, and I think we'll see it in the Solr+Mahout marriage down the road, and so on.
Is it time to start thinking about Solr as a server for IR, ML, and NLP tasks, and to see how the tightly coupled Lucene can be made more... pluggable?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Grant Ingersoll <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Monday, October 20, 2008 7:56:32 PM
> Subject: Must QueryComponent always be on and other Design Questions
>
> I've run into this a couple of times now and I feel like it warrants a
> discussion.
>
> For both the SpellCheckComponent (SCC) and now for the new
> ClusteringComponent (SOLR-769) I think there are cases where the
> QueryComponent (QC) is not required. In the SpellCheckComponent case
> it is when building the spelling index. In the ClusteringComponent,
> it is possible to ask for document clusters without running any query
> (it also will be possible to get clusters _with_ a query as well, and
> it also is distinguished from the handling of search results
> clustering, too). Thus, it seems really weird to have to pass in a
> dummy query, yet that is what one has to do in order to avoid getting
> an NPE in the QC.
>
> Now, I suppose these pieces could be modeled as something else or it's
> possible to split the two functionalities into separate things (1
> ReqHandler, 1 SearchComp). In fact, the said functionality is not
> really "search" functionality, or SearchComponent functionality, yet
> much of the rest of the functionality in the code in question is
> "search" functionality and logically belongs as a SearchComponent. In
> the case of the SCC build, it's akin to an indexing operation. In the
> clustering case, it's a query, albeit a non-traditional one. In some
> sense, this kind of document clustering is like non-query based
> faceting which leads to more navigation/browsing instead of searching.
>
> The quick fix is to just put in null checks into the QC or pass in a
> dummy query with rows=0, but I'm not sure if there isn't a slightly
> bigger picture here that needs adjusting in terms of
> SearchComponents. Namely, must the QC always be on? And, should we
> think a little more about components that don't require a query in
> order to function and how they play in the scheme of things?
>
> Thoughts? Recommendations?
>
> -Grant
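To make the two options Grant raises concrete, here is a minimal, self-contained Java sketch of the control flow: a null check inside the query stage (the "quick fix") and a chain in which a component can declare that it does not need a query at all. Everything in it (`Component`, `requiresQuery()`, `QueryStage`, `ClusterStage`, `Request`, `run`) is hypothetical and deliberately does not use Solr's real `SearchComponent` or `ResponseBuilder` classes; it only models the idea under that assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a component chain; NOT Solr's actual API.
public class ComponentChainSketch {

    /** Stand-in for a request: the query string may be absent (null). */
    static final class Request {
        final String q;
        final Map<String, Object> response = new HashMap<>();
        Request(String q) { this.q = q; }
    }

    interface Component {
        String name();
        /** Hypothetical hook: does this component need a user query to run? */
        default boolean requiresQuery() { return true; }
        void process(Request req);
    }

    /** Plays the role of the QueryComponent in this sketch. */
    static final class QueryStage implements Component {
        public String name() { return "query"; }
        public void process(Request req) {
            // The "quick fix": a null check instead of an NPE. Kept as a
            // defensive guard in case the stage is invoked directly.
            if (req.q == null) return;
            req.response.put("results", "docs matching " + req.q);
        }
    }

    /** Plays the role of query-less document clustering in this sketch. */
    static final class ClusterStage implements Component {
        public String name() { return "clustering"; }
        @Override public boolean requiresQuery() { return false; }
        public void process(Request req) {
            req.response.put("clusters", "clusters over the whole index");
        }
    }

    /** Runs each component in order, skipping query-dependent ones when no query is given. */
    static List<String> run(List<Component> chain, Request req) {
        List<String> ran = new ArrayList<>();
        for (Component c : chain) {
            if (c.requiresQuery() && req.q == null) continue;
            c.process(req);
            ran.add(c.name());
        }
        return ran;
    }

    public static void main(String[] args) {
        List<Component> chain = List.of(new QueryStage(), new ClusterStage());
        System.out.println(run(chain, new Request(null)));   // [clustering]
        System.out.println(run(chain, new Request("solr"))); // [query, clustering]
    }
}
```

Skipping at the chain level means no dummy query is needed, and the null check becomes a safety net rather than the mechanism; whether something like a per-component "needs a query" flag belongs in Solr is exactly the open question in this thread.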
