Re: Must QueryComponent always be on and other Design Questions

Ryan McKinley Tue, 21 Oct 2008 08:10:04 -0700


On Oct 21, 2008, at 8:17 AM, Grant Ingersoll wrote:

> On Oct 20, 2008, at 11:35 PM, Otis Gospodnetic wrote:
>
>> This is related to something I must have only daydreamed (dreamt?) about, but not actually mentioned on solr-dev. My feeling is we are moving Solr in the direction of a more general web service that can host various NLP and ML components, and no longer only do IR/Lucene. We see that with a few patches that Grant is cooking, I think we'll see that in the Solr+Mahout marriage down the road, and so on.
>
> I somewhat agree, but I hesitate to go so far as saying a "general web service".

I won't suggest that Solr is (or should be) a general web service, but wt=json/xml/python + RequestHandler makes a pretty nice cross-platform interface all on its own.

> I see Solr as a pretty nice platform for doing things like NLP/ML (see the AnalysisRequestHandler, TermVectorComponent, ClusteringComponent, LukeReqHandler, FacetingComp., Payloads, etc.), but I mostly view them as enhancing search/navigation. That is, things like clustering/faceting (they are closely related), named entity recognition, search, etc. all act as organizing components for structured and unstructured data. As for my vision for Solr (and actually, the Lucene TLP, too, if I put on my PMC hat), it's one that aims to bring coherence to (structured and unstructured) content. This starts with search as a foundation, since the indexing process creates much of the information that empowers the others. I think once you see the flexible indexing stuff added to Lucene Java, we'll see even more opportunities for making Solr more powerful in this regard.


agree.

>> Is it time to start thinking about Solr as a server for IR and ML and NLP tasks and see how the tightly coupled Lucene can be made more... pluggable?
>
> Yeah, this is what the Solr 2.0 thread that Yonik started a few weeks ago aims to discuss, along with scalability/fault tolerance. More important, for me anyway, is the decoupling of the configuration. For instance, I see no reason why IndexSchema needs to know anything about an InputStream.


Also agree. The biggest challenge for 2.0 is decoupling the configuration.
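To make the decoupling point concrete, here is a minimal Java sketch, assuming a hypothetical `Schema` class that is built only from plain data, with all stream parsing pushed into a separate `load` helper. The class names and the `field=type` properties format (standing in for schema.xml) are illustrative assumptions, not Solr's actual IndexSchema API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class DecoupledSchemaSketch {

    /** Hypothetical schema: built from plain data only, knows nothing about streams. */
    static final class Schema {
        private final Map<String, String> fieldTypes;

        Schema(Map<String, String> fieldTypes) {
            this.fieldTypes = Map.copyOf(fieldTypes);
        }

        /** Returns the type name for a field, or null if the field is unknown. */
        String typeOf(String field) {
            return fieldTypes.get(field);
        }
    }

    /** Hypothetical loader: the only place in this sketch that touches an InputStream. */
    static Schema load(InputStream in) throws IOException {
        Properties props = new Properties();
        props.load(in); // simple "field=type" lines stand in for schema.xml
        Map<String, String> types = new LinkedHashMap<>();
        for (String name : props.stringPropertyNames()) {
            types.put(name, props.getProperty(name));
        }
        return new Schema(types);
    }

    public static void main(String[] args) throws IOException {
        // Programmatic construction: no file or stream needed.
        Schema direct = new Schema(Map.of("id", "string", "title", "text"));
        System.out.println(direct.typeOf("title")); // text

        // Stream-based construction goes through the separate loader.
        InputStream in = new ByteArrayInputStream(
                "id=string\ntitle=text\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(in).typeOf("id")); // string
    }
}
```

With this split, tests or embedded setups could build a schema directly, and only the loader would need to change if the on-disk format did.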

> As for Lucene, it's really quite good at serving as the backend store/enabler for all these tasks.


I have not messed with it yet, but perhaps also HBase...

> At any rate, the question still remains as to how best to handle the QueryComponent :-)


aaah, your question!

I see two options:

1. If no other component needs the docList or docSet and the query is empty, then skip the QueryComponent.
2. Add a 'runQuery' param (or something like that) that defaults to true. It can be turned off when not necessary.
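The two options can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions: `Component`, `needsDocs` and the `runQuery` parameter are hypothetical names, not part of Solr's SearchComponent or ResponseBuilder API.

```java
import java.util.List;
import java.util.Map;

public class QueryComponentSkipSketch {

    /** Hypothetical component descriptor: does it consume the docList/docSet? */
    record Component(String name, boolean needsDocs) {}

    /** Option 1: skip the query only when it is empty and no component needs its results. */
    static boolean shouldRunQueryOption1(String q, List<Component> components) {
        boolean emptyQuery = q == null || q.isBlank();
        boolean anyoneNeedsDocs = components.stream().anyMatch(Component::needsDocs);
        return !emptyQuery || anyoneNeedsDocs;
    }

    /** Option 2: a hypothetical "runQuery" request parameter that defaults to true. */
    static boolean shouldRunQueryOption2(Map<String, String> params) {
        return Boolean.parseBoolean(params.getOrDefault("runQuery", "true"));
    }

    public static void main(String[] args) {
        List<Component> analysisOnly = List.of(new Component("analysis", false));
        List<Component> withFacets = List.of(new Component("facet", true));

        System.out.println(shouldRunQueryOption1("", analysisOnly));     // false
        System.out.println(shouldRunQueryOption1("", withFacets));       // true
        System.out.println(shouldRunQueryOption1("solr", analysisOnly)); // true
        System.out.println(shouldRunQueryOption2(Map.of()));                     // true
        System.out.println(shouldRunQueryOption2(Map.of("runQuery", "false"))); // false
    }
}
```

Option 1 needs no new request parameter, but it relies on every component accurately declaring whether it consumes the query results; option 2 is explicit but pushes the decision onto the client.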


I like option 1 better.

ryan
