Thanks, Timothy,

Regarding your suggestion to use MoreLikeThis, do you know what kind of
algorithm it uses? My searching didn't turn up anything.
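For what it's worth, the Lucene MoreLikeThis javadoc describes the idea as picking the most "interesting" terms from the source document, scoring each by tf*idf, and OR-ing the top ones into an ordinary query. The snippet below is a simplified sketch of that idea, not Lucene's code. It assumes a toy in-memory corpus, whitespace tokenization, and the classic Lucene idf formula 1 + ln(numDocs / (docFreq + 1)). The min_tf/min_df/max_terms thresholds are modeled on MLT's mintf/mindf/maxqt parameters, and the field name "text" is hypothetical.

```python
import math
from collections import Counter


def interesting_terms(source_text, corpus, min_tf=1, min_df=1, max_terms=5):
    """Rank the terms of source_text by tf * idf against a toy corpus.

    Simplified illustration of MoreLikeThis-style term selection;
    tokenization is a plain lowercase whitespace split (an assumption).
    """
    num_docs = len(corpus)
    doc_term_sets = [set(doc.lower().split()) for doc in corpus]
    tf = Counter(source_text.lower().split())

    scored = []
    for term, freq in tf.items():
        if freq < min_tf:
            continue  # too rare in the source text to be "interesting"
        df = sum(1 for terms in doc_term_sets if term in terms)
        if df < min_df:
            continue  # too rare in the corpus to be a useful signal
        idf = 1.0 + math.log(num_docs / (df + 1))  # classic Lucene-style idf
        scored.append((freq * idf, term))

    scored.sort(reverse=True)  # highest tf*idf first
    return [term for _, term in scored[:max_terms]]


def mlt_query(terms, field="text"):
    """OR the selected terms together into a query string (hypothetical field)."""
    return " OR ".join(f"{field}:{t}" for t in terms)
```

With corpus = ["the cat sat", "the dog ran", "a cat and a dog", "the end"], the source "the cat ran" yields ["ran", "cat"] for max_terms=2: "the" appears in most documents, so its idf, and with it its score, is lowest.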


On Thu, Mar 28, 2013 at 10:51 AM, Timothy Potter <thelabd...@gmail.com> wrote:

> Hi Mike,
>
> Interesting problem - here are some pointers on where to get started.
>
> For finding similar segments, check out Solr's More Like This support -
> it's built into the query request processing, so you just need to enable it
> with query params.
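As an illustration of "enable it with query params", here is a minimal sketch that builds a /select URL using the standard MoreLikeThis component parameters (mlt, mlt.fl, mlt.mintf, mlt.mindf, mlt.count). The base URL, the "text" field, and the id:<doc_id> query are assumptions chosen for the example.

```python
from urllib.parse import urlencode


def mlt_select_url(base_url, doc_id, fields=("text",), count=10):
    """Build a /select URL that asks the MoreLikeThis component for
    documents similar to the one matched by q.

    base_url, the 'text' field, and the id:<doc_id> query are assumptions.
    """
    params = {
        "q": f"id:{doc_id}",         # the source document(s) to find neighbours for
        "mlt": "true",               # turn on the MoreLikeThis component
        "mlt.fl": ",".join(fields),  # fields used to pick "interesting" terms
        "mlt.mintf": 1,              # minimum term frequency in the source doc
        "mlt.mindf": 1,              # minimum document frequency in the index
        "mlt.count": count,          # similar documents returned per result
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"
```

Note that this form finds neighbours of documents already in the index. Segments from a freshly uploaded file are not indexed yet, so Solr's separate MoreLikeThisHandler, which can take the source text in the request itself, may be the better fit.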
>
> There's nothing built in for doing batch queries from the client side. You
> might look into implementing a custom search component and register it as a
> first-component in your search handler (take a look at solrconfig.xml for
> how search handlers are configured, e.g. /browse).
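On the client side, batching could look roughly like the sketch below: the segments are split into chunks, and each chunk becomes one request body. The /batchmlt handler that would receive these bodies is hypothetical. It stands for the custom search component described above, which would loop over the segments server-side. Solr 4.2 does not ship such an endpoint, and the JSON shape here is an assumption.

```python
import json


def chunk(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_payloads(segments, batch_size=500):
    """Build one JSON body per batch for a hypothetical /batchmlt handler."""
    payloads = []
    for batch_no, batch in enumerate(chunk(segments, batch_size)):
        payloads.append(json.dumps({"batch": batch_no, "segments": batch}))
    return payloads
```

With 5000 segments and batch_size=500, this means 10 HTTP requests instead of 5000. The server still runs one similarity search per segment, so batching mainly saves per-request overhead, not search work.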
>
> Cheers,
> Tim
>
>
> On Thu, Mar 28, 2013 at 9:43 AM, Mike Haas <mikehaas...@gmail.com> wrote:
>
> > Hello. My company is currently thinking of switching over to Solr 4.2,
> > coming off of SQL Server. However, what we need to do is a bit weird.
> >
> > Right now, we have ~12 million segments and growing. Usually these are
> > sentences but can be other things. These segments are what will be stored
> > in Solr. I’ve already done that.
> >
> > Now, what happens is that a user will upload, say, a Word document to us.
> > We then parse it and process it into segments. There could easily be 5000
> > segments or more in that Word document. Each of those ~5000 segments
> > needs to be searched for similar segments in Solr. I’m not quite sure how
> > I will do the query (whether a proximity query or something else). The
> > point, though, is to get back similar results for each segment.
> >
> > However, I think I’m seeing a bigger problem first. I have to run ~5000
> > searches, one per segment. That would be 5000 HTTP requests. That’s a lot!
> > I’m pretty sure that would take a LOT of hardware. Keep in mind this could
> > be happening with maybe 4 different users at once right now (and of course
> > more in the future). Is there a good way to send a batch query over one (or
> > at least a lot fewer) HTTP requests?
> >
> > If not, what kinds of things could I do to implement such a feature (if
> > feasible, of course)?
> >
> >
> > Thanks,
> >
> > Mike
> >
>
