Now, what happens is a user will upload, say, a Word document to us. We then
parse it and process it into segments. There could very well be 5000 segments
or even more in that Word document. Each one of those ~5000 segments needs
to be searched for similar segments in Solr. I’m not quite sure h
I will definitely let you all know what we end up doing. I realized I
forgot to mention something that might make what we do clearer.
Right now we use SQL Server full-text search to get back fairly similar
matches for each segment. We do this with some funky SQL stuff which I
didn't write and haven't
On Thu, Mar 28, 2013 at 12:27 PM, Mike Haas wrote:
> Thanks for your reply, Roman. Unfortunately, the business has been running
> this way forever so I don't think it would be feasible to switch to a whole
>
sure, no arguing against that :)
> document store versus segments store. Even then, if
Thanks Timothy,
Regarding your suggestion to use MoreLikeThis, do you know what kind of
algorithm it uses? My searching didn't reveal anything.
On Thu, Mar 28, 2013 at 10:51 AM, Timothy Potter wrote:
> Hi Mike,
>
> Interesting problem - here's some pointers on where to get started.
>
> For fi
This might not be a good match for Solr, or for many other systems. It does
seem like a natural fit for MarkLogic, which natively searches and selects
over XML documents.
Disclaimer: I worked at MarkLogic for a couple of years.
wunder
On Mar 28, 2013, at 9:27 AM, Mike Haas wrote:
> Thanks for
Thanks for your reply, Roman. Unfortunately, the business has been running
this way forever so I don't think it would be feasible to switch to a whole
document store versus segments store. Even then, if I understand you
correctly it would not work for our needs. I'm thinking because we don't
care a
Apologies if you already do something similar, but perhaps of general
interest...
One different approach to your problem is to implement a local
fingerprint - if you want to find documents with overlapping segments, this
algorithm will dramatically reduce the number of segments you create/search
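A minimal sketch of the local-fingerprint idea, assuming it refers to something like winnowing (hash overlapping word k-grams, then keep only the minimum hash in each sliding window). The message above does not name a specific algorithm, so the technique, the function names, and the parameters `k` and `window` are all illustrative assumptions:

```python
import hashlib


def kgram_hashes(text, k=5):
    """Hash every run of k consecutive words (k is an assumed parameter)."""
    tokens = text.lower().split()
    hashes = []
    for i in range(len(tokens) - k + 1):
        gram = " ".join(tokens[i:i + k])
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        hashes.append(int.from_bytes(digest[:8], "big"))
    return hashes


def fingerprint(text, k=5, window=4):
    """Keep only the minimum hash of each sliding window of k-gram hashes."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    if len(hashes) < window:
        # Text too short for a full window: fall back to a single fingerprint.
        return {min(hashes)}
    selected = set()
    for i in range(len(hashes) - window + 1):
        selected.add(min(hashes[i:i + window]))
    return selected


shared = "the quick brown fox jumps over the lazy dog near the old river bank"
doc_a = "intro words here " + shared + " and then it ends"
doc_b = "a completely different opening " + shared + " with another tail"

# Documents that share a long enough passage share at least one fingerprint.
common = fingerprint(doc_a) & fingerprint(doc_b)
```

Under these assumptions, any shared passage of at least `k + window - 1` words contributes at least one common fingerprint, and only the selected hashes need to be indexed and searched instead of every segment.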
Hi Mike,
Interesting problem - here's some pointers on where to get started.
For finding similar segments, check out Solr's More Like This support -
it's built into the query request processing, so you just need to enable it
with query params.
There's nothing built in for doing batch queries fro
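As a rough illustration of enabling More Like This through query params, the sketch below only builds the request URL and does not contact a server. The host, the core name `segments`, the field name `text`, and the document id are hypothetical; `mlt`, `mlt.fl`, `mlt.count`, `mlt.mintf`, and `mlt.mindf` are the standard MoreLikeThis parameters:

```python
from urllib.parse import urlencode


def build_mlt_params(segment_id, field="text", count=5):
    """Query params that ask Solr for documents similar to one stored segment."""
    return {
        "q": f"id:{segment_id}",  # the segment to find neighbours for
        "mlt": "true",            # turn on the MoreLikeThis component
        "mlt.fl": field,          # field(s) used to judge similarity
        "mlt.count": count,       # similar docs returned per result
        "mlt.mintf": 1,           # min term frequency in the source doc
        "mlt.mindf": 1,           # min number of docs a term must occur in
        "wt": "json",
    }


def build_mlt_url(base_url, segment_id, **kwargs):
    """Assemble a /select URL; base_url points at an assumed Solr core."""
    return f"{base_url}/select?{urlencode(build_mlt_params(segment_id, **kwargs))}"


url = build_mlt_url("http://localhost:8983/solr/segments", "seg-42")
```

Sentence-length segments have few repeated terms, so low `mlt.mintf` and `mlt.mindf` values are probably needed; the values here are a guess, not a tuned setting.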
Hello. My company is currently thinking of switching over to Solr 4.2,
coming off of SQL Server. However, what we need to do is a bit weird.
Right now, we have ~12 million segments and growing. Usually these are
sentences but can be other things. These segments are what will be stored
in Solr. I’v