Re: Details on why ConcurrentUpdateSolrServer is recommended for maximum index performance

Mikhail Khludnev Fri, 12 Dec 2014 09:25:41 -0800

Tom,
take a look at https://issues.apache.org/jira/browse/SOLR-6559 and
https://issues.apache.org/jira/browse/SOLR-3585. They seem relevant.


On Fri, Dec 12, 2014 at 7:31 PM, Tom Burton-West <tburt...@umich.edu> wrote:

> Thanks everybody for the information.
>
> Shawn, thanks for bringing up the issues around making sure each document
> is indexed ok.  With our current architecture, that is important for us.
>
> Yonik's clarification about streaming really helped me to understand one of
> the main advantages of CUSS:
>
> > When you add a document, it immediately writes it to a stream where
> > Solr can read it off and index it.  When you add a second document,
> > it's immediately written to the same stream (or at least one of the
> > open streams), as part of the same update request.  No separate HTTP
> > request, no separate update request.
>
> In our use case, where documents are in the 700K-2MB range, I suspect that
> the overhead of opening/closing new requests is dwarfed by the time it
> takes just to send the data over the wire and parse it. However,
> I'm starting to think about whether I can find some time to do some
> testing.
>
> Mikhail, thanks for suggesting that I look at DIH.  I haven't looked at it
> in several years and didn't realize there is now functionality to deal with
> XML documents.
>
> When I asked about being able to read XML files from the filesystem, it was
> for the purposes of running some benchmark tests to see if CUSS offers
> enough advantages to re-architect our system.
>
> Currently the main bottleneck in our system is constructing Solr documents.
> We use multiple "document producers" which are responsible both for
> creating a document and for sending it to Solr.  Although each producer
> waits until it gets a response from Solr before sending the next document
> to be indexed, we run 20-100 producers, so this is similar to CUSS running
> multiple threads (although of course we open a new HTTP request and Solr
> update request each time).
>
> As far as using DIH or something like it, we might be able to use it for
> testing with already created documents.
>
> Creating the documents requires assembling (and massaging) data from
> several sources including a few database queries, unzipping files on our
> filesystem and concatenating them, and querying another Solr instance which
> has metadata.
>
> I'm now thinking that for testing purposes it might be sufficient to
> construct dummy documents as in the examples rather than trying to use our
> actual documents.  If the speed improvements look significant enough, then
> I'd need to figure out how to test with real documents.
>
> Thanks again for all the input.
>
> Tom
>
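
To make the difference between the two approaches quoted above concrete, here is a rough sketch in plain Python. It is only a simulation: nothing here talks to Solr or uses SolrJ, and the names index_per_document, index_streaming and runners are made up for illustration. A "request" is modelled as a list of the documents written to it, so the only thing the sketch shows is how many requests get opened in each style.

```python
import queue
import threading


def index_per_document(docs):
    """One request per document, as the current producers do."""
    requests = []
    for doc in docs:
        # Each document is sent in a brand-new request of its own.
        requests.append([doc])
    return requests


def index_streaming(docs, runners=2):
    """Queue documents and let a few runners stream them into open requests.

    Hypothetical stand-in for the streaming behaviour described in the
    quoted text; the real client is more involved than this.
    """
    pending = queue.Queue()
    for doc in docs:
        pending.put(doc)

    requests = []
    lock = threading.Lock()

    def runner():
        body = []  # documents written to this runner's single open request
        while True:
            try:
                doc = pending.get_nowait()
            except queue.Empty:
                break  # nothing left to send, so the request is closed
            body.append(doc)
        if body:
            with lock:
                requests.append(body)

    threads = [threading.Thread(target=runner) for _ in range(runners)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return requests
```

With 50 dummy documents, index_per_document opens 50 requests, while index_streaming(docs, runners=4) opens at most 4, each carrying many documents. The streaming variant also has no per-document response to inspect, which seems to be the trade-off Shawn pointed out.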



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
