Re: Slow indexing speed when collection size is large

Zheng Lin Edwin Yeo Sat, 06 May 2017 17:49:36 -0700

Hi Shawn,

For my rich document handling, I'm using the Extracting Request Handler,
and the documents require OCR.

However, the slow indexing I'm currently experiencing is for indexing done
directly from the Sybase database. I fetch about 1000 records at a time
from Sybase and store them in a CacheRowSet to be indexed. The query to the
Sybase database is quite fast; most of the time is spent processing the
records in the CacheRowSet.
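To make the fetch-and-index flow above concrete, here is a minimal Java sketch of the batching step alone. It is not my actual indexing code. The Iterator is a stand-in for the rows read from Sybase (for example a JDBC ResultSet, or the JDK's javax.sql.rowset.CachedRowSet, assuming that is what CacheRowSet refers to), and the Consumer is a hypothetical placeholder for the SolrJ call that sends one batch per request, such as SolrClient.add(Collection<SolrInputDocument>). Plain stand-ins are used so the sketch runs without a database or a Solr server.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Sketch of batch-wise indexing: rows are pulled one at a time from a source
 * and handed to a sink in groups of batchSize.
 *
 * Assumptions (hypothetical, for illustration only):
 *  - the Iterator stands in for a JDBC ResultSet / CachedRowSet cursor;
 *  - the Consumer stands in for something like SolrJ's
 *    SolrClient.add(Collection<SolrInputDocument>), i.e. one POST per batch.
 */
class BatchIndexer {
    // Matches the ~1000 records per fetch / per POST described in this thread.
    static final int BATCH_SIZE = 1000;

    /** Sends rows to sink in batches of batchSize; returns the number of batches sent. */
    static <T> int indexInBatches(Iterator<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);                  // one request per full batch
                batch = new ArrayList<>(batchSize);  // fresh list, so the sink may keep the old one
                batches++;
            }
        }
        if (!batch.isEmpty()) {                      // flush the final partial batch
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Simulate 2,500 rows coming out of the database.
        List<Integer> fakeRows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            fakeRows.add(i);
        }

        List<Integer> batchSizes = new ArrayList<>();
        int sent = indexInBatches(fakeRows.iterator(), BATCH_SIZE, b -> batchSizes.add(b.size()));
        System.out.println(sent + " batches: " + batchSizes);
    }
}
```

Running main prints `3 batches: [1000, 1000, 500]`. Allocating a new list after each flush, rather than calling clear(), lets the sink hold on to a batch (for example for an asynchronous send) without it being modified afterwards.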

Here are the answers to the other questions:

On a single Solr server, how much total memory is installed?
A) 384 GB

What is the total amount of memory reserved for Solr heaps on that server?
A) 22 GB

What is the total on-disk size of all the Solr indexes on that server?
A) 5 TB

-- Multiple replicas must be included if they are present on one machine.
From the core (shard replica) perspective, how many documents are on
that server?
A) About 200 million documents across both replicas; each replica has about
100 million. Currently, both replicas are on the same server, but on
different disks.

-- Multiple replicas must be included here too.
Is there software other than the Solr server process(es) running on that
server?
A) Yes. A virtual machine with the Sybase database is running on the server.

Are you making queries at the same time you're indexing?
A) Only occasionally. Most of the time, no queries are being made.

Regards,
Edwin

On 6 May 2017 at 20:41, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/1/2017 10:17 AM, Zheng Lin Edwin Yeo wrote:
> > I'm using SolrJ for the indexing, not curl. Normally I bundle
> > about 1000 documents for each POST. There's more than 300GB of RAM for
> > that server, and I do not use any sharding at the moment.
>
> Looking over your email history on the list, I was able to determine
> some information, but not everything I was wondering about.  I have some
> questions.
>
> Are you still using the Extracting Request Handler for your rich
> document handling, or have you moved Tika processing outside Solr?
> If it's outside Solr, is it on different machines?
> Are your rich documents still requiring OCR?
>
> Other questions:
>
> On a single Solr server, how much total memory is installed?
> What is the total amount of memory reserved for Solr heaps on that server?
> What is the total on-disk size of all the Solr indexes on that server?
> -- Multiple replicas must be included if they are present on one machine.
> From the core (shard replica) perspective, how many documents are on
> that server?
> -- Multiple replicas must be included here too.
> Is there software other than the Solr server process(es) running on that
> server?
> Are you making queries at the same time you're indexing?
>
> Thanks,
> Shawn
