Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Caruana, Matthew
On 8 Mar 2017, at 5:25 pm, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 3/8/2017 5:30 AM, Caruana, Matthew wrote:
>> After upgrading to 6.4.2 from 6.4.1, we’ve seen replication time for a
>> 200gb index decrease from 45 hours to 1.5 hours.
>
> Just to che

Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Caruana, Matthew
After upgrading to 6.4.2 from 6.4.1, we’ve seen replication time for a 200gb index decrease from 45 hours to 1.5 hours.

> On 7 Mar 2017, at 20:32, Ishan Chattopadhyaya wrote:
>
> 7 March 2017, Apache Solr 6.4.2 available
>
> Solr is the popular, blazing fast, open source
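For a sense of scale, the replication figures reported above (200 GB copied in 45 hours on 6.4.1 versus 1.5 hours on 6.4.2) imply the average rates below. This is only a back-of-envelope sketch: it assumes "200gb" means 200 GiB and that both times are wall-clock durations for a full copy. It says nothing about *why* replication got faster.

```python
# Back-of-envelope average replication throughput for the figures quoted above.
# Assumptions (not from the original message): 200gb is read as 200 GiB, and the
# times are wall-clock hours for one full index copy.

def throughput_mib_per_s(size_gib: float, hours: float) -> float:
    """Average transfer rate in MiB/s for copying size_gib within the given hours."""
    return size_gib * 1024 / (hours * 3600)

before = throughput_mib_per_s(200, 45)   # reported time on Solr 6.4.1
after = throughput_mib_per_s(200, 1.5)   # reported time on Solr 6.4.2
speedup = 45 / 1.5

print(f"6.4.1: {before:.2f} MiB/s")
print(f"6.4.2: {after:.2f} MiB/s")
print(f"speedup: {speedup:.0f}x")
```

The reported change works out to roughly 1.3 MiB/s versus 38 MiB/s, a 30x difference.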

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-05 Thread Caruana, Matthew
ging area in the filesystem.
>
> cheers -- Rick
>
>
>> On 2017-03-03 03:00 AM, Caruana, Matthew wrote:
>> This is the current config:
>>
>>
>> 100
>> 1
>> > cl

Re: What is the bottleneck for an optimise operation?

2017-03-03 Thread Caruana, Matthew
if any performance gain is worth it ;)... And as I mentioned earlier, optimizing is unlikely to be related to OOMs during indexing. You never know, of course.

Best,
Erick

On Fri, Mar 3, 2017 at 3:40 AM, Caruana, Matthew <mcaru...@icij.org> wrote:
Than

Re: What is the bottleneck for an optimise operation?

2017-03-03 Thread Caruana, Matthew
Thank you, you’re right: only one of the four cores is hitting 100%. This is the correct answer. The bottleneck is CPU, exacerbated by an absence of parallelisation.

> On 3 Mar 2017, at 12:32, Toke Eskildsen <t...@kb.dk> wrote:
>
> On Thu, 2017-03-02 at 15:39 +, Caruana

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-03 Thread Caruana, Matthew
> doing the grand optimize:
> https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments
>
> Regards,
> Alex.
>
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> O
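The thread subject suggests asking for an optimize down to several segments rather than one. A minimal sketch of building such a request is below; `optimize` and `maxSegments` are standard Solr update-handler parameters, but the host, the core name `mycore` and the segment count 8 are hypothetical placeholders, and whether this actually saves time or disk for this particular index is not established here.

```python
from urllib.parse import urlencode

def optimize_url(base_url: str, core: str, max_segments: int) -> str:
    """Build an update-handler URL asking Solr to merge down to at most
    max_segments segments instead of a single one.

    base_url and core are hypothetical placeholders; optimize and
    maxSegments are the update-handler request parameters.
    """
    if max_segments < 1:
        raise ValueError("max_segments must be >= 1")
    params = {"optimize": "true", "maxSegments": str(max_segments)}
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"

url = optimize_url("http://localhost:8983/solr", "mycore", 8)
print(url)
# Sending the request (not done here) could use urllib.request.urlopen(url).
```

This prints `http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=8`. Keeping the URL construction separate from the HTTP call makes it easy to inspect before running an operation that can take hours.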

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-02 Thread Caruana, Matthew
ation that can require up to 3-times
>>> amount of disk during the processing.
>>>
>>> This is not to say yours is a valid question, which I am leaving to
>>> others to respond.
>>>
>>> Regards,
>>> Alex.
>>>
>>>

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-02 Thread Caruana, Matthew
eaving to
>>> others to respond.
>>>
>>> Regards,
>>> Alex.
>>>
>>> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>>>
>>>
>>>> On 2 March 2017 at 10:04, Caruana, Matthew

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread Caruana, Matthew
doc index
> with an 8 GB heap (Java 8u121, G1 collector). I recommend a smaller heap so
> the OS can use that RAM to cache file buffers.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>
>> On Mar 2, 2017, at

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread Caruana, Matthew
cessing.
>>
>> This is not to say yours is a valid question, which I am leaving to
>> others to respond.
>>
>> Regards,
>> Alex.
>>
>> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>>
>>
>>

What is the bottleneck for an optimise operation?

2017-03-02 Thread Caruana, Matthew
I’m currently performing an optimise operation on a ~190GB index with about 4 million documents. The process has been running for hours. This is surprising, because the machine is an EC2 r4.xlarge with four cores and 30GB of RAM, 24GB of which is allocated to the JVM. The load average has been
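Walter Underwood's reply in this listing mentions running with an 8 GB heap and recommends a smaller heap so the OS can use the RAM to cache file buffers. The rough arithmetic for the machine described here (30 GB RAM, 24 GB heap, ~190 GB index) is sketched below. It assumes, simplistically, that all RAM outside the heap is available to the page cache, ignoring the OS itself, off-heap JVM memory and other processes. The 8 GB row just reuses the figure from Walter's message for comparison; it is not a tuned recommendation.

```python
# Rough page-cache headroom for the EC2 machine described above.
# Simplifying assumption: every GB not given to the JVM heap can be used by the
# OS page cache (ignores the OS, off-heap JVM memory and other processes).

TOTAL_RAM_GB = 30
INDEX_GB = 190

def cache_headroom(heap_gb: float) -> tuple[float, float]:
    """Return (GB left for the page cache, fraction of the index that would fit)."""
    free = TOTAL_RAM_GB - heap_gb
    return free, free / INDEX_GB

for heap in (24, 8):
    free, frac = cache_headroom(heap)
    print(f"heap {heap} GB -> {free} GB for page cache, ~{frac:.1%} of the index")
```

With a 24 GB heap, about 6 GB (roughly 3% of the index) is left for caching. With 8 GB, about 22 GB (roughly 12%) is left.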

Re: Stored value for highlighting from different field?

2017-03-01 Thread Caruana, Matthew
Many of our field values are large, so we want to use the unified highlighter for its performance benefits. Development also seems to be focussed on that highlighter.

> On 1 Mar 2017, at 19:07, Rick Leir wrote:
>
> Matthew, Is TVH term vector highlighter an option? Just a
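In Solr 6.4 the unified highlighter is selected per request with `hl.method=unified`. A minimal sketch of the query parameters is below. The query string `text:solr` and the field `text` are hypothetical examples, and the sketch does not address the separate question of taking the stored value from a different field.

```python
from urllib.parse import urlencode

def highlight_params(query: str, field: str) -> str:
    """Query string for a highlighted search that uses the unified highlighter."""
    params = {
        "q": query,
        "hl": "true",            # turn highlighting on
        "hl.method": "unified",  # choose the unified highlighter implementation
        "hl.fl": field,          # field(s) to build snippets from
    }
    return urlencode(params)

print(highlight_params("text:solr", "text"))
```

`urlencode` percent-encodes the colon in the query, so this prints `q=text%3Asolr&hl=true&hl.method=unified&hl.fl=text`.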

Stored value for highlighting from different field?

2017-03-01 Thread Caruana, Matthew
We’re currently using copyField directives in our schema to copy the same text to different fields that use different analysers. For example, assuming the original field contained in the document payload sent to the update handler is called “tika_output”, it is copied to “text”,
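The directive described above corresponds to `<copyField source="tika_output" dest="text"/>` in `schema.xml`. With a managed schema, the same rule can be added through the Schema API's `add-copy-field` command. The sketch below only builds that JSON body. Only the `text` destination is shown because the other destinations are cut off in the message, and the host and core name in the comment are hypothetical.

```python
import json

def add_copy_field_command(source: str, dests: list[str]) -> str:
    """JSON body for the Schema API add-copy-field command."""
    return json.dumps({"add-copy-field": {"source": source, "dest": dests}})

body = add_copy_field_command("tika_output", ["text"])
print(body)
# The body would be POSTed to e.g. http://localhost:8983/solr/mycore/schema
# (hypothetical host and core; not sent here).
```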