Re: SolrCloud performance

2018-11-02 Thread Deepak Goel
Please see inline for my thoughts. Deepak

Re: Index optimization takes too long

2018-11-02 Thread Shawn Heisey
On 11/2/2018 5:00 PM, Wei wrote: After a recent schema change, it takes almost 40 minutes to optimize the index. The schema change enabled docValues for all sort/facet fields, which increased the index size from 12G to 14G. Before the change, optimization took only 5 minutes.
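For scale, the figures quoted in this message can be put side by side. The following is a minimal Python sketch that uses only the numbers stated above; the variable names are illustrative and not part of the thread.

```python
# Figures quoted in the message; the names are illustrative only.
size_before_gb, size_after_gb = 12, 14
optimize_before_min, optimize_after_min = 5, 40

# Relative growth of the index size and of the optimize time.
size_growth = size_after_gb / size_before_gb
time_growth = optimize_after_min / optimize_before_min

print(f"index size grew {size_growth:.2f}x")
print(f"optimize time grew {time_growth:.1f}x")
```

The optimize time grew much more than the index size did, which suggests the extra 2G alone may not account for the slowdown.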

Index optimization takes too long

2018-11-02 Thread Wei
Hello, After a recent schema change, it takes almost 40 minutes to optimize the index. The schema change enabled docValues for all sort/facet fields, which increased the index size from 12G to 14G. Before the change, optimization took only 5 minutes. I have tried to increase

Re: SolrCloud performance

2018-11-02 Thread Shawn Heisey
On 11/2/2018 1:38 PM, Chuming Chen wrote:
> I am running a Solr cloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g -Xmx40g"), each shard has 32 million documents and 32Gbytes in size.
A 40GB heap is probably completely unnecessary for an index of that size. Does each machine have one replica

SolrCloud performance

2018-11-02 Thread Chuming Chen
Hi All, I am running a Solr cloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g -Xmx40g"), each shard has 32 million documents and 32Gbytes in size. For a given query (I use complexphrase query), typically, the first time it took a couple of seconds to return the first 20 docs. However, for the

Re: solr cloud - hdfs folder structure best practice

2018-11-02 Thread lstusr 5u93n4
Great, thanks for the response. This is how we have it configured now, but we just had the idea the other day that maybe it would be better otherwise... And thanks for the blog post! We ended up with basically the same config, so it's good to see that validated. Kyle On Fri, 2 Nov 2018 at

Re: solr cloud - hdfs folder structure best practice

2018-11-02 Thread Kevin Risden
I prefer a single HDFS home since it definitely simplifies things. No need to create folders for each node or anything like that if you add nodes to the cluster. The replicas underneath will get their own folders. I don't know if there are issues with autoAddReplicas or other types of failovers if

solr cloud - hdfs folder structure best practice

2018-11-02 Thread lstusr 5u93n4
Hi All, Here's a question that I can't find an answer to in the documentation: When configuring solr cloud with HDFS, is it best to: a) provide a unique hdfs folder for each solr cloud instance or b) provide the same hdfs folder to all solr cloud instances. So for example, if I have two
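The two layouts in the question could be sketched as follows. This is a minimal Python sketch, assuming the `solr.directoryFactory`, `solr.lock.type`, and `solr.hdfs.home` startup properties described in the Solr reference guide for running Solr on HDFS; the namenode address, folder paths, node names, and the `hdfs_flags` helper are hypothetical and do not come from the thread.

```python
def hdfs_flags(hdfs_home):
    """Build the JVM system properties that point one Solr node at an HDFS home."""
    return [
        "-Dsolr.directoryFactory=HdfsDirectoryFactory",
        "-Dsolr.lock.type=hdfs",
        f"-Dsolr.hdfs.home={hdfs_home}",
    ]

NAMENODE = "hdfs://namenode.example:8020"  # hypothetical namenode address
NODES = ["solr1", "solr2"]                 # hypothetical node names

# Option (a): a unique HDFS folder for each Solr instance.
per_node = {node: hdfs_flags(f"{NAMENODE}/solr/{node}") for node in NODES}

# Option (b): the same HDFS folder for all Solr instances.
shared = {node: hdfs_flags(f"{NAMENODE}/solr") for node in NODES}

for node in NODES:
    print(node, "(a)", per_node[node][-1], "(b)", shared[node][-1])
```

With option (b), every node gets identical flags, so adding a node needs no new folder or per-node setting.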

Re: Solr OCR Support

2018-11-02 Thread Tim Allison
+1 Thank you, Daniel. If you have any interest in helping out on TIKA-2749, please join the fun. :D On Fri, Nov 2, 2018 at 12:12 PM Davis, Daniel (NIH/NLM) [C] wrote: > > I think that you also have to process a PDF pretty deeply to decide if you > want it to be OCR. I have worked on projects

RE: Solr OCR Support

2018-11-02 Thread Davis, Daniel (NIH/NLM) [C]
I think that you also have to process a PDF pretty deeply to decide if you want it to be OCR. I have worked on projects where all of the PDFs are really like faxes - images are encoded in JBIG2 black and white or similar, and there is really one image per page, and no text. I have also

Re: TLOG replica stucks

2018-11-02 Thread Shawn Heisey
On 11/2/2018 3:12 AM, Vadim Ivanov wrote:
> It seems to me that the issue is related to:
> - restart solr node
> - rebalance leader
> - reload collection
> - reload core (Core admin is not forbidden but seems obsolete in SolrCloud)
In SolrCloud, CoreAdmin is an expert option. Many of the things that the

Re: Solr OCR Support

2018-11-02 Thread Tim Allison
OCR'ing of PDFs is fiddly at the moment because of Tika, not Solr! We have an open ticket to make it "just work", but we aren't there yet (TIKA-2749). You have to tell Tika how you want to process images from PDFs via the tika-config.xml file. You've seen this link in the links you mentioned:

Solr OCR Support

2018-11-02 Thread Furkan KAMACI
Hi All, I want to index images and PDF documents that contain images into Solr. I tested this with my Solr 6.3.0. I have installed Tesseract on my computer (Mac) and verified that it extracts text from an image correctly. I indexed an image into Solr, but it has no content. However, as far as I know,

Re: SolrCloud Replication Failure

2018-11-02 Thread Jeremy Smith
Hi Susheel, Yes, it appears that under certain conditions, if a follower is down when the leader gets an update, the follower will not receive that update when it comes back (or maybe it receives the update and it's then overwritten by its own transaction logs, I'm not sure).

RE: TLOG replica stucks

2018-11-02 Thread Vadim Ivanov
It seems to me that the issue is related to:
- restart solr node
- rebalance leader
- reload collection
- reload core (Core admin is not forbidden but seems obsolete in SolrCloud)
If nothing changes in the cluster state, everything goes smoothly. Maybe it can be reproduced with the same test as in "