Re: Document Update performances Improvement

2019-10-23 Thread Jörn Franke
Well, coalesce does require shuffle and network; however, in most cases it is less than repartition, as it moves the data (through the network) to already existing executors. However, as you see and others confirm: for high performance you don’t need high parallelism on the ingestion side, but you ca

Re: tlogs are not deleted

2019-10-23 Thread Erick Erickson
Why it’s enabled by default: Really it shouldn’t be. Raise a JIRA? Why it’s there in the first place: It’s a leftover from before there was the “full sync” capability so you could intentionally queue up the updates while performing maintenance on the target cluster. Not great reasons, but…. >

Re: Document Update performances Improvement

2019-10-23 Thread Erick Erickson
My first question is always “what’s the bottleneck”? Unless you’re driving your CPUs and/or I/O hard on Solr, the bottleneck is in the acquisition of the docs, not on the Solr side. Also, be sure to batch in groups of at least 10x the number of shards; see: https://lucidworks.com/post/really-ba
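The batching advice above can be sketched as follows. This is a minimal illustration, assuming the documents are already available as an iterable; the helper names `min_batch_size` and `batched` are hypothetical and not part of any Solr client. It only groups documents and leaves the actual send call to whatever client is in use.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def min_batch_size(num_shards: int, factor: int = 10) -> int:
    """Smallest batch size under the '10x the number of shards' rule of thumb."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return factor * num_shards


def batched(docs: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `size` documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    size = min_batch_size(num_shards=4)  # 4 shards, as in the thread
    for batch in batched(range(100), size):
        # Replace this print with the update call of the client in use.
        print(len(batch))
```

With 4 shards the rule gives a floor of 40 documents per update request; larger batches are also consistent with "at least".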

Re: regarding Extracting text from Images

2019-10-23 Thread Erick Erickson
Here’s a blog about why and how to use Tika outside Solr (and an RDBMS too, but you can pull that part out pretty easily): https://lucidworks.com/post/indexing-with-solrj/ > On Oct 23, 2019, at 7:16 PM, Alexandre Rafalovitch wrote: > > Again, I think you are best to do it out of Solr. > > Bu

Re: regarding Extracting text from Images

2019-10-23 Thread Alexandre Rafalovitch
Again, I think you are best to do it out of Solr. But even if you want to get it to work in Solr, I think you start by getting it to work directly in Tika. Then, get the missing libraries and configuration into Solr. Regards, Alex On Wed, Oct 23, 2019, 7:08 PM suresh pendap, wrote: > Hi Al

Re: regarding Extracting text from Images

2019-10-23 Thread suresh pendap
Hi Alex, Thanks for your reply. How do we integrate tesseract with Solr? Do we have to implement Custom update processor or extend the ExtractingRequestProcessor? Regards Suresh On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch wrote: > I believe Tika that powers this can do so with extra

How to update range of dynamic fields in Solr

2019-10-23 Thread Arnold Bronley
Here is the detailed question on Stack Overflow. Please help. https://stackoverflow.com/questions/14280506/how-to-update-range-of-dynamic-fields-in-solr-4

Re: copyField - why source should contain * when dest contains *?

2019-10-23 Thread Chris Hostetter
: Documentation says that we can copy multiple fields using wildcard to one : or more than one fields. correct ... the limitation is in the syntax and the ambiguity that would be unresolvable if you had a wildcard in the dest but not in the source. the wildcard is essentially a variable. if
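The "wildcard is essentially a variable" idea can be illustrated with a small sketch. This is not Solr's matching code; `resolve_dest` is a hypothetical helper that handles a single `*` in each pattern and shows why a wildcard in the dest with none in the source leaves nothing to substitute.

```python
from typing import Optional


def resolve_dest(source: str, dest: str, field: str) -> Optional[str]:
    """Return the destination field name for `field`, or None if it does not match."""
    if "*" not in source:
        if "*" in dest:
            # Nothing was captured from the source, so the dest wildcard is unbound.
            raise ValueError("dest has a wildcard but source has none")
        return dest if field == source else None

    prefix, _, suffix = source.partition("*")
    long_enough = len(field) >= len(prefix) + len(suffix)
    if not (long_enough and field.startswith(prefix) and field.endswith(suffix)):
        return None

    # The text matched by '*' in the source acts as the variable's value.
    captured = field[len(prefix):len(field) - len(suffix)]
    return dest.replace("*", captured, 1)


if __name__ == "__main__":
    print(resolve_dest("*_t", "*_copy", "title_t"))  # wildcard carried over
    print(resolve_dest("*_t", "text", "title_t"))    # many sources, one dest
```

The field names `title_t`, `text` and the `*_copy` pattern are made up for the example.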

Re: Document Update performances Improvement

2019-10-23 Thread Nicolas Paris
> With Spark-Solr additional complexity comes. You could have too many > executors for your Solr instance(s), i.e. too high a parallelism. I have reduced the parallelism of the spark-solr part by a factor of 5. I had 40 executors loading 4 shards. Right now there are only 8 executors loading 4 shards. As a result, I ca

Re: WordDelimiter in extended way.

2019-10-23 Thread servus01
got it, thank you -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Document Update performances Improvement

2019-10-23 Thread Nicolas Paris
> Set the first two to the same number, and the third to a minimum of three > times what you set the other two. > When I built a Solr setup, I increased maxMergeAtOnce and segmentsPerTier to > 35, and maxMergeAtOnceExplicit to 105. This made merging happen a lot less > frequently. Good to know th
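The rule quoted above (first two settings equal, the third at least three times that value) can be sketched as a small helper. The function names are hypothetical, and the `mergePolicyFactory` element with `org.apache.solr.index.TieredMergePolicyFactory` is an assumption about where these settings live in solrconfig.xml, not something stated in the thread; check it against the reference guide for the Solr version in use.

```python
from typing import Dict


def merge_policy_settings(per_tier: int, explicit_factor: int = 3) -> Dict[str, int]:
    """Apply the rule: two equal settings, the third a multiple of them."""
    if per_tier < 2 or explicit_factor < 3:
        raise ValueError("per_tier must be >= 2 and explicit_factor >= 3")
    return {
        "maxMergeAtOnce": per_tier,
        "segmentsPerTier": per_tier,
        "maxMergeAtOnceExplicit": explicit_factor * per_tier,
    }


def to_solrconfig_xml(settings: Dict[str, int]) -> str:
    """Render the settings as an XML fragment (element and class name are assumed)."""
    lines = ['<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">']
    for name, value in settings.items():
        lines.append(f'  <int name="{name}">{value}</int>')
    lines.append("</mergePolicyFactory>")
    return "\n".join(lines)


if __name__ == "__main__":
    # 35 / 35 / 105 are the values mentioned in the thread.
    print(to_solrconfig_xml(merge_policy_settings(35)))
```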

Re: Document Update performances Improvement

2019-10-23 Thread Nicolas Paris
> . > Thanks for those relevant pointers and the explanation. > How often do you commit? Are yo

Re: regarding Extracting text from Images

2019-10-23 Thread Alexandre Rafalovitch
I believe Tika, which powers this, can do so with extra libraries (tesseract?), but Solr does not bundle those extras. In any case, you may want to run Tika externally to avoid the conversion/extraction process being a burden on Solr itself. Regards, Alex On Wed, Oct 23, 2019, 1:58 PM suresh penda

regarding Extracting text from Images

2019-10-23 Thread suresh pendap
Hello, I am reading the Solr documentation about integration with Tika and the Solr Cell framework over here https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html I would like to know if the Solr Cell framework can also be used to extract text from the image fil

RE: tlogs are not deleted

2019-10-23 Thread Webster Homer
Tlogs will accumulate if you have buffers "enabled". Make sure that you explicitly disable buffering from the cdcr endpoint https://lucene.apache.org/solr/guide/7_7/cdcr-api.html#disablebuffer Make sure that they're disabled on both the source and targets I believe that sometimes buffers get enab
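As a rough sketch of the advice above, the snippet below only builds the DISABLEBUFFER request URLs for a source and a target cluster; it does not send anything. The host names and the collection name are hypothetical, and the `/<collection>/cdcr?action=DISABLEBUFFER` shape is taken from the CDCR API page linked in the message, so verify it against the guide for the Solr version in use.

```python
from typing import Iterable, List
from urllib.parse import urlencode


def cdcr_action_url(base_url: str, collection: str, action: str) -> str:
    """Build a CDCR API URL such as .../solr/<collection>/cdcr?action=DISABLEBUFFER."""
    query = urlencode({"action": action})
    return f"{base_url.rstrip('/')}/{collection}/cdcr?{query}"


def disable_buffer_urls(base_urls: Iterable[str], collection: str) -> List[str]:
    """One DISABLEBUFFER URL per cluster, covering both source and targets."""
    return [cdcr_action_url(base, collection, "DISABLEBUFFER") for base in base_urls]


if __name__ == "__main__":
    # Hypothetical source and target clusters.
    clusters = ["http://source-host:8983/solr", "http://target-host:8983/solr"]
    for url in disable_buffer_urls(clusters, "mycollection"):
        print(url)
```

The URLs can then be fetched with any HTTP client on each cluster in turn.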

Re: WordDelimiter in extended way.

2019-10-23 Thread Shawn Heisey
On 10/23/2019 9:41 AM, servus01 wrote: Hey, thank you for helping me: Thanks in advance for any help, really appreciate it. It is not the WordDelimiter filter th

Re: Solr Prod stopped yesterday - saya "insufficient memory for the Java Runtime Environment"

2019-10-23 Thread Shawn Heisey
On 10/23/2019 9:08 AM, Vignan Malyala wrote: Ok. I have around 500 cores in my solr. So, how much heap I should allocate in solr and jvm? (Currently as I see, in solr.in.sh shows heap as - Xms 20g -Xmx 20g. And my system jvm heap shows -Xms 528m -Xmx 8g. I've re-checked it.) We have no way of

Re: WordDelimiter in extended way.

2019-10-23 Thread servus01
Hey, thank you for helping me: Thanks in advance for any help, really appreciate it. -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.h

Re: Solr Prod stopped yesterday - saya "insufficient memory for the Java Runtime Environment"

2019-10-23 Thread Vignan Malyala
Ok. I have around 500 cores in my solr. So, how much heap I should allocate in solr and jvm? (Currently as I see, in solr.in.sh shows heap as - Xms 20g -Xmx 20g. And my system jvm heap shows -Xms 528m -Xmx 8g. I've re-checked it.) On Wed 23 Oct, 2019, 7:52 PM Shawn Heisey, wrote: > On 10/23/20

Re: WordDelimiter in extended way.

2019-10-23 Thread Shawn Heisey
On 10/23/2019 7:43 AM, servus01 wrote: Now Solr behaves in such a way that on the one hand the hyphens which have a blank before and after are not indexed and also the search as soon as blank - blank is searched does not return any results. With the WordDelimiter I have already covered the cases

Re: Solr Paryload example

2019-10-23 Thread Vincenzo D'Amore
Hi Erick, yes, absolutely, it's a great pleasure for me to contribute. On Wed, Oct 23, 2019 at 2:25 PM Erick Erickson wrote: > Bookmarked. Do you intend that this should be incorporated into Solr? If > so, please raise a JIRA and link your PR in…. > > Thanks! > Erick > > > On Oct 22, 2019, at 6:56

Re: Solr Prod stopped yesterday - saya "insufficient memory for the Java Runtime Environment"

2019-10-23 Thread Shawn Heisey
On 10/23/2019 4:09 AM, Vignan Malyala wrote: *Solr prod stopped yesterday. How to prevent this.* Solr heap info is : -Xms20g -Xmx20g JVM Heap info. : -Xms528m -Xmx8g There is no such thing as a Solr heap separate from the JVM heap. There are multiple environment variables that can specify t

Re: Document Update performances Improvement

2019-10-23 Thread Shawn Heisey
On 10/22/2019 1:12 PM, Nicolas Paris wrote: We, at Auto-Suggest, also do atomic updates daily and specifically changing merge factor gave us a boost of ~4x Interesting. What kind of change exactly on the merge factor side ? The mergeFactor setting is deprecated. Instead, use maxMergeAtOnce,

WordDelimiter in extended way.

2019-10-23 Thread servus01
Hello, maybe somebody can help me out. We have a lot of datasets that are always built according to the same scheme: Expression - Expression as an example: "CCF *HD - 2nd* BL 2019-2020 1st matchday VfL Osnabrück vs. 1st FC Heidenheim 1846 | 1st HZ without WZ" or "Scouting Feed *mp4 - 2.* BL

Re: Solr 7.2.1 - Performance recommendation needed

2019-10-23 Thread Erick Erickson
I suggest you use something like GCViewer to analyze it and report on what you see and any specific questions you have. Erick > On Oct 22, 2019, at 7:23 PM, saravanamanoj wrote: > > Thanks Erick, > > Below is the link for our GC report when the incident happened. > > https://nam03.safelinks.

Re: Solr Prod stopped yesterday - saya "insufficient memory for the Java Runtime Environment"

2019-10-23 Thread Paras Lehana
Hi Vignan, I find this setting quite strange. Also, if this is the case, you have allocated Solr more memory than even the maximum allowed for the total JVM. Solr heap is a subset of JVM heap - do note that! On Wed, 23 Oct 2019 at 16:41, Vincenzo D'Amore wrote: > Hi, > > I see this setting quite stran

Re: copyField - why source should contain * when dest contains *?

2019-10-23 Thread Paras Lehana
Hey Erick, Thanks for addressing. Copyfields are intended to copy exactly one field in the input into exactly > one field in the destination, not multiple ones at the same time. Documentation says that we can copy multiple fields using wildcard to one or more than one fields. Remember that S

Re: Query on changing FieldType

2019-10-23 Thread Erick Erickson
Really, just don’t do this. Please. As others have pointed out, it may look like it works, but it won’t. I’ve spent many hours tracking down why clients got weird errors after making changes like this, sometimes weeks later. Or more accurately, if you choose to change field types without reindex

Re: Solr Paryload example

2019-10-23 Thread Erick Erickson
Bookmarked. Do you intend that this should be incorporated into Solr? If so, please raise a JIRA and link your PR in…. Thanks! Erick > On Oct 22, 2019, at 6:56 PM, Vincenzo D'Amore wrote: > > Hi all, > > this evening I had some spare hour to spend in order to put everything > together in a re

Re: tlogs are not deleted

2019-10-23 Thread Erick Erickson
My first guess is that your CDCR setup isn’t running. CDCR uses tlogs as a queueing mechanism. If CDCR can’t send docs to the target collection, they’ll accumulate forever. Best, Erick > On Oct 22, 2019, at 7:48 PM, Woo Choi wrote: > > Hi, > > We are using solr 7.7 cloud with CDCR(every coll

Re: copyField - why source should contain * when dest contains *?

2019-10-23 Thread Erick Erickson
So how would that work? Copyfields are intended to copy exactly one field in the input into exactly one field in the destination, not multiple ones at the same time. If you need to do that, define multiple copyField directives. I don’t even see how that would work. Remember that Solr is also

copyField - why source should contain * when dest contains *?

2019-10-23 Thread Paras Lehana
Hi Community, I was just going through *Solr Ref Guide 8.1* from scratch and I was reading about *copyFields*. We have been working with copyFields in 6.6 for a year. I just wanted to refresh what we know and what we should before we u

Re: Solr Prod stopped yesterday - saya "insufficient memory for the Java Runtime Environment"

2019-10-23 Thread Vincenzo D'Amore
Hi, I find this setting quite strange: Solr heap info is : -Xms20g -Xmx20g JVM Heap info. : -Xms528m -Xmx8g “Usually” Solr runs inside the JVM and you can have only one of these settings really active. I suggest double-checking your memory configuration. Ciao, Vincenzo -- skype: free.dev >

Solr Prod stopped yesterday - saya "insufficient memory for the Java Runtime Environment"

2019-10-23 Thread Vignan Malyala
*Solr prod stopped yesterday. How to prevent this.* Solr heap info is : -Xms20g -Xmx20g JVM Heap info. : -Xms528m -Xmx8g Physical Ram - 32GB Solr version - 6.6.1 Swap memory - 8g *hc_err_pid.log got created with following info in it:* # # There is insufficient memory for the Java Runtime Environ

Re: Query on changing FieldType

2019-10-23 Thread Emir Arnautović
Hi Shubham, My guess is that it might be working for text because it uses o.toString(), so there are no runtime errors, while in the case of others it has to assume some class, so it does class casting. You can check in the logs what sort of error happens. But in any case, like Jason pointed out, that is a p