Re: how to update billions of docs

2016-03-24 Thread Mohsin Beg Beg
An update on how I ended up implementing the requirement, in case it helps others. There is a lot of other code I did not include, but the general logic is below. While performance is still not great, it is 10x faster than atomic updates (because RealTimeGetComponent.getInputDocument() is not

Re: Indexing multiple pdf's and partial update of pdf

2016-03-24 Thread Alexandre Rafalovitch
An approach that comes to mind is to use DataImportHandler with PDF parsing being in the inner definition while indexed entity being at the parent level. The main issue is how to ensure Tika output from one PDF does not map to the same fields as from the second one. Maybe give different prefixes.

Use default field, if more specific field does not exist

2016-03-24 Thread Georg Sorst
Hi list, we use Solr to search ecommerce products. Items have a default price which can be overwritten per user. So when searching we have to return the user price if it is set, otherwise the default price. Same goes for building facets and when filtering by price. What's the best way to

Re: No live SolrServers available to handle this request

2016-03-24 Thread Elaine Cario
Anil, I've seen situations where if there was a problem with a specific query, and every shard responds with the same error, the actual exception gets hidden by a "No live SolrServers..." exception. We originally saw this with wildcard queries (when every shard reported a "too many

Re: Solr 5 and JDK 8 - awful performance

2016-03-24 Thread Mikhail Khludnev
Dragos, I wonder if you have a ScriptTransformer in your config? Just a clue: the Solr Admin UI has a Threads tab, and sometimes it's possible to diagnose a severe performance problem just by observing deep stacks there. On Thu, Mar 24, 2016 at 6:54 PM, Dragos Vizireanu wrote: > Hi, > > I

Re: SyntaxError - Block Join Parent Query

2016-03-24 Thread Mikhail Khludnev
On Thu, Mar 24, 2016 at 8:31 PM, Charles Sanders wrote: > I tried this on another machine with a clean index. I was able to get the > query to work. Thank you. > > Couple of related questions. > 1) I was able to get this to work on a single shard machine. But I am not > able

Re: [nesting] Any way to return the whole hierarchical structure when doing Block Join queries?

2016-03-24 Thread Mikhail Khludnev
I think you can already kick the tires and contribute a test case to https://issues.apache.org/jira/browse/SOLR-8208; that's already reachable there, I believe, but I am still working on the core design. On Thu, Mar 24, 2016 at 10:02 PM, Alisa Z. wrote: > Hi all, > > I apologize for

Re: Performance potential for updating (reindexing) documents

2016-03-24 Thread Erick Erickson
Well, for comparison I routinely get 20K docs/second on my Mac Pro indexing Wikipedia docs. I _think_ I have 4 shards when I do this, all in the same JVM. I'd be surprised if you can't get your 5K docs/sec, but you may indeed need more shards. All that said, 4G for the JVM is kind of

Re: Overriding SolrCloud Leader Election and manually assign leadership?-Is it possible?

2016-03-24 Thread Erick Erickson
First of all, for a cluster this size the additional work a leader does is so small I suspect you'd have a hard time measuring any performance difference. Personally I wouldn't worry about it. If you insist, you can look at the collections API call REBALANCELEADERS (you have to assign the
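
For readers who want to try the REBALANCELEADERS route mentioned above, here is a minimal sketch of the two Collections API URLs involved. It assumes the preferredLeader replica property that the Solr Reference Guide documents alongside REBALANCELEADERS; the collection, shard, and replica names and the base URL are hypothetical, and the sketch only builds the URLs without sending any request.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, params):
    """Build a Collections API URL from a dict of query parameters."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical values: adjust to the real cluster before use.
BASE = "http://localhost:8983/solr"

# Step 1: mark one replica of the shard as the preferred leader.
mark_preferred = collections_api_url(BASE, {
    "action": "ADDREPLICAPROP",
    "collection": "products",
    "shard": "shard1",
    "replica": "core_node3",
    "property": "preferredLeader",
    "property.value": "true",
})

# Step 2: ask Solr to move leadership to the preferred replicas.
rebalance = collections_api_url(BASE, {
    "action": "REBALANCELEADERS",
    "collection": "products",
})

print(mark_preferred)
print(rebalance)
```

Both URLs would be issued as plain GET requests against any node in the cluster, one property call per shard followed by a single rebalance call.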

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Jack Krupansky
Thanks, Erick, I had forgotten about that. I did find one short reference to it in the doc: "Be sure to run the Lucene IndexUpgrader included with Solr 4.10 if you might still have old 3x formatted segments in your index. Alternatively: fully optimize your index with Solr 4.10 to make sure it

Overriding SolrCloud Leader Election and manually assign leadership?-Is it possible?

2016-03-24 Thread ram
Hello, We have a setup where we have a 5 server cluster, of which 3 are cloud boxes and 2 are physical boxes. We have an external ZooKeeper setup for the same. The physical boxes have more capacity, and in the past we have seen that whenever one of the boxes is leader in SolrCloud, the performance seems

[nesting] Any way to return the whole hierarchical structure when doing Block Join queries?

2016-03-24 Thread Alisa Z.
Hi all, I apologize for duplicating my previous message: Solr 5.3: anything similar to ChildDocTransformerFactory that does not flatten the hierarchical structure? However, it is still an open and interesting question: Following the example from

Re: Performance potential for updating (reindexing) documents

2016-03-24 Thread tedsolr
Hi Erick, My post was scant on details. The numbers I gave for collection sizes are projections for the future. I am in the midst of an upgrade that will be completed within a few weeks. My concern is that I may not be able to produce the throughput necessary to index an entire collection quickly

Re: SyntaxError - Block Join Parent Query

2016-03-24 Thread Charles Sanders
I tried this on another machine with a clean index. I was able to get the query to work. Thank you. Couple of related questions. 1) I was able to get this to work on a single shard machine. But I am not able to get this query to work on Solr with two shards (SolrCloud). Any reason why this

RE: Solr 5.5 Issue with CJK and mm being ignored when searching with white space.

2016-03-24 Thread Tiffany Goguen
Hi Shawn, Thank you for the reply. I removed defaultOperator parameter from the schema. I have the following in the request handler: edismax 100 I reindexed content. I am still seeing the same incorrect behavior. mm=100 does not seem to be sticking with クイック リファレンス(space

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Yonik Seeley
On Thu, Mar 24, 2016 at 12:32 PM, Tomás Fernández Löbbe wrote: >> >> >> Not to mention the fact that Solr 6 is using deprecated Lucene 6 >> numeric types if those are removed in Lucene 7, then what? >> >> I believe this is going to be an issue. We have SOLR-8396 >

RE: Reload or Reload and Solr Restart

2016-03-24 Thread Matt Kuiper
Based on what I have read, it looks like only a collection reload is needed for the scenario below and for that matter for applying any modifications to the solrconfig.xml. Matt From: Matt Kuiper Sent: Wednesday, March 23, 2016 10:26 AM To: solr-user@lucene.apache.org Subject: Reload or Reload

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Yago Riveiro
I took the IndexUpgrader path to upgrade my 4.x index to 5.x (15 terabytes of data and growing). It wasn't an easy task to do without downtime: IndexUpgrader doesn't work if the replica is loaded. With 12T of data, re-indexing is a no-no operation (the time expended to do the re-index can

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Tomás Fernández Löbbe
> > > Not to mention the fact that Solr 6 is using deprecated Lucene 6 > numeric types if those are removed in Lucene 7, then what? > > I believe this is going to be an issue. We have SOLR-8396 open, but it doesn't look like it's going to make

Re: Performance potential for updating (reindexing) documents

2016-03-24 Thread Erick Erickson
Impossible to say if for no other reason than you haven't told us how many physical machines this is spread over ;). For the process you've outlined to work, all the fields are stored, right? So why not use Atomic Updates? You still have to query the docs. About querying. If I'm reading this
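
As an illustration of the Atomic Updates suggestion above, here is a minimal sketch of the JSON body such an update uses. The field names and document id are hypothetical; the `set` modifier is the documented atomic-update operation for replacing a field value. The sketch only builds the payload, which would then be POSTed to the collection's /update handler.

```python
import json


def atomic_update(doc_id, changes, id_field="id"):
    """Return one atomic-update document that replaces the given fields.

    Fields that are not listed keep their stored values, which is why
    atomic updates need the other fields to be stored.
    """
    doc = {id_field: doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return doc


# Hypothetical document id and field names.
payload = json.dumps([atomic_update("doc1", {"price": 19.99, "in_stock": True})])
print(payload)
```

Batching many such documents into one request body is usually preferable to sending them one at a time.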

Re: SolrCloud: published host/port

2016-03-24 Thread Tomás Fernández Löbbe
I believe this can be done by setting the "host" and "hostPort" elements in solr.xml. In the default solr.xml they are configured in a way to support also setting them via System properties: <str name="host">${host:}</str> <int name="hostPort">${jetty.port:8983}</int> Tomás On Wed, Mar 23, 2016 at 11:26 PM, Hendrik Haddorp
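
The `${name:default}` placeholders quoted above take their value from a system property when one is set and fall back to the text after the colon otherwise. The following sketch only emulates that substitution rule for illustration; it is not Solr's own code, and the `resolve` helper and the override value are hypothetical.

```python
import re

# Matches ${name:default}; the default may be empty, as in ${host:}.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+):([^}]*)\}")


def resolve(text, props):
    """Replace each ${name:default} with props[name] or, if unset, the default."""
    return _PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(2)), text)


# No properties set: the defaults apply.
print(repr(resolve("${host:}", {})))
print(resolve("${jetty.port:8983}", {}))

# Hypothetical override, as if started with -Djetty.port=18983.
print(resolve("${jetty.port:8983}", {"jetty.port": "18983"}))
```

In a NAT or Docker setup, the externally reachable host name and port would be supplied through these properties so that the published values differ from the local ones.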

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Yonik Seeley
On Thu, Mar 24, 2016 at 12:16 PM, Erick Erickson wrote: > There's always the IndexUpgrader, one could run the 5x version against > a 4x index and have a 5x-compatible index that would then be readable > by 6x OOB. This may be the last time that will work. See the thread

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Erick Erickson
There's always the IndexUpgrader, one could run the 5x version against a 4x index and have a 5x-compatible index that would then be readable by 6x OOB. A bit convoluted to be sure. Erick On Thu, Mar 24, 2016 at 8:49 AM, Yonik Seeley wrote: > On Thu, Mar 24, 2016 at 11:45 AM,

Re: Solr 5 and JDK 8 - awful performance

2016-03-24 Thread Yonik Seeley
Wow... that's pretty strange. > indexing just didn't do anything for some minutes and then started again. I wonder if it's anything to do with DNS lookups or something like that? -Yonik On Thu, Mar 24, 2016 at 11:54 AM, Dragos Vizireanu wrote: > Hi, > > I have a big

Solr 5 and JDK 8 - awful performance

2016-03-24 Thread Dragos Vizireanu
Hi, I have a big problem with the performance of running Solr 5 with JDK 8. Details: - tried with both Solr 5.4.0 and Solr 5.5.0 (even with Solr 4) - default Solr 5 configuration - created a new core, for which I am using data import handler to get data from MySQL When I am trying to index

Re: Index not fitting in memory (file-cache)

2016-03-24 Thread Toke Eskildsen
Robert Brown wrote: > Before I go out and throw more RAM into the system, in the above > example, what would you recommend? That you try to determine what causes the slow response times. Replay logged queries (thousands of queries, not just a few) and see if the pauses

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Yonik Seeley
On Thu, Mar 24, 2016 at 11:38 AM, Bram Van Dam wrote: > On 23/03/16 15:50, Yonik Seeley wrote: >> Kind of a unique situation for a dot-oh release, but from the Solr >> perspective, 6.0 should have *fewer* bugs than 5.5 (for those features >> in 5.5 at least)... we've been

Re: SyntaxError - Block Join Parent Query

2016-03-24 Thread Mikhail Khludnev
I suggest adding debugQuery=true and the fl=*,[child ...] doc transformer, then coming back with the response. On Thu, Mar 24, 2016 at 3:23 PM, Charles Sanders wrote: > Ah yes. Thank you. Made the correction and I do not get the SyntaxError. > However, it does not apply the child
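
A minimal sketch of what such a debugging request could look like, assuming a block join parent query. The field names (`type_s`, `color_s`) are hypothetical and do not come from the thread; `{!parent which=...}` and `[child parentFilter=...]` are the standard block join parser and child doc transformer syntax.

```python
from urllib.parse import urlencode


def block_join_debug_params(parent_filter, child_query):
    """Build request parameters for a block join query with debugging enabled."""
    return {
        # Return parents whose children match child_query.
        "q": f'{{!parent which="{parent_filter}"}}{child_query}',
        # Return all fields and attach child documents to each parent.
        "fl": f"*,[child parentFilter={parent_filter}]",
        # Include the parsed query in the response.
        "debugQuery": "true",
    }


# Hypothetical parent filter and child clause.
params = block_join_debug_params("type_s:parent", "color_s:blue")
print(urlencode(params))
```

The parsed query in the debug section of the response shows whether the child clause was actually applied.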

Is dataimporter.functions.escapeSql() functional ?

2016-03-24 Thread Joachim DORNBUSCH
Hi, I wonder if the function ${dataimporter.functions.escapeSql()} is available in Solr 5.3.1. Whenever I use it in my data import handlers, Solr replaces '${dataimporter.functions.escapeSql(field)}' by '' (an empty string). How can I escape strings when building SQL queries in DIHconfigFile

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Yonik Seeley
On Thu, Mar 24, 2016 at 11:45 AM, Yonik Seeley wrote: >> I've been led to understand that 6.X (at least the Lucene part?) won't >> be backwards compatible with 4.X data. 5.5 at least works fine with data >> files from 4.7, for instance. It really doesn't seem like much changed

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Jack Krupansky
Does anybody know if we have doc on the recommended process for upgrading data after upgrading Solr? Sure the upgraded version will work fine with that old data, but unless the data is upgraded, the user can't then upgrade to the next major release after that. This is a case in point - the user is

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Bram Van Dam
On 23/03/16 15:50, Yonik Seeley wrote: > Kind of a unique situation for a dot-oh release, but from the Solr > perspective, 6.0 should have *fewer* bugs than 5.5 (for those features > in 5.5 at least)... we've been squashing a bunch of docValue related > issues. I've been led to understand that

Re: Solr 5.5.0: JVM args warning in console logfile.

2016-03-24 Thread Bram Van Dam
> When I made the change outlined in the patch on SOLR-8145 to my bin/solr > script, the warning disappeared. That was not the intended effect of > the patch, but I'm glad to have the mystery solved. > > Thank you for mentioning the problem so we could track it down. You're welcome. And thanks

RE: Indexing multiple pdf's and partial update of pdf

2016-03-24 Thread Jay Parashar
Thanks Reth, Yes, I am using Apache Tika and went by the instructions given in https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika Here I see we can index a PDF "solr-word.pdf" to a document with unique key = "doc1" as below curl

Performance potential for updating (reindexing) documents

2016-03-24 Thread tedsolr
With a properly tuned solr cloud infrastructure and less than 1B total docs spread out over 50 collections where the largest collection is 100M docs, what is a reasonable target goal for entirely reindexing a single collection? I understand there are a lot of variables, so I'm hypothetically

Re: Index not fitting in memory (file-cache)

2016-03-24 Thread Robert Brown
Thanks Shawn, One of my indexes is 70G on disk but only has 25G RAM, usually it's fast as hell, less than 0.5s for a full API wrapped call, but we do occasionally see searches taking 2.5 seconds. I'm currently shuffling VMs around to increase the RAM, good to hear this may solve those

Re: SolrCloud: published host/port

2016-03-24 Thread Shawn Heisey
On 3/24/2016 12:26 AM, Hendrik Haddorp wrote: > is it possible to instruct Solr to publish a different host/port into > ZooKeeper than it is actually running on? This is required if the Solr > node is not directly reachable on its port from outside due to a NAT > setup or when running Solr as a

Re: Index not fitting in memory (file-cache)

2016-03-24 Thread Shawn Heisey
On 3/24/2016 4:02 AM, Robert Brown wrote: > If my index data directory size is 70G, and I don't have 70G (plus > heap, etc) in the system, this will occasionally affect search speed > right? When Solr has to resort to reading from disk? > > Before I go out and throw more RAM into the system, in

Re: SolrJ Indexing

2016-03-24 Thread Shawn Heisey
On 3/24/2016 4:06 AM, fabigol wrote: > I know how to do that with DIH, but with SolrJ I don't know. Must I use > annotations such as @Field...? > > Moreover, I created a new Solr project with the same XML files (copying the conf > directory), and oddly the indexing is much faster and not a little 100 time >

Re: SyntaxError - Block Join Parent Query

2016-03-24 Thread Charles Sanders
Ah yes. Thank you. Made the correction and I do not get the SyntaxError. However, it does not apply the child filter. The query should return only TestParent4. But it is returning TestParent2, TestParent3 and TestParent4. All of these meet the parent portion of the query (+blue). But only

Re: Issue With Manual Lock

2016-03-24 Thread Reth RM
Hi Salman, The index lock error is generally reported when more than one core or Solr instance is trying to share an index directory. Please check whether more than one core points to the same data directory. You can see the directory path on the "Overview" tab of the admin page. On

Re: SolrJ Indexing

2016-03-24 Thread fabigol
Hi Shawn, thanks for your response. As you can see in my XML file, I have many entities which are linked to each other. I know how to do that with DIH, but with SolrJ I don't know. Must I use annotations such as @Field...? Moreover, I created a new Solr project with the same XML files (copying the conf directory), and

Index not fitting in memory (file-cache)

2016-03-24 Thread Robert Brown
Hi, If my index data directory size is 70G, and I don't have 70G (plus heap, etc) in the system, this will occasionally affect search speed right? When Solr has to resort to reading from disk? Before I go out and throw more RAM into the system, in the above example, what would you

SolrCloud: published host/port

2016-03-24 Thread Hendrik Haddorp
Hi, is it possible to instruct Solr to publish a different host/port into ZooKeeper than it is actually running on? This is required if the Solr node is not directly reachable on its port from outside due to a NAT setup or when running Solr as a Docker container with a mapped port. For what its

Re: Merge two Solr documents into One

2016-03-24 Thread Alexandre Rafalovitch
You might be able to apply an XSLT stylesheet to the incoming XML documents to merge them. But doing it outside of Solr is probably better. On 23 Mar 2016 11:13 pm, "solr2020" wrote: > Hi, > > I have 2-3 Solr documents but i would like to merge all these into one > document while