Query Performance

2015-07-21 Thread Nagasharath
Any recommendations for a tool to test query performance would be of great help. Thanks

Migrating junit tests from Solr 4.5.1 to Solr 5.2.1

2015-07-21 Thread Rich Hume
I am migrating from Solr 4.5.1 to Solr 5.2.1 on a Windows platform. I am using multi-core, but not SolrCloud. I am having issues with my suite of JUnit tests. My tests currently use code I found in SOLR-4502. I was wondering whether anyone could point me at best-practice examples of

Re: Use REST API URL to update field

2015-07-21 Thread Zheng Lin Edwin Yeo
Ok. Thanks for your advice. Regards, Edwin On 21 July 2015 at 15:37, Upayavira u...@odoko.co.uk wrote: curl is just a command line HTTP client. You can use HTTP POST to send the JSON that you are mentioning below via any means that works for you - the file does not need to exist on disk - it

Re: Query Performance

2015-07-21 Thread Nagasharath
I tried using SolrMeter, but for some reason it does not detect my URL and throws a Solr server exception. On 21-Jul-2015, at 10:58 am, Alessandro Benedetti benedetti.ale...@gmail.com wrote: SolrMeter mate, http://code.google.com/p/solrmeter/ Take a look, it will

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Alessandro Benedetti
Hi Mese, let me try to answer your two questions: 1. What happens if a shard (both leader and replica) goes down? If the document on the dead shard is updated, will it forward the document to the new shard? If so, when the dead shard comes up again, will this not be considered for the same

Re: Data Import Handler Stays Idle

2015-07-21 Thread Paden
Okay. I'm going to run the index again with the specifications that you recommended. This could take a few hours, but I will post the entire trace of that error when it pops up again, and I will let you guys know the results of increasing the heap size.

Re: Data Import Handler Stays Idle

2015-07-21 Thread Paden
Hey Shawn, when I use the -m 2g option in my script I get the error 'cannot open [path]/server/logs/solr.log for reading: No such file or directory'. I do not see how this would affect that.

Re: Query Performance

2015-07-21 Thread Alessandro Benedetti
SolrMeter mate, http://code.google.com/p/solrmeter/ Take a look, it will help you a lot! Cheers 2015-07-21 16:49 GMT+01:00 Nagasharath sharathrayap...@gmail.com: Any recommended tool to test the query performance would be of great help. Thanks -- Benedetti
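If SolrMeter cannot be made to reach the server, you can still script a bare-bones latency check while you sort out the URL problem. The following is a minimal sketch and not a SolrMeter replacement. `QueryTimer` and its methods are hypothetical names, and the `Runnable` is assumed to wrap whatever call sends one query (for example a SolrJ request).

```java
import java.util.Arrays;

// Hypothetical helper: times a query callback and summarizes the latencies.
class QueryTimer {
    // Runs the query `runs` times and returns each run's latency in milliseconds.
    static double[] time(Runnable query, int runs) {
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millis;
    }

    // Nearest-rank percentile (0 < p <= 100); sorts a copy so the input stays untouched.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

For example, `QueryTimer.percentile(QueryTimer.time(() -> runOneQuery(), 200), 95)` gives a rough p95, where `runOneQuery` is a placeholder for your own query call. A single-threaded loop like this measures per-query latency only, not behaviour under concurrent load. Early runs also include JVM and cache warm-up, so it may be worth discarding them.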

Re: SOLR nrt read writes

2015-07-21 Thread Alessandro Benedetti
Could this be due to caching? I have tried to disable all in my solrconfig. If you mean Solr caches? No. Solr caches live as long as the searcher: new searcher, new caches (possibly warmed with updated results). If you mean your application caching or browser caching, you should

Re: solr blocking and client timeout issue

2015-07-21 Thread Jeremy Ashcraft
I did find a dark corner of our application where a dev had left some experimental code that snuck past QA because it was rarely used. A client discovered it and was using it heavily over the past week. It was generating multiple consecutive update/commit requests. It's been disabled and the

upgrade clusterstate.json from 4.10.4 to split state.json in 5.2.1

2015-07-21 Thread Yago Riveiro
Hi, How can I upgrade the clusterstate.json to be split by collection? I read this issue https://issues.apache.org/jira/browse/SOLR-5473. In theory there is a param “stateFormat” which, when set to 2, says to use the /collections/collection/state.json format. Where can I configure this?

Parsing and indexing parts of the input file paths

2015-07-21 Thread Andrew Musselman
Dear user and dev lists, We are loading files from a directory and would like to index a portion of each file path as a field as well as the text inside the file. E.g., on HDFS we have this file path: /user/andrew/1234/1234/file.pdf And we would like the 1234 token parsed from the file path

issue with query boost using qf and edismax

2015-07-21 Thread sandeep bonkra
Hi, I am implementing search using SOLR 5.0 and am facing a very strange problem. I have 4 fields (name, address, city and state) in the document apart from a unique ID. My requirement is that it should give me those results first where there is a match in name, then address, then state,
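One common way to express a field priority such as "name, then address, then state" with edismax is to give each field a descending boost in `qf`. The sketch below only builds the request parameters. The class name, the lowercase field names and the factor-of-ten weighting are assumptions for illustration and are not taken from the poster's schema. `URLEncoder.encode(String, Charset)` needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building edismax request parameters.
class EdismaxParams {
    // Builds a qf value with descending boosts: the last field gets ^1,
    // and each earlier field gets a boost 10x larger (illustrative weighting only).
    static String qf(String... fieldsInPriorityOrder) {
        int n = fieldsInPriorityOrder.length;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            long boost = 1;
            for (int j = i + 1; j < n; j++) boost *= 10;
            if (i > 0) sb.append(' ');
            sb.append(fieldsInPriorityOrder[i]).append('^').append(boost);
        }
        return sb.toString();
    }

    // Assembles a URL-encoded query string for an edismax request.
    static String params(String userQuery, String qf) {
        return "defType=edismax"
            + "&q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
            + "&qf=" + URLEncoder.encode(qf, StandardCharsets.UTF_8);
    }
}
```

For example, `EdismaxParams.qf("name", "address", "state")` yields `name^100 address^10 state^1`. Boosts scale each field's contribution to the score rather than imposing strict tiers. A strong match in a lower-boosted field can therefore still outrank a weak name match. Wider gaps between boosts make this less likely but do not rule it out.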

Running SolrJ from Solr's REST API

2015-07-21 Thread Zheng Lin Edwin Yeo
Hi, I would like to check: as I've created a SolrJ program and exported it as a Runnable JAR, how do I integrate it with Solr so that I can call this JAR directly from Solr's REST API? Currently I can only run it from the command prompt using the command java -jar solrj.jar I'm using Solr

Re: Performance of facet contain search in 5.2.1

2015-07-21 Thread Erick Erickson
A contains search basically has to examine each and every term to see if it matches. Say my facet.contains=bbb. A matching term could be aaabbbxyz or zzzbbbxyz, so there's no way to _know_ when you've found them all without examining every last one. So I'd try to redefine the problem to not require that.
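To make this concrete: in a sorted term list, the terms that contain "bbb" are not adjacent, so a contains filter has no choice but to test every term. Below is a minimal sketch over an in-memory list. The class and method names are hypothetical, and Solr itself works against its term dictionary rather than a Java `List`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of why a "contains" facet filter is a full scan.
class FacetContainsScan {
    // Sorting does not help here: "aaabbbxyz" and "zzzbbbxyz" both contain "bbb"
    // but sit far apart, so every term is checked (O(number of terms)).
    static List<String> matching(List<String> sortedTerms, String needle) {
        List<String> out = new ArrayList<>();
        for (String term : sortedTerms) {
            if (term.contains(needle)) out.add(term);
        }
        return out;
    }
}
```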

Re: Tips for faster indexing

2015-07-21 Thread Vineeth Dasaraju
Hi, Thank you Erick for your inputs. I tried creating batches of 1000 objects and indexing them to Solr. The performance is way better than before, but I find that the number of indexed documents shown in the dashboard is smaller than the number of documents that I had actually indexed through

IntelliJ setup

2015-07-21 Thread Andrew Musselman
I followed the instructions here https://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ, including `ant idea`, but I'm still not getting the links in solr classes and methods; do I need to add libraries, or am I missing something else? Thanks!

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Upayavira
Keeping to the user list (the right place for this question). More information is needed here - how are you getting these documents into Solr? Are you posting them to /update/extract? Or using DIH, or? Upayavira On Tue, Jul 21, 2015, at 06:31 PM, Andrew Musselman wrote: Dear user and dev

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Andrew Musselman
I'm not sure, it's a remote team but will get more info. For now, assuming that a certain directory is specified, like /user/andrew/, and a regex is applied to capture anything two directories below matching */*/*.pdf. Would there be a way to capture the wild-carded values and index them as

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Andrew Musselman
Which can only happen if I post it to a web service, and won't happen if I do it through config? On Tue, Jul 21, 2015 at 2:19 PM, Upayavira u...@odoko.co.uk wrote: yes, unless it has been added consciously as a separate field. On Tue, Jul 21, 2015, at 09:40 PM, Andrew Musselman wrote:

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Upayavira
Solr generally does not interact with the file system in that way (with the exception of the DIH). It is the job of the code that pushes a file to Solr to process the filename and send that along with the request. See here for more info:
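As a sketch of doing that on the client side, the code below pulls the two directory names and the file name out of a path laid out as `*/*/*.pdf` under `/user/andrew/`, following the example in this thread. The class name and the target field names (`dir1_s`, `dir2_s`, `filename_s`) are hypothetical. Adjust the pattern to the real base directory.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side helper: derives extra Solr fields from a file path.
class PathFields {
    // Assumed layout: /user/andrew/<dir1>/<dir2>/<name>.pdf, i.e. */*/*.pdf under the base.
    private static final Pattern LAYOUT =
        Pattern.compile("^/user/andrew/([^/]+)/([^/]+)/([^/]+\\.pdf)$");

    // Returns the fields to send with the document, or an empty map if the path does not match.
    static Map<String, String> extract(String path) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LAYOUT.matcher(path);
        if (m.matches()) {
            fields.put("dir1_s", m.group(1));      // hypothetical field names
            fields.put("dir2_s", m.group(2));
            fields.put("filename_s", m.group(3));
        }
        return fields;
    }
}
```

If the files are posted to /update/extract, these values could be sent along as `literal.<field>=<value>` request parameters (for example `literal.dir1_s=1234`). With another ingestion path, they would simply be added to the document before it is sent.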

Re: Tips for faster indexing

2015-07-21 Thread Upayavira
Are you making sure that every document has a unique ID? Index into an empty Solr, then compare maxDoc with numDocs. If they differ (maxDoc is higher), then some of your documents have been deleted, meaning some were overwritten. That might be a place to look. Upayavira On Tue, Jul

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Andrew Musselman
Thanks, so by the time we would get to an Analyzer the file path is forgotten? https://cwiki.apache.org/confluence/display/solr/Analyzers On Tue, Jul 21, 2015 at 1:27 PM, Upayavira u...@odoko.co.uk wrote: Solr generally does not interact with the file system in that way (with the exception of

Re: Tips for faster indexing

2015-07-21 Thread solr . user . 1507
I can confirm this behavior: seen when sending JSON docs in batches, it never happens when sending them one by one, but it happens sporadically when sending batches. It is as if Solr/Jetty drops a couple of documents out of the batch. Regards On 21 Jul 2015, at 21:38, Vineeth Dasaraju vineeth.ii...@gmail.com wrote: Hi,

Re: Issue with using createNodeSet in Solr Cloud

2015-07-21 Thread Savvas Andreas Moysidis
Ah, nice tip, thanks! This could also make scripts more portable too. Cheers, Savvas On 21 July 2015 at 08:40, Upayavira u...@odoko.co.uk wrote: Note, when you start up the instances, you can pass in a hostname to use instead of the IP address. If you are using bin/solr (which you should

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Upayavira
yes, unless it has been added consciously as a separate field. On Tue, Jul 21, 2015, at 09:40 PM, Andrew Musselman wrote: Thanks, so by the time we would get to an Analyzer the file path is forgotten? https://cwiki.apache.org/confluence/display/solr/Analyzers On Tue, Jul 21, 2015 at 1:27

Re: Tips for faster indexing

2015-07-21 Thread Vineeth Dasaraju
Hi Upayavira, I guess that is the problem. I am currently using a function for generating an ID. It takes the current date and time down to the millisecond and generates the ID from it. This is the function. public static String generateID(){ Date dNow = new Date(); SimpleDateFormat ft = new

Re: Tips for faster indexing

2015-07-21 Thread Fadi Mohsen
In Java: UUID.randomUUID(); That is what I'm using. Regards On 21 Jul 2015, at 22:38, Vineeth Dasaraju vineeth.ii...@gmail.com wrote: Hi Upayavira, I guess that is the problem. I am currently using a function for generating an ID. It takes the current date and time to milliseconds and
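This illustrates why a millisecond timestamp ID can lose documents. Every document created within the same millisecond gets the same ID, and Solr keeps only the last one. That would be consistent with numDocs coming out lower than the number sent. The `yyyyMMddHHmmssSSS` pattern is a hypothetical stand-in, because the body of the original `generateID()` is truncated. `liveDocCount` is a toy model of unique-key overwriting, not a Solr API.

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.HashSet;
import java.util.UUID;

// Hypothetical comparison of timestamp-based IDs and random UUIDs.
class DocIds {
    // Hypothetical millisecond-resolution pattern standing in for the truncated original.
    static String timestampId(Date when) {
        return new SimpleDateFormat("yyyyMMddHHmmssSSS").format(when);
    }

    // Random (version 4) UUID, as suggested in the thread.
    static String uuidId() {
        return UUID.randomUUID().toString();
    }

    // Toy model of a unique-key index: a document whose ID already exists replaces
    // the earlier one, so the live count equals the number of distinct IDs.
    static int liveDocCount(String... ids) {
        return new HashSet<>(Arrays.asList(ids)).size();
    }
}
```

Three documents stamped in the same millisecond collapse to one live document, while three UUIDs stay distinct. A batch of 1000 objects built in a tight loop can easily produce several documents per millisecond.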

Re: IntelliJ setup

2015-07-21 Thread Konstantin Gribov
Try "invalidate caches and restart" in IDEA, and remove the .idea directory in the lucene-solr dir. After that, run ant idea and re-open the project. Also, at a minimum, you have to close the project, run ant idea and re-open it when switching between widely diverged branches (e.g., 4.10 and 5_x). Tue, 21 July 2015 at 21:53,

Re: IntelliJ setup

2015-07-21 Thread Andrew Musselman
Bingo, thanks! On Tue, Jul 21, 2015 at 4:12 PM, Konstantin Gribov gros...@gmail.com wrote: Try invalidate caches and restart in IDEA, remove .idea directory in lucene-solr dir. After that run ant idea and re-open project. Also, you have to, at least, close project, run ant idea and re-open

Re: WordDelimiterFilter Leading Trailing Special Character

2015-07-21 Thread Sathiya N Sundararajan
Upayavira, thanks for the helpful suggestion, that works. I was looking for an option to turn off or circumvent that particular WordDelimiterFilter behavior completely. Since our indexes are hundreds of terabytes, every time we find a term that needs to be added, it will be a cumbersome process

Re: WordDelimiterFilter Leading Trailing Special Character

2015-07-21 Thread Jack Krupansky
You can also use the types attribute to change the type of specific characters, such as to treat the ! or as an ALPHA. -- Jack Krupansky On Tue, Jul 21, 2015 at 7:43 PM, Sathiya N Sundararajan ausat...@gmail.com wrote: Upayavira, thanks for the helpful suggestion, that works. I was looking

Re: Issue with using createNodeSet in Solr Cloud

2015-07-21 Thread Upayavira
Note, when you start up the instances, you can pass in a hostname to use instead of the IP address. If you are using bin/solr (which you should be!!) then you can use bin/solr -h my-host-name and that'll be used in place of the IP. Upayavira On Tue, Jul 21, 2015, at 05:45 AM, Erick Erickson

Re: SOLR nrt read writes

2015-07-21 Thread Upayavira
Bhawna, I think you need to reconcile yourself to the fact that what you want to achieve is not going to be possible. Solr (and Lucene underneath it) is HEAVILY optimised for high read/low write situations, and that leads to some latency in content reaching the index. If you wanted to change

Re: WordDelimiterFilter Leading Trailing Special Character

2015-07-21 Thread Upayavira
Looking at the javadoc for the WordDelimiterFilterFactory, it suggests this config: <fieldType name="text_wd" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory"

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Upayavira
I suspect you can delete a document from the wrong shard by using update?distrib=false. I also suspect there are people here who would like to help you debug this, because it has been reported before, but we haven't yet been able to see whether it occurred due to human or software error.

Re: solr blocking and client timeout issue

2015-07-21 Thread Daniel Collins
We have a similar situation: production runs Java 7u10 (yes, we know it's old!), with custom GC options (G1 works well for us) and a 40GB heap. We are a heavy user of NRT (sub-second soft-commits!), so that may be the common factor here. Every time we have tried a later Java 7 or Java 8, the

Performance of facet contain search in 5.2.1

2015-07-21 Thread Lo Dave
I found that facet contains searches take much longer than facet prefix searches. Does anyone have an idea how to make contains searches faster? org.apache.solr.core.SolrCore; [concordance] webapp=/solr path=/select

Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-21 Thread Ali Nazemian
Dear Erick, I found another thing: I checked the number of unique terms for this field using the schema browser, and it reported 1683404 terms! Does that exceed the maximum number of unique terms for the fcs facet method? I read somewhere that it should be more than 16m; is that true?! Best regards. On

Re: Performance of facet contain search in 5.2.1

2015-07-21 Thread Alessandro Benedetti
Hi Dave, generally, given the terms in a dictionary, it's much more efficient to run prefix queries than contains queries. Regarding docValues, if I remember well, when they are loaded in memory they are a skip list, so you can use two operators on them: - next(), which simply gives you the next
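The prefix advantage comes from being able to jump. You seek to the first term at or after the prefix, then call next() only while terms still start with it, and stop at the first term that does not. Below is a sketch of that pattern over a sorted array, with `Arrays.binarySearch` standing in for the seek step. The names are hypothetical, and this is not Solr's docValues API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of seek-then-next prefix enumeration over sorted terms.
class PrefixSeek {
    static List<String> withPrefix(String[] sortedTerms, String prefix) {
        // "seek": binary search to the first term >= prefix (the insertion point if absent)
        int i = Arrays.binarySearch(sortedTerms, prefix);
        if (i < 0) i = -i - 1;
        List<String> out = new ArrayList<>();
        // "next": terms sharing the prefix are contiguous, so stop at the first mismatch
        while (i < sortedTerms.length && sortedTerms[i].startsWith(prefix)) {
            out.add(sortedTerms[i++]);
        }
        return out;
    }
}
```

The cost is one logarithmic seek plus the matching terms. A contains filter, by contrast, has to visit the whole dictionary.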

RE: Programmatically find out if node is overseer

2015-07-21 Thread Markus Jelsma
Hello - this approach not only solves the problem but also allows me to run different processing threads on other nodes. Thanks! Markus -Original message- From:Chris Hostetter hossman_luc...@fucit.org Sent: Saturday 18th July 2015 1:00 To: solr-user solr-user@lucene.apache.org

Re: Use REST API URL to update field

2015-07-21 Thread Upayavira
curl is just a command line HTTP client. You can use HTTP POST to send the JSON that you are mentioning below via any means that works for you - the file does not need to exist on disk - it just needs to be added to the body of the POST request. I'd say review how to do HTTP POST requests from
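Here is a sketch of posting JSON straight from memory in Java with `HttpURLConnection`, with no file involved. The class name, the field names and the update URL are placeholders. The body uses Solr's atomic-update `set` syntax, on the assumption that the goal is to change one field of an existing document whose unique key field is `id`. That also requires the schema and config to support atomic updates.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: sends an atomic "set" update to Solr over plain HTTP POST.
class AtomicUpdatePost {
    // Minimal JSON string escaping (backslashes and double quotes only);
    // a real client should use a JSON library.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds [{"id":"<id>","<field>":{"set":"<value>"}}]; "id" is assumed to be the unique key.
    static String buildSetBody(String id, String field, String value) {
        return "[{\"id\":\"" + escape(id) + "\",\"" + escape(field)
            + "\":{\"set\":\"" + escape(value) + "\"}}]";
    }

    // Writes the in-memory body as the POST payload and returns the HTTP status code.
    static int post(String updateUrl, String jsonBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(jsonBody.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage might look like `AtomicUpdatePost.post("http://localhost:8983/solr/collection1/update?commit=true", AtomicUpdatePost.buildSetBody("doc1", "title", "New title"))`, where `collection1` is a placeholder core name.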

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread mesenthil1
Unable to delete by passing distrib=false as well. Also, it is difficult to identify those duplicate documents among the 130 million. Is there a way we can see the generated hash key and map it to the specific shard?

RE: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Reitzel, Charles
When are you generating the UUID exactly? If you set the unique ID field on an update, and it contains a new UUID, you have effectively created a new document. Just a thought. -Original Message- From: mesenthil1 [mailto:senthilkumar.arumu...@viacomcontractor.com] Sent: Tuesday,

RE: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Reitzel, Charles
Also, the function used to generate hashes is org.apache.solr.common.util.Hash.murmurhash3_x86_32(), which produces a 32-bit value. The ranges of hash values assigned to each shard are stored in ZooKeeper. Since you are using only a single hash component, all 32 bits will be used by
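To map an ID to a shard by hand, the code below re-implements MurmurHash3 x86_32 from the public algorithm and then finds the owning range. It rests on two assumptions, both worth checking against your setup. First, the router hashes the ID's UTF-8 bytes with seed 0 (plain IDs with no `!` prefix). Second, the 32-bit space is split into equal ranges starting at `Integer.MIN_VALUE`. The authoritative ranges are the ones recorded in ZooKeeper, so compare against those before trusting the index this returns. `ShardHash` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: hash an ID and guess its shard under an even range split.
class ShardHash {
    // MurmurHash3 x86_32, re-implemented from the public algorithm description.
    static int murmur3(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & ~3;
        for (int i = 0; i < roundedEnd; i += 4) {   // full 4-byte little-endian blocks
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                   | ((data[i + 2] & 0xff) << 16) | ((data[i + 3] & 0xff) << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }
        int k1 = 0;
        switch (data.length & 3) {                  // 1-3 trailing bytes, intentional fall-through
            case 3: k1 ^= (data[roundedEnd + 2] & 0xff) << 16;
            case 2: k1 ^= (data[roundedEnd + 1] & 0xff) << 8;
            case 1: k1 ^= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }
        h1 ^= data.length;                          // finalization mix
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Assumption: the ID's UTF-8 bytes are hashed with seed 0.
    static int hashId(String id) {
        return murmur3(id.getBytes(StandardCharsets.UTF_8), 0);
    }

    // Assumption: numShards equal ranges starting at Integer.MIN_VALUE; the last absorbs any remainder.
    static int shardIndex(int hash, int numShards) {
        long offset = (long) hash - Integer.MIN_VALUE;   // 0 .. 2^32 - 1
        long rangeSize = (1L << 32) / numShards;
        return (int) Math.min(offset / rangeSize, numShards - 1);
    }
}
```

Printing `Integer.toHexString(ShardHash.hashId(id))` makes it easier to compare a hash against shard ranges if they are written in hex. It can also show whether a duplicate document sits in a shard whose range does not cover its hash.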

Re: Installing Banana on Solr 5.2.1

2015-07-21 Thread Upayavira
On Tue, Jul 21, 2015, at 02:00 AM, Shawn Heisey wrote: On 7/20/2015 5:45 PM, Vineeth Dasaraju wrote: I am trying to install Banana on top of solr but haven't been able to do so. All the procedures that I get are for an earlier version of solr. Since the directory structure has changed in

Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-21 Thread Yonik Seeley
On Tue, Jul 21, 2015 at 3:09 AM, Ali Nazemian alinazem...@gmail.com wrote: Dear Erick, I found another thing, I did check the number of unique terms for this field using schema browser, It reported 1683404 number of terms! Does it exceed the maximum number of unique terms for fcs facet method?

Re: Data Import Handler Stays Idle

2015-07-21 Thread Paden
There are some zip files inside the directory that are referenced in the database. I'm thinking those are the ones it's jumping right over. They are not the issue. At least I'm 95% sure. And Shawn, if you're still watching, sorry: I'm using solr-5.1.0.

Re: Data Import Handler Stays Idle

2015-07-21 Thread Shawn Heisey
On 7/21/2015 8:17 AM, Paden wrote: There are some zip files inside the directory and have been addressed to in the database. I'm thinking those are the one's it's jumping right over. They are not the issue. At least I'm 95% sure. And Shawn if you're still watching I'm sorry I'm using