Lucene test framework documentation?

2015-01-08 Thread TK Solr
Is there any good document about Lucene Test Framework? I can only find API docs. Mimicking the unit test I've found in Lucene trunk, I tried to write a unit test that tests a TokenFilter I am writing. But it is failing with an error message like: java.lang.AssertionError: close() called in wrong
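Some context for that kind of failure: Lucene's TokenStream documentation specifies a consumer workflow of reset(), then incrementToken() until it returns false, then end(), then close(). The test framework's MockTokenizer asserts that this order is followed, so an out-of-order close() is a likely source of such an assertion. The class below is not Lucene code. It is a minimal, hypothetical stand-in (ToyCheckedStream is an invented name) that enforces the same order so the failure mode is easy to see. In a real test, BaseTokenStreamTestCase helpers such as assertTokenStreamContents drive this workflow for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy, NOT Lucene's MockTokenizer: it only mimics the order check
// reset() -> incrementToken()* (until false) -> end() -> close().
class ToyCheckedStream {
    private enum State { CREATED, RESET, EXHAUSTED, ENDED, CLOSED }

    private final List<String> tokens;
    private Iterator<String> it;
    private String current;
    private State state = State.CREATED;

    ToyCheckedStream(List<String> tokens) {
        this.tokens = tokens;
    }

    void reset() {
        check(state == State.CREATED || state == State.CLOSED, "reset()");
        it = tokens.iterator();
        state = State.RESET;
    }

    boolean incrementToken() {
        check(state == State.RESET, "incrementToken()");
        if (it.hasNext()) {
            current = it.next();
            return true;
        }
        state = State.EXHAUSTED; // the consumer must now call end()
        return false;
    }

    void end() {
        check(state == State.EXHAUSTED, "end()");
        state = State.ENDED;
    }

    void close() {
        check(state == State.ENDED, "close()");
        state = State.CLOSED;
    }

    String term() {
        return current;
    }

    private void check(boolean ok, String call) {
        if (!ok) {
            throw new AssertionError(call + " called out of order, state=" + state);
        }
    }

    // Correct consumer: follows the documented workflow and collects the terms.
    static List<String> consume(ToyCheckedStream ts) {
        List<String> out = new ArrayList<>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(ts.term());
        }
        ts.end();
        ts.close();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume(new ToyCheckedStream(Arrays.asList("quick", "fox"))));
        ToyCheckedStream bad = new ToyCheckedStream(Arrays.asList("quick"));
        bad.reset();
        try {
            bad.close(); // skips incrementToken()/end(), so the order check fires
        } catch (AssertionError e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The point of the toy is that the assertion fires at close() even though the real mistake (never consuming the stream to the end and calling end()) happened earlier.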

Re: UUIDUpdateProcessorFactory causes repeated documents when uploading csv files?

2015-01-08 Thread jia gu
Problem solved - it's caused by a system outside of Solr. Thank you all for the prompt replies! :) On Thu, Jan 8, 2015 at 12:40 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Thank you for your reply Chris :) Solr is producing the correct result on : its own. The problem is that I am

Re: Lucene test framework documentation?

2015-01-08 Thread Alexandre Rafalovitch
(semi-relevant aside) We do happen to ship this test framework with Solr distribution (in dist/test-framework). Why, I don't know! Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 8 January 2015 at 23:23, Shawn Heisey apa...@elyograg.org wrote:

Re: Lucene test framework documentation?

2015-01-08 Thread Shawn Heisey
On 1/8/2015 8:31 PM, TK Solr wrote: Is there any good document about Lucene Test Framework? I can only find API docs. Mimicking the unit test I've found in Lucene trunk, I tried to write a unit test that tests a TokenFilter I am writing. But it is failing with an error message like:

GC tuning question - can improving GC pauses cause indexing to slow down?

2015-01-08 Thread Shawn Heisey
Is it possible that tuning garbage collection to achieve much better pause characteristics might actually *decrease* index performance? Rebuilds that I did while still using a tuned CMS config would take between 5.5 and 6 hours, sometimes going slightly over 6 hours. A rebuild that I did

Re: GC tuning question - can improving GC pauses cause indexing to slow down?

2015-01-08 Thread Boogie Shafer
In the abstract, it sounds like you are seeing the difference between tuning for latency vs tuning for throughput. My hunch would be you are seeing more (albeit individually quicker) GC events with your new settings during the rebuild. I imagine that in most cases a solr rebuild is relatively
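One way to check that hunch is to count collections and total collection time over the same rebuild under each set of settings: if the new settings produce many more, individually shorter, collections whose combined time exceeds the old total, throughput can drop even though pauses improved. The JDK exposes these counters through java.lang.management.GarbageCollectorMXBean. This sketch reads them in-process around a stand-in workload (GcWindow is a hypothetical name); against a running Solr you would read the same MXBeans over remote JMX or compare GC logs instead.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcWindow {
    // Total collections so far across all collectors (young and old generation).
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means "not available"
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Accumulated collection time in milliseconds across all collectors.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 means "not available"
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Returns {gcEvents, gcMillis} that occurred while the workload ran.
    static long[] measure(Runnable workload) {
        long events0 = totalCollections();
        long millis0 = totalCollectionMillis();
        workload.run();
        return new long[] {totalCollections() - events0, totalCollectionMillis() - millis0};
    }

    public static void main(String[] args) {
        long[] d = measure(() -> {
            // Stand-in workload: many short-lived allocations.
            long sink = 0;
            for (int i = 0; i < 2_000_000; i++) {
                sink += new byte[256].length;
            }
            System.out.println("allocated bytes: " + sink);
        });
        System.out.printf("GC events: %d, total GC time: %d ms%n", d[0], d[1]);
    }
}
```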

Re: GC tuning question - can improving GC pauses cause indexing to slow down?

2015-01-08 Thread Walter Underwood
I would not be surprised at all. Optimizing for minimum pauses usually increases overhead that decreases overall throughput. This is a pretty common tradeoff. For maximum throughput, when you don’t care about pauses, the simplest non-concurrent GC is often the best. That might be the right

Re: GC tuning question - can improving GC pauses cause indexing to slow down?

2015-01-08 Thread Shawn Heisey
On 1/8/2015 11:05 PM, Boogie Shafer wrote: In the abstract, it sounds like you are seeing the difference between tuning for latency vs tuning for throughput. My hunch would be you are seeing more (albeit individually quicker) GC events with your new settings during the rebuild. I imagine

Re: How to return child documents with parent

2015-01-08 Thread Mikhail Khludnev
Did you check [child] at https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents ? On Thu, Jan 8, 2015 at 5:53 PM, yliu y...@mathworks.com wrote: Hi, What is the best way to return both parent document and child documents in one query? I used SolrJ to create a
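The [child] transformer is requested through the fl parameter and takes a parentFilter that matches all parent documents, plus optional childFilter and limit parameters. The sketch below only assembles the request parameters with the JDK's URLEncoder; it does not use SolrJ. The field type_s and its value parent are assumptions about the schema, not something from the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ChildQueryParams {
    // Builds select parameters that return matching parents with their nested
    // children attached via the [child] document transformer.
    // Field names passed in are hypothetical.
    static String build(String parentQuery, String parentFilter, int childLimit) {
        String fl = "*,[child parentFilter=" + parentFilter + " limit=" + childLimit + "]";
        return "q=" + enc(parentQuery) + "&fl=" + enc(fl) + "&wt=json";
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical schema: parents carry type_s:parent; children were indexed
        // in the same block as their parent via addChildDocuments().
        System.out.println("/select?" + build("type_s:parent", "type_s:parent", 10));
    }
}
```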

Re: Determining the Number of Solr Shards

2015-01-08 Thread Nishanth S
Thanks guys for your inputs. I would be looking at around 100 Tb of total index size with 5100 million documents for a period of 30 days before we purge the indexes. I had estimated it slightly on the higher side of things but that's where I feel we would be. Thanks, Nishanth On Wed, Jan 7,

Re: Determining the Number of Solr Shards

2015-01-08 Thread Jack Krupansky
My final advice would be my standard proof of concept implementation advice - test a configuration with 10% (or 5%) of the target data size and 10% (or 5%) of the estimated resource requirements (maybe 25% of the estimated RAM) and see how well it performs. Take the actual index size and multiply
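The scaling step of that advice can be sketched as follows, assuming the measured quantity grows linearly with data volume. That assumption often fails for RAM and query latency, which is exactly what the proof-of-concept run is meant to reveal. The class name is hypothetical and the 8 TB proof-of-concept index size in main() is a made-up input, not a figure from the thread.

```java
class PocExtrapolation {
    // Linear extrapolation from a proof-of-concept run to the full target.
    // Assumption: the metric scales linearly with data size.
    static double extrapolate(double measuredAtSample, double sampleFraction) {
        if (sampleFraction <= 0.0 || sampleFraction > 1.0) {
            throw new IllegalArgumentException("sampleFraction must be in (0, 1]");
        }
        return measuredAtSample / sampleFraction;
    }

    public static void main(String[] args) {
        double fraction = 0.10;           // 10% proof of concept
        long targetDocs = 5_100_000_000L; // document count quoted by Nishanth
        long pocDocs = Math.round(targetDocs * fraction);
        double pocIndexTb = 8.0;          // hypothetical measured PoC index size
        System.out.printf("PoC docs: %d, projected index: %.1f TB%n",
                pocDocs, extrapolate(pocIndexTb, fraction));
    }
}
```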

Re: How large is your solr index?

2015-01-08 Thread Bram Van Dam
On 01/07/2015 05:42 PM, Erick Erickson wrote: True, and you can do this if you take explicit control of the document routing, but... that's quite tricky. You forever after have to send any _updates_ to the same shard you did the first time, whereas SPLITSHARD will do the right thing. Hmm. That

Re: How large is your solr index?

2015-01-08 Thread Toke Eskildsen
On Wed, 2015-01-07 at 22:26 +0100, Joseph Obernberger wrote: Thank you Toke - yes - the data is indexed throughout the day. We are handling very few searches - probably 50 a day; this is an R&D system. If your searches are in small bundles, you could pause the indexing flow while the searches

Re: Solr: IndexNotFoundException: no segments* file HdfsDirectoryFactory

2015-01-08 Thread xinwu
Hi, did you solve this problem? I ran into the same problem when I set up Solr+HDFS.

Re: leader split-brain at least once a day - need help

2015-01-08 Thread Thomas Lamy
Hi Alan, thanks for the pointer, I'll look at our GC logs. On 07.01.2015 at 15:46, Alan Woodward wrote: I had a similar issue, which was caused by https://issues.apache.org/jira/browse/SOLR-6763. Are you getting long GC pauses or similar before the leader mismatches occur? Alan Woodward

Re: Solr startup script in version 4.10.3

2015-01-08 Thread Ramkumar R. Aiyengar
Versions 4.10.3 and beyond already use server rather than example, which still finds a reference in the script purely for back compat. A major release 5.0 is coming soon, perhaps the back compat can be removed for that. On 6 Jan 2015 09:30, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi,

Re: Solr startup script in version 4.10.3

2015-01-08 Thread Anshum Gupta
Things have changed reasonably for the 5.0 release. In case of a standalone mode, it still defaults to the server directory. So you'd find your logs in server/logs. In case of solrcloud mode e.g. if you ran bin/solr -e cloud -noprompt this would default to stuff being copied into example

Re: UUIDUpdateProcessorFactory causes repeated documents when uploading csv files?

2015-01-08 Thread jia gu
Thank you for your reply Chris :) Solr is producing the correct result on its own. The problem is that I am calling a dataload class to call Solr, which worked for assigned ID and composite ID, but not for UUID. Is there a place to delete my question on the mailing list? Thank you, Jia On Wed,

Solr with Tomcat - enabling SSL problem

2015-01-08 Thread Tali Finelt
Hi, I am using Solr 4.10.2 with tomcat and embedded Zookeeper. I followed https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-SolrCloud to enable SSL. I am currently doing the following: Starting tomcat Running: ../scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983

Re: How large is your solr index?

2015-01-08 Thread Shawn Heisey
On 1/8/2015 9:39 AM, Joseph Obernberger wrote: Yes - it would be 20GBytes of cache per 270GBytes of data. That's not a lot of cache. One rule of thumb is that you should have at least 50% of the index size available as cache, with 100% being a lot better. The caching should happen on the Solr
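That rule of thumb is simple arithmetic. This hypothetical helper restates it (50% of the index size as a floor, 100% preferred) and applies it to the 270 GB index and 20 GB of cache from the message; it makes no claim about where the caching should happen, since that sentence is cut off.

```java
class CacheSizing {
    // Rule of thumb from this reply: cache of at least 50% of the index size,
    // with 100% being a lot better.
    static double minimumCacheGb(double indexGb) {
        return indexGb * 0.5;
    }

    static double preferredCacheGb(double indexGb) {
        return indexGb;
    }

    // Fraction of the index that the available cache can hold.
    static double coverage(double cacheGb, double indexGb) {
        return cacheGb / indexGb;
    }

    public static void main(String[] args) {
        double indexGb = 270.0;
        double cacheGb = 20.0; // figures from the message above
        System.out.printf("minimum %.0f GB, preferred %.0f GB, current coverage %.1f%%%n",
                minimumCacheGb(indexGb), preferredCacheGb(indexGb),
                100.0 * coverage(cacheGb, indexGb));
    }
}
```

Running main() prints "minimum 135 GB, preferred 270 GB, current coverage 7.4%".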

RE: Solr Cloud, 100 shards, shards progressively become slower

2015-01-08 Thread Andrew Butkus
Extrapolating what Jack was saying in his reply ... with 100 shards and 4 replicas, you have 400 cores that are each about 2.8GB. That results in a total index size of just over a terabyte, with 140GB of index data on each of the eight servers. Assuming you have only one Solr instance
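The sizing arithmetic above, as a small sketch: 100 shards times 4 copies gives 400 cores, 400 × 2.8 GB ≈ 1,120 GB (just over a terabyte), and 1,120 GB spread over 8 servers is 140 GB each. The even spread across servers is an assumption, and the class name is hypothetical.

```java
class ClusterSizing {
    // Total on-disk index across every core (all shard copies included).
    static double totalIndexGb(int shards, int copiesPerShard, double gbPerCore) {
        return shards * copiesPerShard * gbPerCore;
    }

    // Per-server share, assuming cores are spread evenly across servers.
    static double perServerGb(double totalGb, int servers) {
        return totalGb / servers;
    }

    public static void main(String[] args) {
        double total = totalIndexGb(100, 4, 2.8); // 400 cores of about 2.8 GB each
        System.out.printf("total %.0f GB, per server %.0f GB%n", total, perServerGb(total, 8));
    }
}
```

Running main() prints "total 1120 GB, per server 140 GB".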

Re: UUIDUpdateProcessorFactory causes repeated documents when uploading csv files?

2015-01-08 Thread Chris Hostetter
: Thank you for your reply Chris :) Solr is producing the correct result on : its own. The problem is that I am calling a dataload class to call Solr, : which worked for assigned ID and composite ID, but not for UUID. Is there a Sorry -- still confused: are you confirming that you've tracked

RE: Solr Cloud, 100 shards, shards progressively become slower

2015-01-08 Thread Toke Eskildsen
Andrew Butkus [andrew.but...@c6-intelligence.com] wrote: [Shawn/Jack: Ideal amount of RAM] Have less than this :/ :( - with not much likelihood to upgrade anytime soon The right amount of RAM is what satisfies your requirements and is tightly correlated to the speed of your underlying

Re: Solr Cloud, 100 shards, shards progressively become slower

2015-01-08 Thread Shawn Heisey
On 1/8/2015 8:57 AM, Andrew Butkus wrote: We have 4gb usage (because the shards are split by 100 each shard is approx. 2.8gb on disk), we have allocated 14gb min and 16gb max of ram to solr, so it has plenty to use (the ram in the dashboard never goes above about 8gb - so still plenty ).

Re: Solr on HDFS in a Hadoop cluster

2015-01-08 Thread Charles VALLEE
Thanks a lot Otis, While reading the SolrCloud documentation to understand how SolrCloud could run on HDFS, I got confused with leader, replica, non-replica shards, core, index, and collections. Once it is specified that one cannot add shards, then that one can add replica-only shards, then

Re: How large is your solr index?

2015-01-08 Thread Erick Erickson
bq: you'll end up with N-2 nearly full boxes and 2 half-full boxes. True, you'd have to repeat the process N times. At that point, though, as Shawn mentions it's often easier to just re-index the whole thing. Do note that one strategy is to create more shards than you need at the beginning. Say

Re: Solr Cloud, 100 shards, shards progressively become slower

2015-01-08 Thread Shawn Heisey
On 1/8/2015 7:26 AM, Andrew Butkus wrote: Hi, we have 8 solr servers, split 4x4 across 2 data centers. We have a collection of around ½ billion documents, split over 100 shards, each is replicated 4 times on separate nodes (evenly distributed across both data centers). The problem we

RE: Solr Cloud, 100 shards, shards progressively become slower

2015-01-08 Thread Andrew Butkus
Hi, we have 8 solr servers, split 4x4 across 2 data centers. We have a collection of around ½ billion documents, split over 100 shards, each is replicated 4 times on separate nodes (evenly distributed across both data centers). The problem we have is that when we use cursormark (and also when

How to return child documents with parent

2015-01-08 Thread yliu
Hi, What is the best way to return both parent document and child documents in one query? I used SolrJ to create a document and added a few child documents using addChildDocuments() method and indexed the parent document. All documents are indexed successfully (parent and children). When I

Solr Cloud, 100 shards, shards progressively become slower

2015-01-08 Thread Andrew Butkus
Hi, we have 8 solr servers, split 4x4 across 2 data centers. We have a collection of around ½ billion documents, split over 100 shards, each is replicated 4 times on separate nodes (evenly distributed across both data centers). The problem we have is that when we use cursormark (and also
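For reference, the cursorMark protocol works like this: the first request sends cursorMark=*, the sort must include the uniqueKey field as a tiebreaker, each response carries a nextCursorMark, and iteration is finished when nextCursorMark comes back equal to the cursorMark that was sent. The sketch hides the HTTP call behind a hypothetical PageFetcher interface (not a SolrJ type) so only the loop logic is shown; the in-memory fake in main() uses offsets as cursors, whereas real Solr cursors are opaque encodings of the last sort values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CursorPaging {
    // One page of results plus the cursor returned as nextCursorMark.
    static final class Page {
        final List<String> ids;
        final String nextCursorMark;

        Page(List<String> ids, String nextCursorMark) {
            this.ids = ids;
            this.nextCursorMark = nextCursorMark;
        }
    }

    // Hypothetical transport: in practice this would send q, sort (ending in the
    // uniqueKey field), rows and cursorMark to Solr and parse the response.
    interface PageFetcher {
        Page fetch(String cursorMark, int rows);
    }

    static List<String> fetchAll(PageFetcher fetcher, int rows) {
        List<String> all = new ArrayList<>();
        String cursor = "*"; // initial cursorMark value
        while (true) {
            Page page = fetcher.fetch(cursor, rows);
            all.addAll(page.ids);
            if (page.nextCursorMark.equals(cursor)) {
                break; // an unchanged cursor means there are no more results
            }
            cursor = page.nextCursorMark;
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d", "e");
        // In-memory fake whose cursor is simply the next offset.
        PageFetcher fake = (cursor, rows) -> {
            int start = cursor.equals("*") ? 0 : Integer.parseInt(cursor);
            int end = Math.min(start + rows, data.size());
            String next = (end == start) ? cursor : String.valueOf(end);
            return new Page(data.subList(start, end), next);
        };
        System.out.println(fetchAll(fake, 2));
    }
}
```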

Re: leader split-brain at least once a day - need help

2015-01-08 Thread Yonik Seeley
It's worth noting that those messages alone don't necessarily signify a problem with the system (and it wouldn't be called split brain). The async nature of updates (and thread scheduling) along with stop-the-world GC pauses that can change leadership, cause these little windows of inconsistencies

Re: How large is your solr index?

2015-01-08 Thread Shawn Heisey
On 1/8/2015 4:37 AM, Bram Van Dam wrote: Hmm. That is a good point. I wonder if there's some kind of middle ground here? Something that lets me send an update (or new document) to an arbitrary node/shard but which is still routed according to my specific requirements? Maybe this can already be
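The middle ground described here resembles prefix-based (composite id) routing: the client controls placement through a prefix in the document id, while any node can accept the request and forward it to the right shard. Below is a toy sketch of that idea only; PrefixRouter is a hypothetical name, and String.hashCode() stands in for the MurmurHash3 hashing and hash-range assignment that Solr's compositeId router actually uses.

```java
class PrefixRouter {
    // Toy prefix routing: ids look like "routeKey!docId" and only the route key
    // is hashed, so every update for that key maps to the same shard no matter
    // which node receives it first. String.hashCode() is a stand-in hash.
    static int shardFor(String id, int numShards) {
        int bang = id.indexOf('!');
        String routeKey = bang >= 0 ? id.substring(0, bang) : id;
        return Math.floorMod(routeKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"customerA!doc1", "customerA!doc2", "customerB!doc1"}) {
            System.out.println(id + " -> shard " + shardFor(id, 4));
        }
    }
}
```

The catch Erick mentioned still applies in this scheme: an update has to carry the same prefix as the original document, or it will be routed somewhere else.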

Re: Solr with Tomcat - enabling SSL problem

2015-01-08 Thread Shawn Heisey
On 1/8/2015 6:25 AM, Tali Finelt wrote: I am using Solr 4.10.2 with tomcat and embedded Zookeeper. I followed https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-SolrCloud to enable SSL. I am currently doing the following: Starting tomcat Running:

Re: Solr with Tomcat - enabling SSL problem

2015-01-08 Thread Shawn Heisey
On 1/8/2015 8:50 AM, Tali Finelt wrote: Thanks for clarifying this. Is there a different way to set the embedded Zookeeper urlScheme parameter before ever starting tomcat? (some configuration file etc.) This way I won't need to start tomcat twice. Most of the cloud options can be specified

Re: How large is your solr index?

2015-01-08 Thread Joseph Obernberger
On 1/8/2015 3:16 AM, Toke Eskildsen wrote: On Wed, 2015-01-07 at 22:26 +0100, Joseph Obernberger wrote: Thank you Toke - yes - the data is indexed throughout the day. We are handling very few searches - probably 50 a day; this is an R&D system. If your searches are in small bundles, you could

Re: Solr: IndexNotFoundException: no segments* file HdfsDirectoryFactory

2015-01-08 Thread praneethvarma
I missed Norgorn's reply above. But in the past, and as also suggested above, I think the following lock type solved the problem for me: <lockType>${solr.lock.type:hdfs}</lockType> in your <indexConfig> in solrconfig.xml

Re: Solr with Tomcat - enabling SSL problem

2015-01-08 Thread Tali Finelt
Hi Shawn, Thanks for clarifying this. Is there a different way to set the embedded Zookeeper urlScheme parameter before ever starting tomcat? (some configuration file etc.) This way I won't need to start tomcat twice. Thanks, Tali From: Shawn Heisey apa...@elyograg.org To:

RE: Solr Cloud, 100 shards, shards progressively become slower

2015-01-08 Thread Andrew Butkus
Hi Shawn, Thank you for your reply The part about memory usage is not clear. That 4GB and 16GB could refer to the operating system view of memory, or the view of memory within the JVM. I'm curious about how much total RAM each machine has, how large the Java heap is, and what the total size
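The JVM-side view of memory that Shawn asks about can be read from java.lang.Runtime: maxMemory() is the heap ceiling (the -Xmx setting), totalMemory() is the heap currently committed, and used heap is committed minus free. This sketch (HeapView is a hypothetical name) shows only that view. The OS-side numbers, total RAM and how much of it is page cache holding index files, come from OS tools, not from the JVM; with memory-mapped index files, which is typical for Solr on 64-bit JVMs, that cache lives outside the heap.

```java
class HeapView {
    private static final long MB = 1024L * 1024L;

    // Heap occupied by live and not-yet-collected objects, in bytes.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the -Xmx ceiling; totalMemory(): heap currently committed.
        System.out.printf("max heap %d MB, committed %d MB, used %d MB%n",
                rt.maxMemory() / MB, rt.totalMemory() / MB, usedHeapBytes() / MB);
    }
}
```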

Re: ignoring bad documents during index

2015-01-08 Thread Chris Hostetter
I don't have specific answers to all of your questions, but you should probably look at SOLR-445, where a lot of this has already been discussed and multiple patches with different approaches have been started... https://issues.apache.org/jira/browse/SOLR-445 : Date: Wed, 7 Jan 2015 12:38:47
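Independently of SOLR-445, a common client-side workaround is to send documents in batches and, when a batch is rejected, split it in half and retry each half until the offending documents are isolated. The sketch below is that bisection approach, not anything from the SOLR-445 patches; BatchSink is a hypothetical interface standing in for the actual update request, and documents are plain strings for brevity. Solr may already have applied the documents ahead of a bad one in a failed batch, so re-sending them is only harmless when each document has a stable uniqueKey (the later add simply overwrites).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TolerantIndexer {
    // Hypothetical sink: sends one batch and throws if the server rejects it.
    interface BatchSink {
        void send(List<String> docs) throws Exception;
    }

    // Returns the documents that could not be indexed even on their own.
    static List<String> indexSkippingBad(BatchSink sink, List<String> docs) {
        List<String> rejected = new ArrayList<>();
        sendOrSplit(sink, docs, rejected);
        return rejected;
    }

    // Try the whole batch; on failure, bisect until single bad documents are isolated.
    private static void sendOrSplit(BatchSink sink, List<String> docs, List<String> rejected) {
        if (docs.isEmpty()) {
            return;
        }
        try {
            sink.send(docs);
        } catch (Exception e) {
            if (docs.size() == 1) {
                rejected.add(docs.get(0));
                return;
            }
            int mid = docs.size() / 2;
            sendOrSplit(sink, docs.subList(0, mid), rejected);
            sendOrSplit(sink, docs.subList(mid, docs.size()), rejected);
        }
    }

    public static void main(String[] args) {
        List<String> accepted = new ArrayList<>();
        // Fake sink that rejects any batch containing a document marked "bad".
        BatchSink sink = docs -> {
            if (docs.contains("bad")) {
                throw new Exception("batch rejected");
            }
            accepted.addAll(docs);
        };
        List<String> rejected = indexSkippingBad(sink, Arrays.asList("d1", "bad", "d3", "d4"));
        System.out.println("accepted=" + accepted + " rejected=" + rejected);
    }
}
```

Running main() prints "accepted=[d1, d3, d4] rejected=[bad]". A single bad document costs a number of extra requests roughly proportional to the logarithm of the batch size.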