Is there any good documentation about the Lucene test framework?
I can only find the API docs.
Mimicking the unit test I've found in Lucene trunk, I tried to write
a unit test that tests a TokenFilter I am writing. But it is failing
with an error message like:
java.lang.AssertionError: close() called in wrong
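That assertion typically comes from the test framework's lifecycle checks (MockTokenizer and friends enforce the TokenStream consumer contract). The toy model below is NOT Lucene code; it only sketches, as an illustration, the state machine the framework enforces: reset() → incrementToken() until false → end() → close().

```java
// Toy model of the TokenStream consumer workflow that the Lucene test
// framework enforces (this is NOT Lucene code, just an illustration):
// reset() -> incrementToken() until it returns false -> end() -> close().
// Skipping a step produces errors like "close() called in wrong state".
enum StreamState { CREATED, RESET, INCREMENTING, EXHAUSTED, ENDED, CLOSED }

class LifecycleCheckingStream {
    private StreamState state = StreamState.CREATED;
    private int remaining = 3; // pretend the stream yields three tokens

    void reset() {
        if (state != StreamState.CREATED && state != StreamState.CLOSED)
            throw new AssertionError("reset() called in wrong state: " + state);
        state = StreamState.RESET;
        remaining = 3;
    }

    boolean incrementToken() {
        if (state != StreamState.RESET && state != StreamState.INCREMENTING)
            throw new AssertionError("incrementToken() called in wrong state: " + state);
        if (remaining-- > 0) { state = StreamState.INCREMENTING; return true; }
        state = StreamState.EXHAUSTED;
        return false;
    }

    void end() {
        if (state != StreamState.EXHAUSTED)
            throw new AssertionError("end() called in wrong state: " + state);
        state = StreamState.ENDED;
    }

    void close() {
        if (state != StreamState.ENDED)
            throw new AssertionError("close() called in wrong state: " + state);
        state = StreamState.CLOSED;
    }
}

public class TokenStreamLifecycleDemo {
    public static void main(String[] args) {
        LifecycleCheckingStream ts = new LifecycleCheckingStream();
        ts.reset();
        while (ts.incrementToken()) { /* read attributes for each token */ }
        ts.end();   // records end-of-stream state
        ts.close(); // only legal after end()
        System.out.println("lifecycle ok");

        // Closing without calling end() first reproduces the reported error:
        LifecycleCheckingStream bad = new LifecycleCheckingStream();
        bad.reset();
        try {
            bad.close();
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // close() called in wrong state: RESET
        }
    }
}
```

In real tests, extending BaseTokenStreamTestCase and checking output with assertTokenStreamContents(...) drives this whole sequence for you, which is usually the easiest way to avoid these state errors.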
Problem solved - it's caused by a system outside of Solr. Thank you all for
the prompt replies! :)
On Thu, Jan 8, 2015 at 12:40 PM, Chris Hostetter hossman_luc...@fucit.org
wrote:
: Thank you for your reply Chris :) Solr is producing the correct result
on
: its own. The problem is that I am
(semi-relevant aside) We do happen to ship this test framework with the
Solr distribution (in dist/test-framework).
Why, I don't know!
Regards,
Alex.
Sign up for my Solr resources newsletter at http://www.solr-start.com/
On 8 January 2015 at 23:23, Shawn Heisey apa...@elyograg.org wrote:
On 1/8/2015 8:31 PM, TK Solr wrote:
Is there any good documentation about the Lucene test framework?
I can only find the API docs.
Mimicking the unit test I've found in Lucene trunk, I tried to write
a unit test that tests a TokenFilter I am writing. But it is failing
with an error message like:
Is it possible that tuning garbage collection to achieve much better
pause characteristics might actually *decrease* index performance?
Rebuilds that I did while still using a tuned CMS config would take
between 5.5 and 6 hours, sometimes going slightly over 6 hours.
A rebuild that I did
In the abstract, it sounds like you are seeing the difference between tuning
for latency vs tuning for throughput
My hunch would be you are seeing more (albeit individually quicker) GC events
with your new settings during the rebuild
I imagine that in most cases a solr rebuild is relatively
I would not be surprised at all. Optimizing for minimum pauses usually
increases overhead that decreases overall throughput. This is a pretty common
tradeoff.
For maximum throughput, when you don’t care about pauses, the simplest
non-concurrent GC is often the best. That might be the right
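To make that tradeoff concrete, these are the usual HotSpot collector choices from the Java 7/8 era that Solr 4.x ran on (illustrative starting points, not a recommendation):

```
# Throughput-oriented: longest individual pauses, best total indexing throughput
-XX:+UseParallelGC -XX:+UseParallelOldGC

# Latency-oriented (CMS): shorter pauses, more concurrent GC overhead
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75

# G1: a middle ground, tunable via a pause-time goal
-XX:+UseG1GC -XX:MaxGCPauseMillis=200
```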
On 1/8/2015 11:05 PM, Boogie Shafer wrote:
In the abstract, it sounds like you are seeing the difference between tuning
for latency vs tuning for throughput
My hunch would be you are seeing more (albeit individually quicker) GC events
with your new settings during the rebuild
I imagine
Did you check [child] at
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents
?
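For reference, a block-join query that returns parent documents with their children via that transformer looks roughly like this (the field names are made up; parentFilter must identify exactly the set of parent documents):

```
q={!parent which="doc_type:parent"}child_field:value
fl=id,title,[child parentFilter=doc_type:parent limit=10]
```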
On Thu, Jan 8, 2015 at 5:53 PM, yliu y...@mathworks.com wrote:
Hi,
What is the best way to return both parent document and child documents in
one query? I used SolrJ to create a
Thanks guys for your inputs. I would be looking at around 100 TB of total
index size with 5100 million documents for a period of 30 days before
we purge the indexes. I had estimated it slightly on the higher side of
things, but that's where I feel we would be.
Thanks,
Nishanth
On Wed, Jan 7,
My final advice would be my standard proof of concept implementation advice
- test a configuration with 10% (or 5%) of the target data size and 10% (or
5%) of the estimated resource requirements (maybe 25% of the estimated RAM)
and see how well it performs.
Take the actual index size and multiply
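Plugging Nishanth's figures from earlier in the thread into that advice, a quick sanity check (assumption: decimal units, 1 TB = 10^12 bytes):

```java
// Sanity-checking the sizing numbers quoted above: 100 TB of index,
// 5100 million documents, a 30-day retention window, and a 10% POC slice.
public class PocSizing {
    public static void main(String[] args) {
        long totalBytes = 100L * 1_000_000_000_000L; // 100 TB of total index
        long totalDocs  = 5_100_000_000L;            // 5100 million documents
        int retentionDays = 30;

        double kbPerDoc = totalBytes / (double) totalDocs / 1000.0; // ~19.6 KB/doc
        double tbPerDay = 100.0 / retentionDays;                    // ~3.33 TB/day
        long docsPerDay = totalDocs / retentionDays;                // 170 million/day

        // The 10% proof-of-concept slice suggested above
        double pocTb = 100.0 * 0.10;   // 10 TB of index
        long pocDocs = totalDocs / 10; // 510 million documents

        System.out.printf("~%.1f KB/doc, %.2f TB/day, %,d docs/day%n",
                kbPerDoc, tbPerDay, docsPerDay);
        System.out.printf("POC target: %.0f TB, %,d docs%n", pocTb, pocDocs);
    }
}
```

At roughly 20 KB of index per document, even a 10% POC is a serious cluster, which is exactly why testing at that scale before committing hardware is worthwhile.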
On 01/07/2015 05:42 PM, Erick Erickson wrote:
True, and you can do this if you take explicit control of the document
routing, but...
that's quite tricky. You forever after have to send any _updates_ to the same
shard you did the first time, whereas SPLITSHARD will do the right thing.
Hmm. That
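The explicit routing being discussed is usually done with the compositeId router: prefix the uniqueKey with a shard key (the names below are made up):

```
id = "customerA!doc42"
```

All documents whose ids share the "customerA!" prefix hash to the same shard, and, as Erick notes, every later update to doc42 must keep the same prefix or it will land on a different shard.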
On Wed, 2015-01-07 at 22:26 +0100, Joseph Obernberger wrote:
Thank you Toke - yes - the data is indexed throughout the day. We are
handling very few searches - probably 50 a day; this is an R&D system.
If your searches are in small bundles, you could pause the indexing flow
while the searches
Hi, did you solve this problem?
I ran into the same problem when I set up Solr + HDFS.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-IndexNotFoundException-no-segments-file-HdfsDirectoryFactory-tp4138737p4178034.html
Sent from the Solr - User mailing list archive at
Hi Alan,
thanks for the pointer, I'll look at our gc logs
Am 07.01.2015 um 15:46 schrieb Alan Woodward:
I had a similar issue, which was caused by
https://issues.apache.org/jira/browse/SOLR-6763. Are you getting long GC
pauses or similar before the leader mismatches occur?
Alan Woodward
Versions 4.10.3 and beyond already use server rather than example; the
script still contains a reference to example purely for back compat. A major
release, 5.0, is coming soon, and perhaps the back compat can be removed for that.
On 6 Jan 2015 09:30, Dominique Bejean dominique.bej...@eolya.fr wrote:
Hi,
Things have changed reasonably for the 5.0 release.
In standalone mode, it still defaults to the server directory, so
you'd find your logs in server/logs.
In SolrCloud mode, e.g. if you ran
bin/solr -e cloud -noprompt
this would default to stuff being copied into example
Thank you for your reply Chris :) Solr is producing the correct result on
its own. The problem is that I am calling a dataload class to call Solr,
which worked for assigned ID and composite ID, but not for UUID. Is there a
place to delete my question on the mailing list?
Thank you,
Jia
On Wed,
Hi,
I am using Solr 4.10.2 with tomcat and embedded Zookeeper.
I followed
https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-SolrCloud
to enable SSL.
I am currently doing the following:
Starting tomcat
Running:
../scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983
On 1/8/2015 9:39 AM, Joseph Obernberger wrote:
Yes - it would be 20GBytes of cache per 270GBytes of data.
That's not a lot of cache. One rule of thumb is that you should have at
least 50% of the index size available as cache, with 100% being a lot
better. The caching should happen on the Solr
Extrapolating from what Jack was saying in his reply ... with 100 shards and
4 replicas, you have 400 cores that are each about 2.8GB. That results in a
total index size of just over a terabyte, with 140GB of index data on each of
the eight servers.
Assuming you have only one Solr instance
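Shawn's arithmetic, together with the 50-100% disk-cache rule of thumb mentioned earlier in the thread, can be checked directly:

```java
// Checking the shard arithmetic above (100 shards x 4 replicas, ~2.8 GB per
// core, 8 servers), plus the 50-100% OS-disk-cache rule of thumb.
public class ShardMath {
    public static void main(String[] args) {
        int shards = 100, replicas = 4, servers = 8;
        double gbPerCore = 2.8;

        int cores = shards * replicas;          // 400 cores in the collection
        double totalGb = cores * gbPerCore;     // 1120 GB: "just over a terabyte"
        double perServerGb = totalGb / servers; // 140 GB of index per server

        // Rule of thumb: at least 50% of index size as OS cache, 100% is better
        double cacheMinGb = perServerGb * 0.5;  // 70 GB
        double cacheMaxGb = perServerGb;        // 140 GB

        System.out.printf("%d cores, %.0f GB total, %.0f GB/server, cache %.0f-%.0f GB%n",
                cores, totalGb, perServerGb, cacheMinGb, cacheMaxGb);
    }
}
```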
: Thank you for your reply Chris :) Solr is producing the correct result on
: its own. The problem is that I am calling a dataload class to call Solr,
: which worked for assigned ID and composite ID, but not for UUID. Is there a
Sorry -- still confused: are you confirming that you've tracked
Andrew Butkus [andrew.but...@c6-intelligence.com] wrote:
[Shawn/Jack: Ideal amount of RAM]
Have less than this :/ :( - with not much likelihood to upgrade anytime soon
The right amount of RAM is what satisfies your requirements and is tightly
correlated to the speed of your underlying
On 1/8/2015 8:57 AM, Andrew Butkus wrote:
We have 4 GB usage (because the index is split into 100 shards, each approx.
2.8 GB on disk). We have allocated a 14 GB min and 16 GB max heap to Solr, so it
has plenty to use (the heap shown in the dashboard never goes above about 8 GB, so
still plenty).
Thanks a lot Otis,
While reading the SolrCloud documentation to understand how SolrCloud
could run on HDFS, I got confused with leader, replica, non-replica
shards, core, index, and collections.
First it is specified that one cannot add shards, then that one can add
replica-only shards, then
bq: you'll end up with N-2 nearly full boxes and 2 half-full boxes.
True, you'd have to repeat the process N times. At that point, though,
as Shawn mentions it's often easier to just re-index the whole thing.
Do note that one strategy is to create more shards than you need at
the beginning. Say
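Creating more shards than initially needed is just a larger numShards at collection-creation time (names and numbers below are illustrative):

```
/admin/collections?action=CREATE&name=mycollection&numShards=16&replicationFactor=2&maxShardsPerNode=4
```

With maxShardsPerNode raised, the extra shards co-exist on the starting hardware and can later be moved to new nodes as data grows, instead of being split.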
On 1/8/2015 7:26 AM, Andrew Butkus wrote:
Hi, we have 8 solr servers, split 4x4 across 2 data centers.
We have a collection of around ½ billion documents, split over 100 shards,
each is replicated 4 times on separate nodes (evenly distributed across both
data centers).
The problem we
Hi, we have 8 solr servers, split 4x4 across 2 data centers.
We have a collection of around ½ billion documents, split over 100 shards, each
is replicated 4 times on separate nodes (evenly distributed across both data
centers).
The problem we have is that when we use cursorMark (and also when
Hi,
What is the best way to return both parent document and child documents in
one query? I used SolrJ to create a document and added a few child
documents using addChildDocuments() method and indexed the parent document.
All documents are indexed successfully (parent and children).
When I
Hi, we have 8 solr servers, split 4x4 across 2 data centers.
We have a collection of around ½ billion documents, split over 100 shards, each
is replicated 4 times on separate nodes (evenly distributed across both data
centers).
The problem we have is that when we use cursorMark (and also
It's worth noting that those messages alone don't necessarily signify
a problem with the system (and it wouldn't be called split brain).
The async nature of updates (and thread scheduling) along with
stop-the-world GC pauses that can change leadership, cause these
little windows of inconsistencies
On 1/8/2015 4:37 AM, Bram Van Dam wrote:
Hmm. That is a good point. I wonder if there's some kind of middle
ground here? Something that lets me send an update (or new document) to
an arbitrary node/shard but which is still routed according to my
specific requirements? Maybe this can already be
On 1/8/2015 6:25 AM, Tali Finelt wrote:
I am using Solr 4.10.2 with tomcat and embedded Zookeeper.
I followed
https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-SolrCloud
to enable SSL.
I am currently doing the following:
Starting tomcat
Running:
On 1/8/2015 8:50 AM, Tali Finelt wrote:
Thanks for clarifying this.
Is there a different way to set the embedded Zookeeper urlScheme parameter
before ever starting tomcat? (some configuration file etc.)
This way I won't need to start tomcat twice.
Most of the cloud options can be specified
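One approach, shown on the Enabling SSL page linked earlier, is to write clusterprops.json into ZooKeeper with zkcli before any Solr node starts (paths and ports as used earlier in this thread):

```
../scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd put /clusterprops.json '{"urlScheme":"https"}'
```

Note that with the embedded ZooKeeper this still requires ZooKeeper (and therefore tomcat) to be running, which is one reason a standalone ZooKeeper ensemble makes this kind of bootstrapping easier.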
On 1/8/2015 3:16 AM, Toke Eskildsen wrote:
On Wed, 2015-01-07 at 22:26 +0100, Joseph Obernberger wrote:
Thank you Toke - yes - the data is indexed throughout the day. We are
handling very few searches - probably 50 a day; this is an R&D system.
If your searches are in small bundles, you could
I missed Norgorn's reply above. But in the past, and also as suggested
above, I think the following lock type solved the problem for me:
<lockType>${solr.lock.type:hdfs}</lockType> in the <indexConfig> section of
solrconfig.xml
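For reference, the relevant solrconfig.xml pieces when running on HDFS look roughly like this (the HDFS URL is a placeholder for your namenode):

```
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
</directoryFactory>

<indexConfig>
  <lockType>${solr.lock.type:hdfs}</lockType>
</indexConfig>
```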
Hi Shawn,
Thanks for clarifying this.
Is there a different way to set the embedded Zookeeper urlScheme parameter
before ever starting tomcat? (some configuration file etc.)
This way I won't need to start tomcat twice.
Thanks,
Tali
From: Shawn Heisey apa...@elyograg.org
To:
Hi Shawn,
Thank you for your reply
The part about memory usage is not clear. That 4GB and 16GB could refer to
the operating system view of memory, or the view of memory within the JVM.
I'm curious about how much total RAM each machine has, how large the Java
heap is, and what the total size
I don't have specific answers to all of your questions, but you should
probably look at SOLR-445, where a lot of this has already been discussed
and multiple patches with different approaches have been started:
https://issues.apache.org/jira/browse/SOLR-445
: Date: Wed, 7 Jan 2015 12:38:47