Hi,
I created a collection with 2 shards and a replication factor of 1, and enabled
autoAddReplicas. Then I killed shard2 with 'kill -9'. The overseer asked the
other Solr node to create a new Solr core and point it to the dataDir of shard2.
Unfortunately, the new core failed to come up because of
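(For reference, creating such a collection from SolrJ might look roughly like the
untested sketch below; the collection name, configset name and ZooKeeper address are
placeholders, not taken from this mail, and builder method names can vary a bit
between SolrJ versions.)

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateWithAutoAddReplicas {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder()
            .withZkHost("zk1:2181")                  // placeholder ZooKeeper address
            .build()) {
      CollectionAdminRequest
          .createCollection("mycollection", "myconfig", 2, 1)  // 2 shards, replicationFactor=1
          .setAutoAddReplicas(true)                  // ask the overseer to replace lost replicas
          .process(client);
    }
  }
}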
Hi Alex
The business use case for the field is
- exact match
- singular/plural stemming on each term in the field
E.g. a search for "dvd cases" must match "dvd case" and "dvds case".
This is the field type currently, and it satisfies the business use case.
The one drawback of this is that I need to add those
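(The field type itself isn't quoted above. Purely as an illustration, a common way to
get exact plus singular/plural matching is a keyword-repeat filter followed by a
minimal English stemmer, sketched here with Lucene's CustomAnalyzer; the factory
names are the standard Lucene/Solr SPI names, everything else is made up.)

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PluralAndExactDemo {
  public static void main(String[] args) throws Exception {
    // keywordRepeat keeps the original token next to its stemmed form, so
    // "cases" is indexed as both "cases" (exact) and "case" (singular).
    Analyzer analyzer = CustomAnalyzer.builder()
        .withTokenizer("whitespace")
        .addTokenFilter("lowercase")
        .addTokenFilter("keywordRepeat")
        .addTokenFilter("englishMinimalStem")
        .addTokenFilter("removeDuplicates")
        .build();
    try (TokenStream ts = analyzer.tokenStream("f", "dvd cases")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term.toString());   // prints: dvd, cases, case
      }
      ts.end();
    }
  }
}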
On 3/30/2017 12:34 PM, Shashank Pedamallu wrote:
> I have some configuration variables that I need to hold in Solr as it
> switches between transient states on a transient core. What is the
> best way to do this? These variables can change value while Solr is
> running. So, I need to have
Hi Erick,
Thanks for your response. So, from the way you describe it, I understand that there is
no way to persist variables between transient states.
I just looked at the ReplicationHandler class and it has an API to enable or
disable replication on a core, which is stored as an AtomicBoolean (defaulted
Yes, that would seem an accurate assessment of the problem.
-Original Message-
From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
Sent: Thursday, 30 March 2017 4:53 p.m.
To: solr-user@lucene.apache.org
Subject: Re: Indexing speed reduced significantly with OCR
Thanks for your reply.
Short form: There's no easy way to do that ATM. The whole synchronization
process when working with transient cores (i.e. synchronizing on some
of the internal structures) is pretty hairy and would require you to
fork a version of Solr to change it.
Much of this is being worked out in SOLR-8906, where you
Hi All,
I have some configuration variables that I need to hold in Solr as it switches
between transient states on a transient core. What is the best way to do this?
These variables can change value while Solr is running. So, I need to
have read and write access to a persistent store.
Right, you're artificially forcing this loading/unloading, which is a
good thing to stress!
Every time the code accesses a core, the last access time should be
updated. So either you're accessing more than two cores in a
round-robin fashion, or you're somehow having other requests come in
With high entropy we see the same latency even when working with 1 shard.
Assuming that even with 1 shard Solr is still working hard to route the
documents,
what is the component that is responsible for the document routing?
Is it ZooKeeper?
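(For what it's worth: with the default compositeId router the shard is chosen by
hashing the document's uniqueKey; ZooKeeper only stores the cluster state, it does
not route documents. A client like CloudSolrClient reads that state and sends each
document to the right shard leader, roughly as in this untested sketch; names and
addresses are placeholders.)

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class RoutingDemo {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder()
            .withZkHost("zk1:2181")                  // placeholder ZooKeeper address
            .build()) {
      client.setDefaultCollection("mycollection");   // placeholder collection
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");                   // the hash of this id picks the shard
      client.add(doc);                               // sent directly to that shard's leader
      client.commit();
    }
  }
}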
And how would you verify that that's the
Hi,
Thanks Erick and Shawn for so much info!
1) Yes, I have deliberately set transientCacheSize to a very small value (2) on my
dev laptop to test how Solr handles the switch and how it affects my backup
process.
2) Yes, I'm not using SolrCloud. Each individual Solr core is independent and
we are
Did you check the number of documents that end up on each shard in
these two scenarios?
My guess would be that - perhaps - a low-entropy key puts most of the
documents into one shard, and a high-entropy key causes a lot more
routing traffic, with delay coming from the network communication
and/or
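(One way to check that: query one replica of each shard directly with distrib=false
so the request is not fanned out to the other shards. Untested sketch; the core URLs
are placeholders.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ShardCountDemo {
  public static void main(String[] args) throws Exception {
    String[] cores = {                                // placeholder replica core URLs
        "http://host1:8983/solr/mycollection_shard1_replica1",
        "http://host2:8983/solr/mycollection_shard2_replica1"
    };
    for (String core : cores) {
      try (HttpSolrClient shard = new HttpSolrClient.Builder(core).build()) {
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);
        q.set("distrib", "false");                    // count only what lives on this core
        long n = shard.query(q).getResults().getNumFound();
        System.out.println(core + " -> " + n + " docs");
      }
    }
  }
}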
Hi
Yes, it is SolrCloud; we saw the same behavior with 1, 2 and 4 shards. Each
shard has 3 replicas.
Each bulk contains 300 docs. We get approximately 800 docs inserted per
second.
~6000 docs are being sent in an iteration by all loading threads.
We have 20 threads, each sending bulks of 300
I'm inferring that at the end of the day, all your docs fit in a
single index, correct? SolrCloud won't be a magic bullet, and I'd
strongly advise, if you _do_ go to SolrCloud, to use SolrJ or similar
to feed docs, as DIH runs on a single server.
However, all that aside, if I can restate your
bq: I thought that LotsOfCores didn't coexist with Cloud very well.
It doesn't, you're right, I got off on a tangent there. The OP
mentioned "Cloud" and my brain cross-wired.
On Thu, Mar 30, 2017 at 6:32 AM, Shawn Heisey wrote:
> On 3/29/2017 8:09 PM, Erick Erickson wrote:
Are you by any chance using SolrCloud?
And to confirm, the total number of documents is the same within any
particular time period?
Regards,
Alex.
http://www.solr-start.com/ - Resources for Solr users, new and experienced
On 30 March 2017 at 10:50, moscovig wrote:
As I said before, this is a great application for pay-as-needed cloud servers.
Netflix’s first use of Amazon EC2 was encoding movies for different screen
sizes, data rates, codecs, and DRM. They would fire up a hundred or a thousand
instances, feed movies to them, pick up the encodes, then
OK, that complicates things a bit.
I would still try to go for a solution where you store the rich text in
Solr, but make sure you tokenize it correctly.
If the format is relatively simple, you could use either a regexp pattern
tokenizer
Thanks Shawn.
We do specify autoCommit settings of 3, 30 and false,
but I guess that still, the commitWithin of 300 ms is a bad idea.
We will definitely try playing with the configs you suggested.
I still don't get the reason for fast inserting when
On 3/30/2017 7:36 AM, moscovig wrote:
> We are using solr 6.2.1 for server and solrj 6.2.0, with no explicit commits,
> and -
>
> 3
> 30
> for autoCommit.
>
> Each request to Solr contains 300 small documents with different keys, with
> a commitWithin of 300 ms.
I think the
Hi
We are using Solr 6.2.1 for the server and SolrJ 6.2.0,
with no explicit commits, and 3 / 30
for autoCommit.
Each request to Solr contains 300 small documents with different keys, with
a commitWithin of 300 ms.
We have lots of requests coming in.
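(For context, sending one such bulk from SolrJ looks roughly like the sketch below;
the collection name and ZooKeeper address are placeholders. With many threads doing
this concurrently, a 300 ms commitWithin forces near-continuous commits, which is
usually what hurts.)

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexDemo {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder()
            .withZkHost("zk1:2181").build()) {        // placeholder ZooKeeper address
      client.setDefaultCollection("mycollection");    // placeholder collection
      List<SolrInputDocument> bulk = new ArrayList<>();
      for (int i = 0; i < 300; i++) {                 // one bulk of 300 small docs
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-" + System.nanoTime() + "-" + i);
        bulk.add(doc);
      }
      client.add(bulk, 300);                          // commitWithin = 300 ms, as described above
    }
  }
}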
The behavior is as follows:
On 3/29/2017 8:09 PM, Erick Erickson wrote:
> bq: My guess is that it is decided by the load time, because this is
> the option that would have the best performance.
>
> Not at all. The theory here is that this is to support the pattern
> where some transient cores are used all the time and some
Hi All,
I have a problem with scalability on my project. We are running close to
100 cores, each holding ~25,000 documents, with the total size of the index
files being 7.5 GB.
Also, we have a staging server where we build index files using the data
importer and using replication
> Note that the OCRing is a separate task from Solr indexing, and is best done
> on separate machines.
+1
-Original Message-
From: Rick Leir [mailto:rl...@leirtech.com]
Sent: Thursday, March 30, 2017 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing speed reduced
The workflow is
-/ OCR new documents
-/ check quality and tune until you get good output text
-/ keep the output text in the file system
-/ index and re-index to Solr as necessary from the file system
Note that the OCRing is a separate task from Solr indexing, and is best done on
separate machines.
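(The last step of that workflow, feeding the saved text files to Solr, could be a
small SolrJ job along these lines; the core URL, directory and field names are
placeholders.)

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexOcrOutput {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr =
             new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
         Stream<Path> files = Files.walk(Paths.get("/data/ocr-output"))) {
      files.filter(p -> p.toString().endsWith(".txt")).forEach(p -> {
        try {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", p.getFileName().toString());
          doc.addField("text", new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
          solr.add(doc);
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      });
      solr.commit();
    }
  }
}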
What's your actual business use case?
On 30 Mar 2017 1:53 AM, "Derek Poh" wrote:
> Hi Erick
>
> So I could also not use the query analyzer stage to append the code to the
> search keyword?
> Have the front-end application append the code for every query it issues
>
Hi forest
Do you have an HTML-to-rich-text converter? You could use it on the highlighter's
output. Otherwise you could count characters in the HTML. That might only be
useful if your rich-text font is fixed width.
Cheers -- Rick
On March 30, 2017 4:39:39 AM EDT, forest_soup
Unfortunately the rich text is not HTML/XML/DOC/PDF or any other popular
rich text format, and we would like to show the highlighted text in the
doc's own specific viewer. That's why I eagerly want the offsets.
The /tvrh (term vector component) and tv.offsets/tv.positions can give us
such
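(Something like the untested sketch below asks the TermVectorComponent for those
offsets/positions; the field has to be indexed with termVectors, termPositions and
termOffsets enabled, and the core URL, query and field name are placeholders.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.util.NamedList;

public class TermVectorDemo {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr =
             new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      SolrQuery q = new SolrQuery("id:doc-1");        // placeholder query
      q.setRequestHandler("/tvrh");                   // TermVectorComponent handler
      q.set("tv.fl", "body");                         // placeholder field name
      q.set("tv.offsets", "true");                    // character offsets into the stored text
      q.set("tv.positions", "true");                  // token positions
      NamedList<Object> tv =
          (NamedList<Object>) solr.query(q).getResponse().get("termVectors");
      System.out.println(tv);
    }
  }
}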
OK, so the next thing to do would be to index and store the rich text ...
is it HTML? Because then you can use HTMLStripCharFilterFactory in your
analyzer, and still get the correct highlight back with hl.fragsize=0.
I would think that you will have a hard time using the term positions, if
what
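(A rough sketch of that highlighting request from SolrJ; hl.fragsize=0 returns the
whole stored field value as a single highlighted fragment. Core URL, query and field
name are placeholders.)

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class HighlightDemo {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr =
             new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      SolrQuery q = new SolrQuery("body:something");  // placeholder query/field
      q.setHighlight(true);
      q.addHighlightField("body");                    // placeholder field name
      q.setHighlightFragsize(0);                      // 0 = highlight the whole field value
      Map<String, Map<String, List<String>>> hl = solr.query(q).getHighlighting();
      System.out.println(hl);
    }
  }
}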
Hi Alex
Thank you for pointing out the UpdateRequestProcessor option.
On 3/30/2017 11:43 AM, Alexandre Rafalovitch wrote:
I am not sure I can tell you how to decide on one or the other. However, I
wanted to mention that you also have the option of doing it in the
UpdateRequestProcessor chain. That's
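(For anyone curious what that option looks like: a custom processor in the update
chain can modify each document before it is indexed. A bare-bones, untested sketch;
the class and field names are made up, and it would still need to be registered in
an updateRequestProcessorChain in solrconfig.xml.)

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class AppendCodeProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        doc.addField("code_s", "some-code");          // hypothetical field and value
        super.processAdd(cmd);
      }
    };
  }
}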