Erick,
Indexing happens via the SolrCloud server. This thread dump was from the leader. Some
followers show symptoms of high CPU during this time. Do you think this is from
locking? What is the thread that is holding the lock doing? Also, we are unable
to reproduce this issue in our load-test environment.
Hi solr-user,
I can't judge the cause from a quick glance at your definition, but some suggestions I
can give:
1. I took a look at Jieba. It uses a dictionary and it seems to do a good job
on CJK. I suspect this problem may come from those filters (note: I can understand
you may use CJKWidthFilter to
Hi solr-user,
I always use CJKTokenizer on a fair amount of Chinese news articles. Say that
in Chinese, character C1 has the same meaning as character C2 (e.g. 台=臺). Is it
possible that I only add this line in synonym.txt:
C1,C2 (and in a real example: 台,臺)
and by applying CJKTokenizer and
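If it helps, this is the kind of field type the question implies (a sketch only; the factory class names and field type name are my assumption and may not exist under those names in your Solr version). Note that SynonymFilter operates on the tokens the tokenizer emits, so the 台,臺 rule can only match if each character comes out as its own token:

```xml
<!-- Sketch: CJK tokenization followed by a synonym mapping.
     Verify the tokenizer class against your Solr version; test in the
     Analysis screen that 台 and 臺 are emitted as standalone tokens. -->
<fieldType name="text_cjk_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonym.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```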
On Wed, 2015-10-14 at 10:17 +0200, Jan Høydahl wrote:
> I have not benchmarked various number of segments at different sizes
> on different HW etc, so my hunch could very well be wrong for Salman’s case.
> I don’t know how frequent the updates to his data are, either.
>
> Have you done #segments
Hi Tim,
Should we remove the console appender by default? This is very trappy I guess.
On Tue, Oct 20, 2015 at 11:39 PM, Timothy Potter wrote:
> You should fix your log4j.properties file to not log to console ...
> it's there for the initial getting started experience, but
Hello,
recently I've deployed Solr 5.2.1 and I've observed the following issue:
My documents have two fields: id and text. Solr is configured to use
FastVectorHighlighter (I've tried StandardHighlighter too, no
difference). I've created the schema.xml, solrconfig.xml hasn't been
changed in
The details are in Tim's blog post and the linked JIRAs
Unfortunately, the only real solution I know of is to upgrade
to at least Solr 5.2. Meanwhile, throttling the indexing rate
will at least smooth out the issue. Not a great approach but
all there is for 4.6.
Best,
Erick
On Thu, Oct 22, 2015
Yes, it works (now; not sure since when, though). I just adjusted the
TestContentStreamDataSource test case; see the patch pasted below, which passes.
Note that the solrconfig file has a mistake in that the attribute ‘key’ isn’t
correct - it should be ‘name’.
(This was tested on trunk via IntelliJ, just FYI
You need to tell the second call which documents to update. Are you doing
that?
There may also be a wrinkle in the URP order, but let's get the first step
working first.
On 22 Oct 2015 12:59 pm, "Roxana Danger"
wrote:
> yes, it's working now... but I can not use
Upayavira,
Thanks a lot for this information.
Regards,
Bruno
On 21/10/2015 at 19:24, Upayavira wrote:
regexp will match the whole term. So, if you have stemming on, magnetic
may well stem to magnet, and that is the term against which the regexp
is executed.
If you want to do the regexp
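To illustrate the point above (a sketch; the core and field names are made up): if "magnetic" was stemmed to "magnet" at index time, a regexp query runs against the indexed term "magnet", so `/magnetic/` would find nothing while `/magnet.*/` would match.

```shell
# The regexp is anchored against the whole indexed (stemmed) term.
curl 'http://localhost:8983/solr/mycore/select?q=title:/magnet.*/'
```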
Hello Roxana,
I feel it's almost impossible. I can only suggest committing to make new
terms visible.
There is SolrCore.getRealtimeSearcher() but I never understand what it
does.
On Thu, Oct 22, 2015 at 1:20 PM, Roxana Danger <
roxana.dan...@reedonline.co.uk> wrote:
> Hello,
>
> I would like to
thread hijack:
Erick, wdyt about writing query-time analog of [child]
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents
?
On Thu, Oct 22, 2015 at 6:32 PM, Erick Erickson
wrote:
> You will NOT get the stored fields from the child record
>
Mikhail:
Brilliant! Assuming we can get the "from" and "to" parameters out of
the query and, perhaps, the fromIndex (for cross-core) then it
_should_ just be a matter of fetching the from doc and adding the
fields. And since it's only operating on the returned documents it
also shouldn't be very
Thanks Erick. Currently, migrating to 5.3 and it is taking a bit of
time. Meanwhile, I looked at the JIRAs from the blog and the stack trace
looks a bit different from what I see but not sure if they are related.
Also, as per the stack trace I have included in my original email, it is
the
Hi Emir,
Very weird. I replied to your email from home many times yesterday, but the
replies never showed up on the solr-user mailing list. I don't know why, so I'm
replying again from the office. Hope this one shows up.
Thanks for your explanation. I'll look into PatternReplaceCharFilter as a workaround
(As I
Hi Edwin,
Since you've tested all my suggestions and the problem is still there, I can't
think of anything wrong with your configuration. Now I can only suspect two
things:
1. You said the problem only happens on the "contents" field, so maybe there is
something wrong with the contents of that
Hi,
I'm using Solr 5.3.0 on Red Hat EL 7, and I'm trying to extract content from
PDF, Word, LibreOffice, etc. docs using the ExtractingRequestHandler.
Everything works fine, except when I want to extract content from images
embedded in PDF/Word etc. documents:
I send an extract request like this
Hello,
I would like to create an updateRequestProcessorChain that should be
executed after a DB DIH. I am extending the UpdateRequestProcessorFactory and
UpdateRequestProcessor classes. The processAdd method of my
UpdateRequestProcessor should be able to update the documents with the
indexed
Hi Scott,
Using PatternReplaceCharFilter is not the same as replacing the raw data
(replacing raw data is not a proper solution, as it does not solve the issue
when searching with the "other" character). This is part of token
standardization, no different from lower-casing - it is a standard
approach as well when
Hi Scott,
I don't have experience with Chinese, but SynonymFilter works on tokens,
so if CJKTokenizer recognizes C1 and C2 as tokens, it should work. If
not, then you can try configuring PatternReplaceCharFilter to replace C1
with C2 during indexing and searching and get a match.
Thanks,
Emir
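For the record, a minimal sketch of that char-filter setup (the field type name is made up; apply the same analyzer at both index and query time so the variants unify):

```xml
<!-- Sketch: map the variant character to its canonical form before
     tokenization, much like a case-folding step. -->
<fieldType name="text_zh_norm" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="台" replacement="臺"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```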
The leader election issue we were having was solved by passing
-Djava.net.preferIPv4Stack=true
to the ZooKeeper startup script.
It seems our Linux servers have IPv6 enabled but we have no IPv6 network.
Hope this helps others.
Arcadius.
On 4 September 2015 at 04:57, Arcadius Ahouansou
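For anyone wanting to apply this, a sketch of where the flag could go (the file name and variable depend on how your ZooKeeper installation picks up JVM flags, so verify against your packaging):

```shell
# zookeeper-env.sh (or wherever your init script sources JVM flags).
# Forces the JVM onto the IPv4 stack so leader election does not try
# to use an IPv6 interface that has no route.
SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djava.net.preferIPv4Stack=true"
```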
Hi solr-user,
Yeah, I thought about replacing C1 with C2 in the underlying raw data. However,
it's a huge data set (over 10M news articles), so I gave up on that strategy
earlier. My current temporary solution is going back to a 1-gram tokenizer
(i.e. StandardTokenizer) so I only need to set 1 rule.
Roxana -
What is the purpose of doing this? (that’ll help guide the best approach)
It can be quite handy to get the terms from analysis into a field as stored
values and to separate terms into separate fields and such. Here’s a
presentation where I detailed an update script trick that
Hi,
Given an xml structure:
Subject
032-001946363
Subject
037-001946370
Author
040-001959713
Author
040-001959829
Hello all,
We experienced two major problems in two days in one of our data centers.
Here is our setup: 15 nodes, 3 shards, one replica per node, around 50Gb of
index per shard.
We are running Solr 4.10.4 on Ubuntu servers using jdk 1.8.0u51.
We have an ensemble of 5 zookeeper nodes to
I don't think DIH supports siblings. Have you thought of using an XSLT
processor before sending the XML to Solr? Or using it instead of DIH
during the update (not a well-known part of Solr):
On 10/22/2015 12:24 AM, Shalin Shekhar Mangar wrote:
> Should we remove the console appender by default? This is very trappy I guess.
The only time we should need console logging is when Solr is run in the
foreground, and in that case, it should not be saved to a file, just
printed on the
Hi Scott,
Thank you for your response and suggestions.
In response to your questions, here are the answers:
1. I took a look at Jieba. It uses a dictionary and it seems to do a good
job on CJK. I suspect this problem may come from those filters (note: I can
understand you may use CJKWidthFilter to
Hello.
We have a Solr 5.3.0 installation with an index of ~4 TB, and the volume
containing it is almost full. I hoped to utilize SolrCloud's power to split the
index into two shards or Solr nodes, thus spreading the index across several
physical devices. But as I look closer, it turns out that
Hi Alex,
My idea behind this is to avoid two calls: first the importer and then the
updater. As there is an update processor chain that can be used after the
DIH, I thought it was possible to get a real-time updater.
So, I am taking your advice and dividing the process into different steps. I
On 10/22/2015 8:29 AM, Nikolay Shuyskiy wrote:
> I imagined that I could, say, add two new nodes to SolrCloud, and split
> shard so that two new shards ("halves" of the one being split) will be
> created on those new nodes.
>
> Right now the only way to split shard in my situation I see is to
Hi Scott,
Thank you for your response.
1. You said the problem only happens on the "contents" field, so maybe there is
something wrong with the contents of that field. Does it contain anything special,
e.g. HTML tags or symbols? I recall SOLR-42 mentions
something about how HTML stripping will
Hi,
I would like to check: is it good to use EdgeNGramFilterFactory for indexes
that contain Chinese characters?
Will it affect the accuracy of searches for Chinese words?
I have rich-text documents that are in both English and Chinese, and
currently I have EdgeNGramFilterFactory enabled during
On 10/22/2015 10:49 PM, awhosit wrote:
> The one that's not working is Solr 5.2.1/SLES 12.
> But I have working one with solr 5.2.1/SLES 11 and solr 5.2.1/Ubuntu 14.
>
> The log left in sol-8983-console.log is as follows.
> I'm using OpenJDK 1.7, as follows.
>
> java version "1.7.0_85"
> OpenJDK Runtime
I can confirm this is working in PROD at 100M hits a day.
Can we commit it please? Begging here.
https://issues.apache.org/jira/browse/SOLR-7993
--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Hi, I'm a newbie with Solr, but I have the same issue.
More precisely, only one machine can't start Solr, with the message "cannot
open {solr.log} file for reading: No such file or directory." Obviously
there is no such file, and even when I created an empty one, it didn't help.
I've also tried - moving around the
Dear Mikhail,
Thank you very much for your advice. I have tried it, but the realTimeSearcher
didn't help...
This may look very silly, but: can a commit be called with
RunUpdateProcessorFactory? Can I use it twice in an
updateRequestProcessorChain?
Thank you very much again,
Roxana
On 22 October 2015
You are doing things out of order. It's DIH, then URP, then the indexer. Any
attempt to subvert that order for the record being indexed will end in
problems.
Have you considered doing a dual path? Index, then update. Of course,
all your fields need to be stored for that.
Also, perhaps you need to rethink
When you run a full-import, Solr will try to delete old documents
before importing the new ones. If there are several top-level entities,
they step on each other's toes.
Use preImportDeleteQuery to avoid that (as per
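A sketch of what that looks like in the DIH data config (the entity names, SQL, and the `type` field are made up for illustration):

```xml
<!-- Each top-level entity deletes only its own documents on
     full-import instead of wiping the whole index. -->
<document>
  <entity name="books" query="SELECT id, title FROM books"
          preImportDeleteQuery="type:book"/>
  <entity name="authors" query="SELECT id, name FROM authors"
          preImportDeleteQuery="type:author"/>
</document>
```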
Hi Erik,
Thanks for the links, but the analyzers are called correctly. The problem
is that I need access to the whole set of terms through a searcher,
but the request searcher cannot retrieve any terms because the commit
method has not been called yet.
My idea behind this is to avoid two
Thanks for adding that to our collective knowledge store!
On Thu, Oct 22, 2015 at 2:44 AM, Arcadius Ahouansou
wrote:
> The leader election issue we were having was solved by passing
>
> -Djava.net.preferIPv4Stack=true
>
> to zookeeper startup script
>
> It seems our Linux
You will NOT get the stored fields from the child record
with the join operation, it's called "pseudo join" for a
good reason.
It's usually a mistake to try to force Solr to perform just
like a database. I would seriously consider flattening
(denormalizing) the data if at all possible.
Best,
It is still not clear what problem you are really trying to solve. This is
what we call an XY problem - you are focusing on your intended solution but
not describing the original, underlying problem, the application itself.
IOW, there may be a much more appropriate solution for us to suggest if
Hi,
Sometimes I see an OOM happening on replicas, but it does not trigger the script
oom_solr.sh, which was passed in as
-XX:OnOutOfMemoryError=/actualLocation/solr/bin/oom_solr.sh 8091.
These OOMs happened while DIH was importing data from a database. Is this a known
issue? Is there any quick fix?
Here are stack
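One thing worth checking (a sketch, using the path from the mail; the heap-dump flags are an extra diagnostic suggestion, not a fix): if the `-XX:OnOutOfMemoryError` flag and its script argument are not quoted as one unit, the shell splits them at the space and the JVM never sees the argument, so the handler appears to silently not fire.

```shell
# Quote the whole flag so "8091" stays attached to the handler script.
SOLR_OPTS="$SOLR_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/solr"
SOLR_OPTS="$SOLR_OPTS '-XX:OnOutOfMemoryError=/actualLocation/solr/bin/oom_solr.sh 8091'"
```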
Well, I guess I imagined three steps:
1) Run DIH
2) Get the tokenized representation for each document using facets or
other approaches
3) Submit a document partial-update request with additional custom
processing through a URP
Your example seems to be skipping step 2, so the URP chain does not
know
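Concretely, that dual path could look like this (a sketch; the collection name, field name, and doc id are made up). The key point is that the second call must name the documents to change:

```shell
# 1) run the import
curl 'http://localhost:8983/solr/mycollection/dataimport?command=full-import'
# 2) atomic update: the body carries the doc ids -- without them Solr
#    has nothing to apply the "set" to
curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc-1","details":{"set":"enriched value"}}]'
```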
Setting “update.chain” in the DataImportHandler handler defined in
solrconfig.xml should allow you to specify which update chain is used. Can you
confirm that works, Shawn?
This is from DataImportHandler.java:
UpdateRequestProcessorChain processorChain =
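i.e., something like this in solrconfig.xml (a sketch; the chain name and config file name are made up):

```xml
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
    <!-- binds this handler to a named updateRequestProcessorChain -->
    <str name="update.chain">myChain</str>
  </lst>
</requestHandler>
```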
On 10/22/2015 10:32 AM, Erik Hatcher wrote:
> Setting “update.chain” in the DataImportHandler handler defined in
> solrconfig.xml should allow you to specify which update chain is used. Can
> you confirm that works, Shawn?
I tried this a couple of years ago without luck. Does it work now?
We are seeing strange behavior in our SolrCloud-related code after upgrading
from Solr 4.4 to Solr 4.10 (as part of upgrading from Cloudera CDH 4.6 to
Cloudera CDH 5.4.5).
We have a Solrcloud collection with three replicas of one shard.
Our code does batch indexing, then submits a soft commit
Solr 4.6.1 cloud
Looking into the thread dump, 4-5 threads are causing the CPU to go very high
and causing issues. These are Tomcat's HTTP threads and they are locking. Can
anybody help me understand what is going on here? I see incoming
connections coming in for updates, and they are being passed on to
Hi Alexandre,
The DIH is executed correctly and the tokenized representation is obtained
correctly, but the URP chain is not executed with the call:
http://localhost:8983/solr/reed_jobs/update/details?commit=true
Isn't that the correct URL? Is there a parameter missing?
Best,
Roxana
On 22
On 10/22/2015 10:09 AM, Roxana Danger wrote:
> The DIH is executed correctly and the tokenized representation is obtained
> correctly, but the URP chain is not executed with the call:
> http://localhost:8983/solr/reed_jobs/update/details?commit=true
> Isn't it the correct URL? is there any
Yes, it arrives there...
On 22 October 2015 at 17:32, Erik Hatcher wrote:
> Setting “update.chain” in the DataImportHandler handler defined in
> solrconfig.xml should allow you to specify which update chain is used. Can
> you confirm that works, Shawn?
>
> This is from
Prior to Solr 5.2, there were several inefficiencies when distributing
updates to replicas, see:
https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/.
The symptom was that there was significantly higher CPU utilization on
the followers
compared to the leader.
The
On 10/22/2015 10:37 AM, vitaly bulgakov wrote:
> But it returns no results when the query has a term which is not in a
> document.
> Say searching for "building constructor" I get a result, but
> searching for "good building constructor" returns no results because there
> are no documents
yes, it's working now... but I cannot use the update processor chain. I
need to use the DIH first and then the URP, but I am not having luck
updating my docs with the URL:
http://localhost:8983/solr/reed_jobs/update/jtdetails?commit=true
Did you manage to use an updateProcessor chain after using