Hi all,
On our servers we have been dealing with the CLOSE_WAIT problem. The
documentation of Solr says to use a static object and use it for all
connections. We are on Solr 3.3 and this approach seems to be creating
hanging queries for us and slowing down the website.
But when we create a new
Yes, I did this, and the words with umlauts went through the stop filter.
The ones without umlauts were correctly removed.
On Thu, Nov 8, 2012 at 2:22 AM, Lance Norskog goks...@gmail.com wrote:
You can debug this with the 'Analysis' page in the Solr UI. You pick
'text_general' and then give
When I look at the text_de fieldType provided in the example schema I can
see:
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="lang/stopwords_de.txt" format="snowball"
enablePositionIncrements="true"/>
filter
Hi,
Thanks for the reply. Yes, I grouped the documents based on the
restaurant_id and got 1 result per group. Setting the group.format to
simple helped with the formatting.
Thanks,
Indika
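For anyone finding this thread later, here is a minimal sketch of what such a grouped request might look like when built from a client. The host, port and core path are assumptions and `grouped_query_url` is a made-up helper name; only `restaurant_id` and `group.format=simple` come from the message above, and `group.limit=1` is spelled out just to make the "1 result per group" explicit.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and core to your own setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def grouped_query_url(query, group_field="restaurant_id"):
    """Build a select URL that collapses results on a single field."""
    params = {
        "q": query,
        "group": "true",             # turn on result grouping (field collapsing)
        "group.field": group_field,  # one group per distinct value of this field
        "group.limit": 1,            # keep one document per group
        "group.format": "simple",    # flat document list instead of nested groups
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(grouped_query_url("pizza"))
```

The URL can then be fetched with whatever HTTP client you already use.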
On 8 November 2012 12:10, Rafał Kuć r@solr.pl wrote:
Hello!
Look at the field collapsing
Hi!
I'm trying to set up SolrCloud with replicated ZooKeeper, but have a problem.
I'm using Jetty 8 (not embedded), Zookeeper 3.3.6, SolrCloud 4.0 from
branch, Ubuntu 12.04 LTS.
My configs are:
Four Jetty instances running on ports 8080, 8081, 8082 and 8083
Jetty1.sh:
JAVA_OPTIONS=$JAVA_OPTIONS
Are you trying to do this in real time or offline? Wouldn't mining your
access logs help? It may help to have your front end application pass in
some extra parameters that are not interpreted by Solr but are there for
stamping purposes for log analysis. One example could be a user id or
user
Look at the normal ngram tokenizer. Engine with ngram size 3 would yield
eng ngi gin ine so a search for engi should match. You can play
around with the min/max values. Edge ngram is useful for prefix matching
but sounds like you want intra-word matching too? (eng should match
ResidentEngineer)
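To make the n-gram idea concrete, here is a rough sketch in plain Python. It is not the Solr/Lucene tokenizer, only an illustration of which grams a size-3 setting would produce; `char_ngrams` and `could_match` are made-up helper names, and the "all query grams must be present" rule is an assumption about how the query side is analyzed and combined.

```python
def char_ngrams(term, n=3):
    """Return every contiguous substring of length n, lowercased."""
    term = term.lower()
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def could_match(query, indexed_term, n=3):
    """True if every n-gram of the query also occurs in the indexed term."""
    query_grams = set(char_ngrams(query, n))
    if not query_grams:  # query shorter than n produces no grams at all
        return False
    return query_grams <= set(char_ngrams(indexed_term, n))


if __name__ == "__main__":
    print(char_ngrams("Engine"))                    # ['eng', 'ngi', 'gin', 'ine']
    print(could_match("engi", "Engine"))            # True
    print(could_match("eng", "ResidentEngineer"))   # True
```

Note that a query shorter than the minimum gram size yields no grams, which is one reason to experiment with the min/max values.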
On Wed, Nov 7, 2012 at 5:16 PM, Walter Underwood wun...@wunderwood.org wrote:
You are probably thinking of SweetSpotSimilarity. You might also want to look
at pivoted document normalization.
Thanks, I'll take a look at that.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Hello,
How could I re-index stored values of my Solr index if I change the
schema? Is there a gentle way to do this with stored values within Solr
itself? Normally I have to grab the stored values of a field and put them
into an update query for Solr again.
What does that do to copied fields?
On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge
daniel.brue...@googlemail.com wrote:
Hi,
I am running a SolrCloud cluster with the 4.0.0 version. I have a stopwords
file
which is in the correct encoding.
What makes you think that?
Note: Because I can read it is not the correct answer.
I trust the 'file' command output, and if I can read 'UTF-8 Unicode' there,
I believe that this is correct. I don't know if this is the 'correct answer'
for you ;)
BTW: It works locally, but not with ZK. So it's maybe more a ZK issue, which
somehow destroys my file. Will check.
On Thu, Nov 8, 2012 at
Yes, that is true. We are looking for partial word matches. It seems like
we can achieve this by using edge ngram for prefixes and adding wild card
at the end for ignoring suffix. If we set the edge ngram to 3. eng will
match ResidentEng but not ResidentEngineer. But a search for eng* will
match
It's my understanding that your strategy is correct, although I expect that
zookeeper would need to be updated somehow with the new second and third
shards, no?
On Nov 8, 2012, at 2:36 AM, SuoNayi suonayi2...@163.com wrote:
Hi all,
Because it's unable to add or remove shard after solrcloud
Thanks Prithu.
But why would I use different settings for the index and query? I would think
that if the setting is not the same for both, then search results for end users
would be confusing, no? To illustrate my point (this may be drastic) if I don't
solr.LowerCaseFilterFactory in one
I think that your approach would work, but you would want to have your master
server up when you bring the new server online. The new server will become
a follower of one of the shards, and get the shard content via replication.
Once complete, you could shut down the shard on the
Hi,
We do not want to store positions for some fields or omit term and positions
(or just tf) for other fields. Obviously we don't need/want explicit phrase
matching on the fields we want to configure without positions, but (e)dismax
doesn't let us. All text fields configured in the QF
Hi,
Can someone help me understand the meaning of <analyzer type="index"> and
<analyzer type="query"> in schema.xml, how they are used, and what I get back
when the values are not the same?
For example, given:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
Hello, my name is Antony and I'm new to Apache Nutch and Solr.
I want to crawl my website, so I downloaded Nutch to do this.
This works fine, but now I would like to integrate Nutch with Solr. I'm
running this on my Unix system.
I'm trying to follow this tutorial:
Hello again,
With the following config :
- 2 zookeeper ensemble
- 2 shards
- 2 main solr instances for the 2 shards
- I added 2 or 3 replicas for fun.
While running, if I stop one replica, I see the admin UI graph update
(replica disabled/inactive), which is normal.
But if I stopped all solr
Weird, if i return the file contents in ZK with 'get' it returns me
w??rde | would
w??rden | would
for example. So the Umlaute are not shown. Does anyone have an idea if this
is because of Zookeepers cli or of the file contents itself?
Thanks and regards.
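A small sketch of why the output might look like that, assuming the file really holds UTF-8 bytes: "ü" is two bytes in UTF-8, so a display that does not decode UTF-8 would show two placeholder characters per umlaut. `render_as_ascii` is a made-up helper, and this does not prove the stored bytes are intact; a file that was damaged on import could print the same way.

```python
def render_as_ascii(text):
    """Encode text as UTF-8, then display it the way an ASCII-only tool might."""
    raw = text.encode("utf-8")
    # Each byte outside ASCII becomes U+FFFD; swap it for '?' as many CLIs do.
    return raw.decode("ascii", errors="replace").replace("\ufffd", "?")


if __name__ == "__main__":
    word = "würde"
    print(len(word.encode("utf-8")))  # 6 bytes for 5 characters
    print(render_as_ascii(word))      # w??rde
```

Reading the raw bytes back and decoding them explicitly as UTF-8 is the more reliable check than trusting what a terminal shows.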
On Thu, Nov 8,
Hi,
Your Nutch schema likely points to the old EnglishPorterFilter that doesn't
exist anymore. You can change that occurrence to PorterStemFilterFactory; that
should fix the issue.
-Original message-
From:Antony Steiner ant.stei...@gmail.com
Sent: Thu 08-Nov-2012 14:05
To:
Is it possible to run a spellcheck on multiple fields. I am aware of using a
multivalued field for this
(http://lucene.472066.n3.nabble.com/spellcheck-on-multiple-fields-td1587327.html)
However, what I want is to return spellcheck alternatives based on the field
against which the query ran. So if
Ah, I have fixed it. It was necessary to import the files into Zookeeper
using the file.encoding system property and set it to UTF-8. Then it
worked. Hooray. :)
e.g.
java -Dfile.encoding=UTF-8 -Dbootstrap_confdir=/home/me/myconfdir
-Dcollection.configName=config1 -DzkHost=zkhost:2181
Hi,
Thank you for your suggestion. Nope, it didn't change anything. Should I
post the full stack trace?
Regards
Antony
2012/11/8 Markus Jelsma markus.jel...@openindex.io
Hi,
Your Nutch schema likely points to the old EnglishPorterFilter that
doesn't exist anymore. You can change that
Hi - it fixes it here. Please post the full stack trace.
-Original message-
From:Antony Steiner ant.stei...@gmail.com
Sent: Thu 08-Nov-2012 15:16
To: solr-user@lucene.apache.org
Subject: Re: Apache Nutch 1.5.1 + Apache Solr 4.0
Hi,
Thank you for your suggestion. Nope, it didn't
You should probably start here:
http://lucene.apache.org/solr/4_0_0/tutorial.html
For indexing and analysis, i.e. how the text is
transformed for indexing and searching (things like
stemming, lowercasing etc.) that's all configured
in schema.xml (you'll find that file in
Yep, that's the usual process for growth planning.
Best
Erick
On Wed, Nov 7, 2012 at 4:01 AM, SuoNayi suonayi2...@163.com wrote:
Hi all,
Because we cannot add or remove shard when solrcloud cluster has been
set up,
so we have to predict a precise shard size at first, says we need 3
On 8 November 2012 16:10, Andreas Niekler
aniek...@informatik.uni-leipzig.de wrote:
Hello,
how could i re-index stored values of my solr index if i change the
schema. Is there a gentle way to do this with stored values within solr
itself? Normally i have to grab the stored values of a field
Dear All,
I'm using an external program (my own client) to access my
Apache Solr database.
I would like to restrict Solr access to a specific User-Agent
(defined in my program).
I would like to know if it's possible to do that directly in the Solr config
or whether I must process that in the
Many token filters will be used 100% identically for both index and
query analysis, but WordDelimiterFilter is a rare exception. The issue is
that at index time it has the ability to generate multiple tokens at the
same position (the catenate options), any of which can be queried, but at
query
Hmmm, I tried this with a 2 shard cluster and it works just fine, using
your schema, solrconfig and query so I'm puzzled. What happens when you
look at your cluster with the admin page? When you dive into collection1,
does it show any documents?
Also, look at admin/schema-browser and look at the
It is very easy to do this on Apache, but you need to be aware that
User-Agent is extremely easy to both sniff and spoof.
Have you thought of perhaps using Client and Server Certificates to protect
the connection and embedding those certificates into clients?
Regards,
Alex.
Personal blog:
Hi,
I just saw there is a schema-solr4.xml and a schema.xml in the nutch conf
directory. But with both schemas I get the same errors when starting up
solr.
Here's the stack trace:
Nov 8, 2012 3:32:14 PM org.apache.solr.core.SolrConfig init
INFO: Loaded SolrConfig: solrconfig.xml
Nov 8, 2012
Sounds like a reasonable request for a new feature to add to Solr.
Question: Would you want the query to SKIP fields that don't have positions
enabled, or to treat a phrase as discrete terms? Or, is that another option
you might need to control for each field?
-- Jack Krupansky
Hm, I copied the schema from Nutch's trunk verbatim and only had to change the
stemmer. It seems like you have, for some reason, a float with an extra point
dangling around somewhere. Can you check?
-Original message-
From:Antony Steiner ant.stei...@gmail.com
Sent: Thu 08-Nov-2012
-Original message-
From:Jack Krupansky j...@basetechnology.com
Sent: Thu 08-Nov-2012 15:56
To: solr-user@lucene.apache.org
Subject: Re: positions and qf parameter in (e)dismax
Sounds like a reasonable request for a new feature to add to Solr.
Question: Would you want the query
Thank you for your answer. Since you mention various ways, can you also
comment on some other approaches?
Am 08.11.2012 15:37, schrieb Gora Mohanty:
On 8 November 2012 16:10, Andreas Niekler
aniek...@informatik.uni-leipzig.de wrote:
Hello,
how could i re-index stored values of my solr index if i
I forgot to mention DictionaryCompoundWordTokenFilterFactory. It does
require you to create a dictionary of terms, as opposed to using the terms
that have been encountered in the index.
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Wednesday, November 07, 2012 8:14
This would be an awesome feature to have, wouldn't it?
For now, the best you can do is to create a master dictionary that contains all
of the FirstNames and LastNames and use that as your dictionary's
spellcheck field. This is the copyField technique that you refer to in the
linked post.
On 8 November 2012 20:31, Andreas Niekler
aniek...@informatik.uni-leipzig.de wrote:
Thank you for your answer. Since you mention various ways, can you also
comment on some other approaches?
I am not that familiar with SolrJ, but I think that it should
also be possible to use it to read the stored
You can certainly save the results themselves yourself as well as the
explanations for scoring and then compare them yourself. Add
debugQuery=true to your query and there will be an explain section that
gives all the values used in computing the scores of the top documents.
-- Jack Krupansky
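A rough sketch of that workflow, in case it helps. The endpoint is hypothetical, and `debug_query_url` and `changed_explanations` are made-up helpers; the second one simply assumes you have saved each run's explanations yourself as a dict mapping document ID to explanation text.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and core to your own setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def debug_query_url(query, rows=10):
    """Build a select URL that asks Solr to include scoring explanations."""
    params = {"q": query, "rows": rows, "debugQuery": "true", "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


def changed_explanations(before, after):
    """Return the doc IDs whose saved explanation differs between two runs."""
    doc_ids = set(before) | set(after)
    return sorted(i for i in doc_ids if before.get(i) != after.get(i))


if __name__ == "__main__":
    print(debug_query_url("title:solr"))
    run1 = {"doc1": "1.0 = sum of ...", "doc2": "0.5 = sum of ..."}
    run2 = {"doc1": "1.0 = sum of ...", "doc2": "0.7 = sum of ..."}
    print(changed_explanations(run1, run2))  # ['doc2']
```

Documents that appear in only one of the two runs are reported as changed as well.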
The side attribute must be "front" or "back". Sorry, no "both", although
that sounds like a reasonable feature request.
"front" is the default side.
-- Jack Krupansky
-Original Message-
From: Sohail Aboobaker
Sent: Tuesday, November 06, 2012 7:35 AM
To: solr-user@lucene.apache.org
Subject:
Hi,
We're testing a large multi lingual index with _LANG fields for each language
and using dismax to query them all. Users provide, explicit or implicit,
language preferences that we use for either additive or multiplicative boosting
on the language of the document. However, additive boosting
Hi,
Aha, I think I understand. Yes, you could collect all doc IDs from each
query and find the differences. There is nothing in Solr that can find
those differences or that would store doc IDs of returned hits in the first
place, so you would have to implement this yourself. Sematext's Search
Hi Markus: how are the languages distributed across documents?
Imagine I have a text_en field and a text_fr field. Lets say I have
100 documents, 95 are english and only 5 are french.
So the text_en field is populated 95% of the time, and the text_fr 5%
of the time.
But the default IDF
Looks like a bug. If Solr 4.0, maybe this needs to be in JIRA along with
some sample data you indexed + your schema, so one can reproduce it.
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
On Thu, Nov 8,
Thanks Otis,
Indeed, the ZooKeeper doc
http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
also advises choosing an odd number of ZK nodes: "To create a
deployment that can tolerate the failure of F machines, you should count on
deploying 2xF+1 machines..."
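The arithmetic behind that advice, as a quick sketch (function names are made up for illustration). ZooKeeper needs a majority of the ensemble to be up, so an even-sized ensemble tolerates no more failures than the odd size just below it.

```python
def ensemble_size(tolerated_failures):
    """Machines needed to survive the given number of failures (2F+1)."""
    return 2 * tolerated_failures + 1


def failures_tolerated(ensemble):
    """Failures an ensemble survives while a strict majority stays up."""
    return (ensemble - 1) // 2


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "nodes tolerate", failures_tolerated(n), "failure(s)")
```

For example, 3 and 4 nodes both tolerate one failure, and a 2-node ensemble tolerates none.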
I've seen the same exact behavior when using analyzed key fields, switching
to string as Erick recommends should solve your problem.
Cheers,
Tim
On Thu, Nov 8, 2012 at 7:45 AM, Erick Erickson erickerick...@gmail.com wrote:
Hmmm, I tried this with a 2 shard cluster and it works just fine, using
To illustrate:
http://lucene.472066.n3.nabble.com/file/n4019103/SolrAdmin.png
Taking this example, 8983 and 8984 are the shard owners; 7501/7502 are just
replicas.
If I stop all instances and then restart 8983 or 8984 first, they won't run and
ask for the replicas to be started...
--
View this
I've been trying to boost fields using MoreLikeThis, but I haven't been able to
change the order of the products or their scores by changing the
field boosts.
I've tried using mlt.ql=field1,field2&mlt.qf=field1^2+field2^1 and several
other configurations of the URL to try to boost fields, it
Hey all,
I have a query based on a value I'm getting:-
entity name=id transformer=RegexTransformer
query=select * from table
Where all my fields I want are populated correctly, including the multivalue
one which has the format:
<field column="findID" name="findID" splitBy="," />
But
I think that Solr by itself doesn't store the queries (correct me if I'm
wrong about this), but you can accomplish what you want by processing the Solr
log (it's the only way, I think). From the Solr log you can get the queries and
then process the queries according to your needs, and change
Is there a service that I can pay to answer questions while I'm configuring and
troubleshooting a Solr deployment?
Solr is a webapp in a war, so you can deploy it in jboss as such.
Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 8, 2012 1:15 PM, Carlos Alexandro Becker caarl...@gmail.com
wrote:
How can I make SolrCloud work with JBoss? I only find examples with Jetty,
running the start.jar
On 9 November 2012 00:13, Jeff Rhines sen...@gmail.com wrote:
Is there a service that I can pay to answer questions while I'm configuring
and troubleshooting a Solr deployment?
http://wiki.apache.org/solr/Support
Regards,
Gora
Thanks for looking at it.
Id is usually going to be as follows:
some.domain.name_SOMELONGSHA1HASH:/FileName.ext/somechars/1
I indexed it so I could search for the domain name or the hash without storing
it a second time. I'll convert to a string and see if this fixes the problem.
On Nov 8,
Hm, but how do I configure ZooKeeper?
Do I have to do any custom setup?
PS: I'm using solr maven repository, because I have some custom classes..
Thanks in advance.
On Thu, Nov 8, 2012 at 4:45 PM, Otis Gospodnetic otis.gospodne...@gmail.com
wrote:
Solr is a webapp in a war, so you can deploy
Hi Markus,
No answers, but I am very interested in what you find out. We currently
index all languages in one index, which presents different IDF issues, but
are interested in exploring alternatives such as the one you describe.
Tom Burton-West
On 9 November 2012 00:17, Gora Mohanty g...@mimirtech.com wrote:
On 9 November 2012 00:13, Jeff Rhines sen...@gmail.com wrote:
Is there a service that I can pay to answer questions while I'm configuring
and troubleshooting a Solr deployment?
http://wiki.apache.org/solr/Support
This reminds
Yes. :)
Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 8, 2012 1:53 PM, Gora Mohanty g...@mimirtech.com wrote:
On 9 November 2012 00:17, Gora Mohanty g...@mimirtech.com wrote:
On 9 November 2012 00:13, Jeff Rhines sen...@gmail.com wrote:
Is there a service that I can pay to
Thanks, Jack. This filter should help for dealing with user input without
clear lexical boundaries. I.e. breaking compound-to-be-words into sub-words
on the query side. It does require still mining the dictionary, but is
doable by some simple camel case term frequency analysis.
But would it help
That did it, good sirs. Additionally, debugQuery=true no longer gives me an NPE.
Best Regards,
Jeff
On Nov 8, 2012, at 11:17 AM, Timothy Potter wrote:
I've seen the same exact behavior when using analyzed key fields, switching
to string as Erick recommends should solve your problem.
Greetings,
I have several custom QueryComponents that have high one-time startup costs
(hashing things in the index, caching things from a RDBMS, etc...)
Is there a way to prevent solr from accepting connections before all
QueryComponents are ready?
Especially, since many of our instances are
Hi - I think you're seeing:
https://issues.apache.org/jira/browse/SOLR-3993
-Original message-
From:Bill Au bill.w...@gmail.com
Sent: Thu 08-Nov-2012 21:16
To: solr-user@lucene.apache.org
Subject: best practice for restarting the entire SolrCloud cluster
I have a simple
My replicas are actually on different machines so they do come up. The
problem I found is that since they can't get the leader, they just come up
but are not part of the cluster. I can still do a local search with
distrib=false. They do not retry to get the leader, so I have to restart
them after
I think Solr does this by default. Are you executing warming queries in
the firstSearcher so that these actions are done before Solr is ready to
accept real queries?
On Thu, Nov 8, 2012 at 11:54 AM, Aaron Daubman daub...@gmail.com wrote:
Greetings,
I have several custom QueryComponents
Amit,
I am using warming /firstSearcher queries to ensure this happens before any
external queries are received, however, unless I am misinterpreting the
logs, solr starts responding to admin/ping requests before firstSearcher
completes, and, the LB then puts the solr instance back in the pool,
Sorry, I misunderstood. I am having difficulty finding this, but the exact
load order is never clear. It seems odd that you'd be getting requests
when the filter (DispatchFilter) hasn't 100% loaded yet.
I didn't think that the admin handler would allow requests while the
dispatch filter is still
(plus when I deploy, my deploy script
runs some actual simple test queries to ensure they return before enabling
the ping handler to return 200s) to avoid this problem.
What are you doing to programmatically disable/enable the ping handler?
This sounds like exactly what I should be doing as
Hi,
You should just set up ZK independently of JBoss/Solr and then point Solr
to it.
Check this: http://search-lucene.com/?q=solr+jboss
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
On Thu, Nov 8, 2012 at
On 11/6/2012 12:25 AM, Shawn Heisey wrote:
If I use this exact same code to talk to a Solr 3.5.0 server (older
version of the SOLR-1972 patch applied) with the ping handler in the
enabled state, I get the following exception. The /admin/ping
handler works in a browser on both Solr versions:
Shawn,
Could this be a side-effect from SOLR-4019, in branch_4.0 this was commit
r1405894 ? Prior to this commit, PingRequestHandler would throw a
SolrException for 503/Bad Request. The change is that the exception isn't
actually thrown but rather sent in place of the response. This
On 11/8/2012 3:25 PM, Dyer, James wrote:
Shawn,
Could this be a side-effect from SOLR-4019, in branch_4.0 this was commit r1405894 ?
Prior to this commit, PingRequestHandler would throw a SolrException for 503/Bad Request.
The change is that the exception isn't actually thrown but rather
Hi Team,
Just trying to find out how to configure AnalyzingQueryParser in Solr 4.0.
Please let me know.
Thanks,
Balaji
--
View this message in context:
http://lucene.472066.n3.nabble.com/Using-AnalyzingQueryParser-Solr-4-0-tp4019193.html
Sent from the Solr - User mailing list archive at
There isn't a QParserPlugIn for that query parser for Solr. You would have
to develop one yourself.
But, why do you think you need that query parser? I mean, the standard query
parsers/analyzers for Solr are now multi-term aware to permit some
combinations of case filtering and wildcards, for
Thank you everyone for your explanation. So for WordDelimiterFilter, let me
see if I got it right.
Given that out-of-the box setting for catenateWords is 0 for query but is 1
for index, then I don't see how this will give me any hits. That is, if my
document has wi-fi, at index time it
The default setting should index BOTH wi fi and wifi. Query for wi-fi,
either with or without quotes will query for wi fi. Incidentally, that is
known as autoGeneratePhraseQueries.
-- Jack Krupansky
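To see why the differing index/query settings still produce hits, here is a deliberately simplified sketch. It is not Lucene's WordDelimiterFilter (it ignores token positions, case changes, numbers and so on); `word_delimiter` is a made-up function whose two keyword arguments only mirror the generateWordParts and catenateWords options discussed in this thread.

```python
import re


def word_delimiter(token, generate_word_parts=True, catenate_words=False):
    """Split a token on non-alphanumerics; optionally add the joined form."""
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    terms = []
    if generate_word_parts:
        terms.extend(parts)
    if catenate_words and len(parts) > 1:
        terms.append("".join(parts))
    return terms


if __name__ == "__main__":
    index_terms = word_delimiter("wi-fi", catenate_words=True)   # index side
    query_terms = word_delimiter("wi-fi", catenate_words=False)  # query side
    print(index_terms)                           # ['wi', 'fi', 'wifi']
    print(query_terms)                           # ['wi', 'fi']
    print(set(query_terms) <= set(index_terms))  # True
```

Under these assumptions a query for "wifi" would also find the document, because the catenated form was indexed alongside the parts.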
-Original Message-
From: johnmu...@aol.com
Sent: Thursday, November 08, 2012 6:20 PM
Hi Alex, I'd like to know how to use client and server certificates to
protect
the connection and how to embed those certificates into clients.
Please kindly share your experience.
Floyd
2012/11/8 Alexandre Rafalovitch arafa...@gmail.com
It is very easy to do this on Apache, but you need to
I haven't _done_ this myself, but I believe it is a well supported
scenario. See, for example,
http://httpd.apache.org/docs/2.4/ssl/ssl_howto.html#accesscontrol
and
http://stackoverflow.com/questions/1666052/java-https-client-certificate-authentication
Basically, you create a set of self-signed
Hi Aaron,
Check out
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/handler/PingRequestHandler.html
You'll see the ?action=enable/disable. I have our load balancers remove the
server out of rotation when the response code != 200 for some number of
times in a row which I suspect you
Hi James,
What i did:
* built a jar from the patch
* downloaded the BDB library
* added them to my classpath
* downloaded a nightly 4.1 Solr build
* created a db config according to:
By specifying the tokenizer in question as a filter in schema.xml for your
text field type. In case it is your custom tokenizer, it must adhere to the
Lucene / Solr API to submit tokens properly down the processing stream,
like these: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.
Additional information: I just finished a test with 10,000 records (the DB
contains 600K products); it took 25 minutes and all the parent records had
the same 'feature'.
--
View this message in context:
http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tp4015514p4019227.html