In my test case, it seems this new highlighter is not working.
When a field is set to multiValued=true, the stored text in this field cannot be
highlighted.
Am I missing something? Or is this a current limitation? I have had no luck finding
any documentation that mentions this.
Floyd
Your query cache is far too small. Most of the default caches are too small.
We run with 10K entries and get a hit rate around 0.30 across four servers.
This rate goes up with more queries, down with fewer, but try a bigger cache,
especially if you are updating the index infrequently, like once p
Hi,
You could call the optimize command directly on slaves, but specify
the target number of segments, e.g.
/solr/update?optimize=true&maxSegments=10
Not sure I recommend doing this on slaves, but you could - maybe you
have spare capacity. You may also want to consider not doing it on
all yo
Also, in your position, I would be very curious what would happen to
highlighting performance, if I just took the EdgeNGramFilter out of the
analysis chain and reindexed. That would immediately tell you that the
problem lives there (or not).
-- Bryan
> -Original Message-
> From: Bryan Loo
We have SOLR master, primarily for indexing and SOLR slave primarily for
searching. I see that the merge factor plays a key factor in Indexing as
well as searching. I would like to have a high merge factor for my master
instance and low merge factor for slave.
As of now since I just replicate the
Andy,
OK, I get what you're doing. As far as alternate paths, you could index
normally and use WildcardQuery, but that wouldn't get you the boost on
exact word matches. That makes me wonder whether there's a way to use
edismax to combine the results of a wildcard search and a non-wildcard
search a
Hi Jack,
That seems like the solution I am looking for. Thanks so much!
//Can't find this "types" for WDF anywhere.
Ming-
On Tue, Jun 18, 2013 at 4:52 PM, Jack Krupansky wrote:
> The WDF has a "types" attribute which can specify one or more character
> type mapping files. You could create a f
The WDF has a "types" attribute which can specify one or more character type
mapping files. You could create a file like:
@ => ALPHA
_ => ALPHA
For example (from the book!):
Example - Treat at-sign and underscores as text
The file +at-under-alpha.txt+ would contain:
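A field type wiring this together might look like the following sketch (the field type name and the extra filter options are assumed for illustration, not taken from the thread):

```xml
<!-- Sketch: keep "@" and "_" as letters via a WDF types mapping file.
     Assumes at-under-alpha.txt contains the two lines shown above. -->
<fieldType name="text_twitter" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            types="at-under-alpha.txt"
            generateWordParts="1"
            generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this mapping, a token like @solr_lucene should survive the WordDelimiterFilter intact instead of being split at "@" and "_".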
You can use keyword tokenizer..
Creates org.apache.lucene.analysis.core.KeywordTokenizer.
Treats the entire field as a single token, regardless of its content.
Example: "http://example.com/I-am+example?Text=-Hello" ==>
"http://example.com/I-am+example?Text=-Hello"
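A minimal field type using it might look like this sketch (the name url_exact is assumed):

```xml
<!-- Sketch: treat the entire field value as a single token. -->
<fieldType name="url_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```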
We need to index and search lots of tweets, which can look like "@solr: solr is
great" or "@solr_lucene, good combination".
And we want to search with "@solr" or "@solr_lucene". How can we preserve
"@" and "_" in the index?
If using WhitespaceTokenizer followed by WordDelimiterFilter, @solr_lucene
Hi
In continuing a previous conversation, I am attempting to not have to do
optimizes on our continuously updated index in solr3.6.1 and I came across the
mention of the reclaimDeletesWeight setting in this blog:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
We
I just finished a test with the patch, and it looks like all is working well.
On Jun 18, 2013, at 12:19 PM, Al Wold wrote:
> For the CREATE call, I'm doing it manually per the instructions here:
>
> http://wiki.apache.org/solr/SolrCloud
>
> Here's the exact URL I'm using:
>
> http://asu-solr-c
Hi,
I really need your input on my problem in constructing a view with
selective columns of table GENE and table TAXON.
I am importing data from database tables GENE and table TAXON into solr.
The two tables are
connected through 'taxon' column in table GENE and 'taxon_oid' column in
table TAXO
: Thanks Chris. That worked.. just one correction instead of *df -> qf *
if you're using multiple fields (with optional boosts) then yes, you need
qf ... but in your example you knew exactly which (one) field you wanted,
and df should work fine for that -- because qf defaults to df.
-Hoss
Thanks Chris. That worked.. just one correction instead of *df -> qf *
On Tue, Jun 18, 2013 at 2:05 PM, Chris Hostetter-3 [via Lucene] <
ml-node+s472066n4071423...@n3.nabble.com> wrote:
>
> : query something like
> :
> :
> http://localhost:8983/solr/select?q=(category:lcd+OR+category:led+OR+cat
Erick,
We at AOL mail have been using SOLR for quite a while and our system is pretty
write heavy and disk I/O is one of our bottlenecks. At present we use regular
SOLR in the lotsOfCore configuration and I am in the process of benchmarking
SOLR cloud for our use case. I don't have concrete d
bq: the replica can take over and maintain a durable
state of my index
This is not true. On an update, all the nodes in a slice
have already written the data to the tlog, not just the
leader. So if a leader goes down, the replicas have
enough local info to ensure that data is not lost. Without
tlo
Not necessarily. If the auth tokens are available on some
other system (DB, LDAP, whatever), one could get them
in the PostFilter and cache them somewhere since,
presumably, they wouldn't be changing all that often. Or
use a UserCache and get notified whenever a new searcher
was opened and regenera
For the CREATE call, I'm doing it manually per the instructions here:
http://wiki.apache.org/solr/SolrCloud
Here's the exact URL I'm using:
http://asu-solr-cloud.elasticbeanstalk.com/admin/collections?action=CREATE&name=directory&numShards=2&replicationFactor=2&maxShardsPerNode=2
I'm testing ou
Hi Andre,
Wow that is astonishing! I will definitely also try that out! Just set the
facet method on a per field basis for the less used sparse facet fields eh?
Thanks for the tip.
Thanks
Robi
-Original Message-
From: Andre Bois-Crettez [mailto:andre.b...@kelkoo.com]
Sent: Tuesday,
On 6/18/2013 11:06 AM, Learner wrote:
My issue is that I see the documents are getting added to the server even
before it reaches the queue size. Am I doing anything wrong? Or is
queuesize not implemented yet?
Also I dont see a very big performance improvements when I increase /
decrease the
Looks like the javadoc on this parameter could use a little tweaking.
From looking at the 4.3 source code (hoping I get this right :-), it appears
the ConcurrentUpdateSolrServer will begin sending documents (on a single
thread) as soon as the first document is added.
New threads (up to thread
: query something like
:
:
http://localhost:8983/solr/select?q=(category:lcd+OR+category:led+OR+category:plasma)+AND+(manufacture:sony+OR+manufacture:samsung+OR+manufacture:apple)&facet.field=category&facet.field=manufacture&fl=id&mm=2
Here's an example of something similar using the Solr 4.3 e
In reading the newer solrconfig in the example conf folder, it seems like it is
saying this setting '<mergeFactor>10</mergeFactor>' is shorthand for putting
the below, and that these both are the defaults? It says 'The default since
Solr/Lucene 3.3 is TieredMergePolicy.' So isn't this setting already in effect
for me?
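If it helps, the shorthand and the explicit form being compared could be sketched like this (the explicit values mirror what mergeFactor sets on TieredMergePolicy; treat it as a sketch, not the exact default block):

```xml
<!-- Shorthand form: -->
<mergeFactor>10</mergeFactor>

<!-- Roughly equivalent explicit form for TieredMergePolicy (sketch): -->
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>
```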
Hi Otis,
Yes the query results cache is just about worthless. I guess we have too
diverse of a set of user queries. The business unit has decided to let bots
crawl our search pages too so that doesn't help either. I turned it way down
but decided to keep it because my understanding was tha
Thanks, Roman. I'm going to do some digging...
On Mon, Jun 17, 2013 at 9:53 PM, Roman Chyla wrote:
> Hello Yanis,
>
> We are probably using something similar - eg. 'functional operators' - eg.
> edismax() to treat everything inside the bracket as an argument for
> edismax, or pos() to search f
I am using ConcurrentUpdateSolrServer to create 4 threads (threadCount=4)
with queueSize of 3.
Indexing works fine as expected.
My issue is that I see the documents are getting added to the server even
before it reaches the queue size. Am I doing anything wrong? Or is
queuesize not implem
Looks like zk does not contain the configuration called: collection1.
You can use zkCli.sh to see what's inside "configs" zk node. You can
manually push config via zkCli's upconfig (not very sure how it works).
Try adding this arg: " -Dbootstrap_conf=true" in place of
"-Dbootstrap_confdir=./solr/c
Beautiful. Thanks!
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring - http://sematext.com/spm/index.html
On Tue, Jun 18, 2013 at 12:34 PM, Mark Miller wrote:
> No, the hash ranges are split and new docs go to both new shards.
>
> - Mark
>
> On Jun 18, 201
No, the hash ranges are split and new docs go to both new shards.
- Mark
On Jun 18, 2013, at 12:25 PM, Otis Gospodnetic
wrote:
> Hi,
>
> Imagine a (common) situation where you use document routing and you
> end up with 1 large shard (e.g. 1 large user with lots of docs).
> Shard splitting wi
Hi,
Imagine a (common) situation where you use document routing and you
end up with 1 large shard (e.g. 1 large user with lots of docs).
Shard splitting will help here, because we can break up that 1 shard
into 2 smaller shards (and maybe do that "recursively" to make shards
sufficiently small).
B
Hello,
SimplyHired.com, a job search engine with the biggest job index in the
world, is looking for engineers to help us with our core search and auction
systems.
Some of the problems you will be working on are,
a) Scaling to millions of requests
b) Working with millions of jobs
c) Maximizing the
June 2013, Apache Solr™ 4.3.1 available
The Lucene PMC is pleased to announce the release of Apache Solr 4.3.1
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted searc
Hi,
I think you are after indexing tokens with begin/end markers.
e.g.
"This is a sample string" becomes:
_This$
_is$
_a$
_sample$
_string$
+ (edge) ngrams of the above tokens
Then a query for /This string/ could become:
_This$^100 _string$^100 this string
(or something along those lines)
So th
You can in fact have multiple collections in Solr and do a limited amount of
joining, and Solr has multivalued fields as well, but none of those
techniques should be used to avoid the process of flattening and
denormalizing a relational data model. It is hard work, but yes, it is
required to us
SolrJ already has access to zookeeper cluster state. Network I/O bottleneck can
be avoided by parallel requests.
You are only as slow as your slowest responding server, which could be your
single leader with the current set up.
Wouldn't this lessen the burden of the leader, as he does not have
Mark,
All I am doing are inserts, afaik search side deadlocks should not be an issue.
I am using Jmeter, standard test driver we use for most of our benchmarks and
stats collection.
My jmeter.jmx file- http://apaste.info/79IS , maybe I overlooked something
Is there a benchmark script that sol
There are two newer parameters that work better than "onlyMorePopular":
spellcheck.alternativeTermCount
- This is the # of suggestions you want for terms that exist in the index. You
can set it the same as "spellcheck.count", or less if you don't want as many
suggestions for these.
http://wiki.
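As a sketch, these parameters could be set as request-handler defaults in solrconfig.xml (the handler and component names are assumed; the values are illustrative):

```xml
<!-- Sketch: spellcheck defaults, with alternativeTermCount enabled
     so terms that exist in the index still get suggestions. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```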
debugQuery=true adds an extra block of XML to the bottom that will give
you extra info.
Alternatively, add fl=*,[explain] to your URL. That'll give you an extra
field in your output. Then, view the source to see it structured
properly.
Upayavira
On Tue, Jun 18, 2013, at 02:52 PM, Joe Zhang wrote
On 06/18/2013 09:20 AM, Alexandre Rafalovitch wrote:
On Tue, Jun 18, 2013 at 7:44 AM, Michael Sokolov
wrote:
I'm pleased to announce the first public release of Lux (version 0.9.1), an
XML search engine embedding Saxon 9 and Lucene/Solr 4.
Congratulations, this looks very interestin
Hi Jack,
Thanks, for you kind comment.
I am truly in the beginning of data modeling my schema over an existing
working DB.
I have used the school-teachers-student db as an example scenario.
(a, I have written it as a disclaimer in my first post. b. I really do not
know anyone that has 300 hobbies
I did include "debugQuery=on" in the query, but nothing extra showed up in
the response.
On Mon, Jun 17, 2013 at 10:29 PM, Gora Mohanty wrote:
> On 18 June 2013 10:49, Joe Zhang wrote:
> > I issued a simple query ("apple") to my collection and got 201 documents
> > back, all of which are score
OK, I think I see what's happening. If you do
NOT specify an instanceDir on the create
(and I'm doing this via the core admin
interface, not SolrJ) then the default is
used, but not persisted. If you _do_
specify the instance dir, it will be persisted.
I've put up another quick patch (tested
only
It sounds like you still have a lot of work to do on your data model. No
matter how you slice it, 8 billion rows/fields/whatever is still way too
much for any engine to search on a single server. If you have 8 billion of
anything, a heavily sharded SolrCloud cluster is probably warranted. Don't
On Tue, Jun 18, 2013 at 7:44 AM, Michael Sokolov
wrote:
> I'm pleased to announce the first public release of Lux (version 0.9.1), an
> XML search engine embedding Saxon 9 and Lucene/Solr 4.
Congratulations, this looks very interesting. I am guessing, this
is/will be replacing MarkLogic that Safa
@Gora: yes.
User name and pass.
On Tue, Jun 18, 2013 at 2:57 PM, Gora Mohanty wrote:
> On 18 June 2013 17:16, Erick Erickson wrote:
> > What do you mean "encrypt"? The stored value?
> > the indexed value? Over the wire?
> [...]
>
> My understanding was that he wanted to encrypt the
> username/
great tip :-)
On Tue, Jun 18, 2013 at 2:36 PM, Erick Erickson wrote:
> if the _solr_ type is "string", then you aren't getting any
> tokenization, so "my dog has fleas" is indexed as
> "my dog has fleas", a single token. To search
> for individual words you need to use, say, the
> "text_general"
Just to make sure.
In my previous question I was referring to the user/pass that queries the
db.
Now I was referring to the user/pass that i want for the solr http request.
Think of it as if my user sends a request where he filter documents created
by another user.
I want to restrict that.
I curr
OK, I put up a very preliminary patch attached to the bug
if you want to try it out that addresses the extra junk being
put in the tag. Doesn't address the instanceDir issue
since I haven't reproduced it yet.
Erick
On Tue, Jun 18, 2013 at 8:46 AM, Erick Erickson wrote:
> Whoa! What's this junk?
Hi,
The unfortunate thing about this is that you still have to *pass* that
filter from the client to the server every time you want to use that
filter. If that filter is big/long, passing that in all the time has
some price that could be eliminated by using "server-side named
filters".
Otis
--
S
Whoa! What's this junk?
qt="/admin/cores" wt="javabin" version="2
That shouldn't be being preserved, and the instancedir should be!
So I'm guessing you're using SolrJ to create the core, but I just
reproduced the problem (at least the 'wt="json" ') bit from the
browser and even from one of my int
You might consider "post filters". The idea
is to write a custom filter that gets applied
after all other filters etc. One use-case
here is exactly ACL lists, and can be quite
helpful if you're not doing *:* type queries.
Best
Erick
On Mon, Jun 17, 2013 at 5:12 PM, Otis Gospodnetic
wrote:
> Btw.
On 18 June 2013 17:16, Erick Erickson wrote:
> What do you mean "encrypt"? The stored value?
> the indexed value? Over the wire?
[...]
My understanding was that he wanted to encrypt the
username/password in the DIH configuration file.
"Mysurf Mail", could you please clarify?
Regards,
Gora
I cannot seem to get the default cloud setup to work properly.
What I did:
Downloaded the binaries, extracted.
Made the pwd example
Ran: java -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
And got the error message: Caus
What do you mean "encrypt"? The stored value?
the indexed value? Over the wire?
Here's the problem with indexing indexed terms...
you can't search it reliably. Any decent
encryption algorithm isn't going to let you, for
instance, search wildcards since the encrypted
value for "awesome" better prod
I'm pleased to announce the first public release of Lux (version 0.9.1),
an XML search engine embedding Saxon 9 and Lucene/Solr 4. Lux offers
many features found in XML databases: persistent XML storage,
index-optimized querying, an interactive query window, and some
application support feature
if the _solr_ type is "string", then you aren't getting any
tokenization, so "my dog has fleas" is indexed as
"my dog has fleas", a single token. To search
for individual words you need to use, say, the
"text_general" type, which would index
"my" "dog" "has" "fleas"
Best
Erick
On Mon, Jun 17, 201
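The contrast Erick describes could be sketched in schema.xml like this (the field names are assumed for illustration):

```xml
<!-- Sketch: "string" does no tokenization, so
     "my dog has fleas" is indexed as one token. -->
<field name="title_exact" type="string" indexed="true" stored="true"/>

<!-- Sketch: "text_general" tokenizes, indexing
     "my" "dog" "has" "fleas" as separate tokens. -->
<field name="title_text" type="text_general" indexed="true" stored="true"/>
```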
On Tue, 2013-06-18 at 12:17 +0200, Prathik Puthran wrote:
> The 2nd query returns the complete matches as well. So I will have to
> filter out the complete matches from the partial match results.
Without testing:
(Brad OR Pitt) NOT (Brad AND Pitt)
Although that does require you to parse the query
The 2nd query returns the complete matches as well. So I will have to
filter out the complete matches from the partial match results.
On Tue, Jun 18, 2013 at 3:31 PM, Upayavira wrote:
> With two queries.
>
> I'm not sure there's another way to do it. Unless you were prepared to
> get coding, an
Recently we had steadily increasing memory usage and OOM due to facets
on dynamic fields.
The default facet.method=fc needs to build a large array of maxdocs ints
for each field (a fieldCache or fieldValueCache entry), whether it is
sparsely populated or not.
Once you have reduced your number of ma
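The per-field override being suggested might look like this as request-handler defaults (the field name sparse_field_s is assumed for illustration):

```xml
<!-- Sketch: force enum faceting for one sparse field while
     other fields keep the default facet.method=fc. -->
<lst name="defaults">
  <str name="facet">true</str>
  <str name="facet.field">sparse_field_s</str>
  <str name="f.sparse_field_s.facet.method">enum</str>
</lst>
```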
With two queries.
I'm not sure there's another way to do it. Unless you were prepared to
get coding, and implement another SearchComponent, but given that you
can achieve it with two queries, that seems overkill to me.
Upayavira
On Tue, Jun 18, 2013, at 10:59 AM, Prathik Puthran wrote:
> Hi,
>
Hi,
I wanted to know if it is possible to tweak solr to return the results of
both complete and partial query matches.
For eg:
If the search query is "Brad Pitt" and if the query parser is "AND" Solr
returns all documents indexed against the term "Brad Pitt".
If the query parser is "OR" Solr retu
What version of Solr? I had something like this on 4.2.1. Upgraging to
4.3 sorted it.
Upayavira
On Tue, Jun 18, 2013, at 09:37 AM, Ophir Michaeli wrote:
> Hi,
>
> I built a 2 shards and 2 replicas system that works ok on a local
> machine, 1
> zookeeper on shard 1.
It appears ok on the Solr monitor page, cloud tab
Hello,
just for information: the Solution might look like (1st approach):
I take the source code of the BinaryResponseWriter and surround the
serialization with some tracking methods.
Then I create a custom QueryResponseWriter, which implements the binary
response writer and voilà, I get my sta
Hi,
I built a 2 shards and 2 replicas system that works ok on a local machine, 1
zookeeper on shard 1.
It appears ok on the Solr monitor page, cloud tab
(http://localhost:8983/solr/#/~cloud).
When I move to using different machines, each shard/replica on a different
machine I get a wrong cloud-
On 18 June 2013 13:10, Mysurf Mail wrote:
> Hi,
> In order to add solr to my prod environment I have to implement some
> security restrictions.
> Is there a way to add user/pass to the requests and to keep them
> *encrypted* in a file?
As mentioned earlier, no there is no built-in way of doing that
Hi,
In order to add solr to my prod environment I have to implement some
security restrictions.
Is there a way to add user/pass to the requests and to keep them
*encrypted* in a file?
thanks.