PostingsSolrHighlighter not working on Multivalue field

2013-06-18 Thread Floyd Wu
In my test case, it seems this new highlighter is not working. When a field is set to multiValued=true, the stored text in this field cannot be highlighted. Am I missing something? Or is this a current limitation? I have had no luck finding any documentation that mentions this. Floyd

Re: yet another optimize question

2013-06-18 Thread Walter Underwood
Your query cache is far too small. Most of the default caches are too small. We run with 10K entries and get a hit rate around 0.30 across four servers. This rate goes up with more queries, down with less, but try a bigger cache, especially if you are updating the index infrequently, like once p

Re: Merge tool based on mergefactor

2013-06-18 Thread Otis Gospodnetic
Hi, You could call the optimize command directly on slaves, but specify the target number of segments, e.g. /solr/update?optimize=true&maxSegments=10 Not sure I recommend doing this on slaves, but you could - maybe you have spare capacity. You may also want to consider not doing it on all yo
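
For illustration, a minimal Python sketch of building the optimize URL mentioned above. The host and port are assumptions (not from the thread); only the /update path and the optimize and maxSegments parameters come from the message.

```python
from urllib.parse import urlencode


def optimize_url(base_url, max_segments):
    """Build an update URL asking Solr to optimize down to max_segments segments."""
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url}/update?{params}"


# Hypothetical slave address; replace with a real host before use.
print(optimize_url("http://localhost:8983/solr", 10))
```

The function only builds the string; actually sending the request (and deciding whether a slave has the spare capacity for it) is left to the caller.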

RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-06-18 Thread Bryan Loofbourrow
Also, in your position, I would be very curious what would happen to highlighting performance, if I just took the EdgeNGramFilter out of the analysis chain and reindexed. That would immediately tell you that the problem lives there (or not). -- Bryan > -Original Message- > From: Bryan Loo

Merge tool based on mergefactor

2013-06-18 Thread Learner
We have a SOLR master, primarily for indexing, and a SOLR slave, primarily for searching. I see that the merge factor plays a key role in indexing as well as searching. I would like to have a high merge factor for my master instance and low merge factor for slave. As of now since I just replicate the

RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-06-18 Thread Bryan Loofbourrow
Andy, OK, I get what you're doing. As far as alternate paths, you could index normally and use WildcardQuery, but that wouldn't get you the boost on exact word matches. That makes me wonder whether there's a way to use edismax to combine the results of a wildcard search and a non-wildcard search a

Re: preserve special characters

2013-06-18 Thread Mingfeng Yang
Hi Jack, That seems like the solution I am looking for. Thanks so much! //Can't find this "types" for WDF anywhere. Ming- On Tue, Jun 18, 2013 at 4:52 PM, Jack Krupansky wrote: > The WDF has a "types" attribute which can specify one or more character > type mapping files. You could create a f

Re: preserve special characters

2013-06-18 Thread Jack Krupansky
The WDF has a "types" attribute which can specify one or more character type mapping files. You could create a file like: @ => ALPHA _ => ALPHA For example (from the book!): Example - Treat at-sign and underscores as text The file +at-under-alpha.txt+ would contain:
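
To picture what such a mapping changes, here is a toy Python sketch. It is not Solr's WordDelimiterFilter algorithm; it only assumes that characters mapped to ALPHA stop acting as split points, and the function name is made up for this example.

```python
import re


def toy_word_delimiter(token, extra_alpha=""):
    """Split token on characters that are neither alphanumeric nor listed in extra_alpha.

    Toy stand-in for a word-delimiter step; extra_alpha plays the role of
    the characters mapped to ALPHA in the types file.
    """
    keep = "A-Za-z0-9" + re.escape(extra_alpha)
    return re.findall(f"[{keep}]+", token)


print(toy_word_delimiter("@solr_lucene"))                    # default behaviour
print(toy_word_delimiter("@solr_lucene", extra_alpha="@_"))  # "@" and "_" treated as text
```

With no extra characters the token breaks into "solr" and "lucene"; with "@" and "_" treated as letters it survives as a single token.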

Re: preserve special characters

2013-06-18 Thread Learner
You can use the keyword tokenizer. Creates org.apache.lucene.analysis.core.KeywordTokenizer. Treats the entire field as a single token, regardless of its content. Example: "http://example.com/I-am+example?Text=-Hello" ==> "http://example.com/I-am+example?Text=-Hello"

preserve special characters

2013-06-18 Thread Mingfeng Yang
We need to index and search lots of tweets which can look like "@solr: solr is great" or "@solr_lucene, good combination". And we want to search with "@solr" or "@solr_lucene". How can we preserve "@" and "_" in the index? If using WhitespaceTokenizer followed by WordDelimiterFilter, @solr_lucene

TieredMergePolicy reclaimDeletesWeight

2013-06-18 Thread Petersen, Robert
Hi In continuing a previous conversation, I am attempting to not have to do optimizes on our continuously updated index in solr3.6.1 and I came across the mention of the reclaimDeletesWeight setting in this blog: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html We

Re: Solr cloud: zkHost in solr.xml gets wiped out

2013-06-18 Thread Al Wold
I just finished a test with the patch, and it looks like all is working well. On Jun 18, 2013, at 12:19 PM, Al Wold wrote: > For the CREATE call, I'm doing it manually per the instructions here: > > http://wiki.apache.org/solr/SolrCloud > > Here's the exact URL I'm using: > > http://asu-solr-c

Need help in constructing a view with selective columns of two table

2013-06-18 Thread Jenny Huang
Hi, I really need your input on my problem in constructing a view with selective columns of table GENE and table TAXON. I am importing data from database tables GENE and table TAXON into solr. The two tables are connected through 'taxon' column in table GENE and 'taxon_oid' column in table TAXO

Re: mm (Minimum 'Should' Match)

2013-06-18 Thread Chris Hostetter
: Thanks Chris. That worked.. just one correction instead of *df -> qf * if you're using multiple fields (with optional boosts) then yes, you need qf ... but in your example you knew exactly which (one) field you wanted, and df should work fine for that -- because qf defaults to df. -Hoss

Re: mm (Minimum 'Should' Match)

2013-06-18 Thread anand_solr
Thanks Chris. That worked.. just one correction instead of *df -> qf * On Tue, Jun 18, 2013 at 2:05 PM, Chris Hostetter-3 [via Lucene] < ml-node+s472066n4071423...@n3.nabble.com> wrote: > > : query something like > : > : > http://localhost:8983/solr/select?q=(category:lcd+OR+category:led+OR+cat

Re: SOLR Cloud - Disable Transaction Logs

2013-06-18 Thread Rishi Easwaran
Erick, We at AOL mail have been using SOLR for quite a while and our system is pretty write heavy and disk I/O is one of our bottlenecks. At present we use regular SOLR in the lotsOfCore configuration and I am in the process of benchmarking SOLR cloud for our use case. I don't have concrete d

Re: SOLR Cloud - Disable Transaction Logs

2013-06-18 Thread Erick Erickson
bq: the replica can take over and maintain a durable state of my index This is not true. On an update, all the nodes in a slice have already written the data to the tlog, not just the leader. So if a leader goes down, the replicas have enough local info to ensure that data is not lost. Without tlo

Re: Solr large boolean filter

2013-06-18 Thread Erick Erickson
Not necessarily. If the auth tokens are available on some other system (DB, LDAP, whatever), one could get them in the PostFilter and cache them somewhere since, presumably, they wouldn't be changing all that often. Or use a UserCache and get notified whenever a new searcher was opened and regenera

Re: Solr cloud: zkHost in solr.xml gets wiped out

2013-06-18 Thread Al Wold
For the CREATE call, I'm doing it manually per the instructions here: http://wiki.apache.org/solr/SolrCloud Here's the exact URL I'm using: http://asu-solr-cloud.elasticbeanstalk.com/admin/collections?action=CREATE&name=directory&numShards=2&replicationFactor=2&maxShardsPerNode=2 I'm testing ou

RE: yet another optimize question

2013-06-18 Thread Petersen, Robert
Hi Andre, Wow that is astonishing! I will definitely also try that out! Just set the facet method on a per field basis for the less used sparse facet fields eh? Thanks for the tip. Thanks Robi -Original Message- From: Andre Bois-Crettez [mailto:andre.b...@kelkoo.com] Sent: Tuesday,

Re: ConcurrentUpdateSolrserver - Queue size not working

2013-06-18 Thread Shawn Heisey
On 6/18/2013 11:06 AM, Learner wrote: My issue is that, I see that the documents are getting added to the server even before it reaches the queue size. Am I doing anything wrong? Or is queuesize not implemented yet? Also I don't see very big performance improvements when I increase / decrease the

RE: ConcurrentUpdateSolrserver - Queue size not working

2013-06-18 Thread James Thomas
Looks like the javadoc on this parameter could use a little tweaking. From looking at the 4.3 source code (hoping I get this right :-), it appears the ConcurrentUpdateSolrServer will begin sending documents (on a single thread) as soon as the first document is added. New threads (up to thread

Re: mm (Minimum 'Should' Match)

2013-06-18 Thread Chris Hostetter
: query something like : : http://localhost:8983/solr/select?q=(category:lcd+OR+category:led+OR+category:plasma)+AND+(manufacture:sony+OR+manufacture:samsung+OR+manufacture:apple)&facet.field=category&facet.field=manufacture&fl=id&mm=2 Here's an example of something similar using the Solr 4.3 e

RE: yet another optimize question

2013-06-18 Thread Petersen, Robert
In reading the newer solrconfig in the example conf folder it seems like it is saying this setting ' 10' is shorthand to putting the below and that these both are the defaults? It says 'The default since Solr/Lucene 3.3 is TieredMergePolicy.' So isn't this setting already in effect for me?

RE: yet another optimize question

2013-06-18 Thread Petersen, Robert
Hi Otis, Yes the query results cache is just about worthless. I guess we have too diverse of a set of user queries. The business unit has decided to let bots crawl our search pages too so that doesn't help either. I turned it way down but decided to keep it because my understanding was tha

Re: New operator.

2013-06-18 Thread Yanis Kakamaikis
Thanks, Roman. I'm going to do some digging... On Mon, Jun 17, 2013 at 9:53 PM, Roman Chyla wrote: > Hello Yanis, > > We are probably using something similar - eg. 'functional operators' - eg. > edismax() to treat everything inside the bracket as an argument for > edismax, or pos() to search f

ConcurrentUpdateSolrserver - Queue size not working

2013-06-18 Thread Learner
I am using ConcurrentUpdateSolrServer to create 4 threads (threadCount=4) with a queueSize of 3. Indexing works fine as expected. My issue is that I see that the documents are getting added to the server even before it reaches the queue size. Am I doing anything wrong? Or is queuesize not implem

Re: Running solr cloud

2013-06-18 Thread Utkarsh Sengar
Looks like zk does not contain the configuration called: collection1. You can use zkCli.sh to see what's inside "configs" zk node. You can manually push config via zkCli's upconfig (not very sure how it works). Try adding this arg: " -Dbootstrap_conf=true" in place of "-Dbootstrap_confdir=./solr/c

Re: Shard splitting and document routing

2013-06-18 Thread Otis Gospodnetic
Beautiful. Thanks! Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Performance Monitoring - http://sematext.com/spm/index.html On Tue, Jun 18, 2013 at 12:34 PM, Mark Miller wrote: > No, the hash ranges are split and new docs go to both new shards. > > - Mark > > On Jun 18, 201

Re: Shard splitting and document routing

2013-06-18 Thread Mark Miller
No, the hash ranges are split and new docs go to both new shards. - Mark On Jun 18, 2013, at 12:25 PM, Otis Gospodnetic wrote: > Hi, > > Imagine a (common) situation where you use document routing and you > end up with 1 large shards (e.g. 1 large user with lots of docs). > Shard splitting wi
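
A toy Python sketch of the idea that a split divides the parent's hash range and that new documents land in whichever half their hash falls into. The 32-bit range and the use of zlib.crc32 are assumptions made for this example only; Solr's real hashing and routing are not reproduced here.

```python
import zlib


def split_range(lo, hi):
    """Split the inclusive hash range [lo, hi] into two contiguous halves."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def route(doc_id, ranges):
    """Return the index of the sub-range that the document's hash falls into."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in hash, value in 0..2**32-1
    for index, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return index
    raise ValueError("hash outside all ranges")


halves = split_range(0, 2**32 - 1)
for doc_id in ("user1!doc1", "user1!doc2", "user1!doc3"):  # hypothetical ids
    print(doc_id, "-> sub-shard", route(doc_id, halves))
```

Which half a given id ends up in depends entirely on the hash function, so the output of this toy says nothing about where a real Solr cluster would place the same ids.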

Shard splitting and document routing

2013-06-18 Thread Otis Gospodnetic
Hi, Imagine a (common) situation where you use document routing and you end up with 1 large shard (e.g. 1 large user with lots of docs). Shard splitting will help here, because we can break up that 1 shard into 2 smaller shards (and maybe do that "recursively" to make shards sufficiently small). B

Looking for Search Engineers

2013-06-18 Thread Jagdish Nomula
Hello, SimplyHired.com, a job search engine with the biggest job index in the world is looking for engineers to help us with our core search and auction systems. Some of the problems you will be working on are, a) Scaling to millions of requests b) Working with millions of jobs c) Maximizing the

[ANNOUNCE] Apache Solr 4.3.1 released

2013-06-18 Thread Shalin Shekhar Mangar
June 2013, Apache Solr™ 4.3.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.3.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted searc

Re: Different scores for exact and non-exact matching

2013-06-18 Thread Otis Gospodnetic
Hi, I think you are after indexing tokens with begin/end markers. e.g. "This is a sample string" becomes: _This$ _is$ _a$ _sample$ _string$ + (edge) ngrams of the above tokens Then a query for /This string/ could become: _This$^100 _string$^100 this string (or something along those lines) So th
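
A rough Python sketch of the marker idea, assuming whitespace tokenization. The function names and the minimum n-gram length are invented for this example; a real setup would do this in the analysis chain rather than in client code.

```python
def mark(token):
    """Wrap a token in the begin/end markers used in the message above."""
    return f"_{token}$"


def edge_ngrams(token, min_len=2):
    """Return the leading prefixes of token, from min_len characters up to the full token."""
    return [token[:i] for i in range(min_len, len(token) + 1)]


def boosted_query(text, boost=100):
    """Combine boosted exact (marked) terms with the plain lowercased terms."""
    words = text.split()
    exact = [f"{mark(w)}^{boost}" for w in words]
    loose = [w.lower() for w in words]
    return " ".join(exact + loose)


print([mark(w) for w in "This is a sample string".split()])
print(edge_ngrams(mark("is")))
print(boosted_query("This string"))
```

Only the full marked token ends with "$", so an exact word match can be boosted while the shorter prefixes still match partial input.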

Re: How to define my data in schema.xml

2013-06-18 Thread Jack Krupansky
You can in fact have multiple collections in Solr and do a limited amount of joining, and Solr has multivalued fields as well, but none of those techniques should be used to avoid the process of flattening and denormalizing a relational data model. It is hard work, but yes, it is required to us

Re: SOLR Cloud - Disable Transaction Logs

2013-06-18 Thread Rishi Easwaran
SolrJ already has access to zookeeper cluster state. Network I/O bottleneck can be avoided by parallel requests. You are only as slow as your slowest responding server, which could be your single leader with the current set up. Wouldn't this lessen the burden of the leader, as he does not have

Re: Solr Cloud Hangs consistently .

2013-06-18 Thread Rishi Easwaran
Mark, All I am doing are inserts, afaik search side deadlocks should not be an issue. I am using Jmeter, standard test driver we use for most of our benchmarks and stats collection. My jmeter.jmx file- http://apaste.info/79IS , maybe i overlooked something Is there a benchmark script that sol

RE: How spell checker used if indexed document is containing misspelled words

2013-06-18 Thread Dyer, James
There are two newer parameters that work better than "onlyMorePopular": spellcheck.alternativeTermCount - This is the # of suggestions you want for terms that exist in the index. You can set it the same as "spellcheck.count", or less if you don't want as many suggestions for these. http://wiki.

Re: what does a zero score mean?

2013-06-18 Thread Upayavira
debugQuery=true adds an extra block of XML to the bottom that will give you extra info. Alternatively, add fl=*,[explain] to your URL. That'll give you an extra field in your output. Then, view the source to see it structured properly. Upayavira On Tue, Jun 18, 2013, at 02:52 PM, Joe Zhang wrote
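
A small Python sketch of building the two parameter variants described above. Only debugQuery=true and fl=*,[explain] come from the message; the function name and the sample query are made up for the example.

```python
from urllib.parse import urlencode


def debug_params(query, use_explain_field=False):
    """Return an encoded query string that asks Solr for score explanations."""
    params = {"q": query}
    if use_explain_field:
        # Ask for all fields plus the per-document explain pseudo-field.
        params["fl"] = "*,[explain]"
    else:
        params["debugQuery"] = "true"
    return urlencode(params)


print(debug_params("apple"))
print(debug_params("apple", use_explain_field=True))
```

urlencode percent-encodes the asterisk, comma and brackets, which is harmless; the server decodes them back to *,[explain].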

Re: [ANN] Lux XML search engine

2013-06-18 Thread Michael Sokolov
On 06/18/2013 09:20 AM, Alexandre Rafalovitch wrote: On Tue, Jun 18, 2013 at 7:44 AM, Michael Sokolov wrote: I'm pleased to announce the first public release of Lux (version 0.9.1), an XML search engine embedding Saxon 9 and Lucene/Solr 4. Congratulations, this looks very interestin

Re: How to define my data in schema.xml

2013-06-18 Thread Mysurf Mail
Hi Jack, Thanks for your kind comment. I am truly at the beginning of data modeling my schema over an existing working DB. I have used the school-teachers-student db as an example scenario. (a, I have written it as a disclaimer in my first post. b. I really do not know anyone that has 300 hobbies

Re: what does a zero score mean?

2013-06-18 Thread Joe Zhang
I did include "debugQuery=on" in the query, but nothing extra showed up in the response. On Mon, Jun 17, 2013 at 10:29 PM, Gora Mohanty wrote: > On 18 June 2013 10:49, Joe Zhang wrote: > > I issued a simple query ("apple") to my collection and got 201 documents > > back, all of which are score

Re: Solr cloud: zkHost in solr.xml gets wiped out

2013-06-18 Thread Erick Erickson
OK, I think I see what's happening. If you do NOT specify an instanceDir on the create (and I'm doing this via the core admin interface, not SolrJ) then the default is used, but not persisted. If you _do_ specify the instance dir, it will be persisted. I've put up another quick patch (tested only

Re: How to define my data in schema.xml

2013-06-18 Thread Jack Krupansky
It sounds like you still have a lot of work to do on your data model. No matter how you slice it, 8 billion rows/fields/whatever is still way too much for any engine to search on a single server. If you have 8 billion of anything, a heavily sharded SolrCloud cluster is probably warranted. Don't

Re: [ANN] Lux XML search engine

2013-06-18 Thread Alexandre Rafalovitch
On Tue, Jun 18, 2013 at 7:44 AM, Michael Sokolov wrote: > I'm pleased to announce the first public release of Lux (version 0.9.1), an > XML search engine embedding Saxon 9 and Lucene/Solr 4. Congratulations, this looks very interesting. I am guessing, this is/will be replacing MarkLogic that Safa

Re: Is there a way to encrypt username and pass in the solr config file

2013-06-18 Thread Mysurf Mail
@Gora: yes. User name and pass. On Tue, Jun 18, 2013 at 2:57 PM, Gora Mohanty wrote: > On 18 June 2013 17:16, Erick Erickson wrote: > > What do you mean "encrypt"? The stored value? > > the indexed value? Over the wire? > [...] > > My understanding was that he wanted to encrypt the > username/

Re: Need assistance in defining solr to process user generated query text

2013-06-18 Thread Mysurf Mail
great tip :-) On Tue, Jun 18, 2013 at 2:36 PM, Erick Erickson wrote: > if the _solr_ type is "string", then you aren't getting any > tokenization, so "my dog has fleas" is indexed as > "my dog has fleas", a single token. To search > for individual words you need to use, say, the > "text_general"

Re: implementing identity authentication in SOLR

2013-06-18 Thread Mysurf Mail
Just to make sure. In my previous question I was referring to the user/pass that queries the db. Now I was referring to the user/pass that I want for the solr http request. Think of it as if my user sends a request where he filters documents created by another user. I want to restrict that. I curr

Re: Solr cloud: zkHost in solr.xml gets wiped out

2013-06-18 Thread Erick Erickson
OK, I put up a very preliminary patch attached to the bug if you want to try it out that addresses the extra junk being put in the tag. Doesn't address the instanceDir issue since I haven't reproduced it yet. Erick On Tue, Jun 18, 2013 at 8:46 AM, Erick Erickson wrote: > Whoa! What's this junk?

Re: Solr large boolean filter

2013-06-18 Thread Otis Gospodnetic
Hi, The unfortunate thing about this is that you still have to *pass* that filter from the client to the server every time you want to use that filter. If that filter is big/long, passing that in all the time has some price that could be eliminated by using "server-side named filters". Otis -- S

Re: Solr cloud: zkHost in solr.xml gets wiped out

2013-06-18 Thread Erick Erickson
Whoa! What's this junk? qt="/admin/cores" wt="javabin" version="2 That shouldn't be being preserved, and the instancedir should be! So I'm guessing you're using SolrJ to create the core, but I just reproduced the problem (at least the 'wt="json" ') bit from the browser and even from one of my int

Re: Solr large boolean filter

2013-06-18 Thread Erick Erickson
You might consider "post filters". The idea is to write a custom filter that gets applied after all other filters etc. One use-case here is exactly ACL lists, and can be quite helpful if you're not doing *:* type queries. Best Erick On Mon, Jun 17, 2013 at 5:12 PM, Otis Gospodnetic wrote: > Btw.

Re: Is there a way to encrypt username and pass in the solr config file

2013-06-18 Thread Gora Mohanty
On 18 June 2013 17:16, Erick Erickson wrote: > What do you mean "encrypt"? The stored value? > the indexed value? Over the wire? [...] My understanding was that he wanted to encrypt the username/password in the DIH configuration file. "Mysurf Mail", could you please clarify? Regards, Gora

Running solr cloud

2013-06-18 Thread Daniel Mosesson
I cannot seem to be able to get the default cloud setup to work properly. What I did: Downloaded the binaries, extracted. Made the pwd example Ran: java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar And got the error message: Caus

Re: Is there a way to encrypt username and pass in the solr config file

2013-06-18 Thread Erick Erickson
What do you mean "encrypt"? The stored value? the indexed value? Over the wire? Here's the problem with encrypting indexed terms... you can't search it reliably. Any decent encryption algorithm isn't going to let you, for instance, search wildcards since the encrypted value for "awesome" better prod

[ANN] Lux XML search engine

2013-06-18 Thread Michael Sokolov
I'm pleased to announce the first public release of Lux (version 0.9.1), an XML search engine embedding Saxon 9 and Lucene/Solr 4. Lux offers many features found in XML databases: persistent XML storage, index-optimized querying, an interactive query window, and some application support feature

Re: Need assistance in defining solr to process user generated query text

2013-06-18 Thread Erick Erickson
if the _solr_ type is "string", then you aren't getting any tokenization, so "my dog has fleas" is indexed as "my dog has fleas", a single token. To search for individual words you need to use, say, the "text_general" type, which would index "my" "dog" "has" "fleas" Best Erick On Mon, Jun 17, 201
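
A toy Python illustration of the difference. The "text" version below is just a lowercased whitespace split, which is an assumption standing in for the real analysis chain of "text_general"; the function names are made up.

```python
def tokens_for_string_type(value):
    """A "string" field is not tokenized: the whole value is one token."""
    return [value]


def tokens_for_text_type(value):
    """Rough stand-in for a tokenized text field: lowercase, split on whitespace."""
    return value.lower().split()


print(tokens_for_string_type("my dog has fleas"))
print(tokens_for_text_type("my dog has fleas"))
```

A search for "dog" can only match the second form, because in the first the only indexed token is the entire phrase.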

Re: Returning both partial and complete match results in solr

2013-06-18 Thread Toke Eskildsen
On Tue, 2013-06-18 at 12:17 +0200, Prathik Puthran wrote: > The 2nd query returns the complete matches as well. So I will have to > filter out the complete matches from the partial match results. Without testing: (Brad OR Pitt) NOT (Brad AND Pitt) Although that does require you to parse the query
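
Equally untested against a real index: a short Python sketch that builds this kind of "partial matches only" query from a list of terms. The function name is invented, and the terms are assumed to be plain words that need no escaping.

```python
def partial_only_query(terms):
    """Build a query matching documents with at least one term, but not all of them."""
    any_match = " OR ".join(terms)
    all_match = " AND ".join(terms)
    return f"({any_match}) NOT ({all_match})"


print(partial_only_query(["Brad", "Pitt"]))
```

The complete matches would still come from a separate query that ANDs all terms together.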

Re: Returning both partial and complete match results in solr

2013-06-18 Thread Prathik Puthran
The 2nd query returns the complete matches as well. So I will have to filter out the complete matches from the partial match results. On Tue, Jun 18, 2013 at 3:31 PM, Upayavira wrote: > With two queries. > > I'm not sure there's another way to do it. Unless you were prepared to > get coding, an

Re: yet another optimize question

2013-06-18 Thread Andre Bois-Crettez
Recently we had steadily increasing memory usage and OOM due to facets on dynamic fields. The default facet.method=fc needs to build a large array of maxdocs ints for each field (a fieldCache or fieldValueCache entry), whether it is sparsely populated or not. Once you have reduced your number of ma
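
A back-of-the-envelope Python sketch of why this adds up. It assumes 4 bytes per int and counts only the per-field arrays, ignoring any other cache overhead; the document and field counts are invented for the example.

```python
BYTES_PER_INT = 4  # assumption: one 32-bit int per document per faceted field


def fc_array_bytes(max_doc, num_facet_fields):
    """Estimate the memory held by one maxDoc-sized int array per faceted field."""
    return max_doc * BYTES_PER_INT * num_facet_fields


# Hypothetical index: 10 million documents, 50 dynamic fields used for faceting.
estimate = fc_array_bytes(max_doc=10_000_000, num_facet_fields=50)
print(f"{estimate} bytes, about {estimate / 2**30:.2f} GiB")
```

The estimate grows linearly with the number of faceted fields, regardless of how few documents actually carry a value in each one.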

Re: Returning both partial and complete match results in solr

2013-06-18 Thread Upayavira
With two queries. I'm not sure there's another way to do it. Unless you were prepared to get coding, and implement another SearchComponent, but given that you can achieve it with two queries, that seems overkill to me. Upayavira On Tue, Jun 18, 2013, at 10:59 AM, Prathik Puthran wrote: > Hi, >

Returning both partial and complete match results in solr

2013-06-18 Thread Prathik Puthran
Hi, I wanted to know if it is possible to tweak solr to return the results of both complete and partial query matches. For eg: If the search query is "Brad Pitt" and if the query parser is "AND" Solr returns all documents indexed against the term "Brad Pitt". If the query parser is "OR" Solr retu

Re: Shard identification

2013-06-18 Thread Upayavira
What version of Solr? I had something like this on 4.2.1. Upgrading to 4.3 sorted it. Upayavira On Tue, Jun 18, 2013, at 09:37 AM, Ophir Michaeli wrote: > Hi, > > I built a 2 shards and 2 replicas system that works ok on a local > machine, 1 > zookeeper on shard 1. > It appears ok on the solar

Re: How to get SolrJ-serialization / binary-size statistics ?

2013-06-18 Thread Ralf Heyde
Hello, just for information: the solution might look like this (1st approach): I take the source code of the BinaryResponseWriter and surround the serialization with some tracking methods. Then I create a custom QueryResponseWriter, which implements the binary response writer and voila, i get my sta

Shard identification

2013-06-18 Thread Ophir Michaeli
Hi, I built a 2 shards and 2 replicas system that works ok on a local machine, 1 zookeeper on shard 1. It appears ok on the Solr monitor page, cloud tab (http://localhost:8983/solr/#/~cloud). When I move to using different machines, each shard/replica on a different machine I get a wrong cloud-

Re: implementing identity authentication in SOLR

2013-06-18 Thread Gora Mohanty
On 18 June 2013 13:10, Mysurf Mail wrote: > Hi, > In order to add solr to my prod environment I have to implement some > security restriction. > Is there a way to add user/pass to the requests and to keep them > *encrypted* in a file. As mentioned earlier, no, there is no built-in way of doing that

implementing identity authentication in SOLR

2013-06-18 Thread Mysurf Mail
Hi, In order to add solr to my prod environment I have to implement some security restriction. Is there a way to add user/pass to the requests and to keep them *encrypted* in a file. thanks.