Solr Cluster - Is it wise to run optimize() on the master after each update

2012-01-23 Thread Maxim Veksler
I'm planning on having 1 Master and multiple slaves (cloud based, slaves are going up / down randomly). The slaves should be constantly available, meaning searching performance should optimally not be affected by the updates at all. It's unclear to me how the Cluster based replication works, does

Re: Solr Cluster - Is it wise to run optimize() on the master after each update

2012-01-23 Thread Andrew Harvey
We found that optimising too often killed our slave performance. An optimise will cause you to merge and ship the whole index rather than just the relevant portions when you replicate. The change on our slaves in terms of IO and CPU as well as RAM was marked. Andrew

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-23 Thread Ted Dunning
Jan's point that keeping different fields can make some statistical issues more correct is sound. The basic idea is that a common word in a rare language should be treated as a common word if you are working in that language. The simplest way to make that happen is by having a different field
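A minimal sketch of the one-field-per-language idea, in Python for illustration only. The field names (`text_en`, `text_de`, ..., and the fallback `text_general`) and the set of supported languages are hypothetical; each would need its own field definition and analyzer in schema.xml. Because Lucene keeps term statistics per field, each language then gets its own document frequencies.

```python
# Hypothetical routing of document text into one field per language, so that
# term statistics (e.g. document frequency) are kept separately per language.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "ja"}  # assumption: a text_<lang> field exists for each
FALLBACK_FIELD = "text_general"                  # assumption: catch-all field for other languages


def language_field(lang):
    """Return the (hypothetical) field name that text in `lang` is indexed into."""
    code = (lang or "").lower()
    return "text_" + code if code in SUPPORTED_LANGUAGES else FALLBACK_FIELD


def build_doc(doc_id, lang, body):
    """Build a Solr input document as a plain dict, with the body routed by language."""
    return {"id": doc_id, "language": lang, language_field(lang): body}


if __name__ == "__main__":
    print(build_doc("1", "de", "Guten Tag"))
```

At query time the same `language_field` helper could pick the field to search, so a common word in a rare language is scored against that language's statistics only.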

Solr Indexing Running Time 32bit vs 64bit

2012-01-23 Thread Husain, Yavar
I was running 32-bit Java (JDK, JRE, Tomcat) on my 64-bit Windows. For indexing I was not able to allocate more than 1.5GB of heap space on my machine. Each time, my Tomcat process reached the upper bound (i.e. 1.5GB) very quickly, so I thought of switching to 64-bit Java/Tomcat. Now I don't see

Filtering search results by an external set of values

2012-01-23 Thread John, Phil (CSS)
Hi, We're building quite a large shared index of resources, using Solr. The application that makes use of these resources is a multitenant one (i.e., many customers using the same index). For resources that are private to a customer, it's fairly easy to tag a document with their customer ID

Re: Getting a word count frequency out of a page field

2012-01-23 Thread solr user
Thanks for the article. I am indexing each page of a document as if it were a document. I think the answer is to configure SOLR for use of the TermVector Component: http://wiki.apache.org/solr/TermVectorComponent I have not tried it yet, but someone told me on StackExchange forum to try this

Re: Parameter for database host in DIH?

2012-01-23 Thread Chantal Ackermann
Hi wunder, for us, it works with internal dots when specifying the properties in $SOLR_HOME/[core]/conf/solrcore.properties: like this: db.url=xxx db.user=yyy db.passwd=zzz $SOLR_HOME/[core]/conf/data-config.xml: dataSource type=JdbcDataSource driver=oracle.jdbc.driver.OracleDriver

Re: Validating solr user query

2012-01-23 Thread Chantal Ackermann
Hi Dipti, just to make sure: are you aware of http://wiki.apache.org/solr/DisMaxQParserPlugin ? This will handle the user input in a very conventional and user-friendly way. You just have to specify which fields you want it to search. With the 'mm' parameter you have a powerful option to
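As a sketch of the DisMax setup described above, the request below passes the raw user input in `q` and lets `qf` and `mm` do the rest. The Solr URL and the field names/boosts are placeholders, and `2<75%` is just one example `mm` value (all optional clauses required when there are two or fewer, otherwise 75% of them).

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # placeholder URL


def dismax_params(user_input, field_boosts, mm="2<75%"):
    """Request parameters for a DisMax search over the given fields.

    DisMax only interprets quotes and +/- in the user input and escapes the
    other query-parser special characters, so raw input can go into q.
    """
    return {
        "defType": "dismax",
        "q": user_input,
        "qf": " ".join(f"{field}^{boost}" for field, boost in field_boosts.items()),
        "mm": mm,  # minimum number of optional clauses that must match
    }


def dismax_url(user_input, field_boosts, mm="2<75%"):
    """Full GET URL for the DisMax request (parameters are URL-encoded)."""
    return SOLR_SELECT + "?" + urlencode(dismax_params(user_input, field_boosts, mm))


if __name__ == "__main__":
    print(dismax_url("java programming", {"title": 2.0, "body": 1.0}))
```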

Re: Search within words

2012-01-23 Thread Lee Carroll
Check your defaultOperator; ensure it's OR. On 23 January 2012 05:56, jawedshamshedi jawedshamsh...@gmail.com wrote: Hi, thanks for the reply. I am using NGramFilterFactory for this, but it's not working as desired. For example, I have a field article_type that has been indexed using the below
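To see what an n-gram filter actually puts in the index, it can help to generate the grams by hand and compare them with the `debugQuery=on` output. The function below is an approximation of `NGramFilterFactory` with `minGramSize`/`maxGramSize` for a single token; the order in which Lucene emits grams may differ between versions, so compare them as sets.

```python
def char_ngrams(token, min_gram=2, max_gram=3):
    """All character n-grams of `token` with min_gram <= n <= max_gram.

    Approximates what an n-gram token filter indexes for a single token;
    ordering may not match Lucene's, so treat the result as a set.
    """
    grams = []
    for n in range(min_gram, max_gram + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


if __name__ == "__main__":
    print(char_ngrams("java"))  # ['ja', 'av', 'va', 'jav', 'ava']
```

A partial query such as `av` can typically only match if that exact gram was indexed, so the gram sizes at index time and the analysis at query time need to line up.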

Re: Filtering search results by an external set of values

2012-01-23 Thread Jan Høydahl
Hi, Do you have any kind of group membership for your users? If you have, a resource's list of security access tokens could be smaller, and you could avoid re-indexing most resources when adding normal users, who mostly belong to groups. The common way is to add filters on the query. You may do it
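Adding filters to the query, combined with group membership, might look like the sketch below. It assumes a hypothetical multi-valued string field `access_token` holding the user and group IDs allowed to see each resource; the token values are made up. Sending the restriction as `fq` keeps it out of relevance scoring and lets Solr cache it in the filterCache.

```python
from urllib.parse import urlencode


def quote_value(value):
    """Wrap a value in double quotes, escaping backslashes and quotes inside it."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def access_filter(tokens, field="access_token"):
    """Filter query matching documents readable by any of the given tokens.

    tokens: the user's own ID plus the IDs of every group the user belongs to.
    field:  hypothetical multi-valued string field with each document's tokens.
    """
    return field + ":(" + " OR ".join(quote_value(t) for t in tokens) + ")"


if __name__ == "__main__":
    fq = access_filter(["user-42", "group-7", "group-9"])
    print(fq)
    print(urlencode({"q": "annual report", "fq": fq}))
```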

Re: Trying to understand SOLR memory requirements

2012-01-23 Thread Lee Carroll
On selection, issue another query to get your additional data (if I follow what you want). On 22 January 2012 18:53, Dave dla...@gmail.com wrote: I take it from the overwhelming silence on the list that what I've asked is not possible? It seems like the suggester component is not well supported

Highlighting stopwords

2012-01-23 Thread O. Klein
I'm using trunk and FVH (FastVectorHighlighter), and even though I filter stopwords when searching, I would like to highlight stopwords in fragments. Using a different field without the stopwords filter did not have the desired effect. Is there a way to do this?

Re: Improving Solr Spell Checker Results

2012-01-23 Thread David Radunz
Hey, Thanks for that, I have uploaded a new patch as advised. Cheers, David On 23/01/2012 1:01 PM, Erick Erickson wrote: David: There's some good info here: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches But the short form is to go into solr_home and issue this

Re: How to Sort By a PageRank-Like Complicated Strategy?

2012-01-23 Thread Shashi Kant
You can update the document in the index quite frequently. I don't know what your requirement is, but another option would be to boost at query time. On Sun, Jan 22, 2012 at 5:51 AM, Bing Li lbl...@gmail.com wrote: Dear Shashi, Thanks so much for your reply! However, I think the value of PageRank is not a

RE: Improving Solr Spell Checker Results

2012-01-23 Thread Dyer, James
David, Thank you for taking the time to evaluate SOLR-2585. Perhaps the title of the issue advertises more than it delivers? (The name is borrowed from a section in the first book listed here: http://wiki.apache.org/lucene-java/InformationRetrieval) In any case, I think SOLR-2585 is a step

Limiting term frequency in a document to a specific term

2012-01-23 Thread solr user
What is the proper query URL to limit the term frequency to just one term in a document? Below is an example query to search for the term frequency in a document, but it is returning the frequency for all the terms.

edismax/dismax/Lucene Query Parser converts some fields to be mandatory

2012-01-23 Thread Michael Jakl
Hi, I've been wondering why some of my queries did not return the results I expected. A debugQuery resulted in the following: <str name="querystring">java^0.0 OR haskell^0.0 OR python^0.0 OR (ruby^0.0) AND ((programming^0.0)) OR programming language^0.0 OR code coding^0.0 OR -mobile^0.0 OR

Re: Solr Cluster - Is it wise to run optimize() on the master after each update

2012-01-23 Thread Erick Erickson
In general, do not optimize unless you 1) have a very static index, 2) actually test the search performance afterwards. First, as Andrew says, optimizing will force a complete copy of the entire index at replication. If you do NOT optimize, only the most recent segments to be written are copied.
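The replication cost described above can be illustrated with a toy model in which a slave only downloads index files it does not already have. The file names and sizes are invented, and real replication also transfers `segments_N` and other small bookkeeping files that are ignored here.

```python
MB = 1024 * 1024


def bytes_to_replicate(master_files, slave_files):
    """Toy model: the slave downloads every master file it does not already have."""
    return sum(size for name, size in master_files.items() if name not in slave_files)


# What the slave already holds (made-up segment files and sizes).
slave = {"_0.cfs": 900 * MB, "_1.cfs": 90 * MB}

# No optimize: the latest commit only wrote one new, small segment.
master_after_commit = {"_0.cfs": 900 * MB, "_1.cfs": 90 * MB, "_2.cfs": 10 * MB}

# Optimize: every segment was merged into a single brand-new segment.
master_after_optimize = {"_3.cfs": 1000 * MB}

if __name__ == "__main__":
    print(bytes_to_replicate(master_after_commit, slave) // MB)    # 10
    print(bytes_to_replicate(master_after_optimize, slave) // MB)  # 1000
```

Optimizing after every update therefore means shipping the whole index to every slave on every update, which fits the IO, CPU and RAM impact reported earlier in the thread.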

Re: Search within words

2012-01-23 Thread Erick Erickson
Please provide more info. In particular what is the output when you attach debugQuery=on? Best Erick On Mon, Jan 23, 2012 at 5:11 AM, Lee Carroll lee.a.carr...@googlemail.com wrote: check your defaultOperator, ensure its OR On 23 January 2012 05:56, jawedshamshedi jawedshamsh...@gmail.com

Re: Filtering search results by an external set of values

2012-01-23 Thread Erick Erickson
A second, but arguably quite expert, option is to use the no-cache option. See: https://issues.apache.org/jira/browse/SOLR-2429 The idea here is that you can specify that a filter is expensive and it will only be run after all the other filters etc. have been applied. Furthermore, it will not be
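For reference, the local-param syntax for this looks like `{!cache=false cost=100}`; the helper below only assembles such a filter string. The `frange` example on `popularity` and `price` is illustrative (those fields exist in the Solr example schema, not necessarily in yours). Non-cached filters are run in order of increasing cost, and a cost of 100 or more makes the filter a post filter only if its query type supports that, as `frange` does.

```python
from urllib.parse import urlencode


def expensive_filter(query, local="", cost=100, cache=False):
    """Prefix a filter query with local params that disable caching and set a cost.

    local: optional parser name and its own params, e.g. "frange l=10 u=100".
    """
    parts = [local] if local else []
    parts.append("cache=" + ("true" if cache else "false"))
    parts.append("cost=" + str(cost))
    return "{!" + " ".join(parts) + "}" + query


if __name__ == "__main__":
    params = [
        ("q", "*:*"),
        ("fq", "inStock:true"),  # cheap filter, cached as usual
        ("fq", expensive_filter("mul(popularity,price)", local="frange l=10 u=100")),
    ]
    print(urlencode(params))
```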

Re: edismax/dismax/Lucene Query Parser converts some fields to be mandatory

2012-01-23 Thread Erick Erickson
Count your parentheses (anyone here speak Lisp?). I think that the + is outside the entire clause, meaning it's saying that there is a single mandatory clause, and it's the whole thing. But boosting by 0.0 is probably a really bad thing. This may be dropping all the scores to 0, which means no

Re: Solr Cores

2012-01-23 Thread Erick Erickson
You can have a large number of cores, some people have multiple hundreds. Having multiple cores is preferred over having multiple JVMs since it's more efficient at sharing system resources. If you're running a 32 bit JVM, you are limited in the amount of memory you can let the JVM use, so that's a

Re: Solr Cluster - Is it wise to run optimize() on the master after each update

2012-01-23 Thread Maxim Veksler
Wonderful input. Thank you very much Erick. One question, I've been told that Solr supports an operation mode of multi core where you build the index on the master (optimize or not) then pass it to the stand by core on the slaves. Once the synchronization is complete you switch on the slave

Re: Solr Cluster - Is it wise to run optimize() on the master after each update

2012-01-23 Thread Erick Erickson
My first reaction is that, unless you have a specific use-case, this is unnecessary. When using a slave the Solr replication goes on in the background. Autowarming also is carried out in the background. Only when the autowarming is done are queries sent to the new (internal-to-solr) searcher. All

Oregon (OR) cities facet query issue, maybe related to OR being a reserved word?

2012-01-23 Thread asi123
Hi, I would really appreciate any hint/guide to fix this query issue. A Java webapp hits Solr with a query that does not return any results, though it works for other states (FL, CA for instance). From logs: solr path=/select

Re: edismax/dismax/Lucene Query Parser converts some fields to be mandatory

2012-01-23 Thread Michael Jakl
Hi! On Mon, Jan 23, 2012 at 18:42, Erick Erickson erickerick...@gmail.com wrote: Count your parentheses (anyone here speak Lisp?) I think that + is outside the entire clause, meaning it's saying that there is a single mandatory clause, and it's the whole thing You're right in that case

Re: Oregon (OR) cities facet query issue, maybe related to OR being a reserved word?

2012-01-23 Thread Ahmet Arslan
I would really appreciate any hint/guide to fix this query issue. A Java webapp hits Solr with a query that does not return any results, though it works for other states (FL, CA for instance). From logs: solr path=/select

Re: edismax/dismax/Lucene Query Parser converts some fields to be mandatory

2012-01-23 Thread Erick Erickson
Right. Essentially, the precedence is given to AND, so this is parsed as though it were python OR (ruby AND programming) OR programming language Best Erick On Mon, Jan 23, 2012 at 10:55 AM, Michael Jakl jakl.mich...@gmail.com wrote: Hi! On Mon, Jan 23, 2012 at 18:42, Erick Erickson

ExtractionHandler/Cell ignores just 2 fields defined in schema 3.5.0

2012-01-23 Thread Wayne W
Hi, I've been trying to figure this out for a few days now and I'm just not getting anywhere, so any pointers would be MOST welcome. I'm in the process of upgrading from 1.3 to the latest and greatest version of Solr and I'm getting there slowly. However, I have this (final) problem that when

Re: Limiting term frequency in a document to a specific term

2012-01-23 Thread Ahmet Arslan
Below is an example query to search for the term frequency in a document, but it is returning the frequency for all the terms. http://localhost:8983/solr/select/?fl=documentPageId&q=documentPageId:49667.3&qt=tvrh&tv.tf=true&tv.fl=contents I would like to be able to limit the query to
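The term vector URL is easier to get right when its parameters are assembled programmatically, since a missing `&` silently merges two parameters. Note that `tv.fl` restricts term vectors to a field (`contents`), not to a single term, so with these parameters alone, picking out one term's frequency would still be left to the client.

```python
from urllib.parse import urlencode

params = {
    "fl": "documentPageId",
    "q": "documentPageId:49667.3",
    "qt": "tvrh",          # request handler configured with the TermVectorComponent
    "tv.tf": "true",       # include term frequencies
    "tv.fl": "contents",   # only return term vectors for this field
}
url = "http://localhost:8983/solr/select/?" + urlencode(params)

if __name__ == "__main__":
    print(url)
```

`urlencode` encodes the colon in the `q` value as `%3A`, which Solr decodes as usual.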

Hierarchical faceting in UI

2012-01-23 Thread Yuhao
I have some hierarchical data that I want to represent in the Solr UI (/browse).  I've read through many discussions on this topic, including http://wiki.apache.org/solr/HierarchicalFaceting and http://packtlib.packtpub.com/library/9781849516068/ch06lvl1sec09 .  However, I didn't see a

Re: edismax/dismax/Lucene Query Parser converts some fields to be mandatory

2012-01-23 Thread Michael Jakl
On Mon, Jan 23, 2012 at 22:05, Erick Erickson erickerick...@gmail.com wrote: Right. Essentially, the precedence is given to AND, so this is parsed as though it were python OR (ruby AND programming) OR programming language That's exactly what I'd expect, but the problem is that ruby is marked as

Solr Java client API

2012-01-23 Thread jingjung Ng
Hi, I implemented the facet using query.addFacetQuery and query.addFilterQuery to facet on: gender:male, state:DC. This works fine. How can I facet on multiple values using the SolrJ API, like the following: gender:male, gender:female, state:DC? I've tried, but it returns 0. Can anyone help? Thanks, -jingjung
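Separate filter queries are intersected, so sending `gender:male` and `gender:female` as two filters only matches documents that have both values; for a single-valued field that is nothing, which is a likely cause of the 0 results. A sketch of the alternative, shown as Python request parameters: put the alternatives for one field into a single filter with OR. In SolrJ the same filter strings could be passed to `SolrQuery.addFilterQuery`.

```python
from urllib.parse import urlencode


def any_of(field, values):
    """One filter query matching documents whose `field` has any of `values`."""
    quoted = ('"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"' for v in values)
    return field + ":(" + " OR ".join(quoted) + ")"


params = [
    ("q", "*:*"),
    ("fq", any_of("gender", ["male", "female"])),  # gender:("male" OR "female")
    ("fq", any_of("state", ["DC"])),                # state:("DC"), intersected with the above
    ("facet", "true"),
    ("facet.field", "gender"),
    ("facet.field", "state"),
]

if __name__ == "__main__":
    print(urlencode(params))
```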

Re: Highlighting stopwords

2012-01-23 Thread Koji Sekiguchi
(12/01/23 23:14), O. Klein wrote: I'm using trunk and FVH and even though I filter stopwords when searching, I would like to highlight stopwords in fragments. Using a different field without the stopwords filter did not have the desired effect. Please provide more info. In particular, how your

RE: Oregon (OR) cities facet query issue, maybe related to OR being a reserved word?

2012-01-23 Thread Ritzman, James
Hello, I'm no expert here (just started learning/using Solr a few months ago) but I ran into the same issue of needing to search for and facet on the OR abbreviation. What worked for me was to double-escape OR (a la :\\OR) for queries and single escape (:\OR) when doing a facet query. The
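An alternative to backslash-escaping is to quote the value, since the query parser does not treat a quoted `OR` as an operator. A minimal sketch, assuming `state` is an unanalyzed `string` field; if it were a text field with a stopword filter, `or` could still be removed at analysis time regardless of quoting.

```python
from urllib.parse import urlencode


def field_value_query(field, value):
    """field:"value", with the value quoted so words like OR/AND/NOT are literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return field + ':"' + escaped + '"'


if __name__ == "__main__":
    fq = field_value_query("state", "OR")
    print(fq)                                        # state:"OR"
    print(urlencode({"fq": fq, "facet.query": fq}))
```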

Re: ExtractionHandler/Cell ignores just 2 fields defined in schema 3.5.0

2012-01-23 Thread Jan Høydahl
Hi, It's because lowernames=true by default in solrconfig.xml, and it will convert any - into _ in field names. So try adding a request parameter lowernames=false or change the default in solrconfig.xml. Alternatively, leave as is but name your fields project_id and company_id :)
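The renaming can be approximated as below: with `lowernames=true`, letters and digits are lowercased and any other character becomes `_` (for example `Content-Type` becomes `content_type`). This is a sketch of the documented behaviour, not Solr's source, so edge cases such as non-ASCII names may differ. It would explain why schema fields with a `-` in their name stay empty.

```python
def lowernames(field_name):
    """Approximate Solr Cell's lowernames=true mapping of incoming field names."""
    return "".join(ch.lower() if ch.isalnum() else "_" for ch in field_name)


if __name__ == "__main__":
    for name in ["Content-Type", "project-id", "company-id"]:
        print(name, "->", lowernames(name))
```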

Re: Hierarchical faceting in UI

2012-01-23 Thread darren
On Mon, 23 Jan 2012 14:33:00 -0800 (PST), Yuhao nfsvi...@yahoo.com wrote: Programmatically, something like this might work: for each facet field, add another hidden field that identifies its parent.  Then, program additional logic in the UI to show only the facet terms at the currently

Re: Hierarchical faceting in UI

2012-01-23 Thread Johannes Goll
Another way is to store the original hierarchy in a SQL database (in the form: id, parent_id, name, level) and in the Lucene index store the complete hierarchy (from root to leaf node) for each document in one field using the ids of the SQL database. In that way you can get documents at any level
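Storing the full root-to-leaf path as SQL ids could be sketched like this. The depth-prefixed token format (`depth/id/id/...`) is an assumption borrowed from common hierarchical-faceting setups, and the sample rows are made up; the code assumes the table forms a tree.

```python
def root_to_node(node_id, parent_of):
    """Ids from the root down to node_id, given a child -> parent map (root maps to None)."""
    path, seen = [], set()
    current = node_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in hierarchy at id %r" % (current,))
        seen.add(current)
        path.append(current)
        current = parent_of.get(current)
    return path[::-1]


def path_tokens(node_id, parent_of):
    """One depth-prefixed token per level, e.g. ["0/1", "1/1/4", "2/1/4/9"]."""
    path = [str(i) for i in root_to_node(node_id, parent_of)]
    return [f"{depth}/" + "/".join(path[:depth + 1]) for depth in range(len(path))]


if __name__ == "__main__":
    # Rows from a hypothetical (id, parent_id) table: 1 is the root, 4 its child, 9 a leaf.
    parent_of = {1: None, 4: 1, 9: 4}
    print(path_tokens(9, parent_of))  # ['0/1', '1/1/4', '2/1/4/9']
```

Indexed into a hypothetical multi-valued string field `category_path`, a filter such as `fq=category_path:"1/1/4"` would return every document under node 4, and `facet.field=category_path&facet.prefix=2/1/4/` would list its children.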

Re: Highlighting stopwords

2012-01-23 Thread O. Klein
Koji Sekiguchi wrote: (12/01/23 23:14), O. Klein wrote: I'm using trunk and FVH and even though I filter stopwords when searching, I would like to highlight stopwords in fragments. Using a different field without the stopwords filter did not have the desired effect. Please provide more

hot deploy of newer version of solr schema in production

2012-01-23 Thread roz dev
Hi All, I need the community's feedback about deploying newer versions of the Solr schema into production while the existing (older) schema is in use by applications. How do people handle this? What have people learned from doing it? Any thoughts are welcome. Thanks, Saroj

Re: Ngram autocompleter and term frequency boosting

2012-01-23 Thread Cuong Hoang
Thanks for your replies. I can't apply index-time boost because I don't know the term frequencies in advance. Additionally, new documents come in every few minutes which make maintaining term frequencies outside Solr a difficult task. Facet prefix would probably help in this case. I thought there
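A facet.prefix request for autocompletion could look roughly like this. Facet counts are the number of matching documents containing each term, computed at query time, so they stay current as new documents are committed without keeping frequencies outside Solr. The field name `title_terms` (assumed tokenized and lowercased, without an n-gram filter) and the limits are assumptions.

```python
from urllib.parse import urlencode


def autocomplete_params(prefix, field="title_terms", limit=10):
    """Facet-based suggestions: indexed terms in `field` that start with `prefix`.

    field: hypothetical tokenized, lowercased field (no n-gram filter).
    """
    return {
        "q": "*:*",
        "rows": "0",                     # only the facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix.lower(),  # not analyzed, so match the indexed (lowercased) form
        "facet.sort": "count",           # most frequent terms first
        "facet.limit": str(limit),
        "facet.mincount": "1",
    }


if __name__ == "__main__":
    print(urlencode(autocomplete_params("Sol")))
```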

Size of index to use shard

2012-01-23 Thread Anderson vasconcelos
Hi, Is there a size of index (or number of docs) at which it becomes necessary to break the index into shards? I have an index of 100GB. This index grows by 10GB per year (I don't know how many docs it has) and docs will never be deleted. Looking 30 years ahead, the index will be
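Taking the figures in the question at face value (100GB today, 10GB added per year, nothing ever deleted), a linear projection of the index size is simple arithmetic:

```python
def projected_index_gb(current_gb, growth_gb_per_year, years):
    """Linear projection of index size, assuming constant growth and no deletions."""
    return current_gb + growth_gb_per_year * years


if __name__ == "__main__":
    for years in (0, 10, 20, 30):
        print(years, "years:", projected_index_gb(100, 10, years), "GB")
```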

Re: edismax/dismax/Lucene Query Parser converts some fields to be mandatory

2012-01-23 Thread Erick Erickson
Well, at root the Lucene query parser makes no claim of enforcing boolean logic. Think in terms of MUST, SHOULD and NOT instead. Here's a good writeup... http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/ Best Erick On Mon, Jan 23, 2012 at 2:43 PM, Michael Jakl
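The MUST/SHOULD behaviour discussed in this thread can be reproduced with a small model of how the classic Lucene query parser marks clauses when the default operator is OR (flat clauses only, no parentheses or boosts). This is a simplification written for illustration, not Lucene's code: an AND flips only the clause immediately before it to MUST, so `python OR ruby AND programming` ends up as `python +ruby +programming`, and a document matching only `python` is not returned.

```python
def mark_clauses(tokens):
    """Simplified model of classic QueryParser clause marking (default operator OR).

    tokens: list of (conjunction, modifier, term); conjunction is None, "AND" or
    "OR", modifier is None, "+" or "-".
    Returns a list of (term, occur) with occur in MUST / SHOULD / MUST_NOT.
    """
    clauses = []  # list of [term, occur]
    for conj, mod, term in tokens:
        # "AND" makes the *preceding* clause required, unless it is prohibited.
        if clauses and conj == "AND" and clauses[-1][1] != "MUST_NOT":
            clauses[-1][1] = "MUST"
        if mod == "-":
            occur = "MUST_NOT"
        elif mod == "+" or conj == "AND":
            occur = "MUST"
        else:
            occur = "SHOULD"
        clauses.append([term, occur])
    return [(term, occur) for term, occur in clauses]


if __name__ == "__main__":
    query = [(None, None, "python"), ("OR", None, "ruby"), ("AND", None, "programming")]
    print(mark_clauses(query))
```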

Re: CLOSE_WAIT after connecting to multiple shards from a primary shard

2012-01-23 Thread Ranveer
Hi Mukund, Since I have had this issue for a long time, I did some trial and error. In my case I am connecting to the local Tomcat server using SolrJ. SolrJ has a max of 20 connections per host and 2 per client. As I have a heavy load and lots of dependency on Solr, this seems very low. To increase the

How to increase maxConnection and maxConnectionPerHost in SolrJ

2012-01-23 Thread Jonty Rhods
Hi All, I have two Tomcats on the same server. One is for Solr and the other is my application server. I am connecting to the Solr server with SolrJ from the application server. As I am connecting locally, the default number of connections seems too low. My server stops responding every few hours and only comes back up when I reset

Re: java.net.SocketException: Too many open files

2012-01-23 Thread Jonty Rhods
Hi Kuli, Did you find a solution to this problem? I am still facing it. Please help me overcome this problem. Regards On Wed, Oct 26, 2011 at 1:16 PM, Michael Kuhlmann k...@solarier.de wrote: Hi; we have a similar problem here. We already raised the file ulimit on the server

Re: CLOSE_WAIT after connecting to multiple shards from a primary shard

2012-01-23 Thread Mikhail Khludnev
Hello, AFAIK by setting connectionManager.closeIdleConnections(0L); you are preventing your HTTP connections from being cached, i.e. disabling keep-alive. If you increase it enough you won't see many CLOSE_WAIT connections. Some explanation and solution for the JDK's HTTP client (URLConnection), not for your

Re: edismax/dismax/Lucene Query Parser converts some fields to be mandatory

2012-01-23 Thread Michael Jakl
On Tue, Jan 24, 2012 at 06:27, Erick Erickson erickerick...@gmail.com wrote: Well, at root the Lucene query parser makes no claim of enforcing boolean logic. Think in terms of MUST, SHOULD and NOT instead. Here's a good writeup...