query about the server configuration

2011-06-19 Thread Jonty Rhods
Dear all, I am quite new to Solr and have not worked with it under heavy load. I have the following server configuration: 16 GB RAM, 16 CPUs. I need to index updates every minute, at least 5000 docs per day. The size of the data per day will be around 50 MB. I am expecting 10 to 30 concurrent hits on

Re: about the SolrServer server = new CommonsHttpSolrServer(URL);

2011-06-19 Thread Jonty Rhods
Will it work for heavy use (30 to 40 concurrent users)? How do I open and maintain more connections at a time, like a connection pool, so that users receive fast responses? Regards. On Fri, Jun 17, 2011 at 12:50 PM, Ahmet Arslan iori...@yahoo.com wrote: SolrServer server = new

Re: about the SolrServer server = new CommonsHttpSolrServer(URL);

2011-06-19 Thread Ahmet Arslan
Will it work for heavy use (30 to 40 concurrent users)? How do I open and maintain more connections at a time, like a connection pool, so that users receive fast responses? It uses HttpClient under the hood. You can pass an HttpClient to its constructor too. It seems that

Weird optimize performance degradation

2011-06-19 Thread Santiago Bazerque
Hello! Here is a puzzling experiment: I build an index of about 1.2 million documents using Solr 3.1. The index has a large number of dynamic fields (about 15,000). Each document has about 100 fields. I add the documents in batches of 20, and every 50,000 documents I optimize the index. The first 10

Re: Is it true that I cannot delete stored content from the index?

2011-06-19 Thread François Schiettecatte
That is correct, but you only need to commit; optimize is not a requirement here. François On Jun 18, 2011, at 11:54 PM, Mohammad Shariq wrote: I have defined a uniqueKey in my Solr schema and delete docs from Solr using this uniqueKey, and then optimize once a day. Is this right

site: feature in Solr?

2011-06-19 Thread Gabriele Kahlout
Hello, besides creating an index with just the site in question, is it possible, as with Google, to search for results only in a given domain? -- Regards, K. Gabriele

Re: site: feature in Solr?

2011-06-19 Thread Ahmet Arslan
Besides creating an index with just the site in question, is it possible, as with Google, to search for results only in a given domain? If you have an appropriate indexed field, yes: fq=site:foo.com (see http://wiki.apache.org/solr/CommonQueryParameters#fq)
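A minimal sketch of the request Ahmet describes, written with plain JDK classes rather than SolrJ. It assumes a hypothetical non-tokenized (string) field named site that holds each document's domain, and the select handler at http://localhost:8983/solr/select; adjust both to your own schema and setup.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds a Solr select URL restricted to one site via fq. */
final class SiteFilterUrl {
    private SiteFilterUrl() {}

    static String build(String selectUrl, String userQuery, String siteField, String domain) {
        // q drives matching and relevance ranking.
        String q = "q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8);
        // fq only restricts results to documents whose site field equals the domain.
        String fq = "fq=" + URLEncoder.encode(siteField + ":" + domain, StandardCharsets.UTF_8);
        return selectUrl + "?" + q + "&" + fq;
    }
}
```

For example, build("http://localhost:8983/solr/select", "solr tips", "site", "foo.com") yields a URL ending in ?q=solr+tips&fq=site%3Afoo.com.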

example doesn't run from source?

2011-06-19 Thread Jason Toy
I'm trying to run the example app from the svn source, but it doesn't seem to work. I am able to run: java -jar start.jar, and Jetty starts with: INFO::Started SocketConnector@0.0.0.0:8983. But when I point my browser at http://localhost:8983/solr/ I get a 404 error.

Re: example doesn't run from source?

2011-06-19 Thread Stefan Matheis
Jason, which source did you use for the checkout, and how did you build Solr? Regards Stefan On 19.06.2011 15:00, Jason Toy wrote: I'm trying to run the example app from the svn source, but it doesn't seem to work. I am able to run: java -jar start.jar, and Jetty starts with: INFO::Started

Re: Multiple indexes

2011-06-19 Thread lee carroll
Your data is being used to build an inverted index rather than being stored as a set of records. De-normalising is fine in most cases. What is your use case that requires a normalised set of indices? 2011/6/18 François Schiettecatte fschietteca...@gmail.com: You would need to run two

Re: Weird optimize performance degradation

2011-06-19 Thread Erick Erickson
First, there's absolutely no reason to optimize this often, if at all. Older versions of Lucene would search faster on an optimized index, but this is no longer necessary. Optimize will reclaim space from deleted documents, but it is generally recommended to run it fairly rarely, often at

Re: Is it true that I cannot delete stored content from the index?

2011-06-19 Thread Erick Erickson
That'll work, but you could just as easily add the document again. Solr automatically deletes any other document with the same uniqueKey as the document being added. Optimizing once a day is reasonable, but note that about all you're doing here is reclaiming some space. So if
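As a rough illustration of Erick's point, the sketch below only builds XML update bodies: re-adding a document whose id matches an existing one replaces it, and a plain commit makes the change visible without any optimize. It assumes that id is the schema's uniqueKey and that the bodies are POSTed to the /update handler; the class and method names are hypothetical.

```java
import java.util.Map;

/** Hypothetical helper that builds Solr XML update messages. */
final class UpdateMessages {
    private UpdateMessages() {}

    /** Minimal XML escaping for element text and attribute values ("&" must go first). */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** An add message; if a document with the same uniqueKey exists, Solr replaces it. */
    static String add(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(f.getKey())).append("\">")
              .append(escape(f.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    /** A commit is enough to make adds and deletes visible; optimize is optional. */
    static String commit() {
        return "<commit/>";
    }
}
```

Fields are written in the map's iteration order, so a LinkedHashMap keeps the output predictable.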

Re: example doesn't run from source?

2011-06-19 Thread Erick Erickson
Right, run "ant example" first to build the example code. You have to run it from the solr_install/solr directory. Best Erick On Sun, Jun 19, 2011 at 9:00 AM, Jason Toy jason...@gmail.com wrote: I'm trying to run the example app from the svn source, but it doesn't seem to work. I am able to run

Re: Optimize taking two steps and extra disk space

2011-06-19 Thread Michael McCandless
With LogXMergePolicy (the default before 3.2), optimize respects mergeFactor, so it's doing 2 steps because you have 37 segments but a mergeFactor of 35. With TieredMergePolicy (default on 3.2 and after), there is now a separate merge factor used for optimize (maxMergeAtOnceExplicit)... so you could
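To make the arithmetic concrete, here is a simplified model (not Lucene's actual merge selection logic) that counts optimize passes when at most mergeFactor segments can be merged at once. With 37 segments and a mergeFactor of 35, the first pass merges 35 segments into one, leaving 3, and a second pass merges those.

```java
/** Simplified model of optimize passes; real merge policies apply more rules. */
final class OptimizeSteps {
    private OptimizeSteps() {}

    static int passes(int segments, int mergeFactor) {
        if (segments < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need >= 1 segment and mergeFactor >= 2");
        }
        int passes = 0;
        while (segments > 1) {
            int merged = Math.min(segments, mergeFactor);
            // merging `merged` segments replaces them with one new segment
            segments = segments - merged + 1;
            passes++;
        }
        return passes;
    }
}
```

In this model, a merge limit of at least 37 would finish in a single pass, which is the motivation for a separate, larger limit for explicit optimizes.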

Re: Weird optimize performance degradation

2011-06-19 Thread Santiago Bazerque
Hello Erick, thanks for your answer! Yes, our over-optimization is mainly due to paranoia over these strange commit times. The long optimize time persisted in all the subsequent commits, and this is consistent with what we are seeing in other production indexes that have the same problem. Once

Solr Multithreading

2011-06-19 Thread Rahul Warawdekar
Hi, I am currently working on a search-based project that involves indexing data, including attachments, from a SQL Server database using DIH. For indexing attachments (varbinary DB objects), I am using TikaEntityProcessor. I am trying to use multithreading to speed up the indexing, but it

fq vs adding to query

2011-06-19 Thread Jamie Johnson
Are there any hard and fast rules about when to use fq vs adding to the query? For instance, if I started with a search for camera and then wanted to add another keyword, say digital, is it better to do q=camera AND digital or q=camera&fq=digital? I know that fq isn't taken into account when doing

Re: fq vs adding to query

2011-06-19 Thread Mohammad Shariq
fq is a filter query, for filtering on category, timestamp, language, etc., but I don't see any performance improvement from using a keyword in fq. Use cases: fq=lang:English&q=camera AND digital OR fq=time:[13023567 TO 13023900]&q=camera AND digital On 19 June 2011 20:17, Jamie Johnson jej2...@gmail.com

Building Solr 3.2 from sources - can't get war

2011-06-19 Thread Yuriy Akopov
Hi, this is my first post here, so please excuse me if it is not really related. At the moment I'm using Solr 1.4.1 with the SOLR-236 (https://issues.apache.org/jira/browse/SOLR-236) patch applied to support field collapsing. One of the mandatory fields of documents indexed is generated from the

Re: Optimize taking two steps and extra disk space

2011-06-19 Thread Shawn Heisey
On 6/19/2011 7:32 AM, Michael McCandless wrote: With LogXMergePolicy (the default before 3.2), optimize respects mergeFactor, so it's doing 2 steps because you have 37 segments but a mergeFactor of 35. With TieredMergePolicy (default on 3.2 and after), there is now a separate merge factor used for

Re: Weird optimize performance degradation

2011-06-19 Thread Mohammad Shariq
I also have a Solr index with around 100 million docs. I optimize once a week, and it takes around 1 hour 30 minutes. On 19 June 2011 20:02, Santiago Bazerque sbazer...@gmail.com wrote: Hello Erick, thanks for your answer! Yes, our over-optimization is mainly due to paranoia over these

Re: fq vs adding to query

2011-06-19 Thread Markus Jelsma
If you want to make good use of the filter cache, then use filter queries. fq is a filter query, for filtering on category, timestamp, language, etc., but I don't see any performance improvement from using a keyword in fq. Use cases: fq=lang:English&q=camera AND digital OR fq=time:[13023567 TO

Re: Building Solr 3.2 from sources - can't get war

2011-06-19 Thread Shawn Heisey
On 6/19/2011 9:32 AM, Yuriy Akopov wrote: For 3.2, I can't see a similar build option. First, there is no release-3.2 folder, so I tried to check out http://svn.apache.org/repos/asf/lucene/dev/trunk, supposing this is the latest stable release (and I might be wrong there). However, there is no

Re: fq vs adding to query

2011-06-19 Thread Shawn Heisey
On 6/19/2011 10:00 AM, Markus Jelsma wrote: If you want to make good use of the filter cache, then use filter queries. Additionally, information in filter queries will not affect relevancy ranking. If you want the terms you are using to affect the document scores, include them in the main
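The difference Markus and Shawn describe can be illustrated with a toy model; this is not Solr's implementation, and all names are hypothetical. Terms in q are required and scored, while each fq term is evaluated once into a BitSet, cached under its string, and only restricts which documents are returned. In Solr itself the corresponding cache is the filterCache configured in solrconfig.xml.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy searcher contrasting scored q terms with cached, unscored fq terms. */
final class ToySearcher {
    private final List<List<String>> docs;                 // doc id = index in this list
    private final Map<String, BitSet> filterCache = new HashMap<>();
    private int filterCacheMisses = 0;

    ToySearcher(List<List<String>> docs) {
        this.docs = docs;
    }

    int filterCacheMisses() {
        return filterCacheMisses;
    }

    /** Each fq (a single required term here) is computed once, then served from the cache. */
    private BitSet filter(String term) {
        return filterCache.computeIfAbsent(term, t -> {
            filterCacheMisses++;
            BitSet bits = new BitSet(docs.size());
            for (int i = 0; i < docs.size(); i++) {
                if (docs.get(i).contains(t)) {
                    bits.set(i);
                }
            }
            return bits;
        });
    }

    /** Score = occurrences of q terms in the doc; fq terms never contribute. */
    private static int score(List<String> tokens, List<String> qTerms) {
        int s = 0;
        for (String tok : tokens) {
            if (qTerms.contains(tok)) {
                s++;
            }
        }
        return s;
    }

    /** Returns doc id -> score, best first; every q term and every fq term is required. */
    Map<Integer, Integer> search(List<String> qTerms, List<String> fqTerms) {
        BitSet allowed = new BitSet(docs.size());
        allowed.set(0, docs.size());
        for (String fq : fqTerms) {
            allowed.and(filter(fq)); // intersects into `allowed`; the cached BitSet is untouched
        }
        List<int[]> hits = new ArrayList<>(); // each entry is {docId, score}
        for (int i = allowed.nextSetBit(0); i >= 0; i = allowed.nextSetBit(i + 1)) {
            List<String> tokens = docs.get(i);
            if (tokens.containsAll(qTerms)) {
                hits.add(new int[] {i, score(tokens, qTerms)});
            }
        }
        hits.sort(Comparator.<int[]>comparingInt(h -> -h[1]).thenComparingInt(h -> h[0]));
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int[] h : hits) {
            result.put(h[0], h[1]);
        }
        return result;
    }
}
```

In this model, putting digital in q lets documents that mention it more often rank higher; putting it in fq gives every matching camera document the same score, and repeating the same filter is a cache hit.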

Re: query about the server configuration

2011-06-19 Thread Ranveer
Please help, I am also in the same situation. Regards On Sunday 19 June 2011 12:59 PM, Jonty Rhods wrote: Dear all, I am quite new to Solr and have not worked with it under heavy load. I have the following server configuration: 16 GB RAM, 16 CPUs. I need to index updates every minute, at least 5000

Re: about the SolrServer server = new CommonsHttpSolrServer(URL);

2011-06-19 Thread Ranveer
Thanks. However, a few more questions: How do I maintain connection threads (max and min settings)? What would be the ideal max value for the setMaxConnectionsPerHost method? Will it be OK for 30 to 40 concurrent users? How are threads maintained by the MultiThreadedHttpConnectionManager class? On
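In SolrJ 3.x the relevant settings live on CommonsHttpSolrServer (for example setDefaultMaxConnectionsPerHost and setMaxTotalConnections) or on a MultiThreadedHttpConnectionManager inside the HttpClient passed to its constructor; check the Javadoc for your exact version. The usual advice is to create one server instance and share it across all request threads. The JDK-only sketch below illustrates only the idea behind a max-connections limit, not the HttpClient API: a shared client with a fixed number of slots, where extra concurrent users wait for a free slot. BoundedClient and its methods are hypothetical, and the 20 ms "request" is simulated.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Concept sketch: a shared client allowing at most maxConnections requests in flight. */
final class BoundedClient {
    private final Semaphore slots;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    BoundedClient(int maxConnections) {
        this.slots = new Semaphore(maxConnections, true); // fair: waiters are served in order
    }

    /** Simulates one request that holds a "connection" for workMillis. */
    void request(long workMillis) throws InterruptedException {
        slots.acquire();
        try {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            Thread.sleep(workMillis);
        } finally {
            inFlight.decrementAndGet();
            slots.release();
        }
    }

    int peakConcurrency() {
        return peak.get();
    }

    /** Runs `users` concurrent callers against ONE shared client; returns the observed peak. */
    static int simulate(int users, int maxConnections) {
        BoundedClient client = new BoundedClient(maxConnections);
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            for (int i = 0; i < users; i++) {
                pool.submit(() -> {
                    client.request(20);
                    return null;
                });
            }
        } finally {
            pool.shutdown();
        }
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("simulation did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return client.peakConcurrency();
    }
}
```

With 40 simulated users and 10 slots, at most 10 requests are ever in flight and the remaining callers queue briefly.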

Re: Building Solr 3.2 from sources - can't get war

2011-06-19 Thread Yuriy Akopov
In the checked-out Lucene source (either trunk or one of the 3.x branches) there is a solr/ directory. You just cd into that directory, and dist-war becomes a build option. Thanks, Shawn! That worked: by invoking the dist-war build I successfully got the apache-solr-4.0-SNAPSHOT.war file -

RE: Building Solr 3.2 from sources - can't get war

2011-06-19 Thread Steven A Rowe
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_2/ -Original Message- From: Yuriy Akopov [mailto:ako...@hotmail.co.uk] Sent: Sunday, June 19, 2011 4:38 PM To: solr-user@lucene.apache.org Subject: Re: Building Solr 3.2 from sources - can't get war In the checked out

Re: Solr and Tag Cloud

2011-06-19 Thread Alexey Serba
Suppose you have a multivalued field _tag_ on every document in your corpus. Then you can build a tag cloud relevant to the whole data set, or to a specific query, by retrieving facets for the field _tag_ for *:* or any other query. You'll get a list of popular _tag_ values relevant to this query with
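A toy version of the counting that field faceting performs, assuming a multivalued tag field: for the documents that match the query, count how many documents carry each tag and keep the most frequent. In a real request this would correspond to parameters along the lines of q=*:*&facet=true&facet.field=tag&facet.limit=50&rows=0; the field name and limit are assumptions, and TagCloud is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Toy facet counting over a multivalued tag field, most frequent first. */
final class TagCloud {
    private TagCloud() {}

    static List<Map.Entry<String, Integer>> facetCounts(List<List<String>> matchingDocTags, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> tags : matchingDocTags) {
            // facet counts are per document, so a tag repeated in one doc counts once
            for (String tag : new HashSet<>(tags)) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // highest count first; ties broken alphabetically for a stable order
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }
}
```

The counts can then be mapped to font sizes or weights when rendering the cloud.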

paging and maintaining a cursor just like ScrollableResultSet

2011-06-19 Thread Hiller, Dean x66079
As you probably know, using Query in Hibernate/JPA gets slower with each page, since it starts all over on the index tree :( whereas ScrollableResultSet does NOT, because the database maintains a cursor into the index that just picks up where it left off, so as you go to the next page, next

Re: Why are not query keywords treated as a set?

2011-06-19 Thread lee carroll
Do you mean a phrase query, "past past"? Can you give some more detail? On 18 June 2011 13:02, Gabriele Kahlout gabri...@mysimpatico.com wrote: q=past past 1.0 = (MATCH) sum of: 0.5 = (MATCH) fieldWeight(content:past in 0), product of: 1.0 = tf(termFreq(content:past)=1) 1.0 =

Re: paging and maintaining a cursor just like ScrollableResultSet

2011-06-19 Thread Michael Sokolov
One technique I've used to page through huge result sets that could help: if you have a sortable key (like an id), you can just fetch all docs, sorted by the key, and then on subsequent page requests use the last value from the previous page as a filter in a range term like: id:[last-id TO *]
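A JDK-only simulation of Michael's technique, with a TreeSet standing in for the sorted id field. Each page corresponds to a request like q=id:[lastId TO *]&sort=id asc&rows=N+1. Because the range is inclusive, the previous page's last id comes back again and is dropped; the extra row keeps the page full. This assumes unique ids that sort consistently as strings (numeric ids would need zero-padding or a numeric field); KeysetPager is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Simulates key-based paging instead of a growing start offset. */
final class KeysetPager {
    private final NavigableSet<String> index; // stands in for the sorted id field

    KeysetPager(List<String> ids) {
        this.index = new TreeSet<>(ids);
    }

    /** Returns up to `rows` ids after lastId (null means the first page). */
    List<String> page(String lastId, int rows) {
        // Like q=id:[lastId TO *]&sort=id asc; the inclusive bound may return lastId again.
        NavigableSet<String> range = (lastId == null) ? index : index.tailSet(lastId, true);
        int want = (lastId == null) ? rows : rows + 1;
        List<String> fetched = new ArrayList<>();
        for (String id : range) {
            if (fetched.size() == want) {
                break;
            }
            fetched.add(id);
        }
        if (!fetched.isEmpty() && fetched.get(0).equals(lastId)) {
            fetched.remove(0); // drop the repeat of the previous page's last id
        }
        return fetched.size() > rows ? new ArrayList<>(fetched.subList(0, rows)) : fetched;
    }
}
```

If lastId was deleted between requests, the first result simply differs from it and nothing is dropped.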

Re: solr highlighting feature

2011-06-19 Thread Jan Høydahl
Hi, first, you should consider the SolrJ API if you're working from Java/JSP. Then, say you want to highlight the title. In your loop across the N hits, instead of pulling the title from the hits themselves, check whether there is a highlighted result with the same ID in the highlighting section. -- Jan
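A minimal sketch of Jan's suggestion, which also bears on Romi's follow-up: the highlighted fragments live in a separate highlighting section of the response (a JSON object when wt=json), keyed by uniqueKey and then by field name. SolrJ exposes the same structure as Map<String, Map<String, List<String>>> via QueryResponse.getHighlighting(). The helper below is hypothetical and only assumes that shape.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper: prefer a highlighted fragment over the hit's stored value. */
final class HighlightMerge {
    private HighlightMerge() {}

    static String displayValue(String docId, String field, String storedValue,
                               Map<String, Map<String, List<String>>> highlighting) {
        Map<String, List<String>> perDoc = highlighting.get(docId);
        if (perDoc != null) {
            List<String> snippets = perDoc.get(field);
            if (snippets != null && !snippets.isEmpty()) {
                return snippets.get(0); // first fragment, with the highlight markup Solr added
            }
        }
        return storedValue; // nothing highlighted for this doc/field: fall back to the stored value
    }
}
```

Call it once per hit inside the result loop, passing the hit's id and stored title.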

why too many open files?

2011-06-19 Thread Jason, Kim
Hi all, I have 12 shards with ramBufferSizeMB=512 and mergeFactor=5, but Solr raises java.io.FileNotFoundException (Too many open files). mergeFactor is just 5, so how can this happen? Below are the segments of one shard; there are far more segments than mergeFactor. What's wrong, and how should I set the

Re: query about the server configuration

2011-06-19 Thread Jonty Rhods
I forgot an important point: I need to commit every 2 to 5 minutes. Please help. Regards On Sun, Jun 19, 2011 at 11:29 PM, Ranveer ranveer.s...@gmail.com wrote: Please help, I am also in the same situation. Regards On Sunday 19 June 2011 12:59 PM, Jonty Rhods wrote: Dear

score of Infinity on dismax query

2011-06-19 Thread Chris Book
Hello, I have a Solr search server running, and in at least one very rare case I'm seeing a strange scoring result. The following example will cause Solr to return a score of Infinity: Query: {!dismax tie=0.1 qf=lyrics pf=lyrics ps=5}drugs the drugs Here is the debug output: Infinity = (MATCH)

Re: Why are not query keywords treated as a set?

2011-06-19 Thread Gabriele Kahlout
<str name="rawquerystring">past past</str> <str name="querystring">past past</str> <str name="parsedquery">content:past content:past</str> I was expecting the query to get parsed into content:past only and not content:past content:past. On Mon, Jun 20, 2011 at 12:12 AM, lee carroll

Re: score of Infinity on dismax query

2011-06-19 Thread Robert Muir
This is a bug, thanks for including all the information necessary to reproduce! https://issues.apache.org/jira/browse/LUCENE-3215 On Sun, Jun 19, 2011 at 10:24 PM, Chris Book chrisb...@gmail.com wrote: Hello, I have a solr search server running and in at least one very rare case, I'm seeing a

Re: solr highlighting feature

2011-06-19 Thread Romi
Yes, I find the title in the highlighting section. If I am getting results by, say, parsing a JSON object, do I need to parse the highlighting section too? Thanks, Regards, Romi