Re: DIH Http input bug - problem with two-level RSS walker

2008-11-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Nov 4, 2008 at 1:31 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > Thank you for the "rootEntity" tip. Does this mean that the inner loop only > walks the first item and breaks out of the loop? This is very good because it > allows me to drill down a few levels without downloading 10,000

combining negative queries and OR

2008-11-03 Thread Joe Pollard
I am trying to decide if this is a solr or a lucene problem, using solr 1.3: take this example -- (-productName:"whatever") OR (anotherField:"Johnny") I would expect to get back records that have anotherField=Johnny, but also, any records that don't have 'whatever' as the productName. However
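A hedged note on the example above: in Lucene's boolean model, a negative clause only subtracts from what the other clauses in the same group matched, so a parenthesized group containing nothing but a negative clause has nothing to subtract from. A commonly used workaround is to add a match-all term inside the group, e.g. `(*:* -productName:"whatever") OR (anotherField:"Johnny")`. The sketch below is a toy set-based model of that behaviour, not Lucene code; the documents and the helper names `term` and `group` are invented for illustration.

```python
# Toy model of boolean query semantics (NOT Lucene code; all names are made up).
DOCS = {
    1: {"productName": "whatever", "anotherField": "Johnny"},
    2: {"productName": "whatever", "anotherField": "Bob"},
    3: {"productName": "other", "anotherField": "Bob"},
}
ALL_IDS = set(DOCS)  # plays the role of *:* (match all documents)


def term(field, value):
    """Return the ids of documents whose field equals value."""
    return {doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value}


def group(positive, negative):
    """Docs matching any positive clause, minus docs matching any negative clause.

    With no positive clauses the starting set is empty, so a purely
    negative group matches nothing.
    """
    matched = set().union(*positive) if positive else set()
    for excluded in negative:
        matched -= excluded
    return matched


whatever = term("productName", "whatever")
johnny = term("anotherField", "Johnny")

# (-productName:"whatever") OR (anotherField:"Johnny")
original = group([], [whatever]) | johnny

# (*:* -productName:"whatever") OR (anotherField:"Johnny")
rewritten = group([ALL_IDS], [whatever]) | johnny

print(sorted(original))   # only the Johnny document
print(sorted(rewritten))  # Johnny plus every document that is not 'whatever'
```

In this toy data the original form returns only document 1, while the rewritten form also returns document 3, which is the result the question expects.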

Re: MySql / Solr 1.3 / Tomcat55 - Full Import for 8,5M of data >> Exception in thread "Thread-33"

2008-11-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
From the data-config.xml it is obvious that your indexing will take a lot of time. MySql has very poor join performance. It is not a very good idea to run this on a production database. I would suggest you configure another mysql server and do mysql replication to that and run the import f

Re: XML vs mysql import with DataImportHandler

2008-11-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
The attribute name is batchSize="-1" (it is case sensitive). This ensures that the MySQL driver fetches row by row http://wiki.apache.org/solr/DataImportHandlerFaq On Mon, Nov 3, 2008 at 9:17 PM, sunnyfr <[EMAIL PROTECTED]> wrote: > > Hi Shalin, > * > I would like to know if you just used batchsize

RE: SOLR Performance

2008-11-03 Thread Lance Norskog
The logistics of handling giant index files hit us before search performance. We switched to a set of indexes running inside one server (tomcat) instance with the Multicore+Distributed Search tools, with a frozen old index and a new index actively taking updates. The smaller new index takes much le

Re: Question about score...

2008-11-03 Thread Otis Gospodnetic
Hi, You could look at the scoring explanation with &debugQuery=true, and I think you'd see that this is because of the TF (term frequency) for terms blues and brothers. You can think/visualize this as "two for two" for that first hit - the field has 2 terms and both of them match your search t
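As a minimal sketch of the suggestion above, the request below adds `debugQuery=true` to the query from the original question. The host, port, and `/solr/select` path are assumptions about a default local install, and `debug_query_url` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr instance; adjust to your deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def debug_query_url(query):
    """Build a select URL that asks Solr to include the scoring explanation."""
    params = {"q": query, "debugQuery": "true"}
    return SOLR_SELECT + "?" + urlencode(params)


url = debug_query_url("content0:(blues brothers)")
print(url)
```

Opening the resulting URL in a browser should show an explanation section next to the normal results, which is where the per-term contributions to the score can be compared between hits.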

Re: Solr 1.3 Maven Artifact Problem

2008-11-03 Thread Jeff Ramsdale
Chris Hostetter wrote: : > I'm not sure if there's any reason for solr-core to declare a maven : > dependency on solr-solrj. : When creating the POMs, I had (incorrectly) assumed that the core jar does : not contain SolrJ classes, hence the dependency. I consider it a totally justifiable assump

Re: SOLR Performance

2008-11-03 Thread Mike Klaas
If you never execute any queries, a gig should be more than enough. Of course, I've never played around with a .8 billion doc corpus on one machine. -Mike On 3-Nov-08, at 2:16 PM, Alok Dhir wrote: in terms of RAM -- how to size that on the indexer? --- Alok K. Dhir Symplicity Corporation

Question about score...

2008-11-03 Thread Craig Stadler
We have one field that is a simple text field, not multivalue. content0 We are populating music, artist, song etc in one string. content0:(blues brothers) Returns : (default desc score) BluesBrothers01.mp3 Breaux_Brothers_Tiger_Rag_Blues.mp3 Blues Brothers - Theme From Rawhide V1.mp

Re: SOLR Performance

2008-11-03 Thread Otis Gospodnetic
That depends largely on your ramBufferSizeMB setting in solrconfig.xml and the memory you are willing to give to the JVM via -Xmx. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Alok Dhir <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.o

Re: SOLR Performance

2008-11-03 Thread Alok Dhir
in terms of RAM -- how to size that on the indexer? --- Alok K. Dhir Symplicity Corporation www.symplicity.com (703) 351-0200 x 8080 [EMAIL PROTECTED] On Nov 3, 2008, at 4:07 PM, Walter Underwood wrote: The indexing box can be much smaller, especially in terms of CPU. It just needs one fast th

Seeking a SOLR consultant

2008-11-03 Thread Craig Stadler
Seeking a SOLR consultant.. We have a working model web based search engine that uses SOLR/java/apache. However the relevance isn't exactly what we would like... The system was built by a contractor no longer available for work and we have tried to hack around but would prefer to hire someone w

Re: SOLR Performance

2008-11-03 Thread Walter Underwood
The indexing box can be much smaller, especially in terms of CPU. It just needs one fast thread and enough disk. wunder On 11/3/08 2:58 PM, "Alok Dhir" <[EMAIL PROTECTED]> wrote: > I was afraid of that. Was hoping not to need another big fat box like > this one... > > --- > Alok K. Dhir > Symp

Question about CoreContainer

2008-11-03 Thread Matt Mitchell
Hi, i'm using CoreContainer in jRuby. I'd like my data directory to be the standard solr-home/data. But since CoreContainer == multi-core, I need to supply a core name. Is it possible to use CoreContainer without a "core"? is it possible to set the dataDir? Also, it seems that no matter what I set

Re: SOLR Performance

2008-11-03 Thread Alok Dhir
I was afraid of that. Was hoping not to need another big fat box like this one... --- Alok K. Dhir Symplicity Corporation www.symplicity.com (703) 351-0200 x 8080 [EMAIL PROTECTED] On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote: I believe this is one of the reasons that a master/slave configu

RE: SOLR Performance

2008-11-03 Thread Feak, Todd
I believe this is one of the reasons that a master/slave configuration comes in handy. Commits to the Master don't slow down queries on the Slave. -Todd -Original Message- From: Alok Dhir [mailto:[EMAIL PROTECTED] Sent: Monday, November 03, 2008 1:47 PM To: solr-user@lucene.apache.org Su

SOLR Performance

2008-11-03 Thread Alok Dhir
We've moved past this issue by reducing date precision -- thanks to all for the help. Now we're at another problem. There is relatively constant updating of the index -- new log entries are pumped in from several applications continuously. Obviously, new entries do not appear in searches

Query on distributed search ...

2008-11-03 Thread souravm
Hi, I'm new to Solr. Here is a query on distributed search. I have huge volume of log files which I would like to search. Apart from generic test search I would also like to get statistics - say each record has a field telling request processing time and I would like to get average of processi

Re: Getting a document by primary key

2008-11-03 Thread Marc Sturlese
Hey, you are right, I'm trying to migrate my app to solr. For the moment I am using solr for the searching part of the app but I am using my own lucene app for indexing. Should have posted in the lucene forum for this trouble. Sorry about that. I am trying to use termdocs properly now. Thanks for your a

Re: DocSet: BitDocSet or HashDocSet ?

2008-11-03 Thread Mike Klaas
On 28-Oct-08, at 5:36 AM, Jérôme Etévé wrote: Hi all, In my code, I'd like to keep a subset of my 14M docs which is around 100k large. What is according to you the best option in terms of speed and memory usage ? Some basic thoughts tells me the BitDocSet should be the fastest for lookup

RE: DIH Http input bug - problem with two-level RSS walker

2008-11-03 Thread Lance Norskog
Thank you for the "rootEntity" tip. Does this mean that the inner loop only walks the first item and breaks out of the loop? This is very good because it allows me to drill down a few levels without downloading 10,000 feeds. (Public API sites tend to dislike this behavior :) The URL is wrong be

Re: Getting a document by primary key

2008-11-03 Thread Yonik Seeley
On Mon, Nov 3, 2008 at 2:49 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Is this your code or something from Solr? > That indexSearcher = new IndexSearcher(path_index) ; is very suspicious > looking. Good point... if this is a Solr plugin, then get the SolrIndexSearcher from the request obje

Re: Getting a document by primary key

2008-11-03 Thread Otis Gospodnetic
Is this your code or something from Solr? That indexSearcher = new IndexSearcher(path_index) ; is very suspicious looking. Are you creating a new IndexSearcher for every search request? If so, that's the cause of your memory problem. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nu

Re: Getting a document by primary key

2008-11-03 Thread Yonik Seeley
On Mon, Nov 3, 2008 at 2:40 PM, Marc Sturlese <[EMAIL PROTECTED]> wrote: > As hits is deprecated I tried to use termdocs and top docs... Try using searcher.getFirstMatch(t) as Jonathan is. It should be faster than Hits. > but the memory > problem never disappeared... > If I call the garbage colle

Re: Getting a document by primary key

2008-11-03 Thread Marc Sturlese
Hey there, I never run out of memory but I think the app always runs to the limit... The problem seems to be in here (searching by term): try { indexSearcher = new IndexSearcher(path_index) ; QueryParser queryParser = new QueryParser("id_field", getAnalyzer(stop

Re: Custom sort (score + custom value)

2008-11-03 Thread Yonik Seeley
On Mon, Nov 3, 2008 at 12:37 PM, George <[EMAIL PROTECTED]> wrote: > Ok Yonik, thank you. > > I've tried to execute the following query: "{!boost b=log(myrank) > defType=dismax}q" and it works great. > > Do you know if I can do the same (combine a DisjunctionMaxQuery with a > BoostedQuery) in solrc

RE: Custom sort (score + custom value)

2008-11-03 Thread Feak, Todd
Have you looked into the "bf" and "bq" arguments on the DisMaxRequestHandler? http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(dismax)#head -6862070cf279d9a09bdab971309135c7aea22fb3 -Todd -Original Message- From: George [mailto:[EMAIL PROTECTED] Sent: Monday, November 03, 200

Re: Custom sort (score + custom value)

2008-11-03 Thread George
Ok Yonik, thank you. I've tried to execute the following query: "{!boost b=log(myrank) defType=dismax}q" and it works great. Do you know if I can do the same (combine a DisjunctionMaxQuery with a BoostedQuery) in solrconfig.xml? George On Sun, Nov 2, 2008 at 3:01 PM, Yonik Seeley <[EMAIL PROTEC
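For readers sending this query from client code, here is a small sketch of how the local-params prefix quoted above can be passed as the `q` parameter with proper URL encoding. The endpoint URL is an assumed local default and `boosted_dismax_url` is a hypothetical helper; this only illustrates the request-side form and does not address the solrconfig.xml question.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed location of a local Solr instance; adjust to your deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def boosted_dismax_url(user_query, boost_function="log(myrank)"):
    """Wrap the user's query in the {!boost ...} local-params prefix from the thread."""
    q = "{!boost b=%s defType=dismax}%s" % (boost_function, user_query)
    # urlencode escapes the braces, '!', '=' and spaces so the prefix survives transport.
    return SOLR_SELECT + "?" + urlencode({"q": q})


url = boosted_dismax_url("blues brothers")
print(url)

# Decoding the query string gives back the exact text Solr will parse.
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(decoded_q)
```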

Huge increase in index size adding just 2 fields

2008-11-03 Thread Phillip Farber
Hi, We're indexing a lot of dirty OCR. So the index is really huge due to the size of the position file. We still get ok response time though with a median of 100ms. Phrase queries are a different matter obviously. But we're seeing some really large increases in index size as we add a cou

MySql / Solr 1.3 / Tomcat55 - Full Import for 8,5M of data >> Exception in thread "Thread-33"

2008-11-03 Thread sunnyfr
Hi, I've put a batchsize parameter at -1, it works fine, the point is I will monopolize the MySql's database for 10hours. And other request on it like update, or other process will be stack. And if I don't use batchsize -1 I will have an OOM error like below. I tried to put batchsize 1000 or 1 bu

Re: XML vs mysql import with DataImportHandler

2008-11-03 Thread sunnyfr
Hi Shalin, * I would like to know if you just used batchsize = -1. When I do that I use all Mysql's memory and it's a problem for the database and other process on it like update insert ... It will keep my database busy for 10hours, it's too much, is there a way to manage it differently ? Thank

Re: DataImportHandler running out of memory

2008-11-03 Thread sunnyfr
Hi, I tried batchSize =-1 but when I'm doing that I will use all mysql's memory and it's a problem for mysql's database. :s Noble Paul നോബിള്‍ नोब्ळ् wrote: > > I've moved the FAQ to a new Page > http://wiki.apache.org/solr/DataImportHandlerFaq > The DIH page is too big and editing has become

Re: Getting a document by primary key

2008-11-03 Thread Yonik Seeley
On Sun, Nov 2, 2008 at 8:09 PM, Marc Sturlese <[EMAIL PROTECTED]> wrote: > I am doing the same and I am experiencing some trouble. I get the document > data searching by term. The problem is that when I do it several times > (inside a huge for) the app starts increasing the memory use until I use

Re: Doing a fast ID only search !

2008-11-03 Thread Yonik Seeley
On Mon, Nov 3, 2008 at 6:21 AM, Kraus, Ralf | pixelhouse GmbH <[EMAIL PROTECTED]> wrote: > I have a "little" performance problem with SOLR (again). > My searches delivering 30 rows per site (for my webpage) BUT I need to get > about 500 primary keys (IDs). > Right now I am searching for my 30 compl

Re: Solr1.3 / MySql / Tomcat55 multiple delta-import inside a big full-import

2008-11-03 Thread sunnyfr
Do you have an idea ? sunnyfr wrote: > > Sorry I wasn't clear, > The stack is not on solr database or index query, stack request are on our > main database MySql, > When I do a full import to create indexes for solr, MySql honors it and > won't drive it OOM, but with a batchsize -1, it uses My

Doing a fast ID only search !

2008-11-03 Thread Kraus, Ralf | pixelhouse GmbH
Hello, I have a "little" performance problem with SOLR (again). My searches delivering 30 rows per site (for my webpage) BUT I need to get about 500 primary keys (IDs). Right now I am searching for my 30 complete rows (all fields) and another search with the setting "fl=primaryID". Unfortunat