Re: Exception on the use of dataimport.jar in Full Import Example

2008-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, May 21, 2008 at 6:27 AM, Julio Castillo <[EMAIL PROTECTED]> wrote: > I wanted to learn how to index data that I have on my dB. > I followed the instructions on the wiki page for the Data Import Handler > (Full Import Example -example-solr-home.jar). I got an exception running it > as is (se

Fetching the first 10 results and the last result

2008-05-21 Thread Tim Mahy
Hi all, is there a way to let Solr not only return the total number of found articles, but also the data of the last document when, for example, only requesting the first 10 documents? We could do this with a separate query by either letting the second query fetch 1 row from position = previous

Re: What are stopwords and protwords ???

2008-05-21 Thread Grant Ingersoll
Stopwords are commonly occurring words that don't add _much_ value to search, such as the, an, a and are usually removed during analysis. Protwords (protected words) are words that would be stemmed by the English Porter stemmer but that you do not want to be stemmed. In the end, removing stop
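To make the distinction concrete, here is a minimal Python sketch of an analysis chain. It assumes a tiny hypothetical stopword list and protected-word list (Solr itself reads these from stopwords.txt and protwords.txt), and `toy_stem` is a crude suffix stripper standing in for the Porter stemmer, not the real algorithm.

```python
# Hypothetical word lists; Solr loads its own from stopwords.txt / protwords.txt.
STOPWORDS = {"the", "an", "a", "and", "are"}
PROTWORDS = {"news"}


def toy_stem(token):
    """Crude suffix stripper used as a stand-in for a real stemmer (NOT Porter)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, split on whitespace, drop stopwords, stem unprotected tokens."""
    out = []
    for tok in text.lower().split():
        if tok in STOPWORDS:
            continue  # stopwords never reach the index
        # protected words bypass the stemmer entirely
        out.append(tok if tok in PROTWORDS else toy_stem(tok))
    return out


print(analyze("The news and the searching results"))
```

Here "news" survives intact only because it is protected; on its own, `toy_stem("news")` would reduce it to "new".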

SOLR OOM (out of memory) problem

2008-05-21 Thread gurudev
Hi, we currently host an index of approximately 12GB on 5 SOLR slave machines, which are load balanced in a cluster. At some point, after 8-10 hours, some SOLR slave gives an Out of memory error, after which it just stops responding and requires a restart, and after restart it

Re: What are stopwords and protwords ???

2008-05-21 Thread Akeel
Thank you very much for such a detailed reply. Can you please tell me how I can interact with Solr from within my Java/JSP application? I mean, how do I query the Solr instance running at localhost and get results back in the application? Do I have to change something in solrconfig.xml? Please help

Re: SOLR OOM (out of memory) problem

2008-05-21 Thread gurudev
Just to add more: The JVM heap allocated is 6GB, with an initial heap size of 2GB. We use quadro (which is 8 CPUs) Linux servers for the SOLR slaves. We use facet searches and sorting. The document cache is set to 7 million (which is the total number of documents in the index), filtercache 1 gurudev wrote: > > Hi > >

Re: What are stopwords and protwords ???

2008-05-21 Thread gurudev
Hi Akeel -Stopwords are general words of a language which, as such, do not carry any meaning in searches, like: a, an, the, where, who, am, etc. The analyzer in Lucene ignores such words and does not index them. You can also specify your own stopwords in stopwords.txt in SOLR -Protwords are the words

Re: Release date of SOLR 1.3

2008-05-21 Thread Dan Thomas
On Mon, May 19, 2008 at 2:49 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : solr release in some time, would it be worth looking at what outstanding > : issues are critical for 1.3 and perhaps pushing some over to 1.4, and > : trying to do a release soon? > > That's what is typically done whe

Re: Release date of SOLR 1.3

2008-05-21 Thread Alexander Ramos Jardim
It is difficult to say such a thing when we consider that Solr is developed by volunteers who dedicate their free time, or time as part of a working project, to Solr. I think that Solr development is giving us outstanding results. 2008/5/21 Dan Thomas <[EMAIL PROTECTED]>: > On Mon, May 19

Re: Release date of SOLR 1.3

2008-05-21 Thread Andrew Savory
Hi, 2008/5/21 Dan Thomas <[EMAIL PROTECTED]>: > One year between releases is a very long time for such a useful and > dynamic system. Are project leaders willing to (re)consider the > development process to prioritize improvements/features scope into > chunks that can be accomplished in shorter

Solr Text Vs String

2008-05-21 Thread Yerraguntla
Hi, I have an incoming field stored as both a Text and a String field in the Solr index. For the following searches, the string field returns documents (from the Solr client) but the text field does not. NAME:T - no results Name_Str:T - returns documents Similarly for the following cases - CPN*, DPS*, S, I

Re[2]: the time factor

2008-05-21 Thread JLIST
Hello Chris, > it sounds like you only attempted tweaking the boost value, and not > tweaking the function params ... you can change the curve so that really > new things get a large score increase, but older things get less of an > increase. recip(rord(creationDate),1,a,b)^w I was tweaking the
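For reference, Solr's `recip(x,m,a,b)` function computes a/(m*x + b), and `rord` gives the newest dates the smallest values. The Python sketch below evaluates that curve with m=1 for two hypothetical choices of a = b, to show how the function constants (rather than the ^w boost) control how quickly the bonus decays with age.

```python
def recip(x, m, a, b):
    """Mirror of Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)


# rord values: 1 is the newest document, larger numbers are older ones.
RORDS = (1, 10, 100, 1000)

# Two hypothetical settings with a == b, so the newest document scores close to 1.
for a in (1000.0, 10.0):
    boosts = [round(recip(r, 1, a, a), 3) for r in RORDS]
    print(f"a = b = {a:g}: {boosts}")
```

With a = b = 1000 the bonus stays above 0.9 up to rord 100 and halves at rord 1000; with a = b = 10 it already halves at rord 10, so only the very newest documents get a large increase.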

Re: Release date of SOLR 1.3

2008-05-21 Thread Umar Shah
On Wed, May 21, 2008 at 7:40 PM, Andrew Savory <[EMAIL PROTECTED]> wrote: > Hi, > > 2008/5/21 Dan Thomas <[EMAIL PROTECTED]>: > > > One year between releases is a very long time for such a useful and > > dynamic system. Are project leaders willing to (re)consider the > > development process to pr

Re: expression in an fq parameter fails

2008-05-21 Thread Daniel Papasian
Ezra Epstein wrote: storeAvailableDate:[* TO NOW] storeExpirationDate:[NOW TO *] ... This works perfectly. The only trouble is that the two date fields may actually be empty, in which case this filters out such records, and we want to include them. I think the easiest thing to do w

RE: expression in an fq parameter fails

2008-05-21 Thread Ezra Epstein
As a work-around that'd work. It means either changing the contents of the data sets or changing the schema and how data are fed to SOLR/Lucene. I'm hoping to be able to put an expression in the fq param instead, if that's supported. -Original Message- From: Daniel Papasian [mailto:[EMAI

Re: SOLR OOM (out of memory) problem

2008-05-21 Thread Otis Gospodnetic
Hi, Does this happen while a new searcher is warming up by any chance? Have you tried decreasing your document cache size? Try that... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: gurudev <[EMAIL PROTECTED]> > To: solr-user@lucene.apach

RE: Exception on the use of dataimport.jar in Full Import Example

2008-05-21 Thread Julio Castillo
Noble Paul, I took a look at the jar files included in the nightly builds and they do not include the dataimport.jar content. So, I assume then that my best approach is to download the corresponding dataimport sources used and build my own dataimport.jar? Thanks ** julio -Original Message---

RE: Exception on the use of dataimport.jar in Full Import Example

2008-05-21 Thread Julio Castillo
OK, I just downloaded the source tree and discovered that the sources for the dataimport handler are not there. I guess I have to download the SOLR-469-contrib.patch. I suppose that later the source tree will formally have a contrib directory, rather than it being a patch? Thanks ** julio -Original Me

RE: Exception on the use of dataimport.jar in Full Import Example

2008-05-21 Thread Julio Castillo
You have to excuse me here, but I can't find the contrib sources. I have nothing to apply the patch to. I used the following URL to get the SVN sources (per the website): http://svn.apache.org/repos/asf/lucene/solr/. Sorry, I'm a newbie with Solr, but intend to use it to index my data on the dB

Re: Exception on the use of dataimport.jar in Full Import Example

2008-05-21 Thread Shalin Shekhar Mangar
Hi Julio, Please download the SOLR-469.patch (not the contrib patch) from the SOLR-469 jira issue and apply it to the latest trunk code. I apologize for not keeping the example in the wiki in sync with the latest code. Please let us know here if you face a problem. On Wed, May 21, 2008 at 10:46 P

Re: What are stopwords and protwords ???

2008-05-21 Thread Shalin Shekhar Mangar
Hi Akeel, Take a look at SolrJ, which is a Java client library for Solr. It is packaged with the Solr nightly binary downloads. This can be used by your Java/JSP application to add documents or query Solr. No changes to any config files are needed. On Wed, May 21, 2008 at 5:15 PM, Akeel <[EMAIL PRO

Re: What are stopwords and protwords ???

2008-05-21 Thread Shalin Shekhar Mangar
Here's the link to wiki documentation on SolrJ http://wiki.apache.org/solr/Solrj On Wed, May 21, 2008 at 11:09 PM, Shalin Shekhar Mangar <[EMAIL PROTECTED]> wrote: > Hi Akeel, > > Take a look at SolrJ which is a Java client library for Solr. It is > packaged with the Solr nightly binary downloads

RE: expression in an fq parameter fails

2008-05-21 Thread Chris Hostetter
: I'm hoping to be able to put an expression in the fq param instead, if : that's supported. you have to invert your logic. docs that "have not yet expired, or will never expire" match the negated query for "docs expired in the past"... fq = -storeExpirationDate:[* TO NOW] -Hoss
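Why the negated form also keeps documents with an empty date can be illustrated with a small Python simulation: the range clause only matches documents that actually have a value, so negating it keeps both future expirations and missing ones. The documents and the fixed "now" below are hypothetical; the last line only shows how the fq value would be URL-encoded in a request.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def utc(y, m, d):
    return datetime(y, m, d, tzinfo=timezone.utc)


def matches_expired_range(doc, now):
    """Mimics storeExpirationDate:[* TO NOW]; a doc without the field never matches."""
    exp = doc.get("storeExpirationDate")
    return exp is not None and exp <= now


def passes_fq(doc, now):
    """Mimics fq=-storeExpirationDate:[* TO NOW]: keep whatever the range does not match."""
    return not matches_expired_range(doc, now)


NOW = utc(2008, 5, 21)  # hypothetical fixed "NOW"
DOCS = [
    {"id": "expired", "storeExpirationDate": utc(2008, 1, 1)},
    {"id": "still-valid", "storeExpirationDate": utc(2009, 1, 1)},
    {"id": "never-expires"},  # empty date field
]

print([d["id"] for d in DOCS if passes_fq(d, NOW)])

ENCODED = urlencode({"q": "*:*", "fq": "-storeExpirationDate:[* TO NOW]"})
print(ENCODED)
```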

Re: SOLR OOM (out of memory) problem

2008-05-21 Thread solr
But that means that it can't fit all documents in the cache, doesn't it? The index is 12GB and your allocated heap is 6GB... 12GB > 6GB... /Jimi Quoting gurudev <[EMAIL PROTECTED]>: Just to add more: The JVM heap allocated is 6GB with initial heap size as 2GB. We use quadro(which is 8 cpus

Re: Fetching the first 10 results and the last result

2008-05-21 Thread Mike Klaas
On 21-May-08, at 2:35 AM, Tim Mahy wrote: Hi all, is there a way to let Solr not only return the total number of found articles, but also the data of the last document when, for example, only requesting the first 10 documents? We could do this with a separate query by either letting the se
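The two-query approach Tim describes can be sketched in Python as follows, assuming the numFound value from the first response is used to compute the zero-based offset of the final hit. The helper names are hypothetical; only the `q`, `start` and `rows` request parameters are Solr's own.

```python
from typing import Optional


def first_page_params(q, rows=10):
    """Parameters for the normal first page of results."""
    return {"q": q, "start": 0, "rows": rows}


def last_hit_params(q, num_found) -> Optional[dict]:
    """Parameters for a second request that fetches only the final hit.

    num_found comes from the first response; start is zero-based.
    """
    if num_found == 0:
        return None  # nothing to fetch
    return {"q": q, "start": num_found - 1, "rows": 1}


print(first_page_params("ipod"))
print(last_hit_params("ipod", 57))
```

Both requests must use the same sort, otherwise "last" refers to a different ordering; a commit between the two calls could also shift the result.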

Re: Release date of SOLR 1.3

2008-05-21 Thread Chris Hostetter
: One year between releases is a very long time for such a useful and : dynamic system. Are project leaders willing to (re)consider the : development process to prioritize improvements/features scope into : chunks that can be accomplished in shorter time frames - say 90 days? : In my experience,

Re: SOLR OOM (out of memory) problem

2008-05-21 Thread Mike Klaas
On 21-May-08, at 4:46 AM, gurudev wrote: Just to add more: The JVM heap allocated is 6GB with initial heap size as 2GB. We use quadro(which is 8 cpus) on linux servers for SOLR slaves. We use facet searches, sorting. document cache is set to 7 million (which is total documents in index) filte

dismax handler and WordDelimiterFilterFactory

2008-05-21 Thread peter360
Hi, Let's say I have an index with two fields, f1 and f2, and queries to both are analyzed using WhitespaceTokenizerFactory and WordDelimiterFilterFactory. I use the dismax handler for queries and observed the following anomaly. Suppose I have a document with f1="american" and f2="idol". Then a s

RE: SOLR OOM (out of memory) problem

2008-05-21 Thread Yongjun Rong
I had the same problem a few weeks ago. You can try these: 1. Check the hit ratio for the caches via solr/admin/stats.jsp. If the hit ratio is very low, just disable those caches; it will save you some memory. 2. Setting -Xms and -Xmx to the same size will help improve GC performance. 3. Check wha
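Step 1 amounts to a simple ratio: the hitratio reported on stats.jsp is hits divided by lookups. A minimal Python sketch, with hypothetical counter values and a hypothetical 10% threshold for flagging a cache as a candidate to shrink or disable:

```python
def hit_ratio(hits, lookups):
    """hits / lookups, treating an unused cache as 0."""
    return hits / lookups if lookups else 0.0


# Hypothetical (hits, lookups) pairs copied by hand from solr/admin/stats.jsp
CACHES = {
    "documentCache": (1_200, 100_000),
    "filterCache": (95_000, 100_000),
}
LOW_RATIO = 0.10  # hypothetical threshold, not a Solr setting


def low_hit_caches(caches, threshold=LOW_RATIO):
    """Names of caches whose hit ratio falls below the threshold."""
    return [name for name, (hits, lookups) in caches.items()
            if hit_ratio(hits, lookups) < threshold]


for name, (hits, lookups) in CACHES.items():
    print(f"{name}: hitratio={hit_ratio(hits, lookups):.2f}")
print("candidates to shrink or disable:", low_hit_caches(CACHES))
```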

RE: SOLR OOM (out of memory) problem

2008-05-21 Thread Lance Norskog
We have had major OOM problems doing facet searches. Having 20 searches at once used up maybe 5G and one faceting request would blow up at 12. More important, when a facet request throws an OOM it seems like the memory is not released. When a normal search throws an OOM, the memory is released and

Re: SOLR OOM (out of memory) problem

2008-05-21 Thread Mike Klaas
Facet searches cache a filter per unique term for multivalued fields. There are many ways to reduce memory consumption in these scenarios, but it usually requires a case-by-case solution. -Mike On 21-May-08, at 12:08 PM, Lance Norskog wrote: We have had major OOM problems doing facet searc
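To see why this adds up quickly, here is a back-of-the-envelope Python estimate, assuming each cached filter is stored as a plain bitset with one bit per document in the index (smaller sets may be stored more compactly, so treat this as an upper-bound sketch). The 7 million document count comes from gurudev's message above; the numbers of unique terms are hypothetical.

```python
def bitset_bytes(max_doc):
    """Bytes needed for one bit per document, rounded up to whole bytes."""
    return (max_doc + 7) // 8


MAX_DOC = 7_000_000  # total documents mentioned earlier in the thread
PER_FILTER = bitset_bytes(MAX_DOC)

# Hypothetical counts of unique terms in a multivalued facet field
for unique_terms in (100, 1_000, 10_000):
    total_gib = unique_terms * PER_FILTER / 2**30
    print(f"{unique_terms:>6} unique terms -> {total_gib:.2f} GiB of cached filters")
```

Under these assumptions each filter costs about 0.83 MiB, roughly 0.8 GiB per thousand unique terms, so a facet field with ten thousand distinct values could by itself exceed a 6GB heap.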

Re: Problem getting spelling suggestions to work

2008-05-21 Thread Chris Hostetter
: That's true, but that's not the problem. The problem is that you can't call : qt=spellchecker if you redefine /select in solrconfig.xml. I was wondering : how I could add qt functionality back. If you override "/select" to bind it to a specific handler, then you lose the ability to pick a handle

Re: How to limit number of pages per domain

2008-05-21 Thread Chris Hostetter
: : I'm indexing pages from multiple domains. In any given : result set, I don't want to return more than two links : from the same domain, so that the first few pages won't : be all from the same domain. I suppose I could get more : (say, 100) pages from solr, then sort in memory in the : front-e
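The front-end approach quoted above (fetch more rows, then trim in memory) could look like this minimal Python sketch, which keeps at most two hits per host while preserving Solr's relevance order. It is a client-side stand-in, not the field collapsing patch itself; the URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def limit_per_domain(urls, max_per_domain=2):
    """Keep at most max_per_domain URLs per host, preserving input (relevance) order."""
    counts = defaultdict(int)
    kept = []
    for url in urls:
        host = urlparse(url).hostname
        if counts[host] < max_per_domain:
            counts[host] += 1
            kept.append(url)
    return kept


results = [
    "http://a.example/1", "http://a.example/2", "http://a.example/3",
    "http://b.example/1", "http://a.example/4", "http://c.example/1",
]
print(limit_per_domain(results))
```

Because surplus hits are dropped rather than demoted, the front end needs to over-fetch (say 100 rows) to still fill a page after trimming.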

Re: How to limit number of pages per domain

2008-05-21 Thread Jonathan Ariel
Sorry, but how does field collapsing work? Is there documentation about this anywhere? Thanks! On Wed, May 21, 2008 at 7:02 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > : > : I'm indexing pages from multiple domains. In any given > : result set, I don't want to return more than two links > : from

Re: How to limit number of pages per domain

2008-05-21 Thread Koji Sekiguchi
There is documentation: http://wiki.apache.org/solr/FieldCollapsing Koji Jonathan Ariel wrote: Sorry. But how field collapsing works? Is there documentation about this anywhere? Thanks!

Delete by multiple query doesn't seem to work

2008-05-21 Thread Tracy Flynn
I'm trying to exploit 'Delete by Query' with multiple IDs in the query. I'm using vanilla SOLR 1.2. My schema specifies. document_id My unique document ids are of the form 'A-xxx', 'T-xxx' and so on. The following individual delete works: curl http://work:8983/solr/update -H "Content-Type:

Re: What are stopwords and protwords ???

2008-05-21 Thread Grant Ingersoll
See http://lucene.apache.org/solr/tutorial.html. You can also see the wiki for a whole bunch of docs, including links to tutorials, etc. Also, just for future reference, please separate out questions so that they can be addressed separately, and more easily found by others in the future.

Re: Delete by multiple query doesn't seem to work

2008-05-21 Thread Shalin Shekhar Mangar
Not sure, but try using: document_id:"A-395" OR document_id:"A-1949" On Thu, May 22, 2008 at 7:46 AM, Tracy Flynn <[EMAIL PROTECTED]> wrote: > > I'm trying to exploit 'Delete by Query' with multiple IDs in the query. > > I'm using vanilla SOLR 1.2 > > My schema specifies. > > document_id > > My u
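Shalin's suggestion, quoting each ID and joining them with OR, can be generated programmatically. A minimal Python sketch that builds a `<delete>` message wrapping a `<query>` element for Solr's XML update handler; the helper name is hypothetical, and the resulting body would be POSTed to /solr/update with Content-Type text/xml, followed by a commit.

```python
import xml.etree.ElementTree as ET


def delete_by_ids_xml(field, ids):
    """Build a delete-by-query message matching any of the given IDs.

    Each ID is wrapped in double quotes (assumes IDs contain no quotes).
    """
    query = " OR ".join(f'{field}:"{doc_id}"' for doc_id in ids)
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    # encoding="unicode" returns a str with no XML declaration
    return ET.tostring(root, encoding="unicode")


print(delete_by_ids_xml("document_id", ["A-395", "A-1949"]))
```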

Re: What are stopwords and protwords ???

2008-05-21 Thread Akeel
thanks everyone On Thu, May 22, 2008 at 7:18 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > See http://lucene.apache.org/solr/tutorial.html. You can also see the > wiki for a whole bunch of docs, including links to tutorials, etc. > > Also, just for future reference, please separate out questi

Use of entities in the DataImportHandler config file

2008-05-21 Thread Julio Castillo
I'm trying to configure a document config file using the example data-config.xml mentioned in the wiki. One question I have is: when should the entity tags/nodes be nested in the XML file? The proposed example has them nested as Why didn't the example have a

Re: How to limit number of pages per domain

2008-05-21 Thread Otis Gospodnetic
Actually, the best documentation is really the comments in the JIRA issue itself. Is anyone actually using Solr with this patch? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Koji Sekiguchi <[EMAIL PROTECTED]> > To: solr-user@lucen

Re: Use of entities in the DataImportHandler config file

2008-05-21 Thread Shalin Shekhar Mangar
Hi Julio, Entities are nested when they have parent-child relationships as in a SQL Join. For example, if your product has categories, you will create an entity for products and a child entity for categories. However, if your entities are totally independent of each other, then you can keep them a

Re: Use of entities in the DataImportHandler config file

2008-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
Julio, This is to convert the 1:n and m:n relationships in a DB to multivalued fields in solr. A single sql query ends up giving a 2D matrix where each cell holds one value. It would be harder to denormalize and extract the multivalued fields from a single result set. Check the architecture to see
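Conceptually, a nested child entity turns each parent row plus its related child rows into one document with a multivalued field. The Python sketch below imitates that denormalization with hypothetical product and category rows; it illustrates the idea only and is not how DataImportHandler is implemented.

```python
from collections import defaultdict

# Hypothetical parent rows (id, name) and 1:n child rows (product_id, category)
PRODUCTS = [(1, "Laptop"), (2, "Desk"), (3, "Gift card")]
PRODUCT_CATEGORIES = [(1, "electronics"), (1, "computers"), (2, "furniture")]


def build_docs(products, product_categories):
    """Join child rows onto their parent, producing one document per product."""
    categories = defaultdict(list)
    for product_id, category in product_categories:
        categories[product_id].append(category)
    return [
        # "category" plays the role of a multivalued Solr field
        {"id": pid, "name": name, "category": categories.get(pid, [])}
        for pid, name in products
    ]


for doc in build_docs(PRODUCTS, PRODUCT_CATEGORIES):
    print(doc)
```

A single SQL join would instead return one row per (product, category) pair, repeating the product columns; grouping by the parent key, as above, is what collapses that matrix back into multivalued fields.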