RE: Using multiple filters

2008-01-03 Thread Rakesh Shete
Hey thanks Eric. This should help me. --Regards Rakesh S > Date: Thu, 3 Jan 2008 13:31:32 -0500 > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: Using multiple filters > > You have to put lucene-misc.jar (which you should have in your > lucene/contrib/misc directory

lucene performance issues

2008-01-03 Thread Oscar Usifer
Folks, We're running into some performance bottleneck issues while running Lucene search against our indices (approx 1.5 GB in size after optimization), and the search query seems to block on a synchronized read as follows. Obviously we can upgrade to the latest as a first step. When Tomcat r

RE: Eclipse bundle

2008-01-03 Thread Beyer,Nathan
Taking a guess here, I think this question could be rephrased as ... Is there an OSGi bundle (or bundles) that exposes the Lucene APIs and is available in a Maven repository? For OSGi stuff, check out the Eclipse Orbit project [1], which wraps up third-party libraries into OSGi bundles and there is a Lucene

Re: boost scores with non-content based information

2008-01-03 Thread Ted Chen
Mark and Steven, thanks for the link. It's very helpful. And, sorry for overlooking Mark's reply. Ted - Original Message From: Steven A Rowe <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, January 3, 2008 12:44:14 PM Subject: RE: boost scores with non-content base

Re: Synonyms and Ranking

2008-01-03 Thread Michael Stoppelman
Hi all, Would this approach be recommended for stemmed words as well? For example, let's say the original word is 'mower'. I want matches on 'mow', 'mowing' and 'mowers', but the most relevant would obviously be matches for 'mower'. Should I index my documents unstemmed and then stem at the query wor

Re: Reading field parameters from XML

2008-01-03 Thread Michael Mitiaguin
My initial concern was not about efficiency, but that using XML to make code more generic results in far from elegant if/else checks for each of Field.Store, Field.Index, Field.TermVector. I'd prefer the suggested way: make initial preparations to create the maps and then have an assignm
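The map-based idea mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Store` enum is a hypothetical stand-in for Lucene's `Field.Store` constants so that the sketch compiles without Lucene on the classpath, and the XML attribute values `"yes"` and `"no"` are invented for the example. The same pattern would presumably repeat for `Field.Index` and `Field.TermVector`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class FieldOptionLookup {
    // Hypothetical stand-in for Lucene's Field.Store constants (assumption, not the real class).
    enum Store { YES, NO }

    // Built once up front, so per-field code is a single lookup instead of an if/else chain.
    private static final Map<String, Store> STORE_BY_NAME = new HashMap<String, Store>();
    static {
        STORE_BY_NAME.put("yes", Store.YES);
        STORE_BY_NAME.put("no", Store.NO);
    }

    // Translates an XML attribute value (assumed spelling) into the matching option.
    static Store storeFor(String xmlValue) {
        Store store = null;
        if (xmlValue != null) {
            store = STORE_BY_NAME.get(xmlValue.trim().toLowerCase(Locale.ROOT));
        }
        if (store == null) {
            throw new IllegalArgumentException("Unknown store option: " + xmlValue);
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println(storeFor("yes"));
    }
}
```

Failing fast on an unknown value keeps a typo in the XML from silently falling through to a default.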

Re: Suggested number of fields limit per Index

2008-01-03 Thread Briggs
Sorry. Anyway, back on track. On Jan 3, 2008 3:25 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : Ummm, Chris, I don't know why you posted this here. We're all on > : track as far as I can tell. Or is this a trap to say that I have > : changed the subject and am now talking about thread hija

RE: boost scores with non-content based information

2008-01-03 Thread Steven A Rowe
Hi Ted, On 01/03/2008 at 3:35 PM, Ted Chen wrote: > I'd like to make sure that my search engine can take into > account some non-content based factors. [snip] > P.S. My last email didn't get any response. Au contraire, mon frère: http://www.nabble.com/modify-search-result-scores-td14588970.h

boost scores with non-content based information

2008-01-03 Thread Ted Chen
Hi, I'd like to make sure that my search engine can take into account some non-content based factors. For example, I'd like to give a higher score to popular docs based on the number of views each document had. Another example would be to return results with search history (e.g. if we found that mos

Re: Suggested number of fields limit per Index

2008-01-03 Thread Chris Hostetter
: Ummm, Chris, I don't know why you posted this here. We're all on : track as far as I can tell. Or is this a trap to say that I have : changed the subject and am now talking about thread hijacking? But, I : suppose that would have been you. ;-) This is my standard reply to anyone (I notice) wh

Re: Doubts about indexing the localhost ROOT using Dutch 0.8.1

2008-01-03 Thread Chris Hostetter
: To: java-user@lucene.apache.org : Subject: Doubts about indexing the localhost ROOT using Dutch 0.8.1 : : Hello everyone, : : I'm seeing the tutorial about Dutch WebSearcher in this URL: : http://peterpuwang.googlepages.com/NutchGuideForDummies.htm I'm not sure what "Dutch" is, but your quest

Re: Suggested number of fields limit per Index

2008-01-03 Thread Briggs
Ummm, Chris, I don't know why you posted this here. We're all on track as far as I can tell. Or is this a trap to say that I have changed the subject and am now talking about thread hijacking? But, I suppose that would have been you. ;-) Briggs. On Jan 3, 2008 2:10 PM, Chris Hostetter <[EMAIL

Re: Eclipse bundle

2008-01-03 Thread Chris Hostetter
: Is there a mavenized Eclipse bundle that exposes the Lucene API? There might be ... it depends on what a "mavenized Eclipse bundle" is. (Googling for those keywords turns up this exact question, and a bunch of threads/docs on building Eclipse plugins with Maven ... so I'm really not sure wha

RE: Prioritize new documents

2008-01-03 Thread Seneviratne_Yasoja
IMHO it would be nice if Lucene's Similarity formula took the indexed-date of the document into account. Ideally as an optional setting, where the user can provide a date field as well. Some of the other search engines do - for example Fast's Instream. It makes sense that as documents age over t

Eclipse bundle

2008-01-03 Thread tgospodinov
Is there a mavenized Eclipse bundle that exposes the Lucene API? -- View this message in context: http://www.nabble.com/Eclipse-bundle-tp14603460p14603460.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Suggested number of fields limit per Index

2008-01-03 Thread Chris Hostetter
: Subject: Suggested number of fields limit per Index : In-Reply-To: <[EMAIL PROTECTED]> : References: <[EMAIL PROTECTED]> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing mess

Re: Doubts about indexing the localhost ROOT using Dutch 0.8.1

2008-01-03 Thread Jesiel Trevisan
Wow, I found the error: I have to put http://localhost/MyWebSite/index.jsp into the URLs txt file... Dutch does not find the JSP files starting at /ROOT/ ... I need to put /ROOT/Index.jsp .. Well, thanks anyway ;-D On Jan 3, 2008 3:44 PM, Jesiel Trevisan <[EMAIL PROTECTED]> wrote: > Hello eve

RE: Is there a mavenized Lucene bundle in the apache maven repo and what's the url?

2008-01-03 Thread tgospodinov
Is there a mavenized eclipse bundle out there? Steven Rowe wrote: > > Hi, > > It's in the global maven repo at: > > http://repo1.maven.org/maven2/org/apache/lucene/ > > The 2.2.0 core jar is at: > > http://repo1.maven.org/maven2/org/apache/lucene/lucene-core/2.2.0/ > > Steve > > On 01/03/

Re: Suggested number of fields limit per Index

2008-01-03 Thread Grant Ingersoll
My suggestion would be: An "all" field that captures all your attributes and allows for generic, easy search across all products. Additionally, go ahead and index all your fields per document. Then, for your default search, use the all field. _IF_ you know what category of products you a
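As a rough illustration of the "all" field idea, the text for such a field could be assembled by concatenating every attribute value of a product. The sketch below is plain Java with no Lucene calls; the class name, method name, and sample attributes are all hypothetical, and how the resulting string is added to a document is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AllFieldSketch {
    // Joins all non-blank attribute values with spaces to form the catch-all field text.
    static String allFieldValue(Map<String, String> attributes) {
        StringBuilder all = new StringBuilder();
        for (String value : attributes.values()) {
            if (value == null || value.trim().isEmpty()) {
                continue; // skip attributes that carry no searchable text
            }
            if (all.length() > 0) {
                all.append(' ');
            }
            all.append(value.trim());
        }
        return all.toString();
    }

    public static void main(String[] args) {
        // Sample attributes are invented for illustration only.
        Map<String, String> tv = new LinkedHashMap<String, String>();
        tv.put("brand", "Acme");
        tv.put("screenSize", "42 inch");
        System.out.println(allFieldValue(tv));
    }
}
```

Each attribute would still be indexed under its own field name as well, so category-specific searches can target individual fields while the default search uses the combined text.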

Re: Using multiple filters

2008-01-03 Thread Erick Erickson
You have to put lucene-misc.jar (which you should have in your lucene/contrib/misc directory Erick On Jan 3, 2008 10:56 AM, Rakesh Shete <[EMAIL PROTECTED]> wrote: > > Here is the link I found on googling: > > > http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/misc/ChainedFilter.html >

RE: Suggested number of fields limit per Index

2008-01-03 Thread Dai, Chunhe
Thanks to all of you who made suggestions. I greatly appreciate them. Our issue is that our data has the notion of a family; for example, a Product family could contain products like TV, Car, DVD, etc. Of course, each individual product would have its own set of definitions - which c

Re: Suggested number of fields limit per Index

2008-01-03 Thread Grant Ingersoll
Another issue is how to generate queries. If you have hundreds of fields, you may have to generate queries (e.g. using the MultiFieldQueryParser) across all those fields just to find documents that _could_ have those fields. This can lead to the dreaded TooManyClauses exception. That bei

Doubts about indexing the localhost ROOT using Dutch 0.8.1

2008-01-03 Thread Jesiel Trevisan
Hello everyone, I'm seeing the tutorial about Dutch WebSearcher in this URL: http://peterpuwang.googlepages.com/NutchGuideForDummies.htm I'm using Dutch version 0.8.1 and JVM 1.4.2. I have some doubts about it, for example: I'm trying to index http://localhost/MyWebSite/ ... but I could do it b

Re: Is there a mavenized Lucene bundle in the apache maven repo and what's the url?

2008-01-03 Thread Briggs
Yeah, I forgot to mention that the stuff on the apache site is the 2.3 development stuff. Sorry about that. Heh, I forgot that it was actually out on the maven mirrors and such. Doh! On Jan 3, 2008 11:42 AM, Steven A Rowe <[EMAIL PROTECTED]> wrote: > Hi, > > It's in the global maven repo at: > >

Re: Suggested number of fields limit per Index

2008-01-03 Thread Briggs
I'll give a quick opinion, and remember that is all it is. Without more information about the types of documents you are storing, I would say you are definitely going in the wrong direction. In my opinion, an index should describe the common attributes of all the documents it contains. You should

Re: Suggested number of fields limit per Index

2008-01-03 Thread mark harwood
One thing to watch out for is the "norms" overhead which is one byte per field, per document. These are byte arrays used in scoring to account for the length of fields in individual documents. With hundreds of fields and millions of documents this can eat up memory. The good news is you can opt
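To get a feel for the numbers, the "one byte per field, per document" rule can be turned into a quick estimate. The sketch below is plain arithmetic, not a Lucene API; the class and method names are hypothetical, and the figures of 300 fields and 10 million documents are example values chosen only to match "hundreds of fields and millions of documents".

```java
final class NormsEstimate {
    // One byte per field, per document: total bytes = fields with norms * documents.
    static long normsBytes(long fieldsWithNorms, long documents) {
        return Math.multiplyExact(fieldsWithNorms, documents);
    }

    public static void main(String[] args) {
        // Example figures only (assumed, not taken from the thread).
        long bytes = normsBytes(300L, 10000000L);
        System.out.println(bytes + " bytes, roughly " + (bytes / (1024L * 1024L)) + " MiB");
    }
}
```

With those example figures the estimate comes to 3,000,000,000 bytes, which shows how quickly the overhead grows when most fields keep their norms.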

Suggested number of fields limit per Index

2008-01-03 Thread Dai, Chunhe
I have been searching online but could not find an exact answer, and I am wondering if anyone here knows whether there is a preferred maximum number of fields in a Lucene index? We are in the process of deciding what our index would look like in our Lucene integration. For one of our approaches, we could have

RE: Is there a mavenized Lucene bundle in the apache maven repo and what's the url?

2008-01-03 Thread Steven A Rowe
Hi, It's in the global maven repo at: http://repo1.maven.org/maven2/org/apache/lucene/ The 2.2.0 core jar is at: http://repo1.maven.org/maven2/org/apache/lucene/lucene-core/2.2.0/ Steve On 01/03/2008 at 11:26 AM, tgospodinov wrote: > > I couldn't find the url to the lucene maven repo if ther

Re: Is there a mavenized Lucene bundle in the apache maven repo and what's the url?

2008-01-03 Thread Briggs
Look at the news section for December 24: http://lucene.apache.org/java/docs/index.html It's @ http://people.apache.org/maven-snapshot-repository/org/apache/lucene/ On Jan 3, 2008 11:26 AM, tgospodinov <[EMAIL PROTECTED]> wrote: > > I couldn't find the url to the lucene maven repo if there's on

Is there a mavenized Lucene bundle in the apache maven repo and what's the url?

2008-01-03 Thread tgospodinov
I couldn't find the url to the lucene maven repo if there's one. There is an old version in the global maven repo (1.4.2, I think), but I need 2.2.0. Thanks

RE: Using multiple filters

2008-01-03 Thread Rakesh Shete
Here is the link I found on googling: http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/misc/ChainedFilter.html > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: RE: Using multiple filters > Date: Thu, 3 Jan 2008 21:24:35 +0530 > > > Hi Eric, Mark, > > I am usin

RE: Using multiple filters

2008-01-03 Thread Rakesh Shete
Hi Eric, Mark, I am using Lucene 2.2.0 and I don't see anything like ChainedFilter or BooleanFilter. Is that not supported now? Googling on it I found that the ChainedFilter is in some *misc* package. So I believe it has either been dropped or not shipped with the official version. -- Regards,

Re: Example using filters

2008-01-03 Thread Erick Erickson
Filters are certainly a valid way to go. There are other approaches that have been discussed at length on the user list. Here's a link to a searchable user-list archive... http://www.gossamer-threads.com/lists/lucene/java-user/ I'm certain that there are examples there. Best Erick On Jan 3, 200

Re: Example using filters

2008-01-03 Thread tgospodinov
I am using a boolean query that is composed of wildcard queries that have * around each search term (*search* and *term*). Is there another way to achieve the same result and stay away from wildcards? Thanks for the help. Erick Erickson wrote: > > See Lucene In Action, well worth it even if it

Re: Reading field parameters from XML

2008-01-03 Thread Erick Erickson
Do you have any evidence at all that this is worth the effort? I assume that you're worried about efficiency. In my experience, this is *very* often a mis-placed concern. And "efficient" code that saves, say, even 1% of my run time is *not* worth the hours/days/weeks spent creating and *maintaining

Re: Which nutch version works on JVM 1.4.x ???

2008-01-03 Thread Jesiel Trevisan
Thanks Grant, I could not access this site this morning, but now I have the information. Thanks. On Jan 3, 2008 10:27 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Have a look at Nutch: http://lucene.apache.org/nutch > > -Grant > > On Jan 3, 2008, at 6:21 AM, Jesiel Trevisan wrote: > > > Hi ev

Re: Which nutch version works on JVM 1.4.x ???

2008-01-03 Thread Grant Ingersoll
Have a look at Nutch: http://lucene.apache.org/nutch -Grant On Jan 3, 2008, at 6:21 AM, Jesiel Trevisan wrote: Hi everyone, I need some web search functions, like a spider, so I will use the Nutch APIs. I would like to know which JVM version can run Nutch. I got the Nutch release 0

Which nutch version works on JVM 1.4.x ???

2008-01-03 Thread Jesiel Trevisan
Hi everyone, I need some web search functions, like a spider, so I will use the Nutch APIs. I would like to know which JVM version can run Nutch. I got the Nutch release 0.9 and I see that it supports only JVM 1.5. I'm using JVM 1.4.2. Does Nutch 0.8.1 work with it? Thanks.