RE: Leaking org.apache.lucene.index.* objects

2007-02-09 Thread Chris Hostetter
: It's funny, but I'm having a memory leak with Hibernate that I spent the : whole of yesterday banging my head against a wall about and so when : searching for emails with Leak in the title came across your message. : I'm probably going to hit the same problem as you for long running : multi-threa

Re: Merge factor problem,

2007-02-09 Thread Daniel Naber
On Friday 09 February 2007 17:14, Sairaj Sunil wrote: > I have increased the merge factor from 10 to 50. Please try increasing setMaxBufferedDocs() instead, does that help? Regards Daniel -- http://www.danielnaber.de

Re : Re: Re : Re: Re : Re: Re : Re: Re : Re: Question concerning Analyzers

2007-02-09 Thread Xavier To
Thanks a lot for all your help. I guess this temporary fix will have to do until I have clearance to post some code. For the current index (that was last modified over a year ago), it works fine, but I know it's not properly done. Thank you all very much, especially you Mr Erickson. Xavier Tô B

Re: Reduction based "more like this"?

2007-02-09 Thread Bill Janssen
> For example, given terms "female", "John" and "London" - all 3 may > have equal IDF but should a document representing a female in London > be given equal weighting to a document representing the rarer example > of a female who happens to be called "John"? Not to mention multi-word phrase tokeni

Re: Highlighter returning incomplete field text

2007-02-09 Thread Erick Erickson
Also, there's a default of 10,000 tokens per field at index time. Erick On 2/9/07, mark harwood <[EMAIL PROTECTED]> wrote: See Highlighter.setMaxDocBytesToAnalyze(int byteCount) Its default setting is limited in order to avoid excessive response times. Cheers Mark - Original Messag

Re: Re : Re: Re : Re: Re : Re: Re : Re: Question concerning Analyzers

2007-02-09 Thread Erick Erickson
The query should be tokenized *by the query parser*. You shouldn't have to do the tokenizing yourself. When you print out the results of the parsing, you should see something like field:value1 field:value2, which are built up under the covers to be a BooleanQuery with a bunch of clauses. I think,

Re: Highlighter returning incomplete field text

2007-02-09 Thread mark harwood
See Highlighter.setMaxDocBytesToAnalyze(int byteCount) Its default setting is limited in order to avoid excessive response times. Cheers Mark - Original Message From: Fred Eaker <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, 9 February, 2007 4:28:36 PM Subject: High

Highlighter returning incomplete field text

2007-02-09 Thread Fred Eaker
Is there a limit to how many characters a Highlighter or NullFragmenter will return? I have indexed an entire HTML document (145kb). When I use the highlighter with a NullFragmenter, the getBestFragment and getBestFragments methods return the text of the field up to 51316 characters. I have tried

RE: Lucene Web Service

2007-02-09 Thread Graham Stead
Solr provides an XML interface to everything: index adds, deletes, updates, searches, highlights, explanations, facets, commits, and optimize statements. I'm sure I've forgotten some :) It also supports JSON, as well as some other formats, if you prefer that. The Solr wiki explains how it works.

Merge factor problem,

2007-02-09 Thread Sairaj Sunil
Hi all, I have increased the merge factor from 10 to 50. I thought the indexing performance would be better. But the time taken to index is more than the time taken for the merge factor of 10. The documentation and some articles say that the time taken to index will improve if the merge facto

Re: categorisation

2007-02-09 Thread Erik Hatcher
On Feb 9, 2007, at 9:13 AM, Kainth, Sachin wrote: What does solr provide and how can I use it with dotLucene? Have a 10 minute dedicated look at http://lucene.apache.org/solr - download the latest binary distribution, follow along with the tutorial. After that, you'll know almost everythi

RE: Lucene Web Service

2007-02-09 Thread Kainth, Sachin
But would it still use the Java version of Lucene? Are you saying that unlike Lucene Web Service, Solr can be used via .NET code? Do they both still use the Java version of Lucene though? Let me explain what I want to do. I want to be able to set up a dedicated machine for dotLucene so that ind

Replication of RAMDirectory across multiple WebSphere servers

2007-02-09 Thread Philip Brown
Does anybody have any experience with setting up a Lucene RAMDirectory index for replication across multiple WebSphere servers and taking advantage of WebSphere's built-in Object Cache? We are currently re-building/refreshing from the source the entire RAMDirectory index on each WebSphere server

Re: Lucene Web Service

2007-02-09 Thread Patrick Kimber
Hi You could try SOLR http://lucene.apache.org/solr/ This is obviously Java but you can access it using .NET... Hope this helps Patrick On 09/02/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: Hello all, Does anyone know if there is a .NET version of Lucene Web Service? Cheers

Lucene Web Service

2007-02-09 Thread Kainth, Sachin
Hello all, Does anyone know if there is a .NET version of Lucene Web Service? Cheers

RE: Leaking org.apache.lucene.index.* objects

2007-02-09 Thread Halsey, Stephen
Hi Otis, It's funny, but I'm having a memory leak with Hibernate that I spent the whole of yesterday banging my head against a wall about and so when searching for emails with Leak in the title came across your message. I'm probably going to hit the same problem as you for long running multi-thread

RE: categorisation

2007-02-09 Thread Kainth, Sachin
What does solr provide and how can I use it with dotLucene? -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 09 February 2007 14:11 To: java-user@lucene.apache.org Subject: Re: categorisation On Feb 9, 2007, at 9:08 AM, Kainth, Sachin wrote: > Are you saying that w

Re: categorisation

2007-02-09 Thread Erik Hatcher
On Feb 9, 2007, at 9:08 AM, Kainth, Sachin wrote: Are you saying that without solr I will have caching problems under load? no, not at all. i'm saying you'll likely reinvent a lot of what solr already provides, in order to _scale_ that is. --

RE: categorisation

2007-02-09 Thread Kainth, Sachin
Are you saying that without solr I will have caching problems under load? -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 09 February 2007 14:06 To: java-user@lucene.apache.org Subject: Re: categorisation On Feb 9, 2007, at 7:07 AM, Kainth, Sachin wrote: > But doe

Re: categorisation

2007-02-09 Thread Erik Hatcher
On Feb 9, 2007, at 7:07 AM, Kainth, Sachin wrote: But does that not imply that a second search is made against the index by the line: BitSet all = (new QueryFilter(q)).bits(reader) Yeah, if you want to return facet counts and results in the same sweep, yes. If all you want are the counts,

Re : Re: Re : Re: Re : Re: Re : Re: Question concerning Analyzers

2007-02-09 Thread Xavier To
Hey, thanks a lot for taking so much time here... I did check the and they appear to be the same... at least they are the same class and same package. I just noticed something: they are using LowerCaseFilter. I was going to say "could it be the source of the numbers being ignored?" but it shoul

RE: categorisation

2007-02-09 Thread Kainth, Sachin
But does that not imply that a second search is made against the index by the line: BitSet all = (new QueryFilter(q)).bits(reader) -Original Message- From: Kainth, Sachin [mailto:[EMAIL PROTECTED] Sent: 09 February 2007 12:05 To: java-user@lucene.apache.org Subject: RE: categorisation A

RE: categorisation

2007-02-09 Thread Kainth, Sachin
Ahhh it all makes sense to me now :-) -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 09 February 2007 12:01 To: java-user@lucene.apache.org Subject: Re: categorisation On Feb 9, 2007, at 5:40 AM, Kainth, Sachin wrote: > It makes sense to me only if you tell me th

Re: categorisation

2007-02-09 Thread Erik Hatcher
On Feb 9, 2007, at 5:40 AM, Kainth, Sachin wrote: It makes sense to me only if you tell me that all the bits in the BitSet "all" will be 1. well, ok, so the "all" may be misleading. call it queryBits instead then :) "all" means *all documents that match the query*, though. it wouldn't
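
The idea discussed in this thread (one bit set marking every document that matches the query, intersected with a bit set per category to get facet counts) can be sketched with plain java.util.BitSet. This is only an illustration under assumptions: the class FacetCounts, its method names, and the sample document ids and categories are hypothetical, and in the thread the query bits would come from (new QueryFilter(q)).bits(reader) rather than being set by hand.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts documents per category
// among the documents that matched a query.
final class FacetCounts {

    // Number of documents set in both bit sets; neither input is modified.
    static int countIntersection(BitSet queryBits, BitSet categoryBits) {
        BitSet copy = (BitSet) queryBits.clone();
        copy.and(categoryBits);
        return copy.cardinality();
    }

    // One count per category, in the iteration order of the input map.
    static Map<String, Integer> facetCounts(BitSet queryBits,
                                            Map<String, BitSet> categories) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categories.entrySet()) {
            counts.put(e.getKey(), countIntersection(queryBits, e.getValue()));
        }
        return counts;
    }

    // Builds a bit set with the given document ids set (sample data only).
    static BitSet bitsOf(int... docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        // Stand-in for the query bits: documents 0, 2 and 3 matched.
        BitSet queryBits = bitsOf(0, 2, 3);

        // Made-up categories; each bit set marks the documents in that category.
        Map<String, BitSet> categories = new LinkedHashMap<>();
        categories.put("books", bitsOf(0, 1, 2));
        categories.put("music", bitsOf(3, 4));
        categories.put("film", bitsOf(5));

        // prints {books=2, music=1, film=0}
        System.out.println(facetCounts(queryBits, categories));
    }
}
```

Cloning the query bits before calling and() keeps the original set reusable for every category, which matters because BitSet.and() modifies its receiver in place.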

Re: Reduction based "more like this"?

2007-02-09 Thread mark harwood
The distinguishing characteristics you mark out and put in a field may not be so distinguishing as more content is added to an index (e.g. use of new terminology like "podcast" becomes more prevalent). Maintaining/regenerating this field in anything other than a static index then starts to look

RE: Empty search

2007-02-09 Thread Kainth, Sachin
You are right I didn't think about it at all to be honest. -Original Message- From: karl wettin [mailto:[EMAIL PROTECTED] Sent: 09 February 2007 10:46 To: java-user@lucene.apache.org Subject: Re: Empty search On 9 Feb 2007 at 11:34, Kainth, Sachin wrote: > Yep it is the queryparser that

Re: Empty search

2007-02-09 Thread karl wettin
On 9 Feb 2007 at 11:34, Kainth, Sachin wrote: Yep it is the queryparser that I'm referring to. Just sounds odd to me. An empty string search should be handled properly I think. It should simply do nothing. I did not look any closer at this than reading your post, but what about if you made

RE: categorisation

2007-02-09 Thread Kainth, Sachin
It makes sense to me only if you tell me that all the bits in the BitSet "all" will be 1. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 08 February 2007 18:37 To: java-user@lucene.apache.org Subject: Re: categorisation On Feb 8, 2007, at 12:36 PM, Kainth, Sachin

RE: Empty search

2007-02-09 Thread Kainth, Sachin
Yep it is the queryparser that I'm referring to. Just sounds odd to me. An empty string search should be handled properly I think. It should simply do nothing. -Original Message- From: karl wettin [mailto:[EMAIL PROTECTED] Sent: 08 February 2007 18:05 To: java-user@lucene.apache.org S

Reduction based "more like this"?

2007-02-09 Thread karl wettin
I just woke up thinking it would be cool to attempt reducing the data of all documents using PCA (or so) and store the output in a new field per dimension introduced in order to find similar documents by placing a simple proximity query. Did anyone attempt something like this? I did not