Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread Davide
Hi, I used MoreLikeThis (in the search.similar package) of Lucene to find similar documents, but the result is 0 documents even when I index the same document several times. I don't understand why the search doesn't work... Here is the code I used:

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread mark harwood
Looks like the class defaults to only searching a field called "contents". Either: a) call setFieldNames() with null to force the class to use a list of all indexed fields derived from your IndexReader or b) call setFieldNames() with the explicit shortlist of field names you want to match on C

Re: question about custom sort method

2006-07-19 Thread Aleksey Serba
Erik, You can reproduce the OutOfMemory error easily. I've attached test files; this is an altered DistanceSortingTest example from the LIA book. You can also profile it and see the caching of distance arrays. I'll try to investigate the problem, make a patch against the trunk version (probably a non-caching option) and get back

Re: Part of Index (spezial Field) into Memory

2006-07-19 Thread neils
Hi, OK, I tried it today and it works great :-) Thanks a lot for your help. ...one last question... Is sorting not possible with this ParallelReader? I get an error. Here is my code: Private Sub LoadParallelIndex() Ram = New Lucene.Net.Store.RAMDirectory("C:\Lucene\index0_Name")

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread Davide
mark harwood wrote: > Looks like the class defaults to only searching a field called "contents". > > Either: > a) call setFieldNames() with null to force the class to use a list of all > indexed fields derived from your IndexReader > or > b) call setFieldNames() with the explicit shortlist of fi

Re: Index-Format difference between 1.4.3 and 2.0

2006-07-19 Thread lude
Hi Nicolas, thanks for answering. You wrote: And about Luke, AFAIK too, is a Lucene-2 app, so it will be able to read a 1.4 What do you mean? The Luke website states: "Current version is 0.6. It has been released on 16 Feb 2005." How can Luke be a Lucene-2 application if it was released on F

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread mark harwood
>>if (fr != null){ >>System.out.println("Parsing FileReader: " + fr); >>query = mlt.like(fr); It's not clear from your code, but "fr" isn't the same object as "fileReader", is it? If so, it could already be positioned at the end of the file, and MoreLikeThis would therefore read nothing.
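The concern about a reader that has already reached the end of the file can be illustrated without Lucene. This is only a minimal sketch using java.io; the class name ExhaustedReaderSketch and the helper drain() are made up for the illustration and are not part of MoreLikeThis.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Illustration only: a Reader that was already consumed yields nothing more. */
final class ExhaustedReaderSketch {

    /** Reads everything left in the reader and returns how many chars were seen. */
    static int drain(Reader reader) {
        char[] buf = new char[1024];
        int total = 0;
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for the FileReader from the post; the text is made up.
        Reader fr = new StringReader("some example text");
        System.out.println("first consumer read:  " + drain(fr));
        // Any later consumer of the same object starts at end of stream.
        System.out.println("second consumer read: " + drain(fr));
    }
}
```

If the same reader object was consumed earlier (for example while indexing), creating a fresh reader for the call that builds the query avoids the problem.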

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread Davide
Thanks Mark, Yes, "fr" is fileReader, but I don't think it is positioned at the end of the file, because the same problem occurs when I pass MoreLikeThis the File (C:\\Document.txt) instead of a FileReader... The same happens if I write: MoreLikeThis mlt = new MoreLikeThis(ir); Query query = mlt.like(new File(

Re: Index-Format difference between 1.4.3 and 2.0

2006-07-19 Thread Nicolas Lalevée
On Wednesday 19 July 2006 12:32, lude wrote: > Hi Nicolas, > > thanks for answering. > > You wrote: > > And about Luke, AFAIK too, is a Lucene-2 app, so it will be able to read > > a > > 1.4 > > What do you mean? > The luke website stated: "Current version is 0.6. It has been released on > 16

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread mark harwood
Does your index have only the one document? MoreLikeThis will only generate queries with terms that occur in more than "minDocFreq" (default setting is 5). This is to avoid the large overheads associated with searching for very common words in your example text. - Original Message
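The effect of a document-frequency threshold can be sketched in plain Java. This is a toy model, not the MoreLikeThis implementation: the class MinDocFreqSketch and its methods are hypothetical, documents are modelled as sets of terms, and treating the threshold as inclusive is an assumption (the post's wording, "more than", may mean an exclusive comparison).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a minimum document frequency filter; not Lucene code. */
final class MinDocFreqSketch {

    /** Number of documents in the index that contain the term. */
    static int docFreq(List<Set<String>> index, String term) {
        int count = 0;
        for (Set<String> doc : index) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** Terms of the example text that are frequent enough to be used in a query. */
    static Set<String> usableTerms(List<Set<String>> index, Set<String> exampleTerms, int minDocFreq) {
        Set<String> kept = new TreeSet<>();
        for (String term : exampleTerms) {
            // Assumption: the threshold is inclusive.
            if (docFreq(index, term) >= minDocFreq) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> doc = new TreeSet<>(Arrays.asList("lucene", "search"));
        List<Set<String>> oneDocIndex = Collections.singletonList(doc);
        // With a single indexed document every term has document frequency 1.
        System.out.println(usableTerms(oneDocIndex, doc, 5));
        System.out.println(usableTerms(oneDocIndex, doc, 1));
    }
}
```

In this model a one-document index leaves no usable terms at a threshold of 5, which would explain an empty result; lowering the threshold, or indexing enough documents that share the terms, lets them through.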

Re: Accessing "term frequency information" for documents

2006-07-19 Thread Grant Ingersoll
You should take a look at the Term Vector classes. See the "Lucene In Action" book or my talk at ApacheCon last year at http://www.cnlp.org/apachecon2005 -Grant On Jul 19, 2006, at 12:48 AM, ericbae wrote: Hello. What I want to access through Lucene is this. I search for documents by i

Lock obtain time out

2006-07-19 Thread Pasquale Imbemba
Hi, I am checking a txt file with entries against an index generated with Lucene. Of the enclosed Searcher.java class, I use the isInLex(String noun) method, i.e. I read every line of the txt file and compare using isInLex(String noun) against the index. If it's contained it returns true othe

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread Davide
mark harwood wrote: > Does your index have only the one document? > > MoreLikeThis will only generate queries with terms that occur in more than > "minDocFreq" (default setting is 5). > > This is to avoid the large overheads associated with searching for very > common words in your example tex

Re: Empty fields ...

2006-07-19 Thread Dragon Fly
My index gets rebuilt every night so I probably can afford to construct the filters right after the index is rebuilt. How do I check each document (for empty fields) though? Would I use an IndexReader to loop through the documents? If so, which method(s) in the IndexReader class should I use? ter

Lucene with Simple Database

2006-07-19 Thread Puneet Lakhina
Hi, until now I have used Lucene mainly for searching through text files. I wanted to know if it's sensible to use Lucene with a database which does not have fields with large text values, e.g. a table like id Name Address 1 Name1 name1,strreet1,city1,country1 2

Re: Empty fields ...

2006-07-19 Thread Erick Erickson
Try something like TermDocs termDocs = reader.termDocs(); termDocs.seek(new Term("", "")); while (termDocs.next()) { bits.set(termDocs.doc()); } I *think* (and I'm remembering things folks wrote, haven't done this myself) that the empty string for the Term matches all terms. If not, y

Re: Lucene with Simple Database

2006-07-19 Thread Erick Erickson
Well, it depends. Are you having performance problems with a database solution? If not, why in the world would you want to introduce another layer of complexity? Personally, while I think Lucene is great, I wouldn't recommend it in the situation you describe unless you are having problems with th

Re: Empty fields ...

2006-07-19 Thread Dragon Fly
Thank you very much. From: "Erick Erickson" <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: Empty fields ... Date: Wed, 19 Jul 2006 09:48:04 -0400 Try something like TermDocs termDocs = reader.termDocs(); termDocs.seek(new Term("",

Re: Lock obtain time out

2006-07-19 Thread Michael McCandless
> I would be grateful for some tip as this is my first approach to Lucene...
Is it your IndexSearcher instantiation that's raising the Lock obtain time out exception? Can you look in your java.io.tmpdir and see if there are any Lucene lock files present even when Lucene is not running? If s

BooleanQuery question

2006-07-19 Thread Nicolas Labrot
Hi, I have made a simple class that parses an XML boolean expression to create a predefined query. Here is an unrolled construction from an XML topic which restricts the search to the path "/bssrs" and excludes the file "abstract.htm": subsubTermQuery1 = new TermQuery(new Term("FILE", "abstract

Re: Empty fields ...

2006-07-19 Thread Chris Hostetter
: TermDocs termDocs = reader.termDocs(); : termDocs.seek(new Term("", "")); : while (termDocs.next()) { : bits.set(termDocs.doc()); : } : : I *think* (and I'm remembering things folks wrote, haven't done this myself) : that the empty string for the Term matches all terms. If not, you m

Re: BooleanQuery question

2006-07-19 Thread Chris Hostetter
: If I search with boolQuery, Lucene doesn't find anything. : If I modify by hand the query from "+(-(FILE:abstract.htm)) : +(PATH:/bssrs)" to "-(FILE:abstract.htm) +(PATH:/bssrs)", Lucene find : the correct list of document. : : Does somebody know why ? you can't have a boolean query containing

Re: BooleanQuery question

2006-07-19 Thread Nicolas Labrot
I thought this restriction only applied to a query with just a MUST_NOT clause and not to a composed query. I was wrong. Thanks a lot, Nicolas : If I search with boolQuery, Lucene doesn't find anything. : If I modify by hand the query from "+(-(FILE:abstract.htm)) : +(PATH:/bssrs)" to "-(FILE:

Re: BooleanQuery question

2006-07-19 Thread Chris Hostetter
: In my mind this restriction only apply on a query with just a MUST_NOT : clause and not to a composed query. I've wrong. right ... it's an issue for any BooleanQuery, regardless of how that query may be wrapped in other boolean queries. -Hoss --
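The difference between the two query shapes discussed in this thread can be modelled with plain sets of document ids. This is a toy sketch, not Lucene's BooleanQuery: the class BooleanSketch, its Node interface and the document ids are all made up, and the rule "no positive clause means no matches" is written in by hand to mirror the behaviour described in the posts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of required/prohibited clauses over document ids; not Lucene code. */
final class BooleanSketch {

    /** Anything that can produce the set of matching document ids. */
    interface Node {
        Set<Integer> eval();
    }

    /** A leaf matching a fixed set of documents, standing in for a TermQuery. */
    static Node term(Integer... docs) {
        final Set<Integer> ids = new TreeSet<>(Arrays.asList(docs));
        return () -> new TreeSet<>(ids);
    }

    /** A boolean node with required (+) and prohibited (-) clauses. */
    static final class Bool implements Node {
        private final List<Node> must = new ArrayList<>();
        private final List<Node> mustNot = new ArrayList<>();

        Bool must(Node n) { must.add(n); return this; }
        Bool mustNot(Node n) { mustNot.add(n); return this; }

        @Override
        public Set<Integer> eval() {
            // With no positive clause there is no candidate set to subtract from.
            if (must.isEmpty()) {
                return new TreeSet<>();
            }
            Set<Integer> result = must.get(0).eval();
            for (Node n : must.subList(1, must.size())) {
                result.retainAll(n.eval());
            }
            for (Node n : mustNot) {
                result.removeAll(n.eval());
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Node file = term(2);       // hypothetical: doc 2 is abstract.htm
        Node path = term(1, 2, 3); // hypothetical: docs 1 to 3 are under /bssrs
        // +(-(FILE:abstract.htm)) +(PATH:/bssrs): the inner node is purely negative.
        System.out.println(new Bool().must(new Bool().mustNot(file)).must(path).eval());
        // -(FILE:abstract.htm) +(PATH:/bssrs): the prohibited clause sits beside a positive one.
        System.out.println(new Bool().mustNot(file).must(path).eval());
    }
}
```

In this model the nested form matches nothing because the inner node is empty before the outer intersection is taken, while the flattened form removes the excluded file from the path matches.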

Re: Empty fields ...

2006-07-19 Thread Erick Erickson
Ok, I'm confused again, not unusual To create a bitset for the following condition Zip IS NOT NULL why invert the bitset? a token containing the empty string matches documents that contain that token Isn't this exactly what he wants? Or am I mis-reading this? I'm reading it as "any do

Re: Empty fields ...

2006-07-19 Thread Chris Hostetter
: Zip IS NOT NULL : : why invert the bitset? I think the original request was to find all docs where the field did *not* have any value ... or in your vernacular: where Zip IS NULL : a token containing the empty string matches documents that : > contain that token : > : : Isn't this exactly what
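The inversion being discussed can be shown with java.util.BitSet alone. This is a minimal sketch under the assumption that a bit is set for every document that has some value in the field; the class EmptyFieldSketch, the maxDoc value and the document ids are hypothetical.

```java
import java.util.BitSet;

/** Illustration only: derive "field IS NULL" from "field IS NOT NULL" by flipping bits. */
final class EmptyFieldSketch {

    /** Returns the documents in [0, maxDoc) whose bit is not set in hasValue. */
    static BitSet withoutValue(BitSet hasValue, int maxDoc) {
        BitSet result = (BitSet) hasValue.clone(); // leave the input untouched
        result.flip(0, maxDoc);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical index of 5 documents; docs 0, 2 and 3 have a Zip value.
        BitSet hasZip = new BitSet(5);
        hasZip.set(0);
        hasZip.set(2);
        hasZip.set(3);
        System.out.println("Zip IS NOT NULL: " + hasZip);
        System.out.println("Zip IS NULL:     " + withoutValue(hasZip, 5));
    }
}
```

Which of the two sets a filter should expose depends on whether the request is for documents with a value or without one, which is the point the two posters are sorting out.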

Re: Empty fields ...

2006-07-19 Thread Erick Erickson
Chris: Thanks much for that clarification, it helps a lot. The original request was to find docs that were NOT NULL, so I'm glad I'm not the only one who remembers things...er...incongruently with reality. About empty values for a field. That'll teach me to try to remember without looking

Re: Part of Index (spezial Field) into Memory

2006-07-19 Thread Yonik Seeley
Hmmm, Lucene.Net eh? There have been a number of bug fixes to ParallelReader over the last month or two... I wonder if they have been ported to Lucene.Net yet... (I'm not sure how it's maintained). You could file a Lucene.net bug, or make a test for the Java version of Lucene and try it out. -Yo

NFS/iSCSI SAN with Lucene

2006-07-19 Thread Peter Kim
Hi, I did a search on the Lucene list archives, found a lot of posts about the use of Lucene with NFS and how there are locking issues, but don't see anybody coming to a real solution to this. Here's the most promising thread I found: http://www.gossamer-threads.com/lists/lucene/java-user/8302?sea

Re: Lock obtain time out (&OT: Mailing list settings)

2006-07-19 Thread Pasquale Imbemba
Michael McCandless wrote:
> > I would be grateful for some tip as this is my first approach to Lucene...
> Is it your IndexSearcher instantiation that's raising the Lock obtain time out exception?
Yes that's true. I removed the lock files and that was the problem. Thanks a lot. BTW is it po

Re: Lock obtain time out (&OT: Mailing list settings)

2006-07-19 Thread Michael McCandless
> BTW is it possible to set mailinglist so to obtain my own message in the inbox, i.e. when sending to the ML, I get a copy as well as all other subscribers?
If you are subscribed to java-user then you should have received your own original message (and my response) to the list -- are you not

Re: Lock obtain time out (&OT: Mailing list settings)

2006-07-19 Thread Pasquale Imbemba
Michael McCandless wrote:
> If you are subscribed to java-user then you should have received your own original message (and my response) to the list -- are you not seeing that?
No, I didn't receive mine --just yours (and those of others of course). Pasquale

Re: NFS/iSCSI SAN with Lucene

2006-07-19 Thread Michael McCandless
> I did a search on the Lucene list archives, found a lot of posts about the use of Lucene with NFS and how there are locking issues, but don't see anybody coming to a real solution to this.
We are trying to fix this. Many people seem to hit it. The current plan is to first decouple the Lockin

Lucene support for OpenDocument?

2006-07-19 Thread marbux
Hello, The OpenDocument Fellowship attempts to maintain a directory of applications supporting OpenDocument file formats. < http://www.opendocumentfellowship.org/applicationsa>. I have been attempting, without success, to determine whether Lucene supports OpenDocument and if so to what extent, w

Re: Lucene support for OpenDocument?

2006-07-19 Thread Daniel Noll
marbux wrote: Hello, The OpenDocument Fellowship attempts to maintain a directory of applications supporting OpenDocument file formats. < http://www.opendocumentfellowship.org/applicationsa>. I have been attempting, without success, to determine whether Lucene supports OpenDocument and if so to

Query does not work past 26 characters?!

2006-07-19 Thread Michael Prichard
Tell me I am totally missing something here. I created an index w/ StandardAnalyzer with two fields as follows: Document doc = new Document(); doc.add(new Field("to", "[EMAIL PROTECTED]", Field.Store.YES, Field.Index.TOKENIZED)); doc.add(new Field("content", "blah3 blah3 blah3", Field.Store.Y

Can lucene query this result?

2006-07-19 Thread James liu
For example: $sql = "select count(*), user_group from groups where uid>0 group by user_group"; Can Lucene query this result?

Re: Can lucene query this result?

2006-07-19 Thread Otis Gospodnetic
No. Lucene is not a relational database and doesn't speak SQL. Otis - Original Message From: James liu <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, July 19, 2006 11:34:00 PM Subject: Can lucene query this result? for example: $sql = "select count(*), user_group

Re: Can lucene query this result?

2006-07-19 Thread Otis Gospodnetic
Well, you could use a range query with the "right side" of the query open/null, but this is not really what Lucene is designed for. Otis - Original Message From: Otis Gospodnetic <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, July 20, 2006 12:31:29 AM Subject: Re: C

Re: Query does not work past 26 characters?!

2006-07-19 Thread Doron Cohen
> doc.add(new Field("to", > "[EMAIL PROTECTED]", > ... > PrefixQuery pq = new PrefixQuery(new Term("to", > "[EMAIL PROTECTED]")); Perhaps a typo in the query text - Indexed text: "[EMAIL PROTECTED]" Searched text: "[EMAIL PROTECTED]" The searched text is not a prefix of the indexe
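The point about the searched text not being a prefix of the indexed text comes down to a plain string comparison. Since the addresses in the archive are redacted, the values below are invented, and the class PrefixSketch is only an illustration; it ignores whatever the analyzer does to the field at index time.

```java
/** Illustration only: a prefix search can match a term only if the term starts with the prefix. */
final class PrefixSketch {

    /** True if a term stored in the index could be matched by the given prefix. */
    static boolean wouldMatch(String indexedTerm, String queryPrefix) {
        return indexedTerm.startsWith(queryPrefix);
    }

    public static void main(String[] args) {
        String indexed = "someone@example.com"; // hypothetical indexed value
        System.out.println(wouldMatch(indexed, "someone@exam")); // a real prefix
        System.out.println(wouldMatch(indexed, "someone@exma")); // one transposition breaks it
    }
}
```

A query that works up to a certain length and fails beyond it is consistent with a typo at that position in the search text.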

Re: Can lucene query this result?

2006-07-19 Thread James liu
You misunderstand. This SQL just shows what I want to do. I have five user_groups and I want to group the results that Lucene returns. 2006/7/20, Otis Gospodnetic <[EMAIL PROTECTED]>: No. Lucene is not a relational database and doesn't speak SQL. Otis - Original Message From: James liu <[EMAIL P

Re: Can lucene query this result?

2006-07-19 Thread Heng Mei
I don't think there's an easy "built-in" way for Lucene to do this. What you can do is implement a HitCollector to process each doc hit and maintain a count for each user_group. You'll need to preload a doc_id -> user_group mapping. (Take a look at the code for FieldCacheImpl.getInts() for samp
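The counting approach described here can be sketched without Lucene on the classpath. In this sketch the class GroupCountSketch is hypothetical, the collect(doc, score) method only mirrors the shape of a hit-collector callback, and the doc id to user_group array is hand-written where a real application would preload it from the index.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy hit counter that groups hits by a preloaded doc id -> user_group mapping. */
final class GroupCountSketch {

    private final int[] docToGroup; // index = doc id, value = user_group
    private final Map<Integer, Integer> counts = new TreeMap<>();

    GroupCountSketch(int[] docToGroup) {
        this.docToGroup = docToGroup;
    }

    /** Called once per matching document; the score is ignored for counting. */
    void collect(int doc, float score) {
        counts.merge(docToGroup[doc], 1, Integer::sum);
    }

    /** Hit count per user_group, ordered by group id. */
    Map<Integer, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical mapping for six documents.
        int[] docToGroup = {1, 2, 1, 3, 2, 1};
        GroupCountSketch collector = new GroupCountSketch(docToGroup);
        // Pretend the search matched docs 0, 2, 3 and 5.
        for (int doc : new int[] {0, 2, 3, 5}) {
            collector.collect(doc, 1.0f);
        }
        System.out.println(collector.counts());
    }
}
```

Keeping the mapping in an array indexed by doc id makes each lookup a constant-time operation, which matters because the callback runs once for every hit.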