Re: Negative Filtering (such as for profanity)

2007-03-12 Thread Daniel Noll
> I _think_ Lucene 2.1 (or is it trunk? I lose track) has the ability to delete all documents containing a term. Actually, it's been in IndexReader for longer than I can remember. We're still on 1.4.3 and it's in there. (The only difference in 2.1 is that it's now on IndexWriter as well.) Dan
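
To make the capability Dan mentions concrete, here is a stdlib-only toy model. It is not Lucene code: the class and its `deleteDocuments(String)` method are hypothetical and only echo the name of Lucene's `deleteDocuments(Term)`. The sketch assumes deletions are recorded as marks against document numbers rather than by physically removing documents, which is how I understand Lucene treats them until segments are merged.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy index: each term maps to the numbers of the documents
 * containing it, and deletions are recorded as bits, not physical removals.
 */
final class DeleteByTermSketch {
    private final Map<String, int[]> postings = new HashMap<>();
    private final BitSet deleted = new BitSet();
    private final int maxDoc;

    DeleteByTermSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Record which documents contain a term (stands in for indexing).
    void addPostings(String term, int... docs) {
        postings.put(term, docs);
    }

    // Mark every document containing the term as deleted and return how many
    // documents this call newly deleted.
    int deleteDocuments(String term) {
        int count = 0;
        for (int doc : postings.getOrDefault(term, new int[0])) {
            if (!deleted.get(doc)) {
                deleted.set(doc);
                count++;
            }
        }
        return count;
    }

    boolean isDeleted(int doc) {
        return deleted.get(doc);
    }

    // Live documents = all document slots minus the deleted ones.
    int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    public static void main(String[] args) {
        DeleteByTermSketch index = new DeleteByTermSketch(5);
        index.addPostings("badword", 1, 3);
        System.out.println(index.deleteDocuments("badword")); // 2
        System.out.println(index.numDocs());                  // 3
        System.out.println(index.deleteDocuments("badword")); // 0
    }
}
```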

Performance between Filter and HitCollector?

2007-03-12 Thread Antony Bowesman
There are (at least) two ways to generate a BitSet which can be used for filtering. Filter.bits(): BitSet bits = new BitSet(reader.maxDoc()); TermDocs td = reader.termDocs(new Term("field", "text")); while (td.next()) { bits.set(td.doc()); } return bits; and HitCollector.collec
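
A stdlib-only model of the two approaches Antony describes, assuming an `int[]` of document numbers can stand in for what TermDocs returns for one term. `fromPostings` mirrors the Filter.bits() loop; `fromCollector` mimics a HitCollector-style callback with a plain `IntConsumer`, which is a stand-in and not Lucene's HitCollector. Both produce the same BitSet. In the real API the collector route runs the query's scorer, so any performance gap would most likely come from that extra scoring work.

```java
import java.util.BitSet;
import java.util.function.IntConsumer;

final class BitSetFilterSketch {

    // Approach 1: walk the postings directly, like the Filter.bits() loop over TermDocs.
    static BitSet fromPostings(int maxDoc, int[] postings) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : postings) {
            bits.set(doc);
        }
        return bits;
    }

    // Approach 2: let a "search" push each hit into a callback, like collect(doc, score).
    // Here the search simply replays the same postings and scores are ignored.
    static BitSet fromCollector(int maxDoc, int[] postings) {
        BitSet bits = new BitSet(maxDoc);
        IntConsumer collector = bits::set;
        for (int doc : postings) {
            collector.accept(doc);
        }
        return bits;
    }

    // Applying the filter: keep only candidate documents whose bit is set.
    static BitSet apply(BitSet candidates, BitSet filter) {
        BitSet result = (BitSet) candidates.clone();
        result.and(filter);
        return result;
    }

    public static void main(String[] args) {
        int[] postings = {1, 4, 7};
        BitSet a = fromPostings(10, postings);
        BitSet b = fromCollector(10, postings);
        System.out.println(a + " " + a.equals(b)); // {1, 4, 7} true
        BitSet candidates = new BitSet(10);
        candidates.set(0, 5); // documents 0..4
        System.out.println(apply(candidates, a));  // {1, 4}
    }
}
```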

Re: Possible Memory leak in CSIndexInput

2007-03-12 Thread Chris Hostetter
: : I have a question regarding the close() CSIndexInput class, why there is : no close operation defined... : Like ... : base.close() I'm not an expert on this kind of thing, but since CSIndexInput only reads from a portion of another IndexInput, closing base seems like a bad idea ... super.close()
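
A minimal sketch of the ownership rule Hoss is pointing at, using hypothetical classes rather than Lucene's CSIndexInput: a slice that reads part of a shared input should not close that input when the slice itself is closed, because sibling slices may still be reading it. Whoever opened the shared input closes it, once.

```java
import java.io.Closeable;

/** Hypothetical shared resource standing in for the compound file's underlying input. */
final class SharedBase implements Closeable {
    private final byte[] data;
    private boolean closed = false;

    SharedBase(byte[] data) {
        this.data = data;
    }

    byte readByte(long pos) {
        if (closed) {
            throw new IllegalStateException("base already closed");
        }
        return data[(int) pos];
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Hypothetical read-only view of bytes [offset, offset + length) of a SharedBase. */
final class SliceInput implements Closeable {
    private final SharedBase base;
    private final long offset;
    private final long length;
    private long pos = 0;

    SliceInput(SharedBase base, long offset, long length) {
        this.base = base;
        this.offset = offset;
        this.length = length;
    }

    byte readByte() {
        if (pos >= length) {
            throw new IndexOutOfBoundsException("read past end of slice");
        }
        return base.readByte(offset + pos++);
    }

    // Deliberately leaves base open: other slices of the same file may still
    // need it. Closing base is the job of whoever opened it.
    @Override
    public void close() {
    }

    public static void main(String[] args) {
        SharedBase base = new SharedBase(new byte[] {10, 20, 30, 40});
        SliceInput first = new SliceInput(base, 0, 2);
        SliceInput second = new SliceInput(base, 2, 2);
        first.readByte();
        first.close();
        System.out.println(second.readByte()); // 30, base is still usable
        base.close();
    }
}
```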

Re: indexing .doc

2007-03-12 Thread ashwin kumar
Thanks, regards, ashwin On 3/13/07, karl wettin <[EMAIL PROTECTED]> wrote: On 13 Mar 2007, at 04:01, ashwin kumar wrote: > can u please give any links to poi forums Sure, http://www.google.com/search?q=poi+forum -- karl > > On 3/13/07, karl wettin <[EMAIL PROTECTED]> wrote: >> >> >> 13 mar 200

Re: indexing .doc

2007-03-12 Thread karl wettin
On 13 Mar 2007, at 04:01, ashwin kumar wrote: can u please give any links to poi forums Sure, http://www.google.com/search?q=poi+forum -- karl On 3/13/07, karl wettin <[EMAIL PROTECTED]> wrote: On 13 Mar 2007, at 03:51, ashwin kumar wrote: > hi all i have successfully indexed .txt and .pdf

Re: indexing .doc

2007-03-12 Thread ashwin kumar
can u please give any links to poi forums On 3/13/07, karl wettin <[EMAIL PROTECTED]> wrote: On 13 Mar 2007, at 03:51, ashwin kumar wrote: > hi all i have successfully indexed .txt and .pdf files using > lucene . Now i > want to index word documents , Excel sheets and power point > slides .and f

Re: indexing .doc

2007-03-12 Thread karl wettin
On 13 Mar 2007, at 03:51, ashwin kumar wrote: hi all i have successfully indexed .txt and .pdf files using lucene . Now i want to index word documents , Excel sheets and power point slides .and for this i have downloaded POI api from the following link http://jakarta.apache.org/poi/ can some

indexing .doc

2007-03-12 Thread ashwin kumar
Hi all, I have successfully indexed .txt and .pdf files using Lucene. Now I want to index Word documents, Excel sheets and PowerPoint slides, and for this I have downloaded the POI API from the following link: http://jakarta.apache.org/poi/ Can someone help me with sample code for indexing the a

RE: Words not found, large file indexing

2007-03-12 Thread Walker, Keith 1
That worked for me too. Thanks! -----Original Message----- From: Steffen Heinrich [mailto:[EMAIL PROTECTED]] Sent: Friday, March 09, 2007 1:39 PM To: java-user@lucene.apache.org Subject: Re: Words not found, large file indexing Hello Chris, this is incredible! I'm new to Lucene and did just su

RE: Wildcard query with untokenized punctuation

2007-03-12 Thread Chris Hostetter
: You're entirely correct about the analyzer (I'm using one that breaks on : non-alphanumeric characters, so all punctuation is ignored). To be : honest, I hadn't thought about altering this, but I guess I could; I'm just : reluctant because there might be unforeseen consequences. This is where the PerF

Re: date range querys

2007-03-12 Thread Chris Hostetter
: I suspect that if you stored your dates (use DateTools) as strings : with a resolution of a day you'd get much faster queries, assuming : that this is fine enough for your app. ...especially if you use ConstantScoreRangeQuery ... if you find that isn't fast enough, having the various granularitie
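
A stdlib sketch of why day-resolution strings help: every timestamp within the same day collapses into one term, so a range query has fewer terms to enumerate, and fixed-width `yyyyMMdd` strings sort in date order, so a range check is a plain string comparison. The `yyyyMMdd` shape and the UTC normalization are assumptions meant to resemble DateTools output at day resolution; the DateTools javadoc is the authority on the exact format.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class DayResolutionDates {
    // Fixed-width yyyyMMdd: the shape assumed here for day-resolution terms.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Normalize to UTC first, then drop the time of day.
    static String toDayString(ZonedDateTime t) {
        return t.withZoneSameInstant(ZoneOffset.UTC).toLocalDate().format(DAY);
    }

    // Inclusive range test by plain string comparison. This only works because
    // the strings are fixed-width, zero-padded and most-significant-field first.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // How many distinct index terms a set of timestamps would produce.
    static int distinctTerms(List<ZonedDateTime> times) {
        TreeSet<String> terms = new TreeSet<>();
        for (ZonedDateTime t : times) {
            terms.add(toDayString(t));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        ZonedDateTime a = ZonedDateTime.of(2007, 3, 12, 9, 15, 0, 0, ZoneOffset.UTC);
        ZonedDateTime b = ZonedDateTime.of(2007, 3, 12, 17, 40, 0, 0, ZoneOffset.UTC);
        ZonedDateTime c = ZonedDateTime.of(2007, 3, 13, 8, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(toDayString(a));                                  // 20070312
        System.out.println(distinctTerms(Arrays.asList(a, b, c)));           // 2
        System.out.println(inRange(toDayString(c), "20070301", "20070312")); // false
    }
}
```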

Re: Query String for a phrase?

2007-03-12 Thread Chris Hostetter
: ok, so does that mean i can use both q1 and q2 for phrase query, i.e. for : searching words adjacent to each other. Actually that was my only concern, : as i wanted to use q1 for phrase query, rather than q2. : Regards, Your example "q1" is not the correct syntax for a phrase query .. the correct

Possible Memory leak in CSIndexInput

2007-03-12 Thread Supriya Kumar Shyamal
Hi All, I have a question regarding the close() method of the CSIndexInput class: why is there no close operation defined... Like ... base.close() If I analyze the memory dump of our server I can see a lot of objects of type CSIndexInput. So I am not sure if I call close() on IndexSearcher it closes all in

Re: search for phrase with special chars?

2007-03-12 Thread Steven Rowe
Hi Ruchika, Are there any quote characters in your index (may the Luke be with you[1])? If not, you could just remove all quotes from your query (except the surrounding ones indicating phrase matching, of course), and things will work, as you have indicated. Which version of Lucene are you u
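
A minimal sketch of Steven's suggestion, assuming (as Ruchi's StopAnalyzer setup suggests) that quote characters never reach the index: StopAnalyzer splits on non-letters, so stripping quotes from the query text should lose nothing. `toPhrase` is a hypothetical helper, not a Lucene API; its result would then be handed to QueryParser.parse().

```java
final class PhraseQueryText {
    // Drop every ASCII double quote from the user's words, then wrap the
    // result in exactly one pair of quotes so the parser sees a single phrase.
    static String toPhrase(String words) {
        String stripped = words.replace("\"", "").trim();
        return "\"" + stripped + "\"";
    }

    public static void main(String[] args) {
        // \u201E is the German low double quote from the original query;
        // it is not an ASCII quote, so it is left in place.
        String userText = "\u201Einnere Organe\" bezeichnet";
        System.out.println(toPhrase(userText));
    }
}
```

The inner `"` after Organe is removed and only the surrounding pair remains, which is the shape QueryParser expects for a phrase.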

Re: pdf box help

2007-03-12 Thread Steven Rowe
This may help: http://www.pdfbox.org/userguide/text_extraction.html#Lucene+Integration ashwin kumar wrote: > hi all i am able to convert a pdf in to a text file using pdfbox. and this > is the code that i used > > import org.pdfbox.pdfparser.PDFParser; > import org.pdfbox.pdmodel.PDDocument; > i

Re: Scalability Issues with Indexing

2007-03-12 Thread mark harwood
As of Lucene 2.1 you can make optimal use of RAM by monitoring IndexWriter.ramSizeInBytes() and calling IndexWriter.flush() when memory is tight. This avoids the issue of trying to estimate a value for maxBufferedDocs which you think can fit into RAM. Cheers Mark - Original Message F
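
A stdlib model of the policy Mark describes: instead of guessing a maxBufferedDocs count, the caller checks buffered RAM after each add and flushes at a byte budget. `RamBudgetBuffer` and its 2-bytes-per-char size estimate are hypothetical; only the method names ramSizeInBytes() and flush() echo the IndexWriter calls named in the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory buffer with a RAM-size query and an explicit flush. */
final class RamBudgetBuffer {
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int flushes = 0;

    void addDocument(String doc) {
        pending.add(doc);
        // Crude estimate: 2 bytes per UTF-16 char. A real writer tracks actual usage.
        pendingBytes += 2L * doc.length();
    }

    long ramSizeInBytes() {
        return pendingBytes;
    }

    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // A real writer would write a new segment here; the model just clears.
        pending.clear();
        pendingBytes = 0;
        flushes++;
    }

    int flushCount() {
        return flushes;
    }

    // Caller-side policy: flush whenever buffered RAM reaches the budget,
    // however many documents that turns out to be.
    static RamBudgetBuffer indexAll(List<String> docs, long budgetBytes) {
        RamBudgetBuffer buffer = new RamBudgetBuffer();
        for (String doc : docs) {
            buffer.addDocument(doc);
            if (buffer.ramSizeInBytes() >= budgetBytes) {
                buffer.flush();
            }
        }
        return buffer;
    }

    public static void main(String[] args) {
        // Two small docs fill the 20-byte budget together; one large doc fills it alone.
        List<String> docs = Arrays.asList("aaaaa", "aaaaa", "aaaaaaaaaaaaaaa", "aaaaa");
        RamBudgetBuffer buffer = indexAll(docs, 20);
        System.out.println("flushes=" + buffer.flushCount() + ", buffered=" + buffer.ramSizeInBytes());
    }
}
```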

Re: Scalability Issues with Indexing

2007-03-12 Thread Laxmilal Menaria
I think you can try MergeFactor=1000, MaxMergeDocs=2147483647, MaxBufferedDocs=1000 --LM On 3/12/07, Harini Raghavan <[EMAIL PROTECTED]> wrote: Hi Everyone, We have been using Lucene integrated with our application for over a year now. The indexing and searching has been pretty fast until re

Scalability Issues with Indexing

2007-03-12 Thread Harini Raghavan
Hi Everyone, We have been using Lucene integrated with our application for over a year now. The indexing and searching have been pretty fast until recently. But now we are having some scalability issues. We have a job that indexes around 2 documents into the index every day. There are 2 processes

Re: ensuring search String availability in the content returned by lucene

2007-03-12 Thread Lukas Vlcek
Hi, I am not sure if I can help you a lot, but you can check how Nutch does this (although it does not do exactly what you want). See *org.apache.nutch.summary.basic.BasicSummarizer* or *org.apache.nutch.summary.lucene.LuceneSummarizer*. You should also check the Highlighter API ( http://lucene.apac

Re: Lucene Indexing - Getting Hit words in a query

2007-03-12 Thread mark harwood
>> where can i find such examples In the source distribution here: http://apache.rmplc.co.uk/lucene/java/lucene-2.1.0-src.zip See the "HighlighterTest.java" file Alternatively, read the Javadocs example for the Highlighter package here http://lucene.apache.org/java/docs/api/ - Original
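
The contrib Highlighter is the proper tool here; what follows is only a naive stdlib stand-in showing the idea of finding which query words hit a piece of text and marking them. It assumes case-insensitive matching on runs of letters or digits, with no analyzer, stemming or phrase positions, so it will disagree with Lucene wherever analysis differs. `NaiveHighlighter` is a hypothetical name, not part of the Highlighter package.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NaiveHighlighter {
    // A "word" is a run of letters or a run of digits: a crude stand-in for an Analyzer.
    private static final Pattern WORD = Pattern.compile("\\p{L}+|\\p{N}+");

    // Wrap each word whose lowercase form is a query term in <b>...</b>.
    static String highlight(String text, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String word = m.group();
            if (queryTerms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append("<b>").append(word).append("</b>");
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    // The query terms that actually occur in the text, i.e. the "hit" words.
    static Set<String> hitTerms(String text, Set<String> queryTerms) {
        Set<String> hits = new TreeSet<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            if (queryTerms.contains(word)) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("lucene", "search", "missing"));
        String text = "Lucene is a search library.";
        System.out.println(highlight(text, terms)); // <b>Lucene</b> is a <b>search</b> library.
        System.out.println(hitTerms(text, terms));  // [lucene, search]
    }
}
```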

Re: Lucene Indexing - Getting Hit words in a query

2007-03-12 Thread Chaminda Amarasinghe
Thanks, mark harwood. I want something like the Highlighter; where can I find such examples? Regards Chaminda mark harwood <[EMAIL PROTECTED]> wrote: >> Why is nobody answering me? Apologies for your 2 hour delay earlier this morning. The Lucene 24 hour helpdesk was tempora

Re: Lucene Indexing - Getting Hit words in a query

2007-03-12 Thread mark harwood
>> Why is nobody answering me? Apologies for your 2 hour delay earlier this morning. The Lucene 24 hour helpdesk was temporarily closed while we had a weekend/life/sleep. If you file an official complaint you will be entitled to an immediate and full refund of your support fee. As for your prob

Re: Lucene Indexing - Getting Hit words in a query

2007-03-12 Thread Chaminda Amarasinghe
Many thanks Vipin, I'll check. Vipin <[EMAIL PROTECTED]> wrote: Hi chaminda, you just go through this link http://today.java.net/pub/a/today/2005/08/09/didyoumean.html?page=1 in this articles last portion(page 3) the author has suggested a way to handle such kind of things(Composit

Re: Lucene Indexing - Getting Hit words in a query

2007-03-12 Thread Vipin
Hi Chaminda, just go through this link: http://today.java.net/pub/a/today/2005/08/09/didyoumean.html?page=1 In the last portion of this article (page 3), the author suggests a way to handle this kind of thing (Composite didyoumean parser). I think it will open up a way... Regard

Re: search for phrase with special chars?

2007-03-12 Thread ruchi thakur
I am using StopAnalyzer. Yes, something like "innere Organe bezeichnet" works fine. The „ character should not be a problem, because if I remove this character, I still get the error. I get this error as long as I have Organe\" in my query. I guess it is the double quote inside a phrase, which is

Re: search for phrase with special chars?

2007-03-12 Thread karl wettin
On 12 Mar 2007, at 08:24, ruchi thakur wrote: yes that is exactly what i am doing in java String i have something like String aSearchStr = "\"„innere Organe\\\" bezeichnet\""; Query query = parser.parse(aSearchStr); I'm not sure why you get this exception. Perhaps it has something to do with th