Re: Good way of Indexing TextFiles

2008-03-12 Thread Sebastin
Hi All, I tried one indexing strategy: 1. I have unique numbers as the search column, e.g. my search query should be 9840836588 AND dateSc:[13/03/2008 TO 16/03/2008]. While indexing the numbers I divide the number by 3 9840836588%3 = 26588 creating a fo

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-12 Thread Daniel Noll
On Thursday 13 March 2008 00:42:59 Erick Erickson wrote: > I certainly found that lazy loading changed my speed dramatically, but > that was on a particularly field-heavy index. > > I wonder if TermEnum/TermDocs would be fast enough on an indexed > (UN_TOKENIZED???) field for a unique id. > > Mostl

Re: indexing api wrt Analyzer

2008-03-12 Thread Daniel Noll
On Thursday 13 March 2008 15:21:19 Asgeir Frimannsson wrote: > >I was hoping to have IndexWriter take an AnalyzerFactory, where the > > AnalyzerFactory produces Analyzer depending on some criteria of the > > document, e.g. language. > With PerFieldAnalyzerWrapper, you can specify which analyze
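
The AnalyzerFactory idea quoted above can be sketched roughly as follows. This is plain Python, not Lucene's API: `AnalyzerFactory`, `for_document`, `index_document`, and both toy analyzers are hypothetical names introduced only for illustration, and the document is assumed to carry its language in a `language` field.

```python
# Hypothetical sketch: pick an analyzer per document from its language field.
# None of these names come from Lucene; they only illustrate the idea.

def lowercase_analyzer(text):
    """Toy analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def whitespace_analyzer(text):
    """Toy analyzer: split on whitespace, keep case."""
    return text.split()

class AnalyzerFactory:
    def __init__(self, by_language, default):
        self._by_language = dict(by_language)
        self._default = default

    def for_document(self, doc):
        # Fall back to the default when the language is missing or unknown.
        return self._by_language.get(doc.get("language"), self._default)

def index_document(doc, factory):
    """Analyze the document body with the analyzer chosen for this document."""
    analyzer = factory.for_document(doc)
    return analyzer(doc["body"])

factory = AnalyzerFactory({"en": lowercase_analyzer}, default=whitespace_analyzer)
print(index_document({"language": "en", "body": "Hello World"}, factory))
print(index_document({"language": "xx", "body": "Hello World"}, factory))
```

The choice is made per document here, whereas the PerFieldAnalyzerWrapper mentioned in the reply selects an analyzer per field name.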

Re: indexing api wrt Analyzer

2008-03-12 Thread Asgeir Frimannsson
On Thu, Mar 13, 2008 at 10:40 AM, John Wang <[EMAIL PROTECTED]> wrote: > Hi all: > >Maybe this has been asked before: > >I am building an index consists of multiple languages, (stored as a > field), and I have different analyzers depending on the language of the > language to be indexed. B

indexing api wrt Analyzer

2008-03-12 Thread John Wang
Hi all: Maybe this has been asked before: I am building an index consists of multiple languages, (stored as a field), and I have different analyzers depending on the language of the language to be indexed. But the IndexWriter takes only an Analyzer. I was hoping to have IndexWriter t

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-12 Thread Daniel Noll
On Wednesday 12 March 2008 19:36:57 Michael McCandless wrote: > OK, I think very likely this is the issue: when IndexWriter hits an > exception while processing a document, the portion of the document > already indexed is left in the index, and then its docID is marked > for deletion. You can see

Re: IndexReader deleteDocument

2008-03-12 Thread varun sood
No, I haven't, but I will, even though I would like to make my own implementation. So, any idea how to get the "doc num"? Thanks for replying. Varun On Wed, Mar 12, 2008 at 5:15 PM, Mark Miller <[EMAIL PROTECTED]> wrote: > Have you seen the work that Mark Harwood has done making a GWT version >

Re: Indexing Yes and No

2008-03-12 Thread Mark Miller
Well, if you're using a stopword list, "no" is likely to be on it and "yes" is not. Raq wrote: Querying Lucene with includeNews:Yes Works fine and brings back expected results.. includeNews:No Does not work and brings back nothing.. There are definitely documents in my index that has the wor
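
A minimal sketch of the effect described in this reply, in plain Python rather than Lucene's API. `STOP_WORDS`, `analyze`, and `search` are hypothetical stand-ins for a stop-word-filtering analyzer and a term lookup; the stop set is an assumed small subset chosen to contain "no" but not "yes".

```python
# Hypothetical stand-in for a stop-word-filtering analyzer; not Lucene's API.
STOP_WORDS = {"a", "an", "and", "the", "no", "not", "of", "to"}  # assumed subset

def analyze(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

# Index two documents by the tokens that survive analysis.
index = {}
for doc_id, value in enumerate(["Yes", "No"]):
    for token in analyze(value):
        index.setdefault(token, set()).add(doc_id)

def search(term):
    """Analyze the query term the same way, then look it up in the index."""
    tokens = analyze(term)
    return index.get(tokens[0], set()) if tokens else set()

print(search("Yes"))  # finds the document whose field value is "Yes"
print(search("No"))   # finds nothing: "no" was dropped as a stop word
```

If this is the cause, indexing the flag field without a stop-word filter (for example as a single untokenized term) should make both values searchable.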

Re: IndexReader deleteDocument

2008-03-12 Thread Mark Miller
Have you seen the work that Mark Harwood has done making a GWT version of Luke? I think it's in the latest release. varun sood wrote: Hi, I am trying to delete a document without using the hits object. What is the unique field in the index that I can use to delete the document? I am trying to

IndexReader deleteDocument

2008-03-12 Thread varun sood
Hi, I am trying to delete a document without using the hits object. What is the unique field in the index that I can use to delete the document? I am trying to make a web interface where the index can be modified, a smaller subset of what Luke does but using JSPs and servlets. To use deleteDocument(int

Indexing Yes and No

2008-03-12 Thread Raq
Querying Lucene with includeNews:Yes Works fine and brings back expected results.. includeNews:No Does not work and brings back nothing.. There are definitely documents in my index that has the word "No" in the includeNews field. Tested in Luke with all the analyzers. Any ideas? Any thought

Re: Highlighter Hits

2008-03-12 Thread Matthew Hall
I suspect you are using a different analyzer to highlight than you are using to search. A couple of things you can check: Immediately after your query simply print out hits.length, this should conclusively tell you that your query is in fact working, after that ensure that you are using the sa

cannot delete cfs files on windows

2008-03-12 Thread Ioannis Cherouvim
Hello I can index many times and delete the index files (manually). But if I search once, then the cfs file is locked and cannot be deleted. Subsequent indexings create new cfs files. Even if I undeploy the tomcat web application which holds the search code, the cfs file cannot be deleted.

Re: Highlighter Hits

2008-03-12 Thread Erick Erickson
What does your stack trace look like? I've never seen Lucene "just quit" without throwing an exception, and printStackTrace() is your friend. Or are you catching exceptions without logging them? If so, shame on you. Best Erick P.S. I can't recommend strongly enough that you get a good I

Highlighter Hits

2008-03-12 Thread JensBurkhardt
Hello everybody, I have a slight problem using Lucene's highlighter. If I have the highlighter enabled, a query creates 0 hits; if I disable the highlighter I get the hits. It seems like, when I call searcher.search() and pass my Hits hits to the highlighter function, the program quits. All prints

Re: Searching for null (empty) fields, how to use -field:[* TO *]

2008-03-12 Thread thogau
Thanks Erick, I ended up following your second suggestion. It has been a bit tricky since I had to plug into a MapConverter but it works as expected. Thanks to all. --thogau You could also think about making a filter, probably when you open your searcher. You can use TermDocs/TermEnum to fi

Re: Unique Fields

2008-03-12 Thread Erick Erickson
So, you're tokenizing the title field? If so, I don't understand how you expect this to work. Would the title "this is one order" and "is one order this" be considered identical? Would capitalization matter? Punctuation? Throwing all the terms of a title into a tokenized field and expecting some ma
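
The questions in this reply can be made concrete with a small sketch in plain Python (not Lucene's API). `token_set`, `exact_key`, and `add_if_new` are hypothetical helpers: the first mimics comparing titles by their tokenized terms, the second assumes one possible normalization (lowercasing and collapsing whitespace) for a single untokenized key.

```python
import re

def token_set(title):
    """Compare titles the way a bag of tokenized terms would: order is lost."""
    return frozenset(re.findall(r"\w+", title.lower()))

def exact_key(title):
    """Assumed normalization for one untokenized key: lowercase, collapse spaces."""
    return " ".join(title.lower().split())

a, b = "this is one order", "is one order this"
print(token_set(a) == token_set(b))  # True: same terms, different order
print(exact_key(a) == exact_key(b))  # False: word order is preserved

seen = set()

def add_if_new(title):
    """Accept a title only if its normalized key has not been seen before."""
    key = exact_key(title)
    if key in seen:
        return False
    seen.add(key)
    return True

print(add_if_new("This is  one order"))  # True: first occurrence
print(add_if_new("this is one order"))   # False: duplicate after normalization
print(add_if_new("is one order this"))   # True: different word order
```

Whether capitalization, punctuation, or word order should count as a difference is exactly the decision the normalization function encodes, so it has to be settled before duplicates can be detected.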

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-12 Thread Erick Erickson
I certainly found that lazy loading changed my speed dramatically, but that was on a particularly field-heavy index. I wonder if TermEnum/TermDocs would be fast enough on an indexed (UN_TOKENIZED???) field for a unique id. Mostly, I'm hoping you'll try this and tell me if it works so I don't have

Using Lucene from scripting language without any java coding

2008-03-12 Thread Mathieu Lecarme
Here is a POC about using Lucene, via Compass, from PHP or Python (other languages will come later), with only XML configuration, object notation, and native use of scripting language. http://blog.garambrogne.net/index.php?post/2008/03/11/Using-Compass-without-dirtying-its-hands-with-java It's

Re: Unique Fields

2008-03-12 Thread Ion Badita
The "problem" is that my unique field is a title, many terms per field. I want to make an index with titles and I don't want to have duplicates. John Erick Erickson wrote: You can easily find whether a term is in the index with TermEnum/TermDocs (I think TermEnum is all you really need). Exce

Re: Specialized XML handling in Lucene

2008-03-12 Thread Eran Sevi
Indeed it seems like a problematic way. I would also have a problem searching for documents with more than one value. If the query is something simple like "value1 AND value2", I would expect to get all xml docs with both values, but if I use the doc=element method, I won't get any result because

Re: Searching for null (empty) fields, how to use -field:[* TO *]

2008-03-12 Thread thogau
Thanks for your suggestion markmiller. When I try this query, I get both documents as hits. The one with the field having a value and also the one with the field not set... Any idea why? markrmiller wrote: > > You cannot have a purely negative query like you can in Solr. > > Try: *:* -MY_FIELD

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-12 Thread Michael McCandless
Daniel Noll wrote: I have filtered out lines in the log which indicated an exception adding the document; these occur when our Reader throws an IOException and there were so many that it bloated the file. OK, I think very likely this is the issue: when IndexWriter hits an exception whil