MaxFieldLength or MaxFields?

2005-10-25 Thread Jeff Rodenburg
I'm considering building out an index that will flatten a data structure, such that some Document "A" will have Fields 1, 2 and 3. Fields 1 and 2 are indexed/tokenized fields. Field 3 is indexed, and will contain many discrete values (up to possibly 5000). Couple of questions: 1. Does the DEFAULT_MA

RE: score formula in Similarity javadoc

2005-10-25 Thread Koji Sekiguchi
The attached file was deleted by the mailing list server. The patch was: Index: src/java/org/apache/lucene/search/Similarity.java === --- src/java/org/apache/lucene/search/Similarity.java (revision 328522) +++ src/java/org/apache/lucene/se

score formula in Similarity javadoc

2005-10-25 Thread Koji Sekiguchi
Hello, I apologize if this list is not appropriate for sending a patch. It seems there is an error in the score formula in the Similarity javadoc: score(q,d) = sigma( tf * idf^2 * ... ) should be score(q,d) = sigma( tf * idf * ... ). If my understanding is correct, I would appreciate it if someone cou

Re: Another index corruption problem

2005-10-25 Thread Bill Tschumy
I hate to plead, but I really need to do my best to recover my customer's data. Does anyone have any pointers for how to manually (or programmatically) repair this corrupted index? On Oct 24, 2005, at 11:23 PM, Bill Tschumy wrote: Many months ago I wrote this list about a corrupted index t

integrating lucene with derby

2005-10-25 Thread Rick Hillegas
If you would like to participate in the discussion about integrating Lucene with the Derby database, you're welcome to add your comments to the following wiki page: http://wiki.apache.org/db-derby/LuceneIntegration. Right now, we're gathering features and use cases which people would like to se

Re: Can Lucene be Used To Substitute Real Database?

2005-10-25 Thread Chris Lu
Someone donated his code to Sourceforge. But it's pretty rudimentary. You may check it out. Chris On 10/25/05, Sam Lee <[EMAIL PROTECTED]> wrote: > ok, I will keep mysql. Would someone suggest how do I > integrate mysql with lucene so that I can use lucene > to index mysql db using free or open

Re: Can Lucene be Used To Substitute Real Database?

2005-10-25 Thread Sam Lee
ok, I will keep mysql. Could someone suggest how I can integrate mysql with lucene so that I can use lucene to index a mysql db using a free or open source solution? Someone suggested DBsight, but it's not free when you index beyond 30MB. --- Daniel Naber <[EMAIL PROTECTED]> wrote: > On Tuesday 25

Re[2]: Cross-field multi-word and query

2005-10-25 Thread Chris Hostetter
: I have n fields, for simplicity let's say 3: f1, f2, f3. : I have an AND query with m words in it, let's also simplify: w1, w2, w3. : : To cover all possible cases I should finally have the following : BooleanQuery: It really depends on what you want. If I understand what you mean in the below

Re: Non-scoring fields

2005-10-25 Thread Chris Hostetter
: > You can also use a filter to filter your results. As far as I know : > Filter does not affect the score : : Yes, but wouldn't a filter be at least a little slower in this simple case? : Perhaps I should just do a few timing tests... There is nothing intrinsic in the way Filters work that make

RE: Can Lucene be Used To Substitute Real Database?

2005-10-25 Thread Dalton, Jeffery
It depends on the application. Depending on the access pattern of your system you might be able to use Lucene. It's been done ;-). If you have very few tables with very simple relationships, it might be an answer -- perhaps not the best one though. If you want to use advanced RDBMS feature

Re: Can Lucene be Used To Substitute Real Database?

2005-10-25 Thread Chris Lu
First of all, just using Lucene to replace an RDBMS is quite possible in some specific cases. In addition to updating and string/number issues, Lucene also lacks many RDBMS functionalities. One of them is aggregation functions like SUM(), or "group by". Of course, in some cases, you may be able to get

RE: Delete doesn't delete?

2005-10-25 Thread Peter Kim
Are you using a hit collector? I think if you use a hit collector rather than the Hits object for getting query results, deleted items will still be returned as results. My workaround for this was to run optimize after I finish a batch of deletes, which works fine for my system because I only run a

Re: Can Lucene be Used To Substitute Real Database?

2005-10-25 Thread Daniel Naber
On Tuesday 25 October 2005 22:37, Sam Lee wrote: > Can Lucene be used in place of mysql so that > website visitors can input data that will in turn > insert a row into Lucene just like a mysql db? That's a bad idea. Lucene lacks a real update (you need to delete and re-add) and also sees ever

Can Lucene be Used To Substitute Real Database?

2005-10-25 Thread Sam Lee
Hi, I am wondering if I can use Lucene as a substitute for a real database like mysql? I know that many people use lucene only to index a mysql db because of mysql's inferior full-text index. Can Lucene be used in place of mysql so that website visitors can input data that will in turn insert

Re: Lucene and SAX

2005-10-25 Thread Grant Ingersoll
Sounds like you need to make your articles XML or stop trying to use an XML parser to process the file, whichever is easier for you. I don't think your issues are Lucene related. I think you need to get a better handle on the XML processing. As I suggested on your Digester thread before, I w

Re: Thread safety question

2005-10-25 Thread Erik Hatcher
Right, when you're searching or reading an index, there is no need for the client to be concerned with synchronization at all. Erik On 25 Oct 2005, at 13:19, Sharma, Siddharth wrote: Hi I have an instance (each) of IndexSearcher and StandardAnalyzer housed in a Singleton and I inte

Re: Using analyzers with term queries

2005-10-25 Thread Erik Hatcher
On 25 Oct 2005, at 09:46, Jeff Rodenburg wrote: I don't mean to take the thread off-topic, but is this the recommended approach for any of the Query objects, i.e. SpanQuery or PhraseQuery? In several applications I've built, a Query object is built via the API using an Analyzer directly and

RE: Extending 'sealed' classes & usage in comercial solution.

2005-10-25 Thread Pasha Bizhan
Hi, > From: Ivan Guzvinec [mailto:[EMAIL PROTECTED] > > Would it be ok, if we just published our modified library, or > are there any other requirements/restrictions regarding the > usage in our commercial solutions? > > Can anyone please elaborate on this issue. See http://apache.org/foun

Re: Lucene and SAX

2005-10-25 Thread Malcolm
I'm not in any way an expert, in fact far from it, but when I try to reference each article separately it complains of entities, as the XML articles are not well-formed. Thanks, MC

Re: Lucene and SAX

2005-10-25 Thread Malcolm
Hi Grant, A highly shortened version of the volume is like below. ]> IEEE Annals of the History of Computing Spring 1995 (Vol. 17, No. 1) Published by the IEEE Computer Society About this Issue &A1003; Comments, Queries, and Debate &A1004; Articles &A1006;

Re: Lucene and SAX

2005-10-25 Thread Grant Ingersoll
From what I can see, you are only passing volume.xml to your parser. If I understand your code and questions correctly, the Volume file simply points to the actual articles that you want to parse. Seems like you need to parse the Volume file, get the name/location of the article file and then

Re: Lucene and SAX

2005-10-25 Thread Malcolm
It's XML like this. It has 120-ish volumes with references to 12,107 articles which are like this below: A1003 10.1041/A1003s-1995 IEEE Annals of the History of Computing 1058-6180/95/$4.00 © 1995 IEEE Vol. 17, No. 1 Spring 1995 pp. 3-3 About this Issue pp. 3-3 J.A.N. Lee Editor‐in‐Chief The firs

Re: Lucene and SAX

2005-10-25 Thread Grant Ingersoll
I am not familiar with the INEX collection, could you post a sample? Malcolm Clark wrote: Hi again, I am desperately asking for aid!! I have used the sandbox demo to parse the INEX collection. The problem is that it points to a volume file which references 50 other XML articles. Lucene only tre

Re: Funny results with Fuzzy

2005-10-25 Thread Marc Hadfield
hello - a fuzzy query related question: have there been any other implementations of "fuzzy" queries other than edit-distance? and/or modifications of edit-distance to penalize common alternate spellings less? - i.e. "couldn't" vs. "couldnt" -- here the apostrophe would get a smaller penalt

Lucene and SAX

2005-10-25 Thread Malcolm Clark
Hi again, I am desperately asking for aid!! I have used the sandbox demo to parse the INEX collection. The problem is that it points to a volume file which references 50 other XML articles. Lucene only treats this as one document. Is there any method I'm overlooking that halts after each r

Thread safety question

2005-10-25 Thread Sharma, Siddharth
Hi I have an instance (each) of IndexSearcher and StandardAnalyzer housed in a Singleton and I intend to use this one single instance (of Searcher and Analyzer) for multiple concurrent search requests. I vaguely remember reading that I (as a client) do not have to synchronize. Lucene internals take

Re: Funny results with Fuzzy

2005-10-25 Thread mark harwood
> One thing I was thinking of doing was checking the > character frequency An alternative idea is index-time fuzzification rather than query-time. This is documented in one of the case studies in LIA - the principle is you don't index/search for whole words but use an NGram Analyzer to break them
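The n-gram idea mentioned above can be illustrated without Lucene at all. The following is a minimal sketch in plain Java, not the analyzer from the case study: the class name `NGramSketch` and both helper methods are hypothetical, and the choice of n is an assumption left to the caller.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: shows how a word is broken
// into overlapping character n-grams before indexing or searching.
final class NGramSketch {

    /** Returns the overlapping character n-grams of a word, in order. */
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    /** Counts the distinct n-grams that two words have in common. */
    static int sharedGrams(String a, String b, int n) {
        Set<String> shared = new HashSet<>(ngrams(a, n));
        shared.retainAll(new HashSet<>(ngrams(b, n)));
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("kylie", 2)); // prints [ky, yl, li, ie]
        System.out.println(sharedGrams("kylie", "klyie", 2));
    }
}
```

A misspelled query then matches a document to the extent that their n-grams overlap, so the matching cost is paid at index time rather than by enumerating similar terms at query time.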

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
Rob Young wrote: mark harwood wrote: I'd be more inclined to guess that kylie->klyie falls below the 0.5f similarity threshold you pass. Try printing out the result of fuzzyQuery.rewrite(indexReader).toString(); This will rewrite the fuzzyQuery to a BooleanQuery which explicitly lists the Term

Re: How to Integrate Lucene/Nutch with Mysql?

2005-10-25 Thread Sam Lee
I know of phpadnew, what else? But it doesn't suit my need, phpadnew is designed more for website publishers. I need something that is designed for both advertisers and website publishers. --- Stefan Groschupf <[EMAIL PROTECTED]> wrote: > BTW, there are some cool free ad servers available > as

Re: How to Integrate Lucene/Nutch with Mysql?

2005-10-25 Thread Sam Lee
No, I need to use mysql. thanks. --- Rick Hillegas <[EMAIL PROTECTED]> wrote: > Hi Sam, > > I asked a similar question yesterday, and Steven > Rowe kindly pointed me > at the following code and examples, which you can > use to integrate > Lucene with Derby: > >

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
mark harwood wrote: I'd be more inclined to guess that kylie->klyie falls below the 0.5f similarity threshold you pass. Try printing out the result of fuzzyQuery.rewrite(indexReader).toString(); This will rewrite the fuzzyQuery to a BooleanQuery which explicitly lists the TermQuery objects that

Re: Funny results with Fuzzy

2005-10-25 Thread mark harwood
I'd be more inclined to guess that kylie->klyie falls below the 0.5f similarity threshold you pass. Try printing out the result of fuzzyQuery.rewrite(indexReader).toString(); This will rewrite the fuzzyQuery to a BooleanQuery which explicitly lists the TermQuery objects that the fuzzyQuery has foun
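To reason about where a pair like kylie/klyie lands relative to a threshold, it can help to compute an edit-distance score by hand. The sketch below is plain Java and makes one loud assumption: it normalizes the Levenshtein distance by the length of the shorter word, which may not be the exact formula the library applies. The class and method names are hypothetical.

```java
// Hypothetical helper, not Lucene code. The normalization used in
// similarity() is an assumption made for illustration only.
final class FuzzySimilaritySketch {

    /** Plain Levenshtein distance: insertions, deletions, substitutions. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed score: 1 - distance / length of the shorter word. */
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0 : 0.0;
        }
        return 1.0 - (double) editDistance(a, b) / shorter;
    }

    public static void main(String[] args) {
        System.out.println(editDistance("kylie", "klyie"));
        System.out.println(similarity("kylie", "klyie"));
    }
}
```

Note that two transposed letters cost two edits under plain Levenshtein distance, which is why a simple swap can score worse than intuition suggests. Printing the rewritten query, as suggested above, remains the reliable way to see which terms actually match.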

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
Rob Young wrote: mark harwood wrote: It comes down to your choice of analyzer. Don't forget your "all" field is broken down into discrete terms by your choice of analyzer. Most often, you will want to use the same analyzer at query-time with the query parser to make sure the user's input mat

Re: Optimize/Indexing progress state - time remaining

2005-10-25 Thread Olivier Jaquemet
Thanks for the information Koji. Koji Sekiguchi wrote: Hi Olivier, This information may solve your problem, but it is only planned for Lucene 2.0: Lucene 2 Whiteboard http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard Other Changes 2. Implement a callback interface for processes which can run f

Extending 'sealed' classes & usage in comercial solution.

2005-10-25 Thread Ivan Gužvinec
Hi all, I'm under the impression that this list might also be used for questions regarding the .NET port of Lucene (Lucene.NET). I would have to extend some of the 'sealed' classes in Lucene.NET that we use extensively in our products. That would require me to edit/change the library which is

Re: How to Integrate Lucene/Nutch with Mysql?

2005-10-25 Thread Rick Hillegas
Hi Sam, I asked a similar question yesterday, and Steven Rowe kindly pointed me at the following code and examples, which you can use to integrate Lucene with Derby: For the record, Derby is the Apache open source database. It's a full-featu

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
mark harwood wrote: It comes down to your choice of analyzer. Don't forget your "all" field is broken down into discrete terms by your choice of analyzer. Most often, you will want to use the same analyzer at query-time with the query parser to make sure the user's input matches the stored doc

Re: Using analyzers with term queries

2005-10-25 Thread Jeff Rodenburg
I don't mean to take the thread off-topic, but is this the recommended approach for any of the Query objects, i.e. SpanQuery or PhraseQuery? On 10/25/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > > On 25 Oct 2005, at 07:00, Rob Young wrote: > > I am using TermQuery s (and FuzzyQuery s) on the s

Re: Funny results with Fuzzy

2005-10-25 Thread mark harwood
It comes down to your choice of analyzer. Don't forget your "all" field is broken down into discrete terms by your choice of analyzer. Most often, you will want to use the same analyzer at query-time with the query parser to make sure the user's input matches the stored document terms. If you get

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
Rob Young wrote: Erik Hatcher wrote: On 25 Oct 2005, at 07:35, Rob Young wrote: Try setting the QueryParser.setFuzzyPrefixLength to 1. That would be a great start. How would I implement that if I'm using FuzzyQuery rather than QueryParser? Use the FuzzyQuery constructor that sets this

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
Erik Hatcher wrote: On 25 Oct 2005, at 07:35, Rob Young wrote: Try setting the QueryParser.setFuzzyPrefixLength to 1. That would be a great start. How would I implement that if I'm using FuzzyQuery rather than QueryParser? Use the FuzzyQuery constructor that sets this value:

Re: Funny results with Fuzzy

2005-10-25 Thread Erik Hatcher
On 25 Oct 2005, at 07:35, Rob Young wrote: Try setting the QueryParser.setFuzzyPrefixLength to 1. That would be a great start. How would I implement that if I'm using FuzzyQuery rather than QueryParser? Use the FuzzyQuery constructor that sets this value:

Re: Using analyzers with term queries

2005-10-25 Thread Erik Hatcher
On 25 Oct 2005, at 07:00, Rob Young wrote: I am using TermQuery s (and FuzzyQuery s) on the searching side and I would like to keep doing so. However, I would like to use the MetaphoneReplacementAnalyzer (from Lucene in Action) when indexing. How can I allow for this in searching if I'm usi

Re: How to Integrate Lucene/Nutch with Mysql?

2005-10-25 Thread Stefan Groschupf
BTW, there are some cool free ad servers available as open source... On 25.10.2005 at 09:14, Sam Lee wrote: Hi, My network is designed to let a bunch of advertisers enter their ads with keywords. I am thinking of using mysql to store those, and then use lucene and part of nutch to index them fr

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
Try setting the QueryParser.setFuzzyPrefixLength to 1. That would be a great start. How would I implement that if I'm using FuzzyQuery rather than QueryParser? Cheers Rob

Re: Funny results with Fuzzy

2005-10-25 Thread mark harwood
Try setting the QueryParser.setFuzzyPrefixLength to 1. That would at least insist on the first character being correct. Cheers Mark
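The effect of a fuzzy prefix length can be pictured as a pre-filter on candidate terms: only terms that share the first few characters with the query are compared at all. The sketch below is plain Java with hypothetical names (`PrefixFilterSketch`, `sharesPrefix`, `candidates`); it illustrates the principle and is not how the library implements it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a prefix-length constraint on fuzzy matching.
final class PrefixFilterSketch {

    /** True if both words are long enough and agree on the first prefixLength characters. */
    static boolean sharesPrefix(String query, String candidate, int prefixLength) {
        if (query.length() < prefixLength || candidate.length() < prefixLength) {
            return false;
        }
        return query.regionMatches(0, candidate, 0, prefixLength);
    }

    /** Keeps only the terms that pass the prefix check, preserving order. */
    static List<String> candidates(String query, List<String> terms, int prefixLength) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (sharesPrefix(query, term, prefixLength)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("kylie", "wylie", "kyle", "ellie");
        System.out.println(candidates("klyie", terms, 1)); // prints [kylie, kyle]
    }
}
```

With a prefix length of 1, a term such as "wylie" is never considered for the query "klyie", however close it is in edit distance.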

Using analyzers with term queries

2005-10-25 Thread Rob Young
Hi, I am using TermQuery s (and FuzzyQuery s) on the searching side and I would like to keep doing so. However, I would like to use the MetaphoneReplacementAnalyzer (from Lucene in Action) when indexing. How can I allow for this in searching if I'm using TermQuery? Thanks Rob --

RE: Optimize/Indexing progress state - time remaining

2005-10-25 Thread Koji Sekiguchi
Hi Olivier, This information may solve your problem, but it is only planned for Lucene 2.0: Lucene 2 Whiteboard http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard Other Changes 2. Implement a callback interface for processes which can run for several minutes like IndexWriter.optimize(). The idea is
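The whiteboard text is cut off here, so the shape of the planned callback is not known from this message. As a rough sketch of what such a callback could look like in general, the plain-Java example below uses entirely hypothetical names (`ProgressListener`, `runSteps`, `PercentTracker`) and does not describe any Lucene API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a progress callback for a long-running job.
final class ProgressSketch {

    /** Called by the long-running job after each completed step. */
    interface ProgressListener {
        void onProgress(int done, int total);
    }

    /** Stand-in for a long operation that reports after every step. */
    static void runSteps(int total, ProgressListener listener) {
        for (int step = 1; step <= total; step++) {
            // Real work for this step would happen here.
            listener.onProgress(step, total);
        }
    }

    /** Stores the latest percentage so another thread can read it safely. */
    static final class PercentTracker implements ProgressListener {
        private final AtomicInteger percent = new AtomicInteger();

        @Override
        public void onProgress(int done, int total) {
            percent.set(total == 0 ? 100 : done * 100 / total);
        }

        int percent() {
            return percent.get();
        }
    }

    public static void main(String[] args) {
        PercentTracker tracker = new PercentTracker();
        runSteps(4, tracker);
        System.out.println(tracker.percent() + "%"); // prints 100%
    }
}
```

Keeping the latest value in an `AtomicInteger` lets a UI thread poll the percentage without locking the worker, which is roughly what the original question asks for.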

Funny results with Fuzzy

2005-10-25 Thread Rob Young
Hi, I've just set up a system with lucene to search our product database. I want to have fuzzy searching to help the many seemingly illiterate users I have. Just testing this out and the results are proving a little funny. If I search for the term klyie (hoping for kylie to be almost exclusi

Re: Non-scoring fields

2005-10-25 Thread Maik Schreiber
> You can also use a filter to filter your results. As far as I know > Filter does not affect the score Yes, but wouldn't a filter be at least a little slower in this simple case? Perhaps I should just do a few timing tests... -- Maik Schreiber * http://www.blizzy.de GPG public key: http://

Optimize/Indexing progress state - time remaining

2005-10-25 Thread Olivier Jaquemet
Hi all, Is there a nice way to get information regarding the current progress state of any/all those operations: - IndexWriter optimize - IndexWriter index - IndexReader delete For example, having a synchronized method to retrieve a percentage of completion from writer/reader in another thread

Re[2]: Cross-field multi-word and query

2005-10-25 Thread Maxim Patramanskij
Hello Chris, thanks for the tip. However, I'm not sure how I can implement the following with MaxDisjunctionQuery: I have n fields, for simplicity let's say 3: f1, f2, f3. I have an AND query with m words in it, let's also simplify: w1, w2, w3. To cover all possible cases I should finally have

How to Integrate Lucene/Nutch with Mysql?

2005-10-25 Thread Sam Lee
Hi, My network is designed to let a bunch of advertisers enter their ads with keywords. I am thinking of using mysql to store those, and then use lucene and part of nutch to index them from the mysql db, so that the websites can find and show the ads. But how do I integrate lucene/nutch with mysql?