RE: Design Consideration for lucene index

2006-10-06 Thread Silvy Mathews
Chris, I need to search for multiple tags that match the search phrase. These tags can have multiple images associated with them, so I am looking for the image IDs that are associated with the matching tags. Thanks for sending me the DBSight link; I will look into it. Thanks, Mathews -Original

Re: Design Consideration for lucene index

2006-10-06 Thread Chris Hostetter
The mantra I tell people when they are trying to decide how to index their "relational" data is to start by asking yourself what you want the results to be. Is the primary list of "things" you want to return to your clients a list of "tags" or a list of "images" ... It's not clear to me what the

Re: Design Consideration for lucene index

2006-10-06 Thread Chris Lu
Regarding Question #1: if you only need keyword matching on tags, you can achieve the same thing by creating a table with two fields in the database, (one tag, a list of images), to mimic Erick's answer. Lucene isn't really needed in this case. Of course, this would not help if you want to search sev
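A minimal sketch of that two-field idea, written in plain Java rather than SQL: each tag maps to the set of image IDs it is attached to, and a keyword lookup returns the union of image IDs for the matching tags. The class name, the lowercasing of tags, and the sample rows are illustrative assumptions, not part of Chris's suggestion.

```java
import java.util.*;

// Hypothetical in-memory version of a (tag, list of images) table.
public class TagImageIndex {
    private final Map<String, Set<Integer>> imagesByTag = new HashMap<>();

    // Record one (tag, image) association, e.g. one row of a tag/image join table.
    // Tags are lowercased here (an assumption) so lookups are case-insensitive.
    public void add(String tag, int imageId) {
        imagesByTag.computeIfAbsent(tag.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(imageId);
    }

    // Exact keyword matching only: each whitespace-separated query word is looked up
    // as a whole tag, and the image IDs of all matching tags are merged.
    public Set<Integer> search(String query) {
        Set<Integer> result = new TreeSet<>();
        for (String word : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            result.addAll(imagesByTag.getOrDefault(word, Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        TagImageIndex index = new TagImageIndex();
        index.add("sunset", 1);
        index.add("sunset", 7);
        index.add("beach", 7);
        index.add("beach", 9);
        System.out.println(index.search("beach sunset")); // [1, 7, 9]
    }
}
```

As Chris notes, this only covers whole-tag keyword matching; anything fuzzier is where a full-text index starts to pay off.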

Re: wildcard and span queries

2006-10-06 Thread Paul Elschot
Mark, On Friday 06 October 2006 22:46, Mark Miller wrote: > Paul's parser is beyond my feeble comprehension...but I would start by > looking at SrndTruncQuery. It looks to me like this enumerates each > possible match just like a SpanRegexQuery does...I am too lazy to figure > out what the visi

Re: wildcard and span queries

2006-10-06 Thread Mark Miller
Paul's parser is beyond my feeble comprehension...but I would start by looking at SrndTruncQuery. It looks to me like this enumerates each possible match just like a SpanRegexQuery does...I am too lazy to figure out what the visitor pattern is doing so I don't know if they then get added to a b
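To illustrate what "enumerating each possible match" means, here is a small stdlib sketch of wildcard term expansion: the pattern is converted to a regular expression and tested against every term in a hypothetical term dictionary, and the matching terms are what a span or boolean query would then be built from. This is not how SrndTruncQuery or SpanRegexQuery are implemented internally; it only shows the expansion step, and the names are illustrative.

```java
import java.util.*;
import java.util.regex.Pattern;

// Sketch of wildcard expansion: '*' matches any run of characters, '?' exactly one.
public class WildcardExpansion {

    // Translate a wildcard pattern into a regex, quoting every literal character
    // so that characters such as '.' are matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Return, in dictionary (sorted) order, every term the wildcard pattern matches.
    public static List<String> expand(String wildcard, Collection<String> terms) {
        Pattern p = toRegex(wildcard);
        List<String> matches = new ArrayList<>();
        for (String term : new TreeSet<>(terms)) {
            if (p.matcher(term).matches()) matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical term dictionary for one field.
        List<String> dictionary = Arrays.asList("joan", "john", "jon", "jones", "jonnes", "smith", "smyth");
        System.out.println(expand("jo*n", dictionary));   // [joan, john, jon]
        System.out.println(expand("sm*th", dictionary));  // [smith, smyth]
        System.out.println(expand("jon?es", dictionary)); // [jonnes]
    }
}
```

The cost concern in this thread follows directly: every matching term becomes its own clause, so a short prefix can expand into a very large query.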

Re: wildcard and span queries

2006-10-06 Thread Paul Elschot
Erick, On Friday 06 October 2006 22:01, Erick Erickson wrote: > Paul: > > Splendid! Now if I just understood a single thing about the SrndQuery family > . > > I followed your link, and took a look at the text file. That should give me > enough to get started. > > But if you wanted to e-mail me

RE: Design Consideration for lucene index

2006-10-06 Thread smathews
Thanks, Erick, for your suggestions. I suspect I am still thinking with my DB cap on. Let me look into your suggestions for question #1, and I will get back to you if I need more input. -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Friday, October 06,

Re: wildcard and span queries

2006-10-06 Thread Erick Erickson
Paul: Splendid! Now if I just understood a single thing about the SrndQuery family. I followed your link, and took a look at the text file. That should give me enough to get started. But if you wanted to e-mail me any sample code or long explanations of what this all does, I would forever be y

Re: Design Consideration for lucene index

2006-10-06 Thread Erick Erickson
If you're *sure* that your database solution isn't adequate, see below. On 10/6/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: I am a newbie to the Lucene search area. I would like to know the best way to do the following using Lucene, in terms of efficiency and the size of the index. Question : #

Design Consideration for lucene index

2006-10-06 Thread smathews
I am a newbie to the Lucene search area. I would like to know the best way to do the following using Lucene, in terms of efficiency and the size of the index. Question #1: I have a table that contains some tags. These tags are tagged against multiple images that are in a different table (potentially 20 to

Re: wildcard and span queries

2006-10-06 Thread Paul Elschot
On Friday 06 October 2006 14:37, Erick Erickson wrote: ... > Fortunately, the PM agrees that it's silly to think about span queries > involving OR or NOT for this app. So I'm left with something like Jo*n AND > sm*th AND jon?es WITHIN 6. OR works much the same as term expansion for wildcards. > T
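Once each wildcard has been expanded, the "WITHIN 6" part becomes a proximity test over term positions in a document. The sketch below assumes an unordered window measured as (last position - first position) <= 6, which is an illustrative definition, not Lucene's exact slop semantics; the class name and positions are hypothetical.

```java
import java.util.*;

// Sketch of an unordered "WITHIN n" test over per-clause position lists.
public class WithinWindow {

    // True if one position can be chosen per clause so that max - min <= window.
    public static boolean within(List<int[]> positionsPerClause, int window) {
        // The tightest selection starts at some clause position p, so try each p
        // and check that every clause has an occurrence in [p, p + window].
        for (int[] candidates : positionsPerClause) {
            for (int p : candidates) {
                boolean allPresent = true;
                for (int[] positions : positionsPerClause) {
                    boolean found = false;
                    for (int q : positions) {
                        if (q >= p && q <= p + window) { found = true; break; }
                    }
                    if (!found) { allPresent = false; break; }
                }
                if (allPresent) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical positions of the Jo*n, sm*th and jon?es expansions in one document.
        List<int[]> near = Arrays.asList(new int[]{0, 20}, new int[]{3, 25}, new int[]{5});
        List<int[]> far = Arrays.asList(new int[]{0}, new int[]{10}, new int[]{12});
        System.out.println(within(near, 6)); // true  (positions 0, 3, 5)
        System.out.println(within(far, 6));  // false (span is 12)
    }
}
```

This brute-force check is quadratic in the number of positions; it is meant to show the rule, not to be efficient.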

Re: Force a query to match at least two clauses

2006-10-06 Thread Erik Hatcher
On Oct 6, 2006, at 1:50 PM, Ryan Heinen wrote: Yonik Seeley wrote: See BooleanQuery.setMinimumNumberShouldMatch(). There isn't currently any QueryParser support, so you have to create the query programmatically. Thanks, Yonik, for your quick response; that is exactly what I was looking for. Next

Re: [BULK] Re: NPE thrown in invertDocument [RESOLVED]

2006-10-06 Thread Ryan Heinen
Daniel Naber wrote: On Thursday 28 September 2006 23:55, Ryan Heinen wrote: I am creating an index using a RAMDirectory, and am running across a situation where, when I call IndexSearcher.addDocument, it throws a NullPointerException. Could you create a small test case that reproduces this? Thi

Re: Force a query to match at least two clauses

2006-10-06 Thread Ryan Heinen
Yonik Seeley wrote: See BooleanQuery.setMinimumNumberShouldMatch(). There isn't currently any QueryParser support, so you have to create the query programmatically. Thanks, Yonik, for your quick response; that is exactly what I was looking for. Next time I'll check the docs a little more closely. I

Re: Force a query to match at least two clauses

2006-10-06 Thread Yonik Seeley
See BooleanQuery.setMinimumNumberShouldMatch(). There isn't currently any QueryParser support, so you have to create the query programmatically. -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server On 10/6/06, Ryan Heinen <[EMAIL PROTECTED]> wrote: Hello, If I want m
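For a query made only of optional term clauses, the rule behind Yonik's suggestion can be sketched without Lucene: a document qualifies when at least the minimum number of query terms occur in it. In Lucene itself you would add each TermQuery to the BooleanQuery as an optional (SHOULD) clause and pass the minimum (2 here) to setMinimumNumberShouldMatch(); the plain-Java version below only illustrates the matching rule, and its names and documents are hypothetical.

```java
import java.util.*;

// Sketch of "minimum number should match" for optional term clauses.
public class MinShouldMatch {

    // Count how many query terms occur in the document and compare to the minimum.
    public static boolean matches(Set<String> docTerms, List<String> queryTerms, int minShouldMatch) {
        int count = 0;
        for (String t : queryTerms) {
            if (docTerms.contains(t)) count++;
        }
        return count >= minShouldMatch;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("a", "b", "c", "d");
        System.out.println(matches(new HashSet<>(Arrays.asList("a", "x")), query, 2));      // false
        System.out.println(matches(new HashSet<>(Arrays.asList("a", "c", "x")), query, 2)); // true
    }
}
```

Unlike the hand-written pairwise expansion, the query stays at N clauses no matter what minimum is chosen.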

Re: ask for a question about Lucene

2006-10-06 Thread Chris Lu
It's a Java-based J2EE application. It can search any database that has a JDBC driver. You just need to put a DB2 JDBC driver in the directory and change the configuration file. You can talk to me directly if you have more specific questions. Chris Lu - Instant Lucen

Force a query to match at least two clauses

2006-10-06 Thread Ryan Heinen
Hello, If I want to make sure that only documents that contain at least two of the N TermQueries A, B, C, and D (N=4) are considered matches, what is the best way to approach this? I know I can expand it out into several boolean clauses like so: (+A +B) (+A +C) (+A +D) (+B +C) (+B +D) (+C +D)
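The hand-written expansion grows as C(N, k) groups (6 for N=4, k=2). A small sketch that generates it for any term list makes the growth visible; the class and method names are hypothetical, and the output is query-parser syntax only.

```java
import java.util.*;

// Sketch that writes out "at least k of N" as one required (+) group per k-combination.
public class AtLeastKExpansion {

    public static String expand(List<String> terms, int k) {
        List<String> groups = new ArrayList<>();
        combine(terms, k, 0, new ArrayDeque<>(), groups);
        return String.join(" ", groups);
    }

    // Recursively choose k terms in list order, emitting "(+t1 +t2 ...)" per combination.
    private static void combine(List<String> terms, int k, int start,
                                Deque<String> chosen, List<String> out) {
        if (chosen.size() == k) {
            StringJoiner group = new StringJoiner(" ", "(", ")");
            for (String t : chosen) group.add("+" + t);
            out.add(group.toString());
            return;
        }
        for (int i = start; i < terms.size(); i++) {
            chosen.addLast(terms.get(i));
            combine(terms, k, i + 1, chosen, out);
            chosen.removeLast();
        }
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("A", "B", "C", "D"), 2));
        // (+A +B) (+A +C) (+A +D) (+B +C) (+B +D) (+C +D)
    }
}
```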

Re: Performing a like query

2006-10-06 Thread Steven Rowe
Steven Rowe wrote: >\s*(?:\b|(?<=\S)(?=\s)|(?<=\s)(?=\S))\s* Oops, here's an improved version to cover the beginning- and end-of-string non-alphanumeric cases (e.g. "=some text-"): \s*(?:\b|(?<=\S)(?=\s)|(?<=\s)(?=\S)|\A|\z)\s*
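A minimal sketch of the improved pattern in use, assuming it is applied as a delimiter with java.util.regex.Pattern.split() (the way a pattern-based analyzer splits text). Adjacent zero-width matches produce empty strings, which the sketch drops; the class name is hypothetical.

```java
import java.util.*;
import java.util.regex.Pattern;

// Sketch: split on Steven's improved delimiter pattern and keep only non-empty pieces.
public class DelimiterRegexTokenizer {
    static final Pattern DELIMITER =
        Pattern.compile("\\s*(?:\\b|(?<=\\S)(?=\\s)|(?<=\\s)(?=\\S)|\\A|\\z)\\s*");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : DELIMITER.split(text)) {
            // Zero-width matches next to whitespace matches yield "" pieces; skip them.
            if (!piece.isEmpty()) tokens.add(piece);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("O/E - visual acuity R-eye=6/24"));
        // [O, /, E, -, visual, acuity, R, -, eye, =, 6, /, 24]
    }
}
```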

Re: Performing a like query

2006-10-06 Thread Steven Rowe
Hi Rahil, Rahil wrote: > I couldn't figure out a valid regular expression to write a valid > Pattern.compile(String regex) which can tokenise the string "O/E - > visual acuity R-eye=6/24" into "O","/","E", "-", "visual", "acuity", > "R", "-", "eye", "=", "6", "/", "24". The following regular e

Re: Performing a like query

2006-10-06 Thread Erick Erickson
My intuition is that you'll have a real problem using regular expressions. It'll either be incredibly ugly (and unmaintainable) or just won't work since the regular expression tools tend to throw out the delimiters. I think you'll be much better off writing your own analyzer (see LIA, the synonym

Re: Performing a like query

2006-10-06 Thread Rahil
Hi Erick, I'm having trouble writing a good regular expression for the PatternAnalyzer to deal with word and non-word characters. I couldn't figure out a valid regular expression to write a valid Pattern.compile(String regex) which can tokenise the string "O/E - visual acuity R-eye=6/24"

Re: Case sensitive / insensitive

2006-10-06 Thread Steven Rowe
Marcus Falck wrote: > Any good approaches for allowing case sensitive and case insensitive > searches? > > Except adding an additional field and skipping the LowerCaseFilter. > Since this severely increases the index size (and the index already > is around 1 TB). Hi Marcus, How about a filter tha

Re: ask for a question about Lucene

2006-10-06 Thread lily yan
Hi Erick, Thanks for your advice. I guess your suggestion is right. Maybe it's more appropriate for me to use only SQL queries for now. Regards, lilyyan From: "Erick Erickson" <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: ask for a question abou

Re: Sorting on dates using long

2006-10-06 Thread Yonik Seeley
On 10/6/06, Björn Ekengren <[EMAIL PROTECTED]> wrote: I changed to RangeFilter and now everything works fine. I haven't noticed any change in performance, so I'm happy. Strange with the constantrangequery though... It is strange. If you can reproduce an error, please file a bug with the test c

Re: ask for a question about Lucene

2006-10-06 Thread lily yan
Hello Chris Lu, Thanks very much for your reply. DBSight is a very cool search tool. I just have two questions: is this a Java-based application? It seems that DBSight doesn't support DB2 databases? Regards, lilyyan From: "Chris Lu" <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org

wildcard and span queries

2006-10-06 Thread Erick Erickson
Well, we defined this problem away for one of our products, but it's back for a different product. Si.. I'm valiantly trying to get our product manager (hereinafter PM) to define this problem away, perhaps allowing me to deal with this by clever indexing and/or some variant on pre

Re: Advantage of putting lucene index in RDBMS

2006-10-06 Thread Karel Tejnora
One thing: generally, using an RDBMS for the STORED fields is a good idea, because every segment merge / optimize copies that data once or twice (cfs). I'm thinking about putting STORED fields in an extra file and putting pointers in the cfs. Delete will just mark the document as deleted. And a new operation omptimize_

Re: Case sensitive / insensitive

2006-10-06 Thread karl wettin
On 10/6/06, Marcus Falck <[EMAIL PROTECTED]> wrote: Any good approaches for allowing case sensitive and case insensitive searches? Except adding an additional field and skipping the LowerCaseFilter. Since this severely increases the index size (and the index already is around 1 TB). I would co

Re: Spam filter for lucene project

2006-10-06 Thread John Haxby
Rajiv Roopan wrote: Hello, I'm currently running a site which allows users to post. Lately posts have been getting out of hand. I was wondering if anyone knows of an open source spam filter that I can add to my project to scan the posts (which are just plain text) for spam? spamassassin shoul

Re: Case sensitive / insensitive

2006-10-06 Thread Erik Hatcher
On Oct 6, 2006, at 5:09 AM, Marcus Falck wrote: Any good approaches for allowing case sensitive and case insensitive searches? I had this requirement for one application, and implemented it with two different indexes. It could also be accomplished with different fields, but that would hav

Re: Case sensitive / insensitive

2006-10-06 Thread Marcus Falck
Except adding an additional field and skipping the LowerCaseFilter, since this severely increases the index size (and the index is already around 1 TB). -Original Message- From: Marcus Falck [mailto:[EMAIL PROTECTED] Sent: 6 October 2006 11:09 To: java-user@lucene.apach

Case sensitive / insensitive

2006-10-06 Thread Marcus Falck
Hi, Any good approaches for allowing case sensitive and case insensitive searches? / Regards Marcus

RE: Sorting on dates using long

2006-10-06 Thread Björn Ekengren
I changed to RangeFilter and now everything works fine. I haven't noticed any change in performance, so I'm happy. Strange with the constantrangequery though... -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: 5 October 2006 17:06 To: ja
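A related detail when dates are stored as longs: RangeFilter compares indexed terms as strings, so raw numbers only order correctly if they all have the same number of digits. A minimal sketch, assuming non-negative millisecond timestamps, left-pads each value to 19 digits (enough for Long.MAX_VALUE) so lexicographic and numeric order agree; the helper name is hypothetical.

```java
import java.util.Locale;

// Sketch of a fixed-width encoding for non-negative longs used as index terms.
public class SortableLongTerms {

    // Long.MAX_VALUE (9223372036854775807) has 19 digits, so 19 is always wide enough.
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    public static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch only handles non-negative values");
        }
        return String.format(Locale.ROOT, "%019d", value);
    }

    public static void main(String[] args) {
        long earlier = 999_999_999_999L;   // September 2001, in milliseconds
        long later = 1_160_092_800_000L;   // 2006-10-06T00:00:00Z, in milliseconds
        // Unpadded: "999999999999" sorts after "1160092800000" (wrong order).
        System.out.println(Long.toString(earlier).compareTo(Long.toString(later)) > 0); // true
        // Padded: string order matches numeric order.
        System.out.println(encode(earlier).compareTo(encode(later)) < 0);             // true
    }
}
```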