Re: Periodic Indexing DESIGN QUESTION

2007-05-08 Thread Chris Hostetter
: For example, when you are indexing every hour and large document set
: is present, it takes >1 hr to index the documents. Now you are
: already behind indexing for the next hour. How do you design
: something that is robust?
Fundamentally, this question is really about issues in a producer/con

Re: Periodic Indexing DESIGN QUESTION

2007-05-08 Thread Erick Erickson
Don't do it that way? Is this an actual or theoretical scenario? And do you reasonably expect it to become actual? Otherwise, why bother? And you've got other problems here. If you're indexing that much data, you'll soon outgrow your disk. Unless you're replacing most of the documents. But assu

Periodic Indexing DESIGN QUESTION

2007-05-08 Thread Ram Peters
I am indexing documents periodically every hour. I have a scenario. For example, when you are indexing every hour and large document set is present, it takes >1 hr to index the documents. Now you are already behind indexing for the next hour. How do you design something that is robust? thanks.
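
One way to keep hourly runs from piling up, sketched in plain Java with no Lucene calls: skip a trigger while the previous run is still going, and let each run cover everything since the last run that completed. `IndexingGuard`, `RangeIndexer`, and the timestamp watermark are hypothetical names for illustration, not something proposed in the thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical guard that prevents overlapping indexing runs. */
class IndexingGuard {
    /** Hypothetical callback: index everything changed in (fromExclusive, toInclusive]. */
    interface RangeIndexer {
        void index(long fromExclusive, long toInclusive);
    }

    private final AtomicBoolean running = new AtomicBoolean(false);
    private volatile long watermark; // end of the last range that was fully indexed

    IndexingGuard(long initialWatermark) {
        this.watermark = initialWatermark;
    }

    /** Returns false (and does nothing) if a previous run is still in progress. */
    boolean tryRun(long now, RangeIndexer indexer) {
        if (!running.compareAndSet(false, true)) {
            return false; // previous run still going: skip this trigger
        }
        try {
            indexer.index(watermark, now); // a late run picks up the whole backlog
            watermark = now;               // advance only after the run succeeded
            return true;
        } finally {
            running.set(false);
        }
    }

    long watermark() {
        return watermark;
    }
}
```

If a run takes longer than an hour, the next trigger is skipped and the following run indexes the larger backlog in one pass. With a scheduler, `ScheduledExecutorService.scheduleWithFixedDelay` gives a similar no-overlap behavior, because the delay is measured from the end of the previous run.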

Re: Design question

2007-04-13 Thread Peter W.
Dan, I agree Lucene users are repeatedly solving the same problems of reading, writing and creating indexes, building queries, scaling, parsing docs etc. There's a 'HowTo' section on the Wiki made for sharing tips and best practices: http://wiki.apache.org/lucene-java/HowTo but few new addition

Design question

2007-04-13 Thread Dan Wiggin
milar. My question for example is a design question. Every time, you talk about a single index to use. In my case (a platform with user groups not related to one another), I thought the best option was to use an index for every group, and when I need to search something over all of them I will use a Multisea

Re: Search Design Question

2007-03-24 Thread Xiaocheng Luan
Hi Michael, if I understand your questions correctly - feels like I must have missed something - here is what you can do to achieve what you want: index these fields: to, from, content, subject, and all (which includes text from all four fields above), and use "all" as your default search field. Then when you

Re: Search Design Question

2007-03-23 Thread Chris Hostetter
: One final note, it may be much easier for you to throw all the
: fields into a single uber-field and search that rather than implement
: all four separate clauses, but it's a trade off between simplicity and
: size.
This would be a very simple way to get the behavior you describe straight f

Re: Search Design Question

2007-03-23 Thread Erick Erickson
I don't believe there's anything built into Lucene that helps you out here because you're really saying "do special things for my problem space in these situations". So about the only thing you can do that I know of is to construct the query yourself by making a series of additions to BooleanQuer
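
The "four separate clauses" idea can be sketched at the query-string level, without touching BooleanQuery: expand one user term into an OR across the fields named in this thread (to, from, content, subject). `FieldClauseExpander` and `expand` are hypothetical helpers. The term is wrapped as a quoted phrase and only backslash and double quote are escaped, which is an assumption about what input needs protecting.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: turns one term into an OR of per-field clauses. */
class FieldClauseExpander {
    static String expand(String term, List<String> fields) {
        // Escape backslash first, then the double quote, and wrap as a phrase.
        String phrase = "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        // One clause per field, joined with OR and wrapped in parentheses.
        return fields.stream()
                .map(field -> field + ":" + phrase)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }
}
```

The resulting string would then go through the normal query parser. The single uber-field alternative avoids the expansion entirely, at the cost of a larger index.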

Search Design Question

2007-03-23 Thread Michael J. Prichard
Hello All, We allow our users to search through our index with a simple textfield. The search phrase has "content" as its default value. This allows them to search quickly through content, but then when they type "to:blah AND from:foo AND content:boogie" it will know to parse, etc. What I wa

Re: Using Lucene - Design Question

2007-02-22 Thread Peter W.
rs throat you could get a book ;) Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: shai deljo <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, February 20, 2007 2:05:2

Re: Using Lucene - Design Question

2007-02-20 Thread Venkat Seeth
> . . . . .
> > Simpy -- http://www.simpy.com/ - Tag - Search - Share
> >
> > - Original Message
> > From: shai deljo <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Sent: Tuesday, February 20, 2007 2:05:25 PM
> > Subj

Re: Using Lucene - Design Question

2007-02-20 Thread orion
-- >> From: shai deljo <[EMAIL PROTECTED]> >> To: java-user@lucene.apache.org >> Sent: Tuesday, February 20, 2007 2:05:25 PM >> Subject: Re: Using Lucene - Design Question >> >> Hi, >> Thanks for the reply. >> * Regarding hardware I'll

Re: Using Lucene - Design Question

2007-02-20 Thread shai deljo
could get a book ;) Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: shai deljo <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, February 20, 2007 2:05:25 PM Subject: Re: Using Lucene - Des

Re: Using Lucene - Design Question

2007-02-20 Thread Otis Gospodnetic
work, but you'll have to make sure that the > segments file gets rsynced last. > > Otis > > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . > Simpy -- http://www.simpy.com/ - Tag - Search - Share > > - Original Message > From: shai deljo <[EMAIL PROTECTED]> > To: java-user@lucene.

Re: Using Lucene - Design Question

2007-02-20 Thread shai deljo
. . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: shai deljo <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, February 20, 2007 5:51:13 AM Subject: Using Lucene - Design Question Hi, I have no experience

Re: Using Lucene - Design Question

2007-02-20 Thread Otis Gospodnetic
ced last. Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: shai deljo <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, February 20, 2007 5:51:13 AM Subject: Using Lucene - D

Using Lucene - Design Question

2007-02-20 Thread shai deljo
Hi, I have no experience with Lucene and I'm trying to collect some information in order to determine what solution is best for me. I need to index ~50M documents (starting with 10M), the size of each document is ~2k-~5k and I'll index a couple of fields per document. I expect ~20 queries per seco

Re: a design question

2006-10-13 Thread Mark Miller
erfectly suitable for the job and a lot simpler. Otis - Original Message From: "Chenini, Mohamed " <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, October 12, 2006 10:25:44 AM Subject: a design question Hello, This is a design question: For Lucene to

Re: a design question

2006-10-12 Thread Otis Gospodnetic
" <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, October 12, 2006 10:25:44 AM Subject: a design question Hello, This is a design question: For Lucene to be able to process a million documents and in the purpose for the search application to be scalable and still have

Re: a design question

2006-10-12 Thread Chris Lu
way with it but I see no > particular reasons why using an EJB server should offer any benefits > over a Servlet container. > > Cheers > Mark > > - Original Message > From: "Chenini, Mohamed " <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Se

Re: a design question

2006-10-12 Thread Bill Taylor
From: "Chenini, Mohamed " <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, 12 October, 2006 3:25:44 PM Subject: a design question Hello, This is a design question: For Lucene to be able to process a million documents and in the purpose for the search a

Re: a design question

2006-10-12 Thread mark harwood
benefits over a Servlet container. Cheers Mark - Original Message From: "Chenini, Mohamed " <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, 12 October, 2006 3:25:44 PM Subject: a design question Hello, This is a design question: For Lucene to be able to p

a design question

2006-10-12 Thread Chenini, Mohamed
Hello, This is a design question: For Lucene to be able to process a million documents and in the purpose for the search application to be scalable and still have a good response time do we need to use an EJB container such as Weblogic or is a Servlet container such as Tomcat sufficient to do the

Re: Index design question

2005-08-06 Thread Otis Gospodnetic
Hi,
> > If you store only IDs in Lucene, you won't be able to search using
> > keywords (text).
>
> Let me explain better. Suppose I have an index field called categories which
> contains a list of ids for each post. For example - 1 2 45 198. Now I can
> search on the contents field but re

Re: Index design question

2005-08-06 Thread N. C. Deepak Ramesh
Hi,
> If you store only IDs in Lucene, you won't be able to search using
> keywords (text).
Let me explain better. Suppose I have an index field called categories which contains a list of ids for each post. For example - 1 2 45 198. Now I can search on the contents field but restricted to all po

Re: Index design question

2005-08-06 Thread Otis Gospodnetic
Hi,
> Let me describe my issue taking a simpler model. Lets say I were to build a
> blog which allows each post to have multiple keywords. I want to provide a
> search over the posts but restricted to a subset of the keywords (say -
> python, windows, etc.). How can I structure the index

Index design question

2005-08-05 Thread N. C. Deepak Ramesh
Hi list, Let me describe my issue taking a simpler model. Lets say I were to build a blog which allows each post to have multiple keywords. I want to provide a search over the posts but restricted to a subset of the keywords (say - python, windows, etc.). How can I structure the index in this c

Re: Design question [too many fields?]

2005-07-02 Thread Chris Hostetter
: My head was thinking to find a generic solution to Lucene's
: limitation: The TooManyClauses problem when using RangeQuery and there
: are more than 1024 values. It should be another thread.
It's been discussed in several threads, and I can think of 2 good solutions at this point... Using a Ra

Re: Vedr. Re: Design question [too many fields?]

2005-07-01 Thread markharw00d
about 4900 room units which I think is OK as far as Still we have optimization work to do. Assuming your availability is a year in advance and yours is a reputable chain of hotels that books rooms by the day, (not the hour!) You only need: 4900 * 365 bits of true/false info to cache all the ava
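
The arithmetic above is small: 4900 * 365 = 1,788,500 bits, which is roughly 218 KB. A minimal sketch of such a cache using `java.util.BitSet` follows. `AvailabilityCache`, the day-offset numbering, and the half-open date range are assumptions made for illustration, not the poster's actual code.

```java
import java.util.BitSet;

/** Hypothetical in-memory availability cache: one bit per (room unit, day). */
class AvailabilityCache {
    private final int days;
    private final BitSet free; // a set bit means the room unit is free on that day

    AvailabilityCache(int roomUnits, int days) {
        this.days = days;
        this.free = new BitSet(roomUnits * days);
    }

    /** Bits that the cache has to hold in total. */
    static long bitsNeeded(int roomUnits, int days) {
        return (long) roomUnits * days;
    }

    private int index(int roomUnit, int day) {
        return roomUnit * days + day;
    }

    void setFree(int roomUnit, int day, boolean isFree) {
        free.set(index(roomUnit, day), isFree);
    }

    /** True if the room unit is free on every day in [fromDay, toDay). */
    boolean isFree(int roomUnit, int fromDay, int toDay) {
        int start = index(roomUnit, fromDay);
        int end = index(roomUnit, toDay);
        // The first booked day at or after start must lie outside the range.
        return free.nextClearBit(start) >= end;
    }
}
```

Days are offsets from the start of the booking window (0 to 364), so a fromDate/toDate pair would first be converted to two offsets before calling `isFree`.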

Vedr. Re: Design question [too many fields?]

2005-07-01 Thread Naimdjon Takhirov
Guys, thanks for your inputs. I think the solution Mark has suggested does solve the problem in an acceptable way. It's actually going to be a little better than the solution the customer has right now. Apart from the availability we have to also check if there is any price for room units saved in

Re: Design question [too many fields?]

2005-07-01 Thread Chris Lu
Erik, Mark and Naimdjon, Sorry I totally misunderstood the question, of multiple dates for a Document. I came to agree with Erik and Mark on this problem. My head was thinking to find a generic solution to Lucene's limitation: The TooManyClauses problem when using RangeQuery and there are more th

Re: Design question [too many fields?]

2005-07-01 Thread Erik Hatcher
On Jun 30, 2005, at 11:27 PM, Chris Lu wrote:
> Mark, your suggestion will incur another trip to the database. And if the search results is large, filtering in DB by pk is not really good.
Chris - I disagree with that last comment. It can be a great solution when the filter is cached. Cer

Re: Vedr. Re: Design question [too many fields?]

2005-07-01 Thread Chris Lu
> It is anyway going to be too many fields then? Days of > year for the whole year ahead? Since the fromDate and > toDate can be across two months and the customer wants > the data be available for one year. It won't have too many fields. > > My suggestion is, use "year" + "month" + "day" three >

Vedr. Re: Design question [too many fields?]

2005-06-30 Thread Naimdjon Takhirov
Hi Chris, It is anyway going to be too many fields then? Days of year for the whole year ahead? Since the fromDate and toDate can be across two months and the customer wants the data be available for one year. Naimdjon
--- Chris Lu <[EMAIL PROTECTED]> wrote:
> Mark, your suggestion will incur a

Re: Design question [too many fields?]

2005-06-30 Thread Chris Lu
Mark, your suggestion will incur another trip to the database. And if the search results are large, filtering in the DB by pk is not really good. Erik, your original "date" field is good when there are not many dates (<1024) in the database. Otherwise, RangeQuery cannot handle it. My suggestion is

Re: Vedr. Re: Design question [too many fields?]

2005-06-29 Thread markharw00d
I suspect the most performant is as follows (but could require bags of RAM): Here's the pseudocode. [on IndexReader open, initialize map] int []luceneDocIdsByDbKey=new int [largestDbKey]; //could be large array! for (int i=0;i;Should be super-quick but requires (int size* num db records) m
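
The loop in the snippet above did not survive the archive, so what follows is only a guess at the shape of the idea, in plain Java with no Lucene calls: keep an int array from database key to Lucene document id, then turn the keys returned by a database query into a bit set of matching document ids. `DbKeyDocIdMap`, `put`, and `toDocIdBits` are hypothetical names, and how the array gets populated (for example by reading a stored key field from each document when the reader opens) is an assumption.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical map from database primary key to Lucene document id. */
class DbKeyDocIdMap {
    private final int[] luceneDocIdsByDbKey; // -1 marks a key with no document

    DbKeyDocIdMap(int largestDbKey) {
        // +1 so that largestDbKey itself is a valid array index
        this.luceneDocIdsByDbKey = new int[largestDbKey + 1];
        Arrays.fill(this.luceneDocIdsByDbKey, -1);
    }

    /** Called once per document while the map is initialized. */
    void put(int dbKey, int docId) {
        luceneDocIdsByDbKey[dbKey] = docId;
    }

    /** Converts keys returned by a database query into a doc-id bit set. */
    BitSet toDocIdBits(int[] dbKeys, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int key : dbKeys) {
            int docId = luceneDocIdsByDbKey[key];
            if (docId >= 0) {
                bits.set(docId); // this document passes the filter
            }
        }
        return bits;
    }
}
```

Each lookup is a single array read, which matches the "super-quick" claim, and the memory cost is one int per possible database key.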

Vedr. Re: Design question [too many fields?]

2005-06-29 Thread Naimdjon Takhirov
Hi Jian, Thanks for your inputs. The (DB) datamodel is quite complex with rooms and room units (I skipped it to make the case easier to understand), so I guess the easiest and actually best way to do it is with the filter. Mark: yes, there are a lot of text fields the user should be able to searc

Re: Design question [too many fields?]

2005-06-29 Thread jian chen
Hi, Naimdjon, I have some suggestions as well along the lines of Mark Harwood. As an example, suppose for each hotel room there is a description, and you want the user to do free text search on the description field. You could do the following: 1) store hotel room reservation info as rows in a

Re: Design question [too many fields?]

2005-06-29 Thread Erik Hatcher
I second Mark's suggestion over the alternative I posted. My alternative was merely to invert the field structure originally described, but using a Filter for the volatile information is wiser.
Erik
On Jun 29, 2005, at 9:58 AM, mark harwood wrote:
> Presumably there is also a free-text

Re: Design question [too many fields?]

2005-06-29 Thread Erik Hatcher
On Jun 29, 2005, at 9:18 AM, Naimdjon Takhirov wrote: Hi, We are porting our search functionality over to lucene in our hotel solution which is java based. Today search is done directly against the database. There is a date search, i.e tourist would like to search for free rooms fromDate and to

Re: Design question [too many fields?]

2005-06-29 Thread mark harwood
Presumably there is also a free-text element to the search or you wouldn't be using Lucene. Multiple fields is not the way to go. A single Lucene field could contain multiple terms (the available dates) but I still don't think that's the best solution. The availability info is likely to be pretty

Design question [too many fields?]

2005-06-29 Thread Naimdjon Takhirov
Hi, We are porting our search functionality over to Lucene in our hotel solution, which is Java-based. Today search is done directly against the database. There is a date search, i.e. a tourist would like to search for free rooms between fromDate and toDate. The documents are added to the index per hotel room(p