: For example, when you are indexing every hour and a large document set
: is present, it takes >1 hr to index the documents. Now you are
: already behind on indexing for the next hour. How do you design
: something that is robust?
Fundamentally, this question is really about issues in a producer/con
Don't do it that way? Is this an actual or a theoretical
scenario? And do you reasonably expect it to become actual?
Otherwise, why bother?
And you've got other problems here. If you're indexing that
much data, you'll soon outgrow your disk, unless you're
replacing most of the documents.
But assu
I am indexing documents periodically, every hour. I have a scenario.
For example, when you are indexing every hour and a large document set
is present, it takes >1 hr to index the documents. Now you are
already behind on indexing for the next hour. How do you design
something that is robust?
Thanks.
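The producer/consumer framing can be sketched in plain Java. The hourly job (producer) only enqueues new documents, and a single indexing thread (consumer) drains whatever has accumulated. If one run takes more than an hour, the next hour's batch simply waits in the queue and is picked up, together with any others, in the next pass, so two runs never write to the index at the same time. This is a minimal sketch under assumptions: documents are plain strings, the `index` list stands in for a real Lucene IndexWriter, and `HourlyIndexer`, `submit`, `runConsumer` and `demo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HourlyIndexer {
    // Sentinel batch that tells the consumer to stop (compared by identity).
    private static final List<String> POISON = new ArrayList<>();

    private final BlockingQueue<List<String>> pending = new LinkedBlockingQueue<>();
    private final List<String> index = new ArrayList<>(); // stand-in for the real index

    /** Producer side: the hourly job only enqueues work, so it never waits on indexing. */
    public void submit(List<String> batch) {
        pending.add(batch);
    }

    public void shutdown() {
        pending.add(POISON);
    }

    /** Consumer side: the only thread that ever writes to the index. */
    public void runConsumer() throws InterruptedException {
        while (true) {
            List<List<String>> backlog = new ArrayList<>();
            backlog.add(pending.take()); // block until at least one batch is waiting
            pending.drainTo(backlog);    // pick up every batch that piled up during the last run
            for (List<String> batch : backlog) {
                if (batch == POISON) {
                    return;
                }
                index.addAll(batch);     // hypothetical: one IndexWriter.addDocument(...) per document
            }
        }
    }

    public int indexedCount() {
        return index.size();
    }

    /** Submits two "hourly" batches while the consumer runs, then reports how many documents were indexed. */
    public static int demo() {
        HourlyIndexer indexer = new HourlyIndexer();
        Thread consumer = new Thread(() -> {
            try {
                indexer.runConsumer();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        indexer.submit(List.of("doc1", "doc2")); // hour 1
        indexer.submit(List.of("doc3"));         // hour 2, possibly while hour 1 is still being indexed
        indexer.shutdown();
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return indexer.indexedCount();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3
    }
}
```

The queue only smooths out occasional overruns. If it keeps growing from hour to hour, the indexing itself has to get faster (or the schedule longer), which is the point behind the "is this an actual scenario?" question above.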
Dan,
I agree Lucene users are repeatedly solving the same
problems of reading, writing and creating indexes,
building queries, scaling, parsing docs etc.
There's a 'HowTo' section on the Wiki made
for sharing tips and best practices:
http://wiki.apache.org/lucene-java/HowTo
but few new addition
milar.
My question, for example, is a design question. Every time, you talk about using a
single index, but in my case (a platform with user groups not related
to each other), I thought the best option was to use one index for every
group, and when I need to search something across all of them I will use a
Multisea
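Searching all the per-group indexes at once comes down to merging each index's ranked hits into one list. A minimal sketch in plain Java, assuming each per-group search has already returned its hits sorted by descending score; `Hit`, `topN` and the group names are hypothetical. One caveat worth checking against the Lucene version in use: scores from separately searched indexes are only comparable if they are computed from the same term statistics, because a term that is rare in one group's index and common in another's gets a different idf weight in each.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class MergeGroupHits {
    /** One search result from one group's index (hypothetical type). */
    static final class Hit {
        final String group;
        final int doc;
        final float score;

        Hit(String group, int doc, float score) {
            this.group = group;
            this.doc = doc;
            this.score = score;
        }

        @Override
        public String toString() {
            return group + "#" + doc;
        }
    }

    /**
     * Merges per-index hit lists, each already sorted by descending score,
     * into the overall top n. Queue entries are {list index, position in that list}.
     */
    static List<Hit> topN(List<List<Hit>> perIndex, int n) {
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                (a, b) -> Float.compare(perIndex.get(b[0]).get(b[1]).score,
                                        perIndex.get(a[0]).get(a[1]).score)); // highest score first
        for (int i = 0; i < perIndex.size(); i++) {
            if (!perIndex.get(i).isEmpty()) {
                heads.add(new int[] {i, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < n && !heads.isEmpty()) {
            int[] head = heads.poll();
            merged.add(perIndex.get(head[0]).get(head[1]));
            if (head[1] + 1 < perIndex.get(head[0]).size()) {
                heads.add(new int[] {head[0], head[1] + 1}); // advance within the same list
            }
        }
        return merged;
    }
}
```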
Hi Michael,
if I understand your questions correctly (it feels like I must have missed
something), here is what you can do to achieve what you want:
index these fields:
to
from
content
subject
all (includes text from all the above 4 fields)
and use "all" as your default search field. Then when you
: One final note, it may be much easier for you to throw all the
: fields into a single uber-field and search that rather than implement
: all four separate clauses, but it's a trade-off between simplicity and
: size.
This would be a very simple way to get the behavior you describe straight
f
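The uber-field approach from the messages above (index `to`, `from`, `content` and `subject`, plus an `all` field holding the text of all four) can be sketched in plain Java. This is a minimal sketch: a `Map` stands in for a Lucene Document, and `CatchAllField` and `buildFields` are hypothetical names. With Lucene, each entry would become its own field and `all` would be the query parser's default field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CatchAllField {
    // The four source fields from the message above; "all" repeats their text.
    static final String[] SOURCE_FIELDS = {"to", "from", "content", "subject"};

    /** Builds the field map for one message, including the catch-all "all" field. */
    static Map<String, String> buildFields(String to, String from, String content, String subject) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("to", to);
        fields.put("from", from);
        fields.put("content", content);
        fields.put("subject", subject);
        StringBuilder all = new StringBuilder();
        for (String name : SOURCE_FIELDS) {
            if (all.length() > 0) {
                all.append(' ');
            }
            all.append(fields.get(name));
        }
        fields.put("all", all.toString()); // searched when the user gives no field prefix
        return fields;
    }
}
```

The size cost mentioned in the quoted note is visible here: every word is indexed twice, once in its own field and once in `all`.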
I don't believe there's anything built into Lucene that helps you out here
because you're really saying "do special things for my problem space
in these situations".
So about the only thing you can do that I know of is to construct the
query yourself by making a series of additions to BooleanQuer
Hello All,
We allow our users to search through our index with a simple text field.
Searches use "content" as the default field. This allows them
to search quickly through content, but when they type "to:blah AND
from:foo AND content:boogie" it knows to parse that, etc.
What I wa
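Constructing the query yourself, as Erik suggests above, starts with splitting the user's input into field/term pairs and falling back to `content` for bare words, as in the `to:blah AND from:foo AND content:boogie` example. A minimal sketch in plain Java, assuming space-separated `field:term` tokens joined only by `AND` (no phrases, parentheses, `OR` or `NOT`); `SimpleFieldQuery` and `parse` are hypothetical. Each returned string uses the `+field:term` notation for a required clause; in Lucene each pair would become a TermQuery added to a BooleanQuery with `BooleanClause.Occur.MUST`, and any special per-field handling would go where the clause is built.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleFieldQuery {
    /**
     * Splits input such as "to:blah AND from:foo boogie" into required clauses.
     * Tokens with a "field:" prefix keep their field; bare tokens go to defaultField.
     */
    static List<String> parse(String input, String defaultField) {
        List<String> clauses = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty() || token.equals("AND")) {
                continue; // AND is implied: every clause is required
            }
            int colon = token.indexOf(':');
            String field = colon > 0 ? token.substring(0, colon) : defaultField;
            String term = colon > 0 ? token.substring(colon + 1) : token;
            if (!term.isEmpty()) {
                clauses.add("+" + field + ":" + term);
            }
        }
        return clauses;
    }
}
```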
rs throat you could get
a book
;)
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share
- Original Message
From: shai deljo <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, February 20, 2007 2:05:2
>> From: shai deljo <[EMAIL PROTECTED]>
>> To: java-user@lucene.apache.org
>> Sent: Tuesday, February 20, 2007 2:05:25 PM
>> Subject: Re: Using Lucene - Design Question
>>
>> Hi,
>> Thanks for the reply.
>> * Regarding hardware I'll
work, but you'll have to make sure that the
> segments file gets rsynced last.
>
> Otis
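Otis's point about rsyncing the segments file last follows from how a Lucene index is laid out: the segments file records which segment files make up the current index, so if it arrives before the files it points to, a reader opened on the copy can fail to find them. A minimal sketch of the same ordering with java.nio.file, assuming a flat index directory and treating every file whose name starts with "segments" as a commit file (the exact names, such as `segments`, `segments_N` or `segments.gen`, depend on the Lucene version); `SegmentsLastCopy` and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SegmentsLastCopy {
    /** Assumption: any file whose name starts with "segments" is an index commit file. */
    static boolean isSegmentsFile(String name) {
        return name.startsWith("segments");
    }

    /** Returns the names ordered so that every segments file comes after all other files. */
    static List<String> copyOrder(List<String> names) {
        List<String> ordered = new ArrayList<>(names);
        // false (data files) sorts before true (segments files); List.sort is stable.
        ordered.sort(Comparator.comparing(SegmentsLastCopy::isSegmentsFile));
        return ordered;
    }

    /** Copies a flat index directory, writing the segments file(s) last. */
    static void copyIndex(Path src, Path dst) throws IOException {
        Files.createDirectories(dst);
        List<String> names;
        try (Stream<Path> files = Files.list(src)) {
            names = files.map(p -> p.getFileName().toString()).collect(Collectors.toList());
        }
        for (String name : copyOrder(names)) {
            Files.copy(src.resolve(name), dst.resolve(name), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With rsync itself, the same effect is usually obtained with two passes: one that excludes `segments*`, then a second one that copies those files.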
- Original Message
From: shai deljo <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, February 20, 2007 5:51:13 AM
Subject: Using Lucene - Design Question
Hi,
I have no experience with Lucene and I'm trying to collect some
information in order to determine what solution is best for me.
I need to index ~50M documents (starting with 10M); each
document is ~2k-5k in size, and I'll index a couple of fields per document. I
expect ~20 queries per seco
erfectly suitable for the job and a lot simpler.
Otis
- Original Message
From: "Chenini, Mohamed " <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, October 12, 2006 10:25:44 AM
Subject: a design question
Hello,
This is a design question: For Lucene to
way with it but I see no
> particular reasons why using an EJB server should offer any benefits
> over a Servlet container.
>
> Cheers
> Mark
Hello,
This is a design question: for Lucene to be able to process a million
documents, and for the search application to be scalable
and still have a good response time, do we need to use an EJB container
such as Weblogic, or is a Servlet container such as Tomcat sufficient to
do the
Hi,
> If you store only IDs in Lucene, you won't be able to search using
> keywords (text).
Let me explain better. Suppose I have an index field called categories which
contains a list of ids for each post, for example: 1 2 45 198. Now I can
search on the contents field but restricted to all po
Hi list,
Let me describe my issue using a simpler model. Let's say I were to build a
blog that allows each post to have multiple keywords. I want to provide a
search over the posts, restricted to a subset of the keywords (say,
python, windows, etc.). How can I structure the index in this c
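Restricting a contents search to posts in a set of categories amounts to intersecting the text matches with a per-category document set, which is the job of a Lucene Filter. A minimal sketch in plain Java, assuming the `categories` field holds space-separated ids such as `1 2 45 198` (as in the example above) and that post numbers play the role of Lucene document numbers; `Post`, `categoryIndex` and `search` are hypothetical names. In Lucene itself the ids would be indexed as separate terms (for example with a whitespace analyzer) and the restriction expressed as a filter or a required clause on the `categories` field.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CategoryFilterSketch {
    /** A post as it might be indexed: space-separated category ids plus free text. */
    static final class Post {
        final String categories; // e.g. "1 2 45 198"
        final String contents;

        Post(String categories, String contents) {
            this.categories = categories;
            this.contents = contents;
        }
    }

    /** Inverted index from category id to the post numbers that carry it. */
    static Map<String, BitSet> categoryIndex(List<Post> posts) {
        Map<String, BitSet> index = new HashMap<>();
        for (int doc = 0; doc < posts.size(); doc++) {
            for (String id : posts.get(doc).categories.trim().split("\\s+")) {
                if (!id.isEmpty()) {
                    index.computeIfAbsent(id, k -> new BitSet()).set(doc);
                }
            }
        }
        return index;
    }

    /** Posts whose contents contain the word (case-insensitive) and that carry at least one allowed category. */
    static BitSet search(List<Post> posts, Map<String, BitSet> catIndex, String word, Set<String> allowed) {
        BitSet filter = new BitSet(); // union of the allowed categories
        for (String id : allowed) {
            BitSet docs = catIndex.get(id);
            if (docs != null) {
                filter.or(docs);
            }
        }
        String wanted = word.toLowerCase();
        BitSet hits = new BitSet();
        // Only posts that pass the filter are examined for the word.
        for (int doc = filter.nextSetBit(0); doc >= 0; doc = filter.nextSetBit(doc + 1)) {
            for (String w : posts.get(doc).contents.toLowerCase().split("\\W+")) {
                if (w.equals(wanted)) {
                    hits.set(doc);
                    break;
                }
            }
        }
        return hits;
    }
}
```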
: I was thinking about a generic solution to Lucene's
: limitation: the TooManyClauses problem when using RangeQuery with
: more than 1024 values. That should be another thread.
It's been discussed in several threads, and I can think of 2 good
solutions at this point...
Using a Ra
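To make the TooManyClauses issue concrete: in the Lucene versions discussed here, a RangeQuery is rewritten into a BooleanQuery with one clause per indexed term that falls in the range, and BooleanQuery refuses to hold more than its maximum clause count (1024 by default), throwing BooleanQuery.TooManyClauses. A filter avoids the expansion by walking the terms in the range and setting one bit per matching document. The sketch below imitates both behaviours in plain Java over a sorted term map; `RangeFilterSketch`, `clausesNeeded` and `rangeFilter` are hypothetical, and the 1024 limit is hard-coded to mirror Lucene's default rather than read from it.

```java
import java.util.BitSet;
import java.util.TreeMap;

public class RangeFilterSketch {
    // Mirrors Lucene's default BooleanQuery clause limit; only imitated here.
    static final int MAX_CLAUSES = 1024;

    // Stand-in for a term dictionary: term (e.g. a yyyyMMdd date) -> documents containing it.
    final TreeMap<String, BitSet> postings = new TreeMap<>();

    void add(int doc, String term) {
        postings.computeIfAbsent(term, t -> new BitSet()).set(doc);
    }

    /** Imitates rewriting the range into one clause per term; fails the way TooManyClauses would. */
    int clausesNeeded(String lower, String upper) {
        int n = postings.subMap(lower, true, upper, true).size();
        if (n > MAX_CLAUSES) {
            throw new IllegalStateException("range would need " + n + " clauses");
        }
        return n;
    }

    /** Filter approach: OR each term's documents into one bit set, with no per-term clause limit. */
    BitSet rangeFilter(String lower, String upper) {
        BitSet bits = new BitSet();
        for (BitSet docs : postings.subMap(lower, true, upper, true).values()) {
            bits.or(docs);
        }
        return bits;
    }
}
```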
about 4900 room units which I think is OK as far as
Still we have optimization work to do.
Assuming your availability is a year in advance and yours is a reputable chain
of hotels that books rooms by the day (not the hour!), you only need
4900 * 365 bits of true/false info to cache all the ava
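Mark's estimate works out to 4900 × 365 = 1,788,500 bits, or about 224 KB, which is small enough to keep in memory. A minimal sketch of such a cache with java.util.BitSet, assuming one bit per room unit per night over a 365-day window (bit `unit * 365 + day` set means booked), day numbers counted from today, and stays that check out on `toDay` (so `toDay` is at most 365); `AvailabilityCache` and its methods are hypothetical. `freeUnits` returns one bit per unit, the shape of bit set a Lucene Filter could supply if unit numbers lined up with document numbers (itself an assumption; in practice a key-to-document mapping would be needed).

```java
import java.util.BitSet;

public class AvailabilityCache {
    static final int DAYS = 365;
    private final int units;
    private final BitSet booked; // bit (unit * DAYS + day) set means that unit is booked that night

    AvailabilityCache(int units) {
        this.units = units;
        this.booked = new BitSet(units * DAYS);
    }

    /** Marks nights fromDay .. toDay-1 as booked for one unit. */
    void book(int unit, int fromDay, int toDay) {
        booked.set(unit * DAYS + fromDay, unit * DAYS + toDay);
    }

    /** True if no night from fromDay to toDay-1 is booked. */
    boolean isFree(int unit, int fromDay, int toDay) {
        int next = booked.nextSetBit(unit * DAYS + fromDay);
        return next < 0 || next >= unit * DAYS + toDay;
    }

    /** One bit per unit that is free for the whole stay. */
    BitSet freeUnits(int fromDay, int toDay) {
        BitSet free = new BitSet(units);
        for (int u = 0; u < units; u++) {
            if (isFree(u, fromDay, toDay)) {
                free.set(u);
            }
        }
        return free;
    }

    /** Size of the cache in bits. */
    static long cacheBits(int units) {
        return (long) units * DAYS;
    }
}
```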
Guys, thanks for your input.
I think the solution Mark has suggested solves
the problem in an acceptable way. It's actually going to
be a little better than the solution the customer
has right now.
Apart from the availability, we also have to check if
there is any price for room units saved in
Erik, Mark and Naimdjon, sorry, I totally misunderstood the question
of multiple dates for a Document. I have come to agree with Erik and Mark
on this problem.
I was thinking about a generic solution to Lucene's
limitation: the TooManyClauses problem when using RangeQuery and there
are more th
On Jun 30, 2005, at 11:27 PM, Chris Lu wrote:
> Mark, your suggestion will incur another trip to the database. And
> if the search result set is large, filtering in the DB by pk is not really
> good.
Chris - I disagree with that last comment. It can be a great
solution when the filter is cached. Cer
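Erik's point is that the database round trip only has to be paid once per distinct filter if the resulting bit set is cached and reused until the index changes (Lucene's CachingWrapperFilter follows the same idea, keyed on the IndexReader). A minimal, single-threaded sketch with a plain map, assuming the caller passes an index version number and a key that identifies the filter, including whatever version of the availability data it reflects; `CachedFilterSketch`, `bits` and the version parameter are hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CachedFilterSketch {
    private final Function<String, BitSet> compute; // e.g. runs the database query and maps keys to doc ids
    private final Map<String, BitSet> cache = new HashMap<>();
    private long cachedIndexVersion = -1;
    int computations = 0;                            // how many times compute actually ran

    CachedFilterSketch(Function<String, BitSet> compute) {
        this.compute = compute;
    }

    /**
     * Returns the bit set for filterKey, computing it only on a cache miss.
     * A new index version drops every cached entry, because document numbers
     * may have changed. Callers must not modify the returned BitSet.
     */
    BitSet bits(String filterKey, long indexVersion) {
        if (indexVersion != cachedIndexVersion) {
            cache.clear();
            cachedIndexVersion = indexVersion;
        }
        return cache.computeIfAbsent(filterKey, k -> {
            computations++;
            return compute.apply(k);
        });
    }
}
```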
> Isn't it going to be too many fields anyway? Days of
> the year for the whole year ahead? The fromDate and
> toDate can span two months, and the customer wants
> the data to be available for one year.
It won't have too many fields.
> > My suggestion is, use "year" + "month" + "day" three
>
Hi Chris,
Isn't it going to be too many fields anyway? Days of
the year for the whole year ahead? The fromDate and
toDate can span two months, and the customer wants
the data to be available for one year.
Naimdjon
--- Chris Lu <[EMAIL PROTECTED]> wrote:
> Mark, your suggestion will incur a
Mark, your suggestion will incur another trip to the database. And if
the search result set is large, filtering in the DB by pk is not really good.
Erik, your original "date" field is good when there are not many
dates (<1024) in the database. Otherwise, RangeQuery cannot handle it.
My suggestion is
I suspect the most performant approach is as follows (but it could require bags of
RAM).
Here's the pseudo code:
[on IndexReader open, initialize map]
int[] luceneDocIdsByDbKey = new int[largestDbKey]; // could be a large array!
for (int i=0;i
Should be super-quick but requires (int size * num db records) m
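The mapping Mark describes can be filled in as a runnable sketch. It assumes database primary keys are non-negative ints no larger than `largestDbKey`, and that the key stored in each Lucene document has already been read into `dbKeyOfDoc` (in a real system that read would happen once, when the IndexReader is opened). The array here is sized `largestDbKey + 1` so that the largest key itself fits; memory use is one int (4 bytes) per possible key, which is the "(int size * num db records)" cost mentioned above. `DbKeyToDocId` and `toFilterBits` are hypothetical names.

```java
import java.util.Arrays;
import java.util.BitSet;

public class DbKeyToDocId {
    private final int[] luceneDocIdsByDbKey; // index = database primary key, value = Lucene doc id (-1 if none)

    /** dbKeyOfDoc[docId] is the primary key stored in Lucene document docId. */
    DbKeyToDocId(int[] dbKeyOfDoc, int largestDbKey) {
        luceneDocIdsByDbKey = new int[largestDbKey + 1]; // one int per possible key
        Arrays.fill(luceneDocIdsByDbKey, -1);
        for (int docId = 0; docId < dbKeyOfDoc.length; docId++) {
            luceneDocIdsByDbKey[dbKeyOfDoc[docId]] = docId;
        }
    }

    /** Turns the primary keys returned by a database query into a bit set over Lucene doc ids. */
    BitSet toFilterBits(int[] dbKeysFromQuery) {
        BitSet bits = new BitSet();
        for (int key : dbKeysFromQuery) {
            if (key >= 0 && key < luceneDocIdsByDbKey.length && luceneDocIdsByDbKey[key] >= 0) {
                bits.set(luceneDocIdsByDbKey[key]);
            }
        }
        return bits;
    }
}
```

Each lookup is a single array access, which is where the "super-quick" comes from; the trade-off is that the array is sized by the largest key rather than by the number of documents.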
Hi Jian,
Thanks for your input. The (DB) data model is quite
complex with rooms and room units (I skipped it to
make the case easier to understand), so I guess the
easiest and actually best way to do it is with the
filter.
Mark: yes, there are a lot of text fields the user
should be able to searc
Hi, Naimdjon,
I have some suggestions as well, along the lines of Mark Harwood's.
As an example, suppose for each hotel room there is a description, and
you want the user to do free text search on the description field.
You could do the following:
1) store hotel room reservation info as rows in a
I second Mark's suggestion over the alternative I posted. My
alternative was merely to invert the field structure originally
described, but using a Filter for the volatile information is wiser.
Erik
On Jun 29, 2005, at 9:58 AM, mark harwood wrote:
Presumably there is also a free-text
On Jun 29, 2005, at 9:18 AM, Naimdjon Takhirov wrote:
Hi,
We are porting our search functionality over to Lucene
in our hotel solution, which is Java-based. Today
search is done directly against the database.
There is a date search, i.e. a tourist would like to
search for free rooms between fromDate and to
Presumably there is also a free-text element to the
search or you wouldn't be using Lucene.
Multiple fields are not the way to go.
A single Lucene field could contain multiple terms (the
available dates), but I still don't think that's
the best solution.
The availability info is likely to be pretty
Hi,
We are porting our search functionality over to Lucene
in our hotel solution, which is Java-based. Today
search is done directly against the database.
There is a date search, i.e. a tourist would like to
search for free rooms between fromDate and toDate.
The documents are added to the index per hotel
room(p