We are currently running a search service with a single Lucene index
of about 10 GB. We would like to find out:
(a) What index sizes is everyone else running? How large have
Lucene indexes grown in production environments, and is there an
optimal size that Lucene indexes should be?
(b)
you done profiling on your application such that you are
sure moving Lucene off the machine is going to help that much?
Cheers,
Grant
PS: the mailing list strips attachments.
On Jun 28, 2007, at 10:19 AM, Samuel LEMOINE wrote:
> Chun Wei Ho wrote:
>> Hi,
>>
>> We are
Hi,
We are currently running a Tomcat web application serving searches
over our Lucene index (10GB) on a single server machine (Dual 3GHz
CPU, 4GB RAM). Due to performance issues and to scale up to handle
more traffic/search requests, we are getting another server machine.
We are looking at two
Thanks for the ideas.
We are testing out the methods and changes suggested to see if they
work with our current set up, and are checking if the disks are the
bottleneck in this case, but feel free to drop more hints. :)
At the moment we are copying the index at an off-peak hour, but we
would also
We are running a search service on the internet using two machines. We
have a crawler machine which crawls the web and merges new documents
found into the Lucene index. We have a searcher machine which allows
users to perform searches on the Lucene index.
Periodically, we would copy the newest ve
Hi,
We run a search engine based on Lucene 1.9.1 / Nutch 0.7.2. Our index
has approximately 2 million documents and the physical size of it is
about 10 GB. We run it as a Tomcat web application on a Fedora Core 4
server with dual Xeon 3.2GHz processors and 4GB RAM.
We receive about 46500 web sear
We are starting to run a small index of classifieds alongside our main
search items. The classifieds are also in a Lucene index. We show
classifieds that match the user's search criteria, which means we do a
Lucene search on that index and show the top few results. We also keep
track of the number
Hi,
I've been trying to adjust the weightings for my searches (thanks
Chris for his replies on that thread), and have been using
ConstantScoreQuery to even out scores from portions in my query that I
want to match but not to contribute to the ranking of that result.
I convert a BooleanQuery/Term
I have an index containing documents from a number of authors,
but would like to lower the relevance/score of documents from one
particular author using the query. That is, for documents returned by
querying (content:"miracle cure"), I would like to reduce the
relevancy of authorid:3024
How
I am performing searches on an index that includes a title field and a
content field, and return results only if either title or content
matches ALL the words searched. So searching for "miracle cure for
cancer" might yield:
(+title:miracle +title:cure +title:for +title:cancer)^5.0
(+content:mira
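The per-field clause shown in this excerpt can be sketched as plain string building. This is only a minimal illustration, not the poster's code: the class name `AllWordsQuerySketch` and the method `requiredClause` are hypothetical, and it produces query syntax as text rather than calling any Lucene API. Only the title clause is built here, because the content clause and its boost are cut off in the excerpt.

```java
import java.util.ArrayList;
import java.util.List;

public class AllWordsQuerySketch {

    // Hypothetical helper: marks every word as required (+) on one field,
    // wraps the group in parentheses, and appends the boost.
    public static String requiredClause(String field, String[] words, float boost) {
        List<String> terms = new ArrayList<>();
        for (String word : words) {
            terms.add("+" + field + ":" + word);
        }
        return "(" + String.join(" ", terms) + ")^" + boost;
    }

    public static void main(String[] args) {
        String[] words = "miracle cure for cancer".split(" ");
        // Prints the title clause in the same shape as the excerpt above.
        System.out.println(requiredClause("title", words, 5.0f));
    }
}
```

A clause for another field could be built the same way with its own boost and joined to this one with a space, so that a document matching either clause is returned.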
Hi,
I use Hits to search for and get documents matching a particular query, e.g.:
Hits hits = indexSearcher.search(new TermQuery(new Term("startswith","A")));
but it is not returning all the matching documents in the index. From
experimentation it appears to return fewer than half the match
I would like to make some updates to values within my large index. I
understand that I have to delete and re-insert each document to be
changed to do that. However, I do have some large fields that are
unstored (only indexed; and no, these are not the fields that I
want to change), which means
I have a large Lucene index to which I am planning to add one or more
search fields, and then perform searches on them.
How do I include results from the other documents that do not have the
new field? For example, I have 10 million documents in an index, and I
update 200 of them adding the field "b" =
Hi,
I have a pretty large index and I would like to obtain all the Terms
for only one or two particular fields.
As I understand it, IndexReader.terms() returns a TermEnum of all the
terms in the index, and I would have to iterate through all of them to
pick out the ones from the fields that I want
I am wondering if anyone has existing code for a simpler QueryParser:
one that does not create the more complex prefix/fuzzy/range queries,
but still allows the usual term/boolean queries.
I use QueryParser to directly parse user input (allowing for more
flexible specification of include/exclude a
Hi,
I am in the process of deciding specs for a crawling machine and a
searching machine (two machines), which will support merging/indexing
and searching operations on a single Lucene index that may scale to
several million pages (at which point it would be about 2-10 GB,
assuming linear growth w
ull;
>
>
> public Query getQuery() {
> return query;
> }
>
>
> public void setQuery(Query query) {
> this.query = query;
> }
>
>
> public String toString(){
> return query.toString();
> }
>
>
Hi,
I am trying to suggest refined searches for my Lucene search. For
example, if a search returned too many results, it would list a
number of document title subsequences that occurred frequently in the
results of the previous search, as possible candidates for refining
the search.
Does anyone
Hi,
I am running a search for something akin to a news site, where each
news document has a date, title, keywords/bylines, summary fields and
then the actual content. Using Lucene for this database of documents,
it seems that:
1. The relevancy score is skewed drastically by the actual number of
ne
I am deploying a web application serving searches on a Lucene index,
and am deciding between distributing search across several machines
and searching on a single machine. I was hoping that someone could tell
me from their experience:
+ Is there anything particular to watch out for if using distributed
sear
Thanks for the info :) One last related question.
If I delete documents using an IndexReader, can I assume that the
internal document numbers of other undeleted documents (obtained using
the same IndexReader instance) will not change until I call
IndexReader.close()?
Hi,
Thanks for the help, just a few more questions:
On 1/26/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
> On Thursday 26 January 2006 09:15, Chun Wei Ho wrote:
> > I am attempting to prune an index by getting each document in turn and
> > then checking/deleting it:
I am attempting to prune an index by getting each document in turn and
then checking/deleting it:
IndexReader ir = IndexReader.open(path);
for(int i=0;i