Hi Michael,
Yes, the collector counts hits across all segments. Thanks for the
suggestion; I'm also asking the question on solr-dev.
Wei
On Thu, May 11, 2023 at 11:57 AM Michael Sokolov wrote:
> Maybe ask this question on solr-dev then? I'm not familiar with how that
> collecto
on in SolrIndexSearcher
https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L281
Thanks,
Wei
On Thu, May 4, 2023 at 11:47 AM Michael Sokolov wrote:
> Yes, sorry I didn't mean to imply you couldn't c
? Any
suggestion is appreciated.
Thanks,
Wei
On Thu, May 4, 2023 at 3:33 AM Michael Sokolov wrote:
> There is no meaning to the sequence. The segments are created concurrently
> by many threads and the merge process will merge them without regard to
> any ordering.
>
>
>
> On
Thanks Patrick! In the default case, when no LeafSorter is provided, are the
segments traversed in order of creation time, i.e. is the oldest segment
always visited first?
Wei
On Tue, May 2, 2023 at 7:22 PM Patrick Zhai wrote:
> Hi Wei,
> Lucene in general iterates through the index
Hello,
We have an index with multiple segments generated by continuous
updates. Does Lucene follow a specific order when iterating through the
segments (assuming a single query thread)? Can the order be customized so
that the latest generated segments are searched first?
Thanks,
Wei
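For later readers: recent Lucene versions expose this through IndexWriterConfig.setLeafSorter(Comparator<LeafReader>). A minimal stdlib sketch of the ordering idea, with a hypothetical Segment record standing in for real LeafReaders and their creation timestamps:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NewestFirst {
    // Hypothetical stand-in for a segment's LeafReader plus its creation time.
    record Segment(String name, long createdMillis) {}

    // Order segments newest-first: the comparator shape that
    // IndexWriterConfig.setLeafSorter(...) expects for LeafReaders.
    static List<Segment> newestFirst(List<Segment> segments) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong(Segment::createdMillis).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Segment> segs = List.of(
                new Segment("_0", 1_000L),
                new Segment("_2", 3_000L),
                new Segment("_1", 2_000L));
        System.out.println(newestFirst(segs).get(0).name()); // prints _2
    }
}
```

With a real leaf sorter installed, an early-terminating collector can then stop after visiting the newest segments.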
on on this. Any
pointer is greatly appreciated.
Best,
Wei
Strange. That's all I got from the log besides the first line I wrote to
show the start of merging with a timestamp.
On Sun, Apr 14, 2013 at 4:58 PM, Robert Muir wrote:
> Your stack trace is incomplete: it doesn't even show where the OOM
> occurred.
>
> On Sun, Apr 14, 201
t much memory consumption. But it seems not the case.
On Sun, Apr 14, 2013 at 4:13 PM, Wei Wang wrote:
> That makes sense.
>
> BTW, I checked the jar file. Exactly as you pointed out, the services
> files only contain info from lucene-core, without the codec from
> lucene-codecs. Aft
JAR file with a ZIP
> > > program and check that all files in META-INF/services contain all
> > > entries merged from all Lucene JARs.
> > >
> > > Uwe
> > >
> > > -
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Brem
ith a ZIP program
> and check that all files in META-INF/services contain all entries merged
> from all Lucene JARs.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Orig
3, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Wei Wang [mailto:welshw...@gmail.com]
> > Sent: Sunday, April 14, 2013 11:30 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: DiskDocValuesFormat
>
ve created a single jar file that
has all necessary dependencies, such as lucene-codecs-4.2.0.jar. And I
assume the indexing step works well, so Lucene already knows the format
with name 'Disk'.
Thanks.
On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand wrote:
> Hi Wei,
>
> On Sat,
Hi Adrien,
Thanks for your example. Really helpful!
Wei
On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand wrote:
> Hi Wei,
>
> On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang wrote:
> > I am trying to use DiskDocValuesFormat for a particular
> > BinaryDocValuesField. It seems ther
I am trying to use DiskDocValuesFormat for a particular
BinaryDocValuesField. It seems there are no good examples showing how to do
this. The only hint I got from various docs and forums is to set some codec
in IndexWriter. Could someone give a few lines of code and show how to
set DiskDocValue
m, it's unrelated to merging: it means you don't
> have enough RAM to support all the stuff you are putting in these
> binarydocvalues fields with an in-RAM implementation. I'd use "Disk" for
> this instead.
>
> On Thu, Apr 11, 2013 at 12:57 PM, Wei Wang wrote:
>
Hi,
After finishing indexing, we tried to consolidate all segments using
forceMerge, but we continually get an out-of-memory error even after we
increased the memory up to 4GB.
Exception in thread "main" java.lang.IllegalStateException: this writer hit
an OutOfMemoryError; cannot complete forceMerge
Thanks for the clarification. Very helpful.
On Wed, Apr 10, 2013 at 8:19 AM, Adrien Grand wrote:
> Hi,
>
> On Wed, Apr 10, 2013 at 4:59 PM, Wei Wang wrote:
> > Okay. Since there is no ByteField, setByteValue will never be used. It
> > seems like a dead function.
>
>
Hi,
On Wed, Apr 10, 2013 at 2:45 AM, Adrien Grand wrote:
> Hi,
>
> On Wed, Apr 10, 2013 at 9:34 AM, Wei Wang wrote:
> > IntField inherits from Field class a function called setByteValue().
> > However, if we call it, it gives an error message:
> >
> > java.lang
IntField inherits from Field class a function called setByteValue().
However, if we call it, it gives an error message:
java.lang.IllegalArgumentException: cannot change value type from Integer
to Byte
1. If this is not allowed for IntField, and there is no ByteField, how will
function setByteValue(
Adrien and Robert, thanks a lot for the hints. Will try a few options and
see how it goes.
On Tue, Apr 9, 2013 at 9:25 AM, Robert Muir wrote:
> On Tue, Apr 9, 2013 at 9:11 AM, Adrien Grand wrote:
>
> > The default codec stores numeric doc values by blocks of 4096 values
> > that have independent
a from the comments.
On Tue, Apr 9, 2013 at 8:51 AM, Robert Muir wrote:
> On Tue, Apr 9, 2013 at 8:22 AM, Wei Wang wrote:
>
> > DocValues makes fast per doc value lookup possible, which is nice. But it
> > brings other interesting issues.
> >
> > Assume there are 100M d
DocValues makes fast per doc value lookup possible, which is nice. But it
brings other interesting issues.
Assume there are 100M docs and 200 NumericDocValuesFields; this ends up
with a huge amount of disk and memory usage, even if there are just thousands
of values for each field. I guess this is b
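A back-of-the-envelope check of the worst case described above, assuming fixed 64-bit values with no compression (real codecs pack values far more tightly, especially when only thousands of distinct values exist per field):

```java
public class DocValuesFootprint {
    // Naive upper bound: every doc stores a fixed 8-byte value per field.
    static long naiveBytes(long docs, long fields, long bytesPerValue) {
        return docs * fields * bytesPerValue;
    }

    public static void main(String[] args) {
        long bytes = naiveBytes(100_000_000L, 200L, 8L);
        System.out.println(bytes);              // 160000000000 bytes
        System.out.println(bytes / (1L << 30)); // ~149 GiB uncompressed
    }
}
```

This is why packed/variable-length encodings matter so much at this scale: the uncompressed bound is ~149 GiB, while low-cardinality fields need only a few bits per document.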
today ... but,
> likely this wouldn't really buy you much performance if it did vs just
> creating a new Document when the fields changed.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sun, Apr 7, 2013 at 2:41 AM, Wei Wang wrote:
> > Lucene encourages to
Lucene encourages re-using Documents by setting new values for the Fields
contained within a Document object. This assumes there is no change to the
number and types of Fields contained in a Document object during indexing.
If the number and types of Fields contained in a Document object change
from
error:
Exception in thread "main" java.lang.IllegalArgumentException: cannot
change value type from Long to Integer
Do we need to use setLongValue() all the time?
Thanks.
On Thu, Apr 4, 2013 at 3:58 PM, Wei Wang wrote:
> Thanks! Good to know the codec uses variable length encod
Thanks! Good to know the codec uses a variable-length encoding mechanism here.
On Thu, Apr 4, 2013 at 3:36 PM, Adrien Grand wrote:
> On Thu, Apr 4, 2013 at 11:03 PM, Wei Wang wrote:
> > Given the new Lucene 4.2 DocValues API, it seems no matter it is byte,
> > short, int, or lon
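To illustrate the variable-length idea discussed here: small values occupy fewer bytes because each byte carries 7 payload bits plus a continuation bit. This is a generic varint sketch, not Lucene's exact on-disk format:

```java
import java.io.ByteArrayOutputStream;

public class VarintDemo {
    // Encode a non-negative long as a varint: 7 bits per byte,
    // high bit set on every byte except the last.
    static byte[] encode(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(5).length);        // 1 byte
        System.out.println(encode(300).length);      // 2 bytes
        System.out.println(encode(1L << 40).length); // 6 bytes
    }
}
```

So declaring a value as byte, short, int, or long makes no difference to the stored size; only the magnitude of the value does.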
ed to
give some hint to NumericDocValuesField to save space?
On Thu, Apr 4, 2013 at 11:53 AM, Wei Wang wrote:
> Hi Adrien,
>
> Thanks for the clarification. It is very helpful. Will try Lucene 4.2 and
> AtomicReader API.
>
> Wei
>
>
> On Thu, Apr 4, 2013 at 11:22 AM, Adrie
Hi Adrien,
Thanks for the clarification. It is very helpful. Will try Lucene 4.2 and
AtomicReader API.
Wei
On Thu, Apr 4, 2013 at 11:22 AM, Adrien Grand wrote:
> Hi,
>
> On Thu, Apr 4, 2013 at 10:30 AM, Wei Wang wrote:
> > A few quick questions about DocValues:
> >
any examples to show how DocValues are stored and retrieved? It
seems the JavaDoc only shows how to add them, and no complete examples are out
there.
Thanks in advance,
Wei
Hi Yann-Erwan,
Thank you for the detailed reply. Your idea seems reasonable. I will
give it a try for our environment settings.
Wei
On Tue, Mar 26, 2013 at 5:22 PM, Yann-Erwan Perio wrote:
> On Sun, Mar 24, 2013 at 10:46 AM, Wei Wang wrote:
>
> Hi,
>
>> For example, assume
Can someone give a hint on this? Or is this a tough problem?
Thanks in advance.
On Sun, Mar 24, 2013 at 2:46 AM, Wei Wang wrote:
> Hello,
>
> We have documents with many numerical fields. In some search scenario,
> we would like to create a filter based on the sum of the v
ble combination of pairs of
numerical fields which leads to large number of aggregated fields such
as F3. Can we directly use the values of F1 and F2 to create a filter?
Thanks,
Wei
-
To unsubscribe, e-mail: java-user-unsub
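One way to act on F1 + F2 at query time, without materializing an aggregated field like F3, is to read both per-doc values and test the sum while collecting. A stdlib sketch of the acceptance logic, where the plain arrays below stand in for what would be NumericDocValues lookups inside a custom collector or function query in Lucene:

```java
import java.util.ArrayList;
import java.util.List;

public class SumFilterSketch {
    // Accept docIDs whose F1 + F2 falls within [min, max].
    static List<Integer> filterBySum(long[] f1, long[] f2, long min, long max) {
        List<Integer> accepted = new ArrayList<>();
        for (int doc = 0; doc < f1.length; doc++) {
            long sum = f1[doc] + f2[doc];
            if (sum >= min && sum <= max) accepted.add(doc);
        }
        return accepted;
    }

    public static void main(String[] args) {
        long[] f1 = {10, 3, 7, 50};   // per-doc values of field F1
        long[] f2 = {1, 9, 2, 60};    // per-doc values of field F2
        System.out.println(filterBySum(f1, f2, 10, 12)); // [0, 1]
    }
}
```

This avoids indexing every pairwise combination, at the cost of computing the sum per candidate document at search time.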
rnal field together with the docID of the parent doc to
remove the whole doc block. Here we assume the parent doc is assigned a
doc ID first at indexing time.
Wei
On Sun, Mar 3, 2013 at 11:54 AM, Wei Wang wrote:
> I see. Probably assigning blockID is the most efficient way. Thanks.
>
n't join to
> anything.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Mar 2, 2013 at 11:34 PM, Wei Wang wrote:
>> Hello,
>>
>> I understand BlockJoinQuery can be used to index nested documents with
>> some internal structure. And
can we delete
the old document block efficiently? It seems IndexWriter does not
track these blocks.
Thanks,
Wei
Thank you, Mike. I will try it out.
On Mon, Feb 25, 2013 at 4:01 PM, Michael McCandless
wrote:
> On Mon, Feb 25, 2013 at 2:19 PM, Wei Wang wrote:
>> Cool. Thanks, Ian.
>>
>> I will try FieldCacheTermsFilter.
>>
>> A related question. Occasionally, we would like
to
maxDoc. If we are able to interpret the bitmap of filters directly, it may
be more efficient.
Can we use Filter to return a list of docs or a count of docs directly?
Wei
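The count-only case really can be answered straight from the filter's bitmap without iterating hits. A stdlib sketch with java.util.BitSet standing in for a Lucene DocIdSet (the maxDoc size and set bits are hypothetical):

```java
import java.util.BitSet;
import java.util.List;

public class FilterCountSketch {
    // Count matches straight from the bitmap; no per-hit scoring involved.
    static int countMatches(BitSet matches) {
        return matches.cardinality();
    }

    // Materialize the matching docIDs only when the list itself is needed.
    static List<Integer> matchingDocs(BitSet matches) {
        return matches.stream().boxed().toList();
    }

    public static void main(String[] args) {
        // Hypothetical filter bitmap: one bit per docID, sized to maxDoc.
        BitSet matches = new BitSet(1_000);
        matches.set(3);
        matches.set(42);
        matches.set(999);
        System.out.println(countMatches(matches));  // 3
        System.out.println(matchingDocs(matches));  // [3, 42, 999]
    }
}
```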
> I'm sure that Filters are thread safe.
>
> Lucene doesn't have a global caching mechanism as such. But see
a central place. I noticed FilterManager was removed
from Lucene 4. Is there another class replacing FilterManager?
Thanks!
Wei
hat is, force Lucene
to create Query2 for both Input1 and Input2.
Thanks,
Wei
Original Message
Subject: Re: Lucene QueryParser and Analyzer
From: Sudarsan, Sithu D.
To: java-user@lucene.apache.org
Date: 4/29/2010 4:54 PM
---sample code-
Analyzer analyze
ifcal. Does
QueryParser do any sort of pre-processing or filtering beforehand? If
so, how can I turn it off?
Aside from stopping tokens at punctuations, my analyzer is also doing
Chinese word segmentation, so I'd like to be sure that QueryParser is
using the analyzer the way I exp
?
Thanks,
Wei Ho
Original Message
Subject: Re: Lucene QueryParser and Analyzer
From: Sudarsan, Sithu D.
To: java-user@lucene.apache.org
Date: 4/29/2010 3:54 PM
Hi,
Is there a whitespace after the comma?
Sincerely,
Sithu D Sudarsan
-Original Message-
From: Wei Ho
rser.parse(queryLine[1]);
ScoreDoc[] results = searcher.search(query, TOP_N).scoreDocs;
---
I'm probably just doing something dumb, but any help would be greatly
appreciated!
Thanks,
Wei Ho
---
:
transportation: car
to match the document because car is a subclass of vehicle.
Is it possible to change the part where Lucene decides if a term is matched?
So I can take the subclass relationship into account?
Many thanks.
--
Jason Wei
Chair of Software Engineering, ETH Zurich
We are currently running a search service with a single Lucene index
of about 10 GB. We would like to find out:
(a) What is the usual index size of everyone else? How large have
Lucene indexes gone in production environments, and is there a sort of an
optimal size that Lucene indexes should be?
(b)
you done profiling on your application such that you are
sure moving Lucene off the machine is going to help that much?
Cheers,
Grant
ps, the mailing list strips attachments.
On Jun 28, 2007, at 10:19 AM, Samuel LEMOINE wrote:
> Chun Wei Ho wrote:
>> Hi,
>>
>> We are
Hi,
We are currently running a Tomcat web application serving searches
over our Lucene index (10GB) on a single server machine (Dual 3GHz
CPU, 4GB RAM). Due to performance issues and to scale up to handle
more traffic/search requests, we are getting another server machine.
We are looking at two
Thanks for the ideas.
We are testing out the methods and changes suggested to see if they
work with our current set up, and are checking if the disks are the
bottleneck in this case, but feel free to drop more hints. :)
At the moment we are copying the index at an offpeak hour, but we
would also
We are running a search service on the internet using two machines. We
have a crawler machine which crawls the web and merges new documents
found into the Lucene index. We have a searcher machine which allows
users to perform searches on the Lucene index.
Periodically, we would copy the newest ve
Hi,
We run a search engine based on Lucene 1.9.1 / Nutch 0.7.2. Our index
has approximately 2 million documents and the physical size of it is
about 10 GB. We run it as a tomcat web application on a Fedora Core 4
server with dual Xeon 3.2GHz processors and 4GB RAM.
We receive about 46500 web sear
We are starting to run a small index of classifieds alongside our main
search items. The classifieds are also in a lucene index. We show
classifieds that match the user's search criteria, which means we do a
lucene search on that index and show the top few results. We also keep
track of the number
Hi,
I've been trying to adjust the weightings for my searches (thanks
Chris for his replies on that thread), and have been using
ConstantScoreQuery to even out scores from portions in my query that I
want to match but not to contribute to the ranking of that result.
I convert a BooleanQuery/Term
I have an index from which I have a number of documents from authors,
but would like to drop the relevance/score for documents from one
particular author using the query. That is, for documents returned by
querying: (content:"miracle cure"), I would like to reduce the
relevancy of authorid:3024
How
I am performing searches on an index that includes a title field and a
content field, and return results only if either title or content
matches ALL the words searched. So searching for "miracle cure for
cancer" might yield:
(+title:miracle +title:cure +title:for +title:cancer)^5.0
(+content:mira
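The per-field expansion shown above can be sketched as plain query-string construction. The helper below is hypothetical; real code would build a BooleanQuery with MUST clauses per field and apply the boost programmatically:

```java
public class AllWordsClause {
    // Build "(+field:w1 +field:w2 ...)^boost" for one field.
    static String allWords(String field, String boost, String... words) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append('+').append(field).append(':').append(words[i]);
        }
        return sb.append(")^").append(boost).toString();
    }

    public static void main(String[] args) {
        // Reproduces the title clause from the message above.
        System.out.println(allWords("title", "5.0", "miracle", "cure", "for", "cancer"));
        // (+title:miracle +title:cure +title:for +title:cancer)^5.0
    }
}
```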
Hi,
I use Hits to search for and get documents matching a particular query, e.g.:
Hits hits = indexSearcher.search(new TermQuery(new Term("startswith","A")));
but it is not returning all the matching documents in the index. From
experimentation it appears to return less than half of the match
I would like to make some updates to values within my large index. I
understand that I have to delete and re-insert each document to be
changed to do that. However, I do have some large fields that are
unstored (only indexed and no, these are not the fields that I am
wanting to change), which means
I have a large Lucene index that I am planning on adding one or more
search fields, and perform searches on them.
How do I include results from the other documents that do not have the
new field? For example, I have 10 million documents in an index, and I
update 200 of them adding the field "b" =
Hi,
I have a pretty large index and I would like to obtain all the Terms
for only one or two particular fields.
As I understand it, IndexReader.terms() returns a TermEnum of all the
terms in the index, and I would have to iterate through all of them to
pick out the ones from the fields that I want
I am wondering if anyone has existing code for a simpler QueryParser -
one that does not create the more complex prefix/fuzzy/range queries,
but still allow the usual term/boolean queries.
I use QueryParser to directly parse user input (allowing for more
flexible specification of include/exclude a
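One common alternative to writing a simpler parser is to escape the special syntax characters in raw user input first, so prefix/fuzzy/range operators are treated as literal text while ordinary terms still parse. This mirrors the idea of QueryParser's escape facility; the exact character list below is an assumption, so check your Lucene version's documentation:

```java
public class QueryEscaper {
    // Characters QueryParser treats as syntax (assumed list); each gets
    // a backslash prefix so it parses as a plain character instead.
    static String escape(String s) {
        String special = "+-!(){}[]^\"~*?:\\";
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (special.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("miracle cure?")); // miracle cure\?
        System.out.println(escape("plain terms"));   // plain terms
    }
}
```

Escaping keeps the stock QueryParser but disarms the complex query types, which may be simpler than maintaining a forked grammar.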
Hi,
I am in the process of deciding specs for a crawling machine and a
searching machine (two machines), which will support merging/indexing
and searching operations on a single Lucene index that may scale to
about several million pages (at which it would be about 2-10 GB,
assuming linear growth w
ull;
>
>
> public Query getQuery() {
> return query;
> }
>
>
> public void setQuery(Query query) {
> this.query = query;
> }
>
>
> public String toString(){
> return query.toString();
> }
>
>
Hi,
I am trying to suggest refined searches for my Lucene search. For
example, if a search returned too many results, it would list a
number of document title subsequences that occurred frequently in the
results of the previous search, as possible candidates for refining
the search.
Does anyone
Hi,
I am running a search for something akin to a news site, where each
news document has a date, title, keywords/bylines, summary fields and
then the actual content. Using Lucene for this database of documents,
it seems that:
1. The relevancy score is skewed drastically by the actual number of
ne
I am deploying a web application serving searches on a Lucene index,
and am deciding between distributing search across several machines or
searching on a single one, and was hoping that someone could tell me from
their experiences:
+ Is there anything particular to watch out for if using distributed
sear
Thanks for the info :) One last related question.
If I delete documents using a IndexReader(), can I assume that the
internal document numbers of other undeleted documents (obtained using
the same IndexReader instance) will not change until I call
IndexReader.close()?
Hi,
Thanks for the help, just a few more questions:
On 1/26/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
> On Thursday 26 January 2006 09:15, Chun Wei Ho wrote:
> > I am attempting to prune an index by getting each document in turn and
> > then checking/deleting it:
I am attempting to prune an index by getting each document in turn and
then checking/deleting it:
IndexReader ir = IndexReader.open(path);
for(int i=0;i