Re: Question about index segment search order

2023-05-11 Thread Wei
Hi Michael, Yes the collector counts hits across all segments. Thanks for the suggestion, I'm also asking the question on solr-dev. Wei On Thu, May 11, 2023 at 11:57 AM Michael Sokolov wrote: > Maybe ask this issue on solr-dev then? I'm not familiar with how that > collecto

Re: Question about index segment search order

2023-05-09 Thread Wei
on in SolrIndexSearcher https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L281 Thanks, Wei On Thu, May 4, 2023 at 11:47 AM Michael Sokolov wrote: > Yes, sorry I didn't mean to imply you couldn't c

Re: Question about index segment search order

2023-05-04 Thread Wei
? Any suggestion is appreciated. Thanks, Wei On Thu, May 4, 2023 at 3:33 AM Michael Sokolov wrote: > There is no meaning to the sequence. The segments are created concurrently > by many threads and the merge process will merge them without regards to > any ordering. > > > > On

Re: Question about index segment search order

2023-05-03 Thread Wei
Thanks Patrick! In the default case when no LeafSorter is provided, are the segments traversed in the order of creation time, i.e. the oldest segment is always visited first? Wei On Tue, May 2, 2023 at 7:22 PM Patrick Zhai wrote: > Hi Wei, > Lucene in general iterate through the index

Question about index segment search order

2023-05-02 Thread Wei
Hello, We have an index with multiple segments generated by continuous updates. Does Lucene follow a specific order when iterating through the segments (assuming a single query thread)? Can the order be customized so that the most recently generated segments are searched first? Thanks, Wei
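
The ordering asked about here can be illustrated without any Lucene classes. The following is a minimal sketch, assuming the application records a creation timestamp per segment (the replies in this thread say the default sequence carries no meaning, so that timestamp is an assumption). `Segment`, `NEWEST_FIRST`, and `visitOrder` are hypothetical names for illustration only; how such a comparator is handed to Lucene is not shown in this listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SegmentOrderSketch {

    // Hypothetical stand-in for an index segment; NOT a Lucene class.
    static final class Segment {
        final String name;
        final long createdAtMillis; // assumed to be recorded by the application

        Segment(String name, long createdAtMillis) {
            this.name = name;
            this.createdAtMillis = createdAtMillis;
        }
    }

    // Comparator that places the most recently created segment first.
    static final Comparator<Segment> NEWEST_FIRST =
            Comparator.comparingLong((Segment s) -> s.createdAtMillis).reversed();

    // Returns segment names in the order a newest-first traversal would visit them.
    static List<String> visitOrder(List<Segment> segments) {
        List<Segment> sorted = new ArrayList<>(segments); // leave the input untouched
        sorted.sort(NEWEST_FIRST);
        List<String> names = new ArrayList<>();
        for (Segment s : sorted) {
            names.add(s.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
                new Segment("_0", 1000L),
                new Segment("_2", 3000L),
                new Segment("_1", 2000L));
        System.out.println(visitOrder(segments));
    }
}
```

The comparator is kept separate from the traversal so the same ordering rule could be reused wherever a segment comparator is accepted.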

Lucene 8 early termination

2020-01-23 Thread Wei
on on this. Any pointer is greatly appreciated. Best, Wei

Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
Strange. That's all I got from the log beside the first line I wrote to show starting merging with a time stamp. On Sun, Apr 14, 2013 at 4:58 PM, Robert Muir wrote: > Your stack trace is incomplete: it doesn't even show where the OOM > occurred. > > On Sun, Apr 14, 201

Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
t much memory consumption. But it seems not the case. On Sun, Apr 14, 2013 at 4:13 PM, Wei Wang wrote: > That makes sense. > > BTW, I checked the jar file. Exactly as you pointed out, the services > files only contains info from lucene-core, without codec from > lucene-codecs. Aft

Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
JAR file with a ZIP > > > program and check that all files in META-INF/services contain all > > > entries merged from all Lucene JARs. > > > > > > Uwe > > > > > > - > > > Uwe Schindler > > > H.-H.-Meier-Allee 63, D-28213 Brem

Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
ith a ZIP program > and check that all files in META-INF/services contain all entries merged > from all Lucene JARs. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Orig

Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
3, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Wei Wang [mailto:welshw...@gmail.com] > > Sent: Sunday, April 14, 2013 11:30 PM > > To: java-user@lucene.apache.org > > Subject: Re: DiskDocValuesFormat >

Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
ve created a single jar file that has all necessary dependencies, such as lucene-codecs-4.2.0.jar. And I assume the indexing step works well, so Lucene already knows the format with name 'Disk'. Thanks. On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand wrote: > Hi Wei, > > On Sat,

Re: DiskDocValuesFormat

2013-04-13 Thread Wei Wang
Hi Adrien, Thanks for your example. Really helpful! Wei On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand wrote: > Hi Wei, > > On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang wrote: > > I am trying to use DiskDocValuesFormat for a particular > > BinaryDocValuesField. It seems ther

DiskDocValuesFormat

2013-04-12 Thread Wei Wang
I am trying to use DiskDocValuesFormat for a particular BinaryDocValuesField. It seems there are no good examples showing how to do this. The only hint I got from various docs and forums is to set some codec in IndexWriter. Could someone give a short code snippet showing how to set DiskDocValue

Re: Forcemerge running out of memory

2013-04-11 Thread Wei Wang
m, its unrelated to merging: it means you don't > have enough RAM to support all the stuff you are putting in these > binarydocvalues fields with an in-RAM implementation. I'd use "Disk" for > this instead. > > On Thu, Apr 11, 2013 at 12:57 PM, Wei Wang wrote: >

Forcemerge running out of memory

2013-04-11 Thread Wei Wang
Hi, After finishing indexing, we tried to consolidate all segments using forcemerge, but we keep getting an out-of-memory error even after increasing the memory to 4GB. Exception in thread "main" java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete forceMerge

Re: IntField question

2013-04-10 Thread Wei Wang
Thanks for the clarification. Very helpful. On Wed, Apr 10, 2013 at 8:19 AM, Adrien Grand wrote: > Hi, > > On Wed, Apr 10, 2013 at 4:59 PM, Wei Wang wrote: > > Okay. Since there is no ByteField, setByteValue will never be used. It > > seems like a dead function. > >

Re: IntField question

2013-04-10 Thread Wei Wang
Hi, On Wed, Apr 10, 2013 at 2:45 AM, Adrien Grand wrote: > Hi, > > On Wed, Apr 10, 2013 at 9:34 AM, Wei Wang wrote: > > IntField inherits from Field class a function called setByteValue(). > > However, if we call it, it gives an error message: > > > > java.lang

IntField question

2013-04-10 Thread Wei Wang
IntField inherits a function called setByteValue() from the Field class. However, if we call it, it gives an error message: java.lang.IllegalArgumentException: cannot change value type from Integer to Byte 1. If this is not allowed for IntField, and there is no ByteField, how will the function setByteValue(

Re: DocValues space usage

2013-04-09 Thread Wei Wang
Adrien and Robert, thanks a lot for the hints. Will try a few options and see how it goes. On Tue, Apr 9, 2013 at 9:25 AM, Robert Muir wrote: > On Tue, Apr 9, 2013 at 9:11 AM, Adrien Grand wrote: > > > The default codec stores numeric doc values by blocks of 4096 values > > that have independent

Re: DocValues space usage

2013-04-09 Thread Wei Wang
a from the comments. On Tue, Apr 9, 2013 at 8:51 AM, Robert Muir wrote: > On Tue, Apr 9, 2013 at 8:22 AM, Wei Wang wrote: > > > DocValues makes fast per doc value lookup possible, which is nice. But it > > brings other interesting issues. > > > > Assume there are 100M d

DocValues space usage

2013-04-09 Thread Wei Wang
DocValues makes fast per-document value lookup possible, which is nice. But it brings other interesting issues. Assume there are 100M docs and 200 NumericDocValuesFields: this ends up with a huge amount of disk and memory usage, even if there are just thousands of values for each field. I guess this is b
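
To put rough numbers on the scenario above, here is a back-of-the-envelope sketch. It is plain arithmetic, not a description of what any Lucene codec actually writes: it compares storing every value as a raw 64-bit long against packing each value into the minimum number of bits needed to address the distinct values of a field. The figure of 5000 distinct values per field is an assumption standing in for "thousands", and `bitsRequired` and `totalBytes` are hypothetical helper names.

```java
final class DocValuesSizeSketch {

    // Minimum number of bits needed to represent values in the range 0..maxValue.
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    // Total bytes for docs * fields values stored at a fixed number of bits per value.
    static long totalBytes(long docs, long fields, int bitsPerValue) {
        return docs * fields * bitsPerValue / 8;
    }

    public static void main(String[] args) {
        long docs = 100_000_000L;    // 100M docs, as in the question
        long fields = 200L;          // 200 numeric fields, as in the question
        long distinctValues = 5000L; // ASSUMPTION: "thousands" of values per field

        int packedBits = bitsRequired(distinctValues - 1); // ordinals 0..4999

        System.out.println("raw 64-bit longs: " + totalBytes(docs, fields, 64) + " bytes");
        System.out.println("bits per packed value: " + packedBits);
        System.out.println("packed ordinals: " + totalBytes(docs, fields, packedBits) + " bytes");
    }
}
```

Under these assumptions the raw layout needs 160 GB while the packed layout needs about 32.5 GB, which shows how much the per-value width matters at this scale.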

Re: Reuse Document

2013-04-07 Thread Wei Wang
today ... but, > likely this wouldn't really buy you much performance if it did vs just > creating a new Document when the fields changed. > > Mike McCandless > > http://blog.mikemccandless.com > > On Sun, Apr 7, 2013 at 2:41 AM, Wei Wang wrote: > > Lucene encourages to

Reuse Document

2013-04-06 Thread Wei Wang
Lucene encourages reusing a Document by setting new values for the Fields contained within the Document object. This assumes there is no change to the number and types of Fields contained in a Document object during indexing. If the number and types of Fields contained in a Document object change from

Re: DocValues questions

2013-04-04 Thread Wei Wang
error: Exception in thread "main" java.lang.IllegalArgumentException: cannot change value type from Long to Integer Do we need to use setLongValue() all the time? Thanks. On Thu, Apr 4, 2013 at 3:58 PM, Wei Wang wrote: > Thanks! Good to know the codec uses variable length encod

Re: DocValues questions

2013-04-04 Thread Wei Wang
Thanks! Good to know the codec uses variable length encoding mechanism here. On Thu, Apr 4, 2013 at 3:36 PM, Adrien Grand wrote: > On Thu, Apr 4, 2013 at 11:03 PM, Wei Wang wrote: > > Given the new Lucene 4.2 DocValues API, it seems no matter it is byte, > > short, int, or lon

Re: DocValues questions

2013-04-04 Thread Wei Wang
ed to give some hint to NumericDocValuesField to save space? On Thu, Apr 4, 2013 at 11:53 AM, Wei Wang wrote: > Hi Adrien, > > Thanks for the clarification. It is very helpful. Will try Lucene 4.2 and > AtomicReader API. > > Wei > > > On Thu, Apr 4, 2013 at 11:22 AM, Adrie

Re: DocValues questions

2013-04-04 Thread Wei Wang
Hi Adrien, Thanks for the clarification. It is very helpful. Will try Lucene 4.2 and AtomicReader API. Wei On Thu, Apr 4, 2013 at 11:22 AM, Adrien Grand wrote: > Hi, > > On Thu, Apr 4, 2013 at 10:30 AM, Wei Wang wrote: > > A few quick questions about DocValues: > >

DocValues questions

2013-04-04 Thread Wei Wang
any examples to show how DocValues are stored and retrieved? It seems JavaDoc only shows how to add it, and no complete examples are out there. Thanks in advance, Wei

Re: Filter based on the sum of values of two fields

2013-03-27 Thread Wei Wang
Hi Yann-Erwan, Thank you for the detailed reply. Your idea seems reasonable. I will give it a try in our environment. Wei On Tue, Mar 26, 2013 at 5:22 PM, Yann-Erwan Perio wrote: > On Sun, Mar 24, 2013 at 10:46 AM, Wei Wang wrote: > > Hi, > >> For example, assume

Re: Filter based on the sum of values of two fields

2013-03-26 Thread Wei Wang
Can someone give a hint on this? Or is this a tough problem? Thanks in advance. On Sun, Mar 24, 2013 at 2:46 AM, Wei Wang wrote: > Hello, > > We have documents with many numerical fields. In some search scenario, > we would like to create a filter based on the sum of the v

Filter based on the sum of values of two fields

2013-03-24 Thread Wei Wang
ble combination of pairs of numerical fields which leads to large number of aggregated fields such as F3. Can we directly use the values of F1 and F2 to create a filter? Thanks, Wei - To unsubscribe, e-mail: java-user-unsub
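
The idea of filtering directly on F1 and F2, rather than indexing an aggregated field such as F3, can be sketched as follows. This is a minimal illustration, assuming the per-document values of both fields are available as arrays indexed by document number and that the filter condition is an inclusive range on the sum (the exact condition is cut off in this snippet). `docsWithSumInRange` is a hypothetical helper, not a Lucene API.

```java
import java.util.BitSet;

final class SumFilterSketch {

    // Returns a bitmap with one bit set per document whose F1 + F2 lies in [min, max].
    // f1[doc] and f2[doc] are ASSUMED to hold the per-document field values.
    static BitSet docsWithSumInRange(long[] f1, long[] f2, long min, long max) {
        if (f1.length != f2.length) {
            throw new IllegalArgumentException("field arrays must cover the same documents");
        }
        BitSet accepted = new BitSet(f1.length);
        for (int doc = 0; doc < f1.length; doc++) {
            long sum = f1[doc] + f2[doc];
            if (sum >= min && sum <= max) {
                accepted.set(doc);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        long[] f1 = {1, 5, 10, 3};
        long[] f2 = {2, 5, 1, 30};
        // Sums are 3, 10, 11 and 33, so only documents 1 and 2 fall in [5, 15].
        System.out.println(docsWithSumInRange(f1, f2, 5, 15));
    }
}
```

Computing the sum at filter time avoids materializing a field for every pair of numerical fields, at the cost of one pass over the per-document values.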

Re: BlockJoinQuery: delete documents

2013-03-05 Thread Wei Wang
rnal field together with the docID of the parent doc to remove the whole doc block. Here we assume the parent doc is given a doc ID first during the indexing time. Wei On Sun, Mar 3, 2013 at 11:54 AM, Wei Wang wrote: > I see. Probably assigning blockID is the most efficient way. Thanks. >

Re: BlockJoinQuery: delete documents

2013-03-03 Thread Wei Wang
n't join to > anything. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Sat, Mar 2, 2013 at 11:34 PM, Wei Wang wrote: >> Hello, >> >> I understand BlockJoinQuery can be used to index nested documents with >> some internal structure. And

BlockJoinQuery: delete documents

2013-03-02 Thread Wei Wang
can we delete the old document block efficiently? It seems IndexWriter does not track these blocks. Thanks, Wei

Re: Lucene filter questions

2013-02-25 Thread Wei Wang
Thank you, Mike. I will try it out. On Mon, Feb 25, 2013 at 4:01 PM, Michael McCandless wrote: > On Mon, Feb 25, 2013 at 2:19 PM, Wei Wang wrote: >> Cool. Thanks, Ian. >> >> I will try FieldCacheTermsFilter. >> >> A related question. Occasionally, we would like

Re: Lucene filter questions

2013-02-25 Thread Wei Wang
to maxDoc. If we are able to interpret the bitmap of a filter directly, it may be more efficient. Can we use a Filter to return the list of docs or the count of docs directly? Wei > I'm sure that Filters are thread safe. > > Lucene doesn't have a global caching mechanism as such. But see
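
Reading a list or a count straight from a bitmap, as asked above, looks like this with the JDK's own `java.util.BitSet`. This is only a sketch of the general technique: it assumes the filter's accepted documents have already been obtained as a `BitSet`, and it does not show how Lucene exposes a filter's bits. `countDocs` and `listDocs` are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class FilterBitmapSketch {

    // Number of accepted documents: the count of set bits.
    static int countDocs(BitSet accepted) {
        return accepted.cardinality();
    }

    // Accepted document numbers in increasing order, skipping unset bits.
    static List<Integer> listDocs(BitSet accepted) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = accepted.nextSetBit(0); doc >= 0; doc = accepted.nextSetBit(doc + 1)) {
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        BitSet accepted = new BitSet();
        accepted.set(3);
        accepted.set(7);
        accepted.set(42);
        System.out.println(countDocs(accepted) + " docs: " + listDocs(accepted));
    }
}
```

Iterating with `nextSetBit` jumps between set bits instead of testing every document number up to the maximum.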

Lucene filter questions

2013-02-24 Thread Wei Wang
a central place. I noticed FilterManager was removed from Lucene 4. Is there another class replacing FilterManager? Thanks! Wei

Re: Lucene QueryParser and Analyzer

2010-04-29 Thread Wei Ho
hat is, force Lucene to create Query2 for both Input1 and Input2. Thanks, Wei Original Message Subject: Re: Lucene QueryParser and Analyzer From: Sudarsan, Sithu D. To: java-user@lucene.apache.org Date: 4/29/2010 4:54 PM ---sample code- Analyzer analyze

Re: Lucene QueryParser and Analyzer

2010-04-29 Thread Wei Ho
ifcal. Is QueryParser doing any sort of pre-processing or filtering beforehand? If so, how can I turn it off? Aside from ending tokens at punctuation, my analyzer also does Chinese word segmentation, so I'd like to be sure that QueryParser is using the analyzer the way I exp

Re: Lucene QueryParser and Analyzer

2010-04-29 Thread Wei Ho
? Thanks, Wei Ho Original Message Subject: Re: Lucene QueryParser and Analyzer From: Sudarsan, Sithu D. To: java-user@lucene.apache.org Date: 4/29/2010 3:54 PM Hi, Is there a whitespace after the comma? Sincerely, Sithu D Sudarsan -Original Message- From: Wei Ho

Lucene QueryParser and Analyzer

2010-04-29 Thread Wei Ho
rser.parse(queryLine[1]); ScoreDoc[] results = searcher.search(query, TOP_N).scoreDocs; --- I'm probably just doing something dumb, but any help would be greatly appreciated! Thanks, Wei Ho ---

It is possible to change the meaning of a match in lucene

2010-04-22 Thread Wei Yi
: transportation: car to match the document because car is a subclass of vehicle. Is it possible to change the part where Lucene decides whether a term matches, so that I can take the subclass relationship into account? Many thanks. -- Jason Wei Chair of Software Engineering, ETH Zurich

Lucene index sizes and performance

2007-07-07 Thread Chun Wei Ho
We are currently running a search service with a single Lucene index of about 10 GB. We would like to find out: (a) What index sizes are others typically running? How large have Lucene indexes grown in production environments, and is there some optimal size for a Lucene index? (b)

Re: Scaling up to several machines with Lucene

2007-07-07 Thread Chun Wei Ho
you done profiling on your application such that you are sure moving Lucene off the machine is going to help that much? Cheers, Grant PS: the mailing list strips attachments. On Jun 28, 2007, at 10:19 AM, Samuel LEMOINE wrote: > Chun Wei Ho wrote: >> Hi, >> >> We are

Scaling up to several machines with Lucene

2007-06-28 Thread Chun Wei Ho
Hi, We are currently running a Tomcat web application serving searches over our Lucene index (10GB) on a single server machine (Dual 3GHz CPU, 4GB RAM). Due to performance issues and to scale up to handle more traffic/search requests, we are getting another server machine. We are looking at two

Re: Index updates between machines

2007-04-06 Thread Chun Wei Ho
Thanks for the ideas. We are testing out the methods and changes suggested to see if they work with our current set up, and are checking if the disks are the bottleneck in this case, but feel free to drop more hints. :) At the moment we are copying the index at an offpeak hour, but we would also

Index updates between machines

2007-04-03 Thread Chun Wei Ho
We are running a search service on the internet using two machines. We have a crawler machine which crawls the web and merges new documents found into the Lucene index. We have a searcher machine which allows users to perform searches on the Lucene index. Periodically, we would copy the newest ve

Optimizing search speed & performance for a 10G Index.

2006-12-07 Thread Chun Wei Ho
Hi, We run a search engine based on Lucene 1.9.1 / Nutch 0.7.2. Our index has approximately 2 million documents and its physical size is about 10 GB. We run it as a Tomcat web application on a Fedora Core 4 server with dual Xeon 3.2GHz processors and 4GB RAM. We receive about 46500 web sear

Classifieds rotation - weighting Lucene results by previous show frequency?

2006-08-07 Thread Chun Wei Ho
We are starting to run a small index of classifieds alongside our main search items. The classifieds are also in a lucene index. We show classifieds that match the user's search criteria, which means we do a lucene search on that index and show the top few results. We also keep track of the number
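
One way to weight results by how often they have already been shown is to divide each relevance score by a function of the show count before ranking. The sketch below is an illustration of that general idea only, not a method proposed in this thread: the damping formula `score / (1 + timesShown)`, the `Ad` class, and the sample scores are all assumptions, and no Lucene API is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RotationSketch {

    // Hypothetical classified ad: a relevance score plus how often it was already shown.
    static final class Ad {
        final String id;
        final double score;
        final int timesShown;

        Ad(String id, double score, int timesShown) {
            this.id = id;
            this.score = score;
            this.timesShown = timesShown;
        }
    }

    // ASSUMED damping: the more often an ad was shown, the lower its adjusted score.
    static double adjustedScore(double score, int timesShown) {
        return score / (1.0 + timesShown);
    }

    // Ad ids ordered by adjusted score, highest first.
    static List<String> rank(List<Ad> ads) {
        List<Ad> sorted = new ArrayList<>(ads);
        sorted.sort(Comparator.comparingDouble(
                (Ad a) -> adjustedScore(a.score, a.timesShown)).reversed());
        List<String> ids = new ArrayList<>();
        for (Ad a : sorted) {
            ids.add(a.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Ad> ads = Arrays.asList(
                new Ad("A", 0.9, 8),  // best raw match, but shown often
                new Ad("B", 0.6, 1),
                new Ad("C", 0.5, 0)); // weakest raw match, never shown
        System.out.println(rank(ads));
    }
}
```

With this damping the never-shown ad ranks first even though its raw score is lowest; a gentler function such as a logarithm of the show count would rotate results less aggressively.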

QueryFilter and Memory

2006-07-13 Thread Chun Wei Ho
Hi, I've been trying to adjust the weightings for my searches (thanks Chris for his replies on that thread), and have been using ConstantScoreQuery to even out scores from portions in my query that I want to match but not to contribute to the ranking of that result. I convert a BooleanQuery/Term

Reducing the boost for a particular Term

2006-07-10 Thread Chun Wei Ho
I have an index containing documents from a number of authors, and would like to lower the relevance/score of documents from one particular author at query time. That is, for documents returned by querying (content:"miracle cure"), I would like to reduce the relevancy of authorid:3024. How

Giving weight to partial matches

2006-06-21 Thread Chun Wei Ho
I am performing searches on an index that includes a title field and a content field, and return results only if either title or content matches ALL the words searched. So searching for "miracle cure for cancer" might yield: (+title:miracle +title:cure +title:for +title:cancer)^5.0 (+content:mira

Getting all the matching documents for a search

2006-06-01 Thread Chun Wei Ho
Hi, I use Hits to search for and get documents matching a particular query, e.g.: Hits hits = indexSearcher.search(new TermQuery(new Term("startswith","A"))); but it is not returning all the matching documents in the index. From experimentation it appears to return less than half the match

Updating documents in index with some fields not stored

2006-05-10 Thread Chun Wei Ho
I would like to make some updates to values within my large index. I understand that I have to delete and re-insert each document to be changed to do that. However I do have some large fields that are unstored (only indexed and no, these are not the fields that I am wanting to change), which means

Adding a new search field but needs searching for all

2006-05-10 Thread Chun Wei Ho
I have a large Lucene index to which I am planning to add one or more search fields, and then perform searches on them. How do I include results from the other documents that do not have the new field? For example, I have 10 million documents in an index, and I update 200 of them adding the field "b" =

Obtain terms for only particular field(s)

2006-05-04 Thread Chun Wei Ho
Hi, I have a pretty large index and I would like to obtain all the Terms for only one or two particular fields. As I understand - IndexReader.terms() returns a termEnum of all the terms in the index, and I would have to iterate through all of them to pick out the ones from the fields that I want

Simpler QueryParser

2006-03-20 Thread Chun Wei Ho
I am wondering if anyone has existing code for a simpler QueryParser - one that does not create the more complex prefix/fuzzy/range queries, but still allows the usual term/boolean queries. I use QueryParser to directly parse user input (allowing for more flexible specification of include/exclude a

Hardware Requirements for a large index?

2006-02-15 Thread Chun Wei Ho
Hi, I am in the process of deciding specs for a crawling machine and a searching machine (two machines), which will support merging/indexing and searching operations on a single Lucene index that may scale to about several million pages (at which it would be about 2-10 GB, assuming linear growth w

Re: Suggesting refine searches with Lucene

2006-02-13 Thread Chun Wei Ho
ull; > > > public Query getQuery() { > return query; > } > > > public void setQuery(Query query) { > this.query = query; > } > > > public String toString(){ > return query.toString(); > } > >

Suggesting refine searches with Lucene

2006-02-13 Thread Chun Wei Ho
Hi, I am trying to suggest refined searches for my Lucene search. For example, if a search returned too many results, it would list a number of document title subsequences that occurred frequently in the results of the previous search, as possible candidates for refining the search. Does anyone

Help: tweaking search - reducing IDF skew and implementing score cutoff

2006-02-09 Thread Chun Wei Ho
Hi, I am running a search for something akin to a news site, where each news document has a date, title, keywords/bylines, summary fields and then the actual content. Using Lucene for this database of documents, it seems that: 1. The relevancy score is skewed drastically by the actual number of ne

Distributed vs Merged Searching

2006-01-31 Thread Chun Wei Ho
I am deploying a web application serving searches on a Lucene index, and am deciding between distributing search between several machines or single searching, and was hoping that someone could tell me from their experiences: + Is there anything particular to watch out for if using distributed sear

Re: Getting the document number (with IndexReader)

2006-01-26 Thread Chun Wei Ho
Thanks for the info :) One last related question. If I delete documents using an IndexReader, can I assume that the internal document numbers of other undeleted documents (obtained using the same IndexReader instance) will not change until I call IndexReader.close()?

Re: Getting the document number (with IndexReader)

2006-01-26 Thread Chun Wei Ho
Hi, Thanks for the help, just a few more questions: On 1/26/06, Paul Elschot <[EMAIL PROTECTED]> wrote: > On Thursday 26 January 2006 09:15, Chun Wei Ho wrote: > > I am attempting to prune an index by getting each document in turn and > > then checking/deleting it: &

Getting the document number (with IndexReader)

2006-01-26 Thread Chun Wei Ho
I am attempting to prune an index by getting each document in turn and then checking/deleting it: IndexReader ir = IndexReader.open(path); for(int i=0;i