chinese stopwords

2010-04-09 Thread John Wang
Hi: I am using the SmartChineseAnalyzer class and it is great! I was wondering if we should have a set of Chinese stopwords. The default set contains only punctuation. Thanks -John

Re: chinese stopwords

2010-04-10 Thread John Wang
blog/item/146b5c346a738c4d251f1496.html > http://download.csdn.net/source/740407 > > > On Sat, Apr 10, 2010 at 9:59 AM, John Wang wrote: > >> Hi: >> >>I am using the SmartChineseAnalyzer class and it is great! >> >>Was wondering if we should have a set o

Re: chinese stopwords

2010-04-10 Thread John Wang
Awesome, thanks! Great job on the work! -John 2010/4/10 Gao Pinker > That's a good idea, I'll think about adding another stopword-list to let > users have a chance to choose. > > > On Sat, Apr 10, 2010 at 9:25 PM, John Wang wrote: > >> Yeah, I found so

Re: FindBugs Community Review of Lucene

2010-04-13 Thread John Wang
Hi Nat: Great analysis! Some of them DO seem to be bugs! Maybe FindBugs can be enabled as part of the build? -John On Tue, Apr 13, 2010 at 11:01 AM, Nat Ayewah wrote: > Hello, > > I am a PhD student working with the FindBugs project, at the University of > Maryland. FindBugs

Re: Google-developed posting list encoding

2010-04-14 Thread John Wang
This would be something that's excellent for contribution after the Flex-Indexing support is added. -John On Wed, Apr 14, 2010 at 12:22 AM, Mike Klaas wrote: > Can be quite a bit faster than vInt in some cases: > http://www.ir.uwaterloo.ca/book/addenda-06-index-compression.html > > -Mike > > --
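For reference, the vInt baseline mentioned above stores 7 bits of an integer per byte, low-order group first, with the high bit set on every byte except the last; posting lists typically apply it to the gaps between consecutive doc ids. A minimal sketch of that scheme (class and method names are illustrative, not Lucene's IndexOutput/IndexInput API; the Google-developed encoding discussed in the thread is not reproduced here):

```java
import java.io.ByteArrayOutputStream;

/** Illustrative vInt codec: 7 data bits per byte, high bit means "more bytes follow". */
final class VIntSketch {
    private VIntSketch() {}

    /** Appends the vInt encoding of a non-negative int. */
    static void write(ByteArrayOutputStream out, int value) {
        // Emit low 7-bit groups with the continuation bit set while more than 7 bits remain.
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value); // final byte: continuation bit clear
    }

    /** Decodes one vInt starting at pos[0] and advances pos[0] past it. */
    static int read(byte[] buf, int[] pos) {
        int b = buf[pos[0]++] & 0xFF;
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++] & 0xFF;
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    /** Encodes ascending doc ids as gaps, so dense postings cost about one byte per doc. */
    static byte[] encodeDocIds(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocIds) {
            write(out, doc - prev);
            prev = doc;
        }
        return out.toByteArray();
    }

    /** Reverses encodeDocIds by summing the decoded gaps. */
    static int[] decodeDocIds(byte[] buf, int count) {
        int[] docs = new int[count];
        int[] pos = {0};
        int prev = 0;
        for (int i = 0; i < count; i++) {
            prev += read(buf, pos);
            docs[i] = prev;
        }
        return docs;
    }
}
```

The per-byte continuation test in read() is the kind of branching that faster integer-compression schemes aim to reduce.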

Re: official GIT repository / switch to GIT?

2010-04-17 Thread John Wang
Hi Thomas: There is a git mirror already: http://github.com/apache/lucene All Apache projects are mirrored at: http://git.apache.org/ You are free to use git. Apache is running a git-svn server somewhere; the repository itself is not git, but you can use it as one. Hope this help

DisjunctionScorer performance

2009-01-06 Thread John Wang
Hi guys: We have been building a suite of boolean-operator DocIdSets (e.g. AndDocIdSet/Iterator, OrDocIdSet/Iterator, NotDocIdSet/Iterator). We compared our implementation of the OrDocIdSetIterator (based on DisjunctionMaxScorer code, with some code tuning), and we see the performance doubled
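The thread does not include the OrDocIdSetIterator source. As a rough picture of what a disjunction over DocIdSets computes, here is a minimal heap-based union of ascending doc-id lists. The class name and the int[] inputs are hypothetical, and this is neither the DisjunctionMaxScorer code nor the tuned implementation described above.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical OR (union) over ascending doc-id lists using a min-heap of list cursors. */
final class OrDocIdsSketch {
    private OrDocIdsSketch() {}

    /** Returns the ascending, de-duplicated union of ascending doc-id lists. */
    static int[] union(int[]... lists) {
        // Heap entries are {docId, listIndex, position}, ordered by docId.
        PriorityQueue<int[]> heap = new PriorityQueue<int[]>((x, y) -> Integer.compare(x[0], y[0]));
        int total = 0;
        for (int i = 0; i < lists.length; i++) {
            total += lists[i].length;
            if (lists[i].length > 0) {
                heap.add(new int[] {lists[i][0], i, 0});
            }
        }
        int[] out = new int[total];
        int n = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            // A doc matched by several clauses surfaces once per clause; emit it only once.
            if (n == 0 || out[n - 1] != top[0]) {
                out[n++] = top[0];
            }
            int[] list = lists[top[1]];
            int next = top[2] + 1;
            if (next < list.length) {
                heap.add(new int[] {list[next], top[1], next});
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

java.util.PriorityQueue forces a poll/add pair per step. Lucene's own disjunction scorers use a specialized queue that can adjust the top entry in place, which is the kind of detail that code tuning of this loop tends to target.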

Re: DisjunctionScorer performance

2009-01-06 Thread John Wang
, Jan 6, 2009 at 11:48 PM, Paul Elschot wrote: > On Wednesday 07 January 2009 07:36:06 John Wang wrote: > > > Hi guys: > > > > > > We have been building a suite of boolean operators DocIdSets > > > (e.g. AndDocIdSet/Iterator, OrDocIdSet/Iterator, >

Re: DisjunctionScorer performance

2009-01-06 Thread John Wang
One more thing I missed. I don't quite get your point about skip() vs next(). With OR queries, skipping does not help as much as it does with AND queries. -John On Tue, Jan 6, 2009 at 11:55 PM, John Wang wrote: > Paul: > >Our very simple/naive testing methodology for OrDo
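The point about skipping can be illustrated with a conjunction. When intersecting, the lagging list can jump straight to the other list's current doc id, so long runs of non-matching docs are never visited. In the sketch below, advance() uses a binary search over an int[] as a stand-in for skipTo() on a posting list with skip data. All names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical AND (intersection) of ascending, duplicate-free doc-id lists. */
final class AndDocIdsSketch {
    private AndDocIdsSketch() {}

    /** First index at or after 'from' whose value is >= target (a.length if none); stands in for skipTo(). */
    static int advance(int[] a, int from, int target) {
        int idx = Arrays.binarySearch(a, from, a.length, target);
        return idx >= 0 ? idx : -idx - 1; // insertion point when target is absent
    }

    /** Leapfrogs the two lists: whichever side is behind skips forward to the other's doc. */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int n = 0, i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]); // skip a's docs that cannot match
            } else {
                j = advance(b, j, a[i]); // skip b's docs that cannot match
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

A disjunction has no such shortcut. Every doc id in every clause belongs to the result, so each list has to be walked with next() anyway.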

Re: Realtime Search

2009-01-08 Thread John Wang
We have worked on this problem on the server level as well. We have also open sourced it at: http://code.google.com/p/zoie/ wiki on the realtime aspect: http://code.google.com/p/zoie/wiki/ZoieSystem -John On Fri, Dec 26, 2008 at 12:34 PM, Robert Engels wrote: > If you move to the "either embe

Re: Future projects

2009-04-02 Thread John Wang
Michael: I love your suggestion on 3)! This really opens doors for flexible indexing. -John On Thu, Apr 2, 2009 at 1:40 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Wed, Apr 1, 2009 at 7:05 PM, Jason Rutherglen > wrote: > > Now that LUCENE-1516 is close to being commit

Re: Future projects

2009-04-02 Thread John Wang
Just to clarify, Approach 1 and approach 2 are both currently performing OK for us. -John On Thu, Apr 2, 2009 at 2:41 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Thu, Apr 2, 2009 at 4:43 PM, Jason Rutherglen > wrote: > >> What does Bobo use the cached bitsets for? >

Re: Future projects

2009-04-03 Thread John Wang
By default bobo DOES use a flavor of the field cache data structure with some additional information for performance (e.g. minDocid, maxDocid, freq per term). Bobo is architected as a platform where clients can write their own "FacetHandlers" in which each FacetHandler manages its own view of memory st
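Here is a minimal sketch, assuming a single-valued field, of a field-cache-style structure with the extra per-term data mentioned above. Each doc maps to a term ordinal, and each ordinal carries freq, minDocId and maxDocId. Names are hypothetical; this is not Bobo's FacetHandler code.

```java
import java.util.Arrays;

/** Hypothetical single-valued facet cache: doc -> term ordinal, plus per-term stats. */
final class FacetCacheSketch {
    final String[] terms;   // distinct values in sorted order; ordinal 0 = "no value"
    final int[] ordByDoc;   // ordByDoc[doc] is the ordinal of that doc's value
    final int[] freq;       // number of docs per ordinal
    final int[] minDocId;   // first doc holding each ordinal, -1 if none
    final int[] maxDocId;   // last doc holding each ordinal, -1 if none

    FacetCacheSketch(String[] terms, int[] ordByDoc) {
        this.terms = terms;
        this.ordByDoc = ordByDoc;
        this.freq = new int[terms.length];
        this.minDocId = new int[terms.length];
        this.maxDocId = new int[terms.length];
        Arrays.fill(minDocId, -1);
        Arrays.fill(maxDocId, -1);
        // Docs are visited in increasing order, so the first hit is the min and the last is the max.
        for (int doc = 0; doc < ordByDoc.length; doc++) {
            int ord = ordByDoc[doc];
            freq[ord]++;
            if (minDocId[ord] < 0) {
                minDocId[ord] = doc;
            }
            maxDocId[ord] = doc;
        }
    }

    /** Facet counts over a set of matching doc ids: one array increment per hit. */
    int[] count(int[] hits) {
        int[] counts = new int[terms.length];
        for (int doc : hits) {
            counts[ordByDoc[doc]]++;
        }
        return counts;
    }

    /** Docs holding an ordinal, scanning only [minDocId, maxDocId] into an array sized by freq. */
    int[] docsWithOrd(int ord) {
        int[] out = new int[freq[ord]];
        int n = 0;
        for (int doc = minDocId[ord]; doc >= 0 && doc <= maxDocId[ord]; doc++) {
            if (ordByDoc[doc] == ord) {
                out[n++] = doc;
            }
        }
        return out;
    }
}
```

In this sketch, freq lets a per-value doc list be allocated at its exact size, and the min/max doc ids bound the scan to the range where the value actually occurs.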

Re: [jira] Issue Comment Edited: (LUCENE-1536) if a filter can support random access API, we should use it

2009-04-20 Thread John Wang
Maybe I am not understanding the patch. But isn't casting the result of Filter.getDocIdSet to OpenBitSet kinda dangerous? And isn't assuming that a Filter constructs a BitSet something we want to move away from? -John On Mon, Apr 20, 2009 at 4:27 PM, Jason Rutherglen (JIRA) wrote: > >[ > https://issues.apache.o
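John's concern is that code which casts the result of Filter.getDocIdSet to OpenBitSet silently assumes how the filter was built. One way to express the alternative is sketched below with hypothetical stand-in types, using java.util.BitSet in place of OpenBitSet. This is not the LUCENE-1536 patch. The idea is to treat random access as an optional capability that a doc-id set may advertise, and to fall back to iteration otherwise.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical minimal doc-id set: iteration is the only guaranteed capability. */
interface DocIdSetSketch {
    /** Returns the first doc id >= target, or -1 when there is none. */
    int nextDoc(int target);
}

/** Hypothetical optional capability: constant-time membership tests. */
interface RandomAccessSketch {
    boolean get(int doc);
}

/** A bitset-backed set that offers both capabilities. */
final class BitSetDocIdSet implements DocIdSetSketch, RandomAccessSketch {
    private final BitSet bits;

    BitSetDocIdSet(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public int nextDoc(int target) {
        return bits.nextSetBit(target); // -1 when no set bit remains
    }

    @Override
    public boolean get(int doc) {
        return bits.get(doc);
    }
}

final class FilterAccess {
    private FilterAccess() {}

    /** Uses random access only when the set advertises it, instead of casting to a concrete bitset class. */
    static IntPredicate acceptor(DocIdSetSketch set) {
        if (set instanceof RandomAccessSketch) {
            RandomAccessSketch ra = (RandomAccessSketch) set;
            return ra::get;
        }
        // Fallback for iterator-only sets: materialize the matches once.
        BitSet copy = new BitSet();
        for (int doc = set.nextDoc(0); doc >= 0; doc = set.nextDoc(doc + 1)) {
            copy.set(doc);
        }
        return copy::get;
    }
}
```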

perf enhancement and lucene-1345

2009-04-24 Thread John Wang
Hi Guys: A while ago I posted some enhancements to disjunction and conjunction docIdSetIterators that showed performance improvements to Lucene-1345. I think it got mixed up with another discussion on that issue. Was wondering what happened with it and what are the plans. Thanks -John

Re: perf enhancement and lucene-1345

2009-05-04 Thread John Wang
t codec. > > On the logic operators for combining DocIDSets... how do these differ > from what we already do in BooleanScorer[2]? (I haven't had a chance > to get a good look at Kamikaze yet). > > Mike > > On Fri, Apr 24, 2009 at 11:34 PM, John Wang wrote:

Re: I wanna contribute a Chinese analyzer to lucene

2009-05-05 Thread John Wang
Hi Gao: On the google code page, can you check in the source? Thanks -John On Tue, May 5, 2009 at 2:30 AM, Gao Pinker wrote: > I have opened a new issue(http://issues.apache.org/jira/browse/LUCENE-1629) > and now creating the patch, > There are 2500 lines of code to be added cause this Chi

Re: I wanna contribute a Chinese analyzer to lucene

2009-05-05 Thread John Wang
> On Tue, May 5, 2009 at 10:47 PM, John Wang wrote: > >> Hi Gao: >> On the google code page, can you check in the source? >> >> Thanks >> >> -John >> >> >> On Tue, May 5, 2009 at 2:30 AM, Gao Pinker wrote: >> >>> I have

Re: WebLuke - include Jetty in Lucene binary distribution?

2009-06-05 Thread John Wang
Hi guys: I am interested in the latest decision on WebLuke - I downloaded the zip, tried it and love it! Does it support all of Luke's functionality? (especially the plugin support) Thanks -John On Sun, Apr 27, 2008 at 7:09 AM, Uwe Schindler wrote: > Here another Servlet 2.3 com

Fwd: addIndexesNoOptimize

2009-07-05 Thread John Wang
Guys: Any thoughts? Forwarding the question from the users list after not hearing back. Thanks -John -- Forwarded message -- From: John Wang Date: Fri, Jul 3, 2009 at 3:49 PM Subject: addIndexesNoOptimize To: java-u...@lucene.apache.org Hi guys: Running into a

Re: addIndexesNoOptimize

2009-07-05 Thread John Wang
egmentInfo and what > else might go wrong if dups enter IndexWriter's segmentInfos but it'd > make me somewhat nervous removing that defensive check. > > Maybe instead we can add an addIndexesNoOptimize(IndexReader[]) (and > deprecate addIndexes(IndexReader

Re: A Comparison of Open Source Search Engines

2009-07-06 Thread John Wang
Vik did a very nice job. One thing the experiment did not mention is that Lucene handles incremental updates, whereas many of the other "competitors" do not. So the indexing performance comparison is not really fair. -John On Mon, Jul 6, 2009 at 8:06 AM, Sean Owen wrote: > > http://zooie.wordpre

Re: A Comparison of Open Source Search Engines

2009-07-06 Thread John Wang
e worth mentioning. > > There's also MG4J, which wasn't covered and has a nice algorithmic > background. > Anybody knows other interesting open-source search engines? > > On Tue, Jul 7, 2009 at 00:39, John Wang wrote: > > Vik did a very nice job. > > One t

custom indexing

2009-07-10 Thread John Wang
Hi guys: I think for Lucene 3.0, there is functionality for customized indexing. I was wondering if users can create their own partition file (per segment) that adheres to the contracts for merge, expungeDeletes, etc.? If not, do you think this is something that should be added? Thanks -John

Re: custom indexing

2009-07-10 Thread John Wang
new features compared to 2.9. > > -Michael > > > On 7/10/09 4:39 PM, John Wang wrote: > >> Hi guys: >> >>I think for lucene 3.0, there is functionality for customized indexing. >> I was wondering if user can create their own partition file (per segment)

custom segment files

2009-09-17 Thread John Wang
Hi guys: I am trying to figure out how to add the ability to create custom segment files. Hopefully it is possible to create a plugin framework where one can provide some sort of callback to add to a segment given a doc and provide some sort of merge logic. This is in light of the flexible indexi

Re: custom segment files

2009-09-17 Thread John Wang
towards getting PforDelta working... > > However, that change doesn't [yet] do anything for norms, stored > fields nor term vectors. > > Can you describe more details about what kinds of customization you're > looking to do? > > Mike > > On Thu, Sep 17, 2009 at 10:

Re: custom segment files

2009-09-17 Thread John Wang
ger. > > On Thu, Sep 17, 2009 at 7:00 AM, John Wang wrote: > > Hi guys: > > > > I am trying to figure how to add the ability to create custom > segment > > files. Hopefully it is possible to create a plugin framework where one > can > > provide some s

Re: custom segment files

2009-09-18 Thread John Wang
Wanna include un-final'ing it in a patch? > > > Is there a wiki or some sort of write up on LUCENE-1458? > > Sorry not just yet. I agree it's badly needed... it's an enormous set > of changes at this point. I'll add a wiki page that I'll try to keep > curr

Re: ReleaseTodo steps

2009-09-21 Thread John Wang
Hi Guys: A quick comment on the 2.9 release: the org.apache.lucene.search.Weight interface has been changed to an abstract class. This is a non-backward-compatible change and would break many custom Query implementations. Is this intentional? Thanks -John On Mon, Sep 21, 2009 at 8:59 PM, Uwe Schindler wr

TermCount per field

2009-09-21 Thread John Wang
Hi guys: Not sure if this would be a better fit on the users or the dev list. It would be very useful to be able to get the term count for a given field, e.g. int IndexReader.termCount(String field) Wanted to get your opinion on what is the best way to approach this. After looking th
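The termCount(String field) signature above is a proposal, not an existing method. As a rough model, assuming a term dictionary sorted by (field, text) as in the pre-flex index format, one field's terms form a single contiguous range, so counting them means walking that range. The sketch models the dictionary as a TreeSet of composite keys; all names are hypothetical.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical term dictionary sorted by (field, text), stored as composite string keys. */
final class TermDictSketch {
    private static final char SEP = '\u0000'; // assumes field names never contain U+0000
    private final NavigableSet<String> keys = new TreeSet<String>();

    void add(String field, String text) {
        keys.add(field + SEP + text); // duplicates collapse, as in a term dictionary
    }

    /** Distinct terms of one field: the keys in [field + U+0000, field + U+0001). Linear in that range. */
    int termCount(String field) {
        return keys.subSet(field + SEP, true, field + '\u0001', false).size();
    }
}
```

Mike's reply further down notes that LUCENE-1458 tracks the per-field term total directly, which would turn this range walk into a lookup.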

Re: ReleaseTodo steps

2009-09-21 Thread John Wang
Thanks Mark for the clarification! -John On Mon, Sep 21, 2009 at 9:09 PM, Mark Miller wrote: > Yeah it is, sorry :( > > Check out the back compat break section in changes - its the first > section I think. > > John Wang wrote: > > Hi Guys: > > > >

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread John Wang
s peak > from merging large segments. Using madvise would prevent usable > indexes from being swapped to disk during a merge, query > performance would continue unabated. > > As we move to a sharded model of indexes, large merges will > naturally not occur. Shards will reach a sp

Re: TermCount per field

2009-09-21 Thread John Wang
anging the index format. > > With LUCENE-1458 this becomes simple (it already keeps track of each > fields's terms, separately, including total number of terms for that > field). > > Mike > > On Mon, Sep 21, 2009 at 9:14 AM, John Wang wrote: > > Hi guys: > >

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread John Wang
ou say more about what motivates your test model and where I am wrong > about your situation? > > > On Mon, Sep 21, 2009 at 4:50 PM, John Wang wrote: > >> Jason: >> >>Before jumping into any conclusions, let me describe the test setup. It >> is rather dif

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread John Wang
LUCENE-1076 to do well. > > > is rather different from Lucene benchmark as we are testing > high updates in a realtime environment > > Lucene's benchmark allows this. NearRealtimeReaderTask is a good > place to start. > > On Mon, Sep 21, 2009 at 4:50 PM, John Wang wrote

2.9 NRT w.r.t. sorting and field cache

2009-09-21 Thread John Wang
Looking at the code, there seems to be a disconnect in how/when the field cache is loaded when IndexWriter.getReader() is called. Is the FieldCache updated? Otherwise, are we reloading the FieldCache for each reader instance? For operations that lazily load the field cache, e.g. sorting, this seems to have a signif

Re: 2.9 NRT w.r.t. sorting and field cache

2009-09-21 Thread John Wang
Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley wrote: > On Tue, Sep 22, 2009 at 12:56 AM, John Wang wrote: > > Looking at the code, seems there is a disconnect between how/when field > > cache is loaded when IndexWriter.getReader() is called. > > I'm not sure what you mea

Re: Welcome, Koji

2009-09-21 Thread John Wang
Congratulations Koji! -John On Tue, Sep 22, 2009 at 1:47 PM, Uwe Schindler wrote: > Welcome Koji! > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Koji Sekiguchi [mailto:k...@r.email.

Re: 2.9 NRT w.r.t. sorting and field cache

2009-09-22 Thread John Wang
ers) and > sends a collector over them one by one, rather than using the multireader. > So only fc for seg readers that change need to be reloaded. > > - Mark > http://www.lucidimagination.com (mobile) > > On Sep 22, 2009, at 1:27 AM, John Wang wrote: > > Hi Yonik: >
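Mark's explanation, quoted above, is that 2.9 runs the collector over each segment reader in turn. Field-cache entries therefore belong to individual segments, and only new or changed segments need loading after a reopen. The toy model below uses a generic segment key and counts how often the loader runs. All names are hypothetical; this is not Lucene's FieldCache implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical per-segment cache: entries are keyed by the segment object itself. */
final class PerSegmentCacheSketch<S, V> {
    // Weak keys: an entry can be dropped once its segment object is no longer referenced.
    private final Map<S, V> cache = new WeakHashMap<S, V>();
    private final Function<S, V> loader;
    int loads; // how many times the (expensive) loader actually ran

    PerSegmentCacheSketch(Function<S, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for a segment, loading it on first use. Not thread-safe. */
    V get(S segment) {
        V value = cache.get(segment);
        if (value == null) {
            value = loader.apply(segment);
            loads++;
            cache.put(segment, value);
        }
        return value;
    }

    /** Simulates a per-segment search over a top-level reader made of these segments. */
    void warm(List<S> segments) {
        for (S segment : segments) {
            get(segment);
        }
    }
}
```

After a simulated reopen that adds one segment, only that segment triggers a load. A cache keyed by the top-level reader would instead start empty for every new reader instance.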

Re: 2.9 NRT w.r.t. sorting and field cache

2009-09-22 Thread John Wang
remen > http://www.thetaphi.de > eMail: u...@thetaphi.de > -- > > *From:* John Wang [mailto:john.w...@gmail.com] > *Sent:* Tuesday, September 22, 2009 9:32 AM > *To:* java-dev@lucene.apache.org > *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread John Wang
hat is the difference between how ZMP and > LogMergePolicy.setCalibrateSizeByDeletes handles deletes? > > Are the queries using Zoie or Lucene's index searcher? > > Can you explain why the Viterbi algorithm was used and how it > works in this context? > > -J > > On

Re: 2.9 NRT w.r.t. sorting and field cache

2009-09-22 Thread John Wang
ache key. Only when the deletes are merged out would they > > invalidate - but because your writing a new segment anyway ... > > > > - Mark > > > > John Wang wrote: > >> I understand what you are saying. Let me detail what I am trying to say: > >> &g

Re: 2.9 NRT w.r.t. sorting and field cache

2009-09-22 Thread John Wang
throws IOException; > } > > Now pass something in that warms the reader. Load a fieldcache - do a > search. Do the hokey pokey and turn your self around ... > > Investigation time: 5 seconds. > > John Wang wrote: > > Hi Michael: > > > > Thanks for

Re: 2.9 NRT w.r.t. sorting and field cache

2009-09-22 Thread John Wang
g in that warms the reader. Load a fieldcache - do a > > search. Do the hokey pokey and turn your self around ... > > > > Investigation time: 5 seconds. > > > > John Wang wrote: > > > >> Hi Michael: > >> > >> Thanks for the pointer! > >> >

Re: 2.9 NRT w.r.t. sorting and field cache

2009-09-22 Thread John Wang
e of the core API - the core api works per segment now. And > the IndexReaderWarmer is always passed a segmentreader from the readerPool. > > - Mark > > John Wang wrote: > > Mark: > > > > I did spend at least a quarter of an ounce. :) And I am sure Mike's

SegmentReader

2009-09-22 Thread John Wang
Just realized it, thanks for making SegmentReader public!!! -John

Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

2009-09-26 Thread John Wang
AFAIK, applications have always assumed responsibility for closing IndexReader instances. However, 2.9 is the first time an IndexReader can be obtained via a getter on IndexWriter. Previously, IndexReaders were usually constructed via the IndexReader.open factory method. Having a getter

Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

2009-09-26 Thread John Wang
Oops, I completely misunderstood the question. I thought this was about IndexReaders :) -John On Sun, Sep 27, 2009 at 11:14 AM, John Wang wrote: > AFAIK, application has always assume the responsibility of closing > IndexReader instances. > However, with 2.9, this is the first time, In

Re: q.alt matching no documents

2009-09-28 Thread John Wang
patch created for lucene: https://issues.apache.org/jira/browse/LUCENE-1931 I am not sure what the right way is to hook it into QueryParser.java. Maybe the Solr people can comment on how to hook it into Solr. -John On Mon, Sep 28, 2009 at 6:31 AM, John Wang wrote: > You

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-08 Thread John Wang
Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: John Wang (JIRA) [mailto:j...@apache.org] > > Sent: Thursday, September 24, 2009 3:14 PM > > To: java-dev@lucene

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-08 Thread John Wang
Awesome! Mike, can you let us know what the process is and the time line? Thanks -John On Thu, Oct 8, 2009 at 11:48 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > +1! > > Mike > > On Thu, Oct 8, 2009 at 2:41 PM, John Wang wrote: > > Hi guys: > >

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-08 Thread John Wang
open an issue, > etc. > > > > Probably because it's a large amount of code (I think?) you'll need to > > submit a software grant > > (http://www.apache.org/licenses/software-grant.txt). > > > > Mike > > > > On Thu, Oct 8, 2009 at 2:58 PM, John Wa

new sorting api and some perf numbers

2009-10-11 Thread John Wang
Hi guys: The new FieldComparator api looks really scary :) But after some perf testing with numbers I'd like to share, I guess it is worth it: HW: Mac Pro with 16G memory jvm: 1.6.0_13 jvm arg: -Xms1g -Xmx1g -server setup index: 1M docs evenly split into 8 segments (to make sure the test

lucene 2.9 sorting algorithm

2009-10-14 Thread John Wang
Hi guys: Looking at the 2.9 sorting algorithm, and while trying to understand the FieldComparator class, I was wondering about the following optimization: (I am using StringOrdValComparator as an example) Currently we have one instance of the per-segment data structures (e.g. ords, vals, etc.), and we kee

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
Hi guys: I did some Big O math a few times and reached the same conclusion Jake had. I was not sure about the code tuning opportunities we could have done with the MergeAtTheEnd method as Yonik mentioned and the internal behavior with PQ Mike suggested, so I went ahead and implemented the
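The two strategies compared in this thread can be sketched for a simple ascending int sort. One version uses a single bounded priority queue shared across segments; the other uses one queue per segment and merges the results. This is a hypothetical illustration, with plain int values standing in for per-segment ords or field values. It is not the test code from the sandbox linked elsewhere in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical comparison of two top-k strategies for an ascending int sort. */
final class TopKSketch {
    private TopKSketch() {}

    /** Keeps the k smallest values seen so far in a max-heap (largest kept value on top). */
    private static void offer(PriorityQueue<Integer> maxHeap, int k, int value) {
        if (k <= 0) {
            return;
        }
        if (maxHeap.size() < k) {
            maxHeap.add(value);
        } else if (value < maxHeap.peek()) {
            maxHeap.poll(); // evict the current worst entry
            maxHeap.add(value);
        }
    }

    private static PriorityQueue<Integer> newMaxHeap() {
        return new PriorityQueue<Integer>(Collections.reverseOrder());
    }

    private static List<Integer> ascending(PriorityQueue<Integer> heap) {
        List<Integer> out = new ArrayList<Integer>(heap);
        Collections.sort(out);
        return out;
    }

    /** Single PQ: one bounded queue shared by all segments. */
    static List<Integer> singlePQ(int[][] segments, int k) {
        PriorityQueue<Integer> heap = newMaxHeap();
        for (int[] segment : segments) {
            for (int value : segment) {
                offer(heap, k, value);
            }
        }
        return ascending(heap);
    }

    /** Multi PQ: a bounded queue per segment; each segment's top k is folded into the result afterwards. */
    static List<Integer> multiPQ(int[][] segments, int k) {
        PriorityQueue<Integer> merged = newMaxHeap();
        for (int[] segment : segments) {
            PriorityQueue<Integer> perSegment = newMaxHeap();
            for (int value : segment) {
                offer(perSegment, k, value);
            }
            for (int value : perSegment) {
                offer(merged, k, value);
            }
        }
        return ascending(merged);
    }
}
```

Both return the same top k. In the per-segment variant, entries from different segments are never compared until the merge. For string sorts this matters because ordinals are only comparable within one segment. Which variant is faster depends on k, segment count, and hit distribution.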

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
hu, Oct 15, 2009 at 2:12 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Nice results! Comments below... > > On Thu, Oct 15, 2009 at 3:58 PM, John Wang wrote: > > Hi guys: > > > > I did some Big O math a few times and reached the same conclusion

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
Numbers Mike requested for Int types: only the time/cputime are posted, others are all the same since the algorithm is the same. Lucene 2.9: numhits: 10 time: 14619495 cpu: 146126 numhits: 20 time: 14550568 cpu: 163242 numhits: 100 time: 16467647 cpu: 178379 my test: numHits: 10 time: 1410109

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
BTW, we have a little sandbox for these experiments, and all my test code is there. It is not very polished. https://lucene-book.googlecode.com/svn/trunk -John On Thu, Oct 15, 2009 at 3:29 PM, John Wang wrote: > Numbers Mike requested for Int types: > > only the time/cputime a

Re: unique-id to doc-num

2008-03-17 Thread John Wang
you can also use the payload to store the docid. michael had a posting on that a while back. -John On Sun, Mar 16, 2008 at 11:43 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : I'd like to have an up-to-date map from unique-ids to lucene internal > : doc-nums. > > Which way do you want the m

index reopen question

2008-04-09 Thread John Wang
Hi: I have been reading the 2.3.1 release code and have a few questions regarding IndexReader reopen: 1) Looking at the code: if (this.hasChanges || this.isCurrent()) { /* the index hasn't changed - nothing to do here */ return this; } Shouldn't it be !this.hasChanges?

lucy progress

2008-04-17 Thread John Wang
Hi: What is the current progress on Lucy? Which version of Lucene index format is it up to? Thanks -John

Fwd: changing index format

2008-06-24 Thread John Wang
eatly appreciated. Thanks -John -- Forwarded message -- From: John Wang <[EMAIL PROTECTED]> Date: Tue, Jun 24, 2008 at 11:59 AM Subject: changing index format To: [EMAIL PROTECTED] Hi: I am trying to add couple more values to the TermInfo file and want to keep the ind

Re: Fwd: changing index format

2008-06-25 Thread John Wang
Paul Elschot <[EMAIL PROTECTED]> wrote: > Op Wednesday 25 June 2008 07:03:59 schreef John Wang: > > Hi guys: > > Perhaps I should have posted this to this list in the first > > place. > > > > I am trying to work on a patch to for each term, expose m

Re: Fwd: changing index format

2008-06-25 Thread John Wang
contracts. Given these tools, we are able to build a customized scored BooleanQuery-like query infrastructure. We'd be happy to contribute them. Thanks -John On Wed, Jun 25, 2008 at 9:29 AM, Paul Elschot <[EMAIL PROTECTED]> wrote: > Op Wednesday 25 June 2008 17:05:17 schreef John Wa

Re: BooleanQuery and DocIdSet; Was: Fwd: changing index format

2008-06-25 Thread John Wang
. Maybe I am misunderstanding the point of the question. Thanks -John On Wed, Jun 25, 2008 at 10:32 AM, Paul Elschot <[EMAIL PROTECTED]> wrote: > Op Wednesday 25 June 2008 18:45:16 schreef John Wang: > > Hi Paul: > > Regarding to your comment on adding required/prohibite

Re: changing index format

2008-07-03 Thread John Wang
Hi Michael: What is the plan/timeline for supporting flexible indexing? Thanks -John On Wed, Jun 25, 2008 at 3:40 AM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > John Wang wrote: > > The problem I am having is stated below, I don't know how to add t

Re: docid set compression and boolean docid set operations

2008-09-10 Thread John Wang
Sorry, I meant lucene 2.4 -John On Wed, Sep 10, 2008 at 2:08 PM, John Wang <[EMAIL PROTECTED]> wrote: > Hi guys: > > We have build this on top of the lucene 1.4. api/refactoring for docid > sets and docIdIterater. > > We've implemented the p4Delta compr

docid set compression and boolean docid set operations

2008-09-10 Thread John Wang
Hi guys: We have built this on top of the lucene 1.4. api/refactoring for docid sets and DocIdSetIterator. We've implemented the p4Delta compression algorithm presented at www2008: http://www2008.org/papers/fp618.html We've been using this in production here at LinkedIn and would lov
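PForDelta itself (patched frame-of-reference over d-gaps) is more involved than fits here. The simplified sketch below assumes a single block and no exception patching. It shows the core idea PForDelta builds on: convert sorted doc ids to gaps, then bit-pack every gap with the same width b. Names are hypothetical, and this is not the Kamikaze implementation.

```java
/** Simplified frame-of-reference packing of doc-id gaps (no PForDelta exception patching). */
final class GapPackSketch {
    final int bitWidth; // bits used per gap
    final int count;    // number of doc ids
    final long[] words; // packed gaps, 64 bits per word, low bits first

    private GapPackSketch(int bitWidth, int count, long[] words) {
        this.bitWidth = bitWidth;
        this.count = count;
        this.words = words;
    }

    /** Packs ascending, non-negative doc ids as fixed-width gaps. */
    static GapPackSketch pack(int[] sortedDocIds) {
        int n = sortedDocIds.length;
        int[] gaps = new int[n];
        int prev = 0, maxGap = 0;
        for (int i = 0; i < n; i++) {
            gaps[i] = sortedDocIds[i] - prev; // first gap is the first doc id itself
            prev = sortedDocIds[i];
            maxGap = Math.max(maxGap, gaps[i]);
        }
        // Width = bits needed for the largest gap (at least 1).
        int b = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxGap));
        long[] words = new long[(int) (((long) n * b + 63) / 64)];
        for (int i = 0; i < n; i++) {
            long bitPos = (long) i * b;
            int w = (int) (bitPos >>> 6), off = (int) (bitPos & 63);
            words[w] |= ((long) gaps[i]) << off;
            if (off + b > 64) { // gap straddles two words: store its high bits in the next word
                words[w + 1] |= ((long) gaps[i]) >>> (64 - off);
            }
        }
        return new GapPackSketch(b, n, words);
    }

    /** Unpacks the gaps and re-accumulates them into doc ids. */
    int[] unpack() {
        int[] docs = new int[count];
        long mask = (1L << bitWidth) - 1;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bitWidth;
            int w = (int) (bitPos >>> 6), off = (int) (bitPos & 63);
            long v = words[w] >>> off;
            if (off + bitWidth > 64) {
                v |= words[w + 1] << (64 - off);
            }
            prev += (int) (v & mask);
            docs[i] = prev;
        }
        return docs;
    }
}
```

Real PForDelta chooses b so that most gaps fit and stores the rest as patched exceptions, and it packs fixed-size blocks. Both refinements are omitted here, so a single large gap widens every entry.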

Re: 2.4 status

2008-09-10 Thread John Wang
Looking forward to 2.4! -John On Tue, Sep 9, 2008 at 2:38 AM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > OK we are gradually whittling down the list. It's down to 9 issues now. > > I have 2 issues, Grant has 3, Otis has 2 and Mark and Karl have 1 each. > > Can each of you try to finish

Re: docid set compression and boolean docid set operations

2008-09-11 Thread John Wang
y, the existing Lucene TermDocs and TermPositions > appear to be just right for this. > > Regards, > Paul Elschot > > > Op Wednesday 10 September 2008 23:09:18 schreef John Wang: > > Sorry, I meant lucene 2.4 > > > > -John > > > > On Wed, Sep 10, 20

Re: docid set compression and boolean docid set operations

2008-09-16 Thread John Wang
Paul and Eks: Because this is being hosted on code.google.com, it requires proj. members to have gmail accts. Can you guys send me yours? Also, we are developing on the BR_DEV_1_0_4 branch. Thanks -John On Mon, Sep 15, 2008 at

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-02 Thread John Wang
common enough, and makes sense to me in certain instances. > > - Mark > > > > On Dec 2, 2008, at 6:30 PM, "John Wang (JIRA)" <[EMAIL PROTECTED]> wrote: > > >> [ >> https://issues.apache.org/jira/browse/LUCENE-1473?page=c

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-02 Thread John Wang
e part volunteer effort. > > - Mark > > On Dec 2, 2008, at 7:02 PM, "John Wang" <[EMAIL PROTECTED]> wrote: > > I have described our use-case in good detail. I think it is a common > architecture. And we are not using RemoteSearcher. This problem is not tied > t

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread John Wang
either accepted or rejected. I just hope the committers would be "calm" enough to be able to see criticisms for what they are. I am a strong advocate of Lucene, hence my passion for its success. -John On Wed, Dec 3, 2008 at 10:07 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread John Wang
You are right, we can always transmit the string form and re-parse on the other end. Our problem is that we took this (serialization nature) for granted, and once something is deployed over a cluster, it would be difficult to do partial roll-outs in this case. But I guess there is no immediate reme

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread John Wang
Grant: I am sorry, but I disagree with some points: 1) "I think it's a sign that Lucene is pretty stable." - Lucene is a great project, and the 2.x releases in particular made great improvements, but do we really have a clear picture of how Lucene is being used and deployed? While luc

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread John Wang
omething scalable that i can customize to my peculiarities. > > so i think i fit in your 10% and im not stressing on either scalability or > api. > > thanks, > robert > > > On Thu, Dec 4, 2008 at 12:36 AM, John Wang <[EMAIL PROTECTED]> wrote: > >> Grant: >>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread John Wang
at 11:03 PM, Robert Muir <[EMAIL PROTECTED]> wrote: > > > On Thu, Dec 4, 2008 at 1:24 AM, John Wang <[EMAIL PROTECTED]> wrote: > >> Nice! >> Some questions: >> >> 1) one index? >> > no, but two individual ones today were around 100M docs

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread John Wang
iters shared my opinion > about usfulness and the priority of this patch, it could have been > different. If all commiters were busy with private agenda and had higher > priorities at that moment, well, that would habe been bad luck for me. No > hard feelings even in that case, why shou

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread John Wang
ke sense > (redundant) for my Similarity. everything is pretty much i/o bound now so if > tehre is some throughput issue i will look into SSD for high volume indexes. > > i posted on Use Cases on the wiki how I made fuzzy and regex fast if you > are curious. > > > On Thu, Dec

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-04 Thread John Wang
> the commercial alternatives i have fought with. lucene was evaluated a while > ago before 2.3 and this was not the case, but I re-evaluated around 2.3 > release and it is now. > > > On Thu, Dec 4, 2008 at 2:45 AM, John Wang <[EMAIL PROTECTED]> wrote: > >> Thanks Rob

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-04 Thread John Wang
issues (I am not smart enough to provide any earth-shattering patches) that have been blown out of proportion in my mind. I will try to keep my mouth shut in the future. -John On Thu, Dec 4, 2008 at 5:24 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > On Dec 4, 2008, at 12:

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-04 Thread John Wang
Hi Grant: I agree and I apologize for hijacking this thread. If Luceners feel our criticisms are invalid, then so be it. We should focus on this issue, namely the serialization story in Lucene, not general Java serialization, so I don't see how it would help to move this to the java de

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-05 Thread John Wang
let us know and we can close the bug and terminate the thread. Thanks -John On Fri, Dec 5, 2008 at 9:18 AM, Doug Cutting <[EMAIL PROTECTED]> wrote: > John Wang wrote: > >> Thus we are enforcing users that care about Serialization to use the >> release jar. >>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-05 Thread John Wang
We are happy to accept whatever you guys think on this issue. As it is currently, it is not consistent amongst different committers. Thanks -John On Fri, Dec 5, 2008 at 12:07 PM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > John Wang wrote: > > My proposal is to add

Re: Java logging in Lucene

2008-12-05 Thread John Wang
I thought the main point is to avoid a jar dependency. If we were to depend on a jar for logging, then why not log4j or commons-logging? -John On Fri, Dec 5, 2008 at 1:00 PM, Doug Cutting <[EMAIL PROTECTED]> wrote: > Shai Erera wrote: > >> Perhaps instead of introducing Java logging then (if yo

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-05 Thread John Wang
Works for me. Thanks -John On Fri, Dec 5, 2008 at 1:23 PM, Doug Cutting <[EMAIL PROTECTED]> wrote: > John Wang wrote: > >> This has been gone back and forth on this thread already. Again, I >> agree it is not the perfect solution. I am comparing that to the curr

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
t; Mike > > On Thu, Oct 15, 2009 at 6:33 PM, John Wang wrote: > > BTW, we are have a little sandbox for these experiments. And all my > testcode > > are at. They are not very polished. > > > > https://lucene-book.googlecode.com/svn/trunk > > > > -John

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
Hi Michael: I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector as a more general case. I think keeping the old api for ScoreDocComparator and SortComparatorSource would work. Please take a look. Thanks -John On Thu, Oct 15, 2009 at 6:52 PM, John Wang wrote: >

Re: lucene 2.9 sorting algorithm

2009-10-16 Thread John Wang
anks John; I'll have a look. > > Mike > > On Fri, Oct 16, 2009 at 12:57 AM, John Wang wrote: > > Hi Michael: > > I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector > as > > a more general case. I think keeping the old api for ScoreDocCompa

Re: 2.9.1

2009-10-17 Thread John Wang
In the DirectoryReader$MultiTermDocs implementation, in method protected TermDocs termDocs(IndexReader reader): return term==null ? reader.termDocs(null) : reader.termDocs(); Is this correct? Shouldn't it be: return term==null ? reader.termDocs() : reader.termDocs(term); Thanks -John On Sat, Oc

Re: 2.9.1

2009-10-17 Thread John Wang
ke > > On Sat, Oct 17, 2009 at 1:09 PM, John Wang wrote: > > In DirectoryReader$MultiTermDocs implementation: > > in method: protected TermDocs termDocs(IndexReader reader) > > return term==null ? reader.termDocs(null) : reader.termDocs(); > > Is this cor

Re: 2.9.1

2009-10-17 Thread John Wang
Hi guys: Maybe it is not a big deal, but I would still like to know why, in MultiTermDocs, termDocs() rather than termDocs(term) is called when term is not null. Thanks -John On Sat, Oct 17, 2009 at 10:16 AM, John Wang wrote: > Oh ok. I was thinking that if term is not null, termD

Re: 2.9.1

2009-10-18 Thread John Wang
ah! Thanks Yonik! -John On Sun, Oct 18, 2009 at 6:32 AM, Yonik Seeley wrote: > On Sun, Oct 18, 2009 at 1:43 AM, John Wang wrote: > > Maybe it is not a big deal. But I would still like to know why in > > MultiTermDocs, if term is not null, termDocs(term) is not called, rath

Re: lucene 2.9 sorting algorithm

2009-10-19 Thread John Wang
, Michael McCandless < luc...@mikemccandless.com> wrote: > Oh, no problem... > > Mike > > On Fri, Oct 16, 2009 at 12:33 PM, John Wang wrote: > > Mike, just a clarification on my first perf report email. > > The first section, numHits is incorrectly labeled, it should

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Hi guys: I am not suggesting simply changing the deprecated signatures. There is some work to be done, of course. At the beginning of the thread, we discussed two algorithms (both handling per-segment field loading), and concluded (still to be verified by Mike) that both algorithm

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Sorry, mistyped again, we have a multivalued field of STRINGS, not integers. -John On Tue, Oct 20, 2009 at 8:55 AM, John Wang wrote: > Hi guys: > I am not suggesting just simply changing the deprecated signatures. > There are some work to be done of course. In the beginning of the t

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
rk, plus a simple python wrapper to run old/new tests > across different queries, sort, topN, etc. > > But I got different results... MultiPQ looks generally slower than > SinglePQ. So I think we now need to reconcile what's different > between our tests. > > Mike > >

Re: lucene 2.9 sorting algorithm

2009-10-21 Thread John Wang
I lot though, no? -John On Wed, Oct 21, 2009 at 3:11 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Tue, Oct 20, 2009 at 11:55 AM, John Wang wrote: > > > the simpler api places less restriction on the type of custom > > sorting that can be done. > >
