Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
OK, I'm slow but I'm getting there. A funny wrapping Codec would require messing with how codecs come into being, and it's too late to do that without a lot of changes. On the other hand, the insides of the D-P-F could be used to accomplish the same thing out on the filter reader.

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
Interested / easily amused parties are welcome to observe the proceedings in https://github.com/apache/lucene-solr/pull/44. It's a PR _only_ to offer visibility! So far, I've got a 'delegating codec' that interposes the direct posting idea atop any other codec. Next comes the filtering. I'm not s

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
On Mon, Apr 7, 2014 at 5:32 PM, Alan Woodward wrote:
> Does FilterDirectoryReader do what you want?
> https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/index/FilterDirectoryReader.html
Yes, indeed, precisely what the doctor ordered.
>
> Alan Woodward
> www.flax.co.uk
>
> On 7 Apr 201

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Alan Woodward
Does FilterDirectoryReader do what you want?
https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/index/FilterDirectoryReader.html
Alan Woodward
www.flax.co.uk
On 7 Apr 2014, at 22:19, Benson Margulies wrote:
> Typically, an app gets a directory reader, which is a composite reader. T

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
Typically, an app gets a directory reader, which is a composite reader. To get a filter down there into the leaves of the composite reader, does anyone have a suggestion about where to enter the modularity? I sort of want to insert myself at org.apache.lucene.index.StandardDirectoryReader#open(org
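To make the question above concrete, here is a minimal sketch of pushing a filter down to every leaf of a composite reader. It deliberately does not use Lucene: `SketchLeaf`, `DiskLeaf`, `DirectLeaf`, `LeafWrapper`, and `WrappingCompositeReader` are hypothetical stand-ins invented for illustration, and the real reader classes are considerably richer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a per-segment (leaf) reader; not a Lucene type.
interface SketchLeaf {
    String describe();
}

// A leaf that reads straight from the on-disk segment.
final class DiskLeaf implements SketchLeaf {
    private final String segment;

    DiskLeaf(String segment) { this.segment = segment; }

    public String describe() { return "disk:" + segment; }
}

// A filter leaf: it wraps another leaf and changes how it is served.
final class DirectLeaf implements SketchLeaf {
    private final SketchLeaf in;

    DirectLeaf(SketchLeaf in) { this.in = in; }

    public String describe() { return "direct(" + in.describe() + ")"; }
}

// Hypothetical hook that decides how each leaf gets wrapped.
interface LeafWrapper {
    SketchLeaf wrap(SketchLeaf leaf);
}

// Composite reader that applies the wrapper to every leaf when it is built.
final class WrappingCompositeReader {
    private final List<SketchLeaf> leaves = new ArrayList<>();

    WrappingCompositeReader(List<SketchLeaf> original, LeafWrapper wrapper) {
        for (SketchLeaf leaf : original) {
            leaves.add(wrapper.wrap(leaf));
        }
    }

    List<SketchLeaf> leaves() { return Collections.unmodifiableList(leaves); }
}
```

Wrapping happens once, when the composite is built, so code that walks the leaves only ever sees the filtered view and the application does not need to touch the code that opens the underlying reader.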

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
Eventually, I'll care about how to set this up in Solr. For now I think I'll see if I can figure out the luceneutil benchmark.
On Mon, Apr 7, 2014 at 1:02 PM, Shai Erera wrote:
> Given that DPF delegates indexing to another PF anyway (currently Lucene41),
> I think this might be the case. We w

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Shai Erera
Given that DPF delegates indexing to another PF anyway (currently Lucene41), I think this might be the case. We would need to test of course. The key point is that this FilterAtomicReader will be able to serve anything as direct, even DV, so it might eliminate DVF too. We need to experiment and ben

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread david.w.smi...@gmail.com
Aaaah, nice idea to simply use FilterAtomicReader -- of course! So this would ultimately be a new IndexReaderFactory that creates FilterAtomicReaders for a subset of the fields you want to do this on. Cool! With that, I don't think there would be a need for DirectPostingsFormat as a postings for

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Shai Erera
The only problem is how the Codec makes a dynamic decision on whether to use the wrapped Codec for reading vs pre-load data into in-memory structures, because Codecs are loaded through reflection by the SPI loading mechanism. There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just m

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread david.w.smi...@gmail.com
Benson, I like your idea. I think your idea can be achieved as a codec, one that wraps another codec that establishes the on-disk format. By default the wrapped codec can be Lucene's default codec. I think, if implemented, this would be a change to DPF instead of an additional DPF-variant codec.
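The wrapping idea described here can be pictured with a small sketch: a wrapper hands all writing to the codec that owns the on-disk format and serves reads from memory. This is not Lucene code. `SketchPostings`, `OnDiskPostings`, and `DirectOverPostings` are hypothetical names, a postings list is reduced to an `int[]` of doc ids, and the wrapper caches lazily on first read for brevity, which is an assumption of this sketch rather than a description of how DPF loads its data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified postings API: a term maps to doc ids.
interface SketchPostings {
    void write(String term, int[] docIds);
    int[] read(String term);
}

// Stand-in for the wrapped codec that establishes the on-disk format.
final class OnDiskPostings implements SketchPostings {
    private final Map<String, int[]> disk = new HashMap<>();
    int reads = 0; // counts simulated disk reads

    public void write(String term, int[] docIds) {
        disk.put(term, docIds.clone());
    }

    public int[] read(String term) {
        reads++;
        return disk.get(term);
    }
}

// Wrapper: writing is delegated untouched, reading is served from memory.
final class DirectOverPostings implements SketchPostings {
    private final SketchPostings delegate;
    private final Map<String, int[]> inMemory = new HashMap<>();

    DirectOverPostings(SketchPostings delegate) { this.delegate = delegate; }

    public void write(String term, int[] docIds) {
        inMemory.remove(term);        // drop any stale in-memory copy
        delegate.write(term, docIds); // the on-disk format is the delegate's business
    }

    public int[] read(String term) {
        // Ask the delegate on first access only; later reads come from memory.
        return inMemory.computeIfAbsent(term, delegate::read);
    }
}
```

Because the wrapper never writes anything itself, the index on disk looks exactly as the wrapped format left it, which is what lets the in-memory choice be made at read time.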

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
On Mon, Apr 7, 2014 at 9:30 AM, Robert Muir wrote:
> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies wrote:
>> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir wrote:
>>> On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies wrote:
>>> My takeaway from the prior conversation was that

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Robert Muir
On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies wrote:
> I don't want to have to declare my intentions at the time I create
> the index. I don't want to have to use D-P-F for all readers all the
> time. Because I want to be able to decide to open up an index with an
> arbitrary on-disk format

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Robert Muir
On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies wrote:
> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir wrote:
>> On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies wrote:
>>> My takeaway from the prior conversation was that various people didn't
>>> entirely believe that I'd seen a dram

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir wrote:
> On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies wrote:
>> My takeaway from the prior conversation was that various people didn't
>> entirely believe that I'd seen a dramatic improvement in query perfo
>> using D-P-F, and so would not

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Robert Muir
On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies wrote:
> My takeaway from the prior conversation was that various people didn't
> entirely believe that I'd seen a dramatic improvement in query perfo
> using D-P-F, and so would not smile upon a patch intended to liberate
> D-P-F from codecs. I

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
If you look at people.apache.org:~bimargulies/dpf-bench.log (http://people.apache.org/bimargulies/dpf-bench.log should also work), you'll see the results of a luceneutil run that compares DPF to 'normal' on the 10M Wikipedia case. Some things are better, some are worse, some are the same. The clai

Re: Anticipating a benchmark for direct posting format

2014-04-03 Thread Benson Margulies
On Thu, Apr 3, 2014 at 11:37 AM, Michael McCandless wrote:
> Is the benchmark just trying to measure speedups by using DirectPF vs
> the default PF? You could do this today w/ luceneutil (using
> Wikipedia as content).
>
> But if you have another content source / index, I'm happy to run the
> ben

Re: Anticipating a benchmark for direct posting format

2014-04-03 Thread Michael McCandless
Is the benchmark just trying to measure speedups by using DirectPF vs the default PF? You could do this today w/ luceneutil (using Wikipedia as content). But if you have another content source / index, I'm happy to run the benchmark. It'd be easier to make the content available (CSV, or line doc

Anticipating a benchmark for direct posting format

2014-04-03 Thread Benson Margulies
Some of you may recall that I started a thread some time ago about wishing for the benefits of the direct posting format without needing to use a codec. The thread landed as a challenge: show a benchmark of the benefit of D-P-F. After a lot of distraction, I'm now in a position to build it. The co