Re: In-place processing and performance.

2012-09-19 Thread Ted Dunning
Important at this stage! On Wed, Sep 19, 2012 at 8:37 AM, Julian Hyde wrote: > I'm not saying we should do any of these. Just laying out the options. >

Re: In-place processing and performance.

2012-09-19 Thread Julian Hyde
Higher-level caches are also possible: caching query results or intermediate results, or creating materialized views on demand based on usage. OLTP databases find disk block caches useful; analytic databases generally cache at a higher level (thus saving both IO and CPU). These forms of cac
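
A minimal sketch of the query-result caching mentioned above, assuming a hypothetical `execute` callable that runs a query and returns its rows. The class and method names are illustrative and are not part of Drill. Results are keyed on the stripped query text, and nothing is ever invalidated, so this only shows where the IO and CPU savings come from.

```python
class QueryResultCache:
    """Memoize whole query results (hypothetical helper, not a Drill API)."""

    def __init__(self, execute):
        self._execute = execute   # assumed: callable(sql) -> list of rows
        self._results = {}        # query text -> cached rows
        self.executions = 0       # how often the real engine was invoked

    def query(self, sql):
        key = sql.strip()
        if key not in self._results:
            # Miss: pay the scan and compute cost once, then remember the rows.
            self.executions += 1
            self._results[key] = self._execute(key)
        return self._results[key]


def fake_engine(sql):
    # Stand-in for a real scan; returns a constant result set.
    return [("rows for", sql)]


cache = QueryResultCache(fake_engine)
first = cache.query("SELECT count(*) FROM t")
second = cache.query("SELECT count(*) FROM t ")
```

The second call is served from the cache, so the engine runs once. A real system would also need invalidation when the underlying data changes.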

Re: In-place processing and performance.

2012-09-19 Thread zhiyuan dai
A column is smaller than the full table; therefore, it is easier to cache. 2012/9/19 Constantine Peresypkin > > Just that "disk cache" doesn't specify format. > > If the on-disk format is columnar, disk cache will also be columnar. > > On Wed, Sep 19, 2012 at 7:48 AM, Ted Dunning > wrote:
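
As a rough illustration of the size argument, the sketch below stores the same records row-wise and column-wise and compares how many bytes a scan of a single field has to keep in cache. The three-field schema, the field names, and the use of Python's `struct` and `array` modules are assumptions made for the example. They do not describe Drill's or Dremel's actual formats.

```python
import struct
from array import array

# Hypothetical schema: (user_id: int64, score: float64, flags: int32).
ROW_FORMAT = "<qdi"  # standard sizes, no padding: 8 + 8 + 4 = 20 bytes per row


def row_store(records):
    """Pack whole records back to back, as a record-oriented file would."""
    return b"".join(struct.pack(ROW_FORMAT, *r) for r in records)


def column_store(records):
    """Keep one contiguous typed array per field."""
    ids, scores, flags = zip(*records)
    return {
        "user_id": array("q", ids),
        "score": array("d", scores),
        "flags": array("i", flags),
    }


def bytes_to_scan_field(columns, name):
    """Bytes that must be read (or cached) to scan one field."""
    col = columns[name]
    return len(col) * col.itemsize


records = [(i, i * 0.5, i % 4) for i in range(1000)]
row_bytes = len(row_store(records))            # a row scan touches every field
score_bytes = bytes_to_scan_field(column_store(records), "score")
```

With this layout, a query that only reads `score` needs 8 of the 20 bytes per record, so the same cache budget holds more of the data the query actually uses.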

Re: In-place processing and performance.

2012-09-19 Thread Constantine Peresypkin
> Just that "disk cache" doesn't specify format. If the on-disk format is columnar, disk cache will also be columnar. On Wed, Sep 19, 2012 at 7:48 AM, Ted Dunning wrote: > On Tue, Sep 18, 2012 at 9:40 PM, Constantine Peresypkin < > pconstant...@gmail.com> wrote: > > > > Columnar cache will make

Re: In-place processing and performance.

2012-09-18 Thread Ted Dunning
On Tue, Sep 18, 2012 at 9:40 PM, Constantine Peresypkin < pconstant...@gmail.com> wrote: > > Columnar cache will make the next query fast. > > Why is that? What is the difference between columnar cache and disk cache > then? > Just that "disk cache" doesn't specify format. Again, this will be up

Re: In-place processing and performance.

2012-09-18 Thread Constantine Peresypkin
> Columnar cache will make the next query fast. Why is that? What is the difference between columnar cache and disk cache then? > Scanners will be in whatever language the authors write them in. No problem with that, I've just explained why there will be C-scanners. On Wed, Sep 19, 2012 at 7:35

Re: In-place processing and performance.

2012-09-18 Thread Ted Dunning
On Tue, Sep 18, 2012 at 6:30 PM, Constantine Peresypkin < pconstant...@gmail.com> wrote: > 1. I don't see why cache should be in columnar format. The only purpose of > Dremel columnar format is to accelerate full table scans. That's it. > The cache is to make things fast. Columnar cache will mak

Re: In-place processing and performance.

2012-09-18 Thread Constantine Peresypkin
1. I don't see why cache should be in columnar format. The only purpose of Dremel columnar format is to accelerate full table scans. That's it. 2. Scanners will be in C for performance reasons. Dremel idea = scan performance. On Wed, Sep 19, 2012 at 12:58 AM, moon soo Lee wrote: > i agree, worki

Re: In-place processing and performance.

2012-09-18 Thread moon soo Lee
I agree: working version first, optimization later. Is there a good reason that many input scanners are expected to be in C? On Tue, Sep 18, 2012 at 12:11 PM, Ted Dunning wrote: > I also generally agree, but I really think that we need a bit of experience > with a simple working version of Drill fir

Re: In-place processing and performance.

2012-09-18 Thread moon soo Lee
I thought I had kept cache and data manipulation separate. Maybe the proposal is unclear. :-) On Tue, Sep 18, 2012 at 11:51 AM, Azuryy Yu wrote: > Thanks! > > Generally agree, but Cache and Data manipulation should be separated. every > query reach cache firstly, if not hit, then call the read data i

Re: In-place processing and performance.

2012-09-17 Thread Ted Dunning
I also generally agree, but I really think that we need a bit of experience with a simple working version of Drill first. Also, anything like this is going to have to recognize that there are likely to be multiple columnar formats and that some (many) input scanners are going to be coded in C, not

Re: In-place processing and performance.

2012-09-17 Thread Azuryy Yu
Thanks! Generally agree, but cache and data manipulation should be separated. Every query reaches the cache first; on a miss, it then calls the read-data interface, which should not be part of the cache module. That way everybody can replace the cache policy and the read/write data code, and can then configure drill.cache.pol
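
The separation described above could be sketched roughly as follows: the cache policy and the data reader are independent pieces, and a thin wrapper consults the policy first and calls the reader only on a miss. All names here (`LRUPolicy`, `CachedSource`, `read_data`) are hypothetical, and LRU is just one example of a replaceable policy. The configuration key in the message is truncated in the archive, so it is not reproduced here.

```python
from collections import OrderedDict

_MISS = object()  # sentinel so that a cached None is still a hit


class LRUPolicy:
    """One possible replaceable cache policy: least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return _MISS
        self._items.move_to_end(key)      # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


class CachedSource:
    """Check the cache first; on a miss, call the separate read interface."""

    def __init__(self, policy, read_data):
        self.policy = policy        # any object with get(key) and put(key, value)
        self.read_data = read_data  # assumed: callable(key) -> data

    def fetch(self, key):
        value = self.policy.get(key)
        if value is _MISS:
            value = self.read_data(key)
            self.policy.put(key, value)
        return value


reads = []


def read_block(key):
    # Stand-in reader that records every real read.
    reads.append(key)
    return key.upper()


source = CachedSource(LRUPolicy(capacity=2), read_block)
source.fetch("a")
source.fetch("a")   # hit: the reader is not called again
```

Because `CachedSource` depends only on `get`/`put` and a reader callable, either side can be swapped without touching the other.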

Re: In-place processing and performance.

2012-09-17 Thread moon soo Lee
Here's my quick proposal for a common caching framework for Drill. 0. Why - With in-place processing, the data format is not guaranteed to be the most efficient format to process (i.e. columnar). - A non-columnar format can have a huge performance impact (an order of magnitude). 1. Goal. - Increase perf

Re: In-place processing and performance.

2012-09-17 Thread Tomer Shiran
The plan was to have the scan operator do that kind of caching, but I agree it could make sense to have some common caching framework in case other scan operators want to cache as well. On Sun, Sep 16, 2012 at 5:29 PM, moon soo Lee wrote: > Drill want In-place processing ([1], page 12). yes, ETL

In-place processing and performance.

2012-09-16 Thread moon soo Lee
Drill wants in-place processing ([1], page 12). Yes, ETL is painful. As I understand it, in-place processing means the data is not always columnar. [2], Figure 10, shows the performance difference between columnar and record-oriented (MR) processing. If Dremel works with record-oriented data, I can guess that'll b