Important at this stage!
On Wed, Sep 19, 2012 at 8:37 AM, Julian Hyde wrote:
> I'm not saying we should do any of these. Just laying out the options.
>
Higher-level caches are also possible. Caching query results, or intermediate
results, or on-demand creation of materialized views based on usage. OLTP
databases find disk block caches useful, analytic databases generally cache at
a higher level (thus saving both IO and CPU).
These forms of cac
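
To illustrate the result-level caching mentioned above, here is a minimal sketch in Python. It is not Drill code: the class `QueryResultCache` and the `execute` callback are hypothetical names, and keying on the stripped query text is an assumption made for brevity.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[object, ...]


class QueryResultCache:
    """Hypothetical result-level cache: stores whole query results by query text."""

    def __init__(self, execute: Callable[[str], List[Row]]) -> None:
        # `execute` stands in for the engine that actually runs a query (assumed).
        self._execute = execute
        self._results: Dict[str, List[Row]] = {}
        self.hits = 0
        self.misses = 0

    def run(self, query: str) -> List[Row]:
        # Assumption: the stripped query text is a good enough cache key.
        key = query.strip()
        if key in self._results:
            self.hits += 1
            return self._results[key]
        # Miss: pay the IO and CPU cost once, then remember the result.
        self.misses += 1
        rows = self._execute(query)
        self._results[key] = rows
        return rows
```

A repeated query is answered from memory without touching the scanner at all, which is where the IO and CPU savings would come from. Invalidation when the underlying data changes is left out of this sketch.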
A column is smaller than the full table, so it is easier to cache.
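
A rough sketch of that size argument in Python, assuming a made-up table of three 8-byte columns (the column names and row count are illustrative only):

```python
from array import array

# Hypothetical table: 1,000 rows, three fixed-width columns.
num_rows = 1000
columns = {
    "user_id": array("q", range(num_rows)),
    "timestamp": array("q", range(num_rows)),
    "amount": array("d", (float(i) for i in range(num_rows))),
}


def column_bytes(col: array) -> int:
    # Payload size of one column: element count times element width.
    return len(col) * col.itemsize


# Caching the whole table means holding every column.
table_bytes = sum(column_bytes(c) for c in columns.values())

# A query that only touches `amount` needs just that column in the cache.
single_column_bytes = column_bytes(columns["amount"])
```

With a row-oriented layout the cache has to hold full rows even when a query reads a single field, so the same memory budget covers fewer rows.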
2012/9/19 Constantine Peresypkin
> > Just that "disk cache" doesn't specify format.
>
> If the on-disk format is columnar, disk cache will also be columnar.
>
> On Wed, Sep 19, 2012 at 7:48 AM, Ted Dunning
> wrote:
> Just that "disk cache" doesn't specify format.
If the on-disk format is columnar, disk cache will also be columnar.
On Wed, Sep 19, 2012 at 7:48 AM, Ted Dunning wrote:
> On Tue, Sep 18, 2012 at 9:40 PM, Constantine Peresypkin <
> pconstant...@gmail.com> wrote:
>
> > > Columnar cache will make
On Tue, Sep 18, 2012 at 9:40 PM, Constantine Peresypkin <
pconstant...@gmail.com> wrote:
> > Columnar cache will make the next query fast.
>
> Why is that? What is the difference between columnar cache and disk cache
> then?
>
Just that "disk cache" doesn't specify format.
Again, this will be up
> Columnar cache will make the next query fast.
Why is that? What is the difference between columnar cache and disk cache
then?
> Scanners will be in whatever language the authors write them in.
No problem with that, I've just explained why there will be C-scanners.
On Wed, Sep 19, 2012 at 7:35
On Tue, Sep 18, 2012 at 6:30 PM, Constantine Peresypkin <
pconstant...@gmail.com> wrote:
> 1. I don't see why cache should be in columnar format. The only purpose of
> Dremel columnar format is to accelerate full table scans. That's it.
>
The cache is to make things fast.
Columnar cache will mak
1. I don't see why cache should be in columnar format. The only purpose of
Dremel columnar format is to accelerate full table scans. That's it.
2. Scanners will be in C for performance reasons. Dremel idea = scan
performance.
On Wed, Sep 19, 2012 at 12:58 AM, moon soo Lee wrote:
> i agree, worki
I agree: working version first, optimization later.
Is there a good reason why many input scanners are expected to be in C?
On Tue, Sep 18, 2012 at 12:11 PM, Ted Dunning wrote:
> I also generally agree, but I really think that we need a bit of experience
> with a simple working version of Drill fir
I thought I had separated Cache and Data manipulation.
Maybe the proposal is unclear. :-)
On Tue, Sep 18, 2012 at 11:51 AM, Azuryy Yu wrote:
> Thanks!
>
> Generally agree, but Cache and Data manipulation should be separated. every
> query reach cache firstly, if not hit, then call the read data i
I also generally agree, but I really think that we need a bit of experience
with a simple working version of Drill first.
Also, anything like this is going to have to recognize that there are
likely to be multiple columnar formats and that some (many) input scanners
are going to be coded in C, not
Thanks!
Generally agree, but Cache and Data manipulation should be separated. Every
query should reach the cache first; on a miss, it calls the read data interface,
which should not be part of the cache module.
That way everybody can replace the cache policy and the data read/write code, and then
configure drill.cache.pol
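
The separation described above (cache first, read interface on a miss, replaceable policy) could be sketched roughly as follows. This is an illustration in Python, not Drill's design: `LruPolicy`, `ReadThroughCache`, and the `read_data` callback are hypothetical names, and LRU is just one policy picked for the example.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LruPolicy:
    """One possible replaceable policy: evict the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def lookup(self, key: Hashable) -> Any:
        if key not in self._entries:
            raise KeyError(key)
        # Mark the entry as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def store(self, key: Hashable, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            # Drop the least recently used entry.
            self._entries.popitem(last=False)


class ReadThroughCache:
    """Cache module: knows the policy, but not how data is read."""

    def __init__(self, policy: LruPolicy, read_data: Callable[[Hashable], Any]) -> None:
        self._policy = policy
        # The read interface lives outside the cache module and is injected.
        self._read_data = read_data

    def get(self, key: Hashable) -> Any:
        try:
            return self._policy.lookup(key)   # every request hits the cache first
        except KeyError:
            value = self._read_data(key)      # miss: call the read data interface
            self._policy.store(key, value)
            return value
```

Because the policy and the reader are both passed in, either one can be swapped without touching the other, for example `ReadThroughCache(LruPolicy(2), my_reader)` with a different policy object that offers the same `lookup` and `store` methods.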
Here's my quick proposal for a common caching framework for Drill.
0. Why
- With in-place processing, the data format is not guaranteed to be the most
efficient format to process (i.e. columnar).
- A non-columnar format can have a huge performance impact (an order of
magnitude).
1. Goal.
- Increase perf
The plan was to have the scan operator do that kind of caching, but I agree
it could make sense to have some common caching framework in case other
scan operators want to cache as well.
On Sun, Sep 16, 2012 at 5:29 PM, moon soo Lee wrote:
> Drill want In-place processing ([1], page 12). yes, ETL
Drill wants in-place processing ([1], page 12). Yes, ETL is painful.
As I understand it, in-place processing means the data is not always
columnar.
[2], Figure 10, shows the performance difference between columnar and
record-oriented (MR)
If Dremel works with record-oriented data, I can guess that'll b