I agree it would be disruptive in the case that you outlined. This is
why we have release notes and semver, though.
I think this change should only go into a major release for downstream
stability. Even though how Accumulo creates and manages files is not
covered by our compatibility
We need to consider the scenario in which somebody has written an
application on Accumulo that uses the default compression codec. If we
change the default, their app's behavior will change when they upgrade
Accumulo, either because an existing table will start using snappy or
because their app
I like the idea of making snappy the default. However, I am concerned
about raising the barrier of entry to new users by adding yet another
dependency to install.
On Mon, Aug 15, 2016 at 11:13 AM, Josh Elser wrote:
> No, I never asserted that Snappy is *always* the better
Ok, understood. Such a change would certainly require mention in release
notes, user manual, etc.
Christopher wrote:
> Yes, it's a simple matter to install the dependency... it just might not be
> installed by default. I'd certainly recommend downstream vendors/packagers
> add it as a required or
No, I never asserted that Snappy is *always* the better choice. I would
say that I believe Snappy is better in *most cases*.
Most users I talk to (with and without Accumulo involved) have plenty of
disk space available to them. It is rare that space on disk is actually
a concern. Instead,
If the crux of your argument was that snappy is always a better choice,
then my retort was to say it is not, since sometimes compression ratio can
be a dominant factor. Changes to defaults are disruptive for existing
users, so you need a better argument. I don't mean that you shouldn't
continue to
Yes, it's a simple matter to install the dependency... it just might not be
installed by default. I'd certainly recommend downstream vendors/packagers
add it as a required or suggested dependency to their RPMs/DEBs/etc.,
though.
The snappy package on RHEL/CentOS provides libsnappy. The
That's a fair point. I'm off in nebulous vendor land and tend to be removed
from pure Apache Hadoop artifacts. I feel like there's a snappy package (at
least on CentOS) which is enough, but understanding this would be good.
Is there a nonnative snappy impl?
On Aug 13, 2016 11:19 PM,
Your argument fails to address the performance benefits. I could pose the
same question back to you: you need to prove why we shouldn't use the
faster compression algorithm.
I don't mean to be snarky, but your argument is shutting down conversation.
I appreciate you sharing the opinion but don't
Perhaps there is a happy medium, though, by not necessarily defining
example configurations by the size of your memory footprint, but instead by
performance configuration? Snappy could be the default for those who want a
faster but less space-conscious implementation. Christopher's concerns
would
Native libraries for snappy are also not typically installed by default on
Linux distros. Even if the Hadoop native libraries are installed, the user
is likely going to end up using the Java implementation by default, I
*think*, unless they take additional actions.
On Sat, Aug 13, 2016 at 11:18
In my experience gz gets roughly 1.5x to 2x better compression than snappy.
Snappy is definitely not a Pareto improvement (although we tend to use
snappy by default). Since it's not always better I think you would need a
more solid argument to change the default.
Adam
On Aug 13, 2016 8:06 PM,
Same motivation of using it as for making it the default. I am not aware
of any downside to it. It's become pretty standard across all
installations I've worked with for years.
Asking because I am no oracle on the matter. I could just be ignorant of
some issue, but, given my current
Sorry. I wasn't clear. I understand the motivation for using it... I'm
asking about the motivation for making it the default.
Since both are available, I'm not sure the default matters *that* much, but
it could be an unexpected change for those preferring GZ.
Also, are there any risks regarding
In many cases, I would imagine faster is better.
On Sat, Aug 13, 2016, 10:56 PM Christopher wrote:
> What's the motivation for changing it?
>
> On Sat, Aug 13, 2016 at 10:47 PM Josh Elser wrote:
>
> > Any reason we don't want to do this? Last
Uhh, besides what I already mentioned? (close in compressed size but
"much" faster)
Christopher wrote:
> What's the motivation for changing it?
>
> On Sat, Aug 13, 2016 at 10:47 PM Josh Elser wrote:
> > Any reason we don't want to do this? Last rule-of-thumb I heard was that
What's the motivation for changing it?
On Sat, Aug 13, 2016 at 10:47 PM Josh Elser wrote:
> Any reason we don't want to do this? Last rule-of-thumb I heard was that
> snappy is often close enough in compression to GZ but quite a bit faster
> (I don't remember exactly how
Any reason we don't want to do this? Last rule-of-thumb I heard was that
snappy is often close enough in compression to GZ but quite a bit faster
(I don't remember exactly how much).
- Josh