Re: Snappy as default table.file.compress.type?

2016-08-15 Thread Josh Elser
I agree it would be disruptive in the case that you outlined. This is why we have release notes and semver, though. I think this change should only go into a major release for downstream stability. Even though how Accumulo creates and manages files is not covered by our compatibility

Re: Snappy as default table.file.compress.type?

2016-08-15 Thread Adam Fuchs
We need to consider the scenario in which somebody has written an application on Accumulo that uses the default compression codec. If we change the default, their app's behavior will change when they upgrade Accumulo, either because an existing table will start using snappy or because their app

Re: Snappy as default table.file.compress.type?

2016-08-15 Thread Michael Wall
I like the idea of making snappy the default. However, I am concerned about raising the barrier of entry to new users by adding yet another dependency to install. On Mon, Aug 15, 2016 at 11:13 AM, Josh Elser wrote: > No, I never asserted that Snappy is *always* the better

Re: Snappy as default table.file.compress.type?

2016-08-15 Thread Josh Elser
Ok, understood. Such a change would certainly require mention in release notes, user manual, etc. Christopher wrote: Yes, it's a simple matter to install the dependency... it just might not be installed by default. I'd certainly recommend downstream vendors/packagers add it as a required or

Re: Snappy as default table.file.compress.type?

2016-08-15 Thread Josh Elser
No, I never asserted that Snappy is *always* the better choice. I would say that I believe Snappy is better in *most cases*. Most users I talk to (with and without Accumulo involved) have plenty of disk space available to them. It is rare that space on disk is actually a concern. Instead,

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Adam Fuchs
If the crux of your argument was that snappy is always a better choice, then my retort was to say it is not, since sometimes compression ratio can be a dominant factor. Changes to defaults are disruptive for existing users, so you need a better argument. I don't mean that you shouldn't continue to

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Christopher
Yes, it's a simple matter to install the dependency... it just might not be installed by default. I'd certainly recommend downstream vendors/packagers add it as a required or suggested dependency to their RPMs/DEBs/etc., though. The snappy package on RHEL/CentOS provides libsnappy. The

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Josh Elser
That's a fair point. I'm off in nebulous vendor land and tend to be removed from pure Apache Hadoop artifacts. I feel like there's a snappy package (at least on CentOS) which is enough, but understanding this would be good. Is there a non-native snappy impl? On Aug 13, 2016 11:19 PM,

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Josh Elser
Your argument fails to address the performance benefits. I could pose the same question back to you: you need to prove why we shouldn't use the faster compression algorithm. I don't mean to be snarky, but your argument is shutting down conversation. I appreciate you sharing the opinion but don't

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Marc P.
Perhaps there is a happy medium, though, by not necessarily defining example configurations by the size of your memory footprint, but instead by performance configuration? Snappy could be the default for those who want a faster but less space cognizant implementation. Christopher's concerns would

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Christopher
Native libraries for snappy are also not typically installed by default on Linux distros. Even if the hadoop native libraries are installed, the user is likely going to end up using the Java implementation by default, I *think*, unless they take additional actions. On Sat, Aug 13, 2016 at 11:18

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Adam Fuchs
In my experience gz gets roughly 1.5x to 2x better compression than snappy. Snappy is definitely not a Pareto improvement (although we tend to use snappy by default). Since it's not always better I think you would need a more solid argument to change the default. Adam On Aug 13, 2016 8:06 PM,

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Josh Elser
Same motivation of using it as for making it the default. I am not aware of any downside to it. It's become pretty standard across all installations I've worked with for years. Asking because I am no oracle on the matter. I could just be ignorant of some issue, but, given my current

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Christopher
Sorry. I wasn't clear. I understand the motivation for using it... I'm asking about the motivation for making it the default. Since both are available, I'm not sure the default matters *that* much, but it could be an unexpected change for those preferring GZ. Also, are there any risks regarding

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Marc P.
In many cases, I would imagine faster is better. On Sat, Aug 13, 2016, 10:56 PM Christopher wrote: > What's the motivation for changing it? > > On Sat, Aug 13, 2016 at 10:47 PM Josh Elser wrote: > > > Any reason we don't want to do this? Last

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Josh Elser
Uhh, besides what I already mentioned? (close in compressed size but "much" faster) Christopher wrote: What's the motivation for changing it? On Sat, Aug 13, 2016 at 10:47 PM Josh Elser wrote: Any reason we don't want to do this? Last rule-of-thumb I heard was that

Re: Snappy as default table.file.compress.type?

2016-08-13 Thread Christopher
What's the motivation for changing it? On Sat, Aug 13, 2016 at 10:47 PM Josh Elser wrote: > Any reason we don't want to do this? Last rule-of-thumb I heard was that > snappy is often close enough in compression to GZ but quite a bit faster > (I don't remember exactly how

Snappy as default table.file.compress.type?

2016-08-13 Thread Josh Elser
Any reason we don't want to do this? Last rule-of-thumb I heard was that snappy is often close enough in compression to GZ but quite a bit faster (I don't remember exactly how much). - Josh
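
The question running through this thread is how much compression ratio is given up for speed, and that can be checked on one's own data rather than settled by rule of thumb. The following is a minimal sketch, not a benchmark of snappy against gz. It assumes Python's standard zlib module as a stand-in and compares its fastest and smallest settings, since snappy is not in the standard library. The names `measure` and `sample_rows` and the synthetic row format are hypothetical; a sample of real table data would give more meaningful numbers.

```python
import time
import zlib


def measure(data: bytes, level: int) -> dict:
    """Compress data at the given zlib level; report ratio and elapsed seconds."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check so a ratio is never reported for corrupted output.
    assert zlib.decompress(compressed) == data
    return {
        "level": level,
        "ratio": len(data) / len(compressed),
        "seconds": elapsed,
    }


def sample_rows(n: int = 20000) -> bytes:
    """Hypothetical key/value-like rows; replace with a sample of real data."""
    return b"".join(
        b"row%08d:cf:cq\tvalue-%d\n" % (i, i * 7919 % 1000) for i in range(n)
    )


if __name__ == "__main__":
    data = sample_rows()
    for level in (1, 9):  # 1 = fastest setting, 9 = smallest output
        r = measure(data, level)
        print(f"level={r['level']} ratio={r['ratio']:.2f} seconds={r['seconds']:.4f}")
```

Running the same measurement with the codecs actually under discussion, on data resembling a real table, would show whether ratio or speed dominates for a given workload.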