Time to test 1.5.4?

2015-08-19 Thread Josh Elser
I pushed https://issues.apache.org/jira/browse/ACCUMULO-3946 tonight (thanks again, James). The merge to 1.6 and onwards wasn't easy, so I would appreciate it if someone could spot-check this for me. I _think_ this (and the previous audit issue) were the only things we wanted to get into 1.5? A

Re: HBase and Accumulo

2015-08-19 Thread Ted Malaska
That being said, what is the use case that you feel you need a NoSQL solution for? On Aug 19, 2015 6:54 PM, "Ted Malaska" wrote: > I'm on the side of benchmarking for the use case and with an expert. > There are so many ways to cheat a benchmark. And the benchmark may not be > anything like your use

Re: HBase and Accumulo

2015-08-19 Thread Ted Malaska
I'm on the side of benchmarking for the use case and with an expert. There are so many ways to cheat a benchmark, and the benchmark may not be anything like your use case. On Aug 19, 2015 5:43 PM, "Andrew Purtell" wrote: > I think someone who uses third party benchmarks to assess a system like >

Re: HBase and Accumulo

2015-08-19 Thread Josh Elser
Ah right, I did forget about that paper. Thanks for clarifying. Big +1 to Andy's comments, too. Jeremy Kepner wrote: Turning off the walog was mostly to shorten the benchmarking cycle (it allowed us to go from zero to peak ingest in a few seconds). BAH got pretty much the same performance resu

Re: HBase and Accumulo

2015-08-19 Thread Andrew Purtell
I think someone who uses third-party benchmarks to assess a system like HBase or Accumulo (or Cassandra...) is taking a foolish shortcut, so perhaps we must agree to disagree. On Wed, Aug 19, 2015 at 2:34 PM, Jeremy Kepner wrote: > I agree that performance on real apps is the most important fo

Re: HBase and Accumulo

2015-08-19 Thread Jeremy Kepner
I agree that performance on real apps is the most important thing for any particular organization, but as technologists, how do we measure ourselves? Hence, imperfect benchmarking remains our only recourse. On Wed, Aug 19, 2015 at 12:34:44PM -0700, Andrew Purtell wrote: > I can't speak for anyone other t

Re: HBase and Accumulo

2015-08-19 Thread Jeremy Kepner
Turning off the walog was mostly to shorten the benchmarking cycle (it allowed us to go from zero to peak ingest in a few seconds). BAH got pretty much the same performance results in their paper, it just took longer for their experiments to run. So, in this case, we had two different teams doing

Re: HBase and Accumulo

2015-08-19 Thread Ted Malaska
HBase region splits can be done through a variety of strategies. Data size can be a component in those strategies. There's no hard and fast rule for how large a region can be. There are some tradeoffs with larger or smaller region sizes. A region split strategy will depend upon a number of factors. Me

Re: HBase and Accumulo

2015-08-19 Thread Andrew Purtell
I can't speak for anyone other than myself in the HBase community, but I'm much more interested and focused on performance analysis and developing/deploying for the use cases of my employer than participating in generic bench-marketing to make weapons for happy OSS warriors. Perhaps this does a dis

Re: HBase and Accumulo

2015-08-19 Thread Josh Elser
Alright, I have to ask... are you referring to the paper that cites Accumulo performance without write-ahead logs enabled? I have some serious reservations about the relevance of that paper to this conversation and just want to make sure people aren't led astray by what the actual takeaway shou

Re: HBase and Accumulo

2015-08-19 Thread Christopher
Forgive my ignorance about HBase, but wouldn't the size of records count, too? Your response seems to imply that the number of records is what matters for how many regions are needed. For what it's worth, Accumulo's tablets are split based on storage size, not number of records. I assumed the same was tru
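
Christopher's point, that size-based splitting makes record size matter as much as record count, can be illustrated with back-of-the-envelope arithmetic. This is a hypothetical Python sketch, not Accumulo or HBase code: `estimate_split_count` and the 1 GiB threshold are illustrative assumptions (not a documented default), and the estimate ignores compression, uneven key distribution, and the fact that real splits leave tablets/regions only partially full.

```python
import math

GIB = 1024 ** 3  # illustrative split threshold unit (assumption, not a documented default)


def estimate_split_count(num_records: int, avg_record_bytes: int,
                         split_threshold_bytes: int) -> int:
    """Rough lower bound on tablets/regions under purely size-based splitting.

    Hypothetical simplification: data is evenly distributed and every
    tablet/region fills exactly to the threshold before it splits.
    """
    total_bytes = num_records * avg_record_bytes
    # At least one tablet/region exists even for an empty table.
    return max(1, math.ceil(total_bytes / split_threshold_bytes))


# Same record count, 10x larger records: the estimate scales with bytes, not records.
small = estimate_split_count(10_000_000_000, 100, GIB)
large = estimate_split_count(10_000_000_000, 1_000, GIB)
```

With 10 billion records, 100-byte records give an estimate of 932 tablets/regions at a 1 GiB threshold, while 1,000-byte records give 9,314: the record count is identical, but the byte volume drives the split count.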

Re: HBase and Accumulo

2015-08-19 Thread Jeremy Kepner
A big difference between Accumulo and HBase is the published performance numbers. The Accumulo community has done a good job of continuing to publish up-to-date performance numbers in peer-reviewed venues, which allow Accumulo to claim best-in-the-world performance. The HBase community hasn't be

Re: HBase and Accumulo

2015-08-19 Thread Ted Malaska
Sorry, typo. So there might be issues when you pass a quadrillion. But like I said, I've never run into that issue of region limits. On Wed, Aug 19, 2015 at 2:29 PM, Ted Malaska wrote: > Sorry 10 billion a day so that is 7 Trillion records. So many issues > around 1000 Trillion > > On Wed, Aug 19

Re: HBase and Accumulo

2015-08-19 Thread Ted Malaska
Yeah, if you have more than a quadrillion records in your design, let me know. I would love to help out. Ted Malaska On Wed, Aug 19, 2015 at 2:30 PM, Josh Elser wrote: > Like I've said many times now, it's relative to your actual problem. If > you don't have that much data (or intend to grow into t

Re: HBase and Accumulo

2015-08-19 Thread Josh Elser
He didn't ask just about security, FWIW "I am looking for real gap comparing HBase to Accumulo if there is any so that I can be prepared to address them. This is not limited to the security area." Sean Busbey wrote: Let's please stick to the topic Jerry asked about: security features. We ca

Re: HBase and Accumulo

2015-08-19 Thread dlmarion
"I am looking for real gap comparing HBase to Accumulo if there is any so that I can be prepared to address them. This is not limited to the security area. There are differences in some features and implementations. But they don't see like real 'gaps'." He asked about gaps, but not feature and

Re: HBase and Accumulo

2015-08-19 Thread William Slacum
If you drew a Venn diagram of HBase features compared to Accumulo features, it's pretty much going to be a single circle. If you want performance anecdotes, the most succinct summary I've seen is that Accumulo can handle heavier write loads whereas HBase will handle heavier read loads. From these

Re: HBase and Accumulo

2015-08-19 Thread Sean Busbey
Let's please stick to the topic Jerry asked about: security features. We can get into all sorts of discussions around scalability and read/write performance in a different joint thread if folks want. We all have lots of Opinions (and the YCSB community would love to see more of y'all show up to im

Re: HBase and Accumulo

2015-08-19 Thread Ted Malaska
I've been doing HBase for a long time and have never had an issue with region count limits, and I have clusters with 10s of billions of records. Maybe there would be issues around a couple trillion records, but I've never gotten that high yet. Ted Malaska On Wed, Aug 19, 2015 at 2:24 PM, Josh Elser wrote: >

Re: HBase and Accumulo

2015-08-19 Thread Ted Malaska
Sorry, 10 billion a day, so that is 7 trillion records. So many issues around 1000 trillion. On Wed, Aug 19, 2015 at 2:28 PM, Ted Malaska wrote: > I've been doing HBase for a long time and never had an issue with region > count limits and I have clusters with 10s of billions of records. Many > th

Re: HBase and Accumulo

2015-08-19 Thread Josh Elser
Like I've said many times now, it's relative to your actual problem. If you don't have that much data (or intend to grow into that much data), it's not an issue. Obviously, this is the case for you. However, it is an architectural difference between the two projects with known limitations for

Re: HBase and Accumulo

2015-08-19 Thread Josh Elser
Oh, one other thing that I should mention (was prompted off-list). (definition time since cross-list now: HBase regions == Accumulo tablets) Accumulo will handle many more regions than HBase does now due to a splittable metadata table. While I was told this was a very long and arduous journey

Re: HBase and Accumulo

2015-08-19 Thread Sean Busbey
+dev@accumulo (Though Josh and I are on this list, some other folks on dev@accumulo might have opinions) Hi Jerry! Do you have constraints on which version(s) of HBase and Accumulo you're comparing? Are you looking for currently shipping or for some expected future date? In very broad strokes: