Re: Standalone == Dev Only?

2015-03-16 Thread Michael Segel
I guess the old adage is true. If you only have a hammer, then every problem looks like a nail. As an architect, it’s your role to find the right tools to be used to solve the problem in the most efficient and effective manner. So the first question you need to ask is if HBase is the right too

Re: Standalone == Dev Only?

2015-03-13 Thread Sean Busbey
On Fri, Mar 13, 2015 at 2:41 PM, Michael Segel wrote: > > In stand alone, you’re writing to local disk. You lose the disk, you lose > the data, unless of course you’ve RAIDed your drives. > Then when you lose the node, you lose the data because it’s not being > replicated. While this may not be a m

Re: Standalone == Dev Only?

2015-03-13 Thread Michael Segel
Joseph, In stand alone, you’re writing to local disk. You lose the disk, you lose the data, unless of course you’ve RAIDed your drives. Then when you lose the node, you lose the data because it’s not being replicated. While this may not be a major issue or concern… you have to be aware of it’s

Re: Standalone == Dev Only?

2015-03-13 Thread Rose, Joseph
Michael, Thanks for your concern. Let me ask a few questions, since you’re implying that HDFS is the only way to reduce risk and ensure security, which is not the assumption under which I’ve been working. A brief rundown of our problem’s characteristics, since I haven’t really described what we’r

Re: Standalone == Dev Only?

2015-03-13 Thread Michael Segel
Guys, More than just needing some love. No HDFS… means data at risk. No HDFS… means that stand alone will have security issues. Patient Data? HINT: HIPAA. Please think your design through and if you go with HBase… you will want to build out a small cluster. > On Mar 10, 2015, at 6:16 PM, Nic

Re: Standalone == Dev Only?

2015-03-10 Thread Nick Dimiduk
As Stack and Andrew said, just wanted to give you fair warning that this mode may need some love. Likewise, there are probably alternatives that run a bit lighter weight, though you flatter us with the reminder of the long feature list. I have no problem with helping to fix and committing fixes to

Re: Standalone == Dev Only?

2015-03-10 Thread Alex Baranau
On: - Future investment in a design that scales better Indeed, designing against a key-value store is different from designing against an RDBMS. I wonder if you explored an option to abstract the storage layer and use a "single node purposed" store until you grow enough to switch to another one? E.g

Re: Standalone == Dev Only?

2015-03-10 Thread Rose, Joseph
Sorry, never answered your question about versions. I have 1.0.0 version of hbase, which has hadoop-common 2.5.1 in its lib folder. -j On 3/10/15, 11:36 AM, "Rose, Joseph" wrote: >I tried it and it does work now. It looks like the interface for >hadoop.fs.Syncable changed in March, 2012 to re

Re: Standalone == Dev Only?

2015-03-10 Thread Rose, Joseph
I tried it and it does work now. It looks like the interface for hadoop.fs.Syncable changed in March, 2012 to remove the deprecated sync() method and define only hsync() instead. The same committer did the right thing and removed sync() from FSDataOutputStream at the same time. The remaining hsync(

Re: Standalone == Dev Only?

2015-03-08 Thread Michael Segel
You’re dealing with patient data which is either very structured or semi-structured where you can use an RDBMS if you really think about your schema. If you want an RDBMS that can be used to hold objects, look at Informix’s IDS which is now IBM’s IDS. It contains the extensibility that you cou

Re: Standalone == Dev Only?

2015-03-06 Thread Stack
On Fri, Mar 6, 2015 at 1:50 PM, Rose, Joseph < joseph.r...@childrens.harvard.edu> wrote: > So, I think Nick, St.Ack and Wilm have all made some excellent points, but > this last email more or less hit it on the head. Like I said, I’m working > with patient data and while the volume is small now, i

Re: Standalone == Dev Only?

2015-03-06 Thread Andrew Purtell
... And if you have at most "small data" at this stage, you might be able to cut the heap sizes of the HDFS daemons in half. On Fri, Mar 6, 2015 at 2:18 PM, Andrew Purtell wrote: > > I think the final issue with hadoop-common (re: unimplemented sync for local > filesystems) is the one showstoppe

Re: Standalone == Dev Only?

2015-03-06 Thread Andrew Purtell
> I think the final issue with hadoop-common (re: unimplemented sync for local filesystems) is the one showstopper for us. Although the unnecessary overhead would be significant, you could run a stripped down HDFS stack on the VM. Give the NameNode, SecondaryNameNode, and DataNode 1GB of heap only

Re: Standalone == Dev Only?

2015-03-06 Thread Rose, Joseph
So, I think Nick, St.Ack and Wilm have all made some excellent points, but this last email more or less hit it on the head. Like I said, I’m working with patient data and while the volume is small now, it’s not going to stay that way. And the cell-level security is a *huge* win - I’m sure you folks

Re: Standalone == Dev Only?

2015-03-06 Thread Wilm Schumacher
Hi, On 06.03.2015 at 19:18, Stack wrote: > Why not use an RDBMS then? When I first read the HBase documentation I also stumbled over the "only use for large datasets" or "standalone only in dev mode" etc. From my point of view there are some arguments against RDBMSs and for, e.g., HBase, although w

Re: Standalone == Dev Only?

2015-03-06 Thread Stack
On Tue, Mar 3, 2015 at 7:32 AM, Rose, Joseph < joseph.r...@childrens.harvard.edu> wrote: > Folks, > > I’m new to HBase (but not new to these sorts of data stores.) I think > HBase would be a good fit for a project I’m working on, except for one > thing: the amount of data we’re talking about, here

Re: Standalone == Dev Only?

2015-03-06 Thread Nick Dimiduk
Hi Joseph, Generally speaking we've thought of stand-alone mode as dev/testing because the common use case for HBase is larger datasets. There's nothing specifically non-production about a stand-alone mode, though you obviously won't have high-availability, and there may be bugs in the code paths t

Standalone == Dev Only?

2015-03-03 Thread Rose, Joseph
Folks, I’m new to HBase (but not new to these sorts of data stores.) I think HBase would be a good fit for a project I’m working on, except for one thing: the amount of data we’re talking about, here, is far smaller than what’s usually recommended for HBase. As I read the docs, though, it seems