For MOB, please take a look at HBASE-11339.

Cheers
On Nov 27, 2014, at 3:32 PM, Aleks Laz <al-userhb...@none.at> wrote:

> Hi Wilm.
>
> On 27-11-2014 23:41, Wilm Schumacher wrote:
>> Hi Aleks ;),
>> On 27.11.2014 at 22:27, Aleks Laz wrote:
>>> Our application is an nginx/php-fpm/postgresql setup.
>>> The target design is nginx + proxy features / php-fpm / $DB / $Storage.
>>> .) Can I mix HDFS/HBase for binary data storage and data analyzing?
>> yes. hbase is perfect for that. For storage it will work (with the
>> "MOB-extension") and with map reduce you can do whatever data analyzing
>> you want. I assume you do some image processing with the data?!?!
>
> What's the plan for the "MOB-extension"?
>
> From a development point of view I can build HBase with the
> "MOB-extension", but from a sysadmin point of view a 'package' (jar, zip,
> deb, rpm, ...) is much easier to maintain.
>
> Currently there are no plans to analyse the images, but who knows what
> the future brings.
>
> We need to do some "accesslog" analysis, like piwik or awffull.
> Maybe elasticsearch is a better tool for that?
>
>>> .) What is the preferred way to use HBase with PHP?
>> The native client lib is in java. This is the best way to go. But if you
>> need only basic access from the php application, then thrift or rest
>> would be a good choice.
>> http://wiki.apache.org/hadoop/Hbase/ThriftApi
>> http://wiki.apache.org/hadoop/Hbase/Stargate
>
> Stargate is a cool name ;-)
>
>> There are language bindings for both.
>>> .) How difficult is it to use HBase with PHP?
>> That depends on what you are trying to do. If you just do a little
>> fetching, updating, inserting etc. it's pretty easy. More complicated
>> stuff I would do in java and expose through a custom api via a java
>> service.
>>> .) What's a good solution for the 37 TB, or the upcoming ~120 TB, to
>>> distribute?
>>> [ ] N servers with one 37 TB mountpoint per server?
>>> [ ] N servers with x TB mountpoints per server?
>>> [ ] other:
>> That's "not your business". hbase/hadoop does the trick for you. hbase
>> distributes the data, replicates it, etc.. You will only talk to the
>> master.
>
> Well, but at the end of the day I will need physical storage distributed
> over x servers.
>
> My question is: do I need to make sure that every server has enough
> storage for the whole data set?
>
> As far as I have understood, a hadoop client sees a 'filesystem' with
> 37 TB or 120 TB, but from the server point of view, how should I plan the
> storage/server setup for the datanodes?
>
> From the hadoophbase-capacity-planning link below and
>
> http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
>
> #####
> ....
> Here are the recommended specifications for DataNode/TaskTrackers in a
> balanced Hadoop cluster:
>
> 12-24 1-4TB hard disks in a JBOD (Just a Bunch Of Disks) configuration
> ...
> #####
>
> What happens when a datanode has 20 TB but the whole hadoop/HBase 2-node
> cluster has 40?
>
> I see I'm still new to the hadoop/HBase concept.
>
>>> .) Is HBase a good value for $Storage?
>> yes ;)
>>> .) Is HBase a good value for $DB?
>>> The DB size is smaller than 1 GB; I would use HBase just for the HA
>>> features of Hadoop.
>> well, the official documentation says:
>> »First, make sure you have enough data. If you have hundreds of millions
>> or billions of rows, then HBase is a good candidate. If you only have a
>> few thousand/million rows, then using a traditional RDBMS might be a
>> better choice ...«
>
> Okay, so for this part I will stay on postgresql with pgbouncer.
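To make the "basic access" route discussed above concrete, here is a minimal sketch of talking to the HBase Thrift gateway (the thrift1 API from the ThriftApi wiki page) from PHP. It assumes the Thrift PHP runtime and the classes generated from Hbase.thrift (HbaseClient, Mutation) are autoloadable, a Thrift server running on localhost:9090, and a made-up 'cams' table with a column family 'd'; exact class names and namespaces vary with the Thrift version, so treat this as a sketch, not a reference.

#####
<?php
// Minimal sketch, not production code: write and read one row through
// the HBase Thrift (thrift1) gateway. Assumes the Thrift PHP runtime
// and the classes generated from Hbase.thrift are autoloadable; table
// and column names ('cams', 'd:img') are made up for illustration.
use Thrift\Transport\TSocket;
use Thrift\Transport\TBufferedTransport;
use Thrift\Protocol\TBinaryProtocol;
use Hbase\HbaseClient;
use Hbase\Mutation;

$socket    = new TSocket('localhost', 9090);   // Thrift gateway host/port
$transport = new TBufferedTransport($socket);
$protocol  = new TBinaryProtocol($transport);
$client    = new HbaseClient($protocol);

$transport->open();

// Write: store an image blob under column family 'd'.
$mutations = array(new Mutation(array(
    'column' => 'd:img',
    'value'  => file_get_contents('cam-0001.jpg'),
)));
$client->mutateRow('cams', 'cam-0001', $mutations, array());

// Read it back; getRow returns a list of TRowResult.
$rows = $client->getRow('cams', 'cam-0001', array());
foreach ($rows as $row) {
    foreach ($row->columns as $column => $cell) {
        echo $column, ' => ', strlen($cell->value), " bytes\n";
    }
}

$transport->close();
#####

Anything heavier than this kind of get/put (custom filters, scans with logic, admin work) is more comfortable through the native java api, as suggested above.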
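On the capacity question above (20 TB per datanode vs. 40 TB in a 2-node cluster): no single datanode has to hold the whole data set; HDFS splits files into blocks and spreads the replicas across the cluster. What has to add up is the raw total: logical data times the replication factor (3 by default, via dfs.replication), plus headroom. A back-of-the-envelope check with the numbers from this thread, replication assumed at the default:

#####
<?php
// Back-of-the-envelope HDFS sizing. Assumes the default block
// replication factor of 3 (dfs.replication) and ignores headroom for
// temp data, compactions and node failure.
$logicalTb   = 37;   // application data, from this thread
$replication = 3;    // HDFS default
$perNodeTb   = 20;   // raw disk per datanode

$rawNeededTb = $logicalTb * $replication;             // 37 * 3 = 111 TB raw
$minNodes    = (int) ceil($rawNeededTb / $perNodeTb); // ceil(111 / 20) = 6

echo "raw needed: {$rawNeededTb} TB, minimum datanodes: {$minNodes}\n";

// The 2-node / 40 TB example holds only about 40 / 3 ~= 13 TB of
// logical data at replication 3 -- the cluster's raw total, not any
// single node, is what must cover data * replication.
#####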
>
>> In my experience, at around 1-10 million rows RDBMS are not really
>> usable anymore. But I only used small/cheap hardware ... and don't like
>> RDBMS ;).
>
> ;-)
>
>> Well, you will have at least 40 million rows ... and the platform is
>> growing. I think SQL isn't a choice anymore. And as you have heavy reads
>> and only a few writes, hbase is a good fit.
>
> ?! Why "40 million rows"? Do you mean the file tables?
> The DB only holds some data like user accounts, the id for a directory,
> and so on.
>
>>> .) Due to the fact that HBase is a file-system I could use
>>> /cams , for binary data
>>> /DB , for DB storage
>>> /logs , for log storage
>>> but is this wise? On 'disk' they are different RAIDs.
>> hbase is a data store, not a file-system. This was probably copy-pasted
>> from the original hadoop question ;).
>
> ;-)
>
>>> .) Should I plan a dedicated network + card for the 'cluster
>>> communication', as for most other cluster software?
>>> From what I have read it doesn't look necessary, but from a security
>>> point of view, yes.
>> http://blog.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
>> Cloudera employees say that it wouldn't hurt if you have to push a lot
>> of data to the cluster.
>
> Okay, so it is like other cluster setups.
>
>>> .) Maybe the communication with the components (hadoop, zk, ...) could
>>> be set up with TLS?
>> hbase is built on top of hadoop/hdfs. This is in the "hadoop domain".
>> hadoop can encrypt the transported data with TLS, can encrypt the data
>> on disk, you can use kerberos auth (but that stuff I never did), etc.
>> etc.. So the answer is yes.
>
> Thanks.
>
>> Last remark: You seem kind of bound to PHP. The hadoop world is written
>> in java. Of course there are a lot of ways to do stuff in other
>> languages, over interfaces etc. But the java api is the most powerful,
>> and sometimes there is no other way than to use it directly.
>
> Currently, yes, php is the main language.
> I don't know of a good solution for php similar to hadoop; does anyone
> else know one?
>
> I will take a look at
>
> https://wiki.apache.org/hadoop/PoweredBy
>
> to get some ideas for a working solution.
>
>> Best wishes,
>> Wilm
>
> Thanks for your feedback.
> I will dig deeper into this topic and start to set up the components
> step by step.
>
> BR Aleks
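As a follow-up to the REST option ("Stargate") linked earlier in the thread: basic access also works from plain PHP with nothing but ext/curl, since the gateway speaks HTTP. A minimal sketch, assuming a REST server on localhost:8080 and the same made-up 'cams' table as above; with application/octet-stream the cell value travels as raw bytes and the row key and column come from the URL.

#####
<?php
// Minimal sketch: write and read one cell through the HBase REST
// gateway (Stargate) using only ext/curl. Host, port, table and
// column names are assumptions for illustration.
$base = 'http://localhost:8080';
$cell = '/cams/cam-0001/d:img';  // /<table>/<row-key>/<family:qualifier>

// Write: PUT the raw bytes.
$ch = curl_init($base . $cell);
curl_setopt_array($ch, array(
    CURLOPT_CUSTOMREQUEST  => 'PUT',
    CURLOPT_HTTPHEADER     => array('Content-Type: application/octet-stream'),
    CURLOPT_POSTFIELDS     => file_get_contents('cam-0001.jpg'),
    CURLOPT_RETURNTRANSFER => true,
));
curl_exec($ch);
curl_close($ch);

// Read: GET with Accept: application/octet-stream returns the newest
// cell value as raw bytes.
$ch = curl_init($base . $cell);
curl_setopt_array($ch, array(
    CURLOPT_HTTPHEADER     => array('Accept: application/octet-stream'),
    CURLOPT_RETURNTRANSFER => true,
));
$bytes = curl_exec($ch);
curl_close($ch);

echo strlen($bytes), " bytes read back\n";
#####

For anything beyond single-cell get/put, the Thrift interface or a small java service in front of the native client is the more comfortable route, as noted in the thread.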