Hi Aleks ;),

On 27.11.2014 at 22:27, Aleks Laz wrote:
> Our application is a nginx/php-fpm/postgresql Setup.
> The target design is nginx + proxy features / php-fpm / $DB / $Storage.
>
> .) Can I mix HDFS /HBase for binary data storage and data analyzing?

Yes, HBase is perfect for that. For storage it will work (with the "MOB extension"), and with MapReduce you can do whatever data analysis you want. I assume you do some image processing with the data?

> .) What is the preferred way to use HBase with PHP?

The native client library is in Java, and that is the best way to go. But if you only need basic access from the PHP application, then Thrift or REST would be a good choice:

http://wiki.apache.org/hadoop/Hbase/ThriftApi
http://wiki.apache.org/hadoop/Hbase/Stargate

There are language bindings for both.

> .) How difficult is it to use HBase with PHP?

That depends on what you are trying to do. If you just do a little fetching, updating, inserting etc., it's pretty easy. More complicated stuff I would do in Java and expose through a custom API from a Java service.

> .) What's a good solution for the 37 TB or the upcoming ~120 TB to
> distribute?
> [ ] N Servers with 1 37 TB mountpoints per server?
> [ ] N Servers with x TB mountpoints per server?
> [ ] other:

That's "not your business": HBase/Hadoop does the trick for you. It distributes the data, replicates it etc. Your application only talks to the cluster as a whole (through the client library or a Thrift/REST gateway), never to individual disks or mountpoints.

> .) Is HBase a good value for $Storage?

Yes ;)

> .) Is HBase a good value for $DB?
> DB-Size is smaller than 1 GB, I would use HBase just for HA features
> of Hadoop.

Well, the official documentation says:

»First, make sure you have enough data. If you have hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS might be a better choice ...«

In my experience, at around 1-10 million rows RDBMS are not really usable anymore. But I only used small/cheap hardware ... and don't like RDBMS ;).

Well, you will have at least 40 million rows ... and the platform is growing. I think SQL isn't a choice anymore. And as you have heavy reads and only a few writes, HBase is a good fit.

> .) Due to the fact that HBase is a file-system I could use
> /cams , for binary data
> /DB , for DB storage
> /logs , for log storage
> but is this wise. On the 'disk' they are different RAIDs.

HBase is a data store, not a file system.
This was probably copy-pasted from the original Hadoop question ;).

> .) Should I plan a dedicated Network+Card for the 'cluster
> communication' as for the most other cluster software?
> From what I have read it looks not necessary but from security point
> of view, yes.

http://blog.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/

Cloudera employees say that it wouldn't hurt if you have to push a lot of data to the cluster.

> .) Maybe the communication with the components (hadoop, zk, ...) could
> be set up with TLS?

HBase is built on top of Hadoop/HDFS, so this is in the "Hadoop domain". Hadoop can encrypt the transported data with TLS, can encrypt the data on disk, and you can use Kerberos auth (but this stuff I never did), etc. So the answer is yes.

Last remark: You seem kind of bound to PHP. The Hadoop world is written in Java. Of course there are a lot of ways to do stuff in other languages, over interfaces etc. But the Java API is the most powerful, and sometimes there is no other way than to use it directly.

Best wishes,
Wilm
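
A footnote on the Thrift/REST option above: to give a feeling for how little is needed for "basic access" over REST, here is a rough sketch in Python (the same idea carries over to PHP). It only builds a row URL and decodes a JSON row response. Everything concrete in it is an assumption for illustration: the base URL and port, the table name "cams", the column "cf:jpg", and the helper names `row_url` and `decode_rows`. The JSON layout (a "Row" list with base64-encoded "key", "column" and "$" fields) is how I understand the REST gateway to answer, so please check it against your HBase version before relying on it.

```python
import base64
import json
from urllib.parse import quote


def row_url(base_url, table, row_key, column=None):
    """Build the REST URL for one row, optionally narrowed to one column.

    base_url is a hypothetical gateway address, e.g. "http://localhost:8080".
    """
    url = "{}/{}/{}".format(
        base_url.rstrip("/"),
        quote(table, safe=""),
        quote(row_key, safe=""),  # percent-encode slashes inside the row key
    )
    if column:
        url += "/" + quote(column, safe=":")  # keep the family:qualifier colon
    return url


def decode_rows(payload):
    """Turn a JSON row response into {row_key: {column: raw_bytes}}.

    Assumes keys, column names and values arrive base64-encoded.
    """
    doc = json.loads(payload)
    result = {}
    for row in doc.get("Row", []):
        key = base64.b64decode(row["key"]).decode("utf-8")
        cells = result.setdefault(key, {})
        for cell in row.get("Cell", []):
            column = base64.b64decode(cell["column"]).decode("utf-8")
            cells[column] = base64.b64decode(cell["$"])  # value stays binary
    return result
```

In practice you would fetch `row_url(...)` with an `Accept: application/json` header using whatever HTTP client you like and pass the response body to `decode_rows`. Values are left as bytes on purpose, since the data in question is binary (images).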