Re: serving static images using HBase

2014-08-08 Thread Damien Hardy
Hello Serega, We use it this way here, via a Python image manipulation service named Thumbor (https://github.com/thumbor/thumbor/) plus a plugin of my own (https://github.com/thumbor/thumbor/wiki/Plugins#thumbor_hbase-by-damien-hardy). One big advantage is that you can use it with a lazy-loading plugin

HBase export limit bandwidth

2014-06-04 Thread Damien Hardy
Hello, We are trying to export an HBase table to S3 for backup purposes. By default the Export tool runs one map per region, and we want to limit the output bandwidth over the Internet (to Amazon S3). We were thinking of adding some reducers to limit the number of writers, but the number of reducers is explicitly hardcoded to 0 in
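Since the number of writers is fixed by the job, one generic way to bound throughput is to pace each writer with a token bucket. The Python sketch below is hypothetical (TokenBucket and throttled_copy are illustrative names, not part of HBase or Hadoop); the clock and sleep functions are injectable so the pacing can be tested without real waiting. Keep in mind that the aggregate bandwidth is roughly the per-task rate multiplied by the number of map tasks running at once.

```python
import time
from typing import Callable, Iterable


class TokenBucket:
    """Limits average throughput to `rate` bytes/s, allowing bursts up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        # Credit tokens for the time elapsed since the last call, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, nbytes: int) -> None:
        """Spend `nbytes` tokens; if that leaves a deficit, sleep until it is repaid."""
        self._refill()
        self.tokens -= nbytes
        if self.tokens < 0:
            self.sleep(-self.tokens / self.rate)


def throttled_copy(chunks: Iterable[bytes], write: Callable[[bytes], object],
                   bucket: TokenBucket) -> int:
    """Pass every chunk to `write`, pacing output through `bucket`; return bytes sent."""
    sent = 0
    for chunk in chunks:
        bucket.consume(len(chunk))
        write(chunk)
        sent += len(chunk)
    return sent
```

For example, TokenBucket(rate=1_000_000, capacity=1_000_000) would hold one writer to about 1 MB/s on average; the deficit-then-sleep design avoids busy loops and lets a single chunk exceed the bucket size.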

Re: HBase export limit bandwidth

2014-06-04 Thread Damien Hardy
On Jun 4, 2014, at 5:39 AM, Damien Hardy dha...@viadeoteam.com wrote: Hello, We are trying to export an HBase table to S3 for backup purposes. By default the Export tool runs one map per region, and we want to limit the output bandwidth over the Internet (to Amazon S3). We were thinking of adding some reducers

Re: Is it necessary to set MD5 on rowkey?

2013-12-17 Thread Damien Hardy
-Mike On Dec 18, 2012, at 3:33 AM, Damien Hardy dha...@viadeoteam.com wrote: Hello, There is a middle ground between sequential keys (hotspotting risk) and MD5 (heavy scans): * you can use composite keys with a field that can segregate data (hostname, product name, metric name), as OpenTSDB does
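A minimal sketch of such a composite key, assuming a layout of metric name, a 0x00 separator and an 8-byte big-endian timestamp. This is an illustration only, not OpenTSDB's actual row format (which encodes metric names as UIDs). Rows for one metric sort together by time, so a time range for that metric is one short scan, while different metrics land in different parts of the key space.

```python
import struct
from typing import Tuple


def composite_key(metric: str, timestamp: int) -> bytes:
    """Row key = metric name + 0x00 separator + 8-byte big-endian timestamp."""
    if "\x00" in metric:
        raise ValueError("metric name must not contain the separator byte")
    # Big-endian unsigned encoding keeps non-negative timestamps in sorted order.
    return metric.encode("utf-8") + b"\x00" + struct.pack(">Q", timestamp)


def scan_bounds(metric: str, start_ts: int, stop_ts: int) -> Tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) row keys for one metric's time range."""
    return composite_key(metric, start_ts), composite_key(metric, stop_ts)
```

The separator byte sorts below any printable character, so a metric name that is a prefix of another ("cpu" vs "cpu.load") cannot interleave with it.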

Re: Is it necessary to set MD5 on rowkey?

2013-12-17 Thread Damien Hardy
are primarily going to access the data. You can then determine the best way to store the data to gain the best performance. For some applications... the region hot spotting isn't an important issue. Note YMMV HTH -Mike On Dec 18, 2012, at 3:33 AM, Damien Hardy dha...@viadeoteam.com wrote: Hello

Re: hbase-0.94.7 doesn't support HDFS QJM HA

2013-05-24 Thread Damien Hardy
:408) at java.lang.Thread.run(Thread.java:722) Doesn't that mean HBase can only specify a real NameNode, not a nameservice ID? If so, HBase will fail if the NameNode crashes, even with HDFS HA configured. -- Damien HARDY IT Infrastructure Architect Viadeo - 30 rue de la Victoire - 75009

Re: hbase-0.94.7 doesn't support HDFS QJM HA

2013-05-24 Thread Damien Hardy
And what about the visibility of the HDFS configuration in the HBase classpath? (The classpath should appear in the startup log of the HBase processes.) 2013/5/24 Azuryy Yu azury...@gmail.com: They are all configured as you pointed out. On May 24, 2013 6:18 PM, Damien Hardy dha...@viadeoteam.com

Re: Dual Hadoop/HBase configuration through same client

2013-04-27 Thread Damien Hardy
ideas? Thanks a million. Regards, Shahab

Re: Bulkload or hbase API

2013-03-14 Thread Damien Hardy
workflow). The most efficient approach (1 job) would be a pure homemade Java MapReduce job (map-only, one mapper per MySQL DB, bulk loading into the HTables). Cheers, -- Damien HARDY

Re: Bulkload or hbase API

2013-03-14 Thread Damien Hardy
Actually the concurrency is limited by the number of map slots available in the JobTracker (MR1): the last map tasks wait for the first ones to finish. -- Damien HARDY
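To make the slot limit concrete: with N map tasks and S slots, tasks run in ceil(N / S) successive waves, so 100 tasks on 40 slots take 3 waves. The helper names below are illustrative, and the runtime estimate assumes every task takes about the same time.

```python
def map_waves(num_tasks: int, map_slots: int) -> int:
    """How many rounds of tasks run when at most `map_slots` execute concurrently."""
    if map_slots <= 0:
        raise ValueError("need at least one map slot")
    return (num_tasks + map_slots - 1) // map_slots   # integer ceiling division


def estimated_wall_time(num_tasks: int, map_slots: int, seconds_per_task: float) -> float:
    """Rough estimate assuming every task takes the same time."""
    return map_waves(num_tasks, map_slots) * seconds_per_task
```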

Re: Talks at HBase Meetup down at Intel last Thursday evening

2013-03-05 Thread Damien Hardy
improvements in HBase. The slides have been posted on Meetup. See Andrew's listing of them on the main page: http://www.meetup.com/hbaseusergroup/events/96584102/ St.Ack

Re: distcp versus export

2013-03-05 Thread Damien Hardy
IMO the easiest would be HBase Export, for long-term offline backup (for disaster recovery). The export can even be stored on a different HDFS cluster than the one used by HBase, by using a full hdfs:// URL as the destination directory. On 5 March 2013 22:52, Leonid Fedotov lfedo...@hortonworks.com wrote:
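As a sketch of exporting to another cluster, the Python helper below only builds the command line for `hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir>`. The backup NameNode host and path in the example are hypothetical, and rejecting non-hdfs:// destinations is a choice made here to enforce the full-URL advice, not something the Export tool requires.

```python
from typing import List
from urllib.parse import urlparse


def export_command(table: str, destination: str) -> List[str]:
    """Build the argv for HBase's Export MapReduce job writing to `destination`."""
    # A full hdfs:// URL lets the backup land on a different HDFS cluster
    # than the one HBase itself runs on.
    if urlparse(destination).scheme != "hdfs":
        raise ValueError("destination should be a full hdfs:// URL")
    return ["hbase", "org.apache.hadoop.hbase.mapreduce.Export", table, destination]
```

On a host with the hbase launcher on its PATH, the resulting list could be run with subprocess.run(cmd, check=True).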

Re: MapReduce to load data in HBase

2013-02-07 Thread Damien Hardy
Hello, Why not use a Pig script for that? Make the JSON file available on HDFS, load it with http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html and store it with http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html

Re: Json+hbase

2013-02-04 Thread Damien Hardy
-- Damien HARDY

Re: Storing images in Hbase

2013-01-06 Thread Damien Hardy
Hi there, Thank you, and happy new year. I had the same problem and wrote a Python module⁰ for Thumbor¹. I use the Thrift interface for HBase to store image blobs. As already said, you have to keep image blobs quite small (for latency reasons on the web you have to keep them small anyway), ~100 KB,

Re: Fastest way to find if a row exists?

2013-01-04 Thread Damien Hardy
Hello Jean-Marc, Bloom filters are designed for exactly that. But they can only tell you that a row does not exist, based on a hash of the key (not the opposite: 2 row keys could have the same hash result). If you want to be sure the row key exists, you have to search for it in the HFile (the whole mechanism is transparent
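The asymmetry is easiest to see in a toy Bloom filter. This sketch derives bit positions from SHA-256 with double hashing, an assumption for illustration only; HBase's own Bloom filters use different hash functions and are consulted inside the region server, not by client code. A False answer means the key was never added, while True only means it might have been.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may return false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: position_i = h1 + i * h2, both taken from one SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means 'definitely absent'; True means 'possibly present'."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

A True answer still has to be confirmed by actually reading the row, which is the HFile lookup described above.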

Re: Is it necessary to set MD5 on rowkey?

2012-12-18 Thread Damien Hardy
a range of date value one time on the date by MD5. How to balance this issue? Thanks. -- Damien HARDY

Re: Access remote HBase server with different user names

2012-11-14 Thread Damien Hardy
| http://www.ids-mannheim.de Project KorAP | http://korap.ids-mannheim.de Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de Next Generation Corpus Analysis Platform

Re: Access remote HBase server with different user names

2012-11-14 Thread Damien Hardy
To be precise, my remark about HDFS HA is not even relevant: the HBase client just needs the ZooKeeper quorum. 2012/11/14 Carsten Schnober schno...@ids-mannheim.de: On 14.11.2012 17:04, Damien Hardy wrote: Hi Damien, if ZooKeeper is running on server1 so No need for SSH access: just run `hbase

Re: scan filtering column family returns wrong cell

2012-11-12 Thread Damien Hardy
out as an HFile. On Fri, Nov 9, 2012 at 8:52 AM, Damien Hardy dha...@viadeoteam.com wrote: OK, I can answer my own question... you have to add a clone of the KeyValue to the Put. So `p.add(kv);` becomes `p.add(kv.clone());`. Otherwise, I suppose only the last one is added in HBase

Re: scan filtering column family returns wrong cell

2012-11-09 Thread Damien Hardy
OK, I can answer my own question... you have to add a clone of the KeyValue to the Put. So `p.add(kv);` becomes `p.add(kv.clone());`. Otherwise, I suppose only the last one is added in HBase (but the result is quite weird and should be fixed IMO). Cheers, -- Damien 2012/11/9 Damien Hardy dha
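One plausible explanation for why clone() helps is object reuse: if the same mutable object is handed out on every iteration and the code stores a reference instead of a copy, every stored entry ends up showing the last value. The Python sketch below reproduces that aliasing pattern with a hypothetical Cell class; it is an analogy, not HBase code.

```python
class Cell:
    """Hypothetical stand-in for a reusable, mutable cell object."""

    def __init__(self) -> None:
        self.row = b""
        self.value = b""

    def set(self, row: bytes, value: bytes) -> None:
        self.row, self.value = row, value

    def clone(self) -> "Cell":
        copy = Cell()
        copy.set(self.row, self.value)
        return copy


def collect(records, copy: bool):
    reused = Cell()   # one object recycled for every record, like a reader reusing its buffer
    batch = []
    for row, value in records:
        reused.set(row, value)
        # Without a copy, every batch entry is the same object, i.e. the last record.
        batch.append(reused.clone() if copy else reused)
    return [(c.row, c.value) for c in batch]
```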

Re: HBASE vs Data Historian

2012-10-03 Thread Damien Hardy
Hello, Take a look at http://opentsdb.net/overview.html; it really looks like what you are describing. Cheers 2012/10/3 Wendy Buster stevebuster...@hotmail.com: I use a data historian (sometimes called a time series database) for collecting and persisting large numbers (billions) of rows of

Re: long garbage collecting pause

2012-10-02 Thread Damien Hardy
Hello 2012/10/2 Marcos Ortiz mlor...@uci.cu: Another thing I'm seeing is that one of your main processes is compaction, so you can optimize all this by increasing the size of your regions (by default the size of a region is 256 MB), but you will have on your hands a split/compaction storm

Re: Does hbase 0.90 client work with 0.92 server?

2012-09-28 Thread Damien Hardy
, if you find bugs while a release is in progress, it increases your chances of getting your bugs fixed... Nicolas On Thu, Sep 27, 2012 at 10:37 AM, Damien Hardy dha...@viadeoteam.com wrote: Actually, I have an old cluster in production with version 0.90.3 installed manually and I am working

Re: Does hbase 0.90 client work with 0.92 server?

2012-09-27 Thread Damien Hardy
Hello, As a corollary, what is the best way to migrate data from a 0.90 cluster to a 0.92 cluster? HBase 0.90 => client 0.90 => stdout | stdin => client 0.92 => HBase 0.92. All the data must transit through a single host running the 2 clients. It may be parallelized with multiple versions working with
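For a stdout | stdin pipe, both clients need an agreed line format. The sketch below assumes a hypothetical one: one line per cell, with row key, column and value base64-encoded so binary data and tab characters survive the pipe. It deliberately drops timestamps and extra versions, and the HBase read and write calls on each side are left out.

```python
import base64
from typing import Dict, Iterable, Iterator, Optional, Tuple

Row = Tuple[bytes, Dict[bytes, bytes]]   # (row key, {b"family:qualifier": value})


def _enc(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def dump_rows(rows: Iterable[Row]) -> Iterator[str]:
    """Yield one text line per cell: base64(row) TAB base64(column) TAB base64(value)."""
    for key, cells in rows:
        for column, value in cells.items():
            yield f"{_enc(key)}\t{_enc(column)}\t{_enc(value)}\n"


def load_rows(lines: Iterable[str]) -> Iterator[Row]:
    """Regroup dumped cells into rows; assumes the cells of a row arrive contiguously."""
    current: Optional[bytes] = None
    cells: Dict[bytes, bytes] = {}
    for line in lines:
        key, column, value = (base64.b64decode(p) for p in line.rstrip("\n").split("\t"))
        if current is not None and key != current:
            yield current, cells
            cells = {}
        current = key
        cells[column] = value
    if current is not None:
        yield current, cells
```

On the 0.90 side, something like sys.stdout.writelines(dump_rows(rows)) would feed the pipe; on the 0.92 side, load_rows(sys.stdin) yields the rows to write.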

Re: Does hbase 0.90 client work with 0.92 server?

2012-09-27 Thread Damien Hardy
more complex? A kind of real-time replication between two clusters running two different versions? On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy dha...@viadeoteam.com wrote: Hello, As a corollary, what is the best way to migrate data from a 0.90 cluster to a 0.92 cluster? HBase 0.90 => client

Re: Use of MD5 as row keys - is this safe?

2012-07-20 Thread Damien Hardy
On 20/07/2012 18:22, Jonathan Bishop wrote: Hi, I know it is commonly suggested to use an MD5 checksum to create a row key from some other identifier, such as a string or long. This is usually done to guard against hotspotting and seems to work well. My concern is that there is no guard
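A common mitigation (not stated in this excerpt) is to use only a short MD5 prefix for distribution and keep the original identifier after it: the prefix spreads writes across regions, and since the full identifier is still part of the key, two distinct identifiers can never map to the same row key. The 4-byte prefix length below is an arbitrary assumption, and the trade-off is losing cheap range scans over the identifiers.

```python
import hashlib


def salted_key(identifier: bytes, prefix_len: int = 4) -> bytes:
    """Prefix the original id with the first `prefix_len` bytes of its MD5 digest."""
    # Equal keys would require equal fixed-length prefixes AND equal ids,
    # so distinct ids always yield distinct row keys, even if prefixes collide.
    return hashlib.md5(identifier).digest()[:prefix_len] + identifier


def original_id(row_key: bytes, prefix_len: int = 4) -> bytes:
    """Recover the id from a row key built by salted_key."""
    return row_key[prefix_len:]
```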

Re: HBase first steps: Design a table

2012-06-12 Thread Damien HARDY
Hi Jean-Marc, I reply inline. On 12/06/2012 23:42, Jean-Marc Spaggiari wrote: Hi, I have read all the documentation here http://hbase.apache.org/book/book.html and I now have a few questions. I currently have a MySQL table with millions of lines (4 for now, but it's growing by 4

HBase REST interface + JSON?

2012-05-18 Thread Damien HARDY
is: how do I create a Scanner in JSON? And how do I specify a filter? (It is only documented as a string in the schema.) Thank you, Cheers, -- Damien Hardy

Hbase CopyTable timeout on scanner

2012-05-07 Thread Damien HARDY
Hello, I am trying to copy a table from one cluster to another. The source is a 2-node cluster, 16 CPUs / 32 GB RAM (hadoop001, hadoop002). The destination is a 3-node cluster, 16 CPUs / 64 GB RAM (hbase01, hbase02, hbase04). All nodes run DataNode, RegionServer, master server and ZooKeeper from CDH3u3. Region

Re: Hbase CopyTable timeout on scanner

2012-05-07 Thread Damien HARDY
for is that if you are starting with 421 regions on the source but the dest table isn't pre-split then it's going to try to slam all the data into one region and then have to split (and split and split, etc.). http://hbase.apache.org/book.html#perf.writing On 5/7/12 8:22 AM, Damien HARDY dha
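Pre-splitting needs a list of split keys. Assuming row keys that start with uniformly distributed hex digits (for example a hex-encoded hash prefix), evenly spaced boundaries can be computed as below; the resulting keys could then be passed as the split keys when creating the destination table, for instance through the Java admin API's createTable(descriptor, splitKeys). The function name and the 8-digit width are illustrative assumptions.

```python
from typing import List


def hex_split_keys(num_regions: int, width: int = 8) -> List[bytes]:
    """Evenly spaced split points for row keys that begin with `width` hex digits.

    Returns num_regions - 1 boundaries, so the table starts with num_regions
    regions instead of a single one.
    """
    if num_regions < 2:
        return []
    space = 16 ** width
    return [format(i * space // num_regions, f"0{width}x").encode("ascii")
            for i in range(1, num_regions)]
```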

Re: HBase Rowcounter not working..

2012-05-07 Thread Damien HARDY
Hello, If you have the default /etc/zookeeper/zoo.cfg, try renaming or removing it: it takes precedence over the ZooKeeper quorum configuration in HBase's hbase-site.xml. Cheers, -- Damien On 07/05/2012 17:17, Subir S wrote: Hello, Version: 0.90.4-CDH3U3, HBase-managed ZK. I tried to run a simple

Re: Solr+Hbase

2012-03-08 Thread Damien HARDY
On 08/03/2012 09:18, Mohammad Tariq wrote: Hello list, We are planning to index our data stored in HBase using Solr. As we are totally new to Solr, we would like some comments from someone who is already doing it. While looking around the internet we came across Lily. Is there any

How to implement tests for python based application using Hbase-thrift interface

2012-01-30 Thread Damien Hardy
Hello, I wrote some code in Python using HBase as image storage. I want my code to be tested independently of any full external HBase architecture, so my question is: is there a how-to on instantiating a temporary local minicluster + Thrift interface in order to run Python (or maybe
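Independently of a minicluster, the application code can be unit-tested by depending on a narrow storage interface and substituting an in-memory fake in tests. The sketch below is hypothetical: BlobStore, InMemoryStore and ImageRepository are illustrative names, and a production adapter wrapping the HBase Thrift client (not shown) would still need integration tests against a real or mini cluster.

```python
from typing import Dict, Optional, Protocol


class BlobStore(Protocol):
    """The two operations the application needs; a real adapter would wrap the Thrift client."""

    def put(self, key: bytes, data: bytes) -> None: ...

    def get(self, key: bytes) -> Optional[bytes]: ...


class InMemoryStore:
    """Test double: a dict standing in for an HBase table."""

    def __init__(self) -> None:
        self.rows: Dict[bytes, bytes] = {}

    def put(self, key: bytes, data: bytes) -> None:
        self.rows[key] = data

    def get(self, key: bytes) -> Optional[bytes]:
        return self.rows.get(key)


class ImageRepository:
    """Application code depends only on BlobStore, not on Thrift or a live cluster."""

    def __init__(self, store: BlobStore) -> None:
        self.store = store

    def save(self, path: str, data: bytes) -> None:
        self.store.put(path.encode("utf-8"), data)

    def load(self, path: str) -> Optional[bytes]:
        return self.store.get(path.encode("utf-8"))
```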

Re: the occasion of the major compact?

2012-01-26 Thread Damien Hardy
On 26/01/2012 14:43, yonghu wrote: Hello, I read this blog post http://outerthought.org/blog/465-ot.html. It mentions that a major compaction will occur every 24 hours. My question is whether there are any other conditions that can trigger a major compaction? For example, when the

Using TTL to purge data automatically?

2011-09-23 Thread Damien Hardy
Hello, Yesterday I created an HTable with 2 CFs, specifying a TTL of 5 and 10 min respectively. I inserted 2 values (one in each column) and hoped that my values would disappear after a certain amount of time. This never happened... This morning I still hoped that the major_compaction once a day

Re: Using TTL to purge data automatically?

2011-09-23 Thread Damien HARDY
=> '60' TTL => '30' It's just three orders of magnitude different from what you thought you set the TTL to :) J-D On Fri, Sep 23, 2011 at 2:22 AM, Damien Hardy dha...@figarocms.fr wrote: Hello, Yesterday I created an HTable with 2 CFs, specifying a TTL of 5 and 10 min
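In HBase the column family TTL is expressed in seconds, while cell timestamps are milliseconds since the epoch, so a TTL entered as if it were in milliseconds keeps cells 1000 times longer than intended, which appears consistent with J-D's "three orders of magnitude" remark. The sketch below shows only that unit arithmetic (the boundary comparison is a simplification); actual expiry is enforced by HBase on reads and during compactions.

```python
def is_expired(cell_ts_ms: int, ttl_seconds: int, now_ms: int) -> bool:
    """True once a cell is older than the column family TTL.

    Cell timestamps are in milliseconds, the TTL attribute in seconds,
    so the TTL is scaled by 1000 before comparing.
    """
    return now_ms - cell_ts_ms > ttl_seconds * 1000
```

For instance, a 5-minute TTL should be written as 300; writing 300000 keeps cells for about 3.5 days.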