Re: Embedded table data model

2012-07-12 Thread Ian Varley
Yes, that's fine; you can always do a single column PUT into an existing row, in a concurrency-safe way, and the lock on the row is only held as long as it takes to do that. Because of HBase's Log-Structured Merge-Tree architecture, that's efficient because the PUT only goes to memory, and is

Reporting tool for Hbase

2012-07-12 Thread Amlan Roy
Hi, I am looking for a reporting tool that can use Hbase data as input. Any recommendation? I am using Pentaho PDI because it can use Hbase data as input. But I am getting a strange error. My cluster is running, I can access data from my client program. But Pentaho is giving the following

Re: Reporting tool for Hbase

2012-07-12 Thread Sonal Goyal
Hi Amlan, Which versions are you running on? Do you see any errors in the hbase logs? For reporting over Hbase, you can also take a look at Crux at http://github.com/sonalgoyal/crux Best Regards, Sonal [1] Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies

RE: Reporting tool for Hbase

2012-07-12 Thread Amlan Roy
Hi Sonal, I am using hbase-0.92.0 with hadoop-1.0.0. Pentaho was using hbase-0.90.3 with hadoop-0.20.2. I replaced those jars with the jars I am using and restarted Pentaho. The issue was not resolved. I searched for the logs but did not find any in Pentaho. I will take a look at Crux. Thanks a

Re: Reporting tool for Hbase

2012-07-12 Thread xkwang bruce
Hi Amlan, It may be that your Pentaho cannot connect to the cluster, so you should check your config file carefully. Just my suggestion. I haven't used Pentaho or related tools. bruce, 2012/7/12 Amlan Roy amlan@cleartrip.com Hi, I am looking for a reporting tool that can use Hbase data as

Re: Blocking Inserts

2012-07-12 Thread Martin Alig
Thank you for the comment. Compaction queue seems to be at 0 (?) all the time. About the blocking store file: I already increased this value, but I could not see any improvements. Going through the logs during a blocking period, I often see a CompactionRequest. Then, for 1 minute or so nothing,

Re: Why Hadoop can't find Reducer when Mapper reads data from HBase?

2012-07-12 Thread Stack
On Thu, Jul 12, 2012 at 1:15 PM, yonghu yongyong...@gmail.com wrote: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mapreducetablescan.MRTableAccess$MTableReducer; Does anybody know why? It's not in your job jar? Check the job jar (jar -tf JAR_FILE). St.Ack
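The check suggested above (`jar -tf JAR_FILE`) can also be scripted. The following is a minimal Python sketch, relying only on the fact that a jar is a zip archive. The helper names `class_entry`, `jar_contains_class`, and `build_demo_jar` are hypothetical and not part of Hadoop or HBase, and the demo jar is built in memory purely for illustration.

```python
import io
import zipfile


def class_entry(class_name):
    """Map a fully qualified class name to its path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jar_contains_class(jar, class_name):
    """Return True if the jar (a path or file-like object) holds the class file."""
    with zipfile.ZipFile(jar) as zf:
        return class_entry(class_name) in zf.namelist()


def build_demo_jar(class_names):
    """Build a tiny in-memory jar with empty class files (demonstration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in class_names:
            zf.writestr(class_entry(name), b"")
    buf.seek(0)
    return buf


# Hypothetical job jar that packages the outer class but not the nested reducer.
demo_jar = build_demo_jar(["com.mapreducetablescan.MRTableAccess"])
found = jar_contains_class(
    demo_jar, "com.mapreducetablescan.MRTableAccess$MTableReducer"
)  # False: the nested reducer class was never packaged
```

Nested classes compile to their own `Outer$Inner.class` files, so a jar can contain the outer class and still be missing the reducer.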

HDFS + HBASE process high cpu usage

2012-07-12 Thread Asaf Mesika
Hi, I have a cluster of 3 DN/RS and another computer hosting NN/Master. For some reason, two of the DataNode nodes are showing a high load average (~17). When using top I can see that the HDFS and HBASE processes are the ones using most of the CPU (95% in top). When inspecting both HDFS and HBASE

Re: HDFS + HBASE process high cpu usage

2012-07-12 Thread Asaf Mesika
Just adding more information. The following is a histogram output of 'strace -p hdfs-pid -f -C' which ran for 10 seconds. For some reason futex takes 65% of the time. % time seconds usecs/call calls errors syscall

Re: hbase multi-user security

2012-07-12 Thread Devaraj Das
On Jul 11, 2012, at 10:41 AM, Tony Dean wrote: Hi, Looking into hbase security, it appears that when HBaseRPC is creating a proxy (e.g., SecureRpcEngine), it injects the current user: User.getCurrent() which by default is the cached Kerberos TGT (kinit'ed user - using the

RE: hbase multi-user security

2012-07-12 Thread Tony Dean
Thanks Andy for the reply. I understand your normal use case... If we are hosting we could create separate Web apps per client so that authentication occurs for each client back to the same hbase/hadoop cluster... therefore, each client would see only the data that they are supposed to see.

DataNode Hardware

2012-07-12 Thread Bartosz M. Frak
Quick question about data node hardware. I've read a few articles, which cover the basics, including Cloudera's recommendations here: http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/ The article is from early 2010, but I'm assuming that

Re: DataNode Hardware

2012-07-12 Thread Amandeep Khurana
Inline. On Thursday, July 12, 2012 at 12:56 PM, Bartosz M. Frak wrote: Quick question about data node hardware. I've read a few articles, which cover the basics, including Cloudera's recommendations here:

RE: hbase multi-user security

2012-07-12 Thread Tony Dean
Devaraj, I do see hbase secure impersonation being a nice feature for a multi-user environment. You authenticate one user and perform actions based on other identities. -Tony -Original Message- From: Tony Dean Sent: Thursday, July 12, 2012 3:45 PM To: user@hbase.apache.org Subject:

Re: hbase multi-user security

2012-07-12 Thread Andrew Purtell
On Thu, Jul 12, 2012 at 12:44 PM, Tony Dean tony.d...@sas.com wrote: I'm wondering how that proxy user can be injected into the RPC connection when making requests. Right, hence the suggestion to be able to set User per thread, at least, via a thread local, so you can set at will and RPC will

Re: DataNode Hardware

2012-07-12 Thread Bartosz M. Frak
Amandeep Khurana wrote: Inline. On Thursday, July 12, 2012 at 12:56 PM, Bartosz M. Frak wrote: Quick question about data node hardware. I've read a few articles, which cover the basics, including Cloudera's recommendations here:

Re: DataNode Hardware

2012-07-12 Thread Amandeep Khurana
The issue with having lower cores per box is that you are collocating datanode, region servers, task trackers and then the MR tasks themselves too. Plus you need a core for the OS too. These are things that need to run on a single node, so you need a minimum amount of resources that can handle

Re: hbase multi-user security

2012-07-12 Thread Devaraj Das
Wouldn't this work: User user = User.create(UserGroupInformation.createProxyUser(userToImpersonate, UserGroupInformation.getLoginUser())) //Run the regionserver operation within a runAs (authentication will happen using the credentials of the loginuser) user.runAs(...) At the RPC layer, the

Implement a shell in the Master UI

2012-07-12 Thread Claudiu Olteanu
Hello! My name is Claudiu Olteanu and I want to implement a shell in the Master UI. The problem is that I don't know how to capture the output of IRB's commands. I tried to create a new Ruby class which runs the commands and saves the stdout, but it can't call any of IRB's methods. I've

Re: hbase multi-user security

2012-07-12 Thread Devaraj Das
In secure mode, the server will expect to see [rpc-user == authenticating-user]. So (without code digging, IIRC) the idea of using a random rpc-user might not work. The proxy user (my earlier mail) stuff attempts to address this problem. Please correct me if I am missing/overlooking

hbase secure channel

2012-07-12 Thread Tony Dean
Hi, Once authentication has been accomplished the application data begins to flow between client and server. How can one assure that the data is private? I see an hbase property to turn on privacy: hbase.rpc.protection=privacy. Is this basically SSL, but instead of using certificates, it's

Re: hbase secure channel

2012-07-12 Thread Andrew Purtell
On Thu, Jul 12, 2012 at 2:20 PM, Tony Dean tony.d...@sas.com wrote: Hi, Once authentication has been accomplished the application data begins to flow between client and server. How can one assure that the data is private? I see an hbase property to turn on privacy:

Re: DataNode Hardware

2012-07-12 Thread Michael Segel
Uhm... I'd take a step back... Thanks for the reply. I didn't realize that all the non-MR tasks were this CPU bound; plus my naive assumption was that four spindles would have a hard time supplying data to MR fast enough for it to become bogged down. Your gut feel is correct. If you go w

Re: HDFS + HBASE process high cpu usage

2012-07-12 Thread Esteban Gutierrez
Hi Asaf, By any chance, has this issue been going on in your boxes for the last few days? I wouldn't be surprised by so many calls to futex by the JVM itself, but since you are describing the same symptoms as the leap second issue it would be good to know what OS you are using, if NTP is/was running

Re: HDFS + HBASE process high cpu usage

2012-07-12 Thread deanforwever2010
Maybe there is some slow query. I met the same problem: I found that when I queried 100 thousand columns of a row, HBase did not respond and stopped working. 2012/7/13 Esteban Gutierrez este...@cloudera.com Hi Asaf, By any chance, has this issue been going on in your boxes for the last few

Re: Embedded table data model

2012-07-12 Thread Cole
I think this design is questionable; please refer to http://hbase.apache.org/book/number.of.cfs.html 2012/7/12 Ian Varley ivar...@salesforce.com Yes, that's fine; you can always do a single column PUT into an existing row, in a concurrency-safe way, and the lock on the row is only held as long

Re: Embedded table data model

2012-07-12 Thread Ian Varley
Column families are not the same thing as columns. You should indeed have a small number of column families, as that article points out. Columns (aka column qualifiers) are run-time defined key/value pairs that contain the data for every row, and having large numbers of these is fine. On

Re: Embedded table data model

2012-07-12 Thread Xiaobo Gu
Hi Ian, Do you mean each transaction will be created as a column inside the cf for transactions, and these columns are created dynamically as transactions occur? Regards, Xiaobo Gu On Fri, Jul 13, 2012 at 11:08 AM, Ian Varley ivar...@salesforce.com wrote: Column families are not the same

Re: Embedded table data model

2012-07-12 Thread Ian Varley
Yes, that's what I mean. It is not the only way to model this, but your question was, "Can we embed the transactions inside the customer table in HBase?" On Jul 12, 2012, at 8:21 PM, Xiaobo Gu guxiaobo1...@gmail.com wrote: Hi Ian, Do you mean each transaction
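The model discussed in this thread (a small, fixed set of column families, with column qualifiers created dynamically as transactions occur) can be pictured with a small in-memory sketch. This is plain Python, not the HBase client API; the family names `info` and `tx`, the row key, the timestamp-style qualifiers, and the `put` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# table[row_key][column_family][qualifier] = value
# Column families are fixed up front; qualifiers are created at write time.
COLUMN_FAMILIES = ("info", "tx")  # hypothetical family names


def new_table():
    """Create an empty table whose rows start with every family present."""
    return defaultdict(lambda: {cf: {} for cf in COLUMN_FAMILIES})


def put(table, row_key, family, qualifier, value):
    """Write one column into a row, creating the qualifier if it is new."""
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    table[row_key][family][qualifier] = value


table = new_table()
put(table, "customer-42", "info", "name", "Alice")
# Each transaction becomes one new column in the customer's row.
put(table, "customer-42", "tx", "2012-07-12T10:15:00", "order=17;amount=9.99")
put(table, "customer-42", "tx", "2012-07-12T11:02:30", "order=18;amount=4.50")
```

The row gains one qualifier per transaction while the set of families never changes, which mirrors the distinction drawn in the thread between column families and columns.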

Re: Why Hadoop can't find Reducer when Mapper reads data from HBase?

2012-07-12 Thread yonghu
The strange thing is that the same program works fine in the cluster. By the way, in pseudo mode the same error also happened when MapReduce read data from Cassandra in the Map phase and transferred it to the Reduce phase. regards! Yong On Thu, Jul 12, 2012 at 2:01 PM, Stack st...@duboce.net wrote: On Thu, Jul