MapReduce problem with SecureBase in HBase

2013-04-14 Thread Farrokh Shahriari
Hi, I've downloaded SecureBase.jar from GitHub and have used it for my HBase tables. I have no problem in the HBase shell and can put and scan the table correctly. But when I want to use MapReduce to scan a table and put some values into another table, I have problems. My map phase is correct, but in

Re: Best Hadoop dev environment [WAS: RE: Few noob MR questions]

2013-04-14 Thread Jens Scheidtmann
Dear Vjeran, 2013/4/14 Vjeran Marcinko vjeran.marci...@email.t-com.hr Hi again, You actually touched on what I'm trying to do here: set up the best Hadoop development environment. [...] So are there any more hints for me to set up this environment? In Eclipse you can use

Re: Copy Vs DistCP

2013-04-14 Thread Mathias Herberts
That was a hidden shameless plug Ted ;-) The main disadvantage of fs -cp is that all data has to transit via the machine you issue the command on; depending on the size of the data you want to copy, that can be a killer. DistCp is distributed, as its name implies, so there is no bottleneck of this kind. On

Test methods IR task on real-time content

2013-04-14 Thread Joachim Van den Bogaert
Hi all, I was wondering whether anyone has ever used information retrieval metrics on real-time big data with variable amounts of data. The main idea would be to test whether you can find relevant information for a given time frame for two data repositories: one baseline repository and one

Re: Best Hadoop dev environment [WAS: RE: Few noob MR questions]

2013-04-14 Thread Michel Segel
I tend to use a real cluster so that I can test at a reasonable fraction of scale. I've seen some instances where code that ran 'okay' in a VM failed to perform adequately at scale. Mike Segel On Apr 14, 2013, at 2:19 AM, Jens Scheidtmann

Re: Copy Vs DistCP

2013-04-14 Thread Ted Dunning
Inline On Sun, Apr 14, 2013 at 1:13 AM, Mathias Herberts mathias.herbe...@gmail.com wrote: That was a hidden shameless plug Ted ;-) Well, I will admit it was a shameless correction to Lance's absolute and incorrect claim. The main disadvantage of fs -cp is that all data has to transit

Re: Copy Vs DistCP

2013-04-14 Thread Mathias Herberts
This is absolutely true. Distcp dominates cp for large copies. On the other hand cp dominates distcp for convenience. In my own experience, I love cp when copying relatively small amounts of data (10's of GB) where the available bandwidth of about a GB/s allows the copy to complete in less

Re: Copy Vs DistCP

2013-04-14 Thread Ted Dunning
On Sun, Apr 14, 2013 at 10:33 AM, Mathias Herberts mathias.herbe...@gmail.com wrote: This is absolutely true. Distcp dominates cp for large copies. On the other hand cp dominates distcp for convenience. In my own experience, I love cp when copying relatively small amounts of data

Re: A question of QJM with HDFS federation

2013-04-14 Thread Azuryy Yu
Hi Harsh, If there are two separate clusters, not federated, that have the same cluster ID but use different nameservice IDs, can they share the same JournalNodes and ZK nodes? And if they are also separate clusters, with different cluster IDs but the same nameservice ID, can they