Re: How to keep data consistency?

2014-02-19 Thread Sergey Murylev
Hi Edward, You can't achieve data consistency with your cluster configuration. To do this you need at least 3 data nodes and replication enabled with a factor of 3 (the dfs.replication property in hdfs-site.xml). On 19/02/14 13:02, EdwardKing wrote: > Hadoop 2.2.0, two computer, one is master,another is nod

Re: UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE)

2014-04-12 Thread Sergey Murylev
Hi, > 1 - I don't understand why this is happening. I should have been able > to copy data. Why I can't copy data between hdfs? I think you should check permissions not only for the root directory ("/"); you need to make sure that the whole path is accessible to root. You can do this using the following command: > /r

Re: HDFS Installation

2014-04-12 Thread Sergey Murylev
Hi Ekta, You can look at the following instructions: single node cluster, multi node cluster. Actually I recommend to us

Re: Sqoop import/export tool fails with PriviledgedActionException

2014-04-24 Thread Sergey Murylev
Hi Kuchekar, > I do have the mentioned jar (avro-mapred-1.5.3.jar) in the mentioned > location. Not sure, what I am missing. Make sure that you can read this file as the same user you use to run sqoop. According to the logs you run sqoop as root. I'm not sure that root has such privileges. You can try to

Re: CDH4 administration through one account

2014-04-28 Thread Sergey Murylev
Hi Raj, > Should 'john1' be included in the 'sudoers' file ? Hadoop doesn't use root privileges, but it has some built-in users and groups like hdfs, mapred, etc. I think you should add your admin user at least to the hdfs and mapred groups. -- Thanks, Sergey On 28/04/14 23:40, Raj Hadoop wrote: > Hi,

Re: Are mapper classes re-instantiated for each record?

2014-05-05 Thread Sergey Murylev
Hi Jeremy, According to the official documentation, setup and cleanup calls are performed for each InputSplit. In this case your variant 2 is more correct. But actually a single mapper can be used for processing multiple InputS

Re: High performance Count Distinct - NO Error

2014-08-06 Thread Sergey Murylev
Why do you think that the default implementation of COUNT DISTINCT is slow? As far as I understand, the best-known way to find the number of distinct elements is to sort them and scan all sorted items consecutively, excluding duplicated elements. The asymptotic complexity of this algorithm is O(n log n). I think that th
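
A minimal sketch of the sort-and-scan approach described in this message, written in Python. The function name count_distinct_sorted is hypothetical, and this is only an illustration of the idea, not the actual COUNT DISTINCT implementation of any query engine:

```python
def count_distinct_sorted(items):
    """Count distinct values by sorting, then comparing adjacent items."""
    ordered = sorted(items)  # comparison sort: O(n log n)
    if not ordered:
        return 0
    distinct = 1  # the first sorted item is always a new value
    # After sorting, equal values are adjacent, so one linear pass is enough.
    for previous, current in zip(ordered, ordered[1:]):
        if current != previous:
            distinct += 1
    return distinct


if __name__ == "__main__":
    print(count_distinct_sorted([3, 1, 2, 3, 1]))
```

The sort dominates the cost; the scan itself is a single O(n) pass over the sorted items.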

Re: Pseudo -distributed mode

2014-08-12 Thread Sergey Murylev
Yes :) Pseudo-distributed mode is a configuration in which a Hadoop environment runs on a single computer. On 12/08/14 18:25, sindhu hosamane wrote: > Can Setting up 2 datanodes on same machine be considered as > pseudo-distributed mode hadoop ? > > Thanks, > Sindhu

Re: Pseudo -distributed mode

2014-08-13 Thread Sergey Murylev
le Java process". > > But if my hadoop is pseudo distributed mode, why does it still runs > as a single Java process and utilizes only 1 cpu core even if there > are many more? > > > On Tue, Aug 12, 2014 at 4:32 PM, Sergey Murylev > mailto:sergeymury...@gmail.com>