Re: Different input format based on files names in driver code

2017-06-23 Thread Ankit Singhal
You can create a list of files for each type and use MultipleInputs[1]. https://hadoop.apache.org/docs/r2.6.3/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html On Thu, Jun 22, 2017 at 10:30 PM, vivek wrote: > Thanks! > > > On Jun 22, 2017 20:15, "Erik
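A minimal driver sketch of the MultipleInputs approach, assuming hypothetical paths and placeholder TextSideMapper, SeqSideMapper and JoinReducer classes (one mapper per input format, normalizing to a common key/value pair):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MixedInputDriver {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "mixed-input");
        job.setJarByClass(MixedInputDriver.class);

        // One group of paths per format, each with its own InputFormat and Mapper.
        MultipleInputs.addInputPath(job, new Path("/data/text"),
            TextInputFormat.class, TextSideMapper.class);        // placeholder mapper
        MultipleInputs.addInputPath(job, new Path("/data/seq"),
            SequenceFileInputFormat.class, SeqSideMapper.class);  // placeholder mapper

        job.setReducerClass(JoinReducer.class);                   // placeholder reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/data/out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }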

Re: Different input format based on files names in driver code

2017-06-22 Thread vivek
Thanks! On Jun 22, 2017 20:15, "Erik Krogen" wrote: > You would need to write a custom InputFormat which would return an > appropriate RecordReader based on the file format involved in each > InputSplit. You can have InputFormat#getSplits load InputSplits for both > file
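A rough sketch of the custom InputFormat Erik describes, picking a RecordReader per split from the file name; it assumes .seq files are SequenceFiles whose records are already LongWritable/Text pairs and everything else is plain text:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader;

    public class ByExtensionInputFormat extends FileInputFormat<LongWritable, Text> {
      @Override
      public RecordReader<LongWritable, Text> createRecordReader(
          InputSplit split, TaskAttemptContext context)
          throws IOException, InterruptedException {
        // Each split belongs to exactly one file, so the file name decides the reader.
        String name = ((FileSplit) split).getPath().getName();
        if (name.endsWith(".seq")) {
          return new SequenceFileRecordReader<LongWritable, Text>();
        }
        return new LineRecordReader();
      }
    }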

Re: PolyBase Hadoop/SQL Server Integration results/experience? Does it scale horizontally?

2016-11-02 Thread Russell Jurney
I've described our use case here on Microsoft's forums for SQL Server. I'm hoping someone out there has used this technology:

Re: Can't run hadoop examples with YARN Single node cluster

2016-03-07 Thread Hitesh Shah
+common-user On Mar 7, 2016, at 3:42 PM, Hitesh Shah wrote: > On Mar 7, 2016, at 1:50 PM, José Luis Larroque wrote: >> Hi again guys, i could, finally, find what the issue was!!! >> mapreduce.map.java.opts >> 256
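The quoted XML lost its tags in the archive, but the usual distinction between the two map-memory settings is that mapreduce.map.memory.mb takes a bare number of megabytes while mapreduce.map.java.opts takes JVM flags. A small illustrative sketch (the values are only examples, normally set in mapred-site.xml):

    import org.apache.hadoop.conf.Configuration;

    public class MapMemoryConfig {
      public static Configuration configure() {
        Configuration conf = new Configuration();
        conf.set("mapreduce.map.memory.mb", "512");       // container size: a bare MB number
        conf.set("mapreduce.map.java.opts", "-Xmx256m");  // JVM options: flags, not a bare number
        return conf;
      }
    }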

Re: Hadoop namenode high availability

2014-10-30 Thread Ivan Kelly
I'm not sure what they are trying to say with persistent session. A session in zookeeper has a timeout associated with it. If the server doesn't hear from the client within the timeout period the session is expired, and all ephemeral nodes associated with the session are deleted. This is what
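A tiny ZooKeeper sketch of the session/ephemeral behaviour described above; the connect string and znode path are made up:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class EphemeralDemo {
      public static void main(String[] args) throws Exception {
        // 5-second session timeout: if ZooKeeper stops hearing from this client for
        // longer than that, the session expires and the ephemeral znode below vanishes.
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 5000, new Watcher() {
          public void process(WatchedEvent event) {
            System.out.println("event: " + event);
          }
        });
        zk.create("/demo-ephemeral", new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        Thread.sleep(60000);  // the znode exists only while this session is alive
        zk.close();
      }
    }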

Re: IP based hadoop cluster

2014-09-12 Thread sodul
Replying to my own thread here. While we got a good handle on the IP based hadoop cluster by using the settings mentioned above, we are now upgrading to the Cloudera 5.1.0 packages and Yarn. So far almost everything seemed to work well, except that for some reason Yarn insists on making use of DNS,

Re: where are the old hadoop documentations for v0.22.0 and below ?

2014-07-30 Thread Jane Wayne
harsh, those are just javadocs. i'm talking about the full documentation (see original post). On Tue, Jul 29, 2014 at 2:17 PM, Harsh J ha...@cloudera.com wrote: Precompiled docs are available in the archived tarballs of these releases, which you can find on:

Re: where are the old hadoop documentations for v0.22.0 and below ?

2014-07-30 Thread Harsh J
Jane, The tarball includes generated release documentation pages as well. Did you download and look inside? ~ tar tf hadoop-0.22.0.tar.gz | grep cluster_setup | grep html hadoop-0.22.0/common/docs/cluster_setup.html On Wed, Jul 30, 2014 at 11:24 PM, Jane Wayne jane.wayne2...@gmail.com wrote:

Re: where are the old hadoop documentations for v0.22.0 and below ?

2014-07-29 Thread Harsh J
Precompiled docs are available in the archived tarballs of these releases, which you can find on: https://archive.apache.org/dist/hadoop/common/ On Tue, Jul 29, 2014 at 1:36 AM, Jane Wayne jane.wayne2...@gmail.com wrote: where can i get the old hadoop documentation (e.g. cluster setup, xml

Re: where are the old hadoop documentations for v0.22.0 and below ?

2014-07-28 Thread Konstantin Boudnik
I think your best bet might be to check out the particular release tag for the 0.22 release and look at the docs there. Perhaps you might want to run 'ant docs', or whatever the target used to be back then. Cos On Mon, Jul 28, 2014 at 04:06PM, Jane Wayne wrote: where can i get the old hadoop

Re: slaves datanodes are not starting, hadoop v2.4.1

2014-07-27 Thread Jane Wayne
never mind, i resolved it. the problem was unclear/misleading instructions on the hadoop site. this is NOT the way to start slave datanode daemons (NOTICE THE SINGULAR DAEMON): $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode this is

Re: HDFS mounting issue using Hadoop-Fuse on Fully Distributed Cluster?

2014-05-03 Thread Harsh J
Can you check your dmesg | tail output to see if there are any error messages from the HDFS fuse client? On Sat, May 3, 2014 at 11:44 PM, Preetham Kukillaya pkuki...@gmail.com wrote: Hi, I'm also getting the same error i.e. ?- ? ? ? ?? hdfs after mounting the hadoop file

Re: Data Locality Importance

2014-03-22 Thread Vinod Kumar Vavilapalli
Like you said, it depends both on the kind of network you have and the type of your workload. Given your point about S3, I'd guess your input files/blocks are not large enough that moving code to data trumps moving data itself to the code. When that balance tilts a lot, especially when moving

Re: Data Locality Importance

2014-03-22 Thread Chen He
AM To: common-user@hadoop.apache.org Subject: Re: Data Locality Importance Like you said, it depends both on the kind of network you have and the type of your workload. Given your point about S3, I'd guess your input files/blocks are not large enough that moving code to data trumps moving

Re: Error when connecting Hive

2014-03-07 Thread Nitin Pawar
somehow looks like hive is not able to find hadoop libs On Fri, Mar 7, 2014 at 11:48 PM, Manish manishbh...@rocketmail.com wrote: Please look into the below issue help. Original Message Subject:Error when connecting Hive Date: Fri, 07 Mar 2014 20:51:25 +0530

Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0

2014-03-06 Thread Vinod Kumar Vavilapalli
Yes. JobTracker and TaskTracker are gone from all the 2.x release lines. MapReduce is an application on top of YARN. It exists per job - it launches, runs, and finishes when its work is done. Once it is done, you can go look at it in the MapReduce-specific JobHistoryServer. +Vinod On

Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0

2014-03-06 Thread Jane Wayne
when i go to the job history server http://hadoop-cluster:19888/jobhistory i see no map reduce job there. i ran 3 simple mr jobs successfully. i verified by the console output and hdfs output directory. all i see on the UI is: No data available in table. any ideas? unless there is a

Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0

2014-03-06 Thread Jane Wayne
ok, the reason why hadoop jobs were not showing up was because i did not enable mapreduce to be run as a yarn application. On Thu, Mar 6, 2014 at 11:45 PM, Jane Wayne jane.wayne2...@gmail.comwrote: when i go to the job history server http://hadoop-cluster:19888/jobhistory i see no map

Re: du reserved can ran into problems with reserved disk capacity by tune2fs

2014-02-11 Thread Alexander Fahlke
Hi! At least I'm not alone with this issue. I'd like to create this ticket, as I incidentally ran into this again today with a few nodes. :( On which hadoop version did you run into this issue? I guess it's not version related. It will strengthen the ticket if it does not only affect the old hadoop

Re: Cannot connect to Ambari admin screen from Hortownworks sandbox

2014-01-04 Thread Yusaku Sako
Hi Gary, It looks like port 8080 is already taken on your machine by XDB. You should shut XDB down to free up port 8080 and re-launch the Sandbox VM. Then you should be able to log in to Ambari using ambari/ambari. Yusaku On Sat, Jan 4, 2014 at 3:19 PM, Ted Yu yuzhih...@gmail.com wrote

Re: Can NMs communicate among themselves?

2013-12-09 Thread santosh kumar
Hi Sandy, Thank you so much for the immediate response. Is there a way to make it happen? Any suggestions will be greatly appreciated. Also, can you tell me how any communication happens in the cluster, be it between RM and nodes or any scenario? Thanks, Santosh PhD Candidate USF, Tampa, FL

Re: Write a file to local disks on all nodes of a YARN cluster.

2013-12-08 Thread Adam Kawa
I believe that you could do that through Puppet, or any tool that can remotely execute some command (e.g. pssh). 2013/12/8 Jay Vyas jayunit...@gmail.com I want to put a file on all nodes of my cluster, that is locally readable (not in HDFS). Assuming that i can't guarantee a FUSE mount or

Re: du reserved can ran into problems with reserved disk capacity by tune2fs

2013-12-01 Thread Adam Kawa
We ran into this issue as well on our cluster. +1 for a JIRA for that. Alexander, could you please create a JIRA at https://issues.apache.org/jira/browse/HDFS for this (it is your observation, so you should get the credit ;). Otherwise, I can do that. 2013/2/12 Alexander Fahlke

Re: Hadoop Test libraries: Where did they go ?

2013-11-25 Thread Jay Vyas
Yup , we figured it out eventually. The artifacts now use the test-jar directive which creates a jar file that you can reference in mvn using the type tag in your dependencies. However, fyi, I haven't been able to successfully google for the quintessential classes in the hadoop test libs like

Re: Hadoop automated tests

2013-10-16 Thread Konstantin Boudnik
[Cc bigtop-dev@] We have stack tests as a part of the Bigtop project. We don't do fault injection tests like you describe just yet, but that would be a great contribution to the project. Cos On Wed, Oct 16, 2013 at 02:12PM, hdev ml wrote: Hi all, Are there automated tests available for testing

Re: datanode tuning

2013-10-07 Thread Rita
Thanks Ravi. The number of nodes isn't a lot, but the size is rather large. Each datanode has about 14-16T (560-640T in total). For the datanode block scanner, how can I increase its current scan rate limit (KBps)? On Sun, Oct 6, 2013 at 11:09 PM, Ravi Prakash ravi...@ymail.com wrote: Please look at

Re: datanode tuning

2013-10-07 Thread Rita
From: Rita rmorgan...@gmail.com To: common-user@hadoop.apache.org common-user@hadoop.apache.org; Ravi Prakash ravi...@ymail.com Sent: Monday, October 7, 2013 5:55 AM Subject: Re: datanode tuning Thanks Ravi. The number of nodes isn't a lot but the size is rather large

Re: IP based hadoop cluster

2013-10-06 Thread sodul
I got hbase working. The trick was to properly configure the fs.defaultFS and hbase.rootdir. Other than that hbase does not seem to care about hostname vs ip address. Note that I use python to fill my templates, hence the %(hadoop.dfs.master)s syntax. Here hadoop.dfs.master is an ip address and

Re: datanode tuning

2013-10-06 Thread Ravi Prakash
Please look at dfs.heartbeat.interval and dfs.namenode.heartbeat.recheck-interval. 40 datanodes is not a large cluster IMHO, and the Namenode is capable of managing 100 times more datanodes. From: Rita rmorgan...@gmail.com To: common-user@hadoop.apache.org
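For reference, a sketch showing the two properties Ravi names, set here programmatically with their stock defaults (3 seconds and 300000 ms); in practice they live in hdfs-site.xml:

    import org.apache.hadoop.conf.Configuration;

    public class HeartbeatTuning {
      public static Configuration defaults() {
        Configuration conf = new Configuration();
        conf.set("dfs.heartbeat.interval", "3");                       // seconds between datanode heartbeats
        conf.set("dfs.namenode.heartbeat.recheck-interval", "300000"); // ms; part of the dead-node window
        return conf;
      }
    }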

Re: IP based hadoop cluster

2013-10-05 Thread sodul
The only security is the one provided by the slave/master whitelists (more dumb proof than attack proof, but still useful to avoid clusters talking to each other accidentally). I want to automate the deployment of hadoop clusters through Glu (from LinkedIn) since we already use it to do single

Re: rack awarness unexpected behaviour

2013-10-03 Thread Marc Sturlese
I've checked it out and it works like that. The problem is, if the two racks don't have the same capacity, one will have its disk space filled up much faster than the other (that's what I'm seeing). If one rack (rack A) has 2 servers of 8 cores with 4 reduce slots each and the other rack (rack B) has

Re: rack awarness unexpected behaviour

2013-10-03 Thread Michael Segel
Marc, The rack aware script is an artificial concept. Meaning you can tell which machine is in which rack and that may or may not reflect where the machine is actually located. The idea is to balance the number of nodes in the racks, at least on paper. So you can have 14 machines in rack 1,
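A sketch of just how artificial the mapping can be: a custom DNSToSwitchMapping (wired in via net.topology.node.switch.mapping.impl on 2.x, the same property without the net. prefix on older releases) that assigns racks by a made-up IP rule rather than physical location:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.net.DNSToSwitchMapping;

    public class StaticRackMapping implements DNSToSwitchMapping {
      @Override
      public List<String> resolve(List<String> names) {
        List<String> racks = new ArrayList<String>();
        for (String name : names) {
          // Hypothetical rule: 10.0.1.* goes to /rack1, everything else to /rack2.
          racks.add(name.startsWith("10.0.1.") ? "/rack1" : "/rack2");
        }
        return racks;
      }

      // No-ops; these hooks exist on the Hadoop 2.x version of the interface.
      public void reloadCachedMappings() {}
      public void reloadCachedMappings(List<String> names) {}
    }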

Re: rack awarness unexpected behaviour

2013-10-03 Thread Marc Sturlese
Doing that will balance the block writing, but I think you then lose the concept of physical rack awareness. Let's say you have 2 physical racks, one with 2 servers and one with 4. If you artificially tell hadoop that one rack has 3 servers and the other 3, you are losing the concept of rack

Re: rack awarness unexpected behaviour

2013-10-03 Thread Michel Segel
And that's the rub. Rack awareness is an artificial construct. If you want to fix it and match the real world, you need to balance the racks physically. Otherwise you need to rewrite load balancing to take into consideration the number and power of the nodes in the rack. The short answer: it's

Re: rack awarness unexpected behaviour

2013-10-03 Thread Jun Ping Du
. Thanks, Junping - Original Message - From: Michael Segel michael_se...@hotmail.com To: common-user@hadoop.apache.org Cc: hadoop-u...@lucene.apache.org Sent: Thursday, October 3, 2013 8:23:58 PM Subject: Re: rack awarness unexpected behaviour Marc, The rack aware script is an artificial

Re: IP based hadoop cluster

2013-10-01 Thread Ravi Prakash
Is security on? I'm not entirely sure (and I think it might be illuminating to the rest of us when you work this out, so please email back when you do), but I am guessing that a code change may be required. I think I remember someone telling me that hostnames are reverse-lookup'd to verify

Re: no jobtracker to stop,no namenode to stop

2013-08-30 Thread NJain
Hey Nikhil, Just tried what you asked for and yes, there are files and folders in c:/Hadoop/name (folders: current, image, previous.checkpoint, in_use.lock), and I also tried with the firewall disabled. One more thing I want to let you know: when on the Jobtracker UI, I click on '0' under

Re: no jobtracker to stop,no namenode to stop

2013-08-29 Thread NJain
Hi Nikhil, Appreciate your quick response on this, but the issue still continues. I believe I have covered all the pointers you have mentioned. Still I am pasting the portions of the documents so that you can verify. 1. /etc/hosts file, localhost should not be commented, and add ip address. The

Re: no jobtracker to stop,no namenode to stop

2013-08-28 Thread NJain
Hi, I am facing an issue where the map job is stuck at map 0% reduce 0%. I have installed Hadoop version 1.2.1 and am trying to run on my windows 8 machine using cygwin in pseudo distribution mode. I have followed the instruction at: http://hadoop.apache.org/docs/stable/single_node_setup.html

Re: rack awarness unexpected behaviour

2013-08-22 Thread Marc Sturlese
Jobs run on the whole cluster. After rebalancing, everything is properly allocated. Then I start running jobs using all the slots of the 2 racks and the problem starts to happen. Maybe I'm missing something. When using rack awareness, do you have to specify that the jobs should run in slots from both

Re: rack awarness unexpected behaviour

2013-08-22 Thread Nicolas Liochon
When you rebalance, the block is fully written, so the writer locality does not have to be taken into account (there is no writer anymore), hence it can rebalance across the racks. That's why job asymmetry was the easy guess. What's your hadoop version, by the way? I remember a bug around rack

Re: rack awarness unexpected behaviour

2013-08-22 Thread Marc Sturlese
I'm on cdh3u4 (0.20.2), gonna try to read a bit on this bug.

Re: Submit RHadoop job using Ozzie in Cloudera Manager

2013-08-22 Thread guydou
Hi Rohit, did you succeed in running an R script from an Oozie action? If so, can you share your action configuration? I am trying to figure out how to run an R script from Oozie.

Re: rack awarness unexpected behaviour

2013-08-22 Thread Harsh J
I'm not aware of a bug in 0.20.2 that would not honor the Rack Awareness, but have you done the two below checks as well? 1. Ensuring JT has the same rack awareness scripts and configuration so it can use it for scheduling, and, 2. Checking if the map and reduce tasks are being evenly spread

Re: rack awarness unexpected behaviour

2013-08-22 Thread Michel Segel
Rack awareness is an artificial concept, meaning you can define where a node is regardless of its real position in the rack. Going from memory, and it's probably been changed in later versions of the code... Isn't the replication... copy on node 1, copy on the same rack, third copy on a different rack?

Re: rack awarness unexpected behaviour

2013-08-22 Thread Jun Ping Du
, August 22, 2013 6:57:15 PM Subject: Re: rack awarness unexpected behaviour Rack aware is an artificial concept. Meaning you can define where a node is regardless of is real position in the rack. Going from memory, and its probably been changed in later versions of the code... Isn't

Re: hadoop v0.23.9, namenode -format command results in Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode

2013-08-12 Thread Jane Wayne
thanks, i also tried using HADOOP_PREFIX but that didn't work. I still get the same error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode btw, how do we install hadoop-common and hadoop-hdfs? also, according to this link,

Re: hadoop v0.23.9, namenode -format command results in Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode

2013-08-11 Thread Harsh J
I don't think you ought to be using HADOOP_HOME anymore. Try unset HADOOP_HOME and then export HADOOP_PREFIX=/opt/hadoop and retry the NN command. On Sun, Aug 11, 2013 at 8:50 AM, Jane Wayne jane.wayne2...@gmail.com wrote: hi, i have downloaded and untarred hadoop v0.23.9. i am trying to set

Re: [ANNOUNCE] Hadoop version 1.2.1 (stable) released

2013-08-05 Thread Chris K Wensel
any particular reason the 1.1.2 releases were pulled from the mirrors (so quickly)? On Aug 4, 2013, at 2:08 PM, Matt Foley ma...@apache.org wrote: I'm happy to announce that Hadoop version 1.2.1 has passed its release vote and is now available. It has 18 bug fixes and patches over the

Re: [ANNOUNCE] Hadoop version 1.2.1 (stable) released

2013-08-05 Thread Matt Foley
It's still available in archive at http://archive.apache.org/dist/hadoop/core/. I can put it back on the main download site if desired, but the model is that the main download site is for stuff we actively want people to download. Here is the relevant quote from

Re: [ANNOUNCE] Hadoop version 1.2.1 (stable) released

2013-08-05 Thread Chris K Wensel
regardless of what was written in a wiki somewhere, it is a bit aggressive I think. there are a fair number of automated things that link to the former stable releases that are now broken as they weren't given a grace period to cut over. not the end of the world or anything. just a bit of a

Re: [ANNOUNCE] Hadoop version 1.2.1 (stable) released

2013-08-05 Thread Matt Foley
Chris, there is a stable link for exactly this purpose: http://www.apache.org/dist/hadoop/core/stable/ --Matt On Mon, Aug 5, 2013 at 11:43 AM, Chris K Wensel ch...@wensel.net wrote: regardless of what was written in a wiki somewhere, it is a bit aggressive I think. there are a fair number

Re: [ANNOUNCE] Hadoop version 1.2.1 (stable) released

2013-08-04 Thread Matt Foley
which will include Windows native compatibility. My apologies, this was incorrect. Windows has only been integrated to trunk and branch-2.1. Thanks, --Matt On Sun, Aug 4, 2013 at 2:08 PM, Matt Foley ma...@apache.org wrote: I'm happy to announce that Hadoop version 1.2.1 has passed its

RE: Multiple data node and namenode ?

2013-07-25 Thread Devaraj k
Hi Manish, Can you check how many data node processes are running really in the machine using the command 'jps' or 'ps'. Thanks Devaraj k -Original Message- From: Manish Bhoge [mailto:manishbh...@rocketmail.com] Sent: 25 July 2013 12:29 To: common-user@hadoop.apache.org Subject:

Re: Multiple data node and namenode ?

2013-07-25 Thread Manish Bhoge
in hdfs-site.xml? From: Devaraj k devara...@huawei.com To: common-user@hadoop.apache.org common-user@hadoop.apache.org Sent: Thursday, 25 July 2013 12:41 PM Subject: RE: Multiple data node and namenode ? Hi Manish,   Can you check how many data node processes

RE: Multiple data node and namenode ?

2013-07-25 Thread Devaraj k
datanode' shell command to know how many datanode processes are running at this moment. Thanks Devaraj k -Original Message- From: Manish Bhoge [mailto:manishbh...@rocketmail.com] Sent: 25 July 2013 12:56 To: common-user@hadoop.apache.org Subject: Re: Multiple data node and namenode

Re: Multiple data node and namenode ?

2013-07-25 Thread manishbh...@rocketmail.com
' shell command to know how many datanode processes are running at this moment. Thanks Devaraj k -Original Message- From: Manish Bhoge [mailto:manishbh...@rocketmail.com] Sent: 25 July 2013 12:56 To: common-user@hadoop.apache.org Subject: Re: Multiple data node and namenode

RE: Multiple data node and namenode ?

2013-07-25 Thread Devaraj k
To: common-user@hadoop.apache.org Subject: Re: Multiple data node and namenode ? Yes I have change the hostname and restarted datanode Sent via Rocket from my HTC - Reply message - From: Devaraj k devara...@huawei.com To: common-user@hadoop.apache.org common-user@hadoop.apache.org Subject

Re: Definite APi for Vidoe Processing

2013-07-24 Thread Mallika Pothukuchi
We can have it if u r able to process Sent from my iPhone On Jul 24, 2013, at 8:12 PM, Kasi Subrahmanyam kasisubbu...@gmail.com wrote: Hi, I am able to find that we have a definite API for processing images in hadoop using HIPI. Why don't we have the same for videos? Thanks, Subbu

RE: Sending the entire file content as value to the mapper

2013-07-11 Thread Devaraj k
Hi, You could send the file meta info to the map function as key/value through the split, and then you can read the entire file in your map function. Thanks Devaraj k -Original Message- From: Kasi Subrahmanyam [mailto:kasisubbu...@gmail.com] Sent: 11 July 2013 13:38 To:
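A sketch of that pattern, assuming each input record's value is an HDFS path (for example fed in through NLineInputFormat over a listing file) and that individual files fit in memory:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WholeFileMapper extends Mapper<LongWritable, Text, Text, BytesWritable> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        Path file = new Path(value.toString().trim());     // the record is just a path
        FileSystem fs = file.getFileSystem(conf);
        int len = (int) fs.getFileStatus(file).getLen();   // assumes the file fits in memory
        byte[] contents = new byte[len];
        FSDataInputStream in = fs.open(file);
        try {
          in.readFully(0, contents);                       // pull in the whole file
        } finally {
          in.close();
        }
        context.write(new Text(file.getName()), new BytesWritable(contents));
      }
    }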

Re: Task failure in slave node

2013-07-11 Thread devara...@huawei.com
Hi, It seems mahout-examples-0.7-job.jar depends on other jars/classes. While running the job tasks, it is not able to find those classes in the classpath, and those tasks fail. You need to provide the dependent jar files when submitting/running the job. Thanks Devaraj k

Re: Requesting containers on a specific host

2013-07-05 Thread devara...@huawei.com
Hi Kishore, As per the exception given, the Node Manager is getting excluded. It may be that you have listed the Node Manager in an exclude file via this configuration in the Resource Manager. Could you check this configuration in the RM - is it configured with any file and that file

RE: Output Directory not getting created

2013-07-03 Thread Devaraj k
Hi Kasi, I think MapR mailing list is the better place to ask this question. Thanks Devaraj k From: Kasi Subrahmanyam [mailto:kasisubbu...@gmail.com] Sent: 04 July 2013 08:49 To: common-user@hadoop.apache.org; mapreduce-u...@hadoop.apache.org Subject: Output Directory not getting created Hi,

Re: KrbException: Could not load configuration from SCDynamicStore in Eclipse on Mac

2013-06-17 Thread anil gupta
Hi Harsh, Awesome!! It worked. Thank you so much. Actually, i went through that ticket before but the above option was not mentioned there. Additional info for others: in the run configuration, add -Djava.security.krb5.realm=yourrealm -Djava.security.krb5.kdc=yourkdc to the VM arguments. Thanks,

Re: KrbException: Could not load configuration from SCDynamicStore in Eclipse on Mac

2013-06-16 Thread Harsh J
Anil, Please try the options provided at https://issues.apache.org/jira/browse/HADOOP-7489. Essentially, pass JVM system properties (In Eclipse you'll edit the Run Configuration for this) and add -Djava.security.krb5.realm=yourrealm -Djava.security.krb5.kdc=yourkdc and also ensure your Mac's

Re: Changing the maximum tasks per node on a per job basis

2013-05-24 Thread Steve Lewis
My reading of Capacity Scheduling is that it controls the number of jobs scheduled at the level of the cluster. My issue is not sharing at the level of the cluster (usually my job is the only one running) but rather at the level of the individual machine. Some of my jobs require more memory and

Re: Changing the maximum tasks per node on a per job basis

2013-05-24 Thread Harsh J
Yes, you're correct that the end-result is not going to be as static as you expect it to be. FWIW, per node limit configs have been discussed before (and even implemented + removed): https://issues.apache.org/jira/browse/HADOOP-5170 On Fri, May 24, 2013 at 1:47 PM, Steve Lewis

Re: Changing the maximum tasks per node on a per job basis

2013-05-23 Thread Harsh J
Your problem seems to surround available memory and over-subscription. If you're using a 0.20.x or 1.x version of Apache Hadoop, you probably want to use the CapacityScheduler to address this for you. I once detailed how-to, on a similar question here: http://search-hadoop.com/m/gnFs91yIg1e On

Re: Running Hadoop client as a different user

2013-05-20 Thread Steve Lewis
I found it works with the following. You need to be in a thread as a PrivilegedExceptionAction: final String user = "My Identity"; UserGroupInformation uig = UserGroupInformation.createRemoteUser(user); try { return uig.doAs(new PrivilegedExceptionAction<ReturnType>() {
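A fuller sketch of the same pattern; the user name and namenode URI are placeholders, and on a non-secure cluster createRemoteUser simply asserts that identity:

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class RemoteUserClient {
      public static void main(String[] args) throws Exception {
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("slewis");
        // Everything inside run() executes as the remote user, including the
        // FileSystem handle it returns.
        FileSystem fs = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
          public FileSystem run() throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:8020");  // placeholder namenode
            return FileSystem.get(conf);
          }
        });
        System.out.println(fs.exists(new Path("/")));
      }
    }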

Re: Default permissions for Hadoop output files

2013-05-20 Thread Ravi Prakash
Hi Steve, You can use fs.permissions.umask-mode to set the appropriate umask. From: Steve Lewis lordjoe2...@gmail.com To: common-user common-user@hadoop.apache.org Sent: Monday, May 20, 2013 9:33 AM Subject: Default permissions for Hadoop output files I am

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-18 Thread Michael Segel
Then you have a problem where the solution is more of people management and not technical. All of your servers should be using NTP. At a minimum, you have one server that gets the time from a national (government) time server, and then have all of the machines in that Data Center use that

Re: Running Hadoop client as a different user

2013-05-17 Thread Steve Lewis
Here is the issue - 1 - I am running a Java client on a machine unknown to the cluster - my default name on this pc is HYPERCHICKEN\local_admin - the name known to the cluster is slewis 2 - The listed code: String connectString = "hdfs://" + host + ":" + port + "/"; Configuration config =

Re: Running Hadoop client as a different user

2013-05-17 Thread Harsh J
Am not sure I'm getting your problem yet, but mind sharing the error you see specifically? That'd give me more clues. On Fri, May 17, 2013 at 2:39 PM, Steve Lewis lordjoe2...@gmail.com wrote: Here is the issue - 1 - I am running a Java client on a machine unknown to the cluster - my default

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-17 Thread Jane Wayne
What is meant by 'cluster time'? and What you want to achieve? let me try to clarify. i have a hadoop cluster (e.g. name node, data nodes, job tracker, task trackers, etc...). all the nodes in this hadoop cluster use ntp to sync time. i have another computer (which i have referred to as a

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-17 Thread Jane Wayne
You are searching for a solution in the Hadoop API (where this does not exist) thanks, that's all i needed to know. cheers. On Fri, May 17, 2013 at 9:17 AM, Niels Basjes ni...@basjes.nl wrote: Hi, i have another computer (which i have referred to as a server, since it is running

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-17 Thread Bertrand Dechoux
For hadoop, 'cluster time' is the local OS time. You might want to get the time of the namenode machine but indeed if NTP is correctly used, the local OS time from your server machine will be the best estimation. If you request the time from the namenode machine, you will be penalized by the delay

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-17 Thread Jane Wayne
if NTP is correctly used - that's the key statement. in several of our clusters, the NTP setup is kludgy. note that the professionals administering the cluster are different from us, the engineers. so, there's a lot of red tape to go through to get something, trivial or not, fixed. we have noticed that

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-17 Thread Jane Wayne
and please remember, i stated that although the hadoop cluster uses NTP, the server (the machine that is not a part of the hadoop cluster) cannot be assumed to be using NTP (and in fact, doesn't). On Fri, May 17, 2013 at 10:10 AM, Jane Wayne jane.wayne2...@gmail.com wrote: if NTP is correctly used

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-16 Thread Jane Wayne
yes, but that gets the current time on the server, not the hadoop cluster. i need to be able to probe the date/time of the hadoop cluster. On Tue, May 14, 2013 at 5:09 PM, Niels Basjes ni...@basjes.nl wrote: I made a typo. I meant API (instead of SPI). Have a look at this for more

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-14 Thread Niels Basjes
If you have all nodes using NTP then you can simply use the native Java SPI to get the current system time. On Tue, May 14, 2013 at 4:41 PM, Jane Wayne jane.wayne2...@gmail.comwrote: hi all, is there a way to get the current time of a hadoop cluster via the api? in particular, getting the

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-14 Thread Jane Wayne
niels, i'm not familiar with the native java spi. spi = service provider interface? could you let me know if this spi is part of the hadoop api? if so, which package/class? but yes, all nodes on the cluster are using NTP to synchronize time. however, the server (which is not a part of the hadoop

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-14 Thread Niels Basjes
I made a typo. I meant API (instead of SPI). Have a look at this for more information: http://stackoverflow.com/questions/833768/java-code-for-getting-current-time If you have a client that is not under NTP then that should be the way to fix your issue. Once you have that getting the current

Re: Running Hadoop client as a different user

2013-05-13 Thread Harsh J
Hi Steve, A normally-written client program would work normally on both permissions and no-permissions clusters. There is no concept of a password for users in Apache Hadoop as of yet, unless you're dealing with a specific cluster that has custom-implemented it. Setting a specific user is not

Re: Is disk use reported with replication?

2013-04-23 Thread burberry blues
Hi I am new to Hadoop world. Can you please let me know what is a hadoop stack? Thanks, Burberry On Mon, Apr 22, 2013 at 10:19 AM, Keith Wiley kwi...@keithwiley.com wrote: Simple question: When I issue a hadoop fs -du command and/or when I view the namenode web UI to see HDFS disk

Re: Is disk use reported with replication?

2013-04-23 Thread Harsh J
Hi Keith, The fs -du computes length of files, and would not report replicated on-disk size. HDFS disk utilization OTOH, is the current, simple report of used/free disk space, which would certainly include replicated data. On Mon, Apr 22, 2013 at 10:49 PM, Keith Wiley kwi...@keithwiley.com

Re: JobSubmissionFiles: past , present, and future?

2013-04-12 Thread Jay Vyas
To update on this, it was just pointed out to me by matt farrallee that the auto-fix of permissions is a failsafe in case of a race condition, and not meant to mend bad permissions in all cases: https://github.com/apache/hadoop-common/commit/f25dc04795a0e9836e3f237c802bfc1fe8a243ad

Re: Protect from accidental deletes

2013-04-02 Thread kojie . fu
you can set the property fs.trash.interval From: Artem Ervits Date: 2013-04-02 05:04 To: common-user@hadoop.apache.org Subject: Protect from accidental deletes Hello all, I'd like to know what users are doing to protect themselves from accidental deletes of files and directories in HDFS?

RE: Protect from accidental deletes

2013-04-02 Thread ramon.pin
Hi Artem, right now HDFS has trash functionality that moves files removed with 'hadoop dfs -rm' to an intermediate directory (/trash). You can configure how much time a file spends in that directory before it's actually removed from the filesystem. Look for 'fs.trash.interval' in your
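fs.trash.interval is a number of minutes (0 disables trash) and is normally set in core-site.xml; shown programmatically here only to illustrate the unit:

    import org.apache.hadoop.conf.Configuration;

    public class TrashConfig {
      public static Configuration withTrash() {
        Configuration conf = new Configuration();
        // Minutes a deleted file lingers in trash before being purged for good.
        conf.set("fs.trash.interval", "1440");  // e.g. keep deleted files for one day
        return conf;
      }
    }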

Re: How can I unsubscribe from this list?

2013-03-17 Thread Harsh J
From your email header: List-Unsubscribe: mailto:common-user-unsubscr...@hadoop.apache.org On Wed, Mar 13, 2013 at 10:42 AM, Alex Luya alexander.l...@gmail.com wrote: can't find a way to unsubscribe from this list. -- Harsh J

Re: hardware for hdfs

2013-03-17 Thread Rita
any thoughts? On Wed, Mar 13, 2013 at 7:17 PM, Rita rmorgan...@gmail.com wrote: i am planning to build an hdfs cluster primarily for streaming large files (10g avg size). I was wondering if anyone can recommend a good hardware vendor. -- --- Get your facts first, then you can distort them as

Re: how to resolve conflicts with jar dependencies

2013-03-12 Thread Luke Lu
The problem is resolved in the next release of hadoop (2.0.3-alpha cf. MAPREDUCE-1700) For hadoop 1.x based releases/distributions, put -Dmapreduce.user.classpath.first=true on the hadoop command line and/or client config On Tue, Mar 12, 2013 at 6:49 AM, Jane Wayne

Re: problems under same hosts and different ip addresses

2013-03-09 Thread Rajiv Chittajallu
check dfs.include in your namenode. Entries in there should resolve to new addresses. On Feb 19, 2013, at 18:23, Henry JunYoung KIM henry.jy...@gmail.com wrote: hi, hadoopers. Recently, we've moved our clusters to another idc center. We keep the same host-names, but, they have now

Re: Running hadoop on directory structure

2013-03-07 Thread mlabour
Here is a possible solution. To add a root directory structure such as the following as the InputPath, do the following. For the output to mirror the input, and building on Harsh J's response, you might be able to skip the reducer and use MultipleOutputs from the mapper directly.

Re: How to handle sensitive data

2013-03-01 Thread abhishek
Michael, So as you said, do you mean the upstream should encrypt the data before sending it to HDFS? Regards Abhishek On Feb 15, 2013, at 8:47 AM, Michael Segel michael_se...@hotmail.com wrote: Simple, have your app encrypt the field prior to writing to HDFS. Also consider HBase. On Feb 14,

Re: How to handle sensitive data

2013-03-01 Thread abhishek
too. From: Michael Segel michael_se...@hotmail.com To: common-user@hadoop.apache.org CC: cdh-u...@cloudera.org Sent: Friday, 15 February 2013 8:47:16 Subject: Re: How to handle sensitive data Simple, have your app encrypt the field prior to writing to HDFS. Also consider HBase

Re: How to handle sensitive data

2013-03-01 Thread Robert Marshall
application. I recommend using HBase too. From: Michael Segel michael_se...@hotmail.com To: common-user@hadoop.apache.org CC: cdh-u...@cloudera.org Sent: Friday, 15 February 2013 8:47:16 Subject: Re: How to handle sensitive data Simple, have your app encrypt the field prior

Re: Locks in HDFS

2013-02-22 Thread abhishek
Harsh, Can we load the file into HDFS with a replication factor of one and lock the file? Regards Abhishek On Feb 22, 2013, at 1:03 AM, Harsh J ha...@cloudera.com wrote: HDFS does not have such a client-side feature, but your applications can use Apache Zookeeper to coordinate and implement this on

Re: Locks in HDFS

2013-02-22 Thread Harsh J
Hi Abhishek, I fail to understand what you mean by that, but HDFS generally has no client-exposed file locking on reads. There are leases for preventing multiple writers to a single file, but nothing on the read side. Replication of the blocks under a file is a different concept and is completely
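A minimal sketch of the ZooKeeper-based coordination suggested earlier in the thread: an ephemeral znode acting as a coarse, application-level lock on a file name (assumes the /locks parent znode already exists and that all readers/writers go through this gate voluntarily):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class FileGate {
      private final ZooKeeper zk;

      public FileGate(ZooKeeper zk) {
        this.zk = zk;
      }

      public boolean tryLock(String fileName) throws KeeperException, InterruptedException {
        try {
          zk.create("/locks/" + fileName, new byte[0],
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
          return true;   // held until we delete it or our session expires
        } catch (KeeperException.NodeExistsException e) {
          return false;  // someone else holds it
        }
      }

      public void unlock(String fileName) throws KeeperException, InterruptedException {
        zk.delete("/locks/" + fileName, -1);  // -1 matches any version
      }
    }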
