Is it possible to set FS permissions (e.g. 755) in hdfs-site.xml?

2013-03-28 Thread Pedro Sá da Costa
Is it possible to set FS permissions (e.g. 755) in hdfs-site.xml? -- Best regards,

FSDataOutputStream can write in a file in a remote host?

2013-03-28 Thread Pedro Sá da Costa
FSDataOutputStream can write in a file in a remote host? -- Best regards,
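For reference, writing to a remote host with FSDataOutputStream generally amounts to opening the stream against that cluster's NameNode URI; a minimal sketch in Java, where the host name, port, and path are placeholders:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "remote-namenode" and 8020 are placeholders for the remote cluster's NameNode address.
        FileSystem fs = FileSystem.get(URI.create("hdfs://remote-namenode:8020"), conf);
        FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"));
        out.writeBytes("hello from a remote client\n");
        out.close();
        fs.close();
    }
}
```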

Re: Is it possible to set FS permissions (e.g. 755) in hdfs-site.xml?

2013-03-28 Thread Vinod Kumar Vavilapalli
I suppose you mean the default FS permissions. You can use dfs.umask for that. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Mar 28, 2013, at 5:23 AM, Pedro Sá da Costa wrote: Is it possible to set FS permissions (e.g. 755) in hdfs-site.xml? -- Best
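Vinod's dfs.umask suggestion sets the default mode for newly created files; for completeness, permissions can also be forced per path through the FileSystem API. A minimal sketch, with an illustrative path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class SetPerms {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/user/example/data");  // illustrative path

        fs.mkdirs(dir);
        // Force the mode of an existing path to 755 (rwxr-xr-x).
        fs.setPermission(dir, new FsPermission((short) 0755));

        fs.close();
    }
}
```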

Which hadoop installation should I use on ubuntu server?

2013-03-28 Thread David Parks
I'm moving off AWS MapReduce to our own cluster, I'm installing Hadoop on Ubuntu Server 12.10. I see a .deb installer and installed that, but it seems like files are all over the place `/usr/share/Hadoop`, `/etc/hadoop`, `/usr/bin/hadoop`. And the documentation is a bit harder to follow:

Re: Which hadoop installation should I use on ubuntu server?

2013-03-28 Thread Nitin Pawar
Apache Bigtop has builds done for Ubuntu; you can check them at the Jenkins mentioned on bigtop.apache.org. On Thu, Mar 28, 2013 at 11:37 AM, David Parks davidpark...@yahoo.com wrote: I'm moving off AWS MapReduce to our own cluster, I'm installing Hadoop on Ubuntu Server 12.10. I see

Re: Auto clean DistCache?

2013-03-28 Thread Harsh J
The DistributedCache is cleaned automatically and no user intervention (aside from size limitation changes, which may be an administrative requirement) is generally required to delete the older distributed cache files. This is observable in code and is also noted in TDG, 2ed.: Tom White: The

Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput

2013-03-28 Thread Ted Dunning
The EMR distributions have special versions of the s3 file system. They might be helpful here. Of course, you likely aren't running those if you are seeing 5MB/s. An extreme alternative would be to light up an EMR cluster, copy to it, then to S3. On Thu, Mar 28, 2013 at 4:54 AM, Himanish

Re: Static class vs Normal Class when to use

2013-03-28 Thread Ted Dunning
Another Ted piping in. For Hadoop use, it is dangerous to use anything but a static class for your mapper and reducer functions since you may accidentally think that you can access a closed variable from the parent. A static class cannot reference those values so you know that you haven't made
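To illustrate the point, a minimal sketch of the usual pattern: the Mapper is declared as a static nested class inside the driver, so it cannot reference (or accidentally capture) any state of the enclosing job class; class and field names here are illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountJob {

    private int someDriverField;  // a static nested Mapper cannot accidentally read this

    // Declared static: Hadoop can instantiate it reflectively in each task JVM,
    // and the compiler rejects any reference to the enclosing instance's state.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}
```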

Re: Which hadoop installation should I use on ubuntu server?

2013-03-28 Thread Ted Dunning
Also, Canonical just announced that MapR is available in the Partner repos. On Thu, Mar 28, 2013 at 7:22 AM, Nitin Pawar nitinpawar...@gmail.com wrote: apache bigtop has builds done for ubuntu you can check them at jenkins mentioned on bigtop.apache.org On Thu, Mar 28, 2013 at 11:37 AM,

DFSOutputStream.sync() method latency time

2013-03-28 Thread lei liu
When the client writes data, if there are three replicas, the sync method latency formula should be: sync method latency time = first datanode receive data time + second datanode receive data time + third datanode receive data time. If the three datanode receive data times are all 2

Fair scheduler not linked in the document index page for hadoop 2.x

2013-03-28 Thread Zheng, Kai
Hi all, the Fair Scheduler link is not added in the documentation index page for hadoop 2.x, the way the Capacity Scheduler is, like http://hadoop.apache.org/docs/r2.0.3-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html Should we add it if it can be experimentally used? Regards, Kai

Find reducer for a key

2013-03-28 Thread Alberto Cordioli
Hi everyone, how can I know which keys are associated with a particular reducer in the setup method? Let's assume the setup method reads from a file where each line is a string that will become a key emitted by the mappers. For each of these lines I would like to know if the string will be a

Re: Find reducer for a key

2013-03-28 Thread Hemanth Yamijala
Hi, Not sure if I am answering your question, but this is the background. Every MapReduce job has a partitioner associated to it. The default partitioner is a HashPartitioner. You can as a user write your own partitioner as well and plug it into the job. The partitioner is responsible for
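For reference, the default HashPartitioner is essentially one line, so the reducer a key ends up on is fully determined by the key's hashCode and the number of reduce tasks:

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Essentially what the default org.apache.hadoop.mapreduce.lib.partition.HashPartitioner does.
public class SimpleHashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit so negative hash codes still map to a valid partition.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```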

Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput

2013-03-28 Thread Himanish Kushary
Hi Dave, Thanks for your reply. Our hadoop instance is inside our corporate LAN. Could you please provide some details on how I could use the s3distcp from amazon to transfer data from our on-premises hadoop to amazon s3. Wouldn't some kind of VPN be needed between the Amazon EMR instance and our

[no subject]

2013-03-28 Thread oualid ait wafli
Hi, does someone know something about the EMC distribution for Big Data which integrates Hadoop and other tools? Thanks

Re: Find reducer for a key

2013-03-28 Thread Alberto Cordioli
Hi Hemanth, thanks for your reply. Yes, this partially answered my question. I know how the hash partitioner works and I guessed something similar. The piece that I missed was that mapred.task.partition returns the partition number of the reducer. So, putting all the pieces together I understand
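Putting those pieces together, a sketch of the check Alberto describes, assuming the new-API Reducer and the default HashPartitioner; the side-file handling is reduced to a single illustrative key:

```java
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class KeyAwareReducer extends Reducer<Text, Text, Text, NullWritable> {

    private final HashPartitioner<Text, Text> partitioner = new HashPartitioner<Text, Text>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The partition number assigned to this reduce task.
        int myPartition = context.getConfiguration().getInt("mapred.task.partition", -1);
        int numReducers = context.getNumReduceTasks();

        // For each candidate key (read from a side file in a real job),
        // check whether it would be routed to this reducer.
        String candidate = "some-key";  // illustrative; would come from the side file
        if (partitioner.getPartition(new Text(candidate), null, numReducers) == myPartition) {
            // remember this key for use in reduce()
        }
    }
}
```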

Re: Find reducer for a key

2013-03-28 Thread Hemanth Yamijala
Hmm. That feels like a join. Can't you read the input file on the map side and output those keys along with the original map output keys? That way the reducer would automatically get both together? On Thu, Mar 28, 2013 at 5:20 PM, Alberto Cordioli cordioli.albe...@gmail.com wrote: Hi

Re: Find reducer for a key

2013-03-28 Thread Alberto Cordioli
Yes, that is a possible solution. But since the MR job has another scope, the mappers already read other files (very large) and output tuples. You cannot control the number of mappers and hence the risk is that a lot of mappers will be created, and each of them would also read the other file instead of

Re: Hadoop Mapreduce fails with permission management enabled

2013-03-28 Thread Marcos Sousa
I solved the problem using the Capacity Scheduler, because I'm using 1.0.4. It is a known issue, solved in version 1.2.0 ( https://issues.apache.org/jira/browse/MAPREDUCE-4398). On Thu, Mar 28, 2013 at 11:08 AM, Bertrand Dechoux decho...@gmail.com wrote: Permission denied: user=*realtime*,

Re:

2013-03-28 Thread Yanbo Liang
You can get detailed information from the Greenplum website: http://www.greenplum.com/products/pivotal-hd 2013/3/28 oualid ait wafli oualid.aitwa...@gmail.com Hi, does someone know something about the EMC distribution for Big Data which integrates Hadoop and other tools? Thanks

Re: DFSOutputStream.sync() method latency time

2013-03-28 Thread Yanbo Liang
First, when the client wants to write data to HDFS, it creates a DFSOutputStream. Then the client writes data to this output stream, and the stream transfers data to all DataNodes along the constructed pipeline by means of packets whose size is 64KB. These two operations are concurrent, so the

Re: Inspect a context object and see whats in it

2013-03-28 Thread Yanbo Liang
You can try to add some probes to the source code and recompile it. If you want to know the keys and values you add at each step, you can add print code to the map() function of class Mapper and the reduce() function of class Reducer. The shortcoming is that you will produce a lot of log output which may fill the
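A minimal sketch of such a probe in one's own Mapper (class and types are illustrative); the print goes to the task attempt's stderr and shows up in the task logs:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ProbeMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Probe: written to the task attempt's stderr log; can grow quickly on large inputs.
        System.err.println("map input: " + key + " => " + value);
        context.write(new Text(key.toString()), value);
    }
}
```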

Re: Auto clean DistCache?

2013-03-28 Thread Jean-Marc Spaggiari
Thanks Harsh. My issue was not related to the number of files/folders but related to the total size of the DistributedCache. The directory where it's stored only has 7GB available... So I will set the limit to 5GB with local.cache.size, or move it to the drives where I have the dfs files stored.

Re: Put file with WebHDFS, and can i put file by chunk with curl ?

2013-03-28 Thread Alejandro Abdelnur
you could use WebHDFS/HttpFS and the APPEND operation. thx On Wed, Mar 27, 2013 at 1:25 AM, 小学园PHP xxy-...@qq.com wrote: I want to put a file to HDFS with curl, and moreover I need to put it by chunks. So, does somebody know if curl can upload a file by chunk? Or, who has worked with WebHDFS by
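The same APPEND operation is also reachable through the Java client over the webhdfs:// scheme (Hadoop 2.x), rather than driving the REST calls from curl; a minimal sketch, with host, port, and path as placeholders:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WebHdfsChunkedPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "namenode-host:50070" is a placeholder for the NameNode's HTTP address.
        FileSystem fs = FileSystem.get(URI.create("webhdfs://namenode-host:50070"), conf);
        Path target = new Path("/tmp/upload-target");  // illustrative path

        // The first chunk creates the file...
        FSDataOutputStream out = fs.create(target);
        out.write("chunk 1\n".getBytes("UTF-8"));
        out.close();

        // ...and each following chunk is appended via the APPEND operation.
        out = fs.append(target);
        out.write("chunk 2\n".getBytes("UTF-8"));
        out.close();

        fs.close();
    }
}
```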

Hadoop Streaming - how to specify mapper scripts hosting on HDFS

2013-03-28 Thread praveenesh kumar
Hi, I am trying to run a hadoop streaming job, where I want to specify my mapper script residing on HDFS. Currently it's trying to locate the script on the local FS only. Is there an option available through which I can tell hadoop streaming to look for the mapper script on HDFS, not on the local FS?

Re: Which hadoop installation should I use on ubuntu server?

2013-03-28 Thread Marcos Luis Ortiz Valmaseda
In Bigtop's wiki, you can find this: https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.5.0#HowtoinstallHadoopdistributionfromBigtop0.5.0-Ubuntu%2864bit%2Clucid%2Cprecise%2Cquantal%29 2013/3/28 Ted Dunning tdunn...@maprtech.com Also, Canonical

Why do some blocks refuse to replicate...?

2013-03-28 Thread Felix GV
Hello, I've been running a virtualized CDH 4.2 cluster. I now want to migrate all my data to another (this time physical) set of slaves and then stop using the virtualized slaves. I added the new physical slaves in the cluster, and marked all the old virtualized slaves as decommissioned using

Re: Why do some blocks refuse to replicate...?

2013-03-28 Thread MARCOS MEDRADO RUBINELLI
Felix, After changing hdfs-site.xml, did you run hadoop dfsadmin -refreshNodes? That should have been enough, but you can try increasing the replication factor of these files, waiting for them to be replicated to the new nodes, then setting it back to its original value. Cheers, Marcos In
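The temporary replication bump Marcos suggests can also be driven from the FileSystem API; a minimal sketch, with the path and replication factors as placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BumpReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/stuck-file");  // illustrative path

        // Temporarily raise the replication factor so copies land on the new DataNodes...
        fs.setReplication(file, (short) 5);

        // ...then, once the blocks have spread, restore the original value.
        fs.setReplication(file, (short) 3);

        fs.close();
    }
}
```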

Re: Why do some blocks refuse to replicate...?

2013-03-28 Thread Felix GV
Yes, I didn't specify how I was testing my changes, but basically, here's what I did: My hdfs-site.xml file was modified to include a reference to a file containing a list of all datanodes (via dfs.hosts) and a reference to a file containing decommissioned nodes (via dfs.hosts.exclude). After

Re: Why do some blocks refuse to replicate...?

2013-03-28 Thread Tapas Sarangi
Did you check if you have any disk that is read-only on the nodes that have the missing blocks? If you know which are the blocks, you can manually copy the blocks and the corresponding '.meta' file to another node. Hadoop will re-read those blocks and replicate them. - On Mar 28, 2013,

Re: Why do some blocks refuse to replicate...?

2013-03-28 Thread Azuryy Yu
Which hadoop version did you use? On Mar 29, 2013 5:24 AM, Felix GV fe...@mate1inc.com wrote: Yes, I didn't specify how I was testing my changes, but basically, here's what I did: My hdfs-site.xml file was modified to include a reference to a file containing a list of all datanodes (via

Hadoop streaming weird problem

2013-03-28 Thread jamal sasha
Hi, I am facing a weird problem. My python scripts were working just fine. I made a few modifications, tested via: cat input.txt | python mapper.py | sort | python reducer.py runs just fine. Ran on my local machine (pseudo-distributed mode). That also runs just fine. Deployed on clusters.. Now,

Re: Why do some blocks refuse to replicate...?

2013-03-28 Thread Felix GV
I'm using the version of hadoop in CDH 4.2, which is a version of Hadoop 2.0 with a bunch of patches on top... I've tried copying one block and its .meta file to one of my new DN, then restarted the DN service, and it did pick up the missing block and replicate it properly within the new slaves.

Re: Hadoop streaming weird problem

2013-03-28 Thread jamal sasha
Very much like this: http://stackoverflow.com/questions/13445126/python-code-is-valid-but-hadoop-streaming-produces-part-0-empty-file On Thu, Mar 28, 2013 at 5:10 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I am facing a weird problem. My python scripts were working just fine. I

Re: Why do some blocks refuse to replicate...?

2013-03-28 Thread Tapas Sarangi
On Mar 28, 2013, at 7:13 PM, Felix GV fe...@mate1inc.com wrote: I'm using the version of hadoop in CDH 4.2, which is a version of Hadoop 2.0 with a bunch of patches on top... I've tried copying one block and its .meta file to one of my new DN, then restarted the DN service, and it did

Re: Hadoop streaming weird problem

2013-03-28 Thread jamal sasha
oops never mind guys.. figured out the issue. sorry for spamming. On Thu, Mar 28, 2013 at 5:15 PM, jamal sasha jamalsha...@gmail.com wrote: Very much like this: http://stackoverflow.com/questions/13445126/python-code-is-valid-but-hadoop-streaming-produces-part-0-empty-file On Thu, Mar

Re: DFSOutputStream.sync() method latency time

2013-03-28 Thread lei liu
Thanks Yanbo for your reply. My test code is: FSDataOutputStream outputStream = fs.create(path); Random r = new Random(); long totalBytes = 0; String str = new String(new byte[1024]); while (totalBytes < 1024 * 1024 * 500) { byte[] bytes =

file reappear even after deleted in hadoop 1.0.4

2013-03-28 Thread Henry Hung
Hi, I'm using hadoop 1.0.4. Today I wanted to delete a file in hdfs, but after a while, the file reappeared again. I used both types of remove command: hadoop fs -rm and hadoop fs -rmr, but the file still reappears after a while. I inspected the namenode log and saw repetitions of block/dir/removing lease

Re: DFSOutputStream.sync() method latency time

2013-03-28 Thread Yanbo Liang
"The write method writes data to the memory of the client, the sync method sends the package to the pipeline": I think you made a mistake in understanding the write procedure of HDFS. It's right that the write method writes data to the memory of the client; however, the data in the client memory is sent to the DataNodes at the
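As a small illustration of the distinction, a sketch using the Hadoop 2.x API names (hflush() is the successor of the older sync() call); the path is a placeholder:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FlushExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/tmp/flush-example"));

        // write() copies the bytes into the client-side buffer; full packets are
        // streamed to the DataNode pipeline in the background as they fill up.
        out.write(new byte[1024]);

        // hflush() (sync() in older releases) blocks until everything written so far
        // has been delivered to the DataNodes in the write pipeline.
        out.hflush();

        out.close();
        fs.close();
    }
}
```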

Re: Find reducer for a key

2013-03-28 Thread Hemanth Yamijala
Hi, The way I understand your requirement - you have a file that contains a set of keys. You want to read this file on every reducer and take only those entries of the set, whose keys correspond to the current reducer. If the above summary is correct, can I assume that you are potentially

Re: QIM failover callback patch

2013-03-28 Thread Azuryy Yu
Sorry! Todd has already reviewed it. On Fri, Mar 29, 2013 at 11:40 AM, Azuryy Yu azury...@gmail.com wrote: hi, who can review this one: https://issues.apache.org/jira/browse/HDFS-4631 thanks.

RE: file reappear even after deleted in hadoop 1.0.4

2013-03-28 Thread Henry Hung
Sorry, please ignore my question. It appears that the problem comes from the program that uploads files into hadoop. I was too quick to assume that the problem lies inside hadoop. Sorry for being a noob in hadoop. Best regards, Henry Hung. From: MA11 YTHung1 Sent: Friday, March 29, 2013 11:21 AM To:

RE: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput

2013-03-28 Thread David Parks
None of that complexity, they distribute the jar publicly (not the source, but the jar). You can just add this to your libjars: s3n://region.elasticmapreduce/libs/s3distcp/latest/s3distcp.jar No VPN or anything, if you can access the internet you can get to S3. Follow their docs here: