Re: Question about Block size configuration

2015-05-12 Thread Himawan Mahardianto
Thank you for the explanation. How many bytes will each piece of metadata consume in RAM if the block size is 64MB or smaller than that? I heard all metadata is stored in RAM, right?

Pig 0.14.0 on Hadoop 2.6.0 deprecation errors

2015-05-12 Thread Anand Murali
Hi All: I have installed the above and made corresponding changes to core-site, hdfs-site and mapred-site.xml, and still get deprecation errors. On system startup I run . .hadoop export HADOOP_HOME=/home/anand_vihar/hadoop-2.6.0 export JAVA_HOME=/home/anand_vihar/jdk1.7.0_75/ export

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Marko Dinic
Hello Shahab, I'm using 1.2.1 in pseudo-distributed mode and the same code on a cluster with 0.20.2, but I'm having the same problem in both cases. I'm hoping that 1.2.1 code is backward-compatible with a 0.20.2 cluster? Do you have any idea what could be the problem? And what do you mean by - Have

Re: Pig 0.14.0 on Hadoop 2.6.0 deprecation errors

2015-05-12 Thread Olivier Renault
You don't have an error. You are seeing normal info messages. Thanks, Olivier _ From: Anand Murali Subject: Pig 0.14.0 on Hadoop 2.6.0 deprecation errors To: user@hadoop.apache.org Hi All: I have installed the above and made corresponding changes to core-site, hdfs-site

Re: namenode question

2015-05-12 Thread Harsh J
Unless you turn on dfs.client.use.datanode.hostname, the NN will always use IPs to denote replica location addresses. On Sun, May 10, 2015 at 9:41 PM, Pravin Sinha pks_chen...@yahoo.com wrote: Hi Asanjar, My understanding is that it returns serialized BlockLocation instances which holds the
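For reference, the client-side switch Harsh mentions would go into hdfs-site.xml; a minimal sketch, with the value shown purely as an example:

    <property>
      <name>dfs.client.use.datanode.hostname</name>
      <value>true</value>
    </property>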

Execute an external command with Hadoop 2.6.0

2015-05-12 Thread Pasquale Salza
Hi there, I have a Hadoop 2.6.0 cluster running on CentOS, Hortonworks distribution. I'm trying to execute an external command within a Mapper execution, but I didn't manage to invoke a script with either Shell.ShellCommandExecutor or ProcessBuilder. It is like it can't read from the host
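A minimal sketch of driving an external command from a map() call with plain ProcessBuilder; the script path /tmp/myscript.sh is a placeholder assumption, not something from this thread:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ShellMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // Placeholder path: the script must exist and be executable on
        // every node that may run this task.
        ProcessBuilder pb = new ProcessBuilder("/bin/bash", "/tmp/myscript.sh");
        pb.redirectErrorStream(true); // fold stderr into stdout
        Process p = pb.start();
        BufferedReader out = new BufferedReader(
            new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = out.readLine()) != null) {
          context.write(value, new Text(line)); // emit each output line
        }
        if (p.waitFor() != 0) {
          throw new IOException("script exited with non-zero status");
        }
      }
    }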

Re: How to access value of variable in Driver class which has been declared and modified inside Mapper class?

2015-05-12 Thread Shahab Yunus
Here are some examples of how to use custom counters: http://www.ashishpaliwal.com/blog/2012/05/hadoop-recipe-using-custom-java-counters/ Regards, Shahab On May 12, 2015 1:29 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Better options than using a static variable are, imo: One option is to use

How to access value of variable in Driver class which has been declared and modified inside Mapper class?

2015-05-12 Thread Answer Agrawal
Hi, I declared a variable and incremented/modified it inside the Mapper class. Now I need to use the modified value of that variable in the Driver class. I declared a static variable inside the Mapper class, and its modified value works in the Driver class when I run the code in the Eclipse IDE. But after creating

Re: How to access value of variable in Driver class which has been declared and modified inside Mapper class?

2015-05-12 Thread Shahab Yunus
Better options than using a static variable are, imo: One option is to use Counters. Check that API. We are using that for values that are numeric and we need those in the driver once the job finishes. You can create your own custom counters too. The other option is (if you need more than just one value or
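A minimal sketch of the Counters option; the enum and counter names are made up for illustration:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
      // Custom counters are declared as a plain enum.
      public enum Stats { INTERESTING_RECORDS }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // The framework aggregates this across all map tasks.
        context.getCounter(Stats.INTERESTING_RECORDS).increment(1);
        context.write(value, value);
      }
    }

    // In the Driver, after the job has finished:
    job.waitForCompletion(true);
    long n = job.getCounters()
        .findCounter(MyMapper.Stats.INTERESTING_RECORDS).getValue();
    System.out.println("Interesting records: " + n);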

Re: Pig 0.14.0 on Hadoop 2.6.0 deprecation errors

2015-05-12 Thread Prashant Kommireddi
Something that needs correction, just that no one has gotten around to doing it. Please feel free to open a JIRA, even better if you would like to contribute a fix. On Tuesday, May 12, 2015, Anand Murali anand_vi...@yahoo.com wrote: Olivier: Many thanks for the reply. If it is not an error, why is

suppress empty part files output

2015-05-12 Thread Shushant Arora
While using MultipleOutputs we use LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class); to suppress the default empty part files from reducers. What's the syntax for using this in the map-reduce action of Oozie? Or in the XML file for ToolRunner? Is
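Not an authoritative answer, but LazyOutputFormat.setOutputFormatClass() boils down to setting two configuration keys, so the XML equivalent (e.g. inside an Oozie map-reduce action's configuration block) should look roughly like this in Hadoop 2.x:

    <property>
      <name>mapreduce.job.outputformat.class</name>
      <value>org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat</value>
    </property>
    <property>
      <name>mapreduce.output.lazyoutputformat.outputformat</name>
      <value>org.apache.hadoop.mapreduce.lib.output.TextOutputFormat</value>
    </property>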

Smaller block size for more intense jobs

2015-05-12 Thread marko.dinic
Hello, I'm in doubt about whether I should specify a block size smaller than 64MB in case my mappers need to do intensive computations. I know that it is better to have larger files, since replication and the NameNode are a weak point, but I don't have that much data, and the operations

Re: distcp fails with s3n or s3a in 2.6.0

2015-05-12 Thread Stephen Armstrong
Thanks Chris, I don't know why I couldn't find that e-mail chain, but the mapreduce.application.classpath property is what I needed to change. Thanks for the help. Steve On Mon, May 11, 2015 at 9:59 PM, Chris Nauroth cnaur...@hortonworks.com wrote: Hello Steve, There was a similar
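For anyone searching the archive later: the change amounts to appending the hadoop-tools jars (which carry the s3n/s3a filesystem classes) to mapreduce.application.classpath in mapred-site.xml; a sketch, with paths that may differ per distribution:

    <property>
      <name>mapreduce.application.classpath</name>
      <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    </property>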

Re: Re: Filtering by value in Reducer

2015-05-12 Thread Drake민영근
Hi, Peter. The missing records: are they just gone, without any logs at all? How about your reduce task logs? Thanks Drake 민영근 Ph.D kt NexR On Tue, May 12, 2015 at 5:18 AM, Peter Ruch rutschifen...@gmail.com wrote: Hello, sum and threshold are both Integers. For the threshold variable I first add

Re: Question about Block size configuration

2015-05-12 Thread Drake민영근
Hi, I think the metadata size per block is not greatly different. The problem is the number of blocks: if the block size is less than 64MB, more blocks are generated for the same amount of data (with 32MB blocks, 2x more). And, yes, all metadata is kept in the NameNode's heap memory. Thanks. Drake 민영근 Ph.D kt NexR On
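To attach rough numbers (a commonly cited rule of thumb, not a figure from this thread): each file, directory, and block object costs on the order of 150 bytes of NameNode heap. So 1 TB of data stored as 64MB blocks is 16,384 blocks, roughly 2.5 MB of heap for the block objects, while the same terabyte at 32MB blocks is 32,768 blocks, roughly 5 MB: the per-object cost barely changes, but the object count doubles.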

output directory in Pig

2015-05-12 Thread Anand Murali
Dear All: I am running Pig 0.14.0 on Hadoop 2.6 in pseudo mode. I would like to know where I can set the job output path, such that I can manage output files. Reply most welcome. Thanks, Regards, Anand Murali

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Shahab Yunus
getLocalCacheFiles is deprecated and can only access files that were downloaded locally to the node running the task. Use of getCacheFiles, which resolves the cached files via their URIs, is encouraged now. Have you seen this?
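A minimal sketch of the encouraged pattern; the key/value types (Text/IntWritable) and the single cache file are assumptions for illustration:

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CacheReadingMapper extends Mapper<Text, Text, Text, Text> {
      @Override
      protected void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        // URIs as registered with job.addCacheFile(...) in the driver.
        URI[] cacheFiles = context.getCacheFiles();
        SequenceFile.Reader reader = new SequenceFile.Reader(
            conf, SequenceFile.Reader.file(new Path(cacheFiles[0])));
        Text key = new Text();
        IntWritable value = new IntWritable();
        while (reader.next(key, value)) {
          // ... load the side data into memory ...
        }
        reader.close();
      }
    }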

Lost mapreduce applications displayed in UI

2015-05-12 Thread hitarth trivedi
Hi, My cluster suddenly stopped displaying application information in the UI (http://localhost:8088/cluster/apps), although counters like 'Apps Submitted', 'Apps Completed', 'Apps Running' etc. all seem to increment accurately and display the right information whenever I start a new mapreduce job.

Re: Lost mapreduce applications displayed in UI

2015-05-12 Thread Zhijie Shen
Maybe you have hit the completed app limit (10,000 by default). Once the limit is hit, the oldest completed app will be removed from the cache. - Zhijie From: hitarth trivedi t.hita...@gmail.com Sent: Tuesday, May 12, 2015 3:32 PM To: user@hadoop.apache.org Subject:
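If that is the cause, the cache size is governed by a yarn-site.xml property; a sketch, with the value as an arbitrary example:

    <property>
      <name>yarn.resourcemanager.max-completed-applications</name>
      <value>50000</value>
    </property>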

Re: Re: Re: Filtering by value in Reducer

2015-05-12 Thread Peter Ruch
Hi, I already skimmed through the logs but I could not find anything special. I am just really confused about why I am having this problem. If the Iterable... for a specific key contains all of the observed values - and it seems to do so, otherwise the program wouldn't work correctly in the standard

Re: output directory in Pig

2015-05-12 Thread Ted Yu
Looks like a question for the Pig mailing list: http://pig.apache.org/mailing_lists.html#Users Cheers On May 12, 2015, at 4:14 AM, Anand Murali anand_vi...@yahoo.com wrote: Dear All: I am running Pig 0.14.0 on Hadoop 2.6 in pseudo mode. I would like to know where I can set the job output path,

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Marko Dinic
Dear Shahab, Thanks, I didn't understand that. Now I get it. Best regards, Marko On Tue 12 May 2015 01:38:52 PM CEST, Shahab Yunus wrote: getLocalCacheFiles is deprecated and can only access files that were downloaded locally to the node running the task. Use of getCacheFiles is encouraged

Re: Re: Re: Filtering by value in Reducer

2015-05-12 Thread Shahab Yunus
Have you tried explicitly printing or logging in your reducer around the code that compares and then outputs the values? Maybe that will give you a clue as to what is happening. Debug the threshold value that you get in the reducer and check whether it is what you set or not (in case of when you

Re: Smaller block size for more intense jobs

2015-05-12 Thread Harshit Mathur
Hi Marko, If your files are very small (less than the block size), then a lot of map tasks will get executed, but the initialization and scheduling overhead degrades overall performance: each single map may appear to execute very fast, yet the overall job execution will take more time.
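An alternative worth considering before shrinking the HDFS block size: keep the blocks as they are and cap the input split size in the driver, so more map tasks share the compute-heavy work. A sketch, where the 16MB cap is an arbitrary example and job is an existing org.apache.hadoop.mapreduce.Job:

    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    // Cap each split at 16 MB, so a 64 MB block yields four map tasks.
    FileInputFormat.setMaxInputSplitSize(job, 16L * 1024 * 1024);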

Cannot initialize cluster issue - Why is the jobclient-tests jar needed?

2015-05-12 Thread rab ra
Hello, In one of my use cases, I am running a Hadoop job using the following command: java -cp /etc/hadoop/conf myjob.class. This command gave an error: cannot initialize cluster, please check the configuration for mapreduce.framework.name and the correspond server address. I understand that
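One likely cause, assuming the usual reading of that error: bare java -cp /etc/hadoop/conf puts only the configuration on the classpath, not the MapReduce client jars (which is also why the jobclient jars come into play). A sketch of a launch command that pulls in the full Hadoop classpath, with MyJob and myjob.jar as placeholders:

    java -cp "/etc/hadoop/conf:$(hadoop classpath):myjob.jar" MyJob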

Re: Re: Re: Re: Filtering by value in Reducer

2015-05-12 Thread Peter Ruch
Hi, No, I did not create any custom logs; I was only looking through the standard logs. I just started out with Hadoop and did not think of explicitly logging that part of the code, as I thought I was simply missing a small detail that someone of you might spot. But I will definitely

Re: URI missing scheme and authority in job start with new FileSystem implementation

2015-05-12 Thread Silvan Kaiser
Hi Varun, hi List! Just a small success feedback note: it took me quite a while, but in the end I found out that not my resolvePath() method but AbstractFileSystem.java's was being used, sigh. The solution was simply to add an override in the DelegateToFileSystem impl; this override explicitly calls

RE: Lost mapreduce applications displayed in UI

2015-05-12 Thread Rohith Sharma K S
Hi, Do you remember the steps after which applications stopped being displayed in the RM web UI? I mean, after which actions in the RM web UI are applications not displayed? Is there any filtering applied in the UI, like "Showing 0 to 0 of 0 entries (filtered from 4 total entries)" at the bottom of the RM

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Marko Dinic
Hello, I have used getCacheFiles() instead of getLocalCacheFiles() and now it works. Can someone please explain the difference between the two? I'm not able to find a good explanation of it to understand how it works. Thanks, Marko On 05/11/2015 11:25 PM, marko.di...@nissatech.com