Re: Job Tracker TaskTracker logs location Understanding

2013-01-28 Thread Jeff Bean
Give this a read, and let me know if there are still questions: http://blog.cloudera.com/blog/2009/09/apache-hadoop-log-files-where-to-find-them-in-cdh-and-what-info-they-contain/ On Mon, Jan 28, 2013 at 3:56 AM, Dhanasekaran Anbalagan bugcy...@gmail.com wrote: Hi Guys, How to understand

Re: Query: Hadoop's threat to Informatica

2013-01-18 Thread Jeff Bean
Informatica's take on the question: http://www.informatica.com/hadoop/ My take on the question: Hadoop is definitely disruptive, and there have been times when we've been able to blow missed data pipeline SLAs out of the water using Hadoop where tools like Informatica were not able to. But

Re: Fair Scheduler is not Fair why?

2013-01-17 Thread Jeff Bean
it. On Wed, Jan 16, 2013 at 12:02 PM, Jeff Bean jwfb...@cloudera.com wrote: Validate your scheduler capacity and behavior by using sleep jobs. Submit sleep jobs to the pools that mirror your production jobs and just check that the scheduler pool allocation behaves as you expect. The nice thing about

Re: When reduce tasks start in MapReduce Streaming?

2013-01-16 Thread Jeff Bean
a streaming application (the reducers don't receive data while it is being produced by the map tasks)? On 16 January 2013 05:41, Jeff Bean jwfb...@cloudera.com wrote: same property. The reduce method is not called until the mappers are done, and the reducers are not scheduled before

Re: Fair Scheduler is not Fair why?

2013-01-16 Thread Jeff Bean
Validate your scheduler capacity and behavior by using sleep jobs. Submit sleep jobs to the pools that mirror your production jobs and just check that the scheduler pool allocation behaves as you expect. The nice thing about sleep is that you can mimic your real jobs: numbers of tasks and how long
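
A quick sketch of what such a submission could look like, assuming a cluster where jobs can be assigned to pools with the mapred.fairscheduler.pool property (otherwise use whatever property mapred.fairscheduler.poolnameproperty points at); the pool name, jar path, and task counts below are illustrative placeholders:

  # Sleep job mimicking a production job in the "etl" pool: 50 maps and 10 reduces,
  # each holding its slot for 60 seconds. The examples jar path varies by install/version.
  hadoop jar /usr/lib/hadoop/hadoop-examples.jar sleep \
    -Dmapred.fairscheduler.pool=etl \
    -m 50 -r 10 -mt 60000 -rt 60000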

Re: A simple question - how to understand version number in Hadoop-0.20.2+923.421 ?

2013-01-16 Thread Jeff Bean
This is a Cloudera release level. Specifically, it's CDH3, update 5. It means we've taken hadoop-0.20.2, added 923 additional patches during the CDH3 beta, with 421 patches applied after the stable CDH3 release. https://ccp.cloudera.com/display/DOC/CDH+Version+and+Packaging+Information A dot (.)

Re: When reduce tasks start in MapReduce Streaming?

2013-01-15 Thread Jeff Bean
Hi Pedro, Yes, Hadoop Streaming has the same property. The reduce method is not called until the mappers are done, and the reducers are not scheduled before the threshold set by mapred.reduce.slowstart.completed.maps is reached. On Tue, Jan 15, 2013 at 3:06 PM, Pedro Sá da Costa
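
A sketch of overriding that threshold on a streaming job, assuming it is launched through the standard streaming jar (jar path, input/output paths, and the 0.80 value are illustrative; the shipped default for this property is 0.05):

  # Don't schedule reducers until 80% of the maps have finished.
  # The streaming jar name/path varies by version and install.
  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -Dmapred.reduce.slowstart.completed.maps=0.80 \
    -input /data/in -output /data/out \
    -mapper /bin/cat -reducer /usr/bin/wc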

Re: Profiler in Hadoop MapReduce

2012-12-30 Thread Jeff Bean
Hi Pedro, Have you read the documentation on profiling MapReduce? http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html#Profiling Jeff On Sat, Dec 15, 2012 at 7:20 AM, Pedro Sá da Costa psdc1...@gmail.com wrote: Hi I want to attach jprofiler to Hadoop MapReduce (MR). Do I need to
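
For reference, the properties that section describes can also be passed on the command line when the job uses ToolRunner; a sketch with an illustrative driver class and paths (see the linked tutorial for where the profile output ends up):

  # Profile the first two map and reduce task attempts with the built-in HPROF support.
  hadoop jar my-job.jar MyDriver \
    -Dmapred.task.profile=true \
    -Dmapred.task.profile.maps=0-1 \
    -Dmapred.task.profile.reduces=0-1 \
    input_dir output_dir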

Re: Fixing Mis-replicated blocks

2011-10-20 Thread Jeff Bean
Do setrep -w on the increase to force the new replica into place before decreasing again. Of course, the little script only works if the replication factor is 3 on all the files. If it varies, you should use the Java API to get the existing factor for each file, then increase it by one and decrease it again.
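
For the simple all-replication-3 case, a sketch of that increase-then-decrease (the path is a placeholder, and this is only valid if every file under it really starts at replication 3):

  # Raise replication to 4 and wait until the extra replicas are placed,
  # then drop back to 3 so the namenode prunes an excess replica.
  hadoop fs -setrep -R -w 4 /data
  hadoop fs -setrep -R 3 /data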

Re: About the combiner execution

2011-07-10 Thread Jeff Bean
Yes, this is true. The combiner may never run if intermediate values don't need to be shuffled out to disk before the final output is done. Also, a combiner can't be substituted for a reducer. Sent from my iPad On Jul 10, 2011, at 4:42, Florin P florinp...@yahoo.com wrote: Hello! I've read on

Re: When does Reduce job start

2011-01-04 Thread Jeff Bean
It's part of the design that reduce() does not get called until the map phase is complete. You're seeing reduce reported as started when the map phase is at 90% complete because Hadoop is shuffling data from the mappers that have completed. As currently designed, you can't prematurely start reduce() because

Re: MapFiles error Could not obtain block

2010-11-18 Thread Jeff Bean
Hi Kim, I saw this problem once, turned out the block was getting deleted before it was read. Check namenode for blk_-7027776556206952935_61338. What's the story there? Jeff On Thu, Nov 18, 2010 at 12:45 PM, Kim Vogt k...@simplegeo.com wrote: Hi, I'm using the MapFileOutputFormat to lookup
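
A couple of places to look for that block's history (the fsck path and the log location are placeholders; both vary by install):

  # Which file owns the block, and where its replicas are reported:
  hadoop fsck / -files -blocks -locations | grep blk_-7027776556206952935

  # When the namenode allocated, replicated, or deleted it:
  grep blk_-7027776556206952935 /var/log/hadoop/*namenode*.log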

Re: string conversion problems

2010-07-16 Thread Jeff Bean
Is the tab the delimiter between records or between keys and values on the input? In other words, does the input file look like this: "a\tb b\tc c\ta", or does it look like this: "a b\tb c\tc a\t"? Jeff On Thu, Jul 15, 2010 at 6:18 PM, Nikolay Korovaiko korovai...@gmail.com wrote: Hi

Re: string conversion problems

2010-07-16 Thread Jeff Bean
, Jul 16, 2010 at 9:16 AM, Jeff Bean jwfb...@cloudera.com wrote: Is the tab the delimiter between records or between keys and values on the input? in other words does the input file look like this: a\tb b\tc c\ta or does it look like this: a b\tb c\tc a\t ? Jeff

Re: calling C programs from Hadoop

2010-05-31 Thread Jeff Bean
Hi Michael, Why did you determine that Hadoop Streaming was insufficient for you? Jeff On Mon, May 31, 2010 at 9:17 AM, Michael Robinson hadoopmich...@gmail.com wrote: Hi Jeff, I have a C program that processes very large data files which are compressed, so this program has to have full

Re: calling C programs from Hadoop

2010-05-30 Thread Jeff Bean
Hi Michael, How come you can't specify the C program as the mapper in streaming and just have no reducers? Jeff On Sat, May 29, 2010 at 6:14 PM, Michael Robinson hadoopmich...@gmail.com wrote: Thanks for your answers. I have read about Hadoop Streaming and I think it is great; however, what I am
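
If the C program can read its records on stdin and write results to stdout, a map-only streaming job along these lines would do it (jar path, binary name, and HDFS paths are placeholders):

  # Ship the compiled binary with -file, use it as the mapper, and run zero reducers
  # so the mapper output goes straight to HDFS.
  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -Dmapred.reduce.tasks=0 \
    -input /data/compressed-input \
    -output /data/c-output \
    -mapper process_records \
    -file process_records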