Steps for container release

2015-02-20 Thread Fabio C.
Hi everyone, I was trying to understand the process that makes the resources of a container available again to the ResourceManager. As far as I can guess from the logs, the AM: - sends a stop request to the NodeManager for the specific container - immediately tells the RM about the release of the
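The two AM-side steps guessed above can be sketched as a toy Python model. Everything in it is hypothetical: the class and method names are illustrative and are not YARN's. In the real Java client API the corresponding calls are `NMClient.stopContainer(ContainerId, NodeId)` and `AMRMClient.releaseAssignedContainer(ContainerId)`. The model only shows the order of the two messages and when the freed memory returns to the RM's pool.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy RM bookkeeping (hypothetical): free memory plus memory held per container."""
    free_mb: int
    allocated: dict = field(default_factory=dict)  # container_id -> memory in MB

    def allocate(self, container_id, mb):
        if mb > self.free_mb:
            raise ValueError("not enough free memory")
        self.free_mb -= mb
        self.allocated[container_id] = mb

    def release(self, container_id):
        # In this toy model, resources return to the pool when the AM reports the release.
        self.free_mb += self.allocated.pop(container_id)


@dataclass
class NodeManager:
    """Toy NM (hypothetical): only tracks which containers run on the node."""
    running: set = field(default_factory=set)

    def stop_container(self, container_id):
        self.running.discard(container_id)


def release_container(container_id, nm, rm):
    """AM side, in the order guessed from the logs."""
    nm.stop_container(container_id)  # 1. stop request to the NodeManager
    rm.release(container_id)         # 2. tell the RM the container is released
```

For example, allocating a 1024 MB container out of 4096 MB and then calling `release_container` brings `rm.free_mb` back to 4096.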

Re: Prune out data to a specific reduce task

2015-03-11 Thread Fabio C.
As far as I know, the code running in each reducer is the same code you specify in your reduce function, so if you know in advance the features of the data you want to ignore, you can just instruct the reducers to do so. If you are able to tell whether or not to keep an entry at the beginning, you can filter
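A minimal local sketch of filtering early, in the style of a Hadoop Streaming job written in Python. The `keep` predicate (drop negative values) is a hypothetical example, and `run_job` simulates map, shuffle/sort and reduce in memory rather than on a cluster. Dropping entries in the mapper means they are never shuffled to any reducer; the same `keep` check could instead be placed at the top of the reducer loop.

```python
from itertools import groupby
from operator import itemgetter


def keep(key, value):
    """Hypothetical predicate: ignore entries with a negative value."""
    return value >= 0


def mapper(records):
    # Filter as early as possible so unwanted entries never reach a reducer.
    for key, value in records:
        if keep(key, value):
            yield key, value


def reducer(sorted_pairs):
    # Input must be sorted by key, as the shuffle phase guarantees on a real cluster.
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def run_job(records):
    """Local stand-in for map -> shuffle/sort -> reduce."""
    shuffled = sorted(mapper(records), key=itemgetter(0))
    return dict(reducer(shuffled))
```

For instance, `run_job([("a", 1), ("b", -5), ("a", 2), ("b", 3)])` gives `{"a": 3, "b": 3}`, because the `("b", -5)` entry is dropped before the shuffle.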

Re: hadoop learning

2015-02-21 Thread Fabio C.
Hi Rishabh, I didn't know anything about Hadoop a few months ago, and I started from the very beginning. I wouldn't suggest starting with the online documentation, which is always fragmented, incomplete and sometimes not even up to date. Also, starting by directly using Hadoop is the fastest way to

Can RM ignore heartbeats?

2015-02-24 Thread Fabio C.
Hi everyone, I have a question about the ResourceManager's behavior: when the ResourceManager allocates a container, it takes some time before the NMToken is sent and then received by the ApplicationMaster. During this time, the RM may receive another heartbeat from the AM, equal to the last

Re: Question about log files

2015-04-06 Thread Fabio C.
I noticed that too. I think Hadoop keeps the file open all the time, so when you delete it, it is simply no longer able to write to it and doesn't try to recreate it. Not sure if it's a Log4j problem or a Hadoop one... yanghaogn, which is the *correct* way to delete the Hadoop logs? I didn't find
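On POSIX systems (Linux, macOS) this matches how file deletion works: deleting a file only removes its name, and a process that still holds it open keeps writing to the now-nameless file without any error, so nothing recreates the log. The sketch below uses plain Python file I/O, not Log4j, to show the effect; the function name and path are illustrative. On Windows the `os.remove` call would typically fail instead, because the file is still open.

```python
import os
import tempfile


def write_after_delete(path):
    """Open a log file, delete it, keep writing; report whether the file reappears."""
    with open(path, "w") as log:
        log.write("before delete\n")
        log.flush()
        os.remove(path)               # removes the name; the open handle stays valid on POSIX
        log.write("after delete\n")   # no error, but the data goes to an unlinked file
        log.flush()
        return os.path.exists(path)   # the writer never recreates the name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_after_delete(os.path.join(tmp, "app.log")))  # False on POSIX
```

The disk space of the deleted file is reclaimed only once the process closes its handle, for example when the daemon is restarted.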