Re: How to process part of a file in Hadoop?

2014-02-07 Thread Harsh J
You can write a custom InputFormat whose #getSplits(...) returns your required InputSplit objects (with randomised offsets + lengths, etc.). On Fri, Feb 7, 2014 at 9:50 PM, Suresh S wrote: > Dear Friends, > > I have some very large file in HDFS with 3000+ blocks. > > I want run a job wi
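
The selection step behind that suggestion can be sketched in plain Java, with no Hadoop dependency so that it runs on its own. Everything named here (`RandomSplitPicker`, `Split`, `pickSplits`, the seed parameter) is hypothetical and not from the thread. It assumes block-aligned splits and a known file length and block size. In a real job the same ranges would presumably be wrapped in `FileSplit` objects and returned from an overridden `getSplits(JobContext)` in a `FileInputFormat` subclass.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomSplitPicker {

    /** Hypothetical stand-in for an InputSplit: one byte range of the file. */
    public static final class Split {
        public final long offset;
        public final long length;

        public Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Picks numSplits distinct, block-aligned byte ranges at random.
     * The seed only makes the choice reproducible (an assumption of this sketch).
     */
    public static List<Split> pickSplits(long fileLength, long blockSize,
                                         int numSplits, long seed) {
        if (fileLength <= 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength and blockSize must be positive");
        }
        long numBlocks = (fileLength + blockSize - 1) / blockSize; // ceiling division
        if (numSplits < 0 || numSplits > numBlocks) {
            throw new IllegalArgumentException("numSplits must be between 0 and " + numBlocks);
        }

        // Shuffle all block indexes, then keep the first numSplits of them.
        List<Long> blockIndexes = new ArrayList<>();
        for (long i = 0; i < numBlocks; i++) {
            blockIndexes.add(i);
        }
        Collections.shuffle(blockIndexes, new Random(seed));
        List<Long> chosen = new ArrayList<>(blockIndexes.subList(0, numSplits));
        Collections.sort(chosen); // return the ranges in file order

        List<Split> splits = new ArrayList<>();
        for (long index : chosen) {
            long offset = index * blockSize;
            // The last block of the file may be shorter than blockSize.
            long length = Math.min(blockSize, fileLength - offset);
            splits.add(new Split(offset, length));
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // assumed HDFS block size
        long fileLength = 3000L * blockSize;  // roughly the 3000-block file from the question
        for (Split s : pickSplits(fileLength, blockSize, 2, 42L)) {
            System.out.println("offset=" + s.offset + " length=" + s.length);
        }
    }
}
```

With two splits requested, the job would start two map tasks, each reading one randomly chosen block. Changing `numSplits` changes the input size without touching the file.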

[jira] [Created] (MAPREDUCE-5747) Potential null pointer deference in HsTasksBlock#render()

2014-02-07 Thread Ted Yu (JIRA)
Ted Yu created MAPREDUCE-5747:
Summary: Potential null pointer deference in HsTasksBlock#render()
Key: MAPREDUCE-5747
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5747
Project: Hadoop Map/Reduce

[jira] [Resolved] (MAPREDUCE-3469) Port to 0.22 - Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2014-02-07 Thread Konstantin Shvachko (JIRA)
[ https://issues.apache.org/jira/browse/MAPREDUCE-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved MAPREDUCE-3469. Resolution: Duplicate > Port to 0.22 - Implement limits on per-job JobC

[jira] [Created] (MAPREDUCE-5746) Job diagnostics can implicate wrong task for a failed job

2014-02-07 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-5746:
Summary: Job diagnostics can implicate wrong task for a failed job
Key: MAPREDUCE-5746
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5746
Project: Hadoop Map/

Re: Re-swizzle 2.3

2014-02-07 Thread Vinod Kumar Vavilapalli
Here's what I've done:
- Reverted YARN-1493, YARN-1490, YARN-1041, YARN-1166, YARN-1566, YARN-1689, YARN-1661 from branch-2.3.
- Updated YARN's CHANGES.txt in trunk, branch-2 and branch-2.3.
- Updated these JIRAs to have 2.4 as the fix-version.
- Compiled branch-2.3.
Let me know if you run into any

Re: Re-swizzle 2.3

2014-02-07 Thread Vinod Kumar Vavilapalli
Haven't heard back from Jian. Reverting the set from branch-2.3 (only). Tx for the offline list. +Vinod On Fri, Feb 7, 2014 at 9:08 AM, Alejandro Abdelnur wrote: > Vinod, I have the patches to revert most of the JIRAs, the first batch, > I'll send them off line to you. > > Thanks. > > > On Thu,

Re: Re-swizzle 2.3

2014-02-07 Thread Alejandro Abdelnur
Sure, as Sandy said, let's keep it in branch-2 for now, and if not resolved by the 2.4 timeframe we'll revert them there. thx Alejandro (phone typing) > On Feb 7, 2014, at 10:14, Steve Loughran wrote: > >> On 6 February 2014 17:07, Alejandro Abdelnur wrote: >> >> Thanks Robert, >> >> All, >> >>

Re: Re-swizzle 2.3

2014-02-07 Thread Steve Loughran
On 6 February 2014 17:07, Alejandro Abdelnur wrote: > Thanks Robert, > > All, > > > > I'm inclined to revert them from branch-2 as well. > > -1 to that; if there are issues we should be able to find and fix them soon enough. Even if you aren't doing long-lived YARN services yet, even llama benefi

Re: Re-swizzle 2.3

2014-02-07 Thread Alejandro Abdelnur
Vinod, I have the patches to revert most of the JIRAs, the first batch, I'll send them off line to you. Thanks. On Thu, Feb 6, 2014 at 8:56 PM, Vinod Kumar Vavilapalli wrote: > > Thanks. please post your findings, Jian wrote this part of the code and > between him/me, we can take care of those

How to process part of a file in Hadoop?

2014-02-07 Thread Suresh S
Dear Friends, I have a very large file in HDFS with 3000+ blocks. I want to run a job with various input sizes, using the same file as the input each time. Usually the number of tasks equals the number of blocks/splits. Suppose a job with 2 tasks needs to process any two randomly chosen blocks of th