oom during shuffle in 2.0.4

2013-05-06 Thread Radim Kolar
On 6.5.2013 17:09, Jason Lowe wrote: This may be related to MAPREDUCE-5168. There's a memory leak of sorts in the shuffle if many map outputs end up being merged from disk. My case is different: an in-memory merge of 281 MB. These are fir

oom during shuffle

2013-05-05 Thread Radim Kolar
I retested it, and the OOM error also occurs in 2.0.3, but not in 1.x. How much memory does a reducer need during shuffle and sort? Is peak usage bounded by memoryLimit, or is additional memory needed for the mapred.reduce.parallel.copies fetch buffers? MergerManager: memoryLimit=326264416, maxSingleShuffleL
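The values in that log line can be related to the reducer heap with some arithmetic. A minimal sketch, assuming the Hadoop 2.x defaults from mapred-default.xml (mapreduce.reduce.shuffle.input.buffer.percent = 0.70, mapreduce.reduce.shuffle.memory.limit.percent = 0.25, mapreduce.reduce.shuffle.merge.percent = 0.66); verify them for your exact version. The class ShuffleMemory and its methods are hypothetical, not Hadoop API.

```java
// Hypothetical helper reproducing the reducer-side shuffle memory arithmetic.
// All percentages are ASSUMED Hadoop 2.x defaults; check mapred-default.xml.
final class ShuffleMemory {
    // Assumed default of mapreduce.reduce.shuffle.input.buffer.percent
    static final double INPUT_BUFFER_PERCENT = 0.70;
    // Assumed default of mapreduce.reduce.shuffle.memory.limit.percent
    static final double SINGLE_SHUFFLE_PERCENT = 0.25;
    // Assumed default of mapreduce.reduce.shuffle.merge.percent
    static final double MERGE_PERCENT = 0.66;

    private ShuffleMemory() {}

    /** Memory the reducer may use to hold fetched map outputs in RAM. */
    static long memoryLimit(long maxHeapBytes) {
        return (long) (maxHeapBytes * INPUT_BUFFER_PERCENT);
    }

    /** Largest single map output kept in memory; larger ones are shuffled to disk. */
    static long maxSingleShuffleLimit(long memoryLimit) {
        return (long) (memoryLimit * SINGLE_SHUFFLE_PERCENT);
    }

    /** In-memory usage at which an in-memory merge is started. */
    static long mergeThreshold(long memoryLimit) {
        return (long) (memoryLimit * MERGE_PERCENT);
    }
}
```

With the logged memoryLimit=326264416, this gives a single-output limit of 81566104 bytes (about 78 MiB) and an in-memory merge threshold of 215334514 bytes. The sketch does not model how fetchers reserve memory, which is exactly the open question about memory needed beyond memoryLimit.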

[jira] [Created] (MAPREDUCE-5209) ShuffleScheduler log message incorrect

2013-05-05 Thread Radim Kolar (JIRA)
Radim Kolar created MAPREDUCE-5209: -- Summary: ShuffleScheduler log message incorrect Key: MAPREDUCE-5209 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5209 Project: Hadoop Map/Reduce

2.0.4 beta vs 2.0.3

2013-05-04 Thread Radim Kolar
After the upgrade I am getting out-of-heap-space errors during shuffle. I am using compressed mapper outputs and 200 MB sort buffers. Was something important changed, for example allocating 200 MB * number of fetchers now? 2013-05-04 04:02:10,209 WARN [main] org.apache.hadoop.mapred.YarnChild: Excep

Re: JVM vs container memory configs

2013-05-04 Thread Radim Kolar
While looking into MAPREDUCE-5207 (adding defaults for mapreduce.{map|reduce}.memory.mb), I was wondering how much headroom should be left on top of mapred.child.java.opts (or other similar JVM opts) for the container memory itself? I would like to have separate Java option settings for map/red

MAPREDUCE-4594

2013-04-25 Thread Radim Kolar
I really need to get this committed because my patch to Cascading depends on it. https://issues.apache.org/jira/browse/MAPREDUCE-4594

Re: combiner without reducer

2013-04-17 Thread Radim Kolar
On 16.4.2013 11:08, Arpan Rajani wrote: Radim, is it not happening if you set reducers = 0? If you set reducers = 0, then the class set as the combiner is ignored. My proposal is that if you set reducers = 0 and a combiner, then the combiner is run. No API change is needed.

[jira] [Created] (MAPREDUCE-5153) Support for running combiners without reducers

2013-04-16 Thread Radim Kolar (JIRA)
Radim Kolar created MAPREDUCE-5153: -- Summary: Support for running combiners without reducers Key: MAPREDUCE-5153 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5153 Project: Hadoop Map/Reduce

combiner without reducer

2013-04-15 Thread Radim Kolar
I need to run a combiner without reducers. Workflow: mapper -> sort -> combiner -> HDFS. Do you think you will ever support that scenario? It's quite common, because a few Hadoop frameworks support it, but they emulate it in the mapper by caching results in memory.
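The emulation mentioned above, caching results in memory inside the mapper, is usually called in-mapper combining. A minimal plain-Java sketch, assuming word-count-style sums; InMapperCombiner and its methods are hypothetical and deliberately avoid the Hadoop API so the example is self-contained. In a real Mapper, the final flush would happen in cleanup().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical in-mapper combiner: partial sums are cached in memory and
// emitted at the end, or earlier if the cache reaches maxEntries.
final class InMapperCombiner {
    private final Map<String, Long> cache = new HashMap<>();
    private final int maxEntries;
    private final BiConsumer<String, Long> emit;

    InMapperCombiner(int maxEntries, BiConsumer<String, Long> emit) {
        this.maxEntries = maxEntries;
        this.emit = emit;
    }

    /** Called once per input record, like Mapper.map(). */
    void map(String key, long value) {
        cache.merge(key, value, Long::sum);
        if (cache.size() >= maxEntries) {
            flush(); // bound memory use instead of caching everything
        }
    }

    /** Called once at the end, like Mapper.cleanup(). */
    void cleanup() {
        flush();
    }

    private void flush() {
        // Emit in key order, mimicking mapper -> sort -> combiner.
        new TreeMap<>(cache).forEach(emit);
        cache.clear();
    }
}
```

Flushing early bounds memory but means a key can be emitted more than once, so the output is only partially combined; a framework-level combiner gives the same kind of guarantee.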

partitioner with lifecycle

2013-01-01 Thread Radim Kolar
Is there sufficient interest in a partitioner with a lifecycle? https://issues.apache.org/jira/browse/MAPREDUCE-4594

[jira] [Resolved] (MAPREDUCE-4256) Improve resource scheduling

2013-01-01 Thread Radim Kolar (JIRA)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar resolved MAPREDUCE-4256. Resolution: Duplicate replaced by YARN-2 > Improve resou

[jira] [Resolved] (MAPREDUCE-4851) add lifecycle to Comparators

2013-01-01 Thread Radim Kolar (JIRA)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar resolved MAPREDUCE-4851. Resolution: Not A Problem > add lifecycle to Comparat

[jira] [Resolved] (MAPREDUCE-3772) MultipleOutputs output lost if baseOutputPath starts with ../

2013-01-01 Thread Radim Kolar (JIRA)
[ https://issues.apache.org/jira/browse/MAPREDUCE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar resolved MAPREDUCE-3772. Resolution: Invalid Not a bug, but user error. > MultipleOutp

comparators with lifecycle

2012-12-18 Thread Radim Kolar
I plan to implement comparators with a lifecycle and make them backward compatible. Plan: create class LifecycleComparator implements Comparator; after a new instance of a comparator is created, check whether it is a subclass of LifecycleComparator and, if so, invoke its init/shutdown methods. https://i
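The plan above can be sketched in plain Java. LifecycleComparator is the name from the plan; ComparatorFactory, ResourceBackedComparator and the method bodies are hypothetical illustrations, not the MAPREDUCE-4851 patch. Plain Comparators pass through untouched, which is what keeps the scheme backward compatible.

```java
import java.util.Comparator;

// Base class from the plan: comparators that need setup/teardown extend it.
abstract class LifecycleComparator<T> implements Comparator<T> {
    /** Called once after the framework instantiates the comparator. */
    public void init() {}

    /** Called once when the framework is done with the comparator. */
    public void shutdown() {}
}

// Hypothetical framework-side helper performing the instanceof check.
final class ComparatorFactory {
    private ComparatorFactory() {}

    static <T> Comparator<T> create(Class<? extends Comparator<T>> cls) {
        try {
            Comparator<T> cmp = cls.getDeclaredConstructor().newInstance();
            if (cmp instanceof LifecycleComparator) {
                ((LifecycleComparator<?>) cmp).init(); // only lifecycle-aware comparators
            }
            return cmp;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + cls, e);
        }
    }

    static void release(Comparator<?> cmp) {
        if (cmp instanceof LifecycleComparator) {
            ((LifecycleComparator<?>) cmp).shutdown();
        }
    }
}

// Hypothetical example: a comparator that needs an external resource (simulated).
final class ResourceBackedComparator extends LifecycleComparator<String> {
    boolean open;

    @Override public void init() { open = true; }
    @Override public void shutdown() { open = false; }

    @Override public int compare(String a, String b) {
        if (!open) throw new IllegalStateException("init() was not called");
        return a.compareTo(b);
    }
}
```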

[jira] [Created] (MAPREDUCE-4887) Rehashing partitioner for better distribution

2012-12-17 Thread Radim Kolar (JIRA)
Radim Kolar created MAPREDUCE-4887: -- Summary: Rehashing partitioner for better distribution Key: MAPREDUCE-4887 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4887 Project: Hadoop Map/Reduce
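Hadoop's HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so keys whose hash codes are congruent modulo the reducer count (for example, all multiples of it) land on the same partition. A rehashing partitioner mixes the hash bits first. A plain-Java sketch; the mixing step here is the MurmurHash3 32-bit finalizer (fmix32), chosen as an example and not necessarily what MAPREDUCE-4887 proposes. The class Partitioning is hypothetical.

```java
// Sketch: plain hash partitioning versus partitioning after a bit-mixing step.
final class Partitioning {
    private Partitioning() {}

    /** Same formula as Hadoop's HashPartitioner. */
    static int plain(int hashCode, int numPartitions) {
        return (hashCode & Integer.MAX_VALUE) % numPartitions;
    }

    /** Rehash with the MurmurHash3 fmix32 finalizer, then partition. */
    static int rehashed(int hashCode, int numPartitions) {
        return (fmix32(hashCode) & Integer.MAX_VALUE) % numPartitions;
    }

    /** MurmurHash3 32-bit finalizer: spreads every input bit over the output. */
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Counts how many of the given hash codes land in each partition. */
    static int[] histogram(int[] hashCodes, int numPartitions, boolean rehash) {
        int[] counts = new int[numPartitions];
        for (int h : hashCodes) {
            counts[rehash ? rehashed(h, numPartitions) : plain(h, numPartitions)]++;
        }
        return counts;
    }
}
```

With 1,000 keys whose hash codes are all multiples of 8 and 8 reducers, the plain formula sends every key to partition 0, while the rehashed variant spreads them over several partitions.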

Re: How to compile the JobTracker.java

2012-12-17 Thread Radim Kolar
I want to customize the JobTracker.java, but I am not getting the source code on Eclipse. It does not build in Eclipse out of the box.

Re: One output file per node

2012-12-13 Thread Radim Kolar
If you have strong data-locality demands, then try Peregrine (http://peregrine_mapreduce.bitbucket.org/). It's 2x faster than Hadoop for multipass job types. It also has very fast node recovery. I plan to do this for HDFS; the concept is similar to "virtual nodes". It's not Hadoop or HDFS compatible and it has n

Re: One output file per node

2012-12-12 Thread Radim Kolar
You need a custom OutputCommitter.

[jira] [Created] (MAPREDUCE-4851) add lifecycle to Comparators

2012-12-06 Thread Radim Kolar (JIRA)
Radim Kolar created MAPREDUCE-4851: -- Summary: add lifecycle to Comparators Key: MAPREDUCE-4851 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4851 Project: Hadoop Map/Reduce Issue

[jira] [Created] (MAPREDUCE-4839) TextPartioner for hashing Text with good hashing function to get better distribution

2012-12-01 Thread Radim Kolar (JIRA)
Radim Kolar created MAPREDUCE-4839: -- Summary: TextPartioner for hashing Text with good hashing function to get better distribution Key: MAPREDUCE-4839 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4839

[jira] [Created] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

2012-11-28 Thread Radim Kolar (JIRA)
Radim Kolar created MAPREDUCE-4827: -- Summary: Increase hash quality of HashPartitioner Key: MAPREDUCE-4827 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827 Project: Hadoop Map/Reduce

[jira] [Resolved] (MAPREDUCE-4509) Make link in "Aggregation is not enabled. Try the nodemanager at"

2012-11-22 Thread Radim Kolar (JIRA)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar resolved MAPREDUCE-4509. Resolution: Not A Problem > Make link in "Aggregation is not enabled.

[jira] [Resolved] (MAPREDUCE-4630) API for setting dfs.block.size

2012-11-22 Thread Radim Kolar (JIRA)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar resolved MAPREDUCE-4630. Resolution: Not A Problem > API for setting dfs.block.s

Re: QA bot in YARN project

2012-11-19 Thread Radim Kolar
Are there any difficulties in setting up the QA bot to run automatically for the YARN project?

QA bot in YARN project

2012-11-18 Thread Radim Kolar
The QA bot does not seem to run automatically for patches submitted to the YARN project. I remember from an earlier conversation that it is started by hand. How can I start it for my patches?

pluggable resources

2012-10-22 Thread Radim Kolar
I have a proposal for improved resource scheduling: https://issues.apache.org/jira/browse/MAPREDUCE-4256 As I see it, development seems to be going the other way, for example in https://issues.apache.org/jira/browse/YARN-2, where every added kind of resource requires significant rework. You do not see b

Re: Can some committer commit MAPREDUCE-4479?

2012-10-21 Thread Radim Kolar
On 19.10.2012 19:44, Robert Evans wrote: Looking at it now. Thanks for being a squeaky wheel :). Take a look at https://issues.apache.org/jira/browse/HADOOP-8811 as well.

[jira] [Created] (MAPREDUCE-4630) API for setting dfs.block.size

2012-09-03 Thread Radim Kolar (JIRA)
Radim Kolar created MAPREDUCE-4630: -- Summary: API for setting dfs.block.size Key: MAPREDUCE-4630 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4630 Project: Hadoop Map/Reduce Issue

Re: init/shutdown API for partitioner

2012-08-25 Thread Radim Kolar
Can you file an MR JIRA with the following questions answered on it: Of what use will it be? Can you describe a use-case? The use case is a Spring context + data grid, which needs an init/shutdown cycle. The same thing needs to be added to RawComparator too.

Re: init/shutdown API for partitioner

2012-08-25 Thread Radim Kolar
Radim, Can you file an MR JIRA with the following questions answered on it: Of what use will it be? Can you describe a use-case? https://issues.apache.org/jira/browse/MAPREDUCE-4594

init/shutdown API for partitioner

2012-08-25 Thread Radim Kolar
The mapper partitioner supports only the Configurable API, which can be used for basic initialization. The problem is that there is no shutdown function. I propose using the standard setup() and cleanup() functions, as in Mapper / Reducer.
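A minimal sketch of the proposal, assuming a partitioner contract with setup()/cleanup() hooks like Mapper and Reducer (for the use case above, setup() would open the Spring context or data-grid client and cleanup() would close it). LifecyclePartitioner and PartitionRunner are hypothetical names, not the actual Hadoop Partitioner class or the MAPREDUCE-4594 patch.

```java
import java.util.List;

// Hypothetical partitioner contract with Mapper/Reducer-style lifecycle hooks.
interface LifecyclePartitioner<K> {
    default void setup() {}
    int getPartition(K key, int numPartitions);
    default void cleanup() {}
}

// Hypothetical driver: cleanup() runs even if partitioning throws.
final class PartitionRunner {
    private PartitionRunner() {}

    static <K> int[] partitionAll(LifecyclePartitioner<K> p, List<K> keys, int numPartitions) {
        p.setup();
        try {
            int[] result = new int[keys.size()];
            for (int i = 0; i < keys.size(); i++) {
                result[i] = p.getPartition(keys.get(i), numPartitions);
            }
            return result;
        } finally {
            p.cleanup();
        }
    }
}
```

Default methods keep existing partitioners that only implement getPartition() source compatible, which mirrors the backward-compatibility goal of the proposal.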

Re: [jira] [Created] (MAPREDUCE-4275) Plugable process tree

2012-08-24 Thread Radim Kolar
Can this be committed already? I need it for the FreeBSD port. The patch and review are done. https://issues.apache.org/jira/browse/MAPREDUCE-4275

[jira] [Created] (MAPREDUCE-4509) Make link in "Aggregation is not enabled. Try the nodemanager at"

2012-08-02 Thread Radim Kolar (JIRA)
Radim Kolar created MAPREDUCE-4509: -- Summary: Make link in "Aggregation is not enabled. Try the nodemanager at" Key: MAPREDUCE-4509 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4509

MAPREDUCE-4068

2012-08-01 Thread Radim Kolar
Is MAPREDUCE-4068 just a YARN bug, or is it done on purpose because placing 3rd-party libs into the job's lib/ subdirectory turned out to be a bad design pattern?

Re: Need review

2012-07-30 Thread Radim Kolar
Sure, I'll take a look. I'll ping Bikas who wrote the original too! I need to get a review on patch https://issues.apache.org/jira/browse/MAPREDUCE-4275 I uploaded an updated version with some Linuxisms removed. It should be fine now.

Need review

2012-07-21 Thread Radim Kolar
I need to get a review on patch https://issues.apache.org/jira/browse/MAPREDUCE-4275

[jira] [Resolved] (MAPREDUCE-4114) saveVersion.sh fails if build directory contains space

2012-06-18 Thread Radim Kolar (JIRA)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar resolved MAPREDUCE-4114. Resolution: Duplicate > saveVersion.sh fails if build directory contains sp

Re: Request for Jira issue review

2012-05-31 Thread Radim Kolar
Add me to the list with https://issues.apache.org/jira/browse/MAPREDUCE-4275

resource scheduling

2012-05-21 Thread Radim Kolar
I wrote up some ideas about improved resource scheduling in MAPREDUCE-4256; please take a look and share your comments.

[jira] [Created] (MAPREDUCE-4275) Plugable process tree

2012-05-21 Thread Radim Kolar (JIRA)
Radim Kolar created MAPREDUCE-4275: -- Summary: Plugable process tree Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type

Re: Review Request.

2012-05-19 Thread Radim Kolar
> I have submitted patches for the following issues, can someone please review them. I have also experienced that the patch review process can take months. There are complaints in JIRA like "I can not rebase this patch forever". New active people should be given commit rights to keep things moving forward

[jira] [Created] (MAPREDUCE-4256) Improve resource scheduling

2012-05-14 Thread Radim Kolar (JIRA)
Radim Kolar created MAPREDUCE-4256: -- Summary: Improve resource scheduling Key: MAPREDUCE-4256 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4256 Project: Hadoop Map/Reduce Issue Type

Re: Building first time

2012-05-08 Thread Radim Kolar
I am interested in working on mapreduce package, so not sure if I need to compile the whole tree. I work on branch-0.23. It can just be imported into SpringToolsSuite; then click Run -> Maven and type in the 'compile' target. It compiles the module; it just fails on the Avro stuff. But it is good enough

[jira] [Created] (MAPREDUCE-4209) junit dependency in hadoop-mapreduce-client is missing scope test

2012-04-29 Thread Radim Kolar (JIRA)
Radim Kolar created MAPREDUCE-4209: -- Summary: junit dependency in hadoop-mapreduce-client is missing scope test Key: MAPREDUCE-4209 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4209 Project

OS Independent container monitor

2012-04-25 Thread Radim Kolar
Currently the container monitor works only on Linux. Looking at the code in org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMonitorImpl, it seems best to have a pluggable process-info-tree object instead of a pluggable ContainerMonitor. ProcfsBasedProcessTree would be one pluggable implementation
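A rough sketch of that split, assuming the monitor only needs a few aggregate numbers about a container's process tree; ProcfsBasedProcessTree would sit behind such an interface on Linux and a FreeBSD implementation beside it. ProcessTree, ProcessTreeRegistry, ContainerCheck and all their methods are hypothetical names for illustration, not Hadoop classes. In a real NodeManager a configuration key naming the implementation class would presumably play the registry's role.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plug-in contract hiding the OS-specific process inspection.
interface ProcessTree {
    void updateProcessTree();          // re-scan the OS for the tree's processes
    long getCumulativeRssBytes();      // resident memory of the whole tree
    long getCumulativeVirtualBytes();  // virtual memory of the whole tree
}

// Hypothetical registry mapping an OS name to a process-tree implementation.
final class ProcessTreeRegistry {
    private final Map<String, Supplier<? extends ProcessTree>> impls = new HashMap<>();

    void register(String osName, Supplier<? extends ProcessTree> factory) {
        impls.put(osName, factory);
    }

    ProcessTree create(String osName) {
        Supplier<? extends ProcessTree> f = impls.get(osName);
        if (f == null) {
            throw new UnsupportedOperationException("no process tree for " + osName);
        }
        return f.get();
    }
}

// Hypothetical OS-independent monitor step: it only talks to the interface.
final class ContainerCheck {
    private ContainerCheck() {}

    static boolean overLimit(ProcessTree tree, long rssLimitBytes) {
        tree.updateProcessTree();
        return tree.getCumulativeRssBytes() > rssLimitBytes;
    }
}
```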

ResourceCalculatorPlugin.getCpuUsage

2012-04-17 Thread Radim Kolar
Is float ResourceCalculatorPlugin.getCpuUsage(), whose javadoc comment reads "Obtain the CPU usage % of the machine. Return -1 if it is unavailable. @return CPU usage in %", supposed to return 1.0 or 100.0 for a machine whose CPU is fully utilised?

[jira] [Created] (MAPREDUCE-4116) Missing POM dependency for hadoop-yarn-common

2012-04-06 Thread Radim Kolar (Created) (JIRA)
Affects Versions: 0.23.1 Reporter: Radim Kolar hadoop-yarn-common is missing a dependency on hadoop-common; some classes from it, such as Configured, are used. This dependency should be added to the POM.

Re: Trying to compile 0.23.1 branch

2012-04-05 Thread Radim Kolar
This should already be fixed in branch-2 (Compilation with OpenJDK6). Can you check there as well? I submitted a job to CI; the build failed due to MAPREDUCE-4115.

Re: Trying to compile 0.23.1 branch

2012-04-05 Thread Radim Kolar
On 5.4.2012 21:52, Harsh J wrote: Radim, This should already be fixed in branch-2 (Compilation with OpenJDK6). Can you check there as well? I submitted a job to CI. Why are you building 0.23.1 specifically though? Building on FreeBSD is not easy, so I needed to start with something which is

[jira] [Created] (MAPREDUCE-4115) hadoop-project-dist/pom.xml is invalid

2012-04-05 Thread Radim Kolar (Created) (JIRA)
Versions: 0.23.1 Reporter: Radim Kolar The mentioned POM is invalid. It fails XML validation and cannot be deployed into a repository with a validating repository manager, such as Artifactory. The problem is the ">" characters inside the antrun-plugin configuration. They should be &gt; Line 342 a

[jira] [Created] (MAPREDUCE-4114) saveVersion.sh fails if build directory contains space

2012-04-05 Thread Radim Kolar (Created) (JIRA)
Components: build Affects Versions: 0.23.2 Environment: FreeBSD 8.2, 64bit Reporter: Radim Kolar If you rename the build directory to something without a space, like /tmp/hadoop, then it works. [INFO

Re: Trying to compile 0.23.1 branch

2012-04-05 Thread Radim Kolar
On 5.4.2012 14:51, Radim Kolar wrote: [ERROR] /tmp/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Counters.java:[93,49] incompatible types; inferred type argument(s) org.apache.hadoop.mapreduce.Counter

Re: Trying to compile 0.23.1 branch

2012-04-05 Thread Radim Kolar
[ERROR] /tmp/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Counters.java:[93,49] incompatible types; inferred type argument(s) org.apache.hadoop.mapreduce.Counter,java.lang.Object do not conform to bounds of

Trying to compile 0.23.1 branch

2012-04-05 Thread Radim Kolar
mvn compile -DskipTests -Dcommons.daemon.os.name=linux -Dcommons.daemon.os.arch=i686 I can't get it to compile with openjdk version "1.6.0_30". Does Hadoop need Java 5? [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project hadoop-mapredu

[jira] [Created] (MAPREDUCE-4101) nodemanager depends on /bin/bash

2012-04-03 Thread Radim Kolar (Created) (JIRA)
Versions: 0.23.1 Environment: FreeBSD 8.2 / 64 bit Reporter: Radim Kolar Currently the nodemanager depends on the bash shell. This should be well documented for systems that do not have bash installed by default, such as FreeBSD. Because only basic bash functionality is used, probably changing

[jira] [Created] (MAPREDUCE-3968) add support for getNumMapTasks() into mapreduce JobContext

2012-03-04 Thread Radim Kolar (Created) (JIRA)
: Improvement Components: mrv1 Affects Versions: 0.22.0 Environment: hadoop 0.22 Reporter: Radim Kolar Priority: Minor In the old mapred API there was a way to query the number of mappers: job.getNumMapTasks(). No such function exists in the new mapreduce API.

[jira] [Created] (MAPREDUCE-3772) MultipleOutputs output lost if baseOutputPath starts with ../

2012-01-31 Thread Radim Kolar (Created) (JIRA)
: Bug Components: mrv1 Affects Versions: 0.22.0 Environment: FreeBSD Reporter: Radim Kolar Priority: Minor Let's say you have the output directory set: FileOutputFormat.setOutputPath(job, new Path("/tmp/multi1/out")); and want to place output from Multi
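MAPREDUCE-3772 was resolved as Invalid (user error). One way to see why a leading ../ loses output: assuming, as with FileOutputFormat, that the file is created relative to the task attempt's work directory and only that directory is promoted on commit, a base path that climbs out of it is never committed. A plain java.nio.file sketch; OutputPathCheck is hypothetical, and the attempt-directory layout in the usage is illustrative rather than the exact FileOutputCommitter layout of any version.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: resolve a MultipleOutputs-style base path against a task's work
// directory and check whether the result still lies inside it.
final class OutputPathCheck {
    private OutputPathCheck() {}

    static Path resolve(String workDir, String baseOutputPath) {
        return Paths.get(workDir).resolve(baseOutputPath).normalize();
    }

    /** True if the file would be promoted when the work directory is committed. */
    static boolean insideWorkDir(String workDir, String baseOutputPath) {
        return resolve(workDir, baseOutputPath).startsWith(Paths.get(workDir).normalize());
    }
}
```

With a hypothetical work directory /tmp/multi1/out/_temporary/attempt_x, the base path side/part stays inside it, while ../side/part normalizes to /tmp/multi1/out/_temporary/side/part, outside the attempt directory, so it is not moved to /tmp/multi1/out and disappears when the temporary area is cleaned up.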