Simon,

Congratulations! You found a bug in DUCC's Web Server. It was incorrectly
rounding up when reporting the number of shares for a machine. This issue
is addressed by Jira 4104 <https://issues.apache.org/jira/browse/UIMA-4104>.

Lou.

On Fri, Nov 14, 2014 at 7:49 AM, Jim Challenger <chall...@gmail.com> wrote:

> Simon,
> It looks like the problem is the amount of RAM on your machine. It's
> going to be hard to get any meaningful work running on less than about 8G.
>
> Here's what to do to get the test job to run on your 4G machine:
> 1. In the resources folder, edit ducc.properties and change this:
>        ducc.jd.host.memory.size=2GB
>    to this:
>        ducc.jd.host.memory.size=1GB
>
>    This is the amount of RAM that DUCC reserves for itself to manage
>    its "head" processes.
>
> 2. In the examples/simple folder, edit 1.job and change this:
>        process_memory_size 2
>    to this:
>        process_memory_size 1
>
>    This is the amount of memory in GB that the sample 1.job is
>    requesting.
>
> 3. Stop ducc and restart it so the ducc processes reset the
>    jd.host.memory size from the new ducc.properties.
>
> 4. Rerun 1.job and all should be well.
>
> Here are the gory details from the RM log, if you're interested. In
> the RM log, I see these lines.
>
> 13 Nov 2014 22:04:14,909 INFO RM.NodePool - queryMachines N/A
> Name                 Order Active Shares Unused Shares Memory (MB) Jobs
> -------------------- ----- ------------- ------------- ----------- ------
> ...
> .us-west-2.compute.internal     3             2             1        3955 7 [1]
>
> This says you have 3G of **usable-by-ducc** RAM, of which 2G are used by
> the reservation/job "7", and that you have 1GB free. The reason you have
> only 3GB **usable** is that usually the hardware/opsys will reserve a small
> part of the installed RAM for itself, so the reported RAM is a tad
> smaller. To avoid overcommitting the system, we use the reported value,
> not the installed value. Most or all of the jobs here will easily
> overwhelm even the largest machines if we don't do this.
>
> Next, these lines show the actual schedule the RM is trying to build.
> Dormant:
> ID          JobName    User   Class     Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
> J_________8 Test_job_1 ducc   normal         0     2       0   2      2     15       15     true         8
>
> Reserved:
> ID          JobName    User   Class     Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
> R_________7 Job_Driver System JobDriver      1     2       2   0      2      0        0        0         1
>
> This confirms that the DUCC reservation "7" occupies 2G, and that job "8"
> is requesting 2G but is "dormant", i.e. waiting for resources. Since there
> is only 3G available on this machine, job 8 will wait.
>
> Best,
> Jim
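
The share arithmetic in the log above can be checked with a short sketch. This is not DUCC's code: the function names are hypothetical, and the 1 GB share size is an assumption inferred from the log (3955 MB reported as order 3). It shows why rounding down gives 3 shares where rounding up would give 4, and why a 2 GB job cannot be placed once the Job Driver reservation holds 2 shares.

```python
import math

MB_PER_GB = 1024


def whole_shares(reported_mb, share_gb=1):
    """Shares that fit entirely in the reported memory (rounds down)."""
    return reported_mb // (share_gb * MB_PER_GB)


def rounded_up_shares(reported_mb, share_gb=1):
    """Share count if the division is rounded up instead (overstates capacity)."""
    return math.ceil(reported_mb / (share_gb * MB_PER_GB))


def can_schedule(free_shares, request_gb, share_gb=1):
    """True if a request of request_gb fits in the free shares."""
    return math.ceil(request_gb / share_gb) <= free_shares


reported_mb = 3955                 # "Memory (MB)" column in the RM log
total = whole_shares(reported_mb)  # 3 shares, matching "Order 3"

# Original settings: 2 GB Job Driver reservation, 2 GB job request.
print(total, rounded_up_shares(reported_mb))   # 3 4
print(can_schedule(total - 2, request_gb=2))   # False: job 8 stays dormant

# After steps 1 and 2: 1 GB Job Driver reservation, 1 GB job request.
print(can_schedule(total - 1, request_gb=1))   # True
```

With both values reduced to 1 GB, the reservation takes one share and two remain free, so the 1 GB job fits.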