Simon,
It looks like the problem is the amount of RAM on your machine. It's going to be hard to get any meaningful work running with less than about 8G.

Here's what to do to get the test job to run on your 4G machine:

1.  In the resources folder, edit ducc.properties and change this:
         ducc.jd.host.memory.size=2GB
    to this:
         ducc.jd.host.memory.size=1GB

    This is the amount of RAM that DUCC reserves for itself to manage its "head" processes.

2.  In the examples/simple folder, edit 1.job and change this:
         process_memory_size            2
    to this:
         process_memory_size            1

    This is the amount of memory, in GB, that the sample 1.job requests.

3.  Stop DUCC and restart it so that the DUCC processes pick up the new ducc.jd.host.memory.size value from ducc.properties.

4.  Rerun 1.job and all should be well.

Here are the gory details from the RM log, if you're interested. In the RM log, I see these lines.

13 Nov 2014 22:04:14,909  INFO RM.NodePool - queryMachines     N/A
                Name Order Active Shares Unused Shares Memory (MB) Jobs
-------------------- ----- ------------- ------------- ----------- ------ ...
.us-west-2.compute.internal     3             2             1        3955 7 [1]

This says you have 3G of **usable-by-ducc** RAM, of which 2G is used by the reservation/job "7", leaving 1G free. The reason you have only 3G **usable** is that the hardware/opsys usually reserves a small part of the installed RAM for itself, so the reported RAM (3955 MB here) is a tad smaller than the installed 4G. To avoid overcommitting the system, we use the reported value, not the installed value. Most or all of the jobs here will easily overwhelm even the largest machines if we don't do this.
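If it helps, here is a rough Python sketch of the arithmetic behind those numbers. It assumes the RM counts memory in whole 1G shares and rounds the reported memory down; the 1G share size and the function name are my assumptions for illustration, not DUCC code.

```python
# Illustrative only: assumes a 1G share size and round-down accounting.
SHARE_MB = 1024  # assumed share size, in MB


def usable_shares(reported_mb: int, share_mb: int = SHARE_MB) -> int:
    """Whole shares that fit in the memory reported for the machine."""
    return reported_mb // share_mb


total = usable_shares(3955)   # 3955 MB is the Memory (MB) value in the RM log
active = 2                    # shares held by reservation "7"
unused = total - active
print(total, active, unused)  # 3 2 1
```

Under that assumption the result matches the Order, Active Shares, and Unused Shares columns in the log line above.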

Next, these lines show the actual schedule the RM is trying to build.

Dormant:
ID           JobName     User    Class      Shares  Order  QShares  NTh  Memory  nQuest  Ques Rem  InitWait  Max P/Nst
J_________8  Test_job_1  ducc    normal          0      2        0    2       2      15        15      true          8

Reserved:
ID           JobName     User    Class      Shares  Order  QShares  NTh  Memory  nQuest  Ques Rem  InitWait  Max P/Nst
R_________7  Job_Driver  System  JobDriver       1      2        2    0       2       0         0         0          1

This confirms that the DUCC reservation "7" occupies 2G, and that job "8" is requesting 2G but is "dormant", i.e. waiting for resources. Since only 1G of the 3G usable on this machine is still free, job 8 will wait.
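To see why the two edits in steps 1 and 2 get the job running, here is a small Python sketch of the check. It is a simplified "does it fit" test that I made up for illustration (the function name and the whole-GB accounting are assumptions), not the RM's actual scheduling logic.

```python
# Illustrative only: a simple fit check, not the RM's real scheduler.
def fits(usable_gb: int, reserved_gb: int, request_gb: int) -> bool:
    """True if the request fits in what is left after the reservation."""
    return request_gb <= usable_gb - reserved_gb


# Current settings: the Job Driver reservation holds 2G and job 8 asks for 2G.
print(fits(usable_gb=3, reserved_gb=2, request_gb=2))  # False: job stays dormant

# After the edits: the reservation holds 1G and 1.job asks for 1G.
print(fits(usable_gb=3, reserved_gb=1, request_gb=1))  # True
```

With both values dropped to 1G, the reservation and the job together need 2G, which fits in the 3G the machine reports as usable.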

Best,
Jim




