[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

Andras Gyori (Jira) Tue, 02 Feb 2021 00:09:07 -0800


    [ 
https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276924#comment-17276924
 ]


Andras Gyori commented on YARN-10427:
-------------------------------------

Thank you [~snemeth] for the extremely detailed analysis! I have checked the 
main points of your feedback and I agree with your solution. My additions 
to the topic:
 * The logic is quite confusing, so I agree with the proposed approach of 
preventing lastStep from being invoked multiple times.
 * I can see that the processResponseQueue method sets the isFinished variable 
to true after invoking lastStep. This could be useful for additional checks 
inside MRAMSimulator#lastStep. I do not think invoking those clear methods on 
the collections multiple times is problematic, but it might be worth a double 
check.

{code:java}
  @Override
  public void lastStep() throws Exception {
    super.lastStep();

    // clear data structures, but still invoked twice
    allMaps.clear();
    allReduces.clear();
    assignedMaps.clear();
    assignedReduces.clear();
    pendingFailedMaps.clear();
    pendingFailedReduces.clear();
    pendingMaps.clear();
    pendingReduces.clear();
    scheduledMaps.clear();
    scheduledReduces.clear();
    responseQueue.clear();
  }
{code}
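The additional check mentioned above could look roughly like the following. This is only a minimal sketch, not the actual SLS code: LastStepGuardSketch, GuardedSimulator, finishedJobRecords and pendingMapCount are hypothetical names made up for illustration, and it assumes the isFinished flag is checked and set inside lastStep itself, whereas today it is set in processResponseQueue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LastStepGuardSketch {

  /** Hypothetical stand-in for an AM simulator; not the real SLS class. */
  static class GuardedSimulator {
    private final List<String> pendingMaps = new ArrayList<>();
    // Stands in for any once-only side effect of lastStep, e.g. a line of output.
    private final List<String> finishedJobRecords = new ArrayList<>();
    private final String jobId;
    // Assumption: the flag lives next to lastStep so the method can guard itself.
    private boolean isFinished = false;

    GuardedSimulator(String jobId, List<String> maps) {
      this.jobId = jobId;
      pendingMaps.addAll(maps);
    }

    public void lastStep() {
      if (isFinished) {
        // Already finished: skip both the side effect and the cleanup.
        return;
      }
      isFinished = true;
      finishedJobRecords.add(jobId);
      pendingMaps.clear();
    }

    List<String> getFinishedJobRecords() {
      return finishedJobRecords;
    }

    int pendingMapCount() {
      return pendingMaps.size();
    }
  }

  public static void main(String[] args) {
    GuardedSimulator sim =
        new GuardedSimulator("job_1", Arrays.asList("map_1", "map_2"));
    sim.lastStep();
    sim.lastStep(); // second call is a no-op because of the guard
    System.out.println(sim.getFinishedJobRecords()); // prints [job_1]
  }
}
```

With a guard like this, a second call to lastStep returns early, so the clear calls and any once-only side effect run a single time per job.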
All in all, I have no objections, it's a +1.


> Duplicate Job IDs in SLS output
> -------------------------------
>
>                 Key: YARN-10427
>                 URL: https://issues.apache.org/jira/browse/YARN-10427
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler-load-simulator
>    Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0
>         Environment: I ran the attached inputs on my MacBook Pro, using 
> Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also 
> tested against 3.2.1 and 3.3.0 release branches.
>  
>            Reporter: Drew Merrill
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-10427-sls-scriptsandlogs.tar.gz, 
> YARN-10427.001.patch, YARN-10427.002.patch, YARN-10427.003.patch, 
> fair-scheduler.xml, inputsls.json, jobruntime.csv, jobruntime.csv, 
> mapred-site.xml, sls-runner.xml, yarn-site.xml
>
>
> Hello, I'm hoping someone can help me resolve or understand some issues I've 
> been having with the YARN Scheduler Load Simulator (SLS). I've been 
> experimenting with SLS for several months now at work as we're trying to 
> build a simulation model to characterize our enterprise Hadoop infrastructure 
> for purposes of future capacity planning. In the process of attempting to 
> verify and validate the SLS output, I've encountered a number of issues 
> including runtime exceptions and bad output. The focus of this issue is the 
> bad output. In all my simulation runs, the jobruntime.csv output seems to 
> have one or more of the following problems: no output, duplicate job ids, 
> and/or missing job ids.
>  
> Because of where I work, I'm unable to provide the exact inputs I typically 
> use, but I'm able to reproduce the problem of the duplicate Job IDs using 
> some simplified inputs and configuration files, which I've attached, along 
> with the output I obtained.
>  
> The command I used to run the simulation:
> {{./runsls.sh --tracetype=SLS --tracelocation=./inputsls.json 
> --output-dir=sls-run-1 --print-simulation 
> --track-jobs=job_1,job_2,job_3,job_4,job_5,job_6,job_7,job_8,job_9,job_10}}
>  
> Can anyone help me understand what would cause the duplicate Job IDs in the 
> output? Is this a bug in Hadoop or a problem with my inputs? Thanks in 
> advance.
>  
> PS: This is my first issue I've ever opened so please be kind if I've missed 
> something or am not understanding something obvious about the way Hadoop 
> works. I'll gladly follow-up with more info as requested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
