[ 
https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763056#comment-16763056
 ] 

Szilard Nemeth commented on YARN-9265:
--------------------------------------

Hi [~pbacsko]!
It's also worth mentioning here, for reference, what type of output you plan 
the script to produce based on the output of "aocl diagnose". Will it be the 
same FPGA device specification string that the property 
{{yarn.nodemanager.resource-plugins.fpga.available-devices}} would contain, or 
some other intermediate format?
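
To make the "intermediate format" option concrete, here is a minimal sketch of a script that turns the "aocl diagnose" output quoted below into one record per device. The record type, its field names and the function name are hypothetical and not part of the YARN plugin. The sketch also assumes that the columns of the "Physical Dev Name" row are separated by at least two spaces, as they are in the sample.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Sample taken from the "aocl diagnose" output in the issue description.
SAMPLE = """\
--------------------------------------------------------------------
Device Name:
acl0

Package Pat:
/home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_f200000     Passed            PAC Arria 10 Platform (pac_a10_f200000)
                                      PCIe 08:00.0
                                      FPGA temperature = 79 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------
"""


@dataclass
class FpgaRecord:
    """Hypothetical intermediate record; not a YARN data structure."""
    alias: str                    # value under "Device Name:", e.g. acl0
    physical_name: str            # first column of the device row
    status: str                   # second column of the device row
    pcie_address: Optional[str]   # bus:slot.func taken from the "PCIe" line


# Assumption: device-row columns are separated by two or more spaces.
DEVICE_ROW_RE = re.compile(r"^(\S+)\s{2,}(\S+)\s{2,}(.*)$")
PCIE_RE = re.compile(r"PCIe\s+([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])")


def parse_aocl_diagnose(text: str) -> List[FpgaRecord]:
    """Extract one record per device row found in the diagnose output."""
    lines = [line.strip() for line in text.splitlines()]
    records: List[FpgaRecord] = []
    alias = None
    current = None
    for i, line in enumerate(lines):
        if line == "Device Name:":
            # The alias is the next non-empty line after the label.
            alias = next((l for l in lines[i + 1:] if l), None)
            continue
        row = DEVICE_ROW_RE.match(line)
        if row and alias is not None:
            current = FpgaRecord(alias, row.group(1), row.group(2), None)
            records.append(current)
            continue
        pcie = PCIE_RE.search(line)
        if pcie and current is not None and current.pcie_address is None:
            current.pcie_address = pcie.group(1)
    return records


if __name__ == "__main__":
    for record in parse_aocl_diagnose(SAMPLE):
        print(record)
```

Whether such records are then rendered into the device specification string or passed on in some other form is exactly the open question above.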

> FPGA plugin fails to recognize Intel Processing Accelerator Card
> ----------------------------------------------------------------
>
>                 Key: YARN-9265
>                 URL: https://issues.apache.org/jira/browse/YARN-9265
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 3.1.0
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Critical
>
> The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card).
> There are two major issues.
> Problem #1
> The output of aocl diagnose:
> {noformat}
> --------------------------------------------------------------------
> Device Name:
> acl0
>  
> Package Pat:
> /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp
>  
> Vendor: Intel Corp
>  
> Physical Dev Name   Status            Information
>  
> pac_a10_f200000     Passed            PAC Arria 10 Platform (pac_a10_f200000)
>                                       PCIe 08:00.0
>                                       FPGA temperature = 79 degrees C.
>  
> DIAGNOSTIC_PASSED
> --------------------------------------------------------------------
>  
> Call "aocl diagnose <device-names>" to run diagnose for specified devices
> Call "aocl diagnose all" to run diagnose for all devices
> {noformat}
> The plugin does not recognize this output and fails with the following messages:
> {noformat}
> 2019-01-25 06:46:02,834 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin:
>  Using FPGA vendor plugin: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
> 2019-01-25 06:46:02,943 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer:
>  Trying to diagnose FPGA information ...
> 2019-01-25 06:46:03,085 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule:
>  Using traffic control bandwidth handler
> 2019-01-25 06:46:03,108 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn
> 2019-01-25 06:46:03,139 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl:
>  FPGA Plugin bootstrap success.
> 2019-01-25 06:46:03,247 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)bus:slot.func\s=\s.*, pattern
> 2019-01-25 06:46:03,248 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern
> 2019-01-25 06:46:03,251 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Failed to get major-minor number from reading /dev/pac_a10_f300000
> 2019-01-25 06:46:03,252 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to 
> bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  No FPGA devices detected!
> {noformat}
> Problem #2
> The plugin assumes that the file name under {{/dev}} can be derived from the 
> "Physical Dev Name", but this is wrong. For example, it thinks that the 
> device file is {{/dev/pac_a10_f300000}}, which is not the case; the actual 
> file is {{/dev/intel-fpga-port.0}}.
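
For Problem #2, one way to avoid guessing the file name is to read the major and minor numbers from whichever device node is actually present. This is a minimal sketch, assuming a Linux host. The function name is hypothetical, and the path is the one named in the description above.

```python
import os
import stat


def major_minor(device_path):
    """Return (major, minor) for a character or block device node."""
    st = os.stat(device_path)
    if not (stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode)):
        raise ValueError(f"{device_path} is not a device node")
    # st_rdev carries the device ID of the node itself.
    return os.major(st.st_rdev), os.minor(st.st_rdev)


if __name__ == "__main__":
    # Path from the issue description; it exists only on a host with the card.
    path = "/dev/intel-fpga-port.0"
    try:
        major, minor = major_minor(path)
        print(f"{path} -> {major}:{minor}")
    except (OSError, ValueError) as exc:
        print(f"could not read {path}: {exc}")
```

How the plugin should find the right node under {{/dev}} in the first place is not covered by this sketch.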


