[ https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773193#comment-16773193 ]
Peter Bacsko edited comment on YARN-9265 at 2/20/19 4:56 PM:
-------------------------------------------------------------

I made a slight modification in {{FpgaDiscoverer.discover()}}, replaced the existing iterator-based logic with some nice streams/lambda.

was (Author: pbacsko):
I made a slight modification in {{FpgaDiscoverer.discover()}}, replaced the existing iterator-based logic with some nice streams/lambda logic.

> FPGA plugin fails to recognize Intel Processing Accelerator Card
> ----------------------------------------------------------------
>
>                 Key: YARN-9265
>                 URL: https://issues.apache.org/jira/browse/YARN-9265
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 3.1.0
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Critical
>         Attachments: YARN-9265-001.patch, YARN-9265-002.patch,
> YARN-9265-003.patch, YARN-9265-004.patch, YARN-9265-005.patch,
> YARN-9265-006.patch, YARN-9265-007.patch
>
>
> The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card).
> There are two major issues.
> Problem #1
> The output of aocl diagnose:
> {noformat}
> --------------------------------------------------------------------
> Device Name:
> acl0
>
> Package Pat:
> /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp
>
> Vendor: Intel Corp
>
> Physical Dev Name   Status            Information
>
> pac_a10_f200000     Passed            PAC Arria 10 Platform (pac_a10_f200000)
>                                       PCIe 08:00.0
>                                       FPGA temperature = 79 degrees C.
>
> DIAGNOSTIC_PASSED
> --------------------------------------------------------------------
>
> Call "aocl diagnose <device-names>" to run diagnose for specified devices
> Call "aocl diagnose all" to run diagnose for all devices
> {noformat}
> The plugin does not recognize this output and fails with the following messages:
> {noformat}
> 2019-01-25 06:46:02,834 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin: Using FPGA vendor plugin: org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
> 2019-01-25 06:46:02,943 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer: Trying to diagnose FPGA information ...
> 2019-01-25 06:46:03,085 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule: Using traffic control bandwidth handler
> 2019-01-25 06:46:03,108 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl: Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn
> 2019-01-25 06:46:03,139 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl: FPGA Plugin bootstrap success.
> 2019-01-25 06:46:03,247 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: Couldn't find (?i)bus:slot.func\s=\s.*, pattern
> 2019-01-25 06:46:03,248 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern
> 2019-01-25 06:46:03,251 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: Failed to get major-minor number from reading /dev/pac_a10_f300000
> 2019-01-25 06:46:03,252 ERROR org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: No FPGA devices detected!
> {noformat}
> Problem #2
> The plugin assumes that the file name under {{/dev}} can be derived from the
> "Physical Dev Name", but this is wrong. For example, it assumes that the
> device file is {{/dev/pac_a10_f300000}}, but the actual file is
> {{/dev/intel-fpga-port.0}}.
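In the log quoted in the issue, the plugin searches the diagnose output for `(?i)bus:slot.func\s=\s.*` and `(?i)Total\sCard\sPower\sUsage\s=\s.*`, and neither occurs in the PAC output from Problem #1, where the bus address appears as `PCIe 08:00.0`. The Java sketch below illustrates one way output in that shape could be parsed. It is not the code from the attached patches: the class `AoclDiagnoseParser`, its `Device` holder, and the `PHYS_ROW`/`PCIE` patterns are hypothetical, and the assumed layout (a `<name>  Passed  ...` row followed by a `PCIe BB:DD.F` line) is inferred from this single sample only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the YARN-9265 patch: parse PAC-style
// "aocl diagnose" output like the sample quoted in the issue.
public class AoclDiagnoseParser {

  /** One detected card; a hypothetical holder type for this sketch. */
  public static final class Device {
    public final String aliasName;    // e.g. "acl0" (line after "Device Name:")
    public final String physicalName; // e.g. "pac_a10_f200000"
    public final String busNum;       // e.g. "08:00.0" from "PCIe 08:00.0"

    Device(String aliasName, String physicalName, String busNum) {
      this.aliasName = aliasName;
      this.physicalName = physicalName;
      this.busNum = busNum;
    }

    @Override
    public String toString() {
      return aliasName + "/" + physicalName + "@" + busNum;
    }
  }

  // Patterns reported in the WARN lines; the PAC output contains neither.
  static final Pattern OLD_BUS = Pattern.compile("(?i)bus:slot.func\\s=\\s.*");
  static final Pattern OLD_POWER =
      Pattern.compile("(?i)Total\\sCard\\sPower\\sUsage\\s=\\s.*");

  // Assumed PAC layout: "<physical name>  Passed  <info>", then "PCIe BB:DD.F".
  static final Pattern PHYS_ROW = Pattern.compile("^(\\S+)\\s+Passed\\s+.*$");
  static final Pattern PCIE =
      Pattern.compile("PCIe\\s+([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\\.[0-7])");

  public static List<Device> parse(String output) {
    List<Device> devices = new ArrayList<>();
    String[] lines = output.split("\\R");
    String alias = null;
    String phys = null;
    for (int i = 0; i < lines.length; i++) {
      String line = lines[i].trim();
      if (line.equals("Device Name:") && i + 1 < lines.length) {
        alias = lines[i + 1].trim(); // the alias sits on the following line
        continue;
      }
      Matcher row = PHYS_ROW.matcher(line);
      if (row.matches()) {
        phys = row.group(1);
        continue;
      }
      Matcher bus = PCIE.matcher(line);
      if (bus.find() && alias != null && phys != null) {
        devices.add(new Device(alias, phys, bus.group(1)));
        alias = null;
        phys = null;
      }
    }
    return devices;
  }

  // Abbreviated copy of the output quoted in Problem #1.
  static final String SAMPLE = String.join("\n",
      "Device Name:",
      "acl0",
      "",
      "Vendor: Intel Corp",
      "",
      "Physical Dev Name   Status   Information",
      "",
      "pac_a10_f200000     Passed   PAC Arria 10 Platform (pac_a10_f200000)",
      "                             PCIe 08:00.0",
      "                             FPGA temperature = 79 degrees C.",
      "",
      "DIAGNOSTIC_PASSED");

  public static void main(String[] args) {
    System.out.println("old bus pattern found: " + OLD_BUS.matcher(SAMPLE).find());
    System.out.println("old power pattern found: " + OLD_POWER.matcher(SAMPLE).find());
    System.out.println(parse(SAMPLE)); // prints [acl0/pac_a10_f200000@08:00.0]
  }
}
```

Against the sample, both original patterns report `false` and the parser extracts a single device. Real `aocl diagnose` output for several boards, or for other board support packages, may not follow this layout.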
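For Problem #2, one possible direction is to locate device files by a name pattern instead of building `/dev/<Physical Dev Name>`. The sketch below assumes, from the single example `/dev/intel-fpga-port.0` in the issue, that port devices follow an `intel-fpga-port.<N>` naming scheme. `FpgaPortFinder`, `selectPorts`, `portIndex`, and `findPorts` are hypothetical names, and the stream/lambda style only echoes the comment about {{FpgaDiscoverer.discover()}} without reproducing that code. The sketch does not read major/minor numbers or decide which port belongs to which card; the issue does not say how that mapping is made.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: find FPGA port device files by name pattern
// rather than deriving "/dev/<Physical Dev Name>".
public class FpgaPortFinder {

  // Assumed naming scheme, based only on /dev/intel-fpga-port.0.
  static final Pattern PORT_FILE = Pattern.compile("intel-fpga-port\\.(\\d+)");

  /** Keeps matching names and orders them by numeric port index. */
  public static List<String> selectPorts(List<String> fileNames) {
    return fileNames.stream()
        .filter(name -> PORT_FILE.matcher(name).matches())
        .sorted(Comparator.comparingInt(FpgaPortFinder::portIndex))
        .collect(Collectors.toList());
  }

  /** Extracts N from "intel-fpga-port.N". */
  static int portIndex(String name) {
    Matcher m = PORT_FILE.matcher(name);
    if (!m.matches()) {
      throw new IllegalArgumentException("not a port file: " + name);
    }
    return Integer.parseInt(m.group(1));
  }

  /** Lists a directory (for example /dev) and returns paths of port files. */
  public static List<Path> findPorts(Path dir) throws IOException {
    try (Stream<Path> entries = Files.list(dir)) {
      List<String> names = entries
          .map(p -> p.getFileName().toString())
          .collect(Collectors.toList());
      return selectPorts(names).stream()
          .map(name -> dir.resolve(name))
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) {
    // Made-up directory listing for illustration.
    List<String> sample = Arrays.asList(
        "null", "intel-fpga-port.10", "tty0", "intel-fpga-port.0", "intel-fpga-port.2");
    System.out.println(selectPorts(sample));
    // prints [intel-fpga-port.0, intel-fpga-port.2, intel-fpga-port.10]
    // On a real node one might call findPorts(java.nio.file.Paths.get("/dev")).
  }
}
```

Sorting by the numeric suffix keeps `intel-fpga-port.10` after `intel-fpga-port.2`, which a plain string sort would not.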