Using HSA runtime with dpdk-16.04 #18

Open
ajamshed opened this issue Jun 26, 2016 · 1 comment

Comments

@ajamshed

Hi,

I am still getting familiar with the HSA runtime programming environment, so this may sound like a simple question. I am developing a small networking application (an IPv4 router) on a Kaveri machine that uses a GPU module for IPv4 route lookups. My GPU module is written in OpenCL, and I use cloc.sh to compile the kernel code to the HSA code object (hsaco) format. I am using DPDK as my network I/O driver for receiving and sending traffic.

I first tried to pass an array of pointers to rte_mbuf structures (rte_mbuf *) to the GPU kernel, so that the GPU retrieves the Ethernet frame (and the IPv4 header) directly and the CPU does not waste cycles parsing packet header fields (and avoids unnecessary cache misses). Unfortunately, my program crashes as soon as the GPU tries to access a packet's payload, and I get the following messages in my dmesg log:

[33138.018390] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0001 address=0x0000000000000f80 flags=0x0005]
[33138.019049] kfd kfd: Invalid PPR device 0:1.0 pasid 1 address 0xFFFF91CD5E0D7000 flags 0x104
[33138.019050] kfd kfd: Sending SIGSEGV to HSA Process with PID 11200 
[33138.019052] kfd kfd: HSA Process (PID 11200) got unhandled exception
[33138.019718] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0001 address=0x0000000000000f80 flags=0x0005]
[33138.020395] kfd kfd: Invalid PPR device 0:1.0 pasid 1 address 0x35827A1F3000 flags 0x104
[33138.020397] kfd kfd: Sending SIGSEGV to HSA Process with PID 11200 
[33138.020398] kfd kfd: HSA Process (PID 11200) got unhandled exception

On closer analysis, I found that I am passing the pointers correctly, but the kernel crashes as soon as it tries to dereference them.

I then tried to pass an array of pointers to the Ethernet frames to the GPU instead (the CPU retrieves each packet pointer from its rte_mbuf structure), but this setup triggered exactly the same crash as above.
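For reference, the second approach can be sketched roughly as below. This is a minimal, self-contained illustration and not my actual code: `struct mbuf_view`, `frame_ptr()` and `collect_frames()` are hypothetical names that stand in for `struct rte_mbuf` and mirror the address arithmetic of DPDK's `rte_pktmbuf_mtod()` (buffer start plus data offset).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rte_mbuf fields used here. */
struct mbuf_view {
    void    *buf_addr;  /* start of the buffer that backs the packet */
    uint16_t data_off;  /* offset of the first byte of packet data */
};

/* Same arithmetic as rte_pktmbuf_mtod(): buf_addr + data_off. */
static uint8_t *frame_ptr(const struct mbuf_view *m)
{
    return (uint8_t *)m->buf_addr + m->data_off;
}

/* Fill frames[i] with the Ethernet frame address of pkts[i].
 * Returns the number of entries written. */
static size_t collect_frames(struct mbuf_view **pkts, size_t n,
                             uint8_t **frames)
{
    for (size_t i = 0; i < n; i++)
        frames[i] = frame_ptr(pkts[i]);
    return n;
}
```

The resulting `frames` array is what gets handed to the kernel as its argument. Note that both the array itself and every buffer it points into have to be readable by the GPU for the dereference to succeed.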

I tried calling hsa_memory_assign_agent() and hsa_memory_register() on the arrays of packet structures (both rte_mbuf * and uint8_t *), but that did not fix the problem. Any idea what I am doing wrong here?

H/W Specs:
model name : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
cpu MHz : 4000.000
cache size : 2048 KB

S/W Specs:
Linux kernel version: 4.4.0-kfd-compute-rocm-rel-1.1.1-10
Intel dpdk-16.04
CLOC 1.0.11 (April 2016 update)
HSA Runtime v1.6
amdkfd v1.6.1

Thanks!

@ajamshed

With Adrian's help I was able to find a bug in my GPU kernel code: I was accessing an out-of-bounds memory region. After fixing that bug, my program no longer crashes. However, whenever my kernel code dereferences a packet pointer, every field it reads is zero, whether it is an Ethernet MAC address (00:00:00:00:00:00) or an IP source address (0x00). I am sure that the CPU part of my code is not bzero-ing the memory behind the packet pointers. When the kernel execution finishes and I read the packet contents from the CPU side, I see the right values. Any ideas?
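A CPU-side reference read along the following lines can be used to compare against what the kernel reports for the same pointer. This is only a sketch under stated assumptions: it assumes untagged Ethernet II frames carrying IPv4 (no VLAN tag), and `struct hdr_fields` and `read_hdr_fields()` are hypothetical names, not part of DPDK or the HSA runtime.

```c
#include <stdint.h>
#include <string.h>

enum {
    ETH_HDR_LEN  = 14,  /* dst MAC (6) + src MAC (6) + EtherType (2) */
    IPV4_SRC_OFF = 12   /* offset of the source address in the IPv4 header */
};

/* Hypothetical container for the two fields being compared. */
struct hdr_fields {
    uint8_t  dst_mac[6];
    uint32_t ip_src;    /* host byte order */
};

/* Read the destination MAC and IPv4 source address from a raw frame.
 * Assumes an untagged Ethernet II frame that carries IPv4. */
static struct hdr_fields read_hdr_fields(const uint8_t *frame)
{
    struct hdr_fields f;
    const uint8_t *src = frame + ETH_HDR_LEN + IPV4_SRC_OFF;

    memcpy(f.dst_mac, frame, sizeof(f.dst_mac));
    /* The address is big-endian on the wire; assemble it byte by byte. */
    f.ip_src = ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
               ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
    return f;
}
```

Running this on the CPU before and after the kernel launch, on the same addresses that are passed to the GPU, shows whether the data is already in place when the kernel starts.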
