6.6.20 kernel call traces __memcpy

We are migrating our kernel from 5.10.41 to 6.6.20 on a RISC-V platform.
It works up to 6.1.110, but with 6.6.20 the boot fails with the call trace shown below.

INIT: version booting
mount: /sys: sysfs already mounted on /sys.
Starting udev
[ 2.266078] udevd[125]: starting version 3.2.9
[ 9.922822] random: crng init done
[ 9.930519] udevd[126]: starting eudev-3.2.9
sysctl: cannot stat /proc/sys/net/ipv4/tcp_syncookies: No such file or directory
[ 10.876616] Oops - store (or AMO) access fault [#1]
[ 10.876637] Modules linked in:
[ 10.876646] CPU: 2 PID: 177 Comm: rc Not tainted 6.6.20-nb2+ #1
[ 10.876654] Hardware name: SiFive Pearl (DT)
[ 10.876657] epc : __memcpy+0x60/0xf8
[ 10.876676] ra : do_wp_page+0x2ea/0xc28
[ 10.876684] epc : ffffffff80751e80 ra : ffffffff80125d76 sp : ffffffc80090bca0
[ 10.876688] gp : ffffffff834bebc0 tp : ffffffd80bdccb40 t0 : 2101200122012901
[ 10.876691] t1 : 2f01220120013d01 t2 : 2f01720161017601 s0 : ffffffc80090bd50
[ 10.876695] s1 : ffffffc80090bd68 a0 : ffffffd800000000 a1 : ffffffd848ffe000
[ 10.876698] a2 : 0000000000001000 a3 : ffffffd848fff000 a4 : 6e0169016c016401
[ 10.876701] a5 : 2f01220120016b01 a6 : 2f01720161017601 a7 : 220170016d017401
[ 10.876704] s2 : ffffffd80bdc2d00 s3 : ffffffff82cad598 s4 : ffffffd8046b36e0
[ 10.876707] s5 : 0000000000001a55 s6 : ffffffff830ef460 s7 : ffffffff83200008
[ 10.876711] s8 : ffffffff834c0bd8 s9 : ffffffff834f5900 s10: ffffffd9f99adb50
[ 10.876713] s11: ffffffd9f89b5bc0 t3 : 61016c016f017601 t4 : 65016c0169017401
[ 10.876717] t5 : 70016d0174012f01 t6 : ffffffd800000000
[ 10.876719] status: 0000000200000120 badaddr: ffffffd800000000 cause: 0000000000000007
[ 10.876723] [] __memcpy+0x60/0xf8
[ 10.876731] [] __handle_mm_fault+0x64a/0xa18
[ 10.876738] [] handle_mm_fault+0x3e/0x10c
[ 10.876743] [] handle_page_fault+0xb0/0x350
[ 10.876752] [] do_page_fault+0x1e/0x36
[ 10.876763] [] ret_from_exception+0x0/0x64
[ 10.876779] Code: b303 0285 b383 0305 be03 0385 be83 0405 bf03 0485 (b023) 00ef
[ 10.876784] ---[ end trace 0000000000000000 ]---
[ 10.876787] Kernel panic - not syncing: Fatal exception in interrupt
[ 10.876791] SMP: stopping secondary CPUs
[ 11.043199] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Answers

  • @rizwan25

    Thank you for sharing the complete error details.

    Please try the following steps to troubleshoot and resolve the issue:

    Use addr2line on the failing vmlinux to identify the exact source line where __memcpy was called. This will help determine which code path passed the invalid pointer.

    If the caller appears to be a user-copy wrapper, check for swapped arguments or missing pointer validation.

    Rebuild the kernel with the following configurations enabled to get a more detailed trace:

    CONFIG_DEBUG_VM

    CONFIG_PAGE_POISONING

    CONFIG_FRAME_POINTER
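
    Before booting the rebuilt kernel, it can be worth confirming that the three options actually ended up in the final .config, since a later olddefconfig step can drop options whose dependencies are not met. The sketch below is one way to do that; check_debug_config is a hypothetical helper name, and the riscv64-linux-gnu- cross prefix in the comments is an assumption taken from the addr2line command used in this thread.

    ```shell
    # check_debug_config CONFIG_FILE
    # Prints one line per option and returns non-zero if any of them is not "=y".
    check_debug_config() {
        cfg=$1
        missing=0
        for opt in CONFIG_DEBUG_VM CONFIG_PAGE_POISONING CONFIG_FRAME_POINTER; do
            if grep -q "^${opt}=y\$" "$cfg"; then
                echo "$opt=y"
            else
                echo "$opt is not set"
                missing=1
            fi
        done
        return "$missing"
    }

    # Typical use from the top of the kernel tree (not executed here):
    #   ./scripts/config --enable DEBUG_VM --enable PAGE_POISONING --enable FRAME_POINTER
    #   make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
    #   check_debug_config .config
    ```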

    If the issue persists and the source is still unclear, perform a git bisect between versions v6.1.110 and v6.6.20 using your automated boot or QEMU test setup.
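
    For the bisect, git bisect run expects a script whose exit code is 0 for a good revision, 125 for a revision that cannot be tested, and any other value from 1 to 127 for a bad one. The sketch below shows only the classification step; classify_boot_log is a hypothetical helper, the captured console log is assumed to come from your own boot or QEMU harness, and treating a "login:" prompt as a successful boot is an assumption to adjust for your image.

    ```shell
    # classify_boot_log LOGFILE
    # Maps a captured console log to the exit codes understood by `git bisect run`.
    classify_boot_log() {
        log=$1
        if [ ! -s "$log" ]; then
            return 125      # empty or missing log: build or harness failure, skip
        fi
        if grep -q -e 'Oops - ' -e 'Kernel panic' "$log"; then
            return 1        # the fault reproduced: bad revision
        fi
        if grep -q 'login:' "$log"; then
            return 0        # reached the login prompt: good revision
        fi
        return 125          # neither marker seen: inconclusive, skip
    }

    # Typical driver (not executed here; bisect-step.sh is a hypothetical script
    # that builds, boots, captures boot.log, and ends with classify_boot_log boot.log):
    #   git bisect start
    #   git bisect bad v6.6.20
    #   git bisect good v6.1.110
    #   git bisect run ./bisect-step.sh
    ```

    Because the platform carries out-of-tree SoC changes, each bisect step would also need those changes applied before building, which the sketch does not attempt.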

    Example addr2line usage (adjust paths as needed):

    Assume vmlinux is located at /work/linux/vmlinux and epc from the oops is 0xffffffff80751e80

    VM=/work/linux/vmlinux
    EPC=0xffffffff80751e80

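    Get the exact source line. The symbols in vmlinux carry the linked kernel virtual addresses, so unless the kernel was relocated at boot, pass the epc value exactly as the oops prints it, without subtracting a base address:

    riscv64-linux-gnu-addr2line -e $VM -f -p $EPC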

    Repeat this for each instruction pointer listed in the oops.
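
    Repeating the lookup by hand gets tedious, so the addresses can be pulled out of the saved oops text first. The sketch below extracts the raw epc and ra values from the register line; oops_addrs is a hypothetical helper, and it assumes the oops was saved with the same "epc : <hex> ra : <hex>" layout shown in this thread and that vmlinux comes from the same build that produced the oops.

    ```shell
    # oops_addrs: read an oops on stdin and print the epc and ra values
    # as 0x-prefixed hex, one per line. Symbolic lines such as
    # "epc : __memcpy+0x60/0xf8" are skipped because they are not 16 hex digits.
    oops_addrs() {
        awk '{
            for (i = 1; i < NF - 1; i++)
                if (($i == "epc" || $i == "ra") && $(i + 1) == ":" &&
                    $(i + 2) ~ /^[0-9a-f]+$/ && length($(i + 2)) == 16)
                    print "0x" $(i + 2)
        }'
    }

    # Typical use (not executed here; oops.txt is a hypothetical capture file):
    #   oops_addrs < oops.txt | xargs riscv64-linux-gnu-addr2line -e vmlinux -f -p -i
    ```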

    If you still encounter errors after following these steps, please share the updated error logs so I can investigate further and suggest the next steps.

    Nick R
    Cloud Team Lead
    AccuWeb.Cloud

  • Hi Nick
    Thanks for the reply

    As suggested, I enabled the configs below, rebuilt the kernel, and booted the board:
    CONFIG_DEBUG_VM

    CONFIG_PAGE_POISONING

    CONFIG_FRAME_POINTER

    sysctl: cannot stat /proc/sys/net/ipv4/tcp_syncookies: No such file or directory
    [ 11.704069] Oops - store (or AMO) access fault [#1]
    [ 11.704083] Modules linked in:
    [ 11.704090] CPU: 0 PID: 169 Comm: rc Not tainted 6.6.20-nb2+ #2
    [ 11.704096] Hardware name: SiFive Pearl (DT)
    [ 11.704099] epc : __memcpy+0x60/0xf8
    [ 11.704114] ra : do_wp_page+0x32c/0xd9c
    [ 11.704123] epc : ffffffff8074e840 ra : ffffffff8012762e sp : ffffffc80088bca0
    [ 11.704127] gp : ffffffff834be4c0 tp : ffffffd80b60a040 t0 : 0000002ab8a130a0
    [ 11.704130] t1 : dfdfdfdfdfdfdfdf t2 : 0000000000000000 s0 : ffffffc80088bd50
    [ 11.704134] s1 : ffffffff82cad598 a0 : ffffffd800000000 a1 : ffffffd9f7cdf000
    [ 11.704137] a2 : 0000000000001000 a3 : ffffffd9f7ce0000 a4 : 0000000000000000
    [ 11.704140] a5 : 0000000000000031 a6 : 0000002ab89dce70 a7 : 0000002ab8928f10
    [ 11.704143] s2 : ffffffd80b5ef460 s3 : 0000000000001a55 s4 : ffffffd804425000
    [ 11.704145] s5 : ffffffff830ef240 s6 : ffffffff83200008 s7 : ffffffff834c0bd8
    [ 11.704148] s8 : ffffffff834f5900 s9 : ffffffd9ff7eac88 s10: ffffffd9f89b5bc0
    [ 11.704151] s11: ffffffc80088bd68 t3 : 0000000000000031 t4 : 0000002ab89e9010
    [ 11.704154] t5 : 0000000000000000 t6 : ffffffd800000000
    [ 11.704156] status: 0000000200000120 badaddr: ffffffd800000000 cause: 0000000000000007
    [ 11.704162] [] __memcpy+0x60/0xf8
    [ 11.704170] [] __handle_mm_fault+0x65e/0xab8
    [ 11.704175] [] handle_mm_fault+0x3e/0x10c
    [ 11.704180] [] handle_page_fault+0xb0/0x350
    [ 11.704192] [] do_page_fault+0x1e/0x36
    [ 11.704200] [] ret_from_exception+0x0/0x64
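
    The register dump can also be read against the RISC-V calling convention, which passes the first three arguments in a0 to a2, so the failing call looks like memcpy(dst=a0, src=a1, len=a2). The sketch below pulls those values and badaddr out of an oops and reports whether the faulting address is the destination itself. memcpy_args is a hypothetical helper, and it assumes a0 to a2 still hold the original arguments at the point of the fault, which is not guaranteed in general.

    ```shell
    # memcpy_args: read an oops on stdin, print the presumed memcpy arguments
    # and whether badaddr matches the start of the destination buffer.
    memcpy_args() {
        awk '{
            for (i = 1; i <= NF; i++) {
                if ($i == "a0" && $(i + 1) == ":") dst = $(i + 2)
                if ($i == "a1" && $(i + 1) == ":") src = $(i + 2)
                if ($i == "a2" && $(i + 1) == ":") len = $(i + 2)
                if ($i == "badaddr:") bad = $(i + 1)
            }
        }
        END {
            print "dst=0x" dst " src=0x" src " len=0x" len
            # Append "" to force a string comparison of the hex values.
            if ((dst "") == (bad ""))
                print "badaddr matches dst"
            else
                print "badaddr does not match dst"
        }'
    }
    ```

    For both oopses in this thread the destination is 0xffffffd800000000, the length is 0x1000 (one 4 KiB page), and badaddr equals the destination, so the store fault is on the page being written rather than on the source page.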

    /6.6/new/linux-kernel-nb2-lts$ aarch64-linux-gnu-addr2line -f -e vmlinux ffffffff8074e840
    __pi_memcpy
    ??:?

    This points to the following file:

    arch/riscv/lib/memcpy.S:109:SYM_FUNC_ALIAS(__pi_memcpy, __memcpy)

    Some observations from my side:

    While testing kernel revisions from 5.10 onwards, I observed that the fault first appears in version 6.4. That release introduced new page-clearing mechanisms along with updates to the cache management instructions, specifically the transition from Zicbom to Zicboz in the RISC-V architecture. Because our platform uses a custom SoC with custom RISC-V modifications, we have adapted the arch/riscv code to support this kernel version. I have included our changes for reference.

  • kavya@Kavya-SO:~/KAVYA/MINI_PC/porting/6.6/new/linux-kernel-nb2-lts$ aarch64-linux-gnu-addr2line -f -e vmlinux ffffffff8074e840
    __pi_memcpy
    /home/kavya/KAVYA/MINI_PC/porting/6.6/new/linux-kernel-nb2-lts/arch/riscv/lib/memcpy.S:54
