Make the Linux kernel ReBAR-over-Thunderbolt friendly

Here's a suggestion for the kernel devs, now that Thunderbolt eGPUs have become more common: make the Linux kernel ReBAR-over-Thunderbolt friendly.

The current behavior is this: BAR 2's hardware register powers up at 256 MB — the default size programmed into the BAR's address decoder by Intel at the factory. The PCIe Resizable BAR capability advertises support for up to 16 GB, but it's passive — software must explicitly exercise it. When a Thunderbolt eGPU is hotplugged at runtime, the kernel's PCI subsystem enumerates the new device, reads the BAR at its 256 MB default, sizes the bridge windows to match, and assigns addresses — all before any driver loads. The ReBAR capability is never consulted(!) during this process.

The current workaround is thunderbolt.host_reset=0, which preserves the BIOS's PCIe tunnel and BAR assignments from POST (where the BIOS does exercise ReBAR). This delivers the full 16 GB BAR but only works for cold-plug(!) scenarios — if the eGPU is power-cycled at runtime, the new tunnel gets the 256 MB default.

The proper fix would be for the kernel's PCI hotplug resource assignment to first check for ReBAR capability during enumeration, resize the BAR to the largest supported size that fits within available bridge headroom, and then commit bridge windows and assign addresses. This is essentially what the BIOS does during POST. It hasn't been implemented yet because eGPU-over-Thunderbolt-with-ReBAR is (was?) a niche use case.
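
The selection step in that ordering can be sketched in a few lines. This is an illustration in Python, not kernel code: `pick_rebar_size` is a hypothetical helper, the mask `0x7f00` is a made-up example of a device advertising 256 MB through 16 GB, and the headroom is assumed to be contiguous and already suitably aligned. The bit encoding (bit n set means 2^n MB is supported) matches the sysfs `resourceN_resize` value quoted in the comment below.

```python
from typing import Optional

MIB = 1 << 20


def pick_rebar_size(supported_mask: int, headroom_bytes: int) -> Optional[int]:
    """Return the largest advertised BAR size (in bytes) that fits, or None.

    Bit n set in supported_mask means a BAR size of 2**n MiB is supported.
    headroom_bytes is assumed to be contiguous, aligned free space.
    """
    # Walk from the highest advertised size down to the lowest.
    for bit in range(supported_mask.bit_length() - 1, -1, -1):
        size = (1 << bit) * MIB
        if supported_mask & (1 << bit) and size <= headroom_bytes:
            return size
    return None


# Hypothetical device: sizes 256 MiB (bit 8) through 16 GiB (bit 14).
mask = 0x7F00
print(pick_rebar_size(mask, 32 * 1024 * MIB))  # plenty of headroom
print(pick_rebar_size(mask, 384 * MIB))        # tight bridge window
```

With 32 GB of headroom the helper picks 16 GB; with only 384 MB it falls back to 256 MB, which is the situation the hotplug path ends up in today.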

Well, no more niche use case. eGPU-over-Thunderbolt is becoming mainstream.

Comments

  • Adding empirical data on Linux 6.17 + ReBAR-over-TB4 with detailed dmesg of the
    allocator's failure mode. May be useful for whoever picks up the fix.

    Setup

    • Framework 13 (Intel Core Ultra 5 125H, BIOS INSYDE 3.06) — TB4 host, no
      Above 4G Decoding / Resizable BAR BIOS toggles exposed.

    • Razer Core X V2 — Intel Barlow Ridge 4-port USB4-v2 hub. Three downstream
      ports empty, one populated by the GPU.

    • MSI RTX 5060 Ti 16G — advertises ReBAR sizes 64MB through 16GB (per
      lspci -vvv and /sys/bus/pci/devices/.../resource1_resize = 0x7fc0).

    • Ubuntu 24.04, kernel 6.17.0-22-generic, NVIDIA 580.142-open.
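
    For anyone comparing their own card: the resource1_resize value is a
    bitmask where bit n set means a BAR size of 2^n MB is supported. A small
    sketch of the decoding (decode_rebar_mask is a hypothetical helper name,
    not a kernel or driver API):

```python
from typing import List


def decode_rebar_mask(mask: int) -> List[str]:
    """Expand a resourceN_resize bitmask into human-readable BAR sizes."""
    sizes = []
    for bit in range(mask.bit_length()):
        if mask & (1 << bit):
            mib = 1 << bit  # bit n corresponds to 2**n MiB
            sizes.append(f"{mib // 1024} GB" if mib >= 1024 else f"{mib} MB")
    return sizes


print(decode_rebar_mask(0x7FC0))
```

    For 0x7fc0 this lists nine sizes, from 64 MB (bit 6) up to 16 GB (bit 14),
    which agrees with the lspci output.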

    Failure mode

    GPU BAR 1 caps at 256 MB on every cmdline I tried. Standard guidance
    (pci=realloc=on, pci=hpmmioprefsize=N) does not lift this on this
    topology — see comparison below.

    What the allocator does (dmesg, current boot, with cmdline hints)

    pci 0000:00:07.2: PCI bridge to [bus 55-7e]
    pci 0000:00:07.2:   bridge window [mem 0x5010000000-0x580fffffff 64bit pref]   # initial 32 GB
    [V2 enumerated, 4 downstream sibling bridges]
    pcieport 0000:00:07.2: Assigned bridge window [mem 0x6028000000-0x784fffffff
                           64bit pref] to [bus 55-7e]
                           cannot fit 0x1830000000 required for 0000:55:00.0 bridging to [bus 56-7e]
    pcieport 0000:00:07.2: bridge window [mem 0x91000000-0x97ffffff]: failed to expand by 0x38000000
    pcieport 0000:00:07.2: bridge window [mem 0x91000000-0x97ffffff]: failed to add optional 38000000
    pci 0000:55:00.0: Assigned bridge window [mem 0x6028000000-0x784fffffff 64bit pref] to [bus 56-7e]
                      cannot fit 0x30000000 required for 0000:56:00.0 bridging to [bus 57]
    pci 0000:56:00.0: bridge window [mem 0x00000000-0x17ffffff 64bit pref] to [bus 57]
                      requires relaxed alignment rules
    pci 0000:56:00.0: bridge window [mem 0x08000000-0x1fffffff 64bit pref] to [bus 57]
                      add_size c100000 add_align 10000000
    pci 0000:56:00.0: bridge window [mem size 0x24100000 64bit pref]: can't assign; no space
    pci 0000:56:00.0: bridge window [mem size 0x24100000 64bit pref]: failed to assign
    pci 0000:56:00.0: bridge window [mem 0x6028000000-0x603fffffff 64bit pref]: assigned
    pci 0000:56:00.0: bridge window [mem 0x6028000000-0x603fffffff 64bit pref]: failed to expand by 0xc100000
    pci 0000:56:00.0: bridge window [mem 0x6028000000-0x603fffffff 64bit pref]: failed to add optional c100000
    pci 0000:57:00.0: BAR 1 [mem 0x6030000000-0x603fffffff 64bit pref]: assigned
    

    The kernel knows 56:00.0 (the populated downstream bridge) needs to grow.
    It tries to add 0xc100000 (193 MB) more on top of its existing 384 MB and
    fails because the parent (55:00.0) has no slack — the three empty sibling
    bridges already consumed it.
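
    The sizes quoted here come straight from the dmesg ranges. A quick check
    of the arithmetic (window_size is just a helper for this post; PCI
    windows in dmesg are inclusive ranges, so the size is end - start + 1):

```python
MIB = 1 << 20
GIB = 1 << 30


def window_size(start: int, end: int) -> int:
    """Size in bytes of an inclusive [start, end] bridge window."""
    return end - start + 1


populated = window_size(0x6028000000, 0x603FFFFFFF)     # 56:00.0 pref window
initial_root = window_size(0x5010000000, 0x580FFFFFFF)  # 00:07.2 before growth
wanted_extra = 0xC100000                                # "failed to expand by"

print(populated // MIB, "MB assigned")
print(wanted_extra // MIB, "MB extra requested")
print(hex(populated + wanted_extra), "total the allocator tried to place")
print(initial_root // GIB, "GB initial root window")
```

    The assigned window plus the failed expansion adds up to 0x24100000,
    which is the "mem size 0x24100000 ... can't assign; no space" request
    earlier in the log.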

    Final bridge windows

    Bridge                     Populated   Prefetchable window
    00:07.2 (host root port)   yes         96 GB
    55:00.0 (V2 inbound)       yes         96 GB (passed through)
    56:00.0 (populated, GPU)   yes         384 MB ← starved
    56:01.0 (empty)            no          32 GB
    56:02.0 (empty)            no          32 GB
    56:03.0 (empty)            no          32 GB

    GPU BAR 1 sizes to 256 MB, the largest power-of-2 that fits inside the 384 MB
    window after subtracting BAR 3 (32 MB) + ROM + alignment.
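
    The alignment part of that can be reproduced with a short sketch. It
    assumes only that a BAR must be a power of two in size and naturally
    aligned to its size; it does not model BAR 3, the ROM, or anything else
    the real allocator places in the window. largest_aligned_bar is a
    hypothetical helper, not kernel code.

```python
from typing import Optional, Tuple

MIB = 1 << 20


def largest_aligned_bar(start: int, end: int) -> Optional[Tuple[int, int]]:
    """Return (base, size) of the largest naturally aligned power-of-two
    block that fits inside the inclusive window [start, end]."""
    length = end - start + 1
    if length <= 0:
        return None
    size = 1 << (length.bit_length() - 1)  # largest power of two <= length
    while size >= 1:
        base = (start + size - 1) & ~(size - 1)  # round start up to size
        if base + size - 1 <= end:
            return base, size
        size >>= 1
    return None


base, size = largest_aligned_bar(0x6028000000, 0x603FFFFFFF)
print(hex(base), size // MIB, "MB")
```

    For the 384 MB window this gives a 256 MB block at 0x6030000000, the
    same placement as the "BAR 1 ... assigned" line in the dmesg above.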

    Pristine vs. patched comparison

    Two boots on the same hardware; only the cmdline differs. Same eGPU, same
    GPU, same physical port.

    • Pristine: default Ubuntu cmdline, no pci= parameters.
    • Patched: pci=realloc=on pci=hpmmiosize=256M,hpmmioprefsize=32G (the
      flags this thread typically points users toward).

    Bridge / BAR                      Pristine   Patched
    00:07.2 window (host root port)   32 GB      96 GB
    55:00.0 window (V2 inbound)       32 GB      96 GB
    56:00.0 window (populated, GPU)   384 MB     384 MB
    Empty sibling windows (each)      10.5 GB    32 GB
    GPU BAR 1                         256 MB     256 MB

    The hints grow everything except the bridge that actually has a 16 GB BAR
    request behind it. The redistribution pass triggered by pci=realloc=on cannot
    recover from this — by the time it tries to expand the populated bridge, the
    empty siblings have already been allocated their hint-sized windows.

    Where the fix probably lives

    The hot-plug bridge sizing path applies the pci=hpmmioprefsize hint
    uniformly across all hot-pluggable downstream ports of a switch, regardless of
    whether anything is currently behind those ports. When ReBAR is in play, this
    is exactly backwards: the populated port (whose downstream device advertises a
    large ReBAR) should get a larger share than the empty ports, not the same share.

    A reasonable heuristic: when distributing prefetchable budget across siblings
    of a hot-plug switch, weight by the maximum ReBAR size each downstream device
    advertises (default 0 for empty ports, the normal hint for cold-pluggable
    unknown topologies). This would let the populated port absorb most of the
    parent's window and leave each empty port with the minimum needed for future
    hot-plug events.
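
    To make the heuristic concrete, here is a rough sketch in Python. It is
    not kernel code and not a proposed patch: distribute_prefetchable, the
    256 MB reserve for empty ports, and the even-split fallback are all
    assumptions chosen for illustration, and alignment is ignored.

```python
from typing import List

MIB = 1 << 20
GIB = 1 << 30


def distribute_prefetchable(parent_bytes: int,
                            rebar_max: List[int],
                            empty_reserve: int) -> List[int]:
    """Split a parent window across sibling ports.

    rebar_max[i] is the largest ReBAR size advertised behind port i,
    or 0 if the port is empty. Empty ports get a fixed reserve; the
    rest is shared among populated ports in proportion to rebar_max.
    """
    if not rebar_max:
        return []
    empties = sum(1 for w in rebar_max if w == 0)
    budget = parent_bytes - empties * empty_reserve
    total_weight = sum(rebar_max)
    if budget < 0 or total_weight == 0:
        # Nothing populated, or parent too small: fall back to an even split.
        return [parent_bytes // len(rebar_max)] * len(rebar_max)
    return [empty_reserve if w == 0 else budget * w // total_weight
            for w in rebar_max]


# Topology from this post: one GPU advertising 16 GB, three empty ports.
shares = distribute_prefetchable(96 * GIB, [16 * GIB, 0, 0, 0], 256 * MIB)
print([s // MIB for s in shares])
```

    With a 96 GB parent, the populated port ends up with everything except
    the three 256 MB reserves, which is comfortably more than the 16 GB BAR
    needs.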

    What I tried that didn't work (for completeness)

    • pci=realloc=on (alone or with size hints): no effect on populated bridge.
    • pci=hpmmiosize=256M,hpmmioprefsize=32G: grows empties, populated unchanged.
    • Live PCI rescan via sysfs (echo 1 > .../remove, then
      echo 1 > /sys/bus/pci/rescan): leaves parent bridge with prefetchable
      window disabled. GPU doesn't re-enumerate cleanly. Worse, not better.
    • NVIDIA driver upgrade 570 → 580: driver requests resize on a window that
      the allocator has already locked at 384 MB. No effect.
    • Cold boot with eGPU pre-attached: no help. BIOS doesn't size for ReBAR on
      this platform (Framework BIOS lacks the toggle), so the kernel sees the
      same starting state as a hotplug.

    Downstream impact

    Heavy GPU work over the 256 MB BAR triggers an NVIDIA full-chip-reset cascade
    (Xid 119: GSP RPC timeout → Xid 154: GPU Reset Required), reproduced via
    Fallout 76 on this hardware. Confirmed identical mechanism on both Xorg and
    Wayland sessions; only the blast radius differs (Xorg session takes the
    desktop with it, Wayland on PRIME on-demand keeps the iGPU compositor alive
    because it isn't dependent on the dead NVIDIA card).
