Summary: The size of the reserved PERCPU area is currently hard-coded to 8192 bytes. On vSMP, ____cacheline_internodealigned_in_smp (etc.) aligns to 4096 bytes, which can easily make structures too big to be placed in PERCPU. Kernel modules using SRCU then cannot be loaded and raise two warnings:

"percpu: allocation failed, size=???? align=32 atomic=0, alloc from reserved chunk failed"
"my_module: Could not allocate ???? bytes percpu data"

where ???? is at least 8192.

References:
1. INTERNODE_CACHE_BYTES has been 4096 for vSMP since https://github.com/torvalds/linux/commit/e405d067298b2b960bf20318e91ed842157c65bc.
2. https://github.com/torvalds/linux/blob/v4.19-rc4/include/linux/percpu.h#L17 The size of the PERCPU area (reserved) is hard-coded (8<<10 = 8192).
3. https://github.com/torvalds/linux/blob/v4.19-rc4/include/linux/srcutree.h#L43 The data structure for SRCU, namely "struct srcu_data", has this field:
   spinlock_t __private lock ____cacheline_internodealigned_in_smp;

An actual case: Since 4.18, DRM uses "DEFINE_STATIC_SRCU(drm_unplug_srcu);" to protect the "unplugged" flag, see https://github.com/torvalds/linux/commit/bee330f3d67273a68dcb99f59480d59553c008b2. When I set CONFIG_DRM=m (with CONFIG_VSMP=y), drm.ko.xz cannot be loaded into the kernel. I checked with "readelf -S" and "objdump --syms --section=.data..percpu". The section ".data..percpu" contains "drm_unplug_srcu_srcu_data" with a size of 0x2000 bytes (= 8192).
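To illustrate the size arithmetic, here is a minimal userspace sketch. It is not kernel code: "struct fake_srcu_data", its field names, and the two helper functions are hypothetical stand-ins, and the sketch assumes a GCC or Clang compiler for the aligned attribute. It only shows how a single 4096-byte-aligned member that is not at offset 0 forces the whole structure up to two alignment units, which is exactly the 8192-byte reserve.

```c
#include <stddef.h>
#include <stdio.h>

/* Values quoted in the report. */
#define INTERNODE_CACHE_BYTES 4096      /* vSMP internode cache line */
#define PERCPU_MODULE_RESERVE (8 << 10) /* reserved percpu area: 8192 bytes */

/*
 * Hypothetical stand-in for struct srcu_data (NOT the kernel definition):
 * a few small counters followed by a lock aligned to the internode line.
 */
struct fake_srcu_data {
	unsigned long lock_count[2];
	unsigned long unlock_count[2];
	/* int stands in for spinlock_t; only the alignment matters here. */
	int lock __attribute__((aligned(INTERNODE_CACHE_BYTES)));
	unsigned long tail[8];
};

/* Bytes left in the reserve after placing one object of the given size. */
long module_reserve_headroom(size_t size)
{
	return (long)PERCPU_MODULE_RESERVE - (long)size;
}

/* Prints the layout numbers; call this from main() to try it out. */
void print_layout(void)
{
	printf("offsetof(lock) = %zu\n", offsetof(struct fake_srcu_data, lock));
	printf("sizeof         = %zu\n", sizeof(struct fake_srcu_data));
	printf("headroom       = %ld\n",
	       module_reserve_headroom(sizeof(struct fake_srcu_data)));
}
```

The counters push the aligned member to offset 4096, and sizeof is rounded up to a multiple of the alignment, giving 8192 bytes and zero headroom against the reserve. That matches the 0x2000 size seen for "drm_unplug_srcu_srcu_data", although the real structure has different fields.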
This same problem popped up here: https://bugzilla.kernel.org/show_bug.cgi?id=202511
I've seen this issue too. A patch to fix it was submitted to LKML more than a month ago, but it got no traction. See https://lkml.org/lkml/2019/1/21/578