Bug 214861 - UBSAN_OBJECT_SIZE=y results in a non-booting kernel (32 bit, i686)
Summary: UBSAN_OBJECT_SIZE=y results in a non-booting kernel (32 bit, i686)
Status: RESOLVED CODE_FIX
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Sanitizers (show other bugs)
Hardware: i386 Linux
: P1 normal
Assignee: MM/Sanitizers virtual assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-28 19:13 UTC by Erhard F.
Modified: 2022-02-07 16:54 UTC (History)
2 users (show)

See Also:
Kernel Version: 5.15-rc7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel .config (kernel 5.15-rc7, Shuttle XPC FS51, Pentium 4) (105.12 KB, text/plain)
2021-10-28 19:13 UTC, Erhard F.
Details
kernel dmesg (kernel 5.15-rc7 without CONFIG_UBSAN_OBJECT_SIZE, Shuttle XPC FS51, Pentium 4) (38.30 KB, text/plain)
2021-10-28 19:16 UTC, Erhard F.
Details
kernel .config (kernel 5.16-rc3, Shuttle XPC FS51, Pentium 4) (105.47 KB, text/plain)
2021-11-30 21:35 UTC, Erhard F.
Details

Description Erhard F. 2021-10-28 19:13:05 UTC
Created attachment 299353 [details]
kernel .config (kernel 5.15-rc7, Shuttle XPC FS51, Pentium 4)

There seems to be a problem with UBSAN_OBJECT_SIZE, at least on my Pentium 4 box.

The machine boots fine with CONFIG_UBSAN_BOUNDS, UBSAN_ARRAY_BOUNDS, UBSAN_SHIFT, UBSAN_BOOL, UBSAN_ENUM, UBSAN_SANITIZE_ALL enabled but when I build the kernel with UBSAN_OBJECT_SIZE=y the machine won't boot at all.

The problem seems to be very early at boot as I don't get any output on screen or via netconsole, only a blinking grey cursor on an otherwise black screen...

Kernel is built with clang 12.0.1 on another machine (Ryzen 5950x) and copied over to the target machine via nfs.

The attached kernel config is the failing one, the attached dmesg is the same config without UBSAN_OBJECT_SIZE, just booting fine.
Comment 1 Erhard F. 2021-10-28 19:16:21 UTC
Created attachment 299355 [details]
kernel dmesg (kernel 5.15-rc7 without CONFIG_UBSAN_OBJECT_SIZE, Shuttle XPC FS51, Pentium 4)

 # lspci 
00:00.0 Host bridge: Silicon Integrated Systems [SiS] 651 Host (rev 02)
00:01.0 PCI bridge: Silicon Integrated Systems [SiS] AGP Port (virtual PCI-to-PCI bridge)
00:02.0 ISA bridge: Silicon Integrated Systems [SiS] SiS962 [MuTIOL Media IO] LPC Controller (rev 14)
00:02.1 SMBus: Silicon Integrated Systems [SiS] SiS961/2/3 SMBus controller
00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 IDE Controller
00:02.7 Multimedia audio controller: Silicon Integrated Systems [SiS] SiS7012 AC'97 Sound Controller (rev a0)
00:03.0 USB controller: Silicon Integrated Systems [SiS] USB 1.1 Controller (rev 0f)
00:03.1 USB controller: Silicon Integrated Systems [SiS] USB 1.1 Controller (rev 0f)
00:03.2 USB controller: Silicon Integrated Systems [SiS] USB 1.1 Controller (rev 0f)
00:03.3 USB controller: Silicon Integrated Systems [SiS] USB 2.0 Controller
00:0a.0 Network controller: Ralink corp. RT2500 Wireless 802.11bg (rev 01)
00:0f.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 10)
00:10.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev 46)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV350 [Radeon 9550/9600/X1050 Series]
01:00.1 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] RV350 [Radeon 9550/9600/X1050 Series] (Secondary)


 # lscpu 
Architecture:           i686
  CPU op-mode(s):       32-bit
  Address sizes:        36 bits physical, 32 bits virtual
  Byte Order:           Little Endian
CPU(s):                 2
  On-line CPU(s) list:  0,1
Vendor ID:              GenuineIntel
  Model name:           Intel(R) Pentium(R) 4 CPU 3.06GHz
    CPU family:         15
    Model:              2
    Thread(s) per core: 2
    Core(s) per socket: 1
    Socket(s):          1
    Stepping:           7
    BogoMIPS:           6149.42
    Flags:              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush 
                        dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cpuid cid xtpr
Vulnerabilities:        
  Itlb multihit:        Processor vulnerable
  L1tf:                 Vulnerable
  Mds:                  Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
  Meltdown:             Vulnerable
  Spec store bypass:    Vulnerable
  Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:           Mitigation; Full generic retpoline, STIBP disabled, RSB filling
  Srbds:                Not affected
  Tsx async abort:      Not affected
Comment 2 Erhard F. 2021-11-30 21:35:49 UTC
Created attachment 299803 [details]
kernel .config (kernel 5.16-rc3, Shuttle XPC FS51, Pentium 4)

Also happens on 5.16-rc3.
Comment 3 Marco Elver 2021-11-30 23:39:26 UTC
Thanks for reminding, there was a recent patch that also highlighted that UBSAN_OBJECT_SIZE is broken:

https://lkml.kernel.org/r/20211111003519.1050494-1-tadeusz.struk@linaro.org

I've done a preliminary analysis of fsanitize=object-size, and it appears that it's only rarely reporting real UB and could in fact be a compiler-warning instead. I'm pondering removing UBSAN_OBJECT_SIZE from the kernel. I'll update this bug with more details if I confirm my suspicions.
Comment 4 Marco Elver 2021-12-03 10:41:45 UTC
As a reminder, in its current implementation, fsanitize=object-size relies on __builtin_object_size of the converted-from object to determine if the converted-to object could -- due to its size -- __potentially__ access out-of-bounds bytes.

Also UBSAN_OBJECT_SIZE is already unavailable in kernel builds with GCC, so the below is Clang-specific.

Example program that illustrates what [1] is doing:

---
// clang -O2 -fsanitize=object-size ubsan.c
struct sk_buff_head {
  volatile struct sk_buff_head *prev;
  volatile struct sk_buff_head *next;
};
struct sk_buff {
  struct sk_buff_head list;
  int foo;
  long bar;
  short baz;
};
void ubsan_fail(struct sk_buff *skb)
{
  skb->list.next++;
#if 0 // uncommenting this no longer causes failure! strange, isn't it?!
  skb->list.prev++;
#endif
}
int skbuf_xmit(struct sk_buff *skb)
{
  struct sk_buff_head list = {};
  _Static_assert(__builtin_object_size(&list, 0) == sizeof(struct sk_buff_head), "");
  _Static_assert(_Alignof(struct sk_buff_head) == _Alignof(struct sk_buff), ""); // not UB!
  ubsan_fail((struct sk_buff *)&list);
  return 42;
}
int main()
{
  struct sk_buff s = {};
  return skbuf_xmit(&s);
}
---

I was really struggling to understand why a small change (see the #if 0) would suddenly no longer trigger fsanitize=object-size. I guess this has something to do with what optimizations are applied and the visibility of __builtin_object_size.

Furthermore, as is evident from the first _Static_assert, it's clear the size of 'list' is always known to the compiler, which fsanitize=object-size relies on. Yet subtle code changes no longer trigger UBSan.

And in the above there is no UB as far as I can interpret the C standard -- "6.3.2.3 Pointers": "A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined."

So what does fsanitize=object-size actually try to catch? Per documentation [2] "an attempt to potentially use bytes [...]" -- yet, its report is generated on the cast itself, not on the access.

It doesn't actually catch UB pointer casts, and it seems its only purpose is to report on _potential for_ out-of-bounds accesses, although it's only guessing. To catch UB pointer casts due to alignment issues we have -Wcast-align. And to catch real out-of-bounds we have ASan (in kernel, KASAN).

I think the major clue here is this line:
https://github.com/llvm/llvm-project/blob/c41b318423c4dbf0a65f81e5e9a816c1710ba4f6/clang/lib/CodeGen/CGExpr.cpp#L733
, which has existed since its inception: https://github.com/llvm/llvm-project/commit/69d0d2626a4f5

If the plan was always to rely on ASan or some other dynamic check to check for OOB, this has never happened, and fsanitize=object-size is therefore incomplete.

Like a compiler-warning, fsanitize=object-size appears to report on poor design or interface choices, but is oblivious if a real out-of-bounds access occurred. This sort of checking should not happen at runtime.

So my conclusion from this is:

        1. fsanitize=object-size is unreliable, and cannot
           deterministically be triggered.

        2. It does not actually report real UB if the objects have
           compatible alignment and no out-of-bounds bytes are accessed.

        3. fsanitize=object-size should be a compiler warning.

I think there are already several compiler warnings that can catch real OOB where the compiler knows the object size (-Warray-bounds), and for the rest there is ASan (in kernel, KASAN).

I would therefore propose to remove UBSAN_OBJECT_SIZE, given the above and the fact it's already unusable in the Linux kernel.

[1] https://lkml.kernel.org/r/20211111003519.1050494-1-tadeusz.struk@linaro.org
[2] https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
Comment 5 Erhard F. 2021-12-03 17:55:34 UTC
Thanks for the insight! I did realize that using clang for building the kernel entrails a few specificities but I didn't know UBSAN_OBJECT_SIZE was one of them.
Comment 6 Marco Elver 2021-12-06 09:57:04 UTC
Patch to remove UBSAN_OBJECT_SIZE was sent: https://lkml.kernel.org/r/20211203235346.110809-1-keescook@chromium.org
Comment 7 Erhard F. 2022-02-07 16:54:13 UTC
Just revisited the issue on v5.17-rc3. Seems your patch was accepted as UBSAN_OBJECT_SIZE is no longer available when building a x86_32 kernel with clang.

Closing as fixed. Thanks!

Note You need to log in before you can comment on or make changes to this bug.