I see a hang on boot whenever DAMON is enabled. The specific commit that causes this is listed below. There is no printk / dmesg output, only the message about an initrd being loaded by the EFI stub. Then a hard hang. Removing the commit below, or disabling DAMON entirely, fixes the issue.

commit 059342d1dd4e01d634184793fa3f8437e62afaa1
Author: Hailong Tu <tuhailong@gmail.com>
Date:   Fri Apr 29 14:37:00 2022 -0700

    mm/damon/reclaim: fix the timer always stays active

    The timer stays active even if the reclaim mechanism is never enabled.  It
    is unnecessary overhead can be completely avoided by using
    module_param_cb() for enabled flag.

    Link: https://lkml.kernel.org/r/20220421125910.1052459-1-tuhailong@gmail.com
    Signed-off-by: Hailong Tu <tuhailong@gmail.com>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ver_linux:

If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux surprise 5.18.0-GTW1 #1285 SMP PREEMPT_DYNAMIC Sat Jun 4 11:40:01 EDT 2022 x86_64 GNU/Linux

GNU C                   12.1.0
GNU Make                4.3
Binutils                2.38
Util-linux              2.38
Mount                   2.38
Module-init-tools       29
E2fsprogs               1.46.5
Jfsutils                1.1.15
Reiserfsprogs           3.6.27
Xfsprogs                5.18.0
PPP                     2.4.9
Nfs-utils               2.6.1
Bison                   3.8.2
Flex                    2.6.4
Linux C++ Library       6.0.30
Dynamic linker (ldd)    2.35
Procps                  3.3.17
Net-tools               2.10
Kbd                     2.5.0
Console-tools           2.5.0
Sh-utils                9.1
Udev                    251
Wireless-tools          30

lspci:

~/linux ⌂ v5.18 … lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 7
01:00.0 Non-Volatile memory controller: Sandisk Corp WD PC SN810 / Black SN850 NVMe SSD (rev 01)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream
03:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
05:00.0 Non-Volatile memory controller: Sandisk Corp WD PC SN810 / Black SN850 NVMe SSD (rev 01)
06:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
07:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 01)
09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
09:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
09:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
0a:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
0b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
0c:00.0 Non-Volatile memory controller: Intel Corporation Optane SSD 900P Series
0d:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c7)
0e:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
0f:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 24 [Radeon RX 6400 / 6500 XT] (rev c7)
0f:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
10:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
11:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
11:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
11:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
11:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
Update: The hang happens only when the parameter damon_reclaim.enabled=Y is on the kernel command line. Enabling DAMON_RECLAIM and then turning it on after boot by writing Y to /sys/module/damon_reclaim/parameters/enabled does not cause a hang.
That was poorly written. Enabling DAMON_RECLAIM in the kernel config, but not enabling it on the kernel command line, doesn't hang, even when DAMON_RECLAIM is subsequently enabled after boot.
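For reference, the post-boot path that does not hang (writing Y to the sysfs parameter file) can be sketched in C as below. This is a minimal sketch, assuming root access and a kernel built with DAMON_RECLAIM; the helper names `write_param()` and `read_param()` are hypothetical, and the path is passed in so the helpers can also be tried against an ordinary file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a value to a module parameter file, the C
 * equivalent of
 *   echo Y > /sys/module/damon_reclaim/parameters/enabled
 * Writing the real sysfs file requires root. Returns 0 on success.
 */
static int write_param(const char *path, const char *value)
{
	FILE *f;

	assert(path && value);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* Buffered data is flushed here; a rejected value shows up as an error */
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: read the current value back (sysfs prints booleans
 * as "Y\n" or "N\n") and strip the trailing newline. Returns 0 on success.
 */
static int read_param(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path && buf && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Run as root, `write_param("/sys/module/damon_reclaim/parameters/enabled", "Y")` mirrors the post-boot enable reported above as safe. The write only reaches the kernel's store callback when `fclose()` flushes the buffer, which is why the error check sits on `fclose()` rather than on `fputs()`.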
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sat, 04 Jun 2022 15:49:50 +0000 bugzilla-daemon@kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=216072
>
> Bug ID: 216072
> Summary: regression: Hang at boot when DAMON is enabled
> Product: Memory Management
> Version: 2.5
> Kernel Version: 5.19 pre-rc1
> Hardware: All
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> Assignee: akpm@linux-foundation.org
> Reporter: gwhite@kupulau.com
> Regression: No
>
> I see a hang on boot whenever DAMON is enabled.  The specific commit that
> causes this is listed below.  There is no printk / dmesg output, only the
> message about an initrd being loaded by the EFI stub.  Then a hard hang.
> Removing the commit below, or disabling DAMON entirely, fixes the issue.
>
> commit 059342d1dd4e01d634184793fa3f8437e62afaa1
> Author: Hailong Tu <tuhailong@gmail.com>
> Date:   Fri Apr 29 14:37:00 2022 -0700
>
>     mm/damon/reclaim: fix the timer always stays active
>
>     The timer stays active even if the reclaim mechanism is never enabled.  It
>     is unnecessary overhead can be completely avoided by using
>     module_param_cb() for enabled flag.
>
>     Link: https://lkml.kernel.org/r/20220421125910.1052459-1-tuhailong@gmail.com
>     Signed-off-by: Hailong Tu <tuhailong@gmail.com>
>     Reviewed-by: SeongJae Park <sj@kernel.org>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> ver_linux:
>
> If some fields are empty or look unusual you may have an old version.
> Compare to the current minimal requirements in Documentation/Changes.
Cc-ing damon@lists.linux.dev

Thank you for reporting this, Greg!  And thank you for forwarding this, Andrew!

On Sat, 4 Jun 2022 11:27:06 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Sat, 04 Jun 2022 15:49:50 +0000 bugzilla-daemon@kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=216072
> >
> > Bug ID: 216072
> > Summary: regression: Hang at boot when DAMON is enabled
[...]
> > commit 059342d1dd4e01d634184793fa3f8437e62afaa1
> > Author: Hailong Tu <tuhailong@gmail.com>
> > Date:   Fri Apr 29 14:37:00 2022 -0700
> >
> >     mm/damon/reclaim: fix the timer always stays active
> >
> >     The timer stays active even if the reclaim mechanism is never enabled.  It
> >     is unnecessary overhead can be completely avoided by using
> >     module_param_cb() for enabled flag.
> >
> >     Link: https://lkml.kernel.org/r/20220421125910.1052459-1-tuhailong@gmail.com
> >     Signed-off-by: Hailong Tu <tuhailong@gmail.com>
> >     Reviewed-by: SeongJae Park <sj@kernel.org>
> >     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Greg has further mentioned that the issue can be reproduced when the kernel
boots with the damon_reclaim.enabled=Y parameter, and I was also able to
reproduce it on my test machine.
DAMON_RECLAIM calls 'schedule_delayed_work()', which uses 'system_wq', from a
parameter store callback ('enabled_store()'), which is called from
'parse_args()', which is in turn called from 'start_kernel()'.  However,
'system_wq' is initialized by 'workqueue_init_early()', which 'start_kernel()'
calls only after 'parse_args()'.  Therefore 'schedule_delayed_work()' touches
the uninitialized 'system_wq', the init process hits a kernel NULL pointer
dereference, and the system hangs.

I further confirmed that the simple change below fixes this issue.  I will
format it as a patch and send it soon.

diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index 53c0c084f046..78984c8d1047 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -374,6 +374,8 @@ static void damon_reclaim_timer_fn(struct work_struct *work)
 }
 static DECLARE_DELAYED_WORK(damon_reclaim_timer, damon_reclaim_timer_fn);
 
+static bool damon_reclaim_initialized;
+
 static int enabled_store(const char *val,
 		const struct kernel_param *kp)
 {
@@ -382,6 +384,9 @@ static int enabled_store(const char *val,
 	if (rc < 0)
 		return rc;
 
+	if (!damon_reclaim_initialized)
+		return rc;
+
 	if (enabled)
 		schedule_delayed_work(&damon_reclaim_timer, 0);
 
@@ -450,6 +455,8 @@ static int __init damon_reclaim_init(void)
 	damon_add_target(ctx, target);
 
 	schedule_delayed_work(&damon_reclaim_timer, 0);
+
+	damon_reclaim_initialized = true;
 	return 0;
 }
 

Thanks,
SJ

[...]
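To make the ordering problem concrete, here is a userspace C model of the sequence described above. It is a sketch under stated assumptions, not kernel code: `model_system_wq`, `model_enabled_store()`, `model_boot()` and the other `model_*` names are hypothetical stand-ins for 'system_wq', 'enabled_store()', 'start_kernel()' and friends, a -1 return takes the place of the real NULL pointer dereference, and the boolean parsing is simplified. The hypothetical `use_guard` flag switches the 'damon_reclaim_initialized' check from the diff on or off.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace model only; every model_* name is a hypothetical stand-in. */
struct model_workqueue {
	int queued;			/* how many times work was queued */
};

static struct model_workqueue model_wq_storage;
static struct model_workqueue *model_system_wq;	/* 'system_wq': NULL until init */
static bool enabled;				/* the 'enabled' module parameter */
static bool reclaim_initialized;		/* 'damon_reclaim_initialized' */
static bool use_guard;				/* turns the fix on or off */

/* 'schedule_delayed_work()': reports failure where the kernel would fault */
static int model_schedule_work(void)
{
	if (!model_system_wq)
		return -1;
	model_system_wq->queued++;
	return 0;
}

/* 'enabled_store()': record the value, then maybe queue the timer */
static int model_enabled_store(const char *val)
{
	enabled = strcmp(val, "Y") == 0 || strcmp(val, "y") == 0 ||
		  strcmp(val, "1") == 0;
	if (use_guard && !reclaim_initialized)
		return 0;	/* too early: the init function queues it later */
	if (enabled)
		return model_schedule_work();
	return 0;
}

/* 'workqueue_init_early()' */
static void model_workqueue_init_early(void)
{
	model_system_wq = &model_wq_storage;
}

/* 'damon_reclaim_init()': as in the diff, the timer is queued unconditionally */
static int model_reclaim_init(void)
{
	int rc;

	assert(model_system_wq != NULL);	/* runs after workqueue setup */
	rc = model_schedule_work();
	reclaim_initialized = true;
	return rc;
}

/* 'start_kernel()' ordering: parse_args() runs before workqueue_init_early() */
static int model_boot(const char *cmdline_enabled)
{
	if (cmdline_enabled && model_enabled_store(cmdline_enabled) != 0)
		return -1;	/* stands in for the NULL dereference and hang */
	model_workqueue_init_early();
	return model_reclaim_init();	/* later in boot */
}

static void model_reset(bool guard)
{
	model_system_wq = NULL;
	model_wq_storage.queued = 0;
	enabled = false;
	reclaim_initialized = false;
	use_guard = guard;
}
```

With the guard off, `model_reset(false); model_boot("Y")` returns -1, the model's counterpart of the hang. With it on, the same call returns 0, the init function queues the timer once, and a later `model_enabled_store("Y")`, like writing Y through sysfs after boot, queues it again. Recording the value early and deferring the side effect until init has run keeps the parameter usable on the command line without depending on where 'parse_args()' sits in boot.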