Bug 206181 - [PATCH] x86_32: Panic caused by systemd-udevd on Hyper-V (triggered by memory hot-add)
Summary: [PATCH] x86_32: Panic caused by systemd-udevd on Hyper-V (triggered by memory...
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-01-13 08:38 UTC by Taketo Kabe
Modified: 2020-01-23 13:51 UTC
0 users

See Also:
Kernel Version: 4.19.95
Tree: Mainline
Regression: No


Attachments
add printk's to see hv_balloon memory hot-add function (6.41 KB, patch)
2020-01-17 12:24 UTC, Taketo Kabe
do not __online_page_free() the hot-added memory in hv_balloon (615 bytes, patch)
2020-01-22 06:57 UTC, Taketo Kabe

Description Taketo Kabe 2020-01-13 08:38:02 UTC
When the vanilla 4.19.95 kernel is compiled for i586 (x86_32) and run as a
Hyper-V guest with the i686 userland provided for CentOS 8, it panics.

Real machines (Pentium 4, Pentium M) do not panic, so this may also be
related to the Hyper-V drivers.

[   68.520975] BUG: unable to handle kernel paging request at ebfff000
[   68.522740] *pde = 00000000
[   68.523462] Oops: 0002 [#1] SMP
[   68.524961] CPU: 0 PID: 1271 Comm: systemd-udevd Not tainted 4.19.95-1.el8.i586 #1
[   68.527320] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
[   68.527788] EIP: wp_page_copy+0x9c/0x780
[   68.527788] Code: 75 e8 85 f6 0f 84 9c 05 00 00 8b 45 e8 e8 1c 5d e7 ff 89 45 d4 8b 45 e4 e8 11 5d e7 ff 8b 55 d4 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 d4 29
[   68.527788] EAX: ebfff000 EBX: dfb8ff04 ECX: 02535c58 EDX: d6353000
[   68.527788] ESI: d6353000 EDI: ebfff004 EBP: dfb8fec8 ESP: dfb8fe98
[   68.527788] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
[   68.527788] CR0: 80050033 CR2: ebfff000 CR3: 1e4aa000 CR4: 003406d0
[   68.527788] Call Trace:
[   68.527788]  do_wp_page+0x8a/0x600
[   68.527788]  handle_mm_fault+0x8b0/0xfb0
[   68.527788]  __do_page_fault+0x1c3/0x480
[   68.527788]  do_page_fault+0x25/0xf0
[   68.527788]  ? __do_page_fault+0x480/0x480
[   68.527788]  common_exception+0x11d/0x12e
[   68.527788] EIP: 0xb7b10e7c
[   68.527788] Code: 85 db 0f 85 da 01 00 00 29 fe 83 fe 0f 0f 86 e5 00 00 00 8b 45 40 8d 14 39 8b 5c 24 0c 39 58 0c 0f 85 5b 01 00 00 8b 5c 24 0c <89> 42 08 89 5a 0c 89 55 40 89 50 0c 81 ff ef 03 00 00 77 03 89 55
[   68.527788] EAX: b7c377d8 EBX: b7c377d8 ECX: 0251c7e8 EDX: 0251c898
[   68.527788] ESI: 00000090 EDI: 000000b0 EBP: b7c377a0 ESP: bf893460
[   68.527788] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00210246
[   68.527788] Modules linked in: xfs libfc rfkill zram intel_rapl_perf sg pcspkr i2c_piix4 hv_utils hv_balloon joydev ext4 mbcache jbd2 loop nls_utf8 isofs sr_mod cdrom ata_generic 8021q garp mrp stp llc sd_mod ata_piix hv_netvsc hyperv_keyboard hid_hyperv libata hv_storvsc scsi_transport_fc hyperv_fb crc32_pclmul serio_raw hv_vmbus sunrpc xts lrw dm_crypt dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_intel raid1 raid0 iscsi_ibft squashfs cramfs be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 libcxgbi libcxgb iscsi_tcp libiscsi_tcp libiscsi edd scsi_transport_iscsi
[   68.527788] CR2: 00000000ebfff000
[   68.527788] ---[ end trace e948b492d0b4535b ]---
[   68.527788] EIP: wp_page_copy+0x9c/0x780
[   68.527788] Code: 75 e8 85 f6 0f 84 9c 05 00 00 8b 45 e8 e8 1c 5d e7 ff 89 45 d4 8b 45 e4 e8 11 5d e7 ff 8b 55 d4 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 d4 29
[   68.527788] EAX: ebfff000 EBX: dfb8ff04 ECX: 02535c58 EDX: d6353000
[   68.527788] ESI: d6353000 EDI: ebfff004 EBP: dfb8fec8 ESP: d2c633bc
[   68.527788] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
[   68.527788] CR0: 80050033 CR2: ebfff000 CR3: 1e4aa000 CR4: 003406d0
[   68.527788] Kernel panic - not syncing: Fatal exception
[   68.527788] Kernel Offset: 0x11000000 from 0xc1000000 (relocation range: 0xc0000000-0xe07effff)
[   68.527788] ---[ end Kernel panic - not syncing: Fatal exception ]---
Comment 1 Taketo Kabe 2020-01-13 08:44:18 UTC
I've tracked this down to "udevadm trigger --type=devices --action=add",
invoked from /usr/lib/systemd/system/systemd-udev-trigger.service.

To control when the panic occurs:
- Boot the kernel with "init=/bin/sh" kernel option.
- In the init shell, type
  sh# chmod -x /usr/bin/udevadm
  to make systemd-udev service fail.
- Continue booting by typing
  sh# exec /usr/lib/systemd/systemd

The system doesn't panic when udevadm is disabled.

To cause the panic:
- sh# chmod +x /usr/bin/udevadm
- sh# udevadm trigger --type=devices --action=add

After 46 seconds, the kernel will panic.
Running "udevadm monitor &" beforehand does not log anything suspicious;
udevd seems quiescent.

Since systemd-udevd is involved, this may not be a memory-management bug
but something in the systemd-udevd<=>kernel interface (inotify?).
Comment 2 Taketo Kabe 2020-01-13 08:46:31 UTC
FYI: kernel-4.19.94 also panics, but at a different location:

[  202.109493] BUG: unable to handle kernel paging request at eb800000
[  202.114991] *pde = 00000000
[  202.115635] Oops: 0002 [#1] SMP
[  202.116538] CPU: 0 PID: 184 Comm: kworker/0:4 Not tainted 4.19.94-1.el8.i586 #1
[  202.116882] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
[  202.116882] Workqueue: events hot_add_req [hv_balloon]
[  202.116882] EIP: sparse_add_one_section+0xcb/0x12e
[  202.116882] Code: 45 ec e8 ce 5b 00 00 8b 55 e4 89 45 e8 89 d0 c1 e0 04 8d b0 c0 f7 dc c4 f6 80 c0 f7 dc c4 01 75 44 b0 ff b9 00 00 0a 00 89 df <f3> aa 89 f0 2d c0 f7 dc c4 c1 f8 04 3b 05 80 f7 dc c4 7e 05 a3 80
[  202.116882] EAX: 000000ff EBX: eb800000 ECX: 000a0000 EDX: 0000000d
[  202.116882] ESI: c4dcf890 EDI: eb800000 EBP: df8bbe48 ESP: df8bbe2c
[  202.116882] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010046
[  202.116882] CR0: 80050033 CR2: eb800000 CR3: 014fb000 CR4: 003406d0
[  202.116882] Call Trace:
[  202.116882]  __add_pages+0x89/0x100
[  202.116882]  arch_add_memory+0x3c/0x50
[  202.116882]  add_memory_resource+0x125/0x180
[  202.116882]  __add_memory+0xad/0x130
[  202.116882]  add_memory+0x2c/0x3a
[  202.116882]  hot_add_req+0x3de/0x60b [hv_balloon]
[  202.116882]  process_one_work+0x17b/0x340
[  202.116882]  worker_thread+0x39/0x3d0
[  202.116882]  kthread+0xf0/0x110
[  202.116882]  ? pwq_unbound_release_workfn+0xc0/0xc0
[  202.116882]  ? kthread_bind+0x30/0x30
[  202.116882]  ret_from_fork+0x2e/0x40
[  202.116882] Modules linked in: intel_rapl_perf pcspkr i2c_piix4 sg hv_balloon hv_utils joydev rfkill zram ext4 mbcache jbd2 loop nls_utf8 isofs sr_mod cdrom sd_mod ata_generic 8021q garp mrp stp llc hv_netvsc hv_storvsc scsi_transport_fc hyperv_keyboard hid_hyperv ata_piix libata hyperv_fb crc32_pclmul hv_vmbus serio_raw sunrpc xts lrw dm_crypt dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_intel raid1 raid0 iscsi_ibft squashfs cramfs be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 libcxgbi libcxgb iscsi_tcp libiscsi_tcp libiscsi edd scsi_transport_iscsi
[  202.116882] CR2: 00000000eb800000
[  202.116882] ---[ end trace 8cf405307c8a67b5 ]---
[  202.116882] EIP: sparse_add_one_section+0xcb/0x12e
[  202.116882] Code: 45 ec e8 ce 5b 00 00 8b 55 e4 89 45 e8 89 d0 c1 e0 04 8d b0 c0 f7 dc c4 f6 80 c0 f7 dc c4 01 75 44 b0 ff b9 00 00 0a 00 89 df <f3> aa 89 f0 2d c0 f7 dc c4 c1 f8 04 3b 05 80 f7 dc c4 7e 05 a3 80
[  202.116882] EAX: 000000ff EBX: eb800000 ECX: 000a0000 EDX: 0000000d
[  202.116882] ESI: c4dcf890 EDI: eb800000 EBP: df8bbe48 ESP: c4c633bc
[  202.116882] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010046
[  202.116882] CR0: 80050033 CR2: eb800000 CR3: 014fb000 CR4: 003406d0
[  202.116882] Kernel panic - not syncing: Fatal exception
[  202.116882] Kernel Offset: 0x3000000 from 0xc1000000 (relocation range: 0xc0000000-0xe07effff)
[  202.116882] ---[ end Kernel panic - not syncing: Fatal exception ]---
Comment 3 Taketo Kabe 2020-01-13 09:57:15 UTC
After some experimenting: disabling the hv_balloon driver with the
"module_blacklist=hv_balloon" kernel option
prevents the panic.

Something seems to be wrong with the drivers/hv/hv_balloon.c driver on 32-bit.
Comment 4 Taketo Kabe 2020-01-13 11:57:25 UTC
Passing the module parameter "hv_balloon.pressure_report_delay=60" on the
kernel command line changes the time between
"udevadm trigger --type=devices --action=add" and the panic accordingly,
so something that happens once pressure_report_delay expires
(default: static uint pressure_report_delay = 45)
seems to go wrong, in drivers/hv/hv_balloon.c:post_status().

Nothing there looks wrong to my eyes, though...
Comment 5 Taketo Kabe 2020-01-15 11:15:02 UTC
Adding "hv_balloon.hot_add=0" to the kernel command line seems to
prevent the panic.

Something seems to be wrong with the 32-bit memory hot-add code path.
I've also seen a panic going through hot_add_req() [hv_balloon]
several times.
Comment 6 Taketo Kabe 2020-01-16 04:04:25 UTC
I have managed to capture a panic involving drivers/hv/hv_balloon.c:hot_add_req().
Reproducibility unknown.

[   73.268544] BUG: unable to handle kernel paging request at eb800000
[   73.270157] *pde = 00000000
[   73.271192] Oops: 0002 [#1] SMP
[   73.271897] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 4.19.95-1.el8.i586 #1
[   73.273535] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
[   73.275487] Workqueue: events hot_add_req [hv_balloon]
[   73.276610] EIP: sparse_add_one_section+0xcb/0x12e
[   73.277715] Code: 45 ec e8 ce 5b 00 00 8b 55 e4 89 45 e8 89 d0 c1 e0 04 8d b0 c0 f7 1c c6 f6 80 c0 f7 1c c6 01 75 44 b0 ff b9 00 00 0a 00 89 df <f3> aa 89 f0 2d c0 f7 1c c6 c1 f8 04 3b 05 80 f7 1c c6 7e 05 a3 80
[   73.277753] EAX: 000000ff EBX: eb800000 ECX: 000a0000 EDX: 0000000d
[   73.277753] ESI: c61cf890 EDI: eb800000 EBP: dbd2be48 ESP: dbd2be2c
[   73.277753] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010046
[   73.277753] CR0: 80050033 CR2: eb800000 CR3: 06228000 CR4: 003406d0
[   73.277753] Call Trace:
[   73.277753]  __add_pages+0x89/0x100
[   73.277753]  arch_add_memory+0x3c/0x50
[   73.277753]  add_memory_resource+0x125/0x180
[   73.277753]  __add_memory+0xad/0x130
[   73.277753]  add_memory+0x2c/0x3a
[   73.277753]  hot_add_req+0x3de/0x60b [hv_balloon]
[   73.277753]  process_one_work+0x17b/0x340
[   73.277753]  worker_thread+0x39/0x3d0
[   73.277753]  kthread+0xf0/0x110
[   73.277753]  ? pwq_unbound_release_workfn+0xc0/0xc0
[   73.277753]  ? kthread_bind+0x30/0x30
[   73.277753]  ret_from_fork+0x2e/0x40
[   73.277753] Modules linked in: xfs libfc rfkill zram intel_rapl_perf i2c_piix4 pcspkr sg hv_utils hv_balloon joydev ext4 mbcache jbd2 loop nls_utf8 isofs sr_mod cdrom sd_mod 8021q garp ata_generic mrp stp llc hv_netvsc hv_storvsc scsi_transport_fc hyperv_keyboard hid_hyperv ata_piix libata hyperv_fb crc32_pclmul hv_vmbus serio_raw sunrpc xts lrw dm_crypt dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_intel raid1 raid0 iscsi_ibft squashfs cramfs be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 libcxgbi libcxgb iscsi_tcp libiscsi_tcp libiscsi edd scsi_transport_iscsi
[   73.277753] CR2: 00000000eb800000
[   73.277753] ---[ end trace 6d1ee839ab38607d ]---
[   73.277753] EIP: sparse_add_one_section+0xcb/0x12e
[   73.277753] Code: 45 ec e8 ce 5b 00 00 8b 55 e4 89 45 e8 89 d0 c1 e0 04 8d b0 c0 f7 1c c6 f6 80 c0 f7 1c c6 01 75 44 b0 ff b9 00 00 0a 00 89 df <f3> aa 89 f0 2d c0 f7 1c c6 c1 f8 04 3b 05 80 f7 1c c6 7e 05 a3 80
[   73.277753] EAX: 000000ff EBX: eb800000 ECX: 000a0000 EDX: 0000000d
[   73.277753] ESI: c61cf890 EDI: eb800000 EBP: dbd2be48 ESP: c60633bc
[   73.277753] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010046
[   73.277753] CR0: 80050033 CR2: eb800000 CR3: 06228000 CR4: 003406d0
[   73.277753] Kernel panic - not syncing: Fatal exception
[   73.277753] Kernel Offset: 0x4400000 from 0xc1000000 (relocation range: 0xc0000000-0xe07effff)
[   73.277753] ---[ end Kernel panic - not syncing: Fatal exception ]---
Comment 7 Taketo Kabe 2020-01-17 12:24:14 UTC
Created attachment 286859 [details]
add printk's to see hv_balloon memory hot-add function

Not for production
Comment 8 Taketo Kabe 2020-01-17 12:30:04 UTC
I patched hv_balloon.c with lots of printk's (pr_info()).
I first thought hv_balloon was adding memory outside the MAXMEM range,
but that wasn't the case; it fails after the
first add_memory() of 128MB.

Is memory hot-add via add_memory() functional on X86_32?


[   56.719004] hv_balloon: Max. dynamic memory size: 1048576 MB
[   57.275031] hv_balloon: Received DM_MEM_HOT_ADD_REQUEST size 24
[   57.277162] hv_balloon: DM_MEM_HOT_ADD_REQUEST received size 24, should be 16
[   57.280537] hv_balloon: Received partial hot-add request
[   57.282442] hv_balloon: ha_wrk.ha_page_range{.finfo.start_page=65536, .finfo.page_cnt=16896}
[   57.284524] hv_balloon: ha_wrk.ha_region_range{.finfo.start_page=65536, .finfo.page_cnt=229888}
[   57.287729] hv_balloon: Invoking process_hot_add(pg_start=65536, pfn_cnt=16896, tg_start=65536, rg_sz=229888
[   57.290013] hv_balloon: process_hot_add: rg_size=229888 ha_region={ .start_pfn=65536, .ha_end_pfn=65536, .covered_start_pfn=65536, .covered_end_pfn=65536, .end_pfn=295424 }
[   57.293727] hv_balloon: add_memory(nid=0 PFN_PHYS(start_pfn=65536)=0x10000000, HA_CHUNK<<PAGE_SHIFT=134217728)
[   57.557961] BUG: unable to handle kernel paging request at d3fff000
[   57.564008] *pde = 00000000
[   57.566879] Oops: 0002 [#1] SMP
[   57.567597] CPU: 0 PID: 405 Comm: systemd-udevd Tainted: G            E     4.19.95-1.el8.i586 #1
[   57.567597] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
[   57.567597] EIP: wp_page_copy+0x9c/0x780
[   57.567597] Code: 75 e8 85 f6 0f 84 9c 05 00 00 8b 45 e8 e8 1c 5d e7 ff 89 45 d4 8b 45 e4 e8 11 5d e7 ff 8b 55 d4 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 d4 29
[   57.567597] EAX: d3fff000 EBX: c566df04 ECX: 00000000 EDX: c3868000
[   57.567597] ESI: c3868000 EDI: d3fff004 EBP: c566dec8 ESP: c566de98
[   57.567597] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
[   57.567597] CR0: 80050033 CR2: d3fff000 CR3: 051bc000 CR4: 003406d0
[   57.567597] Call Trace:
[   57.567597]  do_wp_page+0x8a/0x600
[   57.567597]  handle_mm_fault+0x8b0/0xfb0
[   57.567597]  __do_page_fault+0x1c3/0x480
[   57.567597]  do_page_fault+0x25/0xf0
[   57.567597]  ? __do_page_fault+0x480/0x480
[   57.567597]  common_exception+0x11d/0x12e
[   57.567597] EIP: 0xb7b3f104
[   57.567597] Code: 29 f9 89 4c 24 10 83 f9 0f 0f 86 92 00 00 00 8b 45 40 8d 14 3e 8b 4c 24 0c 39 48 0c 75 74 8b 4c 24 0c 81 7c 24 10 ef 03 00 00 <89> 42 08 89 4a 0c 89 55 40 89 50 0c 76 0e c7 42 10 00 00 00 00 c7
[   57.567597] EAX: b7c657d8 EBX: 00001190 ECX: b7c657d8 EDX: 0169a108
[   57.567597] ESI: 016990f8 EDI: 00001010 EBP: b7c657a0 ESP: bfefcc80
[   57.567597] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00210293
[   57.567597] Modules linked in: rfkill snd_pcm snd_timer crc32_pclmul snd soundcore intel_rapl_perf pcspkr hv_balloon(E) hv_netvsc i2c_piix4 hyperv_fb hv_utils sg joydev ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod ata_generic ata_piix hyperv_keyboard hid_hyperv hv_storvsc scsi_transport_fc libata crc32c_intel serio_raw hv_vmbus
[   57.567597] CR2: 00000000d3fff000
[   57.567597] ---[ end trace 08a505f0f046453d ]---
[   57.567597] EIP: wp_page_copy+0x9c/0x780
[   57.567597] Code: 75 e8 85 f6 0f 84 9c 05 00 00 8b 45 e8 e8 1c 5d e7 ff 89 45 d4 8b 45 e4 e8 11 5d e7 ff 8b 55 d4 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 d4 29
[   57.567597] EAX: d3fff000 EBX: c566df04 ECX: 00000000 EDX: c3868000
[   57.567597] ESI: c3868000 EDI: d3fff004 EBP: c566dec8 ESP: c32633bc
[   57.567597] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
[   57.567597] CR0: 80050033 CR2: d3fff000 CR3: 051bc000 CR4: 003406d0
[   57.567597] Kernel panic - not syncing: Fatal exception
[   57.567597] Kernel Offset: 0x1600000 from 0xc1000000 (relocation range: 0xc0000000-0xc87effff)
[   57.567597] ---[ end Kernel panic - not syncing: Fatal exception ]---
Comment 9 Taketo Kabe 2020-01-19 14:25:01 UTC
If I suppress the call to register_memory() from drivers/base/memory.c:init_memory_block(), it doesn't panic.

drivers/base/memory.c:register_memory(struct memory_block *memory) initializes
the struct device memory->dev and passes it down to device_register(&memory->dev),

but my wild guess is that the memory->dev initialization is incomplete.
I have no idea what is missing.
Current register_memory():

/*
 * register_memory - Setup a sysfs device for a memory block
 */
static
int register_memory(struct memory_block *memory)
{
        int ret;

        memory->dev.bus = &memory_subsys;
        memory->dev.id = memory->start_section_nr / sections_per_block;
        memory->dev.release = memory_block_release;
        memory->dev.groups = memory_memblk_attr_groups;
        memory->dev.offline = memory->state == MEM_OFFLINE;

        ret = device_register(&memory->dev);
        if (ret)
                put_device(&memory->dev);

        return ret;
}
Comment 10 Taketo Kabe 2020-01-22 01:34:00 UTC
drivers/base/memory.c:hotplug_memory_register() looks like this:

/*
 * need an interface for the VM to add new memory regions,
 * but without onlining it.
 */
int hotplug_memory_register(int nid, struct mem_section *section)
{
        int ret = 0;
        struct memory_block *mem;

        mutex_lock(&mem_sysfs_mutex);

        mem = find_memory_block(section);
        if (mem) {
                mem->section_count++;
                put_device(&mem->dev);
        } else {
                ret = init_memory_block(&mem, section, MEM_OFFLINE);
                if (ret)
                        goto out;
                mem->section_count++;
        }

out:
        mutex_unlock(&mem_sysfs_mutex);
        return ret;
}

There's nothing suspicious here, but the difference from the nonproblematic
memory add during boot is that boot uses init_memory_block(,,MEM_ONLINE);
if I change init_memory_block(,,MEM_OFFLINE) in
hotplug_memory_register() to init_memory_block(,,MEM_ONLINE), it doesn't panic.

It seems that hot-adding memory in the offline state (and perhaps the
callback that onlines it later?) has some problem.
Comment 11 Taketo Kabe 2020-01-22 06:57:34 UTC
Created attachment 286941 [details]
do not __online_page_free() the hot-added memory in hv_balloon

Patch that prevents the panic and lets the Hyper-V guest properly recognize the hot-added memory
Comment 12 Taketo Kabe 2020-01-22 07:08:12 UTC
The patch posted in Comment 11
https://bugzilla.kernel.org/show_bug.cgi?id=206181#c11
seems to keep the kernel from panicking, and the kernel recognizes the memory
that the hv_balloon module hot-adds in response to memory pressure.

My guess is that you shouldn't __online_page_free(pg) a page you have just
hot-added and brought online.
__online_page_free() should be called when memory hot-remove is requested,
which the current hv_balloon driver doesn't implement.

Any thoughts?
I don't know why memory hot-add was working for others before this patch.
