Hi all,

I encountered the following crash at compact_memory:

divide error: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/0000:03:00.0/host2/port-2:4/end_device-2:4/target2:0:4/2:0:4:0/block/sdv/queue/scheduler
CPU 7
......

Pid: 17129, comm: bash Not tainted 2.6.32-358.el6.x86_64 #1 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffff8113fe62>]  [<ffffffff8113fe62>] fragmentation_index+0x72/0x90
RSP: 0018:ffff880b72437ca8  EFLAGS: 00010246
RAX: 00000000003a6150 RBX: 000000000000005f RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffff8800001003c8
RBP: ffff880b72437ca8 R08: 000000000000000d R09: 0000000000000ef2
R10: 0000000000000000 R11: 0000000000000c00 R12: ffff880000100000
R13: 00000000ffffffff R14: 000000000000000e R15: ffff880000100000
FS:  00007fe93f52e700(0000) GS:ffff880c5a6c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000019b7760 CR3: 0000000b692f2000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 17129, threadinfo ffff880b72436000, task ffff880912c00aa0)
Stack:
 ffff880b72437cd8 ffffffff81168423 ffff880b72437cc8 ffff880b72437db8
<d> 0000000000000010 0000000000000ff0 ffff880b72437d98 ffffffff81168a75
<d> 0000000000000286 ffff88181d4c4a40 0000000000019838 0000000000000286
Call Trace:
 [<ffffffff81168423>] compaction_suitable+0x63/0xc0
 [<ffffffff81168a75>] compact_zone+0x35/0x950
 [<ffffffff811745b5>] ? free_percpu+0xb5/0x140
 [<ffffffff81092b23>] ? schedule_on_each_cpu+0x133/0x160
 [<ffffffff8116949c>] compact_node+0x10c/0x120
 [<ffffffff8116953c>] sysctl_compaction_handler+0x5c/0x90
 [<ffffffff811fa517>] proc_sys_call_handler+0x97/0xd0
 [<ffffffff811fa564>] proc_sys_write+0x14/0x20
 [<ffffffff81187368>] vfs_write+0xb8/0x1a0
 [<ffffffff81187c61>] sys_write+0x51/0x90
 [<ffffffff8100b052>] system_call_fastpath+0x16/0x1b
Code: 30 c0 4d 85 c0 75 02 c9 c3 4d 85 d2 b8 18 fc ff ff 75 f4 49 69 c1 e8 03 00 00 89 f1 ba 01 00 00 00 48 d3 e2 45 89 c0 89 d1 31 d2 <48> f7 f1 31 d2 c9 48 05 e8 03 00 00 49 f7 f0 49 89 c0 b8 e8 03
RIP  [<ffffffff8113fe62>] fragmentation_index+0x72/0x90

I looked into it, and the likely cause is that order equals -1:

static int compact_node(int nid)
{
	struct compact_control cc = {
		.nr_freepages = 0,
		.nr_migratepages = 0,
		.order = -1,		/* <-- order is -1 */
		.sync = true,
	};
	......
	compact_zone(zone, &cc);
	......

So 1UL << order will be 0.

This patch may fix it:

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 69f9aff..9794319 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -870,6 +870,9 @@ static int __fragmentation_index(unsigned int order, struct contig_page_info *in
 {
 	unsigned long requested = 1UL << order;
 
+	if(!requested)
+		return 0;
+
 	if (!info->free_blocks_total)
 		return 0;
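
For reference, the register dump in the report shows RSI = 0xffffffff, which is what order = -1 becomes once it is passed as unsigned int, and RCX = 0 at the faulting divide. Below is a minimal userspace sketch of one way those two values can go together. It is not kernel code and rests on two assumptions: that the shift count is reduced to its low six bits (as the x86-64 shift instruction does; in C itself the oversized shift is undefined), and that the divisor is narrowed to 32 bits before the division. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: model the x86-64 behaviour of using only the low 6 bits of
 * the shift count. Masking explicitly keeps this model free of undefined
 * behaviour.
 */
static uint64_t model_requested(unsigned int order)
{
	return UINT64_C(1) << (order & 63u);
}

/* Assumption: the division only sees the low 32 bits of the value. */
static uint32_t model_divisor(unsigned int order)
{
	return (uint32_t)model_requested(order);
}

/* Hypothetical guard: return 0 instead of trapping on a zero divisor. */
static uint64_t model_guarded_div(uint64_t dividend, unsigned int order)
{
	uint32_t divisor = model_divisor(order);

	if (divisor == 0)
		return 0;
	return dividend / divisor;
}

static void model_self_check(void)
{
	unsigned int order = (unsigned int)-1;	/* the -1 sentinel, converted */

	assert(order == 0xffffffffu);		/* assumes 32-bit unsigned int */
	assert(model_requested(order) == UINT64_C(1) << 63);
	assert(model_divisor(order) == 0);	/* would be a divide by zero */
	assert(model_guarded_div(1000, order) == 0);
	assert(model_guarded_div(1000, 3) == 125);
}
```

In this model the 64-bit value itself is nonzero for order = -1 and only the narrowed divisor is zero, so under these assumptions a guard is most reliable when it tests the value that is actually used as the divisor.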

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 01 Aug 2017 09:32:00 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=196555
>
>             Bug ID: 196555
>            Summary: divide error at __fragmentation_index
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: all
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: wen.yang99@zte.com.cn
>         Regression: No
>
> hi all,
> encountered the following crash at compact_memory:
>
> divide error: 0000 [#1] SMP
> [...]
> RIP  [<ffffffff8113fe62>] fragmentation_index+0x72/0x90

and...

> https://bugzilla.kernel.org/show_bug.cgi?id=196555
>
> --- Comment #1 from yangwen (wen.yang99@zte.com.cn) ---
>
> I tried to look at and probably may be the reason is order equals to -1.
> [...]
> so, 1UL << order will be 0.
>
> This patch may fix it:
> [...]

Guys, can you please take a look?

Also... what the heck does "order = -1" mean?  If it's supposed to be
some magical wildcard don't-care sentinel then this is not a
widely-known thing at all and very little code is set up to handle it.
Seems fragile (as demonstrated here!) and hacky.

Perhaps order==-1 is only supposed to mean something (but what?) in the
context of compact_control.order, but even then I'm having trouble
finding which code prevents the -1 value from leaking to places which
aren't set up to handle it.

And compact_control.order==-1 doesn't appear to be documented...

Thanks.

On 08/02/2017 12:00 AM, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Tue, 01 Aug 2017 09:32:00 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=196555
>> [...]
>> Pid: 17129, comm: bash Not tainted 2.6.32-358.el6.x86_64 #1 To be filled by

This looks like some ancient RHEL kernel? You will have to ask Red Hat for
support, sorry. Current mainline doesn't have this issue AFAICS, and the 2.6
stable series has been discontinued for some time.

>> [...]
>
> Guys, can you please take a look?
>
> Also... what the heck does "order = -1" mean?  If it's supposed to be
> some magical wildcard dont-care sentinel then this is not a
> widely-known thing at all and very little code is set up to handle it.
> Seems fragile (as demonstrated here!) and hacky.

There's an is_via_compact_memory() helper for checking it.

> Perhaps order==-1 is only supposed to mean something (but what?) in the
> context of compact_control.order, but even then I'm having trouble
> finding which code prevents the -1 value from leaking to places which
> aren't set up to handle it.
>
> And compact_control.order==-1 doesn't appear to be documented...

Yeah, a normal bool flag in compact_control for /sys triggered compaction,
with order set to a safe value, e.g. pageblock_order, would be safer IMHO. I
will cook something.

> Thanks.
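
To make the two approaches in this exchange concrete, here is a rough userspace sketch, not a proposed patch. The helper name follows the is_via_compact_memory() mentioned above, but its body here is an assumption based on the "order == -1" convention discussed in this thread. The struct, the whole_zone flag name and the value 9 for the pageblock order are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: stand-in for pageblock_order; the real value is per-arch. */
#define MODEL_PAGEBLOCK_ORDER 9

/* Hypothetical, reduced stand-in for struct compact_control. */
struct compact_control_model {
	int order;		/* always a valid allocation order here */
	bool whole_zone;	/* hypothetical flag replacing order == -1 */
};

/* Sentinel style: every consumer of order has to remember this check. */
static bool model_is_via_compact_memory(int order)
{
	return order == -1;
}

/* Flag style: the request carries a safe order plus an explicit flag. */
static struct compact_control_model model_sysctl_request(void)
{
	struct compact_control_model cc = {
		.order = MODEL_PAGEBLOCK_ORDER,
		.whole_zone = true,
	};

	return cc;
}

/* With a valid order the shift is well defined and never yields 0. */
static unsigned long model_requested_pages(const struct compact_control_model *cc)
{
	assert(cc->order >= 0 && cc->order < 32);
	return 1UL << cc->order;
}
```

The point of the flag style is that code which only looks at order, such as the fragmentation index calculation, never sees a value it cannot shift by, even if it forgets to check how the compaction was triggered.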

Thanks very much!