Bug 13549 - Kernel oops while online resizing of an ext4 filesystem
Status: CLOSED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: Eric Sandeen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-06-16 09:13 UTC by Alessandro Polverini
Modified: 2012-06-08 12:01 UTC

See Also:
Kernel Version: 2.6.29-2-amd64
Subsystem:
Regression: No
Bisected commit-id:


Description Alessandro Polverini 2009-06-16 09:13:52 UTC
I extended the device with LVM, then tried to resize the filesystem with resize2fs and got these errors:

Jun 16 09:49:00 nb1a kernel: [677830.909054] PGD 5bf79067 PUD 59768067 PMD 0
Jun 16 09:49:00 nb1a kernel: [677830.909144] CPU 1
Jun 16 09:49:00 nb1a kernel: [677830.909163] Modules linked in: ext4 jbd2 crc16 tun ip6table_filter ip6_tables iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle iptable_filter ip_tables x_tables nfnetlink drbd cn autofs4
Jun 16 09:49:00 nb1a kernel: nfsd nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 quota_v2 quota_tree xfs exportfs loop firewire_sbp2 firewire_core crc_itu_t evdev snd_pcm snd_timer snd soundcore pcspkr snd_page_alloc psmouse serio_raw k8temp shpchp pci_hotplug i2c_nforce2 i2c_core button ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ide_pci_generic sd_mod crc_t10dif amd74xx ide_core sata_nv floppy 3w_9xxx r8169 mii ata_generic forcedeth libata scsi_mod ehci_hcd ohci_hcd thermal processor fan thermal_sys
Jun 16 09:49:00 nb1a kernel: [677830.909795] Pid: 4620, comm: resize2fs Not tainted 2.6.29-2-amd64 #1 H8DM8-2
Jun 16 09:49:00 nb1a kernel: [677830.909822] RIP: 0010:[<ffffffffa069a97e>]  [<ffffffffa069a97e>] ext4_group_add+0x135c/0x143e [ext4]
Jun 16 09:49:00 nb1a kernel: [677830.909875] RSP: 0000:ffff8800b6f29c48  EFLAGS: 00010202
Jun 16 09:49:00 nb1a kernel: [677830.909900] RAX: 000000000000b8c8 RBX: ffff880157a38000 RCX: 000000000000b8c8
Jun 16 09:49:00 nb1a kernel: [677830.909940] RDX: 0000000000007dfe RSI: ffff8800b6f29e48 RDI: ffff880157a380e8
Jun 16 09:49:00 nb1a kernel: [677830.909981] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Jun 16 09:49:00 nb1a kernel: [677830.910022] R10: 0000000000001719 R11: 0000000000000002 R12: 0000000000000000
Jun 16 09:49:00 nb1a kernel: [677830.910062] R13: ffff8802118e1800 R14: ffff8800263e93f0 R15: 0000000000000200
Jun 16 09:49:00 nb1a kernel: [677830.910103] FS:  00007f2decdc6700(0000) GS:ffff88021ea99e40(0000) knlGS:0000000000000000
Jun 16 09:49:00 nb1a kernel: [677830.910146] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 16 09:49:00 nb1a kernel: [677830.910171] CR2: 000000000000b8cc CR3: 0000000083b24000 CR4: 00000000000006e0
Jun 16 09:49:00 nb1a kernel: [677830.910212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 16 09:49:00 nb1a kernel: [677830.910253] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 16 09:49:00 nb1a kernel: [677830.910294] Process resize2fs (pid: 4620, threadinfo ffff8800b6f28000, task ffff8801fe4e8000)
Jun 16 09:49:00 nb1a kernel: [677830.910355]  ffffffff80479d3c 0000000000000202 000000000000d2c7 ffffffff802846db
Jun 16 09:49:00 nb1a kernel: [677830.911053]  RSP <ffff8800b6f29c48>
Jun 16 09:49:00 nb1a kernel: [677830.911364] ---[ end trace 46ff5ad2da854d81 ]---
Jun 16 09:49:00 nb1a kernel: [677830.911423] ------------[ cut here ]------------



Jun 16 09:49:00 nb1a kernel: [677830.911479] WARNING: at /build/buildd/linux-2.6-2.6.29/debian/build/source_amd64_none/kernel/exit.c:983 do_exit+0x41/0x7b1()
Jun 16 09:49:00 nb1a kernel: [677830.911560] Hardware name: H8DM8-2
Jun 16 09:49:00 nb1a kernel: [677830.911614] Modules linked in: ext4 jbd2 crc16 tun ip6table_filter ip6_tables iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle iptable_filter ip_tables x_tables nfnetlink drbd cn autofs4
Jun 16 09:49:00 nb1a kernel: nfsd nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 quota_v2 quota_tree xfs exportfs loop firewire_sbp2 firewire_core crc_itu_t evdev snd_pcm snd_timer snd soundcore pcspkr snd_page_alloc psmouse serio_raw k8temp shpchp pci_hotplug i2c_nforce2 i2c_core button ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ide_pci_generic sd_mod crc_t10dif amd74xx ide_core sata_nv floppy 3w_9xxx r8169 mii ata_generic forcedeth libata scsi_mod ehci_hcd ohci_hcd thermal processor fan thermal_sys
Jun 16 09:49:00 nb1a kernel: [677830.921994] Pid: 4620, comm: resize2fs Tainted: G      D    2.6.29-2-amd64 #1
Jun 16 09:49:00 nb1a kernel: [677830.922067] Call Trace:
Jun 16 09:49:00 nb1a kernel: [677830.922121]  [<ffffffff80243a29>] warn_slowpath+0xd8/0x112
Jun 16 09:49:00 nb1a kernel: [677830.922181]  [<ffffffff80227dbc>] default_spin_lock_flags+0x5/0x9
Jun 16 09:49:00 nb1a kernel: [677830.922240]  [<ffffffff80479d3c>] _spin_lock_irqsave+0x24/0x2c
Jun 16 09:49:00 nb1a kernel: [677830.922299]  [<ffffffff80479d3c>] _spin_lock_irqsave+0x24/0x2c
Jun 16 09:49:00 nb1a kernel: [677830.922358]  [<ffffffff8024412a>] release_console_sem+0x197/0x1cc
Jun 16 09:49:00 nb1a kernel: [677830.922419]  [<ffffffff80477eac>] printk+0x4e/0x5a
Jun 16 09:49:00 nb1a kernel: [677830.922476]  [<ffffffff80227dbc>] default_spin_lock_flags+0x5/0x9
Jun 16 09:49:00 nb1a kernel: [677830.922536]  [<ffffffff80246bb4>] do_exit+0x41/0x7b1
Jun 16 09:49:00 nb1a kernel: [677830.922594]  [<ffffffff802592cc>] up+0xe/0x36
Jun 16 09:49:00 nb1a kernel: [677830.922650]  [<ffffffff80479d3c>] _spin_lock_irqsave+0x24/0x2c
Jun 16 09:49:00 nb1a kernel: [677830.922709]  [<ffffffff8024412a>] release_console_sem+0x197/0x1cc
Jun 16 09:49:00 nb1a kernel: [677830.922769]  [<ffffffff80214d99>] oops_end+0xb6/0xbb
Jun 16 09:49:00 nb1a kernel: [677830.922827]  [<ffffffff8022e86a>] do_page_fault+0x8c4/0x966
Jun 16 09:49:00 nb1a kernel: [677830.922887]  [<ffffffff80292096>] __rmqueue_smallest+0xa6/0x126
Jun 16 09:49:00 nb1a kernel: [677830.922947]  [<ffffffff80292133>] __rmqueue+0x1d/0x1fd
Jun 16 09:49:00 nb1a kernel: [677830.923005]  [<ffffffff80211b8e>] common_interrupt+0xe/0x13
Jun 16 09:49:00 nb1a kernel: [677830.923066]  [<ffffffff80344070>] cfq_set_request+0x0/0x30e
Jun 16 09:49:00 nb1a kernel: [677830.923124]  [<ffffffff8029238f>] rmqueue_bulk+0x7c/0x8c
Jun 16 09:49:00 nb1a kernel: [677830.923184]  [<ffffffff8029e2a1>] zone_statistics+0x3c/0x5f
Jun 16 09:49:00 nb1a kernel: [677830.923243]  [<ffffffff80292096>] __rmqueue_smallest+0xa6/0x126
Jun 16 09:49:00 nb1a kernel: [677830.923303]  [<ffffffff8029e2a1>] zone_statistics+0x3c/0x5f
Jun 16 09:49:00 nb1a kernel: [677830.923362]  [<ffffffff802943cc>] __alloc_pages_internal+0xd2/0x422
Jun 16 09:49:00 nb1a kernel: [677830.923421]  [<ffffffff80479de2>] _spin_lock+0x5/0x7
Jun 16 09:49:00 nb1a kernel: [677830.923483]  [<ffffffffa06761cd>] __jbd2_journal_temp_unlink_buffer+0x35/0xf5 [jbd2]
Jun 16 09:49:00 nb1a kernel: [677830.923562]  [<ffffffffa067633b>] __jbd2_journal_file_buffer+0xae/0x144 [jbd2]
Jun 16 09:49:00 nb1a kernel: [677830.923640]  [<ffffffffa06773a5>] do_get_write_access+0x3cf/0x414 [jbd2]
Jun 16 09:49:00 nb1a kernel: [677830.923700]  [<ffffffff80479de2>] _spin_lock+0x5/0x7
Jun 16 09:49:00 nb1a kernel: [677830.923758]  [<ffffffff8047a0c5>] page_fault+0x25/0x30
Jun 16 09:49:00 nb1a kernel: [677830.923824]  [<ffffffffa069a97e>] ext4_group_add+0x135c/0x143e [ext4]
Jun 16 09:49:00 nb1a kernel: [677830.923891]  [<ffffffffa069a946>] ext4_group_add+0x1324/0x143e [ext4]
Jun 16 09:49:00 nb1a kernel: [677830.923951]  [<ffffffff80479d3c>] _spin_lock_irqsave+0x24/0x2c
Jun 16 09:49:00 nb1a kernel: [677830.924011]  [<ffffffff802846db>] delayacct_end+0x7d/0x88
Jun 16 09:49:00 nb1a kernel: [677830.924070]  [<ffffffff802db3d0>] sync_dirty_buffer+0x5f/0x98
Jun 16 09:49:00 nb1a kernel: [677830.924128]  [<ffffffff80479de2>] _spin_lock+0x5/0x7
Jun 16 09:49:00 nb1a kernel: [677830.924189]  [<ffffffffa067b5bb>] jbd2_journal_update_superblock+0x133/0x13f [jbd2]
Jun 16 09:49:00 nb1a kernel: [677830.924279]  [<ffffffff80479de2>] _spin_lock+0x5/0x7
Jun 16 09:49:00 nb1a kernel: [677830.924344]  [<ffffffffa06905df>] ext4_ioctl+0x4dd/0x5d8 [ext4]
Jun 16 09:49:00 nb1a kernel: [677830.924403]  [<ffffffff8023f8a3>] finish_task_switch+0x2a/0xc7
Jun 16 09:49:00 nb1a kernel: [677830.924463]  [<ffffffff802c869e>] vfs_ioctl+0x21/0x6c
Jun 16 09:49:00 nb1a kernel: [677830.924521]  [<ffffffff802c8b22>] do_vfs_ioctl+0x439/0x472
Jun 16 09:49:00 nb1a kernel: [677830.924580]  [<ffffffff802bdaaf>] vfs_write+0xcd/0x102
Jun 16 09:49:00 nb1a kernel: [677830.924638]  [<ffffffff802c8bac>] sys_ioctl+0x51/0x70
Jun 16 09:49:00 nb1a kernel: [677830.924696]  [<ffffffff802110aa>] system_call_fastpath+0x16/0x1b
Jun 16 09:49:00 nb1a kernel: [677830.924756] ---[ end trace 46ff5ad2da854d82 ]---

P.S.: Please add "debian" to the tree dropdown. The exact Debian package is:
linux-image-2.6.29-2-amd64 version 2.6.29-5
Comment 1 Roland Kletzing 2009-09-18 15:25:00 UTC
This isn't a drbd device, is it? (That out-of-tree module seems to be loaded.)
Comment 2 Alessandro Polverini 2009-09-18 19:03:11 UTC
No, it's not a drbd device but an LVM one.
Comment 3 Eric Sandeen 2010-03-15 19:34:54 UTC
Alessandro, apologies for this bug sitting for so long.

Is this still something you can reproduce?

If so, an e2image -h of the original filesystem, and the size of the block device you're resizing to, might help us reproduce the problem.

Thanks,
-Eric
Comment 4 Alessandro Polverini 2010-03-20 21:53:31 UTC
I updated to debian kernel 2.6.32-3-amd64 and I can reproduce the problem.
When trying to resize the fs I got this on the console:

nb1a kernel: [ 2518.347172] Oops: 0002 [#1] SMP
Message from syslogd@nb1a at Sat Mar 20 22:44:02 2010 ...
nb1a kernel: [ 2518.347195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/i2c-1/1-002c/fan8_input
nb1a kernel: [ 2518.348350] Stack:
nb1a kernel: [ 2518.348699] Call Trace:
nb1a kernel: [ 2518.348699] Code: 48 8b 74 24 50 48 8b 7c 24 50 8b 8b 40 03 00 00 8b 06 8b 57 28 d3 e8 89 c0 48 6b c0 0c 48 89 c1 48 03 8b 48 03 00 00 48 8d 71 04 <f0> 01 51 04 49 8b 95 80 02 00 00 48 03 83 48 03 00 00 8b 52 18
nb1a kernel: [ 2518.348699] CR2: 00000000000153d0

While this is what I can see in /var/log/messages:

Mar 20 22:44:02 nb1a kernel: [ 2518.347146] PGD 41d326067 PUD 41d5c5067 PMD 0 
Mar 20 22:44:02 nb1a kernel: [ 2518.347239] CPU 0 
Mar 20 22:44:02 nb1a kernel: [ 2518.347259] Modules linked in: tun ip6table_filter ip6_tables iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle iptable_filter ip_tables x_tables nfnetlink drbd cn autofs4 nfsd nfs lockd fscache nfs_acl a
Mar 20 22:44:02 nb1a kernel: th_rpcgss sunrpc xfs exportfs ext4 jbd2 crc16 w83793 w83627hf hwmon_vid ipmi_msghandler psmouse amd64_edac_mod evdev serio_raw snd_pcm snd_timer edac_core edac_mce_amd k8temp shpchp snd soundcore snd_page_alloc pci_hotplug button processor pcspkr i2c_nforce2 i2c_core ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ide_pci_generic amd74xx ide_core sata_nv sd_mod crc_t10dif ata_generic floppy r8169 mii 3w_9xxx ehci_hcd ohci_hcd libata forcedeth scsi_mod usbcore nls_base thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Mar 20 22:44:02 nb1a kernel: [ 2518.347914] Pid: 13590, comm: resize2fs Not tainted 2.6.32-3-amd64 #1 H8DM8-2
Mar 20 22:44:02 nb1a kernel: [ 2518.347942] RIP: 0010:[<ffffffffa0310863>]  [<ffffffffa0310863>] ext4_group_add+0x136c/0x1451 [ext4]
Mar 20 22:44:02 nb1a kernel: [ 2518.347992] RSP: 0018:ffff88021e7c1c38  EFLAGS: 00010206
Mar 20 22:44:02 nb1a kernel: [ 2518.348017] RAX: 00000000000153cc RBX: ffff88041d8e7800 RCX: 00000000000153cc
Mar 20 22:44:02 nb1a kernel: [ 2518.348045] RDX: 0000000000007dfe RSI: 00000000000153d0 RDI: ffff88021e7c1e48
Mar 20 22:44:02 nb1a kernel: [ 2518.348072] RBP: ffff8801e1ed1d20 R08: ffff88041ee03380 R09: 0000000000000000
Mar 20 22:44:02 nb1a kernel: [ 2518.348100] R10: 000000501d59c868 R11: 0000000000000200 R12: 0000000000000000
Mar 20 22:44:02 nb1a kernel: [ 2518.348128] R13: ffff88041d8e1c00 R14: ffff88021ed67600 R15: 0000000000000200
Mar 20 22:44:02 nb1a kernel: [ 2518.348156] FS:  00007f33572bc700(0000) GS:ffff880008c00000(0000) knlGS:0000000000000000
Mar 20 22:44:02 nb1a kernel: [ 2518.348198] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 20 22:44:02 nb1a kernel: [ 2518.348224] CR2: 00000000000153d0 CR3: 000000041db72000 CR4: 00000000000006f0
Mar 20 22:44:02 nb1a kernel: [ 2518.348251] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 20 22:44:02 nb1a kernel: [ 2518.348279] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 20 22:44:02 nb1a kernel: [ 2518.348307] Process resize2fs (pid: 13590, threadinfo ffff88021e7c0000, task ffff8801ea533170)
Mar 20 22:44:02 nb1a kernel: [ 2518.348369]  00000000fcdf07e0 ffffffff8106beb9 ffff8802098927f8 ffff880209892808
Mar 20 22:44:02 nb1a kernel: [ 2518.348399] <0> ffff8802098927e8 0000000000000292 0000000000000292 ffffffff81098f06
Mar 20 22:44:02 nb1a kernel: [ 2518.348444] <0> 0000000000000000 000080000000fed6 ffff88021e7c1e48 0000000000001c51
Mar 20 22:44:02 nb1a kernel: [ 2518.348699]  [<ffffffff8106beb9>] ? ktime_get_ts+0x68/0xb2
Mar 20 22:44:02 nb1a kernel: [ 2518.348699]  [<ffffffff81098f06>] ? delayacct_end+0x74/0x7f
Mar 20 22:44:02 nb1a kernel: [ 2518.348699]  [<ffffffff81064a84>] ? wake_bit_function+0x0/0x23
Mar 20 22:44:02 nb1a kernel: [ 2518.348699]  [<ffffffff8110ca96>] ? sync_dirty_buffer+0x5b/0x93
Mar 20 22:44:02 nb1a kernel: [ 2518.348699]  [<ffffffffa02fb804>] ? ext4_ioctl+0x5ee/0x753 [ext4]
Mar 20 22:44:02 nb1a kernel: [ 2518.348699]  [<ffffffff810f8fa2>] ? vfs_ioctl+0x21/0x6c
Mar 20 22:44:02 nb1a kernel: [ 2518.348699]  [<ffffffff810f94f0>] ? do_vfs_ioctl+0x48d/0x4cb
Mar 20 22:44:02 nb1a kernel: [ 2518.348699]  [<ffffffff810edc16>] ? vfs_write+0xcd/0x102
Mar 20 22:44:02 nb1a kernel: [ 2518.348699]  [<ffffffff810f957f>] ? sys_ioctl+0x51/0x70
Mar 20 22:44:02 nb1a kernel: [ 2518.348699]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Mar 20 22:44:02 nb1a kernel: [ 2518.348699]  RSP <ffff88021e7c1c38>
Mar 20 22:44:02 nb1a kernel: [ 2518.354183] ---[ end trace 9ee7dc16cc7e9d74 ]---

The size of the partition:
# df /mnt/disk2/
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg1-backup
                     935361592 878396176  56965416  94% /mnt/disk2

I'll now try to reboot the machine and collect the e2image -h data you requested.
Comment 5 Alessandro Polverini 2010-03-21 08:32:35 UTC
The -h option of e2image is not recognised by my tool, sorry; I tried e2image 1.41.10.
The bzip2-compressed e2image output for my partition is around 200 MB. Anyone who is interested in it, please contact me so I can prepare a download link.
Comment 6 Eric Sandeen 2010-03-22 16:51:25 UTC
I'm very sorry, my fingers translated my brain wrong ;)  Ideally we can start with dumpe2fs -h (not e2image -h, sorry) to quickly recreate a similar filesystem, and try expanding it.

Keep that e2image around though just in case (although we would need one created with -r to use it)

Thanks,

-Eric
Comment 7 Alessandro Polverini 2010-03-22 17:00:22 UTC
Hello Eric,
here is the output of dumpe2fs (the file system was mounted when running it).
Please note that the FS has now been expanded, but the operation was done offline (unmounted), like the other times. The kernel hangs when trying to do it online (mounted).

dumpe2fs 1.41.3 (12-Oct-2008)
Filesystem volume name:   <none>
Last mounted on:          /mnt/disk2
Filesystem UUID:          b0e46f62-5ac9-48f5-8582-f602652385d3
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              62259200
Block count:              249036800
Reserved block count:     0
Free blocks:              23664521
Free inodes:              55316695
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      964
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Sun Nov 16 15:50:37 2008
Last mount time:          Mon Mar 22 07:22:00 2010
Last write time:          Mon Mar 22 07:22:00 2010
Mount count:              2
Maximum mount count:      32
Last checked:             Sun Mar 21 09:28:05 2010
Check interval:           15552000 (6 months)
Next check after:         Fri Sep 17 10:28:05 2010
Lifetime writes:          7204 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      7aad3588-6b5c-4bcb-b60f-0227bef217b7
Journal backup:           inode blocks
Journal size:             128M

Please tell me if I can be of further assistance,
Alex
Comment 8 Eric Sandeen 2010-03-22 19:08:48 UTC
Hm, so now that dumpe2fs output is already expanded ... ideally we'd have had the pre-resize geometry to work with ... but I'll see what I can do.

-Eric
Comment 9 Alessandro Polverini 2010-03-22 20:16:20 UTC
The problem is reproducible at will. This is not the first time I have resized the partition, and every time there was a kernel oops.
Comment 10 Eric Sandeen 2010-03-22 20:26:22 UTC
ah so you've been incrementally resizing it and it oopses each time?  hmmm
Comment 11 Alessandro Polverini 2010-03-22 20:41:25 UTC
Yes, a kernel oops each time.
And each time I ran a new test I had upgraded the kernel, so I verified the problem on all versions from 2.6.29 to 2.6.32, if I remember correctly.
Comment 12 Eric Sandeen 2010-03-22 21:51:35 UTC
I'm afraid I've not been able to reproduce it:

create a sparse loopback filesystem:

[root@inode test]# touch fsfile
[root@inode test]# truncate --size 1020054732800 fsfile 
[root@inode test]# ls -lh fsfile 
-rw-r--r--. 1 root root 950G Mar 22 16:28 fsfile
[root@inode test]# mkfs.ext4 fsfile 
mke2fs 1.41.10 (10-Feb-2009)

<at this point I checked that the filesystem size and features are similar to your fs>

[root@inode test]# ls -lh fsfile 
-rw-r--r--. 1 root root 950G Mar 22 16:28 fsfile

Grow the container by about 50G:

[root@inode test]# truncate --size 1099511627776 fsfile 
[root@inode test]# ls -lh fsfile 
-rw-r--r--. 1 root root 1.0T Mar 22 16:32 fsfile

Resize while mounted:

[root@inode test]# mount -o loop fsfile  mnt/
[root@inode test]# resize2fs /dev/loop0 
resize2fs 1.41.10 (10-Feb-2009)
Filesystem at /dev/loop0 is mounted on /mnt/test/mnt; on-line resizing required
old desc_blocks = 60, new_desc_blocks = 64
Performing an on-line resize of /dev/loop0 to 268435456 (4k) blocks.
The filesystem on /dev/loop0 is now 268435456 blocks long.

No oops ... so I'm not sure what is going on here.
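
The numbers in the transcript above can be cross-checked with a little arithmetic. The following is only a sketch: the block size and blocks-per-group values come from the dumpe2fs output in comment 7, while the 32-byte group descriptor size is an assumption (it holds for filesystems without the 64bit feature). The helper names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry taken from the dumpe2fs output in comment 7. */
#define FS_BLOCK_SIZE        4096u
#define FS_BLOCKS_PER_GROUP  32768u
/* Assumption: 32-byte group descriptors (no 64bit feature). */
#define FS_DESC_SIZE         32u

/* Integer division, rounding up. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* Number of filesystem blocks that fit in a container of the given size. */
static uint64_t fs_blocks(uint64_t bytes)
{
    return bytes / FS_BLOCK_SIZE;
}

/* Number of block groups needed to cover that many blocks. */
static uint64_t group_count(uint64_t blocks)
{
    return div_round_up(blocks, FS_BLOCKS_PER_GROUP);
}

/* Number of blocks needed to hold one descriptor per group. */
static uint64_t desc_blocks(uint64_t groups)
{
    return div_round_up(groups, FS_BLOCK_SIZE / FS_DESC_SIZE);
}
```

Under these assumptions, desc_blocks(group_count(fs_blocks(1020054732800))) gives 60 and the same call for 1099511627776 gives 64, which agrees with the "old desc_blocks = 60, new_desc_blocks = 64" message printed by resize2fs.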
Comment 13 Alessandro Polverini 2010-03-23 20:14:38 UTC
I don't know if this helps, but my file system is very "complex", with 15+ million files/directories, most of which are hard links (it's a backuppc volume).
Comment 14 Eric Sandeen 2010-03-23 20:50:32 UTC
Well, I hate to say it, but it may be time for a compressed e2image -r to try to replicate the problem.

-Eric
Comment 15 Alessandro Polverini 2010-03-23 21:19:30 UTC
Hello Eric and thanks, I've sent you the link to download it.
If someone else is interested, please contact me privately.
Comment 16 Christoph Biedl 2010-03-27 17:48:48 UTC
Over the last few days I ran into the same problem on several machines and was finally able to reproduce the BUG in 2.6.32.10, 2.6.33, 2.6.34-rc1.  The trick is to resize a file system that once was ext3.



Steps to reproduce:

# create a volume group vg_test. (lvm was mainly used for convenience)

# create a logical volume
lvcreate -n test -L 128m vg_test

DEV=/dev/vg_test/test
# create an ext3 filesystem
mke2fs -j $DEV

# convert to ext4
tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize $DEV
e2fsck -yDf -C0 $DEV

# mount
mkdir /tmp/test
mount -o noatime $DEV /tmp/test

# resize LV
lvresize -L +4m $DEV

# online resize ext4
resize2fs -p $DEV



Observed behaviour:


kernel: BUG: unable to handle kernel NULL pointer dereference at 00000184
kernel: IP: [<c10c9f66>] ext4_group_add+0xf8f/0x104d
kernel: *pde = 00000000
kernel: Oops: 0002 [#1]
kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:1/0:0:1:0/block/sdb/sdb1/dev
kernel:
kernel: Pid: 1302, comm: resize2fs Not tainted 2.6.34-rc2 #7 /VirtualBox
kernel: EIP: 0060:[<c10c9f66>] EFLAGS: 00010202 CPU: 0
kernel: EIP is at ext4_group_add+0xf8f/0x104d
kernel: EAX: 00000180 EBX: cfba8200 ECX: 00007dfe EDX: 00000180
kernel: ESI: ce03def0 EDI: 00000000 EBP: 00100001 ESP: ce03de40
kernel:  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0069
kernel: Process resize2fs (pid: 1302, ti=ce03c000 task=cf8e4280 task.ti=ce03c000)
kernel: Stack:
kernel:  00100202 00000000 ffffffff cfba8400 cf4ca3ec 00000000 ce03def0 cfba8200
kernel: <0> 00000000 00100202 00000000 00000002 00000000 cfba8200 cdc4b400 00100001
kernel: <0> 00000000 00000000 00000020 00000080 cfba8304 00000000 cfba8304 00000020
kernel: Call Trace:
kernel:  [<c10c0c96>] ? ext4_ioctl+0x57a/0x674
kernel:  [<c114e1c9>] ? do_output_char+0x84/0x191
kernel:  [<c10c071c>] ? ext4_ioctl+0x0/0x674
kernel:  [<c10753e6>] ? vfs_ioctl+0x12/0x42
kernel:  [<c10758dc>] ? do_vfs_ioctl+0x438/0x47c
kernel:  [<c106c83d>] ? vfs_write+0xf7/0x131
kernel:  [<c107594d>] ? sys_ioctl+0x2d/0x44
kernel:  [<c1237f75>] ? syscall_call+0x7/0xb
kernel: Code: 00 59 8b 40 38 f6 40 61 02 74 38 8b 5c 24 34 8b 74 24 18 8b 8b b0 01 00 00 8b 06 8b 93 b4 01 00 00 d3 e8 8b 4e 24 6b c0 0c 01 c2 <01> 4a 04 8b 4c 24 0c 03 83 b
kernel: EIP: [<c10c9f66>] ext4_group_add+0xf8f/0x104d SS:ESP 0069:ce03de40
kernel: CR2: 0000000000000184
kernel: ---[ end trace a1e9f008f870cb3b ]---

The code is (let's see whether bugzilla preserves the formatting):

        if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) {
                ext4_group_t flex_group;
                flex_group = ext4_flex_group(sbi, input->group);
                atomic_add(input->free_blocks_count,
                           &sbi->s_flex_groups[flex_group].free_blocks);
c10c9f46:       8b 5c 24 34             mov    0x34(%esp),%ebx
c10c9f4a:       8b 74 24 18             mov    0x18(%esp),%esi
c10c9f4e:       8b 8b b0 01 00 00       mov    0x1b0(%ebx),%ecx
c10c9f54:       8b 06                   mov    (%esi),%eax
c10c9f56:       8b 93 b4 01 00 00       mov    0x1b4(%ebx),%edx
c10c9f5c:       d3 e8                   shr    %cl,%eax
 *
 * Atomically adds @i to @v.
 */
static inline void atomic_add(int i, atomic_t *v)
{
        asm volatile(LOCK_PREFIX "addl %1,%0"
c10c9f5e:       8b 4e 24                mov    0x24(%esi),%ecx
c10c9f61:       6b c0 0c                imul   $0xc,%eax,%eax
c10c9f64:       01 c2                   add    %eax,%edx
c10c9f66:       01 4a 04                add    %ecx,0x4(%edx)
c10c9f69:       8b 4c 24 0c             mov    0xc(%esp),%ecx
                atomic_add(EXT4_INODES_PER_GROUP(sb),
                           &sbi->s_flex_groups[flex_group].free_inodes);
c10c9f6d:       03 83 b4 01 00 00       add    0x1b4(%ebx),%eax
c10c9f73:       8b 91 5c 01 00 00       mov    0x15c(%ecx),%edx
c10c9f79:       8b 52 0c                mov    0xc(%edx),%edx
c10c9f7c:       01 10                   add    %edx,(%eax)
        }


e2fsprogs is from Debian lenny (1.41.3-1).

Let me know if you're interested in the kernel .config.
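
As a rough cross-check of the disassembly above, the faulting addresses reported in this bug can be decoded back into an array index. This is a sketch under stated assumptions: the 12-byte element size and the 4-byte offset of free_blocks are read off the `imul $0xc` and `0x4(%edx)` instructions, the array base is taken to be NULL, and the struct name and its third field are hypothetical stand-ins rather than the kernel's own definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout, inferred from the disassembly: three 32-bit counters
 * per entry, with free_blocks at offset 4. The third field is a guess
 * that only serves to make the element 12 bytes long.
 */
struct flex_groups_model {
    int32_t free_inodes;   /* offset 0, target of the second add */
    int32_t free_blocks;   /* offset 4, target of the faulting add */
    int32_t used_dirs;     /* offset 8, assumed */
};

/*
 * Given the fault address (CR2) of a write to &base[idx].free_blocks
 * with base == NULL, recover idx.
 */
static uint64_t flex_index_from_fault(uint64_t cr2)
{
    return (cr2 - offsetof(struct flex_groups_model, free_blocks))
           / sizeof(struct flex_groups_model);
}
```

With this model, the three CR2 values in this report (0xb8cc, 0x153d0 and 0x184) decode to indices 3942, 7249 and 32. In each case the address is just a small array offset added to a NULL pointer.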
Comment 17 Eric Sandeen 2010-03-27 18:26:26 UTC
Great, thanks for the reproducer, I'll give that a shot.

Alessandro, I have your image but still haven't made time to test with it, hopefully soon (maybe today) - did you start your filesystem out as ext3?
Comment 18 Eric Sandeen 2010-03-27 19:16:09 UTC
Reproducer works perfectly, thanks.

So here's the issue: sbi->s_flex_groups[] doesn't get filled out in ext4_fill_flex_info() because:

        if (groups_per_flex < 2) {
                sbi->s_log_groups_per_flex = 0;
                return 1;
        }

but resize is unconditionally doing this in ext4_group_add as long as the FLEX_BG feature is set:

                atomic_add(input->free_blocks_count,
                           &sbi->s_flex_groups[flex_group].free_blocks);

so for a NULL s_flex_groups it went boom.

Every other access to ->s_flex_groups checks s_log_groups_per_flex first, so this should be the proper fix:

Index: linux-2.6/fs/ext4/resize.c
===================================================================
--- linux-2.6.orig/fs/ext4/resize.c
+++ linux-2.6/fs/ext4/resize.c
@@ -930,7 +930,8 @@ int ext4_group_add(struct super_block *s
 	percpu_counter_add(&sbi->s_freeinodes_counter,
 			   EXT4_INODES_PER_GROUP(sb));
 
-	if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) {
+	if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG) &&
+	    sbi->s_log_groups_per_flex) {
 		ext4_group_t flex_group;
 		flex_group = ext4_flex_group(sbi, input->group);
 		atomic_add(input->free_blocks_count,

This fixes the reproducer, need to double check it on Alessandro's image.
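
The effect of the extra check can be modelled in userspace. This is a simplified sketch, not kernel code: the `_model` types and the `add_group_counters` helper are hypothetical, plain integer additions stand in for atomic_add(), and a single flag stands in for the FLEX_BG feature test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct flex_groups_model {
    int free_inodes;
    int free_blocks;
};

struct sbi_model {
    int has_flex_bg;                         /* FLEX_BG feature flag is set */
    unsigned int s_log_groups_per_flex;      /* 0 when groups_per_flex < 2 */
    struct flex_groups_model *s_flex_groups; /* left NULL in that case */
};

/*
 * Mirrors the patched condition: touch s_flex_groups only when
 * s_log_groups_per_flex is non-zero, i.e. when the array was set up.
 * Returns 1 if the counters were updated, 0 if the update was skipped.
 */
static int add_group_counters(struct sbi_model *sbi, unsigned int group,
                              int free_blocks, int inodes_per_group)
{
    if (sbi->has_flex_bg && sbi->s_log_groups_per_flex) {
        unsigned int flex_group = group >> sbi->s_log_groups_per_flex;

        sbi->s_flex_groups[flex_group].free_blocks += free_blocks;
        sbi->s_flex_groups[flex_group].free_inodes += inodes_per_group;
        return 1;
    }
    return 0;
}
```

For a filesystem converted from ext3 (feature flag set, s_log_groups_per_flex of 0, NULL array) the helper returns 0 without dereferencing anything. Dropping the second half of the condition reproduces the original behaviour, where the same input writes through a NULL pointer.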

-Eric
Comment 19 Eric Sandeen 2010-03-27 19:28:11 UTC
Alessandro, your image also has groups_per_flex == 1 so it is surely the same thing.

-Eric
Comment 20 Alessandro Polverini 2010-03-28 09:17:03 UTC
(In reply to comment #17)
> Alessandro, I have your image but still haven't made time to test with it,
> hopefully soon (maybe today) - did you start your filesystem out as ext3?

Yes!
I'm sorry I left out this detail when opening the bug report; I had really forgotten, since I converted it a long time ago.

I hope your fix will be accepted in time for 2.6.34 :)
Comment 21 Christoph Biedl 2010-03-28 12:45:31 UTC
Confirmed: The patch fixes the problem in both kernels tested, 2.6.27.45 and 2.6.32.10.

How's the further procedure? Will there be a notification if this gets included in the main kernel, or will I have to check the commits for myself? I'd like to notify Greg to include this patch in the long-term stable kernels ASAP then.

Eric: I am impressed by your speedy response. Much appreciated.
Comment 22 Eric Sandeen 2010-03-29 15:55:00 UTC
(In reply to comment #21)
> Confirmed: The patch fixes the problem in both kernels tested, 2.6.27.45 and
> 2.6.32.10.
> 
> How's the further procedure? Will there be a notification if this gets
> included in the main kernel, or will I have to check the commits for myself?
> I'd like to notify Greg to include this patch in the long-term stable kernels
> ASAP then.

I think this should make it to .34 soon, and I can try to ping on this bug when that's done.  I agree that it should make it to -stable.

> Eric: I am impressed by your speedy response. Much appreciated.

Heh, thanks, but talk to poor Alessandro, who filed this bug last June.  :)  This clearly demonstrates the power of the compact testcase ;)

(Although, once I got Alessandro's image it would have led me to the solution as well, but it was just so big and unwieldy that I hadn't yet found time & space to work with it)
Comment 23 Christoph Biedl 2010-06-01 22:59:01 UTC
That's 42007efd569f1cf3bfb9a61da60ef6c2179508ca in git, I'm just preparing the mail for Greg. Time to close this bug IMHO.
