Bug 101011
Summary: | Kernel Oops when disconnecting a mounted ext4 usb stick | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | konradsa |
Component: | SCSI | Assignee: | linux-scsi (linux-scsi) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | andrey_utkin, hch, k.kotlenga, konradsa, linux-ext4, mail, ronny.standtke, tao1hua, taz.007, tytso |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.1.1-040101-generic | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | uname -a, cat /proc/version, dmesg, lspci -vvnn |
Description
konradsa
2015-07-05 20:17:08 UTC
Created attachment 181941
uname -a
Created attachment 181951
cat /proc/version
Created attachment 181961
dmesg
Created attachment 181971
lspci -vvnn
Can also reproduce on a non-tainted kernel (4.1.2) on an old laptop:

jui 21 10:19:15 Aspire kernel: usb 3-3.3.4.1.1: USB disconnect, device number 15
jui 21 10:19:15 Aspire kernel: BUG: unable to handle kernel paging request at 34943000
jui 21 10:19:16 Aspire kernel: IP: [<c127cd7b>] __percpu_counter_add+0x1b/0xd0
jui 21 10:19:16 Aspire kernel: *pde = 00000000
jui 21 10:19:17 Aspire kernel: Oops: 0000 [#1] PREEMPT SMP
jui 21 10:19:17 Aspire kernel: Modules linked in: joydev psmouse snd_hda_codec_hdmi pcspkr serio_raw iTCO_wdt iTCO_vendor_support i2c_i801 evdev mousedev mac_hid i915 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel ipw2200 snd_hda_controller snd_hda_codec drm_kms_helper 8139too libipw snd_hda_core drm 8139cp lib80211 snd_hwdep cfg80211 snd_pcm pcmcia i2c_algo_bit i2c_core mii rfkill snd_timer lpc_ich intel_agp yenta_socket intel_gtt agpgart pcmcia_rsrc pcmcia_core snd rng_core soundcore thermal battery shpchp video ac button acpi_cpufreq processor sch_fq_codel ip_tables x_tables ext4 crc16 mbcache jbd2 hid_generic usbhid hid uas usb_storage sr_mod cdrom sd_mod ata_generic pata_acpi atkbd libps2 ata_piix libata scsi_mod ehci_pci uhci_hcd ehci_hcd usbcore usb_common i8042 serio
jui 21 10:19:17 Aspire kernel: CPU: 0 PID: 616 Comm: umount Not tainted 4.1.2-2-ARCH #1
jui 21 10:19:17 Aspire kernel: Hardware name: Acer, inc. Aspire 1640Z /Lugano3 , BIOS 3A24 10/30/06
jui 21 10:19:17 Aspire kernel: task: f4183fc0 ti: f1a7c000 task.ti: f1a7c000
jui 21 10:19:17 Aspire kernel: EIP: 0060:[<c127cd7b>] EFLAGS: 00010082 CPU: 0
jui 21 10:19:17 Aspire kernel: EIP is at __percpu_counter_add+0x1b/0xd0
jui 21 10:19:17 Aspire kernel: EAX: c1cee508 EBX: c1cee508 ECX: 00000000 EDX: 00000001
jui 21 10:19:17 Aspire kernel: ESI: ffffffff EDI: 00000000 EBP: f1a7de70 ESP: f1a7de50
jui 21 10:19:17 Aspire kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
jui 21 10:19:17 Aspire kernel: CR0: 8005003b CR2: 34943000 CR3: 31aa0000 CR4: 000007d0
jui 21 10:19:17 Aspire kernel: Stack:
jui 21 10:19:17 Aspire kernel: 00000285 00000285 c166f540 00000001 00000000 c1cee4e8 ffffffff f53a635c
jui 21 10:19:17 Aspire kernel: f1a7de8c c1138904 00000010 f6879660 f6879660 f53a635c f53a636c f1a7dea8
jui 21 10:19:18 Aspire kernel: c11bd549 00000000 00000282 f6879660 f521bc78 f2f5c400 f1a7deb8 c11bd6a3
jui 21 10:19:18 Aspire kernel: Call Trace:
jui 21 10:19:18 Aspire kernel: [<c1138904>] account_page_dirtied+0x74/0x120
jui 21 10:19:18 Aspire kernel: [<c11bd549>] __set_page_dirty+0x39/0xb0
jui 21 10:19:19 Aspire kernel: [<c11bd6a3>] mark_buffer_dirty+0x53/0xd0
jui 21 10:19:19 Aspire kernel: [<f828cf48>] ext4_commit_super+0x158/0x230 [ext4]
jui 21 10:19:19 Aspire kernel: [<f8059785>] ? mb_cache_shrink+0x55/0x250 [mbcache]
jui 21 10:19:19 Aspire kernel: [<f828dc87>] ext4_put_super+0xc7/0x320 [ext4]
jui 21 10:19:19 Aspire kernel: [<c11a7c02>] ? dispose_list+0x32/0x40
jui 21 10:19:20 Aspire kernel: [<c11a88a2>] ? evict_inodes+0xf2/0x110
jui 21 10:19:20 Aspire kernel: [<c1190d34>] generic_shutdown_super+0x64/0xe0
jui 21 10:19:20 Aspire kernel: [<c113fb30>] ? unregister_shrinker+0x40/0x50
jui 21 10:19:20 Aspire kernel: [<c119104f>] kill_block_super+0x1f/0x70
jui 21 10:19:20 Aspire kernel: [<c119134d>] deactivate_locked_super+0x3d/0x70
jui 21 10:19:21 Aspire kernel: [<c1191737>] deactivate_super+0x57/0x60
jui 21 10:19:21 Aspire kernel: [<c11abb79>] cleanup_mnt+0x39/0x90
jui 21 10:19:21 Aspire kernel: [<c11abc10>] __cleanup_mnt+0x10/0x20
jui 21 10:19:21 Aspire kernel: [<c106ffe9>] task_work_run+0xc9/0xe0
jui 21 10:19:21 Aspire kernel: [<c1002625>] do_notify_resume+0x75/0x80
jui 21 10:19:22 Aspire kernel: [<c14b15fd>] work_notifysig+0x30/0x37
jui 21 10:19:22 Aspire kernel: Code: 39 c7 77 de eb da 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 89 c3 83 ec 14 89 55 ec 89 4d f0 64 ff 05 44 17 71 c1 8b 7b 14 <64> 8b 37 89 7d e0 89 f7 c1 ff 1f 01 d6 8b 55 08 11 cf 89 d1 c1
jui 21 10:19:22 Aspire kernel: EIP: [<c127cd7b>] __percpu_counter_add+0x1b/0xd0 SS:ESP 0068:f1a7de50
jui 21 10:19:22 Aspire kernel: CR2: 0000000034943000
jui 21 10:19:22 Aspire kernel: ---[ end trace 4bafb307c38dbc3e ]---

This bug is still present in 4.2-rc6.

Reverting:

commit 08439fec266c3cc5702953b4f54bdf5649357de0
Author: Christoph Hellwig <hch@lst.de>
Date: Thu Apr 2 23:56:32 2015 -0400

    ext4: remove block_device_ejected

    bdi->dev now never goes away, so this function became useless.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

makes the oops go away, at least for me (I've tested this on v4.1.4 i386 only). I hope this rings some bells.

I can also confirm that this bug is present in latest stable kernel (4.1.5) and reverting commit from comment 6 seems to fix it.

On Fri, Aug 14, 2015 at 11:02:14AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=101011
>
> I can also confirm that this bug is present in latest stable kernel (4.1.5)
> and reverting commit from comment 6 seems to fix it.
Christoph,

I've since gotten two reports from users that reverting your commit "08439fec266c3: ext4: remove block_device_ejected" fixes a crash when a USB stick is yanked from their system.

Looking at the reported stack dump, it looks like the crash is happening in account_page_dirtied() when it updates some bdi-specific statistics. I haven't been paying attention to the recent changes in how bdi gets torn down after the device gets removed, and in fact finding the recent changes wasn't obvious enough after doing a brief search, but it seems to me that if reverting this patch is making any kind of difference, then the assertion in the commit description:

    bdi->dev now never goes away, so this function became useless.

does not hold: bdi->dev *does* become NULL, and checking for this is useful.

In any case, I don't see any harm in reverting this commit; what do you think?

Thanks,

- Ted

Hi Ted,

sorry for the delay - I saw the mail Jan Cc'ed me on yesterday. After my changes it should not go away and I had tested the original eject test that it indeed didn't. Either I forgot a case, or the major writeback Tejun did a little later regressed it.

As I won't have time to look into it ASAP I'd suggest to revert my patch for now. In the long run I really don't want to have these checks spread over file system so I plan to look into it once I get a few spare hours.

On Sat, Aug 15, 2015 at 10:19:02AM +0200, Christoph Hellwig wrote:
>
> sorry for the delay - I saw the mail Jan Cc'ed me on yesterday. After
> my changes it should not go away and I had tested the original eject
> test that it indeed didn't. Either I forgot a case, or the major
> writeback Tejun did a little later regressed it.
>
> As I won't have time to look into it ASAP I'd suggest to revert my
> patch for now. In the long run I really don't want to have these
> checks spread over file system so I plan to look into it once I
> get a few spare hours.
Thanks, I'll revert the patch.
I suspect we should add an ioctl to simulate a USB device unplug using
the loopback block device, so we can add a test to xfstests.
- Ted
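
For readers following the discussion above, the kind of guard that commit 08439fec266c3 removed can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the ext4 source: the struct layouts, the `nr_dirtied` field, the `commit_super` name and its return values are hypothetical stand-ins. The only idea taken from the thread is that bdi->dev can become NULL after the device is removed and that checking for it before dirtying the superblock buffer is useful.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures; only the
 * fields needed for this illustration are modelled. */
struct device {
    const char *name;
};

struct backing_dev_info {
    struct device *dev;  /* assumed to become NULL after device removal */
    long nr_dirtied;     /* stand-in for the bdi-specific statistics */
};

struct super_block {
    struct backing_dev_info *bdi;
};

/* Reports whether the backing device has gone away, i.e. bdi->dev is NULL. */
int block_device_ejected(const struct super_block *sb)
{
    assert(sb != NULL && sb->bdi != NULL);
    return sb->bdi->dev == NULL;
}

/* Model of a superblock commit: skip the dirty accounting when the device
 * is gone. The return values (0 = committed, -1 = skipped) are illustrative
 * and do not mirror the kernel function's contract. */
int commit_super(struct super_block *sb)
{
    if (block_device_ejected(sb))
        return -1;
    sb->bdi->nr_dirtied++;  /* stand-in for the accounting done when a buffer is dirtied */
    return 0;
}
```

In this model, calling commit_super() on a super_block whose bdi->dev is still set returns 0 and bumps the counter; once dev is set to NULL the call returns -1 and leaves the statistics untouched, which is the path the oops in this report takes when the guard is missing.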
This bug is already hitting distributions, e.g. here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1478623

I hope this helps to get the right people into contact and sorry if I just added to the noise...

The revert is upstream as commit bdfe0cbd746aa9 and will be showing up in stable backport kernels very shortly (the deadline for commenting on the stable backports is tonight, so the next stable backport kernels should be released within a day or so with the revert).

It's up to the distribution to update their kernels, and they don't have to wait for the stable kernel backports to be released. So please feel free to request at each of the various distro-specific bug trackers that they include the revert found in upstream commit bdfe0cbd746aa9. Especially if you are receiving paid support from the distribution, they are much more likely to listen to you. :-)

Resolving since this is fixed upstream....

> The revert is upstream as commit bdfe0cbd746aa9

Thanks for the info! I will apply the revert to 4.2.1.

> Especially if you are receiving paid support from the distribution, they are
> much more likely to listen to you. :-)

It's the other way around. I am maintaining a distribution and get paid for supporting it. :-)