I am running Linux Mint Mate 17.2 with an updated Ubuntu mainline kernel downloaded from here: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.1.1-unstable/

When I disconnect a mounted ext4 USB stick without properly unmounting it first, the kernel crashes and I need to reboot. Here are the relevant logs; I can reproduce this every time I try:

Jul 4 22:26:52 Lenny kernel: [ 807.592356] sdb: sdb1
Jul 4 22:26:52 Lenny kernel: [ 807.595350] sd 3:0:0:0: [sdb] Attached SCSI removable disk
Jul 4 22:26:53 Lenny kernel: [ 807.826241] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
Jul 4 22:27:02 Lenny kernel: [ 817.461405] usb 2-2: USB disconnect, device number 2
Jul 4 22:27:02 Lenny kernel: [ 817.490648] Buffer I/O error on dev sdb1, logical block 15237120, lost sync page write
Jul 4 22:27:02 Lenny kernel: [ 817.490655] JBD2: Error -5 detected when updating journal superblock for sdb1-8.
Jul 4 22:27:02 Lenny kernel: [ 817.490873] BUG: unable to handle kernel paging request at 34beb000
Jul 4 22:27:02 Lenny kernel: [ 817.490929] IP: [<c136ce88>] __percpu_counter_add+0x18/0xc0
Jul 4 22:27:02 Lenny kernel: [ 817.490977] *pdpt = 0000000023db9001 *pde = 0000000000000000
Jul 4 22:27:02 Lenny kernel: [ 817.491024] Oops: 0000 [#1] SMP
Jul 4 22:27:02 Lenny kernel: [ 817.491056] Modules linked in: uas usb_storage ctr ccm msr snd_hda_codec_analog snd_hda_codec_generic dm_multipath scsi_dh pcmcia coretemp kvm_intel kvm snd_seq_midi snd_seq_midi_event snd_rawmidi arc4 snd_seq yenta_socket iwl3945 serio_raw thinkpad_acpi iwlegacy mac80211 snd_hda_intel nvram snd_hda_controller snd_seq_device snd_hda_codec snd_hda_core snd_hwdep pcmcia_rsrc lpc_ich btusb pcmcia_core cfg80211 snd_pcm btbcm btintel rfcomm snd_timer shpchp bnep bluetooth snd soundcore 8250_fintek parport_pc ppdev tp_smapi(OE) thinkpad_ec(OE) mac_hid lp parport dm_mirror dm_region_hash dm_log i915 e1000e i2c_algo_bit sdhci_pci psmouse drm_kms_helper ahci ptp libahci sdhci drm pps_core video
Jul 4 22:27:02 Lenny kernel: [ 817.491694] CPU: 0 PID: 4083 Comm: umount Tainted: G U OE 4.1.1-040101-generic #201507011435
Jul 4 22:27:02 Lenny kernel: [ 817.491761] Hardware name: LENOVO 7675CTO/7675CTO, BIOS 7NETC2WW (2.22 ) 03/22/2011
Jul 4 22:27:02 Lenny kernel: [ 817.491814] task: ebf06b50 ti: ebebc000 task.ti: ebebc000
Jul 4 22:27:02 Lenny kernel: [ 817.491853] EIP: 0060:[<c136ce88>] EFLAGS: 00010082 CPU: 0
Jul 4 22:27:02 Lenny kernel: [ 817.491894] EIP is at __percpu_counter_add+0x18/0xc0
Jul 4 22:27:02 Lenny kernel: [ 817.491931] EAX: f21c8e88 EBX: f21c8e88 ECX: 00000000 EDX: 00000001
Jul 4 22:27:02 Lenny kernel: [ 817.491975] ESI: 00000001 EDI: 00000000 EBP: ebebde60 ESP: ebebde40
Jul 4 22:27:02 Lenny kernel: [ 817.492018] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jul 4 22:27:02 Lenny kernel: [ 817.492057] CR0: 8005003b CR2: 34beb000 CR3: 33354200 CR4: 000007f0
Jul 4 22:27:02 Lenny kernel: [ 817.492100] Stack:
Jul 4 22:27:02 Lenny kernel: [ 817.492117] c1abe100 edcb0098 edcb00ec ffffffff f21c8e68 ffffffff f21c8e68 f286d160
Jul 4 22:27:02 Lenny kernel: [ 817.492198] ebebde84 c1160454 00000010 00000282 f72a77f8 00000984 f72a77f8 f286d160
Jul 4 22:27:02 Lenny kernel: [ 817.492277] f286d170 ebebdea0 c11e613f 00000000 00000282 f72a77f8 edd7f4d0 00000000
Jul 4 22:27:02 Lenny kernel: [ 817.492355] Call Trace:
Jul 4 22:27:02 Lenny kernel: [ 817.492379] [<c1160454>] account_page_dirtied+0x74/0x110
Jul 4 22:27:02 Lenny kernel: [ 817.492420] [<c11e613f>] __set_page_dirty+0x3f/0xb0
Jul 4 22:27:02 Lenny kernel: [ 817.492459] [<c11e6203>] mark_buffer_dirty+0x53/0xc0
Jul 4 22:27:02 Lenny kernel: [ 817.492497] [<c124a0cb>] ext4_commit_super+0x17b/0x250
Jul 4 22:27:02 Lenny kernel: [ 817.492535] [<c124ac71>] ext4_put_super+0xc1/0x320
Jul 4 22:27:02 Lenny kernel: [ 817.492572] [<c11f04ba>] ? fsnotify_unmount_inodes+0x1aa/0x1c0
Jul 4 22:27:02 Lenny kernel: [ 817.492615] [<c11cfeda>] ? evict_inodes+0xca/0xe0
Jul 4 22:27:02 Lenny kernel: [ 817.492653] [<c11b925a>] generic_shutdown_super+0x6a/0xe0
Jul 4 22:27:02 Lenny kernel: [ 817.492695] [<c10a1df0>] ? prepare_to_wait_event+0xd0/0xd0
Jul 4 22:27:02 Lenny kernel: [ 817.492736] [<c1165a50>] ? unregister_shrinker+0x40/0x50
Jul 4 22:27:02 Lenny kernel: [ 817.492775] [<c11b92f6>] kill_block_super+0x26/0x70
Jul 4 22:27:02 Lenny kernel: [ 817.492815] [<c11b94f5>] deactivate_locked_super+0x45/0x80
Jul 4 22:27:02 Lenny kernel: [ 817.492854] [<c11ba007>] deactivate_super+0x47/0x60
Jul 4 22:27:02 Lenny kernel: [ 817.492892] [<c11d2b39>] cleanup_mnt+0x39/0x80
Jul 4 22:27:02 Lenny kernel: [ 817.492925] [<c11d2bc0>] __cleanup_mnt+0x10/0x20
Jul 4 22:27:02 Lenny kernel: [ 817.492960] [<c1080b51>] task_work_run+0x91/0xd0
Jul 4 22:27:02 Lenny kernel: [ 817.492996] [<c1011e3c>] do_notify_resume+0x7c/0x90
Jul 4 22:27:02 Lenny kernel: [ 817.493035] [<c1720da5>] work_notify
Jul 4 22:27:02 Lenny kernel: [ 817.493071] Code: 8b 55 e8 e9 f4 fe ff ff 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 83 ec 20 89 5d f4 89 c3 89 75 f8 89 d6 89 7d fc 89 cf 8b 48 14 <64> 8b 01 89 45 ec 89 c2 8b 45 08 c1 fa 1f 01 75 ec 89 55 f0 89
Jul 4 22:27:02 Lenny kernel: [ 817.493401] EIP: [<c136ce88>] __percpu_counter_add+0x18/0xc0 SS:ESP 0068:ebebde40
Jul 4 22:27:02 Lenny kernel: [ 817.493462] CR2: 0000000034beb000
Jul 4 22:27:02 Lenny kernel: [ 817.494516] ---[ end trace dd564a7bea834ecd ]---
Created attachment 181941: uname -a
Created attachment 181951: cat /proc/version
Created attachment 181961: dmesg
Created attachment 181971: lspci -vvnn
I can also reproduce this on a non-tainted kernel (4.1.2) on an old laptop:

jui 21 10:19:15 Aspire kernel: usb 3-3.3.4.1.1: USB disconnect, device number 15
jui 21 10:19:15 Aspire kernel: BUG: unable to handle kernel paging request at 34943000
jui 21 10:19:16 Aspire kernel: IP: [<c127cd7b>] __percpu_counter_add+0x1b/0xd0
jui 21 10:19:16 Aspire kernel: *pde = 00000000
jui 21 10:19:17 Aspire kernel: Oops: 0000 [#1] PREEMPT SMP
jui 21 10:19:17 Aspire kernel: Modules linked in: joydev psmouse snd_hda_codec_hdmi pcspkr serio_raw iTCO_wdt iTCO_vendor_support i2c_i801 evdev mousedev mac_hid i915 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel ipw2200 snd_hda_controller snd_hda_codec drm_kms_helper 8139too libipw snd_hda_core drm 8139cp lib80211 snd_hwdep cfg80211 snd_pcm pcmcia i2c_algo_bit i2c_core mii rfkill snd_timer lpc_ich intel_agp yenta_socket intel_gtt agpgart pcmcia_rsrc pcmcia_core snd rng_core soundcore thermal battery shpchp video ac button acpi_cpufreq processor sch_fq_codel ip_tables x_tables ext4 crc16 mbcache jbd2 hid_generic usbhid hid uas usb_storage sr_mod cdrom sd_mod ata_generic pata_acpi atkbd libps2 ata_piix libata scsi_mod ehci_pci uhci_hcd ehci_hcd usbcore usb_common i8042 serio
jui 21 10:19:17 Aspire kernel: CPU: 0 PID: 616 Comm: umount Not tainted 4.1.2-2-ARCH #1
jui 21 10:19:17 Aspire kernel: Hardware name: Acer, inc. Aspire 1640Z /Lugano3 , BIOS 3A24 10/30/06
jui 21 10:19:17 Aspire kernel: task: f4183fc0 ti: f1a7c000 task.ti: f1a7c000
jui 21 10:19:17 Aspire kernel: EIP: 0060:[<c127cd7b>] EFLAGS: 00010082 CPU: 0
jui 21 10:19:17 Aspire kernel: EIP is at __percpu_counter_add+0x1b/0xd0
jui 21 10:19:17 Aspire kernel: EAX: c1cee508 EBX: c1cee508 ECX: 00000000 EDX: 00000001
jui 21 10:19:17 Aspire kernel: ESI: ffffffff EDI: 00000000 EBP: f1a7de70 ESP: f1a7de50
jui 21 10:19:17 Aspire kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
jui 21 10:19:17 Aspire kernel: CR0: 8005003b CR2: 34943000 CR3: 31aa0000 CR4: 000007d0
jui 21 10:19:17 Aspire kernel: Stack:
jui 21 10:19:17 Aspire kernel: 00000285 00000285 c166f540 00000001 00000000 c1cee4e8 ffffffff f53a635c
jui 21 10:19:17 Aspire kernel: f1a7de8c c1138904 00000010 f6879660 f6879660 f53a635c f53a636c f1a7dea8
jui 21 10:19:18 Aspire kernel: c11bd549 00000000 00000282 f6879660 f521bc78 f2f5c400 f1a7deb8 c11bd6a3
jui 21 10:19:18 Aspire kernel: Call Trace:
jui 21 10:19:18 Aspire kernel: [<c1138904>] account_page_dirtied+0x74/0x120
jui 21 10:19:18 Aspire kernel: [<c11bd549>] __set_page_dirty+0x39/0xb0
jui 21 10:19:19 Aspire kernel: [<c11bd6a3>] mark_buffer_dirty+0x53/0xd0
jui 21 10:19:19 Aspire kernel: [<f828cf48>] ext4_commit_super+0x158/0x230 [ext4]
jui 21 10:19:19 Aspire kernel: [<f8059785>] ? mb_cache_shrink+0x55/0x250 [mbcache]
jui 21 10:19:19 Aspire kernel: [<f828dc87>] ext4_put_super+0xc7/0x320 [ext4]
jui 21 10:19:19 Aspire kernel: [<c11a7c02>] ? dispose_list+0x32/0x40
jui 21 10:19:20 Aspire kernel: [<c11a88a2>] ? evict_inodes+0xf2/0x110
jui 21 10:19:20 Aspire kernel: [<c1190d34>] generic_shutdown_super+0x64/0xe0
jui 21 10:19:20 Aspire kernel: [<c113fb30>] ? unregister_shrinker+0x40/0x50
jui 21 10:19:20 Aspire kernel: [<c119104f>] kill_block_super+0x1f/0x70
jui 21 10:19:20 Aspire kernel: [<c119134d>] deactivate_locked_super+0x3d/0x70
jui 21 10:19:21 Aspire kernel: [<c1191737>] deactivate_super+0x57/0x60
jui 21 10:19:21 Aspire kernel: [<c11abb79>] cleanup_mnt+0x39/0x90
jui 21 10:19:21 Aspire kernel: [<c11abc10>] __cleanup_mnt+0x10/0x20
jui 21 10:19:21 Aspire kernel: [<c106ffe9>] task_work_run+0xc9/0xe0
jui 21 10:19:21 Aspire kernel: [<c1002625>] do_notify_resume+0x75/0x80
jui 21 10:19:22 Aspire kernel: [<c14b15fd>] work_notifysig+0x30/0x37
jui 21 10:19:22 Aspire kernel: Code: 39 c7 77 de eb da 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 89 c3 83 ec 14 89 55 ec 89 4d f0 64 ff 05 44 17 71 c1 8b 7b 14 <64> 8b 37 89 7d e0 89 f7 c1 ff 1f 01 d6 8b 55 08 11 cf 89 d1 c1
jui 21 10:19:22 Aspire kernel: EIP: [<c127cd7b>] __percpu_counter_add+0x1b/0xd0 SS:ESP 0068:f1a7de50
jui 21 10:19:22 Aspire kernel: CR2: 0000000034943000
jui 21 10:19:22 Aspire kernel: ---[ end trace 4bafb307c38dbc3e ]---
This bug is still present in 4.2-rc6. Reverting:

commit 08439fec266c3cc5702953b4f54bdf5649357de0
Author: Christoph Hellwig <hch@lst.de>
Date: Thu Apr 2 23:56:32 2015 -0400

    ext4: remove block_device_ejected

    bdi->dev now never goes away, so this function became useless.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

makes the oops go away, at least for me (I've tested this on v4.1.4 i386 only). I hope this rings some bells.
I can also confirm that this bug is present in the latest stable kernel (4.1.5), and reverting the commit from comment 6 seems to fix it.
On Fri, Aug 14, 2015 at 11:02:14AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=101011
>
> I can also confirm that this bug is present in latest stable kernel (4.1.5)
> and reverting commit from comment 6 seems to fix it.

Christoph,

I've since gotten two reports from users that reverting your commit "08439fec266c3: ext4: remove block_device_ejected" fixes a crash when a USB stick is yanked from their system. Looking at the reported stack dump, it looks like the crash is happening in account_page_dirtied() when it updates some bdi-specific statistics.

I haven't been paying attention to the recent changes in how the bdi gets torn down after the device gets removed, and in fact the recent changes weren't obvious after a brief search. But it seems to me that if reverting this patch makes any kind of difference, then the assertion in the commit description:

    bdi->dev now never goes away, so this function became useless.

does not hold. It implies that bdi->dev *does* become NULL, and that checking for this is useful.

In any case, I don't see any harm in reverting this commit; what do you think?

Thanks,

- Ted
Hi Ted,

sorry for the delay - I saw the mail Jan Cc'ed me on yesterday. After my changes it should not go away, and I had verified with the original eject test that it indeed didn't. Either I forgot a case, or the major writeback changes Tejun did a little later regressed it.

As I won't have time to look into it ASAP, I'd suggest reverting my patch for now. In the long run I really don't want to have these checks spread over file systems, so I plan to look into it once I get a few spare hours.
On Sat, Aug 15, 2015 at 10:19:02AM +0200, Christoph Hellwig wrote:
>
> sorry for the delay - I saw the mail Jan Cc'ed me on yesterday. After
> my changes it should not go away and I had tested the original eject
> test that it indeed didn't. Either I forgot a case, or the major
> writeback Tejun did a little later regressed it.
>
> As I won't have time to look into it ASAP I'd suggest to revert my
> patch for now. In the long run I really don't want to have these
> checks spread over file system so I plan to look into it once I
> get a few spare hours.

Thanks, I'll revert the patch. I suspect we should add an ioctl to simulate a USB device unplug using the loopback block device, so we can add a test to xfstests.

- Ted
This bug is already hitting distributions, e.g. here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1478623 I hope this helps to get the right people into contact and sorry if I just added to the noise...
The revert is upstream as commit bdfe0cbd746aa9 and will be showing up in stable backport kernels very shortly (the deadline for commenting on the stable backports is tonight, so the next stable backport kernels should be released within a day or so with the revert).

It's up to each distribution to update its kernels, and they don't have to wait for the stable kernel backports to be released. So please feel free to request at each of the various distro-specific bug trackers that they include the revert found in upstream commit bdfe0cbd746aa9. Especially if you are receiving paid support from the distribution, they are much more likely to listen to you. :-)
Resolving since this is fixed upstream....
> The revert is upstream as commit bdfe0cbd746aa9

Thanks for the info! I will apply the revert to 4.2.1.

> Especially if you are receiving paid support from the distribution, they are
> much more likely to listen to you. :-)

It's the other way around. I am maintaining a distribution and get paid for supporting it. :-)