Bug 101011 - Kernel Oops when disconnecting a mounted ext4 usb stick
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI
Hardware: All Linux
Importance: P1 normal
Assignee: linux-scsi@vger.kernel.org
Reported: 2015-07-05 20:17 UTC by konradsa
Modified: 2017-05-18 12:08 UTC

See Also:
Kernel Version: 4.1.1-040101-generic
Subsystem:
Regression: No
Bisected commit-id:


Attachments
uname -a (104 bytes, text/x-log)
2015-07-05 20:19 UTC, konradsa
cat /proc/version (150 bytes, text/x-log)
2015-07-05 20:19 UTC, konradsa
dmesg (72.12 KB, text/x-log)
2015-07-05 20:20 UTC, konradsa
lspci -vvnn (20.68 KB, text/x-log)
2015-07-05 20:20 UTC, konradsa

Description konradsa 2015-07-05 20:17:08 UTC
I am running Linux Mint Mate 17.2 with an updated Ubuntu mainline kernel downloaded from here: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.1.1-unstable/

When I disconnect a mounted ext4 USB stick without properly unmounting it first, the kernel crashes and I need to reboot. Here are the relevant logs; I can reproduce this every time I try:

Jul  4 22:26:52 Lenny kernel: [  807.592356]  sdb: sdb1
Jul  4 22:26:52 Lenny kernel: [  807.595350] sd 3:0:0:0: [sdb] Attached SCSI removable disk
Jul  4 22:26:53 Lenny kernel: [  807.826241] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
Jul  4 22:27:02 Lenny kernel: [  817.461405] usb 2-2: USB disconnect, device number 2
Jul  4 22:27:02 Lenny kernel: [  817.490648] Buffer I/O error on dev sdb1, logical block 15237120, lost sync page write
Jul  4 22:27:02 Lenny kernel: [  817.490655] JBD2: Error -5 detected when updating journal superblock for sdb1-8.
Jul  4 22:27:02 Lenny kernel: [  817.490873] BUG: unable to handle kernel paging request at 34beb000
Jul  4 22:27:02 Lenny kernel: [  817.490929] IP: [<c136ce88>] __percpu_counter_add+0x18/0xc0
Jul  4 22:27:02 Lenny kernel: [  817.490977] *pdpt = 0000000023db9001 *pde = 0000000000000000 
Jul  4 22:27:02 Lenny kernel: [  817.491024] Oops: 0000 [#1] SMP 
Jul  4 22:27:02 Lenny kernel: [  817.491056] Modules linked in: uas usb_storage ctr ccm msr snd_hda_codec_analog snd_hda_codec_generic dm_multipath scsi_dh pcmcia coretemp kvm_intel kvm snd_seq_midi snd_seq_midi_event snd_rawmidi arc4 snd_seq yenta_socket iwl3945 serio_raw thinkpad_acpi iwlegacy mac80211 snd_hda_intel nvram snd_hda_controller snd_seq_device snd_hda_codec snd_hda_core snd_hwdep pcmcia_rsrc lpc_ich btusb pcmcia_core cfg80211 snd_pcm btbcm btintel rfcomm snd_timer shpchp bnep bluetooth snd soundcore 8250_fintek parport_pc ppdev tp_smapi(OE) thinkpad_ec(OE) mac_hid lp parport dm_mirror dm_region_hash dm_log i915 e1000e i2c_algo_bit sdhci_pci psmouse drm_kms_helper ahci ptp libahci sdhci drm pps_core video
Jul  4 22:27:02 Lenny kernel: [  817.491694] CPU: 0 PID: 4083 Comm: umount Tainted: G     U     OE   4.1.1-040101-generic #201507011435
Jul  4 22:27:02 Lenny kernel: [  817.491761] Hardware name: LENOVO 7675CTO/7675CTO, BIOS 7NETC2WW (2.22 ) 03/22/2011
Jul  4 22:27:02 Lenny kernel: [  817.491814] task: ebf06b50 ti: ebebc000 task.ti: ebebc000
Jul  4 22:27:02 Lenny kernel: [  817.491853] EIP: 0060:[<c136ce88>] EFLAGS: 00010082 CPU: 0
Jul  4 22:27:02 Lenny kernel: [  817.491894] EIP is at __percpu_counter_add+0x18/0xc0
Jul  4 22:27:02 Lenny kernel: [  817.491931] EAX: f21c8e88 EBX: f21c8e88 ECX: 00000000 EDX: 00000001
Jul  4 22:27:02 Lenny kernel: [  817.491975] ESI: 00000001 EDI: 00000000 EBP: ebebde60 ESP: ebebde40
Jul  4 22:27:02 Lenny kernel: [  817.492018]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jul  4 22:27:02 Lenny kernel: [  817.492057] CR0: 8005003b CR2: 34beb000 CR3: 33354200 CR4: 000007f0
Jul  4 22:27:02 Lenny kernel: [  817.492100] Stack:
Jul  4 22:27:02 Lenny kernel: [  817.492117]  c1abe100 edcb0098 edcb00ec ffffffff f21c8e68 ffffffff f21c8e68 f286d160
Jul  4 22:27:02 Lenny kernel: [  817.492198]  ebebde84 c1160454 00000010 00000282 f72a77f8 00000984 f72a77f8 f286d160
Jul  4 22:27:02 Lenny kernel: [  817.492277]  f286d170 ebebdea0 c11e613f 00000000 00000282 f72a77f8 edd7f4d0 00000000
Jul  4 22:27:02 Lenny kernel: [  817.492355] Call Trace:
Jul  4 22:27:02 Lenny kernel: [  817.492379]  [<c1160454>] account_page_dirtied+0x74/0x110
Jul  4 22:27:02 Lenny kernel: [  817.492420]  [<c11e613f>] __set_page_dirty+0x3f/0xb0
Jul  4 22:27:02 Lenny kernel: [  817.492459]  [<c11e6203>] mark_buffer_dirty+0x53/0xc0
Jul  4 22:27:02 Lenny kernel: [  817.492497]  [<c124a0cb>] ext4_commit_super+0x17b/0x250
Jul  4 22:27:02 Lenny kernel: [  817.492535]  [<c124ac71>] ext4_put_super+0xc1/0x320
Jul  4 22:27:02 Lenny kernel: [  817.492572]  [<c11f04ba>] ? fsnotify_unmount_inodes+0x1aa/0x1c0
Jul  4 22:27:02 Lenny kernel: [  817.492615]  [<c11cfeda>] ? evict_inodes+0xca/0xe0
Jul  4 22:27:02 Lenny kernel: [  817.492653]  [<c11b925a>] generic_shutdown_super+0x6a/0xe0
Jul  4 22:27:02 Lenny kernel: [  817.492695]  [<c10a1df0>] ? prepare_to_wait_event+0xd0/0xd0
Jul  4 22:27:02 Lenny kernel: [  817.492736]  [<c1165a50>] ? unregister_shrinker+0x40/0x50
Jul  4 22:27:02 Lenny kernel: [  817.492775]  [<c11b92f6>] kill_block_super+0x26/0x70
Jul  4 22:27:02 Lenny kernel: [  817.492815]  [<c11b94f5>] deactivate_locked_super+0x45/0x80
Jul  4 22:27:02 Lenny kernel: [  817.492854]  [<c11ba007>] deactivate_super+0x47/0x60
Jul  4 22:27:02 Lenny kernel: [  817.492892]  [<c11d2b39>] cleanup_mnt+0x39/0x80
Jul  4 22:27:02 Lenny kernel: [  817.492925]  [<c11d2bc0>] __cleanup_mnt+0x10/0x20
Jul  4 22:27:02 Lenny kernel: [  817.492960]  [<c1080b51>] task_work_run+0x91/0xd0
Jul  4 22:27:02 Lenny kernel: [  817.492996]  [<c1011e3c>] do_notify_resume+0x7c/0x90
Jul  4 22:27:02 Lenny kernel: [  817.493035]  [<c1720da5>] work_notify
Jul  4 22:27:02 Lenny kernel: [  817.493071] Code: 8b 55 e8 e9 f4 fe ff ff 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 83 ec 20 89 5d f4 89 c3 89 75 f8 89 d6 89 7d fc 89 cf 8b 48 14 <64> 8b 01 89 45 ec 89 c2 8b 45 08 c1 fa 1f 01 75 ec 89 55 f0 89
Jul  4 22:27:02 Lenny kernel: [  817.493401] EIP: [<c136ce88>] __percpu_counter_add+0x18/0xc0 SS:ESP 0068:ebebde40
Jul  4 22:27:02 Lenny kernel: [  817.493462] CR2: 0000000034beb000
Jul  4 22:27:02 Lenny kernel: [  817.494516] ---[ end trace dd564a7bea834ecd ]---
Comment 1 konradsa 2015-07-05 20:19:24 UTC
Created attachment 181941
uname -a
Comment 2 konradsa 2015-07-05 20:19:51 UTC
Created attachment 181951
cat /proc/version
Comment 3 konradsa 2015-07-05 20:20:08 UTC
Created attachment 181961
dmesg
Comment 4 konradsa 2015-07-05 20:20:28 UTC
Created attachment 181971
lspci -vvnn
Comment 5 taz.007 2015-07-22 17:39:41 UTC
Can also reproduce on a non-tainted kernel (4.1.2) on an old laptop:

jui 21 10:19:15 Aspire kernel: usb 3-3.3.4.1.1: USB disconnect, device number 15
jui 21 10:19:15 Aspire kernel: BUG: unable to handle kernel paging request at 34943000
jui 21 10:19:16 Aspire kernel: IP: [<c127cd7b>] __percpu_counter_add+0x1b/0xd0
jui 21 10:19:16 Aspire kernel: *pde = 00000000 
jui 21 10:19:17 Aspire kernel: Oops: 0000 [#1] PREEMPT SMP 
jui 21 10:19:17 Aspire kernel: Modules linked in: joydev psmouse snd_hda_codec_hdmi pcspkr serio_raw iTCO_wdt iTCO_vendor_support i2c_i801 evdev mousedev mac_hid i915 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel ipw2200 snd_hda_controller snd_hda_codec drm_kms_helper 8139too libipw snd_hda_core drm 8139cp lib80211 snd_hwdep cfg80211 snd_pcm pcmcia i2c_algo_bit i2c_core mii rfkill snd_timer lpc_ich intel_agp yenta_socket intel_gtt agpgart pcmcia_rsrc pcmcia_core snd rng_core soundcore thermal battery shpchp video ac button acpi_cpufreq processor sch_fq_codel ip_tables x_tables ext4 crc16 mbcache jbd2 hid_generic usbhid hid uas usb_storage sr_mod cdrom sd_mod ata_generic pata_acpi atkbd libps2 ata_piix libata scsi_mod ehci_pci uhci_hcd ehci_hcd usbcore usb_common i8042 serio
jui 21 10:19:17 Aspire kernel: CPU: 0 PID: 616 Comm: umount Not tainted 4.1.2-2-ARCH #1
jui 21 10:19:17 Aspire kernel: Hardware name: Acer, inc. Aspire 1640Z    /Lugano3         , BIOS 3A24 10/30/06
jui 21 10:19:17 Aspire kernel: task: f4183fc0 ti: f1a7c000 task.ti: f1a7c000
jui 21 10:19:17 Aspire kernel: EIP: 0060:[<c127cd7b>] EFLAGS: 00010082 CPU: 0
jui 21 10:19:17 Aspire kernel: EIP is at __percpu_counter_add+0x1b/0xd0
jui 21 10:19:17 Aspire kernel: EAX: c1cee508 EBX: c1cee508 ECX: 00000000 EDX: 00000001
jui 21 10:19:17 Aspire kernel: ESI: ffffffff EDI: 00000000 EBP: f1a7de70 ESP: f1a7de50
jui 21 10:19:17 Aspire kernel:  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
jui 21 10:19:17 Aspire kernel: CR0: 8005003b CR2: 34943000 CR3: 31aa0000 CR4: 000007d0
jui 21 10:19:17 Aspire kernel: Stack:
jui 21 10:19:17 Aspire kernel:  00000285 00000285 c166f540 00000001 00000000 c1cee4e8 ffffffff f53a635c
jui 21 10:19:17 Aspire kernel:  f1a7de8c c1138904 00000010 f6879660 f6879660 f53a635c f53a636c f1a7dea8
jui 21 10:19:18 Aspire kernel:  c11bd549 00000000 00000282 f6879660 f521bc78 f2f5c400 f1a7deb8 c11bd6a3
jui 21 10:19:18 Aspire kernel: Call Trace:
jui 21 10:19:18 Aspire kernel:  [<c1138904>] account_page_dirtied+0x74/0x120
jui 21 10:19:18 Aspire kernel:  [<c11bd549>] __set_page_dirty+0x39/0xb0
jui 21 10:19:19 Aspire kernel:  [<c11bd6a3>] mark_buffer_dirty+0x53/0xd0
jui 21 10:19:19 Aspire kernel:  [<f828cf48>] ext4_commit_super+0x158/0x230 [ext4]
jui 21 10:19:19 Aspire kernel:  [<f8059785>] ? mb_cache_shrink+0x55/0x250 [mbcache]
jui 21 10:19:19 Aspire kernel:  [<f828dc87>] ext4_put_super+0xc7/0x320 [ext4]
jui 21 10:19:19 Aspire kernel:  [<c11a7c02>] ? dispose_list+0x32/0x40
jui 21 10:19:20 Aspire kernel:  [<c11a88a2>] ? evict_inodes+0xf2/0x110
jui 21 10:19:20 Aspire kernel:  [<c1190d34>] generic_shutdown_super+0x64/0xe0
jui 21 10:19:20 Aspire kernel:  [<c113fb30>] ? unregister_shrinker+0x40/0x50
jui 21 10:19:20 Aspire kernel:  [<c119104f>] kill_block_super+0x1f/0x70
jui 21 10:19:20 Aspire kernel:  [<c119134d>] deactivate_locked_super+0x3d/0x70
jui 21 10:19:21 Aspire kernel:  [<c1191737>] deactivate_super+0x57/0x60
jui 21 10:19:21 Aspire kernel:  [<c11abb79>] cleanup_mnt+0x39/0x90
jui 21 10:19:21 Aspire kernel:  [<c11abc10>] __cleanup_mnt+0x10/0x20
jui 21 10:19:21 Aspire kernel:  [<c106ffe9>] task_work_run+0xc9/0xe0
jui 21 10:19:21 Aspire kernel:  [<c1002625>] do_notify_resume+0x75/0x80
jui 21 10:19:22 Aspire kernel:  [<c14b15fd>] work_notifysig+0x30/0x37
jui 21 10:19:22 Aspire kernel: Code: 39 c7 77 de eb da 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 89 c3 83 ec 14 89 55 ec 89 4d f0 64 ff 05 44 17 71 c1 8b 7b 14 <64> 8b 37 89 7d e0 89 f7 c1 ff 1f 01 d6 8b 55 08 11 cf 89 d1 c1
jui 21 10:19:22 Aspire kernel: EIP: [<c127cd7b>] __percpu_counter_add+0x1b/0xd0 SS:ESP 0068:f1a7de50
jui 21 10:19:22 Aspire kernel: CR2: 0000000034943000
jui 21 10:19:22 Aspire kernel: ---[ end trace 4bafb307c38dbc3e ]---
Comment 6 Krzysztof Kotlenga 2015-08-10 18:02:02 UTC
This bug is still present in 4.2-rc6. Reverting:

commit 08439fec266c3cc5702953b4f54bdf5649357de0
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Apr 2 23:56:32 2015 -0400

    ext4: remove block_device_ejected
    
    bdi->dev now never goes away, so this function became useless.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

makes the oops go away, at least for me (I've tested this on v4.1.4 i386 only). I hope this rings some bells.
Comment 7 Maciej S. Szmigiero 2015-08-14 11:02:14 UTC
I can also confirm that this bug is present in the latest stable kernel (4.1.5) and that reverting the commit from comment 6 seems to fix it.
Comment 8 Theodore Tso 2015-08-14 18:39:43 UTC
On Fri, Aug 14, 2015 at 11:02:14AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=101011
> 
> I can also confirm that this bug is present in latest stable kernel (4.1.5)
> and
> reverting commit from comment 6 seems to fix it.

Christoph,

I've since gotten two reports from users that reverting your commit:
"08439fec266c3: ext4: remove block_device_ejected" fixes a crash when
a USB stick is yanked from their system.  Looking at the reported
stack dump, it looks like the crash is happening in
account_page_dirtied() when it updates some bdi-specific statistics.

I haven't been paying attention to the recent changes in how bdi gets
torn down after the device gets removed, and in fact the recent
changes weren't obvious after a brief search, but it seems to me that
if reverting this patch makes any kind of difference, then the
assertion in the commit description:

    bdi->dev now never goes away, so this function became useless.

does not hold: it implies that bdi->dev *does* become NULL, and that
checking for this is useful.  In any case, I don't see any harm in
reverting this commit; what do you think?

Thanks,

					- Ted
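
To illustrate the reasoning above, here is a minimal userspace sketch of what a "has the device gone away?" guard buys on the unmount path. It is not the kernel code: the structures are simplified stand-ins, commit_super() and its -1 return value are hypothetical, and nr_dirtied merely models the bdi-specific statistics that account_page_dirtied() updates. The only assumption taken from the discussion is that bdi->dev is NULL once the device has been removed.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the real kernel definitions. */
struct device { int unused; };

struct backing_dev_info {
    struct device *dev; /* assumed to be NULL once the device is gone */
    long nr_dirtied;    /* models the per-bdi statistics */
};

struct super_block {
    struct backing_dev_info *bdi;
};

/* Returns nonzero when the backing device has already been torn down. */
static int block_device_ejected(const struct super_block *sb)
{
    return sb->bdi->dev == NULL;
}

/*
 * Hypothetical stand-in for writing out the superblock at unmount.
 * Returns 0 when the statistics were updated, -1 when the write was
 * skipped because the device is gone.
 */
static int commit_super(struct super_block *sb)
{
    assert(sb != NULL && sb->bdi != NULL);

    if (block_device_ejected(sb))
        return -1;          /* skip: do not touch the bdi statistics */

    sb->bdi->nr_dirtied++;  /* models account_page_dirtied() */
    return 0;
}
```

With the guard in place, a call made after dev has been cleared returns early and leaves the statistics untouched; without it, the sketch's counter update corresponds to the point where the reported oops occurs.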
Comment 9 Christoph Hellwig 2015-08-15 08:19:05 UTC
Hi Ted,


sorry for the delay - I saw the mail Jan Cc'ed me on yesterday.  After
my changes it should not go away, and I had run the original eject
test to confirm that it indeed didn't.  Either I forgot a case, or the
major writeback changes Tejun made a little later regressed it.

As I won't have time to look into it ASAP I'd suggest reverting my
patch for now.  In the long run I really don't want to have these
checks spread over file systems, so I plan to look into it once I
get a few spare hours.
Comment 10 Theodore Tso 2015-08-16 13:38:08 UTC
On Sat, Aug 15, 2015 at 10:19:02AM +0200, Christoph Hellwig wrote:
> 
> sorry for the delay - I saw the mail Jan Cc'ed me on yesterday.  After
> my changes it should not go away and I had tested the original eject
> test that it indeed didn't.  Either I forgot a case, or the major
> writeback Tejun did a little later regressed it.
> 
> As I won't have time to look into it ASAP I'd suggest to revert my
> patch for now.  In the long run I really don't want to have these
> checks spread over file system so I plan to look into it once I
> get a few spare hours.

Thanks, I'll revert the patch.

I suspect we should add an ioctl to simulate a USB device unplug using
the loopback block device, so we can add a test to xfstests.

					- Ted
Comment 11 Ronny Standtke 2015-09-28 18:49:28 UTC
This bug is already hitting distributions, e.g. here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1478623

I hope this helps to get the right people into contact and sorry if I just added to the noise...
Comment 12 Theodore Tso 2015-09-28 21:46:44 UTC
The revert is upstream as commit bdfe0cbd746aa9 and will be showing up in stable backport kernels very shortly (the deadline for commenting on the stable backports is tonight, so the next stable backport kernels should be released within a day or so with the revert).

It's up to the distribution to update their kernels, and they don't have to wait for the stable kernel backports to be released.  So please feel free to request at each of the various distro-specific bug trackers that they include the revert found in upstream commit bdfe0cbd746aa9.  Especially if you are receiving paid support from the distribution, they are much more likely to listen to you.  :-)
Comment 13 Theodore Tso 2015-09-28 21:47:11 UTC
Resolving since this is fixed upstream....
Comment 14 Ronny Standtke 2015-09-29 16:33:42 UTC
> The revert is upstream as commit bdfe0cbd746aa9

Thanks for the info! I will apply the revert to 4.2.1.

> Especially if you are receiving paid support from the distribution, they are 
> much more likely to listen to you.  :-)

It's the other way around. I am maintaining a distribution and get paid for supporting it. :-)
