Bug 58011 - Rare oops when loading modules.
Status: CLOSED CODE_FIX
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-11 11:16 UTC by Chris Rankin
Modified: 2013-11-13 20:37 UTC
CC List: 2 users

See Also:
Kernel Version: 3.8.12
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Output from serial console showing oops. (16.37 KB, text/plain)
2013-05-11 11:16 UTC, Chris Rankin
dmesg output from successful boot (44.51 KB, text/plain)
2013-05-11 21:13 UTC, Chris Rankin

Description Chris Rankin 2013-05-11 11:16:49 UTC
Created attachment 101171
Output from serial console showing oops.

Booted 3.8.12 and it hit a BUG in kobject_put:

BUG: unable to handle kernel paging request at f86dd0d8
IP: [<c1125a8d>] kobject_put+0xa/0x50
*pde = 36aff067 *pte = 00000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: usbhid(+) videobuf2_core acpi_cpufreq(+) snd_rawmidi snd_hwdep snd_seq videodev mperf snd_seq_device snd_pcm snd_page_alloc snd_hrtimer firewire_ohci snd_timer firewire_core ppdev parport_pc snd i2c_i801 dcdbas parport soundcore crc_itu_t microcode psmouse floppy lpc_ich mfd_core pcspkr ehci_pci processor serio_raw nfsd auth_rpcgss binfmt_misc nfs_acl lockd sunrpc uinput ext3 mbcache jbd sr_mod cdrom sd_mod pata_acpi sata_sil radeon uhci_hcd ata_piix cfbfillrect libata ehci_hcd cfbimgblt cfbcopyarea i2c_algo_bit backlight scsi_mod drm_kms_helper usbcore intel_agp e1000 ttm usb_common drm intel_gtt agpgart button
Pid: 1726, comm: systemd-udevd Not tainted 3.8.12 #1 Dell Computer Corporation Precision WorkStation 650    /0F1262
EIP: 0060:[<c1125a8d>] EFLAGS: 00010286 CPU: 0
EIP is at kobject_put+0xa/0x50
...

This might have happened once with an earlier kernel, and it hasn't happened again with this kernel.

Rare race condition?
Comment 1 Chris Rankin 2013-05-11 21:13:55 UTC
Created attachment 101191
dmesg output from successful boot

Hardware is 2 x Intel Xeon (Northwood) CPUs, 32 bit, 2 GB RAM.
Comment 2 Joe Lawrence 2013-05-25 03:39:04 UTC
Hi Chris,

I saw a similar crash running 3.9 that emitted "mgag200: module is already loaded" (much like the "acpi_cpufreq: module is already loaded" from your attachment) before blowing up in load_module executing kobj code.  This would occur in a low percentage of boots.

I cherry picked commit 944a1fa01266aa9ace607f29551b73c41e9440e9 "module: don't unlink the module until we've removed all exposure" and haven't seen that crash since.

Hope this helps,

-- Joe
Comment 3 Joe Lawrence 2013-06-05 20:55:35 UTC
Commit a49b7e82cab0f9b41f483359be83f44fbb6b4979 "kobject: fix kset_find_obj() race with concurrent last kobject_put()" may also be good enough to dodge this crash.

-- Joe
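
For readers unfamiliar with the kind of race named in the second commit title (a lookup racing with the last put on an object), the following is a minimal userspace sketch of the general "take a reference only if the count is still non-zero" pattern. It is not the kernel code and is not taken from either commit; `struct obj`, `obj_get_unless_zero()` and `obj_put()` are hypothetical names, and C11 atomics stand in for the kernel's own reference counting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted kernel object; only the count is modelled. */
struct obj {
    atomic_int refcount;
};

/*
 * Take a reference only if the object is still alive (count > 0).
 * A lookup that finds the object in a list would call this instead of
 * incrementing unconditionally, so it can never revive an object whose
 * final put has already happened.
 */
static bool obj_get_unless_zero(struct obj *o)
{
    int old = atomic_load(&o->refcount);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
            return true;
    }
    return false; /* the last put already ran; the caller must not use the object */
}

/* Drop a reference; returns true when this call released the last one. */
static bool obj_put(struct obj *o)
{
    int old = atomic_fetch_sub(&o->refcount, 1);

    assert(old > 0); /* dropping a reference that was never held is a bug */
    return old == 1;
}
```

With an unconditional increment, a lookup that runs between the final put and the object's removal from its list would hand back a pointer to memory that is about to be freed, which is consistent with a paging-request oops on a later put. With the conditional version, the lookup simply fails in that window.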
