Bug 15963

Summary: BUG: rtc_cmos (2.6.34-rc7)
Product: Timers
Reporter: Maciej Rutecki (maciej.rutecki)
Component: Realtime Clock
Assignee: timers_realtime-clock
Status: CLOSED CODE_FIX
Severity: normal
CC: error27, maciej.rutecki, randy.dunlap, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.34-rc7
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 15310
Attachments: move setting driver data before rtc_device_register()
             move setting driver data before rtc_device_register() - version 2.

Description Maciej Rutecki 2010-05-11 18:24:40 UTC
Subject    : BUG: rtc_cmos (2.6.34-rc7)
Submitter  : Randy Dunlap <randy.dunlap@oracle.com>
Date       : 2010-05-10 23:09
Message-ID : 4BE89243.8090809@oracle.com
References : http://marc.info/?l=linux-kernel&m=127353313728385&w=2

This entry is being used for tracking a regression from 2.6.33.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Randy Dunlap 2010-05-13 17:54:47 UTC
I have now hit this BUG about 6 times.  It does not always happen,
but I have been pushing it a lot.  Somehow the struct cmos_rtc *cmos
in cmos_update_irq_enable() is NULL (returned from dev_get_drvdata()).

I have the most success in causing the BUG to happen after I load
and unload several PCMCIA drivers, then load/unload rtc-cmos module
multiple times.
Comment 2 Randy Dunlap 2010-05-13 18:03:09 UTC
Could hwclock be doing something odd?  In all of my kernel logs with
this BUG, I see this:

Pid: 3787, comm: hwclock Not tainted 2.6.34-rc7 #2 0HH807/OptiPlex GX620
Comment 3 Dan Carpenter 2010-05-14 09:03:13 UTC
Created attachment 26376
move setting driver data before rtc_device_register()

Hi Randy,

I think this patch will take care of the problem.  Can you give it a whirl?
Comment 4 Randy Dunlap 2010-05-14 17:39:27 UTC
Hi Dan,
Thanks for the patch... but it consistently bugs during initialization:

[   18.063430] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[   18.064183] IP: [<ffffffffa0243c4d>] cmos_do_probe+0x1e6/0x638 [rtc_cmos]
[   18.064183] PGD 7c6e7067 PUD 7c6cb067 PMD 0 
[   18.064183] Oops: 0000 [#1] SMP 
[   18.064183] last sysfs file: /sys/class/scsi_generic/sg0/dev
[   18.064183] CPU 1 
[   18.064183] Modules linked in: rtc_cmos(+) psmouse serio_raw pcspkr sg rtc_core i2c_i801 rng_core rtc_lib parport button intel_agp thermal processor thermal_sys hwmon sr_mod cdrom ata_generic pata_acpi ata_piix libata ide_pci_generic ide_core sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ssb mmc_core pcmcia pcmcia_core ehci_hcd usbcore nls_base [last unloaded: scsi_wait_scan]
[   18.064183] 
[   18.064183] Pid: 2061, comm: modprobe Not tainted 2.6.34-rc7 #3 0HH807/OptiPlex GX620               
[   18.064183] RIP: 0010:[<ffffffffa0243c4d>]  [<ffffffffa0243c4d>] cmos_do_probe+0x1e6/0x638 [rtc_cmos]
[   18.064183] RSP: 0018:ffff880079c35d48  EFLAGS: 00010202
[   18.064183] RAX: 0000000000000000 RBX: ffff88007e845b18 RCX: ffffffffa0245a70
[   18.064183] RDX: ffffffffa0245490 RSI: ffff88007e845b18 RDI: ffffffffa0245480
[   18.064183] RBP: ffff880079c35d78 R08: ffffffff81830d38 R09: ffffffff810573ae
[   18.064183] R10: ffffffffa0245480 R11: ffffffff81830ca0 R12: ffffffffa0245ce0
[   18.064183] R13: ffff88007e383440 R14: 0000000000000008 R15: 0000000000000100
[   18.064183] FS:  00007f94d15036f0(0000) GS:ffff880005400000(0000) knlGS:0000000000000000
[   18.064183] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   18.064183] CR2: 0000000000000010 CR3: 0000000079c09000 CR4: 00000000000006e0
[   18.064183] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   18.064183] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   18.064183] Process modprobe (pid: 2061, threadinfo ffff880079c34000, task ffff88007c720000)
[   18.064183] Stack:
[   18.064183]  ffff880079c35d78 ffff88007e845b18 0000000000000008 ffffffffa0245780
[   18.064183] <0> ffffffffa0245740 0000000000000000 ffff880079c35d98 ffffffffa02450aa
[   18.064183] <0> ffff88007e845b18 ffffffffa0245500 ffff880079c35dd8 ffffffff8126d670
[   18.064183] Call Trace:
[   18.064183]  [<ffffffffa02450aa>] cmos_pnp_probe+0x108/0x114 [rtc_cmos]
[   18.064183]  [<ffffffff8126d670>] pnp_device_probe+0x115/0x159
[   18.064183]  [<ffffffff812ccd73>] ? driver_sysfs_add+0x5c/0x96
[   18.064183]  [<ffffffff812cd077>] driver_probe_device+0x1b7/0x334
[   18.064183]  [<ffffffff812cd27c>] __driver_attach+0x88/0xc0
[   18.064183]  [<ffffffff812cd1f4>] ? __driver_attach+0x0/0xc0
[   18.064183]  [<ffffffff812cbed2>] bus_for_each_dev+0x7e/0xd6
[   18.064183]  [<ffffffff812ccc8e>] driver_attach+0x20/0x29
[   18.064183]  [<ffffffff812cc690>] bus_add_driver+0x147/0x361
[   18.064183]  [<ffffffff812cd6e3>] driver_register+0xf3/0x1ad
[   18.064183]  [<ffffffffa01c7000>] ? cmos_init+0x0/0xad [rtc_cmos]
[   18.064183]  [<ffffffff8126d2db>] pnp_register_driver+0x23/0x2c
[   18.064183]  [<ffffffffa01c701b>] cmos_init+0x1b/0xad [rtc_cmos]
[   18.064183]  [<ffffffff8100023a>] do_one_initcall+0x75/0x1cb
[   18.064183]  [<ffffffff810920bb>] sys_init_module+0x134/0x326
[   18.064183]  [<ffffffff810034eb>] system_call_fastpath+0x16/0x1b
[   18.064183] Code: e8 c9 8c 08 e1 48 8b 05 72 20 00 00 48 ff 05 53 2b 00 00 48 c7 c1 70 5a 24 a0 48 c7 c2 90 54 24 a0 48 89 de 48 c7 c7 80 54 24 a0 <48> 8b 40 10 49 89 45 10 e8 bc e4 f1 ff 48 3d 00 f0 ff ff 48 89 
[   18.064183] RIP  [<ffffffffa0243c4d>] cmos_do_probe+0x1e6/0x638 [rtc_cmos]
[   18.064183]  RSP <ffff880079c35d48>
[   18.064183] CR2: 0000000000000010
[   18.395492] ---[ end trace c91b161e66911807 ]---
Comment 5 Dan Carpenter 2010-05-15 20:37:37 UTC
Created attachment 26390
move setting driver data before rtc_device_register() - version 2.

Aish... Randy, I'm embarrassed.  That was an avoidable bug.  It won't happen again.  Sorry for that.

I can't reproduce your original bug on my system with an unmodified kernel.  I put the system in a loop that did an rmmod and a modprobe 10000 times, but I didn't see the original bug.  But I still think setting the driver data earlier _should_ fix it.  Could you test version 2?
Comment 6 Rafael J. Wysocki 2010-05-16 19:23:42 UTC
Handled-By : Dan Carpenter <error27@gmail.com>
Patch : https://bugzilla.kernel.org/attachment.cgi?id=26390
Comment 7 Randy Dunlap 2010-05-17 04:10:55 UTC
Hi Dan, Patch #2 seems to have fixed the bug that I was seeing.  Thanks.
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Comment 8 Rafael J. Wysocki 2010-06-13 11:46:04 UTC
Fixed by commit 6ba8bcd457d9fc793ac9435aa2e4138f571d4ec5.