Latest working kernel version: Earliest failing kernel version: Distribution: any Hardware Environment: any Software Environment: any Problem Description: Lots of device drivers register a 'struct device_driver' with the '.bus' member set to '&platform_bus_type'. This is wrong, since the platform_bus functions expect the 'struct device_driver' to be wrapped up in a 'struct platform_driver' which provides some additional callbacks (like suspend_late, resume_early). The effect may be that platform_suspend_late() uses bogus data outside the device_driver struct as a pointer pointer to the device driver's suspend_late() function or other hard to reproduce failures. The following drivers need to be converted to use a platform_driver struct and the platform_driver_register() function: ./arch/mips/basler/excite/excite_iodev ./drivers/char/ipmi/ipmi_msghandler ./drivers/char/ipmi/ipmi_si_intf ./drivers/char/tpm/tpm_atmel ./drivers/char/tpm/tpm_nsc ./drivers/char/tpm/tpm_tis ./drivers/hwmon/ibmaem ./drivers/mtd/maps/pxa2xx-flash.c ./drivers/ide/mips/au1xxx-ide ./drivers/ide/mips/swarm ./drivers/mtd/nand/excite_nandflash ./drivers/mtd/onenand/generic ./drivers/net/gianfar_mii ./drivers/net/mipsnet ./drivers/net/fs_enet/fs_enet-main ./drivers/net/fs_enet/fs_enet-main ./drivers/net/fs_enet/fs_enet-main ./drivers/net/fs_enet/mii-bitbang ./drivers/net/fs_enet/mii-fec ./drivers/pcmcia/au1000_generic ./drivers/pcmcia/hd64465_ss ./drivers/pcmcia/i82365 ./drivers/pcmcia/m32r_cfc ./drivers/pcmcia/m32r_pcc ./drivers/pcmcia/sa1100_generic ./drivers/pcmcia/tcic ./drivers/pcmcia/vrc4171_card ./drivers/scsi/a4000t ./drivers/scsi/bvme6000_scsi ./drivers/scsi/mvme16x_scsi ./drivers/serial/cpm_uart/cpm_uart_core ./drivers/serial/cpm_uart/cpm_uart_core ./drivers/video/au1100fb ./drivers/video/au1200fb ./drivers/video/hgafb ./drivers/video/skeletonfb ./drivers/watchdog/rm9k_wdt Steps to reproduce: grep -r -B 5 '\.bus.*=.*platform_bus_type' . | grep device_driver gives a (possibly incomplete) list of drivers that are potentially affected by this bug.
Created attachment 18129 [details] The attached patch fixes the problem for the PXA2xx Flash driver
Yeah, looks like the platform core does: to_platform_driver(drv) (container_of((drv), struct platform_driver, driver)) which means it expects the driver core driver to be embedded in a platform driver. David, these drivers need a fix, right?
Yes, these drivers need fixing. The problem came with whatever patch changed the platform bus code to rely on such casts for drivers. I noticed it when I added basic PM methods to some drivers -- suspend/resume would oops, long before I added the early/late versions to match that PM iteration -- and fixed drivers on all the platforms I worked with. That was already way too much work to fix any bug I didn't create, so I suppose I ignored the other drivers. :) The issue is that *originally* this idiom was standard and used throughout the "legacy" bus (later renamed "platform"). Then it became incorrect, I think when suspend/resume handling changed, but that code was still in the tree ... and the idiom was in peoples' minds.
This is present on 2.6.26, 2.6.27, 2.6.28, and current mainline by a quick peek at the code. And it is causing suspend/resume issues, as expected (that's how I found this bug report). Is anything being done about it?
It would seem nothing is being done about it. This might be a good project for kernel janitors...
Please correct me if I am wrong, but this bug DOES mean we are basically doing: if (random memory contents) (random memory contents)(foo); at suspend and resume, isn't it? Because if it is that bad, it is time to send a patch marking all those drivers BROKEN, until someone fixes them. And get that patch into -stable for all kernels. A look at drivers/char/tpm shows me the "fix" might need a complete clean-up.
I wouldn't say "random"; but certainly it's not defined to be usable in that way.
Ok, some of the drivers (the easiest ones) were fixed but the patch is not in mailine yet: http://marc.info/?l=linux-kernel&m=123393493213943&w=2 But what about the rest of the drivers? Some of them, like tpm_nsc, are not a trivial fix. Maybe they should be made to depend on !CONFIG_SUSPEND || BROKEN until they get fixed?
Reply-To: david-b@pacbell.net On Saturday 21 February 2009, bugme-daemon@bugzilla.kernel.org wrote: > Maybe they should be made to depend on !CONFIG_SUSPEND || BROKEN until they > get > fixed? Sounds like a plan to me.
If this is still seen on modern kernels then please re-open/update