Bug 11681

Summary: Incorrect use of 'platform_bus_type' with 'struct device_driver'
Product: Drivers Reporter: Lothar Wassmann (LW)
Component: OtherAssignee: drivers_other
Status: RESOLVED OBSOLETE    
Severity: normal CC: alan, dbrownell, greg, hmh, kay, marcus
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.26 Subsystem:
Regression: No Bisected commit-id:
Attachments: The attached patch fixes the problem for the PXA2xx Flash driver

Description Lothar Wassmann 2008-10-01 05:17:49 UTC
Latest working kernel version:
Earliest failing kernel version:
Distribution: any
Hardware Environment: any
Software Environment: any
Problem Description: 
Lots of device drivers register a 'struct device_driver' with the '.bus' member
set to '&platform_bus_type'.
This is wrong, since the platform_bus functions expect the 'struct device_driver' to be wrapped up in a 'struct platform_driver' which provides
some additional callbacks (like suspend_late, resume_early).
The effect may be that platform_suspend_late() uses bogus data outside the
device_driver struct as a pointer pointer to the device driver's suspend_late()
function or other hard to reproduce failures.

The following drivers need to be converted to use a platform_driver struct and
the platform_driver_register() function:
./arch/mips/basler/excite/excite_iodev
./drivers/char/ipmi/ipmi_msghandler
./drivers/char/ipmi/ipmi_si_intf
./drivers/char/tpm/tpm_atmel
./drivers/char/tpm/tpm_nsc
./drivers/char/tpm/tpm_tis
./drivers/hwmon/ibmaem
./drivers/mtd/maps/pxa2xx-flash.c
./drivers/ide/mips/au1xxx-ide
./drivers/ide/mips/swarm
./drivers/mtd/nand/excite_nandflash
./drivers/mtd/onenand/generic
./drivers/net/gianfar_mii
./drivers/net/mipsnet
./drivers/net/fs_enet/fs_enet-main
./drivers/net/fs_enet/fs_enet-main
./drivers/net/fs_enet/fs_enet-main
./drivers/net/fs_enet/mii-bitbang
./drivers/net/fs_enet/mii-fec
./drivers/pcmcia/au1000_generic
./drivers/pcmcia/hd64465_ss
./drivers/pcmcia/i82365
./drivers/pcmcia/m32r_cfc
./drivers/pcmcia/m32r_pcc
./drivers/pcmcia/sa1100_generic
./drivers/pcmcia/tcic
./drivers/pcmcia/vrc4171_card
./drivers/scsi/a4000t
./drivers/scsi/bvme6000_scsi
./drivers/scsi/mvme16x_scsi
./drivers/serial/cpm_uart/cpm_uart_core
./drivers/serial/cpm_uart/cpm_uart_core
./drivers/video/au1100fb
./drivers/video/au1200fb
./drivers/video/hgafb
./drivers/video/skeletonfb
./drivers/watchdog/rm9k_wdt

Steps to reproduce:
grep -r -B 5 '\.bus.*=.*platform_bus_type' . | grep device_driver
gives a (possibly incomplete) list of drivers that are potentially affected by this bug.
Comment 1 Lothar Wassmann 2008-10-01 05:21:18 UTC
Created attachment 18129 [details]
The attached patch fixes the problem for the PXA2xx Flash driver
Comment 2 Kay Sievers 2008-10-01 11:37:29 UTC
Yeah, looks like the platform core does:
  to_platform_driver(drv) (container_of((drv), struct platform_driver, driver))
which means it expects the driver core driver to be embedded in a platform driver.

David, these drivers need a fix, right?
Comment 3 David Brownell 2008-10-01 12:03:11 UTC
Yes, these drivers need fixing.

The problem came with whatever patch changed the platform bus code to rely on such casts for drivers.  I noticed it when I added basic PM methods to some drivers -- suspend/resume would oops, long before I added the early/late versions to match that PM iteration -- and fixed drivers on all the platforms I worked with.  That was already way too much work to fix any bug I didn't create, so I suppose I ignored the other drivers.  :)

The issue is that *originally* this idiom was standard and used throughout the "legacy" bus (later renamed "platform").  Then it became incorrect, I think when suspend/resume handling changed, but that code was still in the tree ... and the idiom was in peoples' minds.
Comment 4 Henrique de Moraes Holschuh 2009-01-04 11:58:26 UTC
This is present on 2.6.26, 2.6.27, 2.6.28, and current mainline by a quick peek at the code.  And it is causing suspend/resume issues, as expected (that's how I found this bug report).

Is anything being done about it?
Comment 5 David Brownell 2009-01-04 12:18:47 UTC
It would seem nothing is being done about it.  This might be a good project for kernel janitors...
Comment 6 Henrique de Moraes Holschuh 2009-01-04 13:10:14 UTC
Please correct me if I am wrong, but this bug DOES mean we are basically doing:

if (random memory contents)
   (random memory contents)(foo);

at suspend and resume, isn't it?

Because if it is that bad, it is time to send a patch marking all those drivers BROKEN, until someone fixes them.  And get that patch into -stable for all kernels.

A look at drivers/char/tpm shows me the "fix" might need a complete clean-up.
Comment 7 David Brownell 2009-01-04 14:52:26 UTC
I wouldn't say "random"; but certainly it's not defined to be usable in that way.
Comment 8 Henrique de Moraes Holschuh 2009-02-21 08:05:48 UTC
Ok, some of the drivers (the easiest ones) were fixed but the patch is not in mailine yet:

http://marc.info/?l=linux-kernel&m=123393493213943&w=2

But what about the rest of the drivers?  Some of them, like tpm_nsc, are not a trivial fix.

Maybe they should be made to depend on !CONFIG_SUSPEND || BROKEN until they get fixed?
Comment 9 Anonymous Emailer 2009-02-21 11:48:18 UTC
Reply-To: david-b@pacbell.net

On Saturday 21 February 2009, bugme-daemon@bugzilla.kernel.org wrote:
> Maybe they should be made to depend on !CONFIG_SUSPEND || BROKEN until they
> get
> fixed?

Sounds like a plan to me.
Comment 10 Alan 2012-10-30 15:13:11 UTC
If this is still seen on modern kernels then please re-open/update