Bug 35062

Summary: Suspend to RAM stopped working after commit "intel-iommu: Unlink domain from iommu"
Product: Drivers
Reporter: optiluca
Component: Other
Assignee: drivers_other
Status: CLOSED CODE_FIX
Severity: high
CC: akpm, alex.williamson, dwmw2, florian, maciej.rutecki, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.38.5
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 32012

Description optiluca 2011-05-14 11:08:46 UTC
Hi. When I upgraded from kernel 2.6.38.4 to 2.6.38.5, suspend to RAM stopped working on my ThinkPad W510. I ran a git bisection and found that the problem began with the commit "intel-iommu: Unlink domain from iommu" (commit a97590e56d0d58e1dd262353f7cbd84e81d8e600): http://lkml.org/lkml/2011/4/29/308

When suspending from X, the system blanks the screen but hangs before switching off the backlight. When suspending from the console, the kernel spews out a long error message ending with "segmentation fault" and hangs.

I apologize for the lack of information; this is the first time I have reported a bug here. If you need to know anything else, please ask :)

BTW, here are some hardware / system details:

lspci
00:00.0 Host bridge: Intel Corporation Core Processor DMI (rev 11)
00:03.0 PCI bridge: Intel Corporation Core Processor PCI Express Root Port 1 (rev 11)
00:08.0 System peripheral: Intel Corporation Core Processor System Management Registers (rev 11)
00:08.1 System peripheral: Intel Corporation Core Processor Semaphore and Scratchpad Registers (rev 11)
00:08.2 System peripheral: Intel Corporation Core Processor System Control and Status Registers (rev 11)
00:08.3 System peripheral: Intel Corporation Core Processor Miscellaneous Registers (rev 11)
00:10.0 System peripheral: Intel Corporation Core Processor QPI Link (rev 11)
00:10.1 System peripheral: Intel Corporation Core Processor QPI Routing and Protocol Registers (rev 11)
00:16.0 Communication controller: Intel Corporation 5 Series/3400 Series Chipset HECI Controller (rev 06)
00:16.3 Serial controller: Intel Corporation 5 Series/3400 Series Chipset KT Controller (rev 06)
00:19.0 Ethernet controller: Intel Corporation 82577LM Gigabit Network Connection (rev 06)
00:1a.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06)
00:1b.0 Audio device: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio (rev 06)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 06)
00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 06)
00:1c.3 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 (rev 06)
00:1c.4 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 (rev 06)
00:1c.6 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 7 (rev 06)
00:1c.7 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 8 (rev 06)
00:1d.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev a6)
00:1f.0 ISA bridge: Intel Corporation Mobile 5 Series Chipset LPC Interface Controller (rev 06)
00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller (rev 06)
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller (rev 06)
01:00.0 VGA compatible controller: nVidia Corporation Device 0a3c (rev a2)
01:00.1 Audio device: nVidia Corporation High Definition Audio Controller (rev a1)
03:00.0 Network controller: Intel Corporation WiFi Link 6000 Series (rev 35)
0d:00.0 SD Host controller: Ricoh Co Ltd Device e822 (rev 01)
0d:00.1 System peripheral: Ricoh Co Ltd Device e230 (rev 01)
0f:00.0 USB Controller: NEC Corporation Device 0194 (rev 03)
17:00.0 SD Host controller: Ricoh Co Ltd Device e822 (rev 01)
17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd Device e832 (rev 01)
ff:00.0 Host bridge: Intel Corporation Core Processor QuickPath Architecture Generic Non-Core Registers (rev 04)
ff:00.1 Host bridge: Intel Corporation Core Processor QuickPath Architecture System Address Decoder (rev 04)
ff:02.0 Host bridge: Intel Corporation Core Processor QPI Link 0 (rev 04)
ff:02.1 Host bridge: Intel Corporation Core Processor QPI Physical 0 (rev 04)
ff:03.0 Host bridge: Intel Corporation Core Processor Integrated Memory Controller (rev 04)
ff:03.1 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Target Address Decoder (rev 04)
ff:03.4 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Test Registers (rev 04)
ff:04.0 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Control Registers (rev 04)
ff:04.1 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Address Registers (rev 04)
ff:04.2 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Rank Registers (rev 04)
ff:04.3 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Thermal Control Registers (rev 04)
ff:05.0 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Control Registers (rev 04)
ff:05.1 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Address Registers (rev 04)
ff:05.2 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Rank Registers (rev 04)
ff:05.3 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Thermal Control Registers (rev 04)

emerge --info
Portage 2.2.0_alpha33 (default/linux/amd64/10.0/desktop/kde, gcc-4.5.2, glibc-2.13-r2, 2.6.38-gentoo-r3 x86_64)
=================================================================
System uname: Linux-2.6.38-gentoo-r3-x86_64-Intel-R-_Core-TM-_i7_CPU_Q_820_@_1.73GHz-with-gentoo-2.0.2
Timestamp of tree: Sat, 14 May 2011 10:30:01 +0000
app-shells/bash:          4.2_p10
dev-java/java-config:     2.1.11-r3
dev-lang/python:          2.7.1-r1
dev-util/cmake:           2.8.4-r1
sys-apps/baselayout:      2.0.2
sys-apps/openrc:          0.8.2-r1
sys-apps/sandbox:         2.5
sys-devel/autoconf:       2.13, 2.68
sys-devel/automake:       1.9.6-r3, 1.10.3, 1.11.1-r1
sys-devel/binutils:       2.21
sys-devel/gcc:            4.5.2
sys-devel/gcc-config:     1.4.1-r1
sys-devel/libtool:        2.4-r1
sys-devel/make:           3.82
sys-kernel/linux-headers: 2.6.38 (virtual/os-headers)
sys-libs/glibc:           2.13-r2
Comment 1 Rafael J. Wysocki 2011-05-14 20:51:45 UTC
First-Bad-Commit : a97590e56d0d58e1dd262353f7cbd84e81d8e600
Comment 2 Rafael J. Wysocki 2011-05-14 20:54:04 UTC
commit a97590e56d0d58e1dd262353f7cbd84e81d8e600
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Fri Mar 4 14:52:16 2011 -0700

    intel-iommu: Unlink domain from iommu
    
    When we remove a device, we unlink the iommu from the domain, but
    we never do the reverse unlinking of the domain from the iommu.
    This means that we never clear iommu->domain_ids, eventually leading
    to resource exhaustion if we repeatedly bind and unbind a device
    to a driver.  Also free empty domains to avoid a resource leak.
    
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    Acked-by: Donald Dutile <ddutile@redhat.com>
    Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Comment 3 David Woodhouse 2011-05-14 23:00:20 UTC
http://david.woodhou.se/flush-unmaps-on-unbind.patch ?
Comment 4 Alex Williamson 2011-05-15 13:35:52 UTC
(In reply to comment #3)
> http://david.woodhou.se/flush-unmaps-on-unbind.patch ?

David, this seems to work for me. I was able to reproduce a similar issue by unbinding a device from snd_hda_intel. It seems that only some devices trigger this, which is why we never hit it while testing a97590e5. Thanks,

Alex
Comment 5 Rafael J. Wysocki 2011-05-15 13:47:21 UTC
Patch : http://david.woodhou.se/flush-unmaps-on-unbind.patch
Handled-By : David Woodhouse <dwmw2@infradead.org>
Comment 6 Andrew Morton 2011-05-18 20:35:30 UTC
David, please ensure that the patch (which isn't in linux-next yet?) has a Cc: stable tag in the changelog.
Comment 7 Florian Mickler 2011-06-06 10:58:32 UTC
A patch referencing this bug report has been merged in v3.0-rc2:

commit 7b668357810ecb5fdda4418689d50f5d95aea6a8
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Tue May 24 12:02:41 2011 +0100

    intel-iommu: Flush unmaps at domain_exit