Bug 39312

Summary: intel-iommu: Dont cache iova above 32bit - network copy freezes system
Product: Drivers Reporter: Marcus Becker (marcus.disi)
Component: Other Assignee: drivers_other
Status: CLOSED CODE_FIX    
Severity: normal CC: chrisw, florian, kernel, maciej.rutecki, marcus.disi, mschiff, psomas, rjw
Priority: P1    
Hardware: All   
OS: Linux   
URL: https://bugs.gentoo.org/show_bug.cgi?id=373109
Kernel Version: 2.6.39.2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 36912    
Attachments: dmesg
working config 2.6.39.1
bisect log I created during the process
git bisect log
dmesg 2.6.39.2
lspci -vvv

Description Marcus Becker 2011-07-13 14:32:41 UTC
Created attachment 65422 [details]
dmesg

Hi,

With the upgrade to 2.6.39.2, I cannot copy more than ~1GB of data
over the network before my input devices lock up.
Example:
use scp on tty1 to copy a folder or file larger than 3GB over my
gigabit network and after ~1min the keyboard stops responding.
use cp on tty1 with an nfs-3 share mounted over the same network and
try to copy the same file, same happens

I can hit ctrl+c for about 20-30 sec and eventually get an interrupt
that stops the copying, then the input devices slowly gain control
again. If I do the same on X, there is no chance to get in between and
I have to use SysRq to reboot.

In our bug report at https://bugs.gentoo.org/show_bug.cgi?id=373109
we bisected between 2.6.39.1 and 2.6.39.2 and found that the following
patch causes this problem:

commit 87cc4d1e3e05af38c7c51323a3d86fe2572ab033
Author: Chris Wright <chrisw@sous-sol.org>
Date:   Sat May 28 13:15:04 2011 -0500

   intel-iommu: Dont cache iova above 32bit

I will also attach dmesg, current kernel config, and my bisect log (I
put a uname -a into the log after each bisect) plus git bisect log

Please let me know if you need more information.

Thanks,

Marcus
Comment 1 Marcus Becker 2011-07-13 14:33:23 UTC
Created attachment 65432 [details]
working config 2.6.39.1
Comment 2 Marcus Becker 2011-07-13 14:33:53 UTC
Created attachment 65442 [details]
bisect log I created during the process
Comment 3 Marcus Becker 2011-07-13 14:34:13 UTC
Created attachment 65452 [details]
git bisect log
Comment 4 Andrew Morton 2011-07-13 18:54:11 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 13 Jul 2011 14:32:42 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=39312

A 2.6.39.1->2.6.39.2 regression.

And, presumably, a 2.6.39->mainline regression.

That's commit 1c9fc3d11b84fbd0c4f4aa7855702c2a1f098ebb in mainline.
Comment 5 Rafael J. Wysocki 2011-07-13 19:33:54 UTC
First-Bad-Commit : 1c9fc3d11b84fbd0c4f4aa7855702c2a1f098ebb
Comment 6 Mike Travis 2011-07-13 20:07:24 UTC
Mike Travis wrote:
> Interesting, I was just preparing a patch to fix this (follows)

Oops, sorry, I was mistaken.  The patch I'm preparing is for a different
problem.  I'll look more closely at the bug report but I may need Chris's
help in resolving it.

Thanks,
Mike

Comment 7 Mike Travis 2011-07-13 20:17:24 UTC
Interesting, I was just preparing a patch to fix this (follows)

Comment 8 Marcus Becker 2011-07-13 21:59:03 UTC
On 13 July 2011 22:40, Chris Wright <chrisw@sous-sol.org> wrote:
> Can you send a dmesg from boot, and an lspci?
>
Hi,

I cannot reboot right now, I'll produce dmesg with 2.6.39.2 later
here is lspci for now:
disi-bigtop ~ # lspci
00:00.0 Host bridge: Intel Corporation Device 0104 (rev 09)
00:01.0 PCI bridge: Intel Corporation Device 0101 (rev 09)
00:16.0 Communication controller: Intel Corporation Cougar Point HECI
Controller #1 (rev 04)
00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced
Host Controller #2 (rev 05)
00:1b.0 Audio device: Intel Corporation Cougar Point High Definition
Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root
Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation Cougar Point PCI Express Root
Port 2 (rev b5)
00:1c.2 PCI bridge: Intel Corporation Cougar Point PCI Express Root
Port 3 (rev b5)
00:1c.3 PCI bridge: Intel Corporation Cougar Point PCI Express Root
Port 4 (rev b5)
00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced
Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation Device 1c49 (rev 05)
00:1f.2 SATA controller: Intel Corporation Cougar Point 6 port SATA
AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 05)
01:00.0 VGA compatible controller: nVidia Corporation Device 0dd1 (rev a1)
01:00.1 Audio device: nVidia Corporation Device 0be9 (rev a1)
02:00.0 USB Controller: NEC Corporation Device 0194 (rev 03)
03:00.0 Ethernet controller: JMicron Technology Corp. JMC250 PCI
Express Gigabit Ethernet Controller (rev 05)
03:00.1 System peripheral: JMicron Technology Corp. Device 2392 (rev 90)
03:00.2 SD Host controller: JMicron Technology Corp. Device 2391 (rev 90)
03:00.3 System peripheral: JMicron Technology Corp. Device 2393 (rev 90)
04:00.0 Network controller: Intel Corporation Device 0091 (rev 34)
05:00.0 FireWire (IEEE 1394): JMicron Technology Corp. IEEE 1394 Host
Controller (rev 30)
Comment 9 Marcus Becker 2011-07-13 22:14:27 UTC
On 13 July 2011 23:02, Chris Wright <chrisw@sous-sol.org> wrote:
>> 03:00.0 Ethernet controller: JMicron Technology Corp. JMC250 PCI
>> Express Gigabit Ethernet Controller (rev 05)
>
> Is this the network device the traffic is going through?   Can you send
> full lspci -vvv?

the network adapter is the jme eth0
Ethernet controller: JMicron Technology Corp. JMC250 PCI Express
Gigabit Ethernet Controller (rev 05)
Comment 10 Marcus Becker 2011-07-13 22:21:01 UTC
Created attachment 65502 [details]
dmesg 2.6.39.2

The jme picked up 100Mbps this time. After booting back to 2.6.39.1 it picked up 1000Mbps; I usually had Gigabit on 2.6.39.2 as well.

jme 0000:03:00.0: eth0: Link is up at ANed: 1000 Mbps, Full-Duplex, MDI
Comment 11 Marcus Becker 2011-07-13 22:22:03 UTC
Created attachment 65512 [details]
lspci -vvv
Comment 12 Chris Wright 2011-07-13 22:53:58 UTC
Can you send a dmesg from boot, and an lspci?
Comment 13 Chris Wright 2011-07-13 22:54:14 UTC
* Marcus Becker (marcus.disi@gmail.com) wrote:
> 03:00.0 Ethernet controller: JMicron Technology Corp. JMC250 PCI
> Express Gigabit Ethernet Controller (rev 05)

Is this the network device the traffic is going through?   Can you send
full lspci -vvv?

Comment 14 Chris Wright 2011-07-13 23:14:58 UTC
* Chris Wright (chrisw@sous-sol.org) wrote:
> Can you send a dmesg from boot, and an lspci?

Two things worth trying:

1) boot with intel_iommu=strict (to disable batching of unmaps, should
keep the number of outstanding mappings much lower)

2) boot with intel_iommu=forcedac (to disable the current behaviour
which tries to map < 32bit, then if that fails, maps >32bit).

The hang sounds like the iova allocation is looping excessively under
spin_lock_irqsave()
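
To confirm which of these options a running system was actually booted with, one can look at /proc/cmdline. The following is only a minimal sketch in Python, assuming that intel_iommu= takes a comma-separated list of values (for example intel_iommu=on,strict); the function name intel_iommu_options is made up for this illustration and is not part of any kernel tooling.

```python
def intel_iommu_options(cmdline):
    """Return the values passed via intel_iommu= on a kernel command line.

    `cmdline` is the text of the command line, e.g. the contents of
    /proc/cmdline. Assumption: values are comma-separated.
    """
    prefix = "intel_iommu="
    options = []
    for token in cmdline.split():
        if token.startswith(prefix):
            # Split "on,strict" into ["on", "strict"], skipping empty pieces.
            options.extend(v for v in token[len(prefix):].split(",") if v)
    return options
```

On a live system this could be called as intel_iommu_options(open("/proc/cmdline").read()); an empty list would mean that neither strict nor forcedac was passed.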
Comment 15 Marc Schiffbauer 2011-07-14 15:21:14 UTC
I am having the same issue. For me the system froze after about 1.3G had been transferred to my computer via nfs.

With intel_iommu=strict the behavior is the same as without it: a freeze after 1.3G.

With intel_iommu=forcedac my system nearly freezes after about a second (~65MB transferred). Screen refresh and keyboard input both become very slow.
The transfer rate drops to some hundreds of kb/s, but I am able to Ctrl-C and the system gets back to normal after some seconds.
Comment 16 Marcus Becker 2011-07-16 09:26:16 UTC
On 14 July 2011 00:14, Chris Wright <chrisw@sous-sol.org> wrote:
> Two things worth trying:
>
> 1) boot with intel_iommu=strict (to disable batching of unmaps, should
> keep the number of outstanding mappings much lower)
>
> 2) boot with intel_iommu=forcedac (to disable the current behaviour
> which tries to map < 32bit, then if that fails, maps >32bit).
>
> The hang sounds like the iova allocation is looping excessively under
> spin_lock_irqsave()
>

Hi,
as Marc stated in the bug report, the first method behaves the same as
before, and with the second method my input was delayed at first and
then locked up later, as with method 1.
On the first boot with method 1, my external USB keyboard refused to
work (funny flashing keys), so I used the laptop keyboard. I rebooted
again using the same command-line option and it worked...
Hope that helps,

Marcus
Comment 17 Marcus Becker 2011-07-16 09:28:54 UTC
One more thing I tested: copying ~5GB from an external USB hard drive works without problems. Marc has the same network card, so it might have something to do with this?
Comment 18 Chris Wright 2011-07-19 15:02:54 UTC
(In reply to comment #15)
> I am having the same issue. For me the system got frozen after about 1.3G
> transferred to my computer via nfs.
> 
> With intel_iommu=strict the behavior is the same as without, freeze after
> 1.3G
> 
> With intel_iommu=forcedac my system gets nearly frozen after about a second
> (~65MB transferred). The screen refresh gets very slow as well as keyboard
> input.
> Transfer rate drops down to some hundreds kb/s but I am able to Ctrl-C and
> the
> system gets back to normal after some seconds.

Thanks (both Marc and Marcus) for testing this.  The forcedac test means we always allocate from the end of the 64-bit address space.  This suggests that the linear search from the end of the address space is slow, which should only happen if there are a lot of address mappings.

(In reply to comment #17)
> One more thing I tested, to copy ~5GB from an external USB hard drive works
> without problems. Marc has the same network card, it might has something to
> do
> with this?

Yes, seems like the jme driver is not unmapping all descriptors.  I don't have access to the hardware, but if you enable CONFIG_IOMMU_DEBUG=y we can see if the iommu is filling up with mappings.  The driver itself would be pretty easy to debug to discover which unmap calls aren't being made.
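
The reasoning above (a top-down search gets slow once many mappings are outstanding, and a driver that never unmaps keeps adding to them) can be illustrated with a toy model. This is a sketch in Python under strong simplifying assumptions: one slot per mapping, a plain set instead of the kernel's actual iova data structure, and a step counter standing in for time spent under the lock. The names TopDownAllocator and search_cost are invented for the illustration.

```python
class TopDownAllocator:
    """Toy allocator: hands out the highest free slot below `limit`."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = set()
        self.steps = 0  # slots inspected so far (stand-in for search time)

    def alloc(self):
        # Always start from the top and walk down until a free slot is found.
        slot = self.limit - 1
        while slot >= 0:
            self.steps += 1
            if slot not in self.in_use:
                self.in_use.add(slot)
                return slot
            slot -= 1
        raise MemoryError("address space exhausted")

    def free(self, slot):
        self.in_use.remove(slot)


def search_cost(transfers, leak):
    """Total slots inspected for `transfers` map operations.

    With leak=False every mapping is released right away (a well-behaved
    driver); with leak=True nothing is ever released.
    """
    allocator = TopDownAllocator(limit=1 << 20)
    for _ in range(transfers):
        slot = allocator.alloc()
        if not leak:
            allocator.free(slot)
    return allocator.steps
```

In this model a driver that unmaps pays one step per mapping, while a leaking driver pays 1 + 2 + ... + n steps for n mappings, so the cost per packet keeps growing the longer the transfer runs. That would be consistent with a copy that starts out fine and degrades after a while, though the model says nothing about the real numbers involved.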
Comment 19 Marcus Becker 2011-07-19 16:22:57 UTC
CONFIG_IOMMU_DEBUG=y doesn't really show anything more than before. I had to enable AMD features to enable it, so I guess it doesn't trace intel-iommu?
Maybe you could involve Guo-Fu Tseng and help him solve the jme problem?
He already provided a patch to map only to addresses below 32 bit, but it didn't help:
https://bugs.gentoo.org/show_bug.cgi?id=373109
Comment 20 Chris Wright 2011-07-19 17:23:34 UTC
Not the trace, but it will keep track of all mappings (it does sanity checking on map/unmap requests).  And you should see it give up because the memory available for tracking the dma mappings is exhausted: "DMA-API: debugging out of memory - disabling".  Basically an indirect indication that the driver is mapping but not unmapping.  I'll see if Guo-Fu Tseng can help, thanks.
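
The indirect indication described above can be sketched as follows: a tracker with a fixed pool of entries records every map and forgets it on unmap, and it gives up once the pool is full. This is a toy model in Python, not the kernel's DMA debugging code; MappingTracker, run_driver, the pool size and the leak_every parameter are all assumptions made for the illustration.

```python
class MappingTracker:
    """Toy bookkeeping of live mappings with a fixed number of entries."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.live = set()
        self.enabled = True

    def map(self, addr):
        if not self.enabled:
            return
        if len(self.live) >= self.pool_size:
            # No entry left to record this mapping: stop tracking entirely.
            self.enabled = False
            return
        self.live.add(addr)

    def unmap(self, addr):
        if self.enabled:
            self.live.discard(addr)


def run_driver(tracker, packets, leak_every=0):
    """Map one buffer per packet; skip the unmap for every `leak_every`-th one.

    Returns True if the tracker is still enabled afterwards.
    """
    for i in range(packets):
        addr = 0x1000 * i
        tracker.map(addr)
        leaked = leak_every > 0 and i % leak_every == 0
        if not leaked:
            tracker.unmap(addr)
    return tracker.enabled
```

A driver that pairs every map with an unmap never holds more than one entry, so the tracker stays enabled no matter how many packets pass. A driver that skips some unmaps fills the pool sooner or later, which is the point at which the real facility would be expected to print its "out of memory - disabling" message.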
Comment 21 Marcus Becker 2011-07-21 16:49:47 UTC
The patch that Guo-Fu Tseng posted upstream,
http://patchwork.ozlabs.org/patch/105878/
works for me and two others...
https://bugs.gentoo.org/process_bug.cgi
Comment 22 Florian Mickler 2011-08-09 07:56:57 UTC
The patch got merged in v3.1-rc1:

commit 94c5b41b327e08de0ddf563237855f55080652a1
Author: Guo-Fu Tseng <cooldavid@cooldavid.org>
Date:   Wed Jul 20 16:57:36 2011 +0000

    jme: Fix unmap error (Causing system freeze)