Bug 10080 - 2.6.25-rc2: ohci1394 problem (MMIO broken)
Summary: 2.6.25-rc2: ohci1394 problem (MMIO broken)
Status: CLOSED CODE_FIX
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks: 9832
Reported: 2008-02-23 14:10 UTC by Rafael J. Wysocki
Modified: 2008-03-27 14:17 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.25-rc2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
config used (51.78 KB, text/plain)
2008-03-22 10:08 UTC, Thomas Meyer
dmesg (120.25 KB, text/plain)
2008-03-22 10:10 UTC, Thomas Meyer
lspci -vv (23.84 KB, text/plain)
2008-03-24 12:31 UTC, Thomas Meyer
dmesg 2.6.24 (30.40 KB, text/plain)
2008-03-25 13:54 UTC, Thomas Meyer
lspci -vv 2.6.24 (24.03 KB, text/plain)
2008-03-25 13:55 UTC, Thomas Meyer
lsmod 2.6.24 (2.44 KB, text/plain)
2008-03-25 13:56 UTC, Thomas Meyer

Description Rafael J. Wysocki 2008-02-23 14:10:49 UTC
Subject         : 2.6.25-rc2: ohci1394 problem
Submitter       : Thomas Meyer <thomas@m3y3r.de>
Date            : 2008-02-20 08:47
References      : http://lkml.org/lkml/2008/2/20/58
Handled-By      : Stefan Richter <stefanr@s5r6.in-berlin.de>

This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline or the report is rejected.
Comment 1 Stefan Richter 2008-02-24 10:59:06 UTC
Summary of latest feedback (http://lkml.org/lkml/2008/2/23/244 + http://lkml.org/lkml/2008/2/23/259):
  - The presence of both ohci1394 and firewire-ohci does not seem to be the problem.
  - Several or all of ohci1394's MMIO reads return ~0 (all bits set to one).  Therefore it is unlikely that the problem is caused by the IEEE1394 subsystem.
  - Is this really a regression relative to 2.6.24?
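
As a rough illustration of the second point: a read that never reaches a PCI device typically comes back with every bit set, so a register dump full of ~0 points at the bus or the mapping rather than at the driver's logic. The sketch below is user-space C, not code from ohci1394, and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a 32-bit register value is ~0 (all bits
 * set), which usually means the read was not answered by the device. */
static bool mmio_read_looks_dead(uint32_t value)
{
    return value == UINT32_MAX;
}

/* Hypothetical helper: count how many of the n sampled register values
 * are all-ones.  If most of them are, the mapping itself is suspect. */
static unsigned count_dead_reads(const uint32_t *values, unsigned n)
{
    unsigned dead = 0;

    for (unsigned i = 0; i < n; i++) {
        if (mmio_read_looks_dead(values[i]))
            dead++;
    }
    return dead;
}
```

A single all-ones value can be legitimate register content, which is why the sketch counts over several sampled registers instead of judging one read.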
Comment 2 Rafael J. Wysocki 2008-02-24 17:07:41 UTC
Thomas, can you verify if 2.6.24 works correctly, please?
Comment 3 Stefan Richter 2008-02-25 11:02:26 UTC
Thomas' response in http://marc.info/?l=linux-kernel&m=120396387509950 :
  - Was apparently a mixup when building the kernel; went away after "make clean".
  - In any case, 2.6.24 was not affected.
Comment 4 Stefan Richter 2008-03-19 02:14:24 UTC
New status:  The bug reappeared after "make distclean; make" on v2.6.25-rc6-14-gbde4f8f.
Comment 5 Thomas Meyer 2008-03-22 10:08:35 UTC
Created attachment 15396
config used
Comment 6 Thomas Meyer 2008-03-22 10:10:53 UTC
Created attachment 15397
dmesg
Comment 7 Thomas Meyer 2008-03-22 10:14:55 UTC
 $ git describe
v2.6.25-rc6-243-g028011e
Comment 8 Rafael J. Wysocki 2008-03-22 10:58:04 UTC
Handled-By : Nobody
Comment 9 H. Peter Anvin 2008-03-22 14:55:59 UTC
I'm confused about this.  I looked at the original threads, and what really stands out to me is that the original reporter had two drivers loaded for the same hardware (firewire-ohci and ohci1394.)  *In the best case* there is a fundamental race condition there, meaning unpredictable behaviour would be the norm.
Comment 10 H. Peter Anvin 2008-03-22 15:22:38 UTC
Can someone publish the "lspci -vv" for the affected system?
Comment 11 Anonymous Emailer 2008-03-22 15:28:31 UTC
Reply-To: stefanr@s5r6.in-berlin.de

H. Peter Anvin wrote at 
http://bugzilla.kernel.org/show_bug.cgi?id=10080#c9 :
> I'm confused about this.  I looked at the original threads, and what really
> stands out to me is that the original reporter had two drivers loaded for the
> same hardware (firewire-ohci and ohci1394.)  *In the best case* there is a
> fundamental race condition there, meaning unpredictable behaviour would be
> the norm.


Hmm, right -- I didn't see this until now.  Today's dmesg:
http://bugzilla.kernel.org/attachment.cgi?id=15397&action=view
[    1.236587] firewire_ohci: Failed to remap registers
[  243.640549] ohci1394: fw-host0: Get PHY Reg timeout
(etc.)

However, having two drivers for the same device doesn't seem to be the 
problem.  It looks like the kernel tried to bind firewire-ohci to the 
controller much earlier than ohci1394.  The error message means that 
firewire-ohci's pci_request_region() succeeded but pci_iomap() failed, 
hence the pci_driver.probe failed, hence firewire-ohci wasn't bound to 
the device, hence subsequent loading of ohci1394 (manually, I presume) 
was a valid action.

IOW firewire-ohci was indeed already loaded, but not bound to the device 
because of the .probe failure; and ohci1394 was loaded much later.
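
The chain of events described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: none of the model_* names are kernel APIs, the error values are arbitrary, and releasing the region on the failure path is an assumption of the sketch rather than something quoted from firewire-ohci.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space model only; none of these names exist in the kernel. */
struct model_dev {
    bool region_claimed;   /* stands in for a successful pci_request_region() */
    bool iomap_works;      /* whether the stand-in for pci_iomap() can succeed */
    bool bound;            /* set once a driver's probe has succeeded */
    unsigned char regs[16];
};

static int model_request_region(struct model_dev *dev)
{
    if (dev->region_claimed)
        return -1;         /* region already owned by another driver */
    dev->region_claimed = true;
    return 0;
}

static void model_release_region(struct model_dev *dev)
{
    dev->region_claimed = false;
}

static void *model_iomap(struct model_dev *dev)
{
    return dev->iomap_works ? dev->regs : NULL;
}

/* Returns 0 on success, -1 if the region is busy, -2 if the remap fails.
 * The error values are arbitrary choices for this sketch. */
static int model_probe(struct model_dev *dev)
{
    if (model_request_region(dev) != 0)
        return -1;
    if (model_iomap(dev) == NULL) {
        /* The "Failed to remap registers" case: undo the claim and bail out. */
        model_release_region(dev);
        return -2;
    }
    dev->bound = true;
    return 0;
}
```

In this model a failed remap leaves the device unbound and its region free, so a later probe by another driver is not rejected as busy. That matches the reasoning above for why loading ohci1394 afterwards was valid.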

Same thing in the report in February:
http://lkml.org/lkml/2008/2/23/244
[    1.326958] firewire_ohci: Failed to remap registers
[  856.943807] ohci1394: fw-host0: Get PHY Reg timeout
(here: ohci1394 manually loaded by insmod)

(Let's see if bugme-daemon captures this...)
Comment 12 Stefan Richter 2008-03-23 12:31:08 UTC
Proposed patch: http://lkml.org/lkml/2008/3/22/175
Comment 13 Thomas Meyer 2008-03-24 12:31:06 UTC
Created attachment 15417
lspci -vv
Comment 14 Stefan Richter 2008-03-24 14:09:37 UTC
Linus' patch as per comment #12 has been committed:
commit b9e76a00749521f2b080fa8a4fb15f66538ab756
(I suppose it still needs to be tested by Thomas)
Comment 15 Stefan Richter 2008-03-25 00:58:58 UTC
Additional proposed patch by Ingo: http://lkml.org/lkml/2008/3/25/32
Comment 16 Thomas Meyer 2008-03-25 13:54:45 UTC
Created attachment 15433
dmesg 2.6.24
Comment 17 Thomas Meyer 2008-03-25 13:55:46 UTC
Created attachment 15434
lspci -vv 2.6.24
Comment 18 Thomas Meyer 2008-03-25 13:56:18 UTC
Created attachment 15435
lsmod 2.6.24
