Bug 43247 - O2 micro SD/MMC+1394 controller: 1394 device can't work (Register access failure)
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: All Linux
Importance: P1 high
Assignee: drivers_ieee1394
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-14 05:54 UTC by jennifer
Modified: 2014-02-28 10:54 UTC
CC List: 6 users

See Also:
Kernel Version: 2.6.38-3.3.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
The ohci file for comment#4 (103.58 KB, text/plain)
2012-05-28 07:29 UTC, jennifer
The picture for comment#4 (156.07 KB, image/png)
2012-05-28 07:30 UTC, jennifer
Log Message for comment#10. (159.19 KB, text/plain)
2012-05-28 08:55 UTC, jennifer
screenshot from Dell Latitude E5420, Ubuntu 12-04 (852.79 KB, image/jpeg)
2013-09-08 15:24 UTC, Stefan Richter

Description jennifer 2012-05-14 05:54:14 UTC
Hi,

Initialization of the 1394 controller fails, and the kernel logs "Apr  2 11:42:20 localhost kernel: firewire_ohci: Register access failure - please notify linux1394-devel@lists.sf.net".

This issue occurs with the O2 micro SD/MMC+1394 combo function chip on the DELL Vans or Latitude E5520 platform. We see the same issue with a J-micro USB+1394 combo PCI Express card on the DELL Vans or Latitude E5520 platform.

We found that the problem is caused by the initialization process. If the two functions are initialized at the same time, the 1394 device does not work. If the 1394 device is already plugged in when the OS boots, it works well.

Best regards,
Jennifer
Comment 1 Stefan Richter 2012-05-14 06:29:02 UTC
Current discussion:
http://marc.info/?t=133346965500001

Potential solutions, not yet fully confirmed:

(a) After the CycleTimer access in ohci_enable(), clear regAccessFail and proceed with controller initialization.
Patch series:
http://git.kernel.org/?p=linux/kernel/git/ieee1394/linux1394.git;a=shortlog;h=refs/heads/o2micro
Configure a file in /etc/modprobe.d/ with the line:
options firewire-ohci cycle_timer_hard_fail=0 sclk_retries=0

Or

(b) Defer ohci->bus_time initialization and updates until the bus time CSR is accessed for the first time.
Patch:
http://marc.info/?l=linux1394-devel&m=133611307930958
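
To make option (a) concrete, here is a minimal userspace sketch of "read the cycle timer, then clear regAccessFail and carry on". It is not the code from the patch series: struct sim_ohci, the sim_* functions, and the bit value of SIM_REG_ACCESS_FAIL are all hypothetical stand-ins, and the model simply assumes that a cycle timer read without SCLK latches the event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit for the simulated IntEvent register (not from ohci.h). */
#define SIM_REG_ACCESS_FAIL (1u << 18)

/* Hypothetical stand-in for the controller; not the driver's own struct. */
struct sim_ohci {
    uint32_t int_event;   /* simulated IntEvent register */
    bool sclk_running;    /* false models a PHY that supplies no SCLK */
    uint32_t cycle_timer; /* value returned when SCLK is running */
};

/* In this model, a cycle timer read without SCLK latches regAccessFail. */
uint32_t sim_read_cycle_timer(struct sim_ohci *o)
{
    if (!o->sclk_running) {
        o->int_event |= SIM_REG_ACCESS_FAIL;
        return 0;
    }
    return o->cycle_timer;
}

/*
 * Option (a): read the cycle timer once; if that latched regAccessFail,
 * clear the event and let initialization continue.
 * Returns true if a failure was seen and cleared.
 */
bool sim_probe_cycle_timer(struct sim_ohci *o)
{
    (void)sim_read_cycle_timer(o);
    if (o->int_event & SIM_REG_ACCESS_FAIL) {
        o->int_event &= ~SIM_REG_ACCESS_FAIL;
        return true;
    }
    return false;
}
```

The return value only reports that a failure was observed; whether the controller is usable afterwards is exactly what still needs testing on real hardware.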
Comment 2 jennifer 2012-05-15 02:24:16 UTC
Hi,

I have tried both solutions and neither works. Could you suggest any other solutions?
Comment 3 jennifer 2012-05-17 10:24:27 UTC
Information update:

The FireWire driver fails in the "ohci_enable" routine. After that, the irq_handler routine prints the "register access failure" message. Maybe something is wrong in the register setup done in "ohci_enable".
Comment 4 jennifer 2012-05-23 05:10:19 UTC
(In reply to comment #1)
> Current discussion:
> http://marc.info/?t=133346965500001
> 
> Potential solutions, not yet fully confirmed:
> 
> (a) After the CycleTimer acces in ohci_enable(), clear regAccessFail and
> proceed with controller initialization.
> Patch series:
>
> http://git.kernel.org/?p=linux/kernel/git/ieee1394/linux1394.git;a=shortlog;h=refs/heads/o2micro
> Configure a file in /etc/modprobe.d/ with the line:
> options firewire-ohci cycle_timer_hard_fail=0 sclk_retries=0
> 

[Jennifer]
Please see the attached picture "screenshot.png".
I added the line "options firewire-ohci cycle_timer_hard_fail=0 sclk_retries=0" to a conf file under "/etc/modprobe.d/" and then rebooted.


> Or
> 
> (b) Defer ohci->bus_time initialization and updates until the bus time CSR is
> accessed for the first time.
> Patch:
> http://marc.info/?l=linux1394-devel&m=133611307930958

[Jennifer]
Please see the attached file ohci.c.
I applied the patch to ohci.c (without solution (a)).


OS: Fedora 16
Kernel version: 3.1.07 & 3.4-rc6
Comment 5 jennifer 2012-05-25 09:20:10 UTC
-Update information-

If the OS enters suspend mode, the 1394 device works after the OS resumes.
OS: Fedora 16
Kernel: 3.3.5 or 3.3.7

Steps:
1. Suspend.
2. Resume.
3. Plug in the 1394 device. The 1394 device works.

Steps:
1. Suspend.
2. Resume.
3. Power off the notebook.
4. Power on the notebook.
5. Boot Fedora 16 and plug in the 1394 device. The 1394 device works.
Comment 6 Stefan Richter 2012-05-27 15:30:45 UTC
The attachments from comment 4 are missing.  Did something go wrong with the upload?

Is comment 5 referring to unmodified Fedora kernels?
Comment 7 jennifer 2012-05-28 07:29:59 UTC
Created attachment 73441
The ohci file for comment#4
Comment 8 jennifer 2012-05-28 07:30:41 UTC
Created attachment 73442
The picture for comment#4
Comment 9 jennifer 2012-05-28 07:31:59 UTC
Yes, I didn't modify the Fedora 16 (3.3.5 & 3.3.7) kernels.
Comment 10 jennifer 2012-05-28 08:54:28 UTC
(In reply to comment #5)
> -Update information-
> 
> If the OS has entered the suspend mode, the 1394 device can work after the OS
> resume.
> OS: Fedora 16
> Kernel: 3.3.5 or 3.3.7
> 
> Steps:
> 1. Suspend
> 2. Resume.
> 3. Plug in the 1394 device and the 1394 device is workable.
>

Sorry, this process was wrong. When we power off and restart the notebook, the 1394 device is still not recognized.
> Steps:
> 1. Suspend
> 2. Resume.
> 3. Power off the notebook
> 4. Power on the notebook
> 5. Enter Fedora 16 and plug in 1394 device. The 1394 device is work.

I collected the kernel messages from Fedora 16 with the 3.3.7 kernel. Please see the attached messages file.

Message test process:
1. Boot into Fedora 16 with the 3.3.7 kernel.
2. Plug in the 1394 hard disk.
3. The 1394 hard disk is not recognized.
4. Unplug the 1394 hard disk.
5. Suspend.
6. Resume.
7. Plug in the 1394 hard disk.
8. The 1394 hard disk is recognized.
9. Eject the 1394 hard disk.
10. Plug in the SD card and copy a file to the SD card.
Comment 11 jennifer 2012-05-28 08:55:48 UTC
Created attachment 73443
Log Message for comment#10.
Comment 12 jennifer 2012-05-29 08:55:11 UTC
Thanks for your suggestions. This is my reply.

Clemens Ladisch wrote:
>Jennifer, does this chip has any PCI configuration registers that control PHY
>power management, or PHY reset/initialization ?

No, we don't have any PCI configuration registers that control PHY power management, or PHY reset/initialization.

Stefan Richter wrote:
>OHCI 1.1 says:  "Software may only set [HCControl.BIBimageValid] when
>HCControl.linkEnable is zero."  This lets me wonder whether setting these
>two bits simultaneously could confuse some controllers.  To be safe, maybe
>set them in separate MMIOs?  Should we even read HCControl back to check that
>BIBimageValid has become 1 before we set linkEnable?

>Observations:

>  - The reported cases of "device not detected if plugged into O2Micro
>    controller after controller was enabled" are about SBP-2 devices,
>    and the devices are actually detected but the SBP-2 login procedure
>    is not completed.  One of the login steps that could have gone wrong
>    is when the target tries to read the initiator's EUI-64 in its bus
>    info block.  On the other hand, SBP-2 and -3 require this to be done
>    in two quadlet read requests; per OHCI spec these are served by the
>    OHCI physical response unit based on the GlobalUniqueID register
>    rather than the BIB image.

>  - BIBimageValid was added in OHCI 1.1; the bit is reserved in
>    OHCI 1.0.  The old ohci1394 driver in kernel 2.6.36 and older never
>    set this bit but also did not use the atomic config ROM image update
>    protocol of OHCI 1.1.
>---
>The patch should be applicable to any recent kernel without regAccessFail
>patches.
>
> drivers/firewire/ohci.c |    6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)

>--- a/drivers/firewire/ohci.c
>+++ b/drivers/firewire/ohci.c
>@@ -2384,10 +2384,8 @@ static int ohci_enable(struct fw_card *c
>               irqs |= OHCI1394_busReset;
>       reg_write(ohci, OHCI1394_IntMaskSet, irqs);
> 
>-      reg_write(ohci, OHCI1394_HCControlSet,
>-                OHCI1394_HCControl_linkEnable |
>-                OHCI1394_HCControl_BIBimageValid);
>-
>+      reg_write(ohci, OHCI1394_HCControlSet, OHCI1394_HCControl_BIBimageValid);
>+      reg_write(ohci, OHCI1394_HCControlSet, OHCI1394_HCControl_linkEnable);
>       reg_write(ohci, OHCI1394_LinkControlSet,
>                 OHCI1394_LinkControl_rcvSelfID |

I have tried it, and it does not work.
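
For reference, the quoted message also raises a variant that goes one step beyond the patch: read HCControl back and confirm BIBimageValid before setting linkEnable. A rough userspace simulation of that ordering could look like the following. Everything here is an illustrative assumption: struct sim_hc, the sim_* functions, and the placeholder bit positions are not taken from the driver or from ohci.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for the simulation only. */
#define SIM_BIB_IMAGE_VALID (1u << 31)
#define SIM_LINK_ENABLE     (1u << 17)

/* Hypothetical simulated host controller. */
struct sim_hc {
    uint32_t hc_control;   /* simulated HCControl register */
    bool drops_bib_write;  /* models a controller that ignores the BIB bit */
};

/* Simulated write to a set-type register: ORs the given bits in. */
void sim_hc_control_set(struct sim_hc *hc, uint32_t bits)
{
    if (hc->drops_bib_write)
        bits &= ~SIM_BIB_IMAGE_VALID;
    hc->hc_control |= bits;
}

/*
 * Set BIBimageValid first, read HCControl back, and only then set
 * linkEnable. Returns 0 on success, -1 if BIBimageValid did not stick.
 */
int sim_enable_link(struct sim_hc *hc)
{
    sim_hc_control_set(hc, SIM_BIB_IMAGE_VALID);
    if (!(hc->hc_control & SIM_BIB_IMAGE_VALID))
        return -1;                      /* leave the link disabled */
    sim_hc_control_set(hc, SIM_LINK_ENABLE);
    return 0;
}
```

In the real driver the read-back would be a register read on the controller, and the failure path would need to decide whether to retry or abort; the sketch only shows the ordering.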
Comment 13 jennifer 2012-06-19 02:05:31 UTC
Hi, 

Does anyone have any suggestions about this issue?
Comment 14 Stefan Richter 2012-06-26 20:14:21 UTC
On Jun 19 bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=43247
> 
> --- Comment #13 from jennifer <jennifer.li@o2micro.com>  2012-06-19 02:05:31
> ---
> Hi, 
> 
> Does anyone have any suggestions about this issue?

Sorry for the delay.  The next step would be to test the suggestion from
Clemens, May 28, of a limited loop over
  - soft reset
  - perhaps an added wait
  - request LPS on
  - wait
  - check LPS
  - read cycle timer, check regAccessFail
in ohci_enable.  I shall write a patch for testing when I get some
spare time.
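
Until such a patch exists, the shape of the loop can be sketched as a userspace simulation. This is only an illustration of the step order listed above, assuming a bounded number of attempts; struct sim_ctrl, the sim_* helpers, and the 50 ms waits are made up for the example and are not driver code.

```c
#include <stdbool.h>

/* Hypothetical simulated controller state; not the driver's structures. */
struct sim_ctrl {
    int fails_before_ok;   /* number of attempts that end in regAccessFail */
    int attempts;          /* cycle timer reads made so far */
    bool lps;              /* simulated HCControl.LPS */
    bool reg_access_fail;  /* simulated IntEvent.regAccessFail */
};

void sim_soft_reset(struct sim_ctrl *c)
{
    c->lps = false;
    c->reg_access_fail = false;
}

void sim_request_lps(struct sim_ctrl *c) { c->lps = true; }

/* Stand-in for a sleep; does nothing in the simulation. */
void sim_wait_ms(int ms) { (void)ms; }

/* The first fails_before_ok reads latch regAccessFail, later ones succeed. */
void sim_read_cycle_timer(struct sim_ctrl *c)
{
    c->attempts++;
    c->reg_access_fail = c->attempts <= c->fails_before_ok;
}

/* Returns the 1-based attempt that succeeded, or -1 after max_tries. */
int sim_enable_with_retries(struct sim_ctrl *c, int max_tries)
{
    for (int i = 1; i <= max_tries; i++) {
        sim_soft_reset(c);
        sim_wait_ms(50);          /* "perhaps an added wait" */
        sim_request_lps(c);
        sim_wait_ms(50);
        if (!c->lps)
            continue;             /* LPS did not come up; try again */
        sim_read_cycle_timer(c);
        if (!c->reg_access_fail)
            return i;
    }
    return -1;                    /* never got past regAccessFail */
}
```

The -1 return is the case where the loop never gets past the cycle timer read, so a caller would still need a fallback for it.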

However
  1. the prior reports about failure to access SBP-2 storage devices
  (everything up until SBP-2 login exclusive did actually work, only one
  of the steps during login did not proceed) still let me suspect that the
  issue is not entirely localized to the phy--link interface state;
  2. what if the mentioned loop never gets past reading the cycle timer
  without regAccessFail?

Could there be an interaction between initialization of the two or three
functions of the combination controller?  (FireWire OHCI 1217:13f7 rev 05
and SDHCI 1217:8321 rev 05; or FireWire OHCI 1217:11f7 rev 05 and SDHCI
1217:8320 rev 5 and Mass storage controller 1217:8330 rev 05)
Comment 15 jennifer 2012-06-27 06:31:50 UTC
(In reply to comment #14)
Thanks for your reply!
> Sorry for the delay.  The next step would be to test the suggestion from
> Clemens, May 28, of a limited loop over
>   - soft reset
Jennifer: I added the soft reset earlier, and it does not appear to help.
>   - perhaps an added wait
>   - request LPS on
>   - wait
>   - check LPS
>   - read cycle timer, check regAccessFail
> in ohci_enable.  I shall write a patch for testing when I get some
> spare time.
> 
> However
>   1. the prior reports about failure to access SBP-2 storage devices
>   (everything up until SBP-2 login exclusive did actually work, only one
>   of the steps during login did not proceed) still let me suspect that the
>   issue is not entirely localized to the phy--link interface state;
>   2. what if the mentioned loop never gets past reading the cycle timer
>   without regAccessFail?
Jennifer: I don't understand this question. If we did not get the "regAccessFail" message, we would not have the 1394 problem.
> 
> Could there be an interaction between initialization of the two or three
> functions of the combination controller?  (FireWire OHCI 1217:13f7 rev 05
> and SDHCI 1217:8321 rev 05; or FireWire OHCI 1217:11f7 rev 05 and SDHCI
> 1217:8320 rev 5 and Mass storage controller 1217:8330 rev 05)
This problem happens only with the default Linux drivers and the combination controller. If we initialize the functions of the combo device separately (manually), we do not see the problem. If the OS initializes the combo device, we see the problem.
Linux only has default drivers for 1394 and SD, so we did not encounter the problem with (Mass storage controller 1217:8330 rev 05 + FireWire OHCI 1217:13f7 rev 05).
Comment 16 Stefan Richter 2012-07-01 11:01:55 UTC
On Jun 27 bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=43247
> 
> --- Comment #15 from jennifer <jennifer.li@o2micro.com>  2012-06-27 06:31:50
> ---
> (In reply to comment #14)
> Thanks for your reply!
> > Sorry for the delay.  The next step would be to test the suggestion from
> > Clemens, May 28, of a limited loop over
> >   - soft reset
> 
> Jennifer: I have added the soft reset before and it looks unworkable.
> 
> >   - perhaps an added wait
> >   - request LPS on
> >   - wait
> >   - check LPS
> >   - read cycle timer, check regAccessFail
> > in ohci_enable.  I shall write a patch for testing when I get some
> > spare time.

I guess we should nevertheless still try the loop, but I don't have much
hope for that.

> > However
> >   1. the prior reports about failure to access SBP-2 storage devices
> >   (everything up until SBP-2 login exclusive did actually work, only one
> >   of the steps during login did not proceed) still let me suspect that the
> >   issue is not entirely localized to the phy--link interface state;

On the other hand, Cyril's new report shows the symptoms of a completely
disabled phy--link interface.
http://marc.info/?l=linux1394-devel&m=134074880424956

> >   2. what if the mentioned loop never gets past reading the cycle timer
> >   without regAccessFail?
> 
> Jennifer: I don't understand this question. If we didn't have the
> "regAccessFail" message and we didn't have the 1394 problem. 

It was a hypothetical question regarding what might happen with the above
mentioned loop.

> > Could there be an interaction between initialization of the two or three
> > functions of the combination controller?  (FireWire OHCI 1217:13f7 rev 05
> > and SDHCI 1217:8321 rev 05; or FireWire OHCI 1217:11f7 rev 05 and SDHCI
> > 1217:8320 rev 5 and Mass storage controller 1217:8330 rev 05)
> 
> This problems happened in the default Linux driver + combination controller
> only. If we have initialed the combo device separately (manually), we didn't
> see the problems. If OS have initialed the combo device, we will see the
> problems.
> Linux have the default 1394 and SD driver only. So, we didn't meet the
> problems
> in (Mass storage controller 1217:8330 rev 05+FireWire OHCI 1217:13f7 rev 05)

Was your manual initialization done in hardware or by software?  If the
latter,
  - were there steps done in a different order than the Linux driver does,
  - did you omit steps which the Linux driver performs?
Comment 17 jennifer 2012-07-06 10:14:46 UTC
(In reply to comment #16)
> On Jun 27 bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=43247
> > 
> > --- Comment #15 from jennifer <jennifer.li@o2micro.com>  2012-06-27
> 06:31:50 ---
> > (In reply to comment #14)
> > Thanks for your reply!
> > > Sorry for the delay.  The next step would be to test the suggestion from
> > > Clemens, May 28, of a limited loop over
> > >   - soft reset
> > 
> > Jennifer: I have added the soft reset before and it looks unworkable.
> > 
> > >   - perhaps an added wait
> > >   - request LPS on
> > >   - wait
> > >   - check LPS
> > >   - read cycle timer, check regAccessFail
> > > in ohci_enable.  I shall write a patch for testing when I get some
> > > spare time.
> 
> I guess we should nevertheless still try the loop, but I don't have much
> hope for that.
Yes, I have tested it, and it does not work.
> 
> > > However
> > >   1. the prior reports about failure to access SBP-2 storage devices
> > >   (everything up until SBP-2 login exclusive did actually work, only one
> > >   of the steps during login did not proceed) still let me suspect that
> the
> > >   issue is not entirely localized to the phy--link interface state;
> 
> On the other hand, Cyril's new report shows the symptoms of a completely
> disabled phy--link interface.
> http://marc.info/?l=linux1394-devel&m=134074880424956
> 
> > >   2. what if the mentioned loop never gets past reading the cycle timer
> > >   without regAccessFail?
> > 
> > Jennifer: I don't understand this question. If we didn't have the
> > "regAccessFail" message and we didn't have the 1394 problem. 
> 
> It was a hypothetical question regrading what might happen with the above
> mentioned loop.
> 
> > > Could there be an interaction between initialization of the two or three
> > > functions of the combination controller?  (FireWire OHCI 1217:13f7 rev 05
> > > and SDHCI 1217:8321 rev 05; or FireWire OHCI 1217:11f7 rev 05 and SDHCI
> > > 1217:8320 rev 5 and Mass storage controller 1217:8330 rev 05)
> > 
> > This problems happened in the default Linux driver + combination controller
> > only. If we have initialed the combo device separately (manually), we
> didn't
> > see the problems. If OS have initialed the combo device, we will see the
> > problems.
> > Linux have the default 1394 and SD driver only. So, we didn't meet the
> problems
> > in (Mass storage controller 1217:8330 rev 05+FireWire OHCI 1217:13f7 rev
> 05)
> 
> Was your manual initialization done in hardware or by software?  If the
> latter,
>   - where there steps done in a different oder than the Linux driver does,
>   - did you omit steps which the Linux driver performs?
The steps were done in the same order. The order is arranged by the OS; I did not modify Linux, so it is the default order.
Comment 18 Stefan Richter 2012-07-08 09:57:16 UTC
> --- Comment #17 from jennifer <jennifer.li@o2micro.com>  2012-07-06 10:14:46
> ---
> (In reply to comment #16)
> > On Jun 27 bugzilla-daemon@bugzilla.kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=43247
> > I guess we should nevertheless still try the loop, but I don't have much
> > hope for that.
> 
> Yes, it is unworkable. I have tested it.

[...]
> > > This problems happened in the default Linux driver + combination
> controller
> > > only. If we have initialed the combo device separately (manually), we
> didn't
> > > see the problems. If OS have initialed the combo device, we will see the
> > > problems.
[...]
> > Was your manual initialization done in hardware or by software?  If the
> > latter,
> >   - where there steps done in a different oder than the Linux driver does,
> >   - did you omit steps which the Linux driver performs?
> 
> Those steps done in the same order. It was arranged by OS. I didn't modify
> the
> Linux OS and it has arrange by default Linux OS.

I am not sure whether I understood correctly:  Do you mean by that that
the original Linux driver works for you if you boot Linux but do not let
the driver be automatically loaded during boot, but instead load the driver
later by 'modprobe firewire-ohci' after the rest of the system has finished
booting up?

And if yes, did you also need to load the sdhci-pci driver manually this
way?  And further, does it matter whether sdhci-pci is loaded before
firewire-ohci or the other way around?
Comment 19 jennifer 2012-07-10 11:02:16 UTC
(In reply to comment #18)
> > --- Comment #17 from jennifer <jennifer.li@o2micro.com>  2012-07-06
> 10:14:46 ---
> > (In reply to comment #16)
> > > On Jun 27 bugzilla-daemon@bugzilla.kernel.org wrote:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=43247
> > > I guess we should nevertheless still try the loop, but I don't have much
> > > hope for that.
> > 
> > Yes, it is unworkable. I have tested it.
> 
> [...]
> > > > This problems happened in the default Linux driver + combination
> controller
> > > > only. If we have initialed the combo device separately (manually), we
> didn't
> > > > see the problems. If OS have initialed the combo device, we will see
> the
> > > > problems.
> [...]
> > > Was your manual initialization done in hardware or by software?  If the
> > > latter,
> > >   - where there steps done in a different oder than the Linux driver
> does,
> > >   - did you omit steps which the Linux driver performs?
> > 
> > Those steps done in the same order. It was arranged by OS. I didn't modify
> the
> > Linux OS and it has arrange by default Linux OS.
> 
> I am not sure whether I understood correctly:  Do you mean by that that
> the original Linux driver works for you if you boot Linux but do not let
> the driver be automatically loaded during boot, but instead load the driver
> later by 'modprobe firewire-ohci' after the rest of the system has finished
> booting up?

Yes.

> 
> And if yes, did you also need to load the sdhci-pci driver manually this
> way?  

There are 4 ways to avoid the issue:
1. 1394 loaded by the OS + sdhci-pci loaded manually.
2. sdhci-pci loaded by the OS + 1394 loaded manually.
3. 1394 loaded manually + sdhci-pci loaded manually.
4. sdhci-pci loaded manually + 1394 loaded manually.

> And further, does it matter whether sdhci-pci is loaded before
> firewire-ohci or the other way around?

According to our test results, the issue disappears if we load the drivers manually. The load order does not matter.
But if the OS loads both drivers, the issue happens.
Comment 20 Stefan Richter 2012-07-14 13:09:16 UTC
On Jul 10 bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=43247
> 
> --- Comment #19 from jennifer <jennifer.li@o2micro.com>  2012-07-10 11:02:16
> ---
[...]
> > Do you mean by that that
> > the original Linux driver works for you if you boot Linux but do not let
> > the driver be automatically loaded during boot, but instead load the driver
> > later by 'modprobe firewire-ohci' after the rest of the system has finished
> > booting up?
> 
> Yes.
> 
> > 
> > And if yes, did you also need to load the sdhci-pci driver manually this
> > way?  
> 
> There are 4 ways which can pass the issue.
> 1. Load 1394 by OS + Load sdhci-pci by manually.
> 2. Load sdhci-pci by OS + Load 1394 by manually.
> 3. Load 1394 by manually + load sdhci-pci by manually.
> 4. Load sdhci-pci by manually + load 1394 by manually.
> 
> > And further, does it matter whether sdhci-pci is loaded before
> > firewire-ohci or the other way around?
> 
> According to our test result, if we load the driver by manually and the issue
> will disappear. It didn't has the relationship about the loaded priority.
> But, if we load those drivers by OS and the issue will happen.


Could somebody at linux-pci@vger.kernel.org please advise?

1.)  Is there a kernel parameter which Jennifer could try in order to
force serialized PCI driver probing?

2.)  If there is one and if this turns out to cure the issue in testing:
How can I implement serialization between the O2Micro FireWire .probe()
and .resume() on one hand and the O2Micro SDHCI .probe() and .resume() on
the other hand?


[If you reply to this via bugzilla mail, please add
    Cc: linux-pci@vger.kernel.org
in your reply.  I am not aware of a way to add it to bugzilla.kernel.org's
Cc list of bug 43247.]
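
To illustrate what question 2 is asking for, here is a userspace analogy of serializing two initialization paths that share one chip. It is only a sketch under assumptions: a kernel implementation would more likely sleep on a shared lock, whereas this example uses a try-lock that reports "retry later" so that it stays single-threaded. combo_busy, combo_init_begin(), combo_init_end(), and sim_probe() are all hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One flag per combo chip; set while any function of the chip initializes. */
static atomic_flag combo_busy = ATOMIC_FLAG_INIT;

/* Try to claim the chip for initialization; true if the claim succeeded. */
bool combo_init_begin(void)
{
    return !atomic_flag_test_and_set(&combo_busy);
}

/* Release the chip so that the sibling function may initialize. */
void combo_init_end(void)
{
    atomic_flag_clear(&combo_busy);
}

/*
 * Hypothetical probe body shared by both simulated drivers.
 * Returns 0 after running its init, or -1 if the sibling function is
 * initializing right now and the caller should retry later.
 */
int sim_probe(int *init_runs)
{
    if (!combo_init_begin())
        return -1;
    (*init_runs)++;   /* controller setup would happen here */
    combo_init_end();
    return 0;
}
```

The open problem in the kernel is where such shared state would live, since firewire-ohci and sdhci-pci are separate modules with no common code today.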
Comment 21 Bjorn Helgaas 2012-07-17 18:25:01 UTC
> 1.)  Is there a kernel parameter which Jennifer could try in order to force
> serialized PCI driver probing?

I don't understand the question here.  As far as I know, drivers
compiled statically into the kernel are already initialized serially,
via the do_initcalls() -> do_initcall_level(6) path.  The PCI core
enumerates all the devices, then when we call each driver's
module_init() function (serially), the module_init() function will
register the driver, and the driver core will call the driver's
.probe() function for every matching PCI device.

> 2.)  If there is one and if this turns out to cure the issue in testing:
> How can I implement serialization between the O2Micro FireWire .probe() and
> .resume() on one hand and the O2Micro SDHCI .probe() and .resume() on the
> other hand?

If serialization is required between two drivers, that sounds like a
driver bug.  What are the two drivers involved?
Comment 22 Stefan Richter 2012-07-17 19:47:01 UTC
On Jul 17 Bjorn Helgaas wrote:
> > 1.)  Is there a kernel parameter which Jennifer could try in order
> >      to force serialized PCI driver probing?
> 
> I don't understand the question here.  As far as I know, drivers
> compiled statically into the kernel are already initialized serially,
> via the do_initcalls() -> do_initcall_level(6) path.  The PCI core
> enumerates all the devices, then when we call each driver's
> module_init() function (serially), the module_init() function will
> register the driver, and the driver core will call the driver's
> .probe() function for every matching PCI device.

Jennifer and the other reporters of the issue most certainly all used
modular kernels, i.e. got the drivers loaded by udev.

module_init() doesn't matter, but
  - .probe(),
  - .resume()
do.  So as you say, if the drivers were statically linked, then at least
.probe() would be performed serially, one PCI device after another,
right?  But AFAIK .resume() would still be performed in parallel by a
pool of kernel threads in current kernels.

> > 2.)  If there is one and if this turns out to cure the issue in
> >      testing:
> >      How can I implement serialization between the O2Micro FireWire
> >      .probe() and .resume() on one hand and the O2Micro SDHCI
> >      .probe() and .resume() on the other hand?
> 
> If serialization is required between two drivers, that sounds like a
> driver bug.  What are the two drivers involved?

From what I can tell, there is no driver bug but an unfortunate interaction
between parts of a combo controller which in theory should functionally be
totally unrelated.

First we had lots and lots of slightly different and inconclusive reports
from various people about the O2Micro FireWire part (or/and the SDHCI
part) failing to work after boot or after PM resume.  Jennifer's recent
reports show that this happens if the two devices are being .probed() at
the same time (or perhaps not exactly at the same time but close together;
at least the firewire-ohci .probe() takes long enough to make some overlap
probable if the .probe()s were called asynchronously but close together).

The involved drivers are:

    drivers/firewire/firewire-ohci.ko (drivers/firewire/ohci.c)
    drivers/mmc/host/sdhci-pci.ko     (drivers/mmc/host/sdhci-pci.c)


The involved device is either this triple combo device:

0b:00.0 FireWire (IEEE 1394) [0c00]: O2 Micro, Inc. Device [1217:11f7] (rev 05) (prog-if 10 [OHCI])
	Subsystem: Dell Device [1028:04a4]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 42
	Region 0: Memory at e0130000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: firewire_ohci
	Kernel modules: firewire-ohci

0b:00.1 SD Host controller [0805]: O2 Micro, Inc. Device [1217:8320] (rev 05) (prog-if 01)
	Subsystem: Dell Device [1028:04a4]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 16
	Region 0: Memory at e0120000 (32-bit, non-prefetchable) [size=512]
	Capabilities: <access denied>
	Kernel driver in use: sdhci-pci
	Kernel modules: sdhci-pci

0b:00.2 Mass storage controller [0180]: O2 Micro, Inc. Device [1217:8330] (rev 05)
	Subsystem: Dell Device [1028:04a4]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin C routed to IRQ 10
	Region 0: Memory at e0110000 (32-bit, non-prefetchable) [size=1K]
	Region 2: Memory at e0100000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: <access denied>


Or this dual combo device:

09:00.0 FireWire (IEEE 1394) [0c00]: O2 Micro, Inc. Device [1217:13f7] (rev 05) (prog-if 10 [OHCI])
	Subsystem: Dell Device [1028:04b4]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at e1a30000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: firewire_ohci
	Kernel modules: firewire-ohci

09:00.1 SD Host controller [0805]: O2 Micro, Inc. Device [1217:8321] (rev 05) (prog-if 01)
	Subsystem: Dell Device [1028:04b4]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 18
	Region 0: Memory at e1a20000 (32-bit, non-prefetchable) [size=512]
	Capabilities: <access denied>
	Kernel driver in use: sdhci-pci
	Kernel modules: sdhci-pci

-- 
Stefan Richter
-=====-===-- -=== =---=
http://arcgraph.de/sr/
Comment 23 Bjorn Helgaas 2012-07-17 20:55:54 UTC
> Jennifer and the other reporters of the issue most certainly all used
> modular kernels, i.e. got the drivers loaded by udev.
>
> module_init() doesn't matter, but
>   - .probe(),
>   - .resume()
> do.  So as you say, if the drivers were statically linked, then at least
> .probe() would be performed serially, one PCI device after another,
> right?  But AFAIK .resume() would still be performed in parallel by a
> pool of kernel threads in current kernels.

OK, I see.  When both drivers are loaded near the same time via udev,
the firewire driver doesn't work correctly.  I don't know of anything
that serializes driver registration from dynamically loaded modules.

I wouldn't bother with the resume issue until after the load-time
issue is resolved because suspend/resume makes things harder to debug.
 You might be able to try loading/unloading the modules in a loop to
reproduce the issue in a single boot.

> From what I can tell, there is no driver bug but an unfortunate interaction
> between parts of a combo controller which in theory should functionally be
> totally unrelated.

That's plausible.  But I can't think of anything useful the PCI core
can do here.  If you can identify a hardware issue in the device, it's
possible you could write a quirk to work around it.  But after the
driver has called pci_enable_device() (the last point at which quirks
are applied), the PCI core is pretty much out of the picture.

Maybe something could be added to one or both drivers to look for
these devices and try to deal with them?  I'm afraid I don't have any
good ideas here.
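
The "look for these devices" part could start from a simple ID match. The sketch below is a standalone illustration, not a kernel quirk: it uses the vendor and device IDs of the two FireWire functions from the lspci output in comment 22, while struct sim_pci_id and is_affected_o2micro_fw() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ID pair; the kernel has its own PCI ID table types. */
struct sim_pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* FireWire functions of the two combo controllers listed in this report. */
static const struct sim_pci_id o2micro_fw_ids[] = {
    { 0x1217, 0x11f7 },  /* triple combo: FireWire + SDHCI + mass storage */
    { 0x1217, 0x13f7 },  /* dual combo: FireWire + SDHCI */
};

/* True if the given PCI function is one of the reported FireWire parts. */
bool is_affected_o2micro_fw(uint16_t vendor, uint16_t device)
{
    size_t n = sizeof(o2micro_fw_ids) / sizeof(o2micro_fw_ids[0]);

    for (size_t i = 0; i < n; i++) {
        if (o2micro_fw_ids[i].vendor == vendor &&
            o2micro_fw_ids[i].device == device)
            return true;
    }
    return false;
}
```

What the driver should then do for a matching device is the part that remains unknown.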
Comment 24 jennifer 2012-07-18 09:41:06 UTC
(In reply to comment #23)
> OK, I see.  When both drivers are loaded near the same time via udev,
> the firewire driver doesn't work correctly.  I don't know of anything
> that serializes driver registration from dynamically loaded modules.
> 
> I wouldn't bother with the resume issue until after the load-time
> issue is resolved because suspend/resume makes things harder to debug.
>  You might be able to try loading/unloading the modules in a loop to
> reproduce the issue in a single boot.
> 

We tried unloading and reloading the firewire driver, but the device still does not work.
This is the test process: boot into Linux -> rmmod firewire -> insmod firewire -> firewire device still does not work
Comment 25 Stefan Richter 2012-07-18 13:28:04 UTC
(In reply to comment #0)
> This issue has happened in the O2 micro SD/MMC+1394 combo function chip +
> DELL
> platform Vans or Latitude E5520. We can find the same issue in J-micro
> USB+1394
> combo pci expresses card + DELL platform Vans or Latitude E5520.

Do you mean by "J-micro USB+1394 combo pci expresses card" the combo chip in these Dell laptops, or are you referring to a separate add-on card which features the same or similar combo chip like the controllers in the Dells?

If it is an add-on card:  I haven't found such a card by web search.  Is it not a released product yet?  Or could you provide me a link to a vendor or shop webpage about the card, so that I could look into getting one myself?  (A European source would work best for me.)
Comment 26 jennifer 2012-07-19 08:31:09 UTC
(In reply to comment #25)
> (In reply to comment #0)
> > This issue has happened in the O2 micro SD/MMC+1394 combo function chip +
> > DELL platform Vans or Latitude E5520. We can find the same issue in J-micro
> > USB+1394 combo pci expresses card + DELL platform Vans or Latitude E5520.
> 
> Do you mean by "J-micro USB+1394 combo pci expresses card" the combo chip in
> these Dell laptops, or are you referring to a separate add-on card which
> features the same or similar combo chip like the controllers in the Dells?
> 
> If it is an add-on card:  I haven't found such a card by web search.  Is it
> not a released product yet?  Or could you provide me a link to a vendor or shop
> webpage about the card, so that I could look into getting one myself?  (A
> European source would work best for me.)

Yes, it is a separate add-on card. But I can't find a shop in Europe. This is a link to a shop in Taiwan:
http://buy.yahoo.com.tw/gdsale/gdsale.asp?gdid=1289116
UPTECH   UTE110 (1394+USB)

According to my test results, this issue happens on the DELL E5520 platform only. If possible, you may want to buy a DELL E5520 at the same time.
Comment 27 Stefan Richter 2012-07-27 19:00:10 UTC
On Jul 27 Ayan George wrote:
> Hey Stefan,
> 
> I'm sorry I haven't been responsive when it comes to this bug.  I just
> wanted to let you know that the workaround one of the OEM team members
> came up with is to reload the driver after a drive is connected.

The workaround could be implemented in the kernel at the cost of
relatively many lines of code:  Wrap all of the controller initialization
into a self-rearming workqueue job.  The job attempts to initialize the
controller and checks for regAccessFail; if it is set, the job shuts down
the controller and requeues itself for e.g. 5 seconds later.  Additional
care must be taken about .suspend(), .resume(), and module_exit().
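
To make the control flow of such a job concrete, here is a minimal userspace
sketch, not driver code.  Everything in it is hypothetical: struct fake_ohci,
try_init(), shut_down() and init_with_retry() are stand-ins invented for this
illustration, the register failure is simulated by a counter, and the
rearming is flattened into a bounded loop.  A real implementation would
presumably requeue a delayed work item instead of looping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the controller state (not the real fw_ohci). */
struct fake_ohci {
	int attempts;        /* number of initialization attempts so far */
	int fails_before_ok; /* simulated: first N attempts hit regAccessFail */
	bool running;        /* true once initialization succeeded */
};

/* Simulated initialization; returns true if regAccessFail was NOT raised. */
bool try_init(struct fake_ohci *o)
{
	assert(o != NULL);
	o->attempts++;
	o->running = o->attempts > o->fails_before_ok;
	return o->running;
}

/* Simulated shutdown of a half-initialized controller. */
void shut_down(struct fake_ohci *o)
{
	assert(o != NULL);
	o->running = false;
}

/*
 * Try to initialize, and on failure shut down, wait, and try again.
 * 'delay' stands in for the "requeue for 5 seconds later" step and may
 * be NULL.  Returns 0 on success, -1 if all attempts failed.
 */
int init_with_retry(struct fake_ohci *o, int max_attempts, void (*delay)(void))
{
	for (int i = 0; i < max_attempts; i++) {
		if (try_init(o))
			return 0;
		shut_down(o);
		if (delay != NULL && i + 1 < max_attempts)
			delay();
	}
	return -1;
}
```

With fails_before_ok set to 2, init_with_retry() succeeds on the third
attempt.  The sketch leaves out exactly the parts that make the real thing
costly: cancelling the pending job on .suspend() and module_exit().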

However, this would be such a gross hack --- there must be something
simpler than that.  Perhaps serialization against initialization of the
SDHCI part would be the key, but so far it is unclear how that could be
systematically tested and (if successful) be properly implemented.

> I hope that information helps determine the root cause of the problem.

This finding would be consistent with the earlier finding that it also
works if a FireWire device is already connected when the laptop is powered
up IIRC.  On the other hand, Jennifer wrote in
https://bugzilla.kernel.org/show_bug.cgi?id=43247#c24:

| >  You might be able to try loading/unloading the modules in a loop to
| > reproduce the issue in a single boot.
| 
| We have tried to load/unload the firewire driver but it is unworkable.
| This is the test process: boot into Linux -> rmmod firewire -> insmod
| firewire -> firewire device still unworkable

So I guess this was tested without anything connected.
Comment 28 jennifer 2012-07-30 07:52:51 UTC
(In reply to comment #27)
> | >  You might be able to try loading/unloading the modules in a loop to
> | > reproduce the issue in a single boot.
> | 
> | We have tried to load/unload the firewire driver but it is unworkable.
> | This is the test process: boot into Linux -> rmmod firewire -> insmod
> | firewire -> firewire device still unworkable
> 
> So I guess this was tested without anything connected.

Yes, this was tested without anything connected.
Comment 29 Stefan Richter 2013-09-08 15:24:57 UTC
Created attachment 107601 [details]
screenshot from Dell Latitude E5420, Ubuntu 12-04

Date: Tue, 3 Sep 2013 19:41:03 -0400
From: Frej Tulin <frej.tulin@gmail.com>
To: linux1394-devel@lists.sf.net
Subject: ubuntu 12.04 power management discrepance & crash failure

Hi

I'm running ubuntu 12.04 on a dell latitude E5420.

When the computer wakes up from sleep it displays several lines containing
"power management discrepancy", e.g.
"[ 3944.693579] [drm:gen6_sanitize_pm] *ERROR* Power management discrepancy:
GEN6_RP_INTERRUPT_LIMITS expected 1a000000, was 12060000"
It also tells me to notify linux1394-devel@lists.sf.net, which is why I'm
writing here.

The computer also crashes regularly and nothing but a hard reboot helps. I
haven't noticed any pattern as to when it typically crashes. It seems to be
random.

I'm guessing the error message and the frequent crashing are connected.

Please let me know if you have any insight into this.

I'm attaching a picture I managed to snap of the error message screen. It
is only ever up for about 1 second or less.

many thanks
/frej


-- 
Frej Tulin
Fred Cross lab
Rockefeller University
Comment 30 Peter Hurley 2014-02-26 22:52:46 UTC
(In reply to Bjorn Helgaas from comment #23)
> > From what I can tell, there is no driver bug but an unfortunate interaction
> > between parts of a combo controller which in theory should functionally be
> > totally unrelated.
> 
> That's plausible.  But I can't think of anything useful the PCI core
> can do here.  If you can identify a hardware issue in the device, it's
> possible you could write a quirk to work around it.  But after the
> driver has called pci_enable_device() (the last point at which quirks
> are applied), the PCI core is pretty much out of the picture.
> 
> Maybe something could be added to one or both drivers to look for
> these devices and try to deal with them?  I'm afraid I don't have any
> good ideas here.

I think this may be a hardware bug with concurrent PCI Configuration transactions to different functions of the O2 multi-function device.

Either the PCI-to-PCI bridge is conflating different transactions to different functions on the same device or the device is.

When sdhci and firewire-ohci are probing simultaneously, the sdhci driver gets a bad BAR size (in addition to the firewire-ohci error):

sdhci: Secure Digital Host Controller Interface driver
sdhci: Copyright(c) Pierre Ossman
firewire_ohci: Register access failure - please notify linux1394-dev@lists.sf.net
firewire_ohci: Added fw-ohci device 0000:09:00.0, OHCI v1.10, 8 IR + 8 IT contexts, quirks 0x10
sdhci-pci 0000:09:00.1: SDHCI controller found [1217:8321] (rev 5)
sdhci-pci 0000:09:00.1: Invalid iomem size. You may experience problems.
mmc0: SDHCI controller on PCI [0000:09:00.1] using DMA

Would anyone with this hardware be willing to run an instrumentation patch I could supply?
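
As background for the "bad BAR size" observation: a BAR is sized by writing all ones to it and decoding what is read back from config space. The sketch below is a userspace illustration of that decoding for a 32-bit memory BAR. The function name and the sample values are made up for this example, and it is not the kernel's code. It only shows that a readback which really belongs to a different BAR decodes to a different size, which is one way the hypothesis above could lead to the sdhci-pci warning.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the size of a 32-bit memory BAR from the value read back after
 * writing 0xFFFFFFFF to it.  Hypothetical helper, for illustration only.
 */
uint32_t bar_size_from_readback(uint32_t readback)
{
	/* The low 4 bits of a memory BAR are type/prefetch flags. */
	uint32_t mask = readback & ~(uint32_t)0xF;
	uint32_t size;

	if (mask == 0)
		return 0;	/* BAR not implemented */

	/* The lowest writable address bit gives the decode size. */
	size = mask & (~mask + 1);
	assert((size & (size - 1)) == 0);	/* always a power of two */
	return size;
}
```

A readback of 0xFFFFFF00 decodes to 0x100 bytes, while a readback of 0xFFFFF000 decodes to 0x1000 bytes. If responses to concurrent config accesses were mixed up between the two functions, the sizing code would have no way to notice.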
Comment 31 Peter Hurley 2014-02-27 17:45:26 UTC
(In reply to Bjorn Helgaas from comment #23)
> > Jennifer and the other reporters of the issue most certainly all used
> > modular kernels, i.e. got the drivers loaded by udev.
> >
> > module_init() doesn't matter, but
> >   - .probe(),
> >   - .resume()
> > do.  So as you say, if the drivers were statically linked, then at least
> > .probe() would be performed serially, one PCI device after another,
> > right?  But AFAIK .resume() would still be performed in parallel by a
> > pool of kernel threads in current kernels.
> 
> OK, I see.  When both drivers are loaded near the same time via udev,
> the firewire driver doesn't work correctly.  I don't know of anything
> that serializes driver registration from dynamically loaded modules.

I think there is a way to serialize these probes:

1. load the pci-stub driver early to claim one of the PCI ids, eg. the firewire id 1217:13f7
2. create a udev rule that matches the other PCI id, eg. the SD host controller id 1217:8321, but runs a script that unbinds pci-stub and modprobes firewire-ohci

This may even be possible without pci-stub, and only using udev rules by matching the pci MODALIAS for one of the pci functions but doing nothing, like:

ACTION=="add", SUBSYSTEM=="pci", ENV{MODALIAS}=="pci:v000011C1d00005811*", RUN:=""

and then doing step 2 above (maybe in the same udev rules file):

ACTION=="add", DRIVER=="sdhci-pci", ATTR{vendor}=="0x1217", ATTR{device}=="0x8321", RUN+="/sbin/modprobe firewire-ohci"

This won't work for a built-in firewire-ohci but that's not very common,
and this doesn't fix resume.
Comment 32 Stefan Richter 2014-02-28 10:36:56 UTC
If concurrent initialization is the problem, could a simple

        if (dev->vendor == 0x1217 &&
            (dev->device == 0x11f7 || dev->device == 0x13f7))
                ssleep(7); /* 7 is lucky */

right at the top of firewire-ohci's pci_probe() and pci_resume() do the trick?  Or is it already too late at the time when PCI core calls the driver methods?

Are probing and PM resume parallelized in _all_ kernel configurations, or would this hack have to depend on some CONFIG_XYZ variable to avoid a global stall?
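
If more IDs turn up, the check could be table-driven.  Below is a userspace sketch of just the matching part; struct pci_id, slow_probe_ids and needs_probe_delay() are names invented for this illustration, and the table holds only the two device IDs from the snippet above.  The delay itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vendor/device pair, standing in for the kernel's ID types. */
struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Controllers that would get the delay (IDs taken from the snippet above). */
static const struct pci_id slow_probe_ids[] = {
	{ 0x1217, 0x11f7 },
	{ 0x1217, 0x13f7 },
};

/* Return true if the given device is listed in slow_probe_ids. */
bool needs_probe_delay(const struct pci_id *id)
{
	size_t n = sizeof(slow_probe_ids) / sizeof(slow_probe_ids[0]);

	assert(id != NULL);
	for (size_t i = 0; i < n; i++) {
		if (slow_probe_ids[i].vendor == id->vendor &&
		    slow_probe_ids[i].device == id->device)
			return true;
	}
	return false;
}
```

The SD host function of the same chip (1217:8321) is deliberately not in the table, so only the FireWire side would wait.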
