Bug 43176

Summary: Error during boot: sp5100_tco: mmio address 0xfec000f0 already in use
Product: Platform Specific/Hardware Reporter: Ralf (post+kernel)
Component: x86-64Assignee: platform_x86_64 (platform_x86_64)
Status: RESOLVED CODE_FIX    
Severity: normal CC: alan, arekm, bjorn, florian, jrnieder, kreed, mc74hc00, paulepanter, samspam
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.4-rc4 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: Full output of dmesg with 3.4.0-rc4
dmesg output with 2.6.39 (incorrect configuration)
dmesg output when booting the 2.6.38 kernel: watchdog enabled
dmesg output when booting the 2.6.39 kernel: error from watchdog module
Proposed Patch
Proposed Patch V2
dmesg of linux 3.5+version 2 of the patch by Takahisa
Result of dmesg and /proc/iomem
Content of /proc/iomem on my laptop
Patch V3
Proposed Patch V4
Proposed Patch V5
[PATCH v6 1/2] sp5100_tco: Add module parameter `force_addr` to set MMIO address
[PATCH v6 2/2] sp5100_tco: Add SB8x0 chipset support
dmesg for Linux ubuntu 3.5.0-25-generic x86_64

Description Ralf 2012-04-29 14:34:52 UTC
Created attachment 73120 [details]
Full output of dmesg with 3.4.0-rc4

During boot on my HP Compaq 615 laptop, the kernel shows the following error message:

[   10.890011] sp5100_tco: mmio address 0xfec000f0 already in use

I originally noticed this with a Debian kernel, but a self-compiled vanilla 3.4-rc4 shows the same behaviour.

The bug was reported against Debian at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638863, and Bjorn Helgaas wrote patches fixing the issue, which were discussed at http://thread.gmane.org/gmane.linux.kernel/1184383. However, no consensus was reached whether the fix is actually correct or not, so the problem persists.

I added the full output of "dmesg" as attachement. Please let me know if you need further information.
Comment 1 Ralf 2012-05-01 12:44:03 UTC
I just noticed that this actually is a regression:

While debugging another problem, I compiled old vanilla kernels, and the error message does not appear (but some others, which were fixed in the mean time, do). 2.6.39 is good, while 3.4-rc4 (see above) is bad.
I can run a bisect, but I'd appreciate a hint to which folder it should be restricted.

Kind regards,
Ralf
Comment 2 Ralf 2012-05-01 12:44:46 UTC
Created attachment 73138 [details]
dmesg output with 2.6.39
(incorrect configuration)
Comment 3 Ralf 2012-05-01 16:02:43 UTC
Comment on attachment 73138 [details]
dmesg output with 2.6.39
(incorrect configuration)

Sorry, it turned out watchdog support was disabled in that minimal testing configuration, so this log is obsolete.

However, after enabling watchdogs, the error still appears in 2.6.39 but is gone in 2.6.38 (I will attach dmesg logfiles).
Comment 4 Ralf 2012-05-01 16:04:28 UTC
Created attachment 73141 [details]
dmesg output when booting the 2.6.38 kernel: watchdog enabled
Comment 5 Ralf 2012-05-01 16:05:36 UTC
Created attachment 73142 [details]
dmesg output when booting the 2.6.39 kernel: error from watchdog module
Comment 6 Paul Menzel 2012-06-30 08:58:55 UTC
(In reply to comment #0)
> Created an attachment (id=73120) [details]
> Full output of dmesg
> 
> During boot on my HP Compaq 615 laptop, the kernel shows the following error
> message:
> 
> [   10.890011] sp5100_tco: mmio address 0xfec000f0 already in use

I can confirm this error with the ASUS M2A-VM, ASRock A780FullHD, which both show the same mmio address

    SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
    SP5100 TCO timer: mmio address 0xfec000f0 already in use

and ASRock E350M1.

    SP5100 TCO timer: mmio address 0xbafe00 already in use

> I originally noticed this with a Debian kernel, but a self-compiled vanilla
> 3.4-rc4 shows the same behaviour.

Debian Sid/unstable is also used on these systems.

> The bug was reported against Debian at
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638863, and Bjorn Helgaas
> wrote patches fixing the issue, which were discussed at
> http://thread.gmane.org/gmane.linux.kernel/1184383. However, no consensus was
> reached whether the fix is actually correct or not, so the problem persists.
> 
> I added the full output of "dmesg" as attachement. Please let me know if you
> need further information.

Ralf, thanks for working on that for over half a year.
Comment 7 Paul Menzel 2012-06-30 09:07:48 UTC
I am adding Bjorn to CC since he worked on the patches.
Comment 8 Paul Menzel 2012-06-30 09:14:28 UTC
Doing

    git log drivers/pci/hotplug/shpchp_hpc.c

Bjorns commit

    commit 4cac2eb158c6da0c761689345c6cc5df788a6292
    Author: Bjorn Helgaas <bhelgaas@google.com>
    Date:   Tue Aug 23 10:16:43 2011 -0600

        PCI hotplug: shpchp: don't blindly claim non-AMD 0x7450 device IDs
        
        Previously we claimed device ID 0x7450, regardless of the vendor, which is
        clearly wrong.  Now we'll claim that device ID only for AMD.
        
        I suspect this was just a typo in the original code, but it's possible this
        change will break shpchp on non-7450 AMD bridges.  If so, we'll have to fix
        them as we find them.
        
        Reference: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638863
        Reported-by: Ralf Jung <ralfjung-e@gmx.de>
        Cc: Joerg Roedel <joerg.roedel@amd.com>
        Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
        Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

references the Debian report. But it looks like this did not fix it since it is in Linux 3.1

    $ git describe 4cac2eb158c6da0c761689345c6cc5df788a6292
    v3.1-7185-g4cac2eb

but Linux 3.2 is running on my systems (and Ralf uses tested Linux 3.4) and it exposes this problem.
Comment 9 Paul Menzel 2012-06-30 09:35:31 UTC
(In reply to comment #3)
> (From update of attachment 73138 [details])

[…]

> However, after enabling watchdogs, the error still appears in 2.6.39 but is
> gone in 2.6.38 (I will attach dmesg logfiles).

As a consequence the changes in `drivers/pci/hotplug/shpchp_hpc.c` are not responsible for this error because the last relevant change was in Linux 2.6.37.

    $ git describe e24dcbef93dbbf529fbedfc6ce8ab22d2cef35f0
    v2.6.36-rc4-177-ge24dcbe
    $ git describe --contains e24dcbef
    v2.6.37-rc1~164^2~3

Maybe Bjorn can give you a hint what directory you could bisect. Maybe it is related to PCI. On the other hand, bisecting the whole Linux kernel should only give one or two more iterations.
Comment 10 Paul Menzel 2012-06-30 09:37:46 UTC
(In reply to comment #8)
> Doing
> 
>     git log drivers/pci/hotplug/shpchp_hpc.c
> 
> Bjorns commit
> 
>     commit 4cac2eb158c6da0c761689345c6cc5df788a6292
>     Author: Bjorn Helgaas <bhelgaas@google.com>
>     Date:   Tue Aug 23 10:16:43 2011 -0600
> 
>         PCI hotplug: shpchp: don't blindly claim non-AMD 0x7450 device IDs
> 
>         Previously we claimed device ID 0x7450, regardless of the vendor,
>         which
> is
>         clearly wrong.  Now we'll claim that device ID only for AMD.
> 
>         I suspect this was just a typo in the original code, but it's
>         possible
> this
>         change will break shpchp on non-7450 AMD bridges.  If so, we'll have
>         to
> fix
>         them as we find them.
> 
>         Reference: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638863
>         Reported-by: Ralf Jung <ralfjung-e@gmx.de>
>         Cc: Joerg Roedel <joerg.roedel@amd.com>
>         Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>         Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> 
> references the Debian report. But it looks like this did not fix it since it
> is
> in Linux 3.1
> 
>     $ git describe 4cac2eb158c6da0c761689345c6cc5df788a6292
>     v3.1-7185-g4cac2eb

Just a correction, that commit is only in Linux 3.2.

    $ git describe --contains 4cac2eb158c6da0c761689345c6cc5df788a6292
    v3.2-rc3~3^2

This does not change the next statement.

> but Linux 3.2 is running on my systems (and Ralf uses tested Linux 3.4) and
> it
> exposes this problem.
Comment 11 Ralf 2012-06-30 10:26:23 UTC
(In reply to comment #8)
> Bjorns commit
> 
>     commit 4cac2eb158c6da0c761689345c6cc5df788a6292
>     Author: Bjorn Helgaas <bhelgaas@google.com>
>     Date:   Tue Aug 23 10:16:43 2011 -0600
> 
>         PCI hotplug: shpchp: don't blindly claim non-AMD 0x7450 device IDs
This commit fixes another error which I reported in the same Debian bug report (I didn't want to open a bug report for each kernel message I see on startup - maybe I should, but then I didn't know whether they were related or not). It successfully fixed

[    5.083819] shpchp 0000:00:01.0: Cannot reserve MMIO region

but obviously that's unrelated to the current problem.
Comment 12 Ralf 2012-07-03 11:47:40 UTC
I started some bisecting, but I'm having a lot of trouble. Compilation is often failing with this error

fs/pstore/inode.c:253:2: error: unknown field ‘get_sb’ specified in initializer
fs/pstore/inode.c:253:2: warning: initialization makes integer from pointer without a cast [enabled by default]
fs/pstore/inode.c:253:2: warning: (near initialization for ‘pstore_fs_type.fs_flags’) [enabled by default]
fs/pstore/inode.c:253:2: error: initializer element is not computable at load time
fs/pstore/inode.c:253:2: error: (near initialization for ‘pstore_fs_type.fs_flags’)

Here's the "bisect log" of my results so far:

# bad: [6221f222c0ebf1acdf7abcf927178f40e1a65e2a] Linux 2.6.39-rc2
git bisect bad 6221f222c0ebf1acdf7abcf927178f40e1a65e2a
# good: [521cb40b0c44418a4fd36dc633f575813d59a43d] Linux 2.6.38
git bisect good 521cb40b0c44418a4fd36dc633f575813d59a43d
# bad: [0ce790e7d736cedc563e1fb4e998babf5a4dbc3d] Linux 2.6.39-rc1
git bisect bad 0ce790e7d736cedc563e1fb4e998babf5a4dbc3d
# skip: [1d2a1959fe534279cf37aba20b08c24c20840e52] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into sh-latest
git bisect skip 1d2a1959fe534279cf37aba20b08c24c20840e52
# good: [0998e1db988658bb5ca660b4d929e1d2e7e8473e] staging: xgifb: vb_util: include the .h file
git bisect good 0998e1db988658bb5ca660b4d929e1d2e7e8473e
# skip: [c55d267de274d308927b60c3e740c1a826832317] Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
git bisect skip c55d267de274d308927b60c3e740c1a826832317
# good: [baf075eca42f217e8d297914ed6fecfd2452a0e4] [media] drivers/media/rc/Kconfig: use tabs, instead of spaces
git bisect good baf075eca42f217e8d297914ed6fecfd2452a0e4
Comment 13 Takahisa Tanaka 2012-07-15 01:31:43 UTC
Created attachment 75381 [details]
Proposed Patch

Hi, Ralf,

Back to your original problem, " sp5100_tco: mmio address 0xfec000f0 already in use", my understanding is this symptom is caused by following reason.

> I can confirm this error with the ASUS M2A-VM, ASRock A780FullHD, which both
> show the same mmio address
> 
>     SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
>     SP5100 TCO timer: mmio address 0xfec000f0 already in use

The mmio address 0xfec000f0 is a typically base address of watchdog timer(*). As far as I know、almost all BIOS of PC with SP5100/SB7x0/SB8x0 chipsets seem to assign 0xfec000f0 to chipset as a base address of watchdog timer. However, this address conflicts with IO-APIC mmio address, and http://thread.gmane.org/gmane.linux.kernel/1184383 patch isn't merged into upstream yet.
* Refer to http://support.amd.com/us/Embedded_TechDocs/44415.pdf
  AMD SP5100 BIOS Developer's Guide(Page 41).


>  and ASRock E350M1.
> 
>     SP5100 TCO timer: mmio address 0xbafe00 already in use

The sp5100_tco supports SP5100/SB7x0 chipset, doesn't support SB8x0 chipset, because sp5100_tco doesn't know that the offset address was changed from SB8x0.

The offset address of SP5100 and SB7x0 chipsets are as follows, quotes from the AMD SB700/710/750 Register Reference Guide(Page 164) and the AMD SP5100 Register Reference Guide(Page 166).

  WatchDogTimerControl 69h
  WatchDogTimerBase0   6Ch
  WatchDogTimerBase1   6Dh
  WatchDogTimerBase2   6Eh
  WatchDogTimerBase3   6Fh

The offset address of SB8x0 chipsets are as follows, quotes from AMD SB800-Series Southbridges Register Reference Guide(Page 147).

  WatchDogTimerEn      48h
  WatchDogTimerConfig  4Ch

So, In the case of SB8x0 chipset, sp5100_tco reads meaningless mmio address(0xbafe00) from wrong offset address.


I have a PC with SB850 chipset. I was also suffering from the same symptom. In order to solve above symptom, I have modified the code of sp5100_tco as follows. 
  1. As is the case with via_wdt.c, sp5100_tco reprogramings the base address
     of watchdog timer to free mmio address space using the allocate_resource()
     function.

  2. In the case of SB8x0 chipset, sp5100_tco accesses the correct offset address.

I have confirmed attached patch fixed the symptom.

You may apply the patch on kernel 3.5-rc6.
Let us know if this patch fix your symptom.


Thanks,
Takahisa
Comment 14 Ralf 2012-07-15 10:28:43 UTC
(In reply to comment #13)
> You may apply the patch on kernel 3.5-rc6.
> Let us know if this patch fix your symptom.
Is this safe to do? Some months ago, someone already suggested to re-program the base address, but then he said the system could react in unforeseen bad ways when doing so.
This laptop is my main work machine, I really need it daily, so if there's a chance it might be broken I'd rather wait some weeks (I plan to buy a new laptop soon). Or maybe this is a stupid question and nothing can happen, I am not at all into this hardware programming stuff ;-)
Comment 15 Takahisa Tanaka 2012-07-16 02:32:24 UTC
(In reply to comment #14)
Thank you for your quick reply.

Sorry, I'm not familiar with chipset specification. But, I would like to use watchdog timer on my PC.

> Is this safe to do? Some months ago, someone already suggested to
> re-program the base address, but then he said the system could
> react in unforeseen bad ways when doing so.

I'm using this patch on my PC(ASUS M4A89GTD-PRO/USB3+Fedora17/x86_64),and It seems to work fine. But, I might not notice a problem...


> I am not at all into this hardware programming stuff ;-)

I read the document of SP5100/SB7x0/SB8x0 over again. I guess that AcpiMmioEn(SB8x0) and SBResourceMMIO_Base(SP5100/SB7x0) can be used. If these registers can be used, hardware programming is unnecessary.

Has someone already tried this method? 

Thanks,
Comment 16 Paul Menzel 2012-07-22 08:36:13 UTC
(In reply to comment #15)
> (In reply to comment #14)
> Thank you for your quick reply.
> 
> Sorry, I'm not familiar with chipset specification. But, I would like to use
> watchdog timer on my PC.

Is there a use case besides for a server? I would be interested in it because I just saw the error message in the Linux kernel ring buffer and wondered if it worked what would I use it for.

> > Is this safe to do? Some months ago, someone already suggested to
> > re-program the base address, but then he said the system could
> > react in unforeseen bad ways when doing so.
> 
> I'm using this patch on my PC(ASUS M4A89GTD-PRO/USB3+Fedora17/x86_64),and It
> seems to work fine. But, I might not notice a problem...

Yeah, I also found your report in the RedHat Bugzilla database [1].

> > I am not at all into this hardware programming stuff ;-)
> 
> I read the document of SP5100/SB7x0/SB8x0 over again. I guess that
> AcpiMmioEn(SB8x0) and SBResourceMMIO_Base(SP5100/SB7x0) can be used. If these
> registers can be used, hardware programming is unnecessary.
> 
> Has someone already tried this method? 

Yes, I did and it worked, that means no error message.

Tested-by: Paul Menzel <paulepanter@users.sourceforge.net>

ASRock A780FullHD <http://www.asrock.com/mb/overview.asp?model=a780fullhd>

Your patch on the following commit (3.5-rc7).

commit d75e2c9ad97c40f6d2cdaf2e16381b2034d19a6f
Merge: 9351737 85a053f
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Jul 20 12:02:02 2012 -0700

    Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

$ dmesg
[   19.427821] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.02
[   19.427904] sp5100_tco: PCI Revision ID: 0x3a
[   19.427940] sp5100_tco: Using 0xfec00400 for watchdog base address
[   19.427950] sp5100_tco: Watchdog reboot not detected
[   19.428044] sp5100_tco: initialized (0xf80a0400). heartbeat=60 sec (nowayout=0)

Thank you so much! Though I did not test if it actually works if there is some hang.

Takahisa, now the only thing missing is, that you create a “correct” patch with a commit message and all necessary information which you can more or less copy from your comments in this thread. Also add my comments above about my system.

More or less you only have to do, `git commit --amend` and edit the commit message. Then `git format-patch -1` and you should be able to use that file. Please attach it here *and* send it to the mailing list so that more Linux kernel folks can comment on it.

Thank you again!

Do you need more information about my system?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=710705
Comment 17 Takahisa Tanaka 2012-07-23 14:11:55 UTC
Created attachment 75951 [details]
Proposed Patch V2

Hi Paul,

Thank you for your response!

> > Has someone already tried this method? 
> 
> Yes, I did and it worked, that means no error message.

Thank you for testing. Sorry, but I have wrote a patch that use AcpiMMioEn and SBResourceMMIO_Base, and the patch has improved so that it may not re-programming to chipset as much as possible.

If you don't mind, Could you test the new patch. :)

I have tested M4A89GTD-PRO/USB3 and DL165G7. The results of tests were as follows.

 TestCase1: M4A89GTD-PRO/USB3(SB850 chipset)
   Result: OK, sp5100_tco can get 0xfed80001 from AcpiMmioEn(PM_Reg:24h). 
           The watchdog timer works fine.

   # dmesg | grep 5100
   [   53.047155] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
   [   53.139216] sp5100_tco: PCI Revision ID: 0x41
   [   53.139240] sp5100_tco: Watchdog Timer Base Address = 0xfec000f0
   [   53.139255] sp5100_tco: SBResource_MMIO address = 0xfed80001
   [   53.139270] sp5100_tco: Using 0xfed80b00 for watchdog base address
   [   53.139287] sp5100_tco: Watchdog reboot not detected
   [   53.139362] sp5100_tco: initialized (0xffffc900117fcb00). heartbeat=60 sec (nowayout=0)
   # echo a > /dev/watchdog
    ... after 60sec ... reboot ...

 TestCase2: DL165G7(SP5100 chipset)
   Result: OK, BIOS of DL165G7 wasn't enabled SBResource_MMIO(PCI_Reg:9Ch).
           Read value of SBResource_MMIO register is 0x00000000... 
           But, the watchdog timer works fine by using allocate_resource().

   # dmesg | grep 5100
   [   10.120017] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
   [   10.123229] sp5100_tco: PCI Revision ID: 0x3d
   [   10.124053] sp5100_tco: Watchdog Timer Base Address = 0xfec000f0
   [   10.124204] sp5100_tco: SBResource_MMIO address = 0x00000000
   [   10.124363] sp5100_tco: Allocated resource address = 0xfec00400
   [   10.124540] sp5100_tco: Using 0xfec00400 for watchdog base address
   [   10.124707] sp5100_tco: Watchdog reboot not detected
   [   10.126983] sp5100_tco: initialized (0xffffc90000370400). heartbeat=60 sec (nowayout=0)
   # echo a > /dev/watchdog
    ... after 60sec ... reboot ...

The 'SB800 BIOS Developer's Guide' specifies the recommended address(0xfed80000) of the AcpiMmioEn register. 'SP5100 BIOS Developer's Guide' doesn't specifies the recommended address of the SBResource_MMIO_Base address. So, I think that PC with SB8x0(or later) can use AcpiMmioEn register, and doesn't need to program a chipset.

I will attach the patch. The patch can apply to v3.5-rc1(or later). 
Processing of my patch is as follows.

  First, Get the watchdog base address from indirect I/O(0xCD6/0xCD7).
   1) This address is used if this address hasn't conflicted.
      But, Currently, this address conflicts the IOAPIC register map.
      So, progress to the next step.

  Next, Get the SBResource_MMIO base address from AcpiMmioEN or SBResource_MMIO register.
   1) This address is used if these register has enabled by BIOS.
   2) This address is used if this address hasn't conflicted. 
   3) If above condition isn't true, progress to the next step.

  Last, Get free mmio address from allocate_resource(), and, re-programming 
  the got address to watchdog timer register(PM_Reg: 44h). 


I think that It's OK to re-programming the watchdog timer base address during a disabled watchdog. I will post this patch to LKML, and ask for comment from Linux kernel folks. 

Thanks, Paul!


Regards,
Takahisa,
Comment 18 Paul Menzel 2012-07-25 09:32:59 UTC
(In reply to comment #16)

[…]

> Yes, I did and it worked, that means no error message.
> 
> Tested-by: Paul Menzel <paulepanter@users.sourceforge.net>
> 
> ASRock A780FullHD <http://www.asrock.com/mb/overview.asp?model=a780fullhd>
> 
> Your patch on the following commit (3.5-rc7).
> 
> commit d75e2c9ad97c40f6d2cdaf2e16381b2034d19a6f
> Merge: 9351737 85a053f
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Fri Jul 20 12:02:02 2012 -0700
> 
>     Merge branch 'upstream' of
> git://git.linux-mips.org/pub/scm/ralf/upstream-linus
> 
> $ dmesg
> [   19.427821] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.02
> [   19.427904] sp5100_tco: PCI Revision ID: 0x3a
> [   19.427940] sp5100_tco: Using 0xfec00400 for watchdog base address
> [   19.427950] sp5100_tco: Watchdog reboot not detected
> [   19.428044] sp5100_tco: initialized (0xf80a0400). heartbeat=60 sec
> (nowayout=0)
> 
> Thank you so much! Though I did not test if it actually works if there is
> some
> hang.

I do not know if I missed that or not. With your (old) patch applied on

    commit 28a33cbc24e4256c143dce96c7d93bf423229f92
    Author: Linus Torvalds <torvalds@linux-foundation.org>
    Date:   Sat Jul 21 13:58:29 2012 -0700

        Linux 3.5

I get one more line.

        $ dmesg | grep -i sp5100_tco
        [   15.400973] calling  sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] @ 521
        [   15.400980] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.02
        [   15.401057] sp5100_tco: PCI Revision ID: 0x3a
        [   15.401089] sp5100_tco: Using 0xfec00400 for watchdog base address
        [   15.401099] sp5100_tco: Watchdog reboot not detected
        [   15.401538] sp5100_tco: initialized (0xf8052400). heartbeat=60 sec (nowayout=0)
        [   15.401554] initcall sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] returned 0 after 557 usecs

[…]
Comment 19 Paul Menzel 2012-07-25 10:58:50 UTC
(In reply to comment #17)
> Created an attachment (id=75951) [details]
> Proposed Patch V2

[…]

> > > Has someone already tried this method? 
> > 
> > Yes, I did and it worked, that means no error message.
> 
> Thank you for testing. Sorry, but I have wrote a patch that use AcpiMMioEn
> and
> SBResourceMMIO_Base, and the patch has improved so that it may not
> re-programming to chipset as much as possible.

I do not know what that means, but it sounds nice. ;-)

> If you don't mind, Could you test the new patch. :)

Sure.

> I have tested M4A89GTD-PRO/USB3 and DL165G7. The results of tests were as
> follows.

[…]

Again tested this on ASRock A780FullHD and it worked fine.

        $ dmesg | grep sp5100_tco
        [   30.089499] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
        [   30.089580] sp5100_tco: PCI Revision ID: 0x3a
        [   30.089603] sp5100_tco: Watchdog Timer Base Address = 0xfec000f0
        [   30.089608] sp5100_tco: SBResource_MMIO address = 0x0000
        [   30.089612] sp5100_tco: Allocated resource address = 0xfec00400
        [   30.089639] sp5100_tco: Using 0xfec00400 for watchdog base address
        [   30.089649] sp5100_tco: Watchdog reboot not detected
        [   30.089711] sp5100_tco: initialized (0xf806c400). heartbeat=60 sec (nowayout=0)

> The 'SB800 BIOS Developer's Guide' specifies the recommended
> address(0xfed80000) of the AcpiMmioEn register. 'SP5100 BIOS Developer's
> Guide'
> doesn't specifies the recommended address of the SBResource_MMIO_Base
> address.
> So, I think that PC with SB8x0(or later) can use AcpiMmioEn register, and
> doesn't need to program a chipset.
> 
> I will attach the patch. The patch can apply to v3.5-rc1(or later).

Again I tested this on the following commit.

    commit 28a33cbc24e4256c143dce96c7d93bf423229f92
    Author: Linus Torvalds <torvalds@linux-foundation.org>
    Date:   Sat Jul 21 13:58:29 2012 -0700

        Linux 3.5

Testing this with

    $ echo a | sudo tee /dev/watchdog 

resulted in a reset system after 60 seconds.

    [ 8166.398895] sp5100_tco: Unexpected close, not stopping watchdog!

> Processing of my patch is as follows.
> 
>   First, Get the watchdog base address from indirect I/O(0xCD6/0xCD7).
>    1) This address is used if this address hasn't conflicted.
>       But, Currently, this address conflicts the IOAPIC register map.
>       So, progress to the next step.
> 
>   Next, Get the SBResource_MMIO base address from AcpiMmioEN or
>   SBResource_MMIO
> register.
>    1) This address is used if these register has enabled by BIOS.
>    2) This address is used if this address hasn't conflicted. 
>    3) If above condition isn't true, progress to the next step.
> 
>   Last, Get free mmio address from allocate_resource(), and, re-programming 
>   the got address to watchdog timer register(PM_Reg: 44h). 

Please put that and my Tested-by line into the commit message already so that developers do not have to go through this Bugzilla thread with all its comments.

> I think that It's OK to re-programming the watchdog timer base address during
> a
> disabled watchdog. I will post this patch to LKML, and ask for comment from
> Linux kernel folks.

Nice. Also do not forget to ask about if you should send that to stable@kernel.org too.

Ralf, any chance that you test this too?
Comment 20 Ralf 2012-07-25 17:36:13 UTC
Created attachment 76081 [details]
dmesg of linux 3.5+version 2 of the patch by Takahisa

(In reply to comment #19)
> Ralf, any chance that you test this too?

No luck here: I tested the V2 patch based on linux v3.5, but it's still giving the MMIO error on boot:

[    9.950076] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
[    9.950162] sp5100_tco: PCI Revision ID: 0x3a
[    9.950187] sp5100_tco: Watchdog Timer Base Address = 0xfec000f0
[    9.950193] sp5100_tco: SBResource_MMIO address = 0x0000
[    9.950198] sp5100_tco: MMIO allocation failed

The full dmesg log is attached.
Comment 21 Takahisa Tanaka 2012-07-26 13:48:30 UTC
(In reply to comment #19)

Hi Paul,

Thank you for testing and reply!
I understood your advice.


Thanks,
Comment 22 Takahisa Tanaka 2012-07-26 13:49:42 UTC
(In reply to comment #20)
> Created an attachment (id=76081) [details]
> dmesg of linux 3.5+version 2 of the patch by Takahisa
> 
> (In reply to comment #19)
> > Ralf, any chance that you test this too?
> 
> No luck here: I tested the V2 patch based on linux v3.5, but it's still
> giving
> the MMIO error on boot:
> 
> [    9.950076] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
> [    9.950162] sp5100_tco: PCI Revision ID: 0x3a
> [    9.950187] sp5100_tco: Watchdog Timer Base Address = 0xfec000f0
> [    9.950193] sp5100_tco: SBResource_MMIO address = 0x0000
> [    9.950198] sp5100_tco: MMIO allocation failed
> 
> The full dmesg log is attached.

Hi Ralf,

Thank you for testing and information!
I have investigated the attached log, and I found the curious message. 

   [    0.000000] Booting paravirtualized kernel on bare hardware

I haven't tested the paravirtualized kernel yet. Please give me a few days to investigate about it.


Thanks,
Comment 23 Ralf 2012-07-26 16:30:09 UTC
(In reply to comment #22)
> Hi Ralf,
> 
> Thank you for testing and information!
> I have investigated the attached log, and I found the curious message. 
> 
>    [    0.000000] Booting paravirtualized kernel on bare hardware
> 
> I haven't tested the paravirtualized kernel yet. Please give me a few days to
> investigate about it.
I used "make localmodconfig" to obtain my .config - maybe this option got set because it is set by default for Debian Kernels (I am running Debian testing). I am currently re-compiling the kernel without support for running in a hypervisor and will let you know whether that makes any difference (but probably I won't be able to boot into the new kernel before tomorrow).
Comment 24 Takahisa Tanaka 2012-07-27 12:37:58 UTC
(In reply to comment #23)
> I used "make localmodconfig" to obtain my .config - maybe this option got set
> because it is set by default for Debian Kernels (I am running Debian
> testing).
> I am currently re-compiling the kernel without support for running in a
> hypervisor and will let you know whether that makes any difference (but
> probably I won't be able to boot into the new kernel before tomorrow).

Hi Ralf,

Sorry, the paravirtualized kernel didn't cause the problem. My result of test and 'cat /proc/iomem' is attached.

I am investigating why the allocate_resource() fails. I guess that the memory map region is run out at this time. 

Could you post the result of 'cat /proc/iomem' on your PC?


Thanks,
Comment 25 Takahisa Tanaka 2012-07-27 12:43:18 UTC
Created attachment 76211 [details]
Result of dmesg and /proc/iomem

This is a log when allocate_resource() is successful.
Comment 26 Ralf 2012-07-27 15:38:08 UTC
Created attachment 76241 [details]
Content of /proc/iomem on my laptop

(In reply to comment #24)
> Sorry, the paravirtualized kernel didn't cause the problem. My result of test
> and 'cat /proc/iomem' is attached.
Indeed, compiling without support for running in a hypervisor did not change anything.

> I am investigating why the allocate_resource() fails. I guess that the memory
> map region is run out at this time. 
> 
> Could you post the result of 'cat /proc/iomem' on your PC?
You can find it attached.
Thanks a lot for your efforts to solve this problem!
Comment 27 Takahisa Tanaka 2012-07-29 12:18:46 UTC
Hi Ralf,

The watchdog timer MMIO region of your PC was reserved from 0xf0000000 to 0xffffffff by PCI root bridge, and there is no free MMIO region. Hence allocate_resource() function can't allocate the MMIO address for watchdog timer.

   [    0.259157] pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xfffdffff]
   [    0.259159] pci_root PNP0A03:00: host bridge window [mem 0xffe00000-0xffffffff]
   [    0.259164] pci_root PNP0A03:00: host bridge window expanded to [mem 0xf0000000-0xffffffff]; [mem 0xffe00000-0xffffffff] ignored

The watchdog timer MMIO region of my PC(and Paul's PC) was reserved from 0xf0000000 to 0xfebfffff by PCI root bridge, and there is free MMIO region. Hence allocate_resource() function is successful.

   [    0.249872] pci_bus 0000:00: root bus resource [mem 0xf0000000-0xfebfffff]

I wnated to inform you of this ASAP.


Takahisa,
Comment 28 Ralf 2012-07-29 14:24:42 UTC
Hi Takahisa,

thanks for the update, however I have no idea what this means ;-) . I am not into this hardware programming at all.
Is there any chance of the bug being fixed, or does this means there's a more fundamental problem?

Kind regards,
Ralf
Comment 29 Takahisa Tanaka 2012-07-30 14:09:06 UTC
(In reply to comment #28)
Hi Ralf,

> Is there any chance of the bug being fixed, or does this means there's a more
> fundamental problem?

The kernel can't allocate MMIO address for watchdog timer on your PC, because the PCI Root device of your PC occupies all MMIO region address(0xf0000000-0xffffffff). That's why my patch doesn't work on your PC. I think that this is a problem of hardware vendor(or BIOS?) dependence. Sorry, I have no idea why the PCI Root device occupies all MMIO regions.

Did you test the following patch on your PC? 
http://thread.gmane.org/gmane.linux.kernel/1184383

This patch avoids IOAPIC and watchdog timer MMIO address conflict. However, in the case of your PC, Since the PCI Root device occupies MMIO region, the kernel can't assign addresses to watchdog timer. Therefore, I guess that this patch doesn't have an effect in your PC.

Although unfortunately my patch wasn't able to correspond to your PC, I will post my patch to LKML, because my patch was able to work fine on Paul's PC and my PC. However, My patch is using check_mem_region(). I have noticed that check_mem_region() is deprecated, today. Hence, I need to modify current patch...


Regards,
Takahisa
Comment 30 Ralf 2012-07-30 15:15:40 UTC
(In reply to comment #29)
> Did you test the following patch on your PC? 
> http://thread.gmane.org/gmane.linux.kernel/1184383
> 
> This patch avoids IOAPIC and watchdog timer MMIO address conflict. However,
> in
> the case of your PC, Since the PCI Root device occupies MMIO region, the
> kernel
> can't assign addresses to watchdog timer. Therefore, I guess that this patch
> doesn't have an effect in your PC.
These are exactly the patches I mentioned in my original report, aren't they? Bjorn wrote them in response to my original report, and they do fix the problem. The watchdog actually works when applying them (or, it did a year ago).
Comment 31 Takahisa Tanaka 2012-08-01 13:15:34 UTC
(In reply to comment #30)
Hi Ralf,

> These are exactly the patches I mentioned in my original report, aren't they?

Yes. 

> The watchdog actually works when applying them (or, it did a year ago).

I already have verified Bjorn's patch myself. Bjorn's patch works fine on my test environment(ASUS M4A89GTD-PRO/USB3 and DL165G7), because these PC allocate 0xf0000000-0xfebfffff as MMIO address of PCI Root Bridge. Hence, 0xfec00f0-0xfec00f7 for watchdog timer which doesn't conflict with IO-APIC MMIO address is available.

However, If PC which allocate 0xf0000000-0xffffffff as MMIO address of PCI Root Bridge, such as your PC, 0xfec00f0-0xfec00f7 for watchdog timer conflicts with 0xf0000000-0xffffffff as MMIO address of PCI Root Bridge, and kernel can't assign 0xfec00f0-0xfec00f7 as a iomem resource. Therefor, I thought that Bjorn's patch doesn't have an effect in *your PC*. Unfortunately due to lack of hardware, I can't verify this case myself.


Regards,
Takahisa
Comment 32 Ralf 2012-08-01 15:05:06 UTC
(In reply to comment #31)
> However, If PC which allocate 0xf0000000-0xffffffff as MMIO address of PCI
> Root
> Bridge, such as your PC, 0xfec00f0-0xfec00f7 for watchdog timer conflicts
> with
> 0xf0000000-0xffffffff as MMIO address of PCI Root Bridge, and kernel can't
> assign 0xfec00f0-0xfec00f7 as a iomem resource. Therefor, I thought that
> Bjorn's patch doesn't have an effect in *your PC*. Unfortunately due to lack
> of
> hardware, I can't verify this case myself.
I verified it, on top of some rc of Linux 3.4. The patch works on my hardware. Please don't ask me why, though ;-)
Would you like me to verify it again on top of Linux 3.5?
Comment 33 Takahisa Tanaka 2012-08-01 15:32:24 UTC
(In reply to comment #32)

> I verified it, on top of some rc of Linux 3.4. The patch works on my
> hardware.
> Please don't ask me why, though ;-)
> Would you like me to verify it again on top of Linux 3.5?

I thought that request_mem_region_exclusive() is fail... But I was wrong.
I'm sorry to bother you.


Regards,
Takahisa
Comment 34 Takahisa Tanaka 2012-08-09 13:44:30 UTC
Hi Ralf,


I have modified the patch V3 so that your PC could use watchdog timer. 
However, in the case of your PC, You have to specify the MMIO address directly
to sp5100_tco driver parameter(force_addr). Please see the comment of the
patch V3 for details. 

The force_addr parameter may be dangerous if the wrong MMIO address is
specified, and your laptop is main work machine! Therefore It's up to you
whether you try to test or not.

I am going to post this patch to LKML in order to get comment from Linux
kernel folks.


Regards,
Takahisa
Comment 35 Takahisa Tanaka 2012-08-09 13:45:42 UTC
Created attachment 77221 [details]
Patch V3
Comment 36 Takahisa Tanaka 2012-08-09 14:16:54 UTC
Hi Paul,

I'm sorry to bother you again. Once again, Could you test the patch V3?
In the case of your PC, Since there is no re-programming of chipset,
it can test safely. 

Regards,
Takahisa
Comment 37 Paul Menzel 2012-08-09 15:33:40 UTC
Dear Takahisa,


again thank you for your work on this!


(In reply to comment #34)

[…]

> I am going to post this patch to LKML in order to get comment from Linux
> kernel folks.

Could you please add myself (and Ralf?) to the CC list of that message.


Thanks,

Paul
Comment 38 Paul Menzel 2012-08-09 15:34:52 UTC
Dear Takahisa,


(In reply to comment #36)

> I'm sorry to bother you again. Once again, Could you test the patch V3?
> In the case of your PC, Since there is no re-programming of chipset,
> it can test safely.

I will report back tomorrow.


Thanks,

Paul
Comment 39 Ralf 2012-08-09 17:46:37 UTC
(In reply to comment #34)
> I have modified the patch V3 so that your PC could use watchdog timer. 
> However, in the case of your PC, You have to specify the MMIO address
> directly
> to sp5100_tco driver parameter(force_addr). Please see the comment of the
> patch V3 for details. 
> 
> The force_addr parameter may be dangerous if the wrong MMIO address is
> specified, and your laptop is main work machine! Therefore It's up to you
> whether you try to test or not.
> 
> I am going to post this patch to LKML in order to get comment from Linux
> kernel folks.
Thanks a lot for your efforts - however, I hope you understand that I will not test dangerous patches on my machine. Also, since I don't need the watchdog (for me, the bug is mainly about the error on boot), a fix which requires manually configuring anything is not really a fix for me. I mean, I can live with that message, I just think that it's a bug and usually, a kernel should boot without any error message without using special configuration (I could also disable watchdog support to get rid of it, for example). I understand from this bug though that maybe the hardware is broken in some way that makes this impossible (though, somehow, Bjorn wrote patches fixing the issue), but I understand very little of hardware programming.
Nevertheless, if you or someone comes up with a patch that should work fully automatic, and that's (reasonably) safe to test, I am more than willing to do so.
Comment 40 Takahisa Tanaka 2012-08-11 02:54:28 UTC
(In reply to comment #38)
Hi Paul,

> Could you please add myself (and Ralf?) to the CC list of that message.

Sure!

> I will report back tomorrow.

Thank you, I wait for your report.


Regards,
Takahisa
Comment 41 Takahisa Tanaka 2012-08-11 03:03:23 UTC
Hi Ralf,

>  however, I hope you understand that I will not test dangerous patches
> on my machine.

No problem. :) I understood what you mean by reading the your comment.
Thank you for your opinion!

Regards,
Takahisa
Comment 42 Paul Menzel 2012-08-11 17:46:03 UTC
(In reply to comment #40)
> (In reply to comment #38)

[…]

> > I will report back tomorrow.
> 
> Thank you, I wait for your report.

I am sorry it took a day longer. It still seems to work!

Tested-by: Paul Menzel <paulepanter@users.sourceforge.net>

[   28.878768] calling  sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] @ 557
[   28.878775] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
[   28.878848] sp5100_tco: PCI Revision ID: 0x3a
[   28.878902] sp5100_tco: Using 0xfec00400 for watchdog MMIO address
[   28.878912] sp5100_tco: Watchdog reboot not detected
[   28.878973] sp5100_tco: initialized (0xf802e400). heartbeat=60 sec (nowayout=0, force_addr=0x0)
[   28.878990] initcall sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] returned 0 after 207 usecs

This is with `initcall_debug printk.time=y`. The patch is applied on top of the following commit.

commit f4ba394c1b02e7fc2179fda8d3941a5b3b65efb6
Merge: bf44ce8 5d299f3
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Aug 8 20:06:43 2012 +0300

Takahisa, please also add

    CC: stable@kernel.org

above your Signed-off-by line so that it get applied there as well and users of older Linux versions can profit from your work too.
Comment 43 Takahisa Tanaka 2012-08-12 02:44:45 UTC
Hi Paul,

> I am sorry it took a day longer. It still seems to work!

No problem. Thank you for quick testing!


> Takahisa, please also add
> 
>    CC: stable@kernel.org
>
> above your Signed-off-by line so that it get applied there as well and
> users of older Linux versions can profit from your work too.

Thank you for explanation, I understood. 


Regards,
Takahisa
Comment 44 Takahisa Tanaka 2012-08-12 13:32:56 UTC
(In reply to comment #43)
> > Takahisa, please also add
> > 
> >    CC: stable@kernel.org
> >
> > above your Signed-off-by line so that it get applied there as well and
> > users of older Linux versions can profit from your work too.
> 
> Thank you for explanation, I understood. 
> 
Hi Paul,

I have posted the patch to LKML. But, Only the stable@kernel.org was bounced, because that's why "User Unknown". 

I'm investigating the reason for an "User Unknown" error. Please let me know if you have any information. 

Regards,
Takahisa
Comment 45 Paul Menzel 2012-08-12 13:56:06 UTC
(In reply to comment #44)
> (In reply to comment #43)
> > > Takahisa, please also add
> > > 
> > >    CC: stable@kernel.org
> > >
> > > above your Signed-off-by line so that it get applied there as well and
> > > users of older Linux versions can profit from your work too.
> > 
> > Thank you for explanation, I understood. 
> > 
> Hi Paul,
> 
> I have posted the patch to LKML. But, Only the stable@kernel.org was bounced,
> because that's why "User Unknown". 
> 
> I'm investigating the reason for an "User Unknown" error. Please let me know
> if
> you have any information.

I am sorry. I messed up. Reading `stable_kernel_rules.txt` [1] again it must be <stable@vger.kernel.org>.

[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=Documentation/stable_kernel_rules.txt;h=b0714d8f678ac51d0c280a4f5f2980196052421f;hb=HEAD
Comment 46 Takahisa Tanaka 2012-08-12 15:05:06 UTC
(In reply to comment #45)

Thank you for quick your help, the mail address was changed!
I'm sorry for not finding out the document. 

It is better for me to have changed the mail address(stable@vger.kernel.org) and to retransmit a patch? 


Regards,
Takahisa
Comment 47 Paul Menzel 2012-08-12 15:13:29 UTC
(In reply to comment #46)
> (In reply to comment #45)

[…]

> It is better for me to have changed the mail address(stable@vger.kernel.org)
> and to retransmit a patch?

Since you have to send a [PATCH v2] anyway for the errors I spotted, I think you should wait for the review of the Linux developers and only send an updated patch afterward.
Comment 48 Takahisa Tanaka 2012-08-13 11:34:56 UTC
(In reply to comment #47)
> Since you have to send a [PATCH v2] anyway for the errors I spotted, I think
> you should wait for the review of the Linux developers and only send an
> updated
> patch afterward.

Thank you for the reply at LKML. I understand.

I posted the patch to LKML for the first time. Many thanks for your advice. 


Regards,
Takahisa
Comment 49 Arkadiusz Miskiewicz 2012-08-26 09:08:36 UTC
With https://patchwork.kernel.org/patch/1309571/ on 3.5.3 I'm getting:

[    7.191845] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
[    7.191945] sp5100_tco: PCI Revision ID: 0x42
[    7.192041] sp5100_tco: Using 0xfed80b00 for watchdog MMIO address
[    7.192056] sp5100_tco: Watchdog reboot not detected
[    7.192164] sp5100_tco: initialized (0xffffc90000c7eb00). heartbeat=60 sec (nowayout=0, force_addr=0x0)

system is Asus E35M1-I DELUXE, BIOS 1402 07/26/2012. Didn't try if watchdog is actually working.

force_addr=none instead of 0x0, it's misleading a bit?

Offtopic:

        /* Check to see if last reboot was due to watchdog timeout */
        pr_info("Watchdog reboot %sdetected\n",

could be improved. "Last reboot was (not) triggered by watchdog." or something.
Comment 50 Takahisa Tanaka 2012-08-27 14:13:53 UTC
(In reply to comment #49)
> 
> force_addr=none instead of 0x0, it's misleading a bit?
> 
> Offtopic:
> 
>         /* Check to see if last reboot was due to watchdog timeout */
>         pr_info("Watchdog reboot %sdetected\n",
> 
> could be improved. "Last reboot was (not) triggered by watchdog." or
> something.

Thank you for trying my patch!

I agree with your comment. I have to correct the mistake in the module description message of the current my patch. When correcting my patch, concurrently I will correct messages which you pointed out.


Thanks,
Takahisa
Comment 51 Paul Menzel 2012-11-05 08:19:58 UTC
Could somebody give an update about the status of this patch? That would be great.
Comment 52 Jonathan Nieder 2012-11-05 08:28:55 UTC
(In reply to comment #51)
> Could somebody give an update about the status of this patch?

I think the ball's in your court. ;-)  Basically, the next step is
to take Takahisa's latest patch, update it according to Arkadiusz's
review, add your sign-off, and send it as v2 to
linux-watchdog@vger.kernel.org, cc-ing Wim Van Sebroeck <wim@iguana.be>,
Takahisa, Arkadiusz, Bjorn, and akpm for review and possible application.
Comment 53 Ralf 2012-11-05 12:49:35 UTC
I don't own the laptop anymore which showed this error, so I can no longer help here.
Comment 54 Takahisa Tanaka 2012-11-11 03:10:22 UTC
Hi Paul,

Sorry for the delay. I have finished the patch V2. 
Just to be safe, I'm testing the patch V2. Please wait for a couple of days.


Regards,
Takahisa
Comment 55 Takahisa Tanaka 2012-11-13 13:46:18 UTC
Created attachment 86321 [details]
Proposed Patch V4

Hi Paul,


I finished the test of the patch. I confirmed that the patch works fine on ASUS M4A89GTD-PRO/USB3(SB850 chipset) and DL165G7(SP5100 chipset).

When I'm testing my patch, I found a bug that can't correctly determine the watchdog fired in original sp5100_tco driver(See below), and I have fixed the bug. The sp5100_tco.c has been changed. So, I'm sorry to bother you again. Once again, Could you test the proposed patch V4?

<sp5100_tco.c>
static unsigned char __devinit sp5100_tco_setupdevice(void)
{
    ...snip...
    /* Check that the watchdog action is set to reset the system. */
    val = readl(SP5100_WDT_CONTROL(tcobase));
    val &= ~SP5100_PM_WATCHDOG_ACTION_RESET; <--- The WatchDogFired field is cleared here.
    writel(val, SP5100_WDT_CONTROL(tcobase));

static int __devinit sp5100_tco_init(struct platform_device *dev)
{
    ...snip...
    /* Check to see if last reboot was due to watchdog timeout */
    pr_info("Watchdog reboot %sdetected\n",
            readl(SP5100_WDT_CONTROL(tcobase)) & SP5100_PM_WATCHDOG_FIRED ? <--- always False!
            "" : "not ");


Regards,
Takahisa
Comment 56 Arkadiusz Miskiewicz 2012-11-13 17:22:49 UTC
Tested it a bit more this time:

[   10.553738] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
[   10.553859] sp5100_tco: PCI Revision ID: 0x42
[   10.553955] sp5100_tco: Using 0xfed80b00 for watchdog MMIO address
[   10.553970] sp5100_tco: Watchdog reboot not detected
[   10.554053] sp5100_tco: initialized (0xffffc90000c7eb00). heartbeat=60 sec (nowayout=1, force_addr=0x0)


- force_addr=0x0 - still misleading. I didn't specify any address.
- watchdog works (it rebooted my machine fine with nowayout=1 after I killed userspace watchdog process)
- "Watchdog reboot not detected" - this lies. Driver rebooted my machine fine in testing but on the next boot it told me "watchdog reboot not detected".

3.5.7 kernel + patch
Asus E35M1-I DELUXE, BIOS 1404 08/29/2012
Comment 57 Takahisa Tanaka 2012-11-14 12:28:29 UTC
Created attachment 86381 [details]
Proposed Patch V5
Comment 58 Takahisa Tanaka 2012-11-14 12:30:07 UTC
(In reply to comment #56)
> Tested it a bit more this time:

Thank you for testing. 

Sorry about my fault. I had uploaded the old patch. 
I uploaded the correct patch. So, I'm sorry to bother you again. Could you test the Proposed Patch V5?


Regards,
Takahisa
Comment 59 Arkadiusz Miskiewicz 2012-11-14 13:24:16 UTC
Now looks good.

booted; killed userspace watchdog process; sp5100 driver rebooted machine exactly after 60s (matches my hearbeat setting); booted again and it informed me correctly that "Last reboot was triggered by watchdog."

[    6.976826] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03arekm
[    6.976927] sp5100_tco: PCI Revision ID: 0x42
[    6.977022] sp5100_tco: Using 0xfed80b00 for watchdog MMIO address
[    6.977036] sp5100_tco: Last reboot was triggered by watchdog.
[    6.977647] sp5100_tco: initialized (0xffffc90000c7eb00). heartbeat=60 sec (nowayout=1, force_addr=none)

reboot again and:
[    5.531840] sp5100_tco: Last reboot was not triggered by watchdog.

So works fine for me.
Comment 60 Takahisa Tanaka 2012-11-14 13:43:20 UTC
(In reply to comment #59)
> Now looks good.

Thank you for quick testing!
May I add your name to 'Tested-by: ' of my patch? 


Regards,
Takahisa
Comment 61 Arkadiusz Miskiewicz 2012-11-14 13:47:12 UTC
(In reply to comment #60)

> May I add your name to 'Tested-by: ' of my patch? 

Sure.
Comment 62 Takahisa Tanaka 2012-11-15 13:02:12 UTC
(In reply to comment #61)

Thanks. 
I will send the patch to linux-watchdog.


Takahisa
Comment 63 Paul Menzel 2012-11-28 09:09:32 UTC
Dear Takahisa,


thanks a lot for ongoing awesome work in this area.

Applying your patch on top of

commit 2844a48706e54ddda4a04269dba4250b42f449de
Merge: 5687100 aa10990
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Nov 26 18:33:33 2012 -0800

    Merge branch 'akpm' (Fixes from Andrew)

I get the following with added `initcall_debug printk.time=y` to the Linux command line.

[   35.683193] calling  sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] @ 539
[   35.683200] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
[   35.683273] sp5100_tco: PCI Revision ID: 0x3a
[   35.683326] sp5100_tco: Using 0xfec00400 for watchdog MMIO address
[   35.683335] sp5100_tco: Last reboot was not triggered by watchdog.
[   35.683401] sp5100_tco: initialized (0xf804e400). heartbeat=60 sec (nowayout=0, force_addr=none)
[   35.683415] initcall sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] returned 0 after 207 usecs
Comment 64 Paul Menzel 2012-11-28 09:12:19 UTC
By the way, is there a way to test this, without putting the file system consistency at risk? Boot to the initramfs and then trigger the watchdog with the following?

    echo a | sudo tee /dev/watchdog
Comment 65 Takahisa Tanaka 2012-11-29 13:15:54 UTC
(In reply to comment #64)
Hi Paul,

Thank you for testing and your comments!
I will fix the ChangeLog of my patch. 

> Boot to the initramfs and then trigger the watchdog with the following?

Yes. The 'rd.break=pre-mount' option can boot a kernel without mounting
physical filesystem. In the case of Fedora17, I was able to test watchdog
timer in the following step. 

  1. Add the sp5100_tco driver to initramfs-<kernel version>.img. 

      # dracut --force --add-drivers "sp5100_tco"

  2. reboot

      # reboot

  3. The GRUB menu is displayed.

      2-1. Select kernel for watchdog test.
      2-2. Press e. Then Enter edit mode.
      2-3. Add 'rd.break=pre-mount' to linux line.
      2-4. Press F10 or Ctrl-x. Then boot.

  4. During boot, dropping to shell with root privileges before root filesystem
     is mounted.

  5. Start the watchdog timer. When the watchdog device file is opened,
     the watchdog timer starts. In the case of a default value, PC will
     reboot in 60 sec. 

      # echo a > /dev/watchdog



Regards,
Takahisa
Comment 66 Paul Menzel 2012-12-04 16:13:23 UTC
After doing `more /dev/watchdog` the watchdog restarted the system after 60 seconds, but during the next start the information about having been restarted is incorrect.

[   92.444223] calling  sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] @ 601
[   92.444230] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
[   92.444312] sp5100_tco: PCI Revision ID: 0x3a
[   92.444371] sp5100_tco: Using 0xfec00400 for watchdog MMIO address
[   92.444380] sp5100_tco: Last reboot was not triggered by watchdog.  
[   92.444760] sp5100_tco: initialized (0xf802e400). heartbeat=60 sec (nowayout=0, force_addr=none)
[   92.446562] initcall sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] returned 0 after 2272 usecs

I applied your patch on top of.

commit b69f0859dc8e633c5d8c06845811588fe17e68b3
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Dec 3 11:22:37 2012 -0800

    Linux 3.7-rc8
Comment 67 Paul Menzel 2012-12-04 16:14:59 UTC
Under Debian one can add `break=premount` to the Linux command line to get into the BusyBox shell of the initramfs. See `man initramfs-tools`. As Takahisa one needs to add `sp5100_tco` manually to the initramfs image.
Comment 68 Paul Menzel 2012-12-13 15:03:08 UTC
Under Debian I added `sp5100_tco` to `/etc/initramfs-tools/modules` and ran `sudo update-initramfs -u` and restarted as explained in comment #67.
Comment 69 Paul Menzel 2012-12-13 15:06:40 UTC
I tested again on top of Linux 3.7.

        commit 29594404d7fe73cd80eaa4ee8c43dcc53970c60e
        Author: Linus Torvalds <torvalds@linux-foundation.org>
        Date:   Mon Dec 10 19:30:57 2012 -0800

            Linux 3.7

and can confirm, that after a watchdog reset, the output is incorrect, that the last reboot was not caused by the watchdog.

        $ dmesg | grep 5100
        [    1.689883] calling  sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] @ 89
        [    1.689891] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
        [    1.689962] sp5100_tco: PCI Revision ID: 0x3a
        [    1.690012] sp5100_tco: Using 0xfec00400 for watchdog MMIO address
        [    1.690021] sp5100_tco: Last reboot was not triggered by watchdog.
        [    1.697283] sp5100_tco: initialized (0xf800a400). heartbeat=60 sec (nowayout=0, force_addr=none)
        [    1.697303] initcall sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] returned 0 after 7233 usecs
Comment 70 Paul Menzel 2012-12-13 15:18:10 UTC
I also confirmed again that I have patch iteration 5 (v5) from the report applied.
Comment 71 Paul Menzel 2012-12-13 15:19:52 UTC
To conclude my comment spamming, I would still vote to get the current patch into Linux 3.8, but note in the commit message, that at least on SB8xx figuring out if the last  boot was triggered by the watchdog or not, does not work.
Comment 72 Takahisa Tanaka 2012-12-14 13:21:14 UTC
(In reply to comment #71)
> To conclude my comment spamming, I would still vote to get the current patch
> into Linux 3.8, but note in the commit message, that at least on SB8xx
> figuring
> out if the last  boot was triggered by the watchdog or not, does not work.

Many thanks for testing!

I'm sure that you are using SB7x0 chipset. I have tested only PC with SP5100 chipset, because I don't have PC with SB7x0 chipset.

I will investigate this problem. Please give me some time to investigating this problem. 


Regards,
Takahisa
Comment 73 Paul Menzel 2012-12-14 14:23:13 UTC
(In reply to comment #72)
> (In reply to comment #71)
> > To conclude my comment spamming, I would still vote to get the current
> patch
> > into Linux 3.8, but note in the commit message, that at least on SB8xx
> figuring
> > out if the last  boot was triggered by the watchdog or not, does not work.
> 
> Many thanks for testing!

No problem. You did all of the work writing that patch!

> I'm sure that you are using SB7x0 chipset. I have tested only PC with SP5100
> chipset, because I don't have PC with SB7x0 chipset.

You are right. This is a ASRock A780FullHD [1].

$ lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: ASRock Incorporation Device 9602
00:09.0 PCI bridge: Advanced Micro Devices [AMD] RS780/RS880 PCI to PCI bridge (PCIE port 4)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] RS780/RS880 PCI to PCI bridge (PCIE port 5)
00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
00:12.0 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.1 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0 USB OHCI1 Controller
00:12.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.1 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0 USB OHCI1 Controller
00:13.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 IDE Controller
00:14.2 Audio device: Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC host controller
00:14.4 PCI bridge: Advanced Micro Devices [AMD] nee ATI SBx00 PCI to PCI Bridge
00:14.5 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:05.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI RS780 [Radeon HD 3200]
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

> I will investigate this problem. Please give me some time to investigating
> this
> problem.

As written I would favor getting this in in the current state and fix that message correctness later on. Maybe bug the maintainers again, to get it in this pull request for Linux 3.8.


[1] http://www.asrock.com/MB/overview.asp?Model=A780FullHD
Comment 74 Paul Menzel 2012-12-18 11:44:53 UTC
Takahisa, according to the vendor Web page, I do have a SB700 chipset. Though reading your commit message, SB700 should have been supported beforehand. But I did get the error message and your patch fixes it. What am I missing?
Comment 75 Paul Menzel 2012-12-18 11:47:51 UTC
Reading the register guide [1] I do not find a register storing wether the reboot was triggered by the watchdog or not. Did you find a register/bit storing that information?

[1] http://developer.amd.com/wordpress/media/2012/10/43009_sb7xx_rrg_pub_1.00.pdf
Comment 76 Paul Menzel 2012-12-18 11:50:01 UTC
I took you patch and split it up. Could you please check it? Another cleanup would be, to factor out the whitespace and comment changes (full stop) to a separate patch.

Additionally I took out the description change, as all other modules do not use a full stop at the end. In my opinion this should be changed in all watchdog descriptions at one in a separate patch.
Comment 77 Paul Menzel 2012-12-18 11:53:28 UTC
Created attachment 89421 [details]
[PATCH v6 1/2] sp5100_tco: Add module parameter `force_addr` to set MMIO address

Split patch to add module parameter. Commit message can be improved by adding an example I guess.
Comment 78 Paul Menzel 2012-12-18 11:55:33 UTC
Created attachment 89431 [details]
[PATCH v6 2/2] sp5100_tco: Add SB8x0 chipset support

Patch to add SB8x0 support. Comment and white space changes could be factored out.

Additionally it could be documented that SBxxx does not have a register to indicate that reboot was triggered by the watchdog. (If this is true.)
Comment 79 Takahisa Tanaka 2012-12-18 12:47:10 UTC
(In reply to comment #75)
> Takahisa, according to the vendor Web page, I do have a SB700 chipset. Though
> reading your commit message, SB700 should have been supported beforehand. But
> I did get the error message and your patch fixes it. What am I missing?

SB700 chipset is the same register construction as SP5100 chipset. Therefore, SB700 chipset generates the same problem as SP5100. 


> Reading the register guide [1] I do not find a register storing wether the
> reboot was triggered by the watchdog or not. Did you find a register/bit
> storing that information?

It is the WatchDogFired bit of the WatchDogTimer register of the page 233.
Comment 80 Paul Menzel 2012-12-18 12:56:19 UTC
(In reply to comment #79)
> (In reply to comment #75)
> > Takahisa, according to the vendor Web page, I do have a SB700 chipset.
> Though
> > reading your commit message, SB700 should have been supported beforehand.
> But
> > I did get the error message and your patch fixes it. What am I missing?
> 
> SB700 chipset is the same register construction as SP5100 chipset. Therefore,
> SB700 chipset generates the same problem as SP5100. 

But if I have SB700, why is the offset incorrect? Reading the commit message it sounds like only SB8xx has problems.

> > Reading the register guide [1] I do not find a register storing wether the
> > reboot was triggered by the watchdog or not. Did you find a register/bit
> > storing that information?
> 
> It is the WatchDogFired bit of the WatchDogTimer register of the page 233.

Missed those. :( Thanks!

»A value of “1” indicates that the watchdog timer has expired
and caused the current restart. The bit is cleared by writing a
“1” to bit 1 in the Watchdog Control register. Writing a “0” has
no effect. The bit is cleared by a power cycle or by the
operating system and it must remain cleared for any restart
that is not caused by the watchdog timer firing. The bit is only
valid when the watchdog is enabled.«
Comment 81 Paul Menzel 2012-12-18 12:57:52 UTC
(In reply to comment #77)
> Created an attachment (id=89421) [details]
> [PATCH v6 1/2] sp5100_tco: Add module parameter `force_addr` to set MMIO
> address
> 
> Split patch to add module parameter. Commit message can be improved by adding
> an example I guess.

I screwed up and did not move the declaration of `addr_str`to the first patch. I’ll send a correct patch, if Takahisa and the maintainers agree that splitting this up is wanted.
Comment 82 Paul Menzel 2012-12-18 13:11:26 UTC
Regarding the correct output, if the watchdog fired the reboot or not, I might try the old cold by giving the new MMIO address to the module I guess. I will try that during the next days and report back.

Looking over the code, Takahisa seems to be doing the same as before though. So it might be a hardware bug.
Comment 83 Takahisa Tanaka 2012-12-18 13:19:28 UTC
(In reply to comment #82)
> Regarding the correct output, if the watchdog fired the reboot or not, I
> might
> try the old cold by giving the new MMIO address to the module I guess. I will
> try that during the next days and report back.
> 
> Looking over the code, Takahisa seems to be doing the same as before though.
> So
> it might be a hardware bug.

I have re-tested my patch by SP5100 chipset that has the same register construction as SB700 chipset. However, I was not able to reproduce the same problem. The following is a log of a test result on SP5100 chipset, when OS reboots due to watchdog timer timeout.  

  Fedora17/x86_64(kernel-3.6.10-2.fc17)
    <In case of allocate_resource()>
      # dmesg | grep 5100
      [   10.751088] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
      [   10.751209] sp5100_tco: PCI Revision ID: 0x3d
      [   10.751267] sp5100_tco: Using 0xfec00400 for watchdog MMIO address
      [   10.751278] sp5100_tco: Last reboot was triggered by watchdog.
      [   10.751428] sp5100_tco: initialized (0xffffc90010884400). heartbeat=60 sec (nowayout=0, force_addr=none)
      # 

    <In case of force_addr>
      # dmesg | grep 5100
      [    9.297143] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
      [    9.298662] sp5100_tco: PCI Revision ID: 0x3d
      [    9.298704] sp5100_tco: Force the use of 0xfec00800 as MMIO address
      [    9.298753] sp5100_tco: Using 0xfec00800 for watchdog MMIO address
      [    9.298765] sp5100_tco: Last reboot was triggered by watchdog.
      [    9.298909] sp5100_tco: initialized (0xffffc90000334800). heartbeat=60 sec (nowayout=0, force_addr=0xfec00800)
      # 

  Fedora17/i686.PAE(kernel-3.6.10-2.fc17)
    <In case of allocate_resource()>
      # dmesg | grep 5100
      [    9.370740] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
      [    9.370870] sp5100_tco: PCI Revision ID: 0x3d
      [    9.370942] sp5100_tco: Using 0xfec00400 for watchdog MMIO address
      [    9.370953] sp5100_tco: Last reboot was triggered by watchdog.
      [    9.371079] sp5100_tco: initialized (0xf7e3c400). heartbeat=60 sec (nowayout=0, force_addr=none)
      # 

    <In case of force_addr>
     # dmesg | grep 5100
     [    8.960918] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
     [    8.961040] sp5100_tco: PCI Revision ID: 0x3d
     [    8.961073] sp5100_tco: Force the use of 0xfec00800 as MMIO address
     [    8.961115] sp5100_tco: Using 0xfec00800 for watchdog MMIO address
     [    8.961126] sp5100_tco: Last reboot was triggered by watchdog.
     [    8.961261] sp5100_tco: initialized (0xf7e24800). heartbeat=60 sec (nowayout=0, force_addr=0xfec00800)
     # 

So, at this time, I guess that this problem depend on the SB700 chipset or BIOS of your PC.
Comment 84 Takahisa Tanaka 2012-12-18 13:37:43 UTC
(In reply to comment #80)
> > SB700 chipset is the same register construction as SP5100 chipset.
> Therefore,
> > SB700 chipset generates the same problem as SP5100. 
> 
> But if I have SB700, why is the offset incorrect? Reading the commit message
> it
> sounds like only SB8xx has problems.

The offset address recommends 0xfec000f0 in SP5100 and SB7x0 and SB8x0 Southbridges BIOS Developer's Guide. As far as I know, almost all PCs that are using the SP5100 or SB7x0 or SB8x0 chipsets use 0xfec000f0 for an offset address. 

<In case of SB7x0 chipset>
http://developer.amd.com/wordpress/media/2012/10/43366_sb7xx_bdg_pub_1.00.pdf

Page 42

Ensure that the watchdog timer base address is set to a non zero value, typically 
0FEC000F0h. The watchdog base address is set at PMIO address 6Ch-6Fh as shown in 
the sample program below. (PMIO is addressed as byte index/data):
Comment 85 Takahisa Tanaka 2012-12-18 13:52:11 UTC
(In reply to comment #81)
> I screwed up and did not move the declaration of `addr_str`to the first
> patch.
> I’ll send a correct patch, if Takahisa and the maintainers agree that
> splitting
> this up is wanted.

I found out that my patch was merged into the linux-next and linux-watchdog-next repository, yesterday.

https://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=f245c1948ce9ff332bb45cc22ff29355dda6f258

http://www.linux-watchdog.org/cgi-bin/gitweb.cgi?p=linux-watchdog-next.git;a=commit;h=f245c1948ce9ff332bb45cc22ff29355dda6f258


At such a time as this, What should I do?


Regards,
Takahisa
Comment 86 Paul Menzel 2012-12-18 16:33:34 UTC
(In reply to comment #85)
> (In reply to comment #81)
> > I screwed up and did not move the declaration of `addr_str`to the first
> patch.
> > I’ll send a correct patch, if Takahisa and the maintainers agree that
> splitting
> > this up is wanted.
> 
> I found out that my patch was merged into the linux-next and
> linux-watchdog-next repository, yesterday.
> 
>
> https://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=f245c1948ce9ff332bb45cc22ff29355dda6f258
> 
>
> http://www.linux-watchdog.org/cgi-bin/gitweb.cgi?p=linux-watchdog-next.git;a=commit;h=f245c1948ce9ff332bb45cc22ff29355dda6f258
> 
> 
> At such a time as this, What should I do?

I guess be happy and let things be. As the subsystem maintainer committed it, your patch is good enough I guess. Congratulations and thanks again!
Comment 87 Takahisa Tanaka 2012-12-19 13:01:55 UTC
(In reply to comment #86)

Thank you, Paul. Thank you, everybody.

> I guess be happy and let things be. As the subsystem maintainer committed it,

I underdstand.

> your patch is good enough I guess. Congratulations and thanks again!

The problem of the Comment #69 has not been solved yet. I think that it is better to continue investigation in a new bug report. If you open a new bug report, although I don't have PC with SB700 chipset, I will cooperate in solving a problem.


Regards,
Takahisa
Comment 88 Florian Mickler 2012-12-22 09:26:05 UTC
A patch referencing this bug report has been merged in Linux v3.8-rc1:

commit 740fbddf5c3f9ad8b23c5d917ba1cc7e376a5104
Author: Takahisa Tanaka <mc74hc00@gmail.com>
Date:   Sun Dec 2 14:33:18 2012 +0900

    watchdog: sp5100_tco: Add SB8x0 chipset support
Comment 89 Takahisa Tanaka 2012-12-22 13:09:01 UTC
(In reply to comment #88)
> A patch referencing this bug report has been merged in Linux v3.8-rc1:

I appreciate your letting me know about that!

Thanks,
Takahisa
Comment 90 Takahisa Tanaka 2013-01-14 02:26:06 UTC
I have updated sp5100_tco.c. 

https://lkml.org/lkml/2013/1/13/176
https://lkml.org/lkml/2013/1/13/173

For your convenience, I will post the test result in my test environments. 
I confirmed that this patch works fine in my test environments.

<<<  In case of force_addr option on SP5100 chipset >>>
# dmesg | grep 5100
[    9.995987] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
[    9.996122] sp5100_tco: PCI Revision ID: 0x3d
[    9.996160] sp5100_tco: Force the use of 0xfec00800 as MMIO address
[    9.996201] sp5100_tco: Using 0xfec00800 for watchdog MMIO address
[    9.996212] sp5100_tco: Last reboot was triggered by watchdog.
[    9.996331] sp5100_tco: initialized (0xffffc90000334800). heartbeat=30 sec (nowayout=0, force_addr=0xfec00800)
#

<<<  In case of allocate_resource() on SP5100 chipset >>>
# dmesg | grep 5100
[   10.521378] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
[   10.521542] sp5100_tco: PCI Revision ID: 0x3d
[   10.521620] sp5100_tco: Using 0xfec00400 for watchdog MMIO address
[   10.521632] sp5100_tco: Last reboot was triggered by watchdog.
[   10.521761] sp5100_tco: initialized (0xffffc90010884400). heartbeat=30 sec (nowayout=0, force_addr=none)
#

<<<  In case of AcpiMmio on SB850 chipset >>>
# dmesg | grep 5100
[   76.259103] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
[   76.335323] sp5100_tco: PCI Revision ID: 0x41
[   76.424747] sp5100_tco: Using 0xfed80b00 for watchdog MMIO address
[   76.515239] sp5100_tco: Last reboot was triggered by watchdog.
[   76.601781] sp5100_tco: initialized (0xffffc900115f8b00). heartbeat=30 sec (nowayout=0, force_addr=none)
#

<<<  In case of force_addr option on SB850 chipset >>>
# dmesg | grep 5100
[   73.673064] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
[   73.765642] sp5100_tco: PCI Revision ID: 0x41
[   73.834344] sp5100_tco: Force the use of 0xfec00800 as MMIO address
[   73.925904] sp5100_tco: Using 0xfec00800 for watchdog MMIO address
[   74.016177] sp5100_tco: Last reboot was triggered by watchdog.
[   74.098472] sp5100_tco: initialized (0xffffc900115f8800). heartbeat=30 sec (nowayout=0, force_addr=0xfec00800)
#

<<<  In case of allocate_resource() on SB850 chipset >>>
# dmesg | grep 5100
[   73.709593] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
[   73.785697] sp5100_tco: PCI Revision ID: 0x41
[   73.837899] sp5100_tco: Using 0xfec00400 for watchdog MMIO address
[   73.911740] sp5100_tco: Last reboot was triggered by watchdog.
[   73.911868] sp5100_tco: initialized (0xffffc900115f8400). heartbeat=30 sec (nowayout=0, force_addr=none)
#


Thanks,
Takahisa
Comment 91 Paul Menzel 2013-02-06 08:36:32 UTC
(In reply to comment #90)
> I have updated sp5100_tco.c. 

Thank you very much and sorry for the late reply.

> https://lkml.org/lkml/2013/1/13/176
> https://lkml.org/lkml/2013/1/13/173

Thank you. The patches have been committed to watchdog-next in the meantime.

I will reply to them asking to add `CC: stable@vger.kernel.org` too.

> For your convenience, I will post the test result in my test environments. 
> I confirmed that this patch works fine in my test environments.

[…]

I applied your original patch (in 3.8-rc1) and your two patches from above on top of the stable series 3.7.5.

    commit 13280f4e2f8cd45240aef9c31643b17bffb3e9de
    Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Date:   Sun Jan 27 20:50:55 2013 -0800

        Linux 3.7.5

and tested it on a different board this time. The ASRock E350M1 [1] has a SB800 chipset (A50M).

    00:15.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0)
    00:15.1 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB700/SB800/SB900 PCI to PCI bridge (PCIE port 1)

Testing the not-working part, that it was not detected if the last reboot was triggered by the watchdog, I used Debian Sid/unstable as described in comment #67.

And it indeed worked with the default settings. Here is the output.

    [    1.876695] calling  sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] @ 97
    [    1.876775] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.03
    [    1.876883] sp5100_tco: PCI Revision ID: 0x42
    [    1.876996] sp5100_tco: Using 0xfed80b00 for watchdog MMIO address
    [    1.877010] sp5100_tco: Last reboot was triggered by watchdog.
    [    1.879322] sp5100_tco: initialized (0xf8176b00). heartbeat=60 sec (nowayout=0, force_addr=none)
    [    1.879427] initcall sp5100_tco_init_module+0x0/0x1000 [sp5100_tco] returned 0 after 2585 usecs


Thanks,

Paul


[1] http://www.asrock.com/mb/overview.asp?Model=E350M1
Comment 92 Takahisa Tanaka 2013-02-07 14:14:53 UTC
(In reply to comment #91)
> 
> I will reply to them asking to add `CC: stable@vger.kernel.org` too.

I'm sorry to have troubled you owing to my mistake. I should have reconfirmed the list of CC...


Regards,
Takahisa
Comment 93 David Einerson 2013-02-25 02:11:34 UTC
Created attachment 94011 [details]
dmesg for Linux ubuntu 3.5.0-25-generic x86_64

Linux <hostname> 3.5.0-25-generic #38-Ubuntu SMP Mon Feb 18 23:27:42 UTC 2013 x86_64 GNU/Linux
Comment 94 Paul Menzel 2013-02-25 07:17:50 UTC
(In reply to comment #93)

Dear David,


thank you for following up, but seriously …

> Created an attachment (id=94011) [details]
> dmesg for Linux ubuntu 3.5.0-25-generic x86_64

what is the point of attaching a text file, with a link to a paste site? Just past (this small) output to the comment here. I am doing it for you.

1. What board do you have? You could attach the output of `dmidecode` and `lspci` for example.

[   19.962304] ACPI Warning: 0x0000000000000b00-0x0000000000000b07 SystemIO 
conflicts with Region \_SB_.PCI0.SBRG.ASOC.SMRG 1 (20120320/utaddress-251)

You could report a bug to the Linux ACPI component about this error. They might be able to fix your BIOS. You should also contact your mainboard vendor about this.

[   19.962310] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[   19.976415] ip_tables: (C) 2000-2006 Netfilter Core Team
[   19.985265] sp5100_tco: SP5100 TCO WatchDog Timer Driver v0.01
[   19.985338] sp5100_tco: mmio address 0xb8fe00 already in use

Sure this is still there, …

[   20.024316] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   20.037771] EDAC MC: Ver: 2.1.0
[   20.155347] Linux video capture interface: v2.00
[   20.164345] kvm: Nested Virtualization enabled
[   20.164349] kvm: Nested Paging enabled
[   20.196862] uvcvideo: Found UVC 1.00 device <unnamed> (046d:0821)
[   20.207678] AMD64 EDAC driver v3.4.0
[   20.207712] EDAC amd64: DRAM ECC disabled.
[   20.207719] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.

> Linux <hostname> 3.5.0-25-generic #38-Ubuntu SMP Mon Feb 18 23:27:42 UTC 2013
> x86_64 GNU/Linux

… as this patch is only in Linux v3.8-rc1 as Florian commented in comment #88.

You could find the corresponding Ubuntu Launchpad report (it should be there already), reference this here and tell the Ubuntu folks to backport the patch. Just remember that it might cause problems [1][2].

[1] https://lkml.org/lkml/2013/2/18/353
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1116835
Comment 95 Takahisa Tanaka 2013-02-25 13:46:31 UTC
(In reply to comment #93)
> Created an attachment (id=94011) [details]
> dmesg for Linux ubuntu 3.5.0-25-generic x86_64
> 
> Linux <hostname> 3.5.0-25-generic #38-Ubuntu SMP Mon Feb 18 23:27:42 UTC 2013
> x86_64 GNU/Linux

Hi David,

You are using the sp5100_tco driver version 0.01. 

  [   19.985265] sp5100_tco: SP5100 TCO WatchDog Timer Driver v0.01

The sp5100_tco driver updated by patch of this thread is version 0.03, and the patch of this thread isn't applied to linux-image-3.5.0-25-generic. Therefore, the patch of this thread doesn't cause the problem.

I agree with Paul's advice. 

> You could report a bug to the Linux ACPI component about this error. They
> might be able to fix your BIOS. You should also contact your mainboard
> vendor about this.


Regards,
Takahisa
Comment 96 Florian Mickler 2013-03-04 21:21:06 UTC
A patch referencing this bug report has been merged in Linux v3.9-rc1:

commit 10ab329b5db7e592a3a60b4594e4e5f40b60c45c
Author: Takahisa Tanaka <mc74hc00@gmail.com>
Date:   Mon Jan 14 11:01:57 2013 +0900

    watchdog: sp5100_tco: Fix wrong indirect I/O access for getting value of reserved bits
Comment 97 Florian Mickler 2013-03-04 22:50:58 UTC
A patch referencing this bug report has been merged in Linux v3.9-rc1:

commit 41adafbd7b84c66c2cdad857b75d5d45032310a6
Author: Takahisa Tanaka <mc74hc00@gmail.com>
Date:   Mon Jan 14 11:01:58 2013 +0900

    watchdog: sp5100_tco: Write back the original value to reserved bits, instead of zero