Bug 170741 - Changes to i2c-piix4.c initialisation prevent loading of sp5100_tco watchdog driver on AMD SB800 chipset
Summary: Changes to i2c-piix4.c initialisation prevent loading of sp5100_tco watchdog ...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: I2C (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Drivers/I2C virtual user
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-19 09:17 UTC by Tim Small
Modified: 2018-07-10 12:02 UTC (History)
19 users (show)

See Also:
Kernel Version: 4.6.4
Tree: Mainline
Regression: No


Attachments
Patch to make i2c-piix4 coexist with sp5100_tco (10.64 KB, patch)
2017-03-29 12:52 UTC, Zoltan Boszormenyi
Details | Diff
Patch 1/3 to make i2c-piix4 coexist with sp5100_tco (1.69 KB, patch)
2017-04-03 07:54 UTC, Zoltan Boszormenyi
Details | Diff
Patch 2/3 to make i2c-piix4 coexist with sp5100_tco (5.21 KB, patch)
2017-04-03 07:54 UTC, Zoltan Boszormenyi
Details | Diff
Patch 3/3 to make i2c-piix4 coexist with sp5100_tco (5.12 KB, patch)
2017-04-03 07:55 UTC, Zoltan Boszormenyi
Details | Diff
Patch 1/5 Extend the request_region infrastructure (5.30 KB, patch)
2017-06-21 02:50 UTC, Zoltan Boszormenyi
Details | Diff
Patch 2/5 Modify behaviour of request_*muxed_region (1.56 KB, patch)
2017-06-21 02:51 UTC, Zoltan Boszormenyi
Details | Diff
Patch 3/5 Use request_declared_muxed_region() in the USB PCI quirks (1.52 KB, patch)
2017-06-21 02:52 UTC, Zoltan Boszormenyi
Details | Diff
Patch 4/5 Use request_declared_muxed_region() in i2c-piix4 (5.87 KB, patch)
2017-06-21 02:53 UTC, Zoltan Boszormenyi
Details | Diff
Patch 5/5 Use request_declared_muxed_region() in sp5100_tco (5.52 KB, patch)
2017-06-21 02:54 UTC, Zoltan Boszormenyi
Details | Diff
Patch 1/5 Extend the request_region infrastructure (6.09 KB, patch)
2017-06-22 13:24 UTC, Zoltan Boszormenyi
Details | Diff
Patch 2/5 Modify behaviour of request_*muxed_region (2.00 KB, patch)
2017-06-22 13:24 UTC, Zoltan Boszormenyi
Details | Diff
Patch 3/5 Use request_declared_muxed_region() in the USB PCI quirks (2.13 KB, patch)
2017-06-22 13:25 UTC, Zoltan Boszormenyi
Details | Diff
Patch 4/5 Use request_declared_muxed_region() in i2c-piix4 (5.44 KB, patch)
2017-06-22 13:25 UTC, Zoltan Boszormenyi
Details | Diff
Patch 5/5 Use request_declared_muxed_region() in sp5100_tco (5.67 KB, patch)
2017-06-22 13:26 UTC, Zoltan Boszormenyi
Details | Diff
Patch 1/5 Extend the request_region infrastructure (6.13 KB, patch)
2017-12-18 08:04 UTC, Zoltan Boszormenyi
Details | Diff
Patch 2/5 Modify behaviour of request_*muxed_region (2.03 KB, patch)
2017-12-18 08:05 UTC, Zoltan Boszormenyi
Details | Diff
Patch 3/5 Use request_declared_muxed_region() in the USB PCI quirks (2.17 KB, patch)
2017-12-18 08:05 UTC, Zoltan Boszormenyi
Details | Diff
Patch 4/5 Use request_declared_muxed_region() in i2c-piix4 (7.35 KB, patch)
2017-12-18 08:06 UTC, Zoltan Boszormenyi
Details | Diff
Patch 5/5 Use request_declared_muxed_region() in sp5100_tco (5.70 KB, patch)
2017-12-18 08:07 UTC, Zoltan Boszormenyi
Details | Diff

Description Tim Small 2016-09-19 09:17:55 UTC
On the AMD Turion N40L and other related SoCs, the i2c-piix4 driver now claims the 0xcd6 ioport, preventing the sp5100_tco watchdog driver from loading:

piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0
piix4_smbus 0000:00:14.0: Using register 0x2c for SMBus port selection
piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 0xb20
sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385, Revision ID: 0x42
sp5100_tco: I/O address 0x0cd6 already in use


This breaks watchdog operation on existing systems on upgrade and new deployments unless the i2c-piix4 driver is blacklisted.

See:

drivers/watchdog/sp5100_tco.c

tco_timer_enable(void)

(SB800_IO_PM_INDEX_REG is defined in drivers/watchdog/sp5100_tco.h)

This is the commit which prevents the watchdog driver from loading:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2fee61d22e606fc99ade9079fda15fdee83ec33e

See also AMD docs

45483_sb800_bdg_pub_3.03

Perhaps a fix is to switch both drivers from static to dynamic allocation of the IO ports in question, since the watchdog driver only accesses the port during initialisation (with backoff/retry maybe to avoid races?).
Comment 1 Paul Menzel 2017-01-29 22:07:18 UTC
I am seeing this too, and wonder why these commits haven’t been reverted yet.
Comment 2 Paul Menzel 2017-02-08 06:55:10 UTC
This issue is still present in Linux 4.10-rc6.
Comment 3 Paul Menzel 2017-02-08 07:07:26 UTC
In the Debian bug tracking system this issue has the number #853122.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=853122
Comment 4 Paul Menzel 2017-02-10 22:00:39 UTC
Thorsten, this is a serious regression, and was reported five months ago. Do you know, if it is already tracked somewhere else?
Comment 5 Nehal Shah 2017-03-03 07:11:08 UTC
Is anybody is looking into this regression ? It is serious concern as it is breaking the functionality if both watchdog timer and i2c are required to be loaded at the same time.
Comment 6 Zoltan Boszormenyi 2017-03-29 12:52:44 UTC
Created attachment 255627 [details]
Patch to make i2c-piix4 coexist with sp5100_tco

The attached patch makes both i2c-piix4 and sp5100_tco coexist peacefully.
With the patch present in the kernel, the boot messages are as below:

kernel: piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0
kernel: piix4_smbus 0000:00:14.0: Using register 0x2c for SMBus port selection
kernel: piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 0xb20
kernel: sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
kernel: sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385, Revision ID: 0x42
kernel: sp5100_tco: Using 0xfed80b00 for watchdog MMIO address
kernel: sp5100_tco: Last reboot was not triggered by watchdog.
kernel: sp5100_tco: initialized (0xffffb70d41925b00). heartbeat=60 sec (nowayout=0)

/dev/watchdog exists and work.

I tested it with samples/watchdog/watchdog-simple.c from the kernel sources.
After killing the watchdog-simple process, this appeared in the kernel:

kernel: sp5100_tco: Unexpected close, not stopping watchdog!

After reboot, the sp5100_tco driver message said this:

kernel: sp5100_tco: Last reboot was triggered by watchdog.
Comment 7 Zoltan Boszormenyi 2017-04-03 07:54:09 UTC
Created attachment 255731 [details]
Patch 1/3 to make i2c-piix4 coexist with sp5100_tco
Comment 8 Zoltan Boszormenyi 2017-04-03 07:54:34 UTC
Created attachment 255733 [details]
Patch 2/3 to make i2c-piix4 coexist with sp5100_tco
Comment 9 Zoltan Boszormenyi 2017-04-03 07:55:12 UTC
Created attachment 255735 [details]
Patch 3/3 to make i2c-piix4 coexist with sp5100_tco
Comment 10 Chris Horler 2017-04-16 09:47:27 UTC
patches tested on 4.11.0-rc6

chorler@foxtrot:~> dmesg | grep tco
[   16.643084] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
[   16.643219] sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385, Revision ID: 0x42
[   16.643226] sp5100_tco: I/O address 0x0cd6 already in use


I find the last message a little confusing
Comment 11 Zoltan Boszormenyi 2017-04-17 11:36:30 UTC
(In reply to Chris Horler from comment #10)
> patches tested on 4.11.0-rc6
> 
> chorler@foxtrot:~> dmesg | grep tco
> [   16.643084] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
> [   16.643219] sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385,
> Revision ID: 0x42
> [   16.643226] sp5100_tco: I/O address 0x0cd6 already in use
> 
> 
> I find the last message a little confusing

Something went wrong with your testing because after the 3rd patch is applied
this message is removed from sp5100_tco. Two other "MMIO address already in use"
messages are left but they have nothing to do with the I/O port pair 0xcd6-0xcd7.

You need all 3 patches attached to this issue.
Comment 12 Chris Horler 2017-04-17 13:37:05 UTC
you're 100% correct, I just checked the build log... I had the patches, but was still building a vanilla kernel.  I'll tweak and rebuild today/tomorrow and report back.

apologies!
Comment 13 Chris Horler 2017-04-17 19:49:43 UTC
rebuilt, I think it's now as you were expecting:

chorler@foxtrot:~> grep _tco sp5100_tco.log 
[   17.691272] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
[   17.691488] sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385, Revision ID: 0x42
[   17.691623] sp5100_tco: Using 0xfed80b00 for watchdog MMIO address
[   17.691638] sp5100_tco: Last reboot was not triggered by watchdog.
[   17.693445] sp5100_tco: initialized (0xffffa85e40655b00). heartbeat=60 sec (nowayout=0)
Comment 14 Zoltan Boszormenyi 2017-04-18 06:15:47 UTC
Yes. /dev/watchdog not exists and any watchdog daemon should work too.
Comment 15 Chris Horler 2017-04-18 18:36:52 UTC
(In reply to Zoltan Boszormenyi from comment #14)
> Yes. /dev/watchdog not exists and any watchdog daemon should work too.

yes, it exists.  I haven't tested systemd in that regard yet.
Comment 16 Zoltan Boszormenyi 2017-04-18 18:46:31 UTC
Damn typo. I wanted to write "/dev/watchdog no*w* exists".
The subsequent lines after the first two tell you that the driver actually
initialized the watchdog hardware.
Comment 17 Jeff Layton 2017-05-18 10:42:42 UTC
I'm hitting this error too on my new motherboard. I haven't tried the patches yet, but will plan to soon. Note that there _is_ a maintainer for WATCHDOG DEVICE DRIVERS in the MAINTAINERS file:

WATCHDOG DEVICE DRIVERS
M:      Wim Van Sebroeck <wim@iguana.be>
R:      Guenter Roeck <linux@roeck-us.net>
L:      linux-watchdog@vger.kernel.org
W:      http://www.linux-watchdog.org/
T:      git git://www.linux-watchdog.org/linux-watchdog.git
S:      Maintained
F:      Documentation/devicetree/bindings/watchdog/
F:      Documentation/watchdog/
F:      drivers/watchdog/
F:      include/linux/watchdog.h
F:      include/uapi/linux/watchdog.h

...it's probably worth re-sending these patches and making sure that maintainer and list are cc'ed?
Comment 18 Zoltan Boszormenyi 2017-05-18 10:58:45 UTC
(In reply to Jeff Layton from comment #17)
> I'm hitting this error too on my new motherboard. I haven't tried the
> patches yet, but will plan to soon. Note that there _is_ a maintainer for
> WATCHDOG DEVICE DRIVERS in the MAINTAINERS file:
> 
> WATCHDOG DEVICE DRIVERS
> M:      Wim Van Sebroeck <wim@iguana.be>
> R:      Guenter Roeck <linux@roeck-us.net>
> L:      linux-watchdog@vger.kernel.org
> W:      http://www.linux-watchdog.org/
> T:      git git://www.linux-watchdog.org/linux-watchdog.git
> S:      Maintained
> F:      Documentation/devicetree/bindings/watchdog/
> F:      Documentation/watchdog/
> F:      drivers/watchdog/
> F:      include/linux/watchdog.h
> F:      include/uapi/linux/watchdog.h
> 
> ...it's probably worth re-sending these patches and making sure that
> maintainer and list are cc'ed?

The list WAS cc-ed. The watchdog maintainer just needs to work on his mailing list backlog.
Comment 19 Marc Perrudin 2017-06-14 13:49:15 UTC
Hi,

I don't see the fix in the incoming 4.12 kernel. I need the watchdog, what the i2c-piix4 module is needed for. Can I blacklist it without problem ?

Thanks.
Comment 20 Marc Ranolfi 2017-06-20 10:17:51 UTC
Reviewed and tested by me with Linux 4.11.5 and 4.12rc6 for hardware ASUS Sabertooth 990FX R2.0. Works perfectly and the code is clear.

Though for consistency, I think the following blank lines should be removed:

drivers/i2c/busses/i2c-piix4.c:159
drivers/watchdog/sp5100_tco.c:148
drivers/watchdog/sp5100_tco.c:161

Also, I/O ports are specified in drivers/watchdog/sp5100_tco.c:45 but not in drivers/i2c/busses/i2c-piix4.c:148. Maybe these should be normalized.


Thanks Mr. Zoltán Böszörményi for the patches.


The watchdog maintainer is Mr. Wim Van Sebroeck, who's been rather absent for the past months. Can someone ping him? Or what else can we do to get this mainlined?
Having a non-operational hardware watchdog is a serious malfunction.

Also (offtopic), are Bugzilla bugs mirrored in the LKML or vice-versa?

Regards
Comment 21 Marc Ranolfi 2017-06-20 10:26:33 UTC
Oh... After posting the previous message, I saw a small info from Bugzilla displaying the list CC members. The maintainer wasn't included, so I added his email and also Mr. Guenter Roeck's email. Although I couldn't add linux-watchdog@vger.kernel.org - Bugzilla returns an error.

Perhaps Mr. Van Sebroeck and Mr. Roeck will take notice about this issue now.
Comment 22 Guenter Roeck 2017-06-20 13:45:19 UTC
This will require approval from the USB maintainer first, and v2 of that patch got it all wrong, as it removed a header file which declared 'extern struct mutex sb800_mutex;'. For my part, independent of the USB patch status, I am not inclined to accept an extern declaration in a source file.

WARNING: externs should be avoided in .c files
#51: FILE: drivers/watchdog/sp5100_tco.c:47:
+extern struct mutex sb800_mutex;

On top of that, the patch series makes the watchdog and i2c drivers dependent on USB_HOST. The series, as is, will thus cause lots of compile failures. The dependency itself is unnecessary and only needed because the introduced mutex resides in usb code, which it should not.

Overall I am (still) not sure I understand why request_muxed_region() can not be used to solve this problem.
Comment 23 Zoltan Boszormenyi 2017-06-20 15:18:50 UTC
The objection against the first version of the patches, i.e. the one that was
using request_muxed_region() was that the patch was not handling a potential
error from request_muxed_region() which can indeed fail, e.g. with an allocation
error when the memory is low.

The problem with handling the error from request_muxed_region() becomes obvious
if you look at how the USB PCI quirk code is used. If request_muxed_region()
fails then e.g. it would/should result in an URB transfer failure but it isn't
possible without major surgery to the USB code. The alternative is to proceed
without the locking provided by request_muxed_region(). Then there are two
choices: either go ahead and program the quirk without locking or you don't
program the quirk. It's unsafe if two kernel threads are programming the same
hardware that works through index/data register pairs, the index register can be
set to the wrong value for the other thread also expects it's the only one
doing it. Without programming the quirk, the result is the broken transfer the
comment talks about in pci-quirks.c about usb_amd_quirk_pll().

In the mail thread on LKML (linux-watchdog and linux-usb were Cc-ed) I have
said that it should be possible to implement a variant of request_muxed_region()
that accepts a pointer to an already allocated resource structure which can
possibly be on the stack instead of being allocated. That way the sole reason
of failure, the allocation error in request_muxed_region() would be eliminated
for the use case of these 3 drivers. No one replied to this idea, neither
positively nor negatively.

If we had such a request_muxed_region() variant that cannot fail and wait
just like mutex_lock(), then the impact on these 3 drivers would be the
same as with mutex_lock().

If this idea is acceptable, I can implement it.
Comment 24 Guenter Roeck 2017-06-20 18:04:13 UTC
There are several other possible solutions for the USB quirks problem. For example, the USB quirks code could read the data it needs from the ports once at startup instead of each time it is needed, or a dedicated non-failing API (to be used by all drivers) could be used to access the two ports.

I can not comment on the merits of implementing a non-failing version of request_muxed_region(). However, its absence is not an acceptable reason for creating other problems (such as unnecessary inter-dependencies between modules), nor for holding up a fix for the bug discussed here. The synchronization problem in the current USB quirks code is orthogonal to the problems observed here: Those problems can be fixed without addressing the USB quirks problem. The port access race condition of various drivers against the USB code is a different problem that can be solved separately (not that this would be optimal).
Comment 25 Zoltan Boszormenyi 2017-06-20 19:37:42 UTC
Unfortunately the interdependency is already there because 3 different
drivers want to use the same I/O port pair, possibly in parallel. It's how
these drivers were written. It's the responsibility of the 3 subsystem
developers to find a satisfactory resolution for the long term. This BZ
ticket is just the messenger so don't shoot. :-)

Maybe, just maybe the best way to eliminate the regression would still be
to revert commit 2fee61d22e606fc99ade9079fda15fdee83ec33e as it caused
the regression in the first place in 4.4-rc4. Almost two years ago and
no one has responded to the bug reports.

sp5100_tco was there first to request_region() for 0xcd6/0xcd7.
The USB quirks code doesn't even do locking for these ports and
it can get away with it. Perhaps the same should be true for i2c-piix4
as a bandaid.

I don't want to create a common platform device with the sole purpose
to provide a common lock for these drivers. The inter-dependency would
also be there this way. As a more abstract example, you shouldn't be
able to use more than one filesystem types on the same harddisk, because
they use the same common hardware resource. It's an obvious interdependency
between EXT4 and VFAT!

In the meantime, I implemented the non-failing version of request_muxed_region()
Actualy, I modified the current __request_region() so a new IORESOURCE_MUXED
request always goes to sleep but complains loudly if a conflicting resource
doesn't have the IORESOURCE_MUXED flag. This is one failure case.
The advantage of this change is that we can know what causes a resource
conflict. It works just like before when all users of the same I/O region
use IORESOURCE_MUXED. The main parts of __request_region() is factored out
into a new function that doesn't allocate, the new function accepts a
struct resource *, the actual structure can be anywhere. It's compiling at
the moment. I will report back after I installed and booted the patched
4.11.6 kernel. The patches were created against the current Linux GIT.
Comment 26 Guenter Roeck 2017-06-20 20:44:57 UTC
Yes, the interdependency is there. That doesn't mean it has to be fixed with a single patch or a single patch series. It would be much easier to address the immediate problem first (ie the watchdog and i2c drivers blocking each other) and then discuss a solution that fixes all three.

That is, of course, just my personal opinion. All I can say is that I won't accept a watchdog driver change which declares an external variable in a source file, nor one that creates a dependency against a logically unrelated subsystem.
Comment 27 Marc Ranolfi 2017-06-21 01:19:10 UTC
I agree with Roeck that implementing the mutex in USB_HOST is wrong.

And while I agree that commit 2fee61d22e606fc99ade9079fda15fdee83ec33e is far from ideal, I also believe that reverting it is out of question at this point since that would itself be a regression (we shouldn't break current hw sensor support).

I also agree that the optimal course of action now is to focus on sorting the i2c-piix4/sp5100_tco conflict as a priority. There's still a few days ahead before kernel 4.12-rc7.

Regarding the proposed "bandaid", it is a bad idea indeed.

Though I didn't know about declaring extern's in C files before, why wasn't the 'sb800_mutex' mutex declared in the header file? sp5100_tco.h exists (and i2c-piix4.h doesn't exist yet - but creating should be fine).

Also, Böszörményi, Where is the review for the first version? I'd like to read it. If an of out-of-memory scenario can cause request_muxed_region() to fail, I'm willing to learn how others dealt with this elsewhere.
Comment 28 Zoltan Boszormenyi 2017-06-21 02:50:40 UTC
Created attachment 257101 [details]
Patch 1/5 Extend the request_region infrastructure
Comment 29 Zoltan Boszormenyi 2017-06-21 02:51:25 UTC
Created attachment 257103 [details]
Patch 2/5 Modify behaviour of request_*muxed_region
Comment 30 Zoltan Boszormenyi 2017-06-21 02:52:49 UTC
Created attachment 257105 [details]
Patch 3/5 Use request_declared_muxed_region() in the USB PCI quirks
Comment 31 Zoltan Boszormenyi 2017-06-21 02:53:43 UTC
Created attachment 257107 [details]
Patch 4/5 Use request_declared_muxed_region() in i2c-piix4
Comment 32 Zoltan Boszormenyi 2017-06-21 02:54:14 UTC
Created attachment 257109 [details]
Patch 5/5 Use request_declared_muxed_region() in sp5100_tco
Comment 33 Zoltan Boszormenyi 2017-06-21 03:06:29 UTC
The new patch series extends and modifies the request_*region() handling
the way I wrote above.

The driver patches don't use a common mutex anymore, so there is no
inter-dependency any more than with using request_region() to exclude
each other from using the same I/O ports. They are using the new
request_declared_muxed_region() macro and the pre-existing
DEFINE_RES_IO_NAMED() static initializer macro for declaring their
struct resource on the stack.

It boots and works.

[    5.687884] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0
[    5.687891] piix4_smbus 0000:00:14.0: Using register 0x2c for SMBus port selection
[    5.688266] piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 0xb20
[    5.696899] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
[    5.696966] sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385, Revision ID: 0x42
[    5.697002] sp5100_tco: Using 0xfed80b00 for watchdog MMIO address
[    5.697012] sp5100_tco: Last reboot was not triggered by watchdog.
[    5.697776] sp5100_tco: initialized (0xffffba2f4192db00). heartbeat=60 sec (nowayout=0)
Comment 34 Zoltan Boszormenyi 2017-06-21 03:30:44 UTC
(In reply to Marcelo "Marc" Ranolfi from comment #27)
> Also, Böszörményi, Where is the review for the first version? I'd like to
> read it. If an of out-of-memory scenario can cause request_muxed_region() to
> fail, I'm willing to learn how others dealt with this elsewhere.

The first patch series I sent out was here:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1368060.html
My memory was failing, the first version of the patches also used a common mutex.
Comment 35 Marc Ranolfi 2017-06-21 03:54:24 UTC
Thanks! Looks good, I'm testing this first thing tomorrow.

Can you please have a look at this question I opened earlier on Unix Stack Exchange? It's about version 2 of your patch series, but still. https://unix.stackexchange.com/questions/372376/watchdog-reboot-not-properly-reported
Comment 36 Zoltan Boszormenyi 2017-06-21 04:45:48 UTC
Sorry, I don't know what caused that message. I am using the Fedora kernel
SRPM to test my patches and the v2 patchset was properly reporting that
the last reboot was caused by the watchdog after I killed the simple watchdog
daemon back then when I was testing it. This v3 should not be different in
this regard.
Comment 37 Marc Ranolfi 2017-06-21 23:17:24 UTC
Tested and working.

On system reboot, this is reported:
"sp5100_tco: Unexpected close, not stopping watchdog!"

However, following a web search, I found this bug was present even before the offending commit 2fee61d22e606fc99ade9079fda15fdee83ec33e. Safe to ignore for now, since the machine reboots as expected. Only happens when rebooting (shutdown not affected).

Some considerations about the code:

patch #1
a) declarations in ioport.h:227 and ioport.h:230 should probably be switched (in order) for consistency.
b) in resource.c, function __request_region() should probably be left where it was previously. The new function __request_declared_region() in lines 1124-1171 should be moved below that.

patch # 2
c) a high-level description would be nice to have near lines 1117-1119.
d) It appears to me that ` & IORESOURCE_MUXED` is redundant in the context of file resource.c:1154. `if (!conflict->flags)` should suffice. I'm not sure that's the condition you intended, though.

patch #4
f) re-definition of `res` in file i2c-piix4.c:349, previously defined in line 263.
Comment 38 Marc Ranolfi 2017-06-22 00:28:09 UTC
Gah, I should have remembered bitwise operators. Please disregard me previous point d). Also I went from d) to f), sorry about that too.

I'll abstain from making further code reviews for now.
Comment 39 Zoltan Boszormenyi 2017-06-22 11:21:29 UTC
(In reply to Marcelo "Marc" Ranolfi from comment #37)
> Tested and working.

Thanks for testing.

> On system reboot, this is reported:
> "sp5100_tco: Unexpected close, not stopping watchdog!"
> 
> However, following a web search, I found this bug was present even before
> the offending commit 2fee61d22e606fc99ade9079fda15fdee83ec33e. Safe to
> ignore for now, since the machine reboots as expected. Only happens when
> rebooting (shutdown not affected).
> 
> Some considerations about the code:
> 
> patch #1
> a) declarations in ioport.h:227 and ioport.h:230 should probably be switched
> (in order) for consistency.

Done.

Also added new macros to make the extended API complete:
request_declared_region() and release_declared_region()

> b) in resource.c, function __request_region() should probably be left where
> it was previously. The new function __request_declared_region() in lines
> 1124-1171 should be moved below that.

Done. The code churn in the patch is a little less now.

> 
> patch # 2
> c) a high-level description would be nice to have near lines 1117-1119.

Done, added extra doc to the first patch for the extra macros and
extended it in the second patch for the behaviour change.

> d) It appears to me that ` & IORESOURCE_MUXED` is redundant in the context
> of file resource.c:1154. `if (!conflict->flags)` should suffice. I'm not
> sure that's the condition you intended, though.

You already noticed that it's a bitwise AND and not a logical AND.
"flags" has bits inside.

> 
> patch #4
> f) re-definition of `res` in file i2c-piix4.c:349, previously defined in
> line 263.

Fixed, it's moved to a global static variable.
This was done in all of the patches. The search for conflicting regions
only checks the start and end values but not the addresses of resource
descriptors so it should be safe(TM).

It's building now and if testing shows no problems I will post new patches
in mail and here, too.
Comment 40 Zoltan Boszormenyi 2017-06-22 13:24:05 UTC
Created attachment 257119 [details]
Patch 1/5 Extend the request_region infrastructure
Comment 41 Zoltan Boszormenyi 2017-06-22 13:24:36 UTC
Created attachment 257121 [details]
Patch 2/5 Modify behaviour of request_*muxed_region
Comment 42 Zoltan Boszormenyi 2017-06-22 13:25:00 UTC
Created attachment 257123 [details]
Patch 3/5 Use request_declared_muxed_region() in the USB PCI quirks
Comment 43 Zoltan Boszormenyi 2017-06-22 13:25:26 UTC
Created attachment 257125 [details]
Patch 4/5 Use request_declared_muxed_region() in i2c-piix4
Comment 44 Zoltan Boszormenyi 2017-06-22 13:26:13 UTC
Created attachment 257127 [details]
Patch 5/5 Use request_declared_muxed_region() in sp5100_tco
Comment 45 Tim Small 2017-11-08 10:26:15 UTC
Unfortunately I haven't had time to test the latest round of patches from Zoltan, due to personal circumstances, but I can do-so if this would still be helpful?
Comment 46 Zoltan Boszormenyi 2017-12-18 08:04:48 UTC
Created attachment 261229 [details]
Patch 1/5 Extend the request_region infrastructure
Comment 47 Zoltan Boszormenyi 2017-12-18 08:05:28 UTC
Created attachment 261231 [details]
Patch 2/5 Modify behaviour of request_*muxed_region
Comment 48 Zoltan Boszormenyi 2017-12-18 08:05:55 UTC
Created attachment 261233 [details]
Patch 3/5 Use request_declared_muxed_region() in the USB PCI quirks
Comment 49 Zoltan Boszormenyi 2017-12-18 08:06:34 UTC
Created attachment 261235 [details]
Patch 4/5 Use request_declared_muxed_region() in i2c-piix4
Comment 50 Zoltan Boszormenyi 2017-12-18 08:07:08 UTC
Created attachment 261237 [details]
Patch 5/5 Use request_declared_muxed_region() in sp5100_tco
Comment 51 Zoltan Boszormenyi 2017-12-18 08:07:43 UTC
New patchset is attached that applies to 4.15-rc4.
Comment 52 Andreas 2018-02-08 21:08:21 UTC
I tried your patchset from 2017-12-18 on gentoo-sources kernel 4.15 and this was the result (I additionally activated i2c debug output):

# cat dmesg-t | grep -e i2c -e piix4 -e sp5100 -e SB800
i2c-core: driver [dummy] registered
i2c i2c-0: adapter [AMDGPU DM i2c hw bus 0] registered
i2c i2c-1: adapter [dmdc] registered
i2c i2c-2: adapter [AMDGPU DM i2c hw bus 1] registered
i2c i2c-3: adapter [dmdc] registered
i2c i2c-4: adapter [AMDGPU DM i2c hw bus 2] registered
i2c i2c-5: adapter [AMDGPU DM i2c hw bus 3] registered
i2c i2c-6: adapter [AMDGPU DM i2c hw bus 4] registered
i2c i2c-6: master_xfer[0] W, addr=0x50, len=1
i2c i2c-6: master_xfer[1] R, addr=0x50, len=1
i2c i2c-6: master_xfer[0] W, addr=0x50, len=1
i2c i2c-6: master_xfer[1] R, addr=0x50, len=128
i2c i2c-6: master_xfer[0] W, addr=0x50, len=1
i2c i2c-6: master_xfer[1] R, addr=0x50, len=128
piix4_smbus 0000:00:14.0: Using SMI# for SMBus
piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0
piix4_smbus 0000:00:14.0: Using register 0x02 for SMBus port selection
i2c i2c-7: adapter [SMBus PIIX4 adapter port 0 at 0b00] registered
i2c i2c-8: adapter [SMBus PIIX4 adapter port 2 at 0b00] registered
i2c i2c-9: adapter [SMBus PIIX4 adapter port 3 at 0b00] registered
i2c i2c-10: adapter [SMBus PIIX4 adapter port 4 at 0b00] registered
sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x790b, Revision ID: 0x59
sp5100_tco: failed to find MMIO address, giving up.

For references, this was before the patch (without i2c debug):
# cat dmesg-t | grep -e i2c -e piix4 -e sp5100 -e SB800                                                                                                                                                                                     
i2c-core: driver [dummy] registered                                                                                                                                                                                                         
piix4_smbus 0000:00:14.0: Using SMI# for SMBus                                                                                                                                                                                              
piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0                                                                                                                                                                        
piix4_smbus 0000:00:14.0: Using register 0x02 for SMBus port selection                                                                                                                                                                      
i2c i2c-0: adapter [SMBus PIIX4 adapter port 0 at 0b00] registered                                                                                                                                                                          
i2c i2c-1: adapter [SMBus PIIX4 adapter port 2 at 0b00] registered
i2c i2c-2: adapter [SMBus PIIX4 adapter port 3 at 0b00] registered
i2c i2c-3: adapter [SMBus PIIX4 adapter port 4 at 0b00] registered
sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x790b, Revision ID: 0x59
sp5100_tco: I/O address 0x0cd6 already in use

# cat dmesg-t | grep -e 0cd6
system 00:05: [io  0x0cd6-0x0cd7] has been reserved
sp5100_tco: I/O address 0x0cd6 already in use

Now, piix4_smbus is not i2c_piix4... Still, address 0x0cd6 is part of the x86 system debug (which one? pci? pnp?), and even if I don't seem to grasp which driver is using this address, sp5100_tco remains unable to pick it up. So there are other problems as well.

What can I do to help?
Comment 53 Andreas 2018-02-08 21:11:42 UTC
Fount it: it's smba_idx

# cat /proc/ioports | grep -e 0cd6
  0cd6-0cd7 : pnp 00:05
    0cd6-0cd7 : smba_idx

There's mothing in dmesg about smba_idx...
Comment 54 Guenter Roeck 2018-02-08 22:24:12 UTC
smba_idx is the i2c-piix4 driver. The statement "piix4_smbus is not i2c_piix4" does not really make sense since the i2c-piix4 driver uses piix4_smbus as its driver name.
The out-of-tree i2c driver patches are still needed in v4.15 (and in ToT) since none of the patches submitted to fix the problem (neither Zoltan's patch nor mine) have been accepted upstream.
The sp5100_tco watchdog driver was reworked substantially in the upstream kernel. I would suggest to use the latest version from ToT for any tests going forward.
Comment 55 Andreas 2018-02-08 23:54:37 UTC
Well, then the patches from Zoltan Boszormenyi from 2017-12-18 don't work for me:

sp5100_tco: failed to find MMIO address, giving up.

What's wrong?

This is kernel 4.15 I'm using. My hardware is an ASUS PRIME X370-PRO mainboard and an AMD Ryzen 7 1800X processor. UEFI firmware is the latest version. Graphics card is an AMD Radeon RX Vega 64, driver AMDGPU+DC.

What does ToT mean?
Comment 56 Guenter Roeck 2018-02-09 01:30:23 UTC
ToT means "Top of Tree" and refers to the upstream Linux kernel.

As for "what is wrong", please refer to the post-v4.15 upstream commits in drivers/watchdog/sp5100_tco.c. In a nutshell, though, the driver in v4.15 does not support Ryzen CPUs.

Note that even with Ryzen support there is no guarantee that the driver actually works. For example, it is disabled on various MSI boards such as B350M MORTAR or B350 TOMAHAWK.
Comment 57 Andreas 2018-02-11 19:20:38 UTC
(In reply to Guenter Roeck from comment #56)
> ToT means "Top of Tree" and refers to the upstream Linux kernel.

Thanks.

> As for "what is wrong", please refer to the post-v4.15 upstream commits in
> drivers/watchdog/sp5100_tco.c. In a nutshell, though, the driver in v4.15
> does not support Ryzen CPUs.

Thanks for the summary. I'll just have to wait then... and hope that it will be made Ryzen compatible.

> Note that even with Ryzen support there is no guarantee that the driver
> actually works. For example, it is disabled on various MSI boards such as
> B350M MORTAR or B350 TOMAHAWK.

Yes, it already occured to me that it could also be dependant on the actual BIOS/UEFI implementation and/or version. Very sad that vendors don't test their firmware for Linux compatibility...
Comment 58 Guenter Roeck 2018-02-11 20:13:08 UTC
(In reply to Andreas from comment #57)
> (In reply to Guenter Roeck from comment #56)
> > ToT means "Top of Tree" and refers to the upstream Linux kernel.
> 
> Thanks.
> 
> > As for "what is wrong", please refer to the post-v4.15 upstream commits in
> > drivers/watchdog/sp5100_tco.c. In a nutshell, though, the driver in v4.15
> > does not support Ryzen CPUs.
> 
> Thanks for the summary. I'll just have to wait then... and hope that it will
> be made Ryzen compatible.
> 
It will be, in v4.16. However, you'll still need the piix4 i2c changes, which won't be in v4.16.

> > Note that even with Ryzen support there is no guarantee that the driver
> > actually works. For example, it is disabled on various MSI boards such as
> > B350M MORTAR or B350 TOMAHAWK.
> 
> Yes, it already occured to me that it could also be dependant on the actual
> BIOS/UEFI implementation and/or version. Very sad that vendors don't test
> their firmware for Linux compatibility...

AFAICS that is on purpose. The MSI boards use the watchdog on the Super-IO chip.
Comment 59 Przemek 2018-04-04 07:15:51 UTC
(In reply to Guenter Roeck from comment #58)
 
> It will be, in v4.16. However, you'll still need the piix4 i2c changes,
> which won't be in v4.16.

Guenter,
so seeing in 4.16 dmesg output:

"[   14.571240] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
[   14.571349] sp5100-tco sp5100-tco: Using 0xfed80b00 for watchdog MMIO address
[   14.571354] sp5100-tco sp5100-tco: Watchdog hardware is disabled"

I should not worry about it since sp5100 was working previously on 4.15.14:

"[   13.507867] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
[   13.508519] sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x780b, Revision ID: 0x42
[   13.508555] sp5100_tco: Using 0xfeb00000 for watchdog MMIO address
[   13.508562] sp5100_tco: Last reboot was triggered by watchdog.
[   13.509527] sp5100_tco: initialized (0x00000000b22ce291). heartbeat=60 sec (nowayout=0)"

Gentoo ~amd64, kernel 4.16, Lenovo A6-6310 APU netbook, AMD SB700/SB800/Hudson-2/3, CONFIG_ATA_PIIX=y, CONFIG_I2C_PIIX4=y, CONFIG_SP5100_TCO=m.

Thanks,
Przemek.
Comment 60 Andreas 2018-04-08 14:36:46 UTC
Patch 5/5 fails on 4.16.0.

This is my patch attempt with all the patches provided:

kernel-patches/patch_1_of_5.patch                                                                                                                                                                                                           
patching file include/linux/ioport.h                                                                                                                                                                                                        
patching file kernel/resource.c                                                                                                                                                                                                             
Hunk #3 succeeded at 1134 (offset 1 line).                                                                                                                                                                                                  
Hunk #4 succeeded at 1163 (offset 1 line).                                                                                                                                                                                                  
Hunk #5 succeeded at 1172 (offset 1 line).                                                                                                                                                                                                  
Hunk #6 succeeded at 1220 (offset 1 line).                                                                                                                                                                                                  
kernel-patches/patch_2_of_5.patch                                                                                                                                                                                                           
patching file kernel/resource.c                                                                                                                                                                                                             
Hunk #1 succeeded at 1144 (offset 1 line).                                                                                                                                                                                                  
Hunk #2 succeeded at 1211 (offset 1 line).                                                                                                                                                                                                  
kernel-patches/patch_3_of_5.patch                                                                                                                                                                                                           
patching file drivers/usb/host/pci-quirks.c                                                                                                                                                                                                 
Hunk #1 succeeded at 315 (offset 17 lines).                                                                                                                                                                                                 
Hunk #2 succeeded at 352 (offset 17 lines).                                                                                                                                                                                                 
kernel-patches/patch_4_of_5.patch                                                                                                                                                                                                           
patching file drivers/i2c/busses/i2c-piix4.c
kernel-patches/patch_5_of_5.patch
patching file drivers/watchdog/sp5100_tco.c
Hunk #1 FAILED at 48.
Hunk #2 succeeded at 73 (offset 3 lines).
Hunk #3 FAILED at 144.
Hunk #4 FAILED at 155.
Hunk #5 FAILED at 169.
Hunk #6 FAILED at 366.
Hunk #7 FAILED at 385.
Hunk #8 FAILED at 405.
Hunk #9 FAILED at 413.
Hunk #10 FAILED at 434.
Hunk #11 FAILED at 474.
Hunk #12 FAILED at 522.
Hunk #13 FAILED at 536.
12 out of 13 hunks FAILED -- saving rejects to file drivers/watchdog/sp5100_tco.c.rej

How should I proceed?
Comment 61 Guenter Roeck 2018-04-10 16:49:36 UTC
(In reply to Przemek from comment #59)
> (In reply to Guenter Roeck from comment #58)
>  
> > It will be, in v4.16. However, you'll still need the piix4 i2c changes,
> > which won't be in v4.16.
> 
> Guenter,
> so seeing in 4.16 dmesg output:
> 
> "[   14.571240] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
> [   14.571349] sp5100-tco sp5100-tco: Using 0xfed80b00 for watchdog MMIO
> address
> [   14.571354] sp5100-tco sp5100-tco: Watchdog hardware is disabled"
> 
> I should not worry about it since sp5100 was working previously on 4.15.14:
> 
> "[   13.507867] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
> [   13.508519] sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x780b,
> Revision ID: 0x42
> [   13.508555] sp5100_tco: Using 0xfeb00000 for watchdog MMIO address
> [   13.508562] sp5100_tco: Last reboot was triggered by watchdog.
> [   13.509527] sp5100_tco: initialized (0x00000000b22ce291). heartbeat=60
> sec (nowayout=0)"
> 

Was it _really_ working, or just loading ? What happens if you trigger a watchdog timeout with 4.15.14 ? Does it reboot the system ?
Comment 62 Przemek 2018-04-11 09:59:48 UTC
(In reply to Guenter Roeck from comment #61)
> Was it _really_ working, or just loading ? What happens if you trigger a
> watchdog timeout with 4.15.14 ? Does it reboot the system ?

I think that you are correct. IMHO this is the case that is covered by this bug report and I think that my question was relevant if I should worry about message in dmesg that 'Watchdog hardware is disabled' or not. I just assumed that there is a big rework in sp5100 driver.

I could not apply last 5th patch on 4.16 because of rejected hunks thus impossible for me to test it out.

I am not in a hurry but wanted to make sure if I am not mistaken.

Command outputs on 4.15.14:

"wd_identify -v --config-file /root/.test.conf
wd_identify: String 'watchdog-device' found as '/dev/watchdog'
wd_identify: String 'log-dir' found as '/var/log/watchdog'
wd_identify: Variable 'realtime' found as 'yes' = 1
wd_identify: Integer 'priority' found = 1
SP5100 TCO timer
watchdog was set to 60 seconds"


"watchdog -v --config-file /root/.test.conf
watchdog: String 'watchdog-device' found as '/dev/watchdog'
watchdog: String 'log-dir' found as '/var/log/watchdog'
watchdog: Variable 'realtime' found as 'yes' = 1
watchdog: Integer 'priority' found = 1"

"dmesg | grep sp5100
[   13.136877] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
[   13.136977] sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x780b, Revision ID: 0x42
[   13.137008] sp5100_tco: Using 0xfeb00000 for watchdog MMIO address
[   13.137015] sp5100_tco: Last reboot was triggered by watchdog.
[   13.137088] sp5100_tco: initialized (0x000000000e62c704). heartbeat=60 sec (nowayout=0)
[ 1100.329890] sp5100_tco: Unexpected close, not stopping watchdog!"


"wd_identify --config-file /root/.test.conf
wd_identify: cannot open /dev/watchdog (errno = 16 = 'Device or resource busy')"

The watchdog(daemon?) is still running according to 'top'.

Thanks,
Przemek.
Comment 63 Guenter Roeck 2018-04-11 13:37:18 UTC
(In reply to Przemek from comment #62)
> (In reply to Guenter Roeck from comment #61)
> > Was it _really_ working, or just loading ? What happens if you trigger a
> > watchdog timeout with 4.15.14 ? Does it reboot the system ?
> 
> I think that you are correct. IMHO this is the case that is covered by this
> bug report and I think that my question was relevant if I should worry about
> message in dmesg that 'Watchdog hardware is disabled' or not. I just assumed
> that there is a big rework in sp5100 driver.
> 
> I could not apply last 5th patch on 4.16 because of rejected hunks thus
> impossible for me to test it out.
> 

None of the patches submitted here will make it into the kernel. You'll need to apply commits 04b6fcaba346 and 0e89b2fec748 from mainline.

...

> 
> The watchdog(daemon?) is still running according to 'top'.
> 

You'll have to stop the watchdog daemon, then access /dev/watchdog.
Comment 64 Marc Ranolfi 2018-04-14 00:55:05 UTC
Guenter, you said previously (2018-02-08) that your own patches to fix this one were not accepted upstream. Why weren't they?

If this is still an ongoing issue I'd be willing to tackle on it.
Comment 65 Guenter Roeck 2018-04-14 01:57:00 UTC
The patches have now been accepted and are available in the mainline kernel.

04b6fcaba346 i2c: piix4: Use request_muxed_region
0e89b2fec748 i2c: piix4: Use usleep_range()
Comment 66 Przemek 2018-04-15 09:00:22 UTC
Just to inform you, it seems that I was totally wrong, and Guenter was right. If those are the only pathes that are missing from 4.16 to make sp5100 work, applying them does not make any change. Watchog hardware is still disabled, "/dev/watchdog" does not exist. In both kernels sp5100 diver is loaded.

I apologize for hijacking this bug report.

I was trying to trigger watchog timeout manually on 4.15.14 by "echo '\0x00' > /dev/watchdog" with watchdog daemon turned off, but received same message "sp5100_tco: Unexpected close, not stopping watchdog!"

If I am doing everything right, it becoming obvious that this is impossible to manually play with this hardware. This is a Lenovo netbook, and there is anything in the machine bios to setup watchdog.

If anyone has another idea and if I should open another bug report to work this out please let me know.

lspci 4.16.2 with patches applied:

"00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:780b] (rev 42)
        Subsystem: Lenovo FCH SMBus Controller [17aa:3801]
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Kernel modules: sp5100_tco"

Once again I'm sorry for whole this mess.

Thanks,
Przemek.
Comment 67 Guenter Roeck 2018-04-15 13:53:21 UTC
The 4.15 message is ok and as expected. Question is if the system reboots subsequently.
Comment 68 Przemek 2018-04-15 16:40:05 UTC
(In reply to Guenter Roeck from comment #67)
> The 4.15 message is ok and as expected. Question is if the system reboots
> subsequently.

No it doesn't. That's why I've reached a grim conclusion about it.

Thanks,
Przemek.

Note You need to log in before you can comment on or make changes to this bug.