On the AMD Turion N40L and other related SoCs, the i2c-piix4 driver now claims the 0xcd6 ioport, preventing the sp5100_tco watchdog driver from loading: piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0 piix4_smbus 0000:00:14.0: Using register 0x2c for SMBus port selection piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 0xb20 sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05 sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385, Revision ID: 0x42 sp5100_tco: I/O address 0x0cd6 already in use This breaks watchdog operation on existing systems on upgrade and new deployments unless the i2c-piix4 driver is blacklisted. See: drivers/watchdog/sp5100_tco.c tco_timer_enable(void) (SB800_IO_PM_INDEX_REG is defined in drivers/watchdog/sp5100_tco.h) This is the commit which prevents the watchdog driver from loading: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2fee61d22e606fc99ade9079fda15fdee83ec33e See also AMD docs 45483_sb800_bdg_pub_3.03 Perhaps a fix is to switch both drivers from static to dynamic allocation of the IO ports in question, since the watchdog driver only accesses the port during initialisation (with backoff/retry maybe to avoid races?).
I am seeing this too, and wonder why these commits haven’t been reverted yet.
This issue is still present in Linux 4.10-rc6.
In the Debian bug tracking system this issue has the number #853122. [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=853122
Thorsten, this is a serious regression, and was reported five months ago. Do you know, if it is already tracked somewhere else?
Is anybody is looking into this regression ? It is serious concern as it is breaking the functionality if both watchdog timer and i2c are required to be loaded at the same time.
Created attachment 255627 [details] Patch to make i2c-piix4 coexist with sp5100_tco The attached patch makes both i2c-piix4 and sp5100_tco coexist peacefully. With the patch present in the kernel, the boot messages are as below: kernel: piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0 kernel: piix4_smbus 0000:00:14.0: Using register 0x2c for SMBus port selection kernel: piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 0xb20 kernel: sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05 kernel: sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385, Revision ID: 0x42 kernel: sp5100_tco: Using 0xfed80b00 for watchdog MMIO address kernel: sp5100_tco: Last reboot was not triggered by watchdog. kernel: sp5100_tco: initialized (0xffffb70d41925b00). heartbeat=60 sec (nowayout=0) /dev/watchdog exists and work. I tested it with samples/watchdog/watchdog-simple.c from the kernel sources. After killing the watchdog-simple process, this appeared in the kernel: kernel: sp5100_tco: Unexpected close, not stopping watchdog! After reboot, the sp5100_tco driver message said this: kernel: sp5100_tco: Last reboot was triggered by watchdog.
Created attachment 255731 [details] Patch 1/3 to make i2c-piix4 coexist with sp5100_tco
Created attachment 255733 [details] Patch 2/3 to make i2c-piix4 coexist with sp5100_tco
Created attachment 255735 [details] Patch 3/3 to make i2c-piix4 coexist with sp5100_tco
patches tested on 4.11.0-rc6 chorler@foxtrot:~> dmesg | grep tco [ 16.643084] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05 [ 16.643219] sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385, Revision ID: 0x42 [ 16.643226] sp5100_tco: I/O address 0x0cd6 already in use I find the last message a little confusing
(In reply to Chris Horler from comment #10) > patches tested on 4.11.0-rc6 > > chorler@foxtrot:~> dmesg | grep tco > [ 16.643084] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05 > [ 16.643219] sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385, > Revision ID: 0x42 > [ 16.643226] sp5100_tco: I/O address 0x0cd6 already in use > > > I find the last message a little confusing Something went wrong with your testing because after the 3rd patch is applied this message is removed from sp5100_tco. Two other "MMIO address already in use" messages are left but they have nothing to do with the I/O port pair 0xcd6-0xcd7. You need all 3 patches attached to this issue.
you're 100% correct, I just checked the build log... I had the patches, but was still building a vanilla kernel. I'll tweak and rebuild today/tomorrow and report back. apologies!
rebuilt, I think it's now as you were expecting: chorler@foxtrot:~> grep _tco sp5100_tco.log [ 17.691272] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05 [ 17.691488] sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385, Revision ID: 0x42 [ 17.691623] sp5100_tco: Using 0xfed80b00 for watchdog MMIO address [ 17.691638] sp5100_tco: Last reboot was not triggered by watchdog. [ 17.693445] sp5100_tco: initialized (0xffffa85e40655b00). heartbeat=60 sec (nowayout=0)
Yes. /dev/watchdog not exists and any watchdog daemon should work too.
(In reply to Zoltan Boszormenyi from comment #14) > Yes. /dev/watchdog not exists and any watchdog daemon should work too. yes, it exists. I haven't tested systemd in that regard yet.
Damn typo. I wanted to write "/dev/watchdog no*w* exists". The subsequent lines after the first two tell you that the driver actually initialized the watchdog hardware.
I'm hitting this error too on my new motherboard. I haven't tried the patches yet, but will plan to soon. Note that there _is_ a maintainer for WATCHDOG DEVICE DRIVERS in the MAINTAINERS file: WATCHDOG DEVICE DRIVERS M: Wim Van Sebroeck <wim@iguana.be> R: Guenter Roeck <linux@roeck-us.net> L: linux-watchdog@vger.kernel.org W: http://www.linux-watchdog.org/ T: git git://www.linux-watchdog.org/linux-watchdog.git S: Maintained F: Documentation/devicetree/bindings/watchdog/ F: Documentation/watchdog/ F: drivers/watchdog/ F: include/linux/watchdog.h F: include/uapi/linux/watchdog.h ...it's probably worth re-sending these patches and making sure that maintainer and list are cc'ed?
(In reply to Jeff Layton from comment #17) > I'm hitting this error too on my new motherboard. I haven't tried the > patches yet, but will plan to soon. Note that there _is_ a maintainer for > WATCHDOG DEVICE DRIVERS in the MAINTAINERS file: > > WATCHDOG DEVICE DRIVERS > M: Wim Van Sebroeck <wim@iguana.be> > R: Guenter Roeck <linux@roeck-us.net> > L: linux-watchdog@vger.kernel.org > W: http://www.linux-watchdog.org/ > T: git git://www.linux-watchdog.org/linux-watchdog.git > S: Maintained > F: Documentation/devicetree/bindings/watchdog/ > F: Documentation/watchdog/ > F: drivers/watchdog/ > F: include/linux/watchdog.h > F: include/uapi/linux/watchdog.h > > ...it's probably worth re-sending these patches and making sure that > maintainer and list are cc'ed? The list WAS cc-ed. The watchdog maintainer just needs to work on his mailing list backlog.
Hi, I don't see the fix in the incoming 4.12 kernel. I need the watchdog, what the i2c-piix4 module is needed for. Can I blacklist it without problem ? Thanks.
Reviewed and tested by me with Linux 4.11.5 and 4.12rc6 for hardware ASUS Sabertooth 990FX R2.0. Works perfectly and the code is clear. Though for consistency, I think the following blank lines should be removed: drivers/i2c/busses/i2c-piix4.c:159 drivers/watchdog/sp5100_tco.c:148 drivers/watchdog/sp5100_tco.c:161 Also, I/O ports are specified in drivers/watchdog/sp5100_tco.c:45 but not in drivers/i2c/busses/i2c-piix4.c:148. Maybe these should be normalized. Thanks Mr. Zoltán Böszörményi for the patches. The watchdog maintainer is Mr. Wim Van Sebroeck, who's been rather absent for the past months. Can someone ping him? Or what else can we do to get this mainlined? Having a non-operational hardware watchdog is a serious malfunction. Also (offtopic), are Bugzilla bugs mirrored in the LKML or vice-versa? Regards
Oh... After posting the previous message, I saw a small info from Bugzilla displaying the list CC members. The maintainer wasn't included, so I added his email and also Mr. Guenter Roeck's email. Although I couldn't add linux-watchdog@vger.kernel.org - Bugzilla returns an error. Perhaps Mr. Van Sebroeck and Mr. Roeck will take notice about this issue now.
This will require approval from the USB maintainer first, and v2 of that patch got it all wrong, as it removed a header file which declared 'extern struct mutex sb800_mutex;'. For my part, independent of the USB patch status, I am not inclined to accept an extern declaration in a source file. WARNING: externs should be avoided in .c files #51: FILE: drivers/watchdog/sp5100_tco.c:47: +extern struct mutex sb800_mutex; On top of that, the patch series makes the watchdog and i2c drivers dependent on USB_HOST. The series, as is, will thus cause lots of compile failures. The dependency itself is unnecessary and only needed because the introduced mutex resides in usb code, which it should not. Overall I am (still) not sure I understand why request_muxed_region() can not be used to solve this problem.
The objection against the first version of the patches, i.e. the one that was using request_muxed_region() was that the patch was not handling a potential error from request_muxed_region() which can indeed fail, e.g. with an allocation error when the memory is low. The problem with handling the error from request_muxed_region() becomes obvious if you look at how the USB PCI quirk code is used. If request_muxed_region() fails then e.g. it would/should result in an URB transfer failure but it isn't possible without major surgery to the USB code. The alternative is to proceed without the locking provided by request_muxed_region(). Then there are two choices: either go ahead and program the quirk without locking or you don't program the quirk. It's unsafe if two kernel threads are programming the same hardware that works through index/data register pairs, the index register can be set to the wrong value for the other thread also expects it's the only one doing it. Without programming the quirk, the result is the broken transfer the comment talks about in pci-quirks.c about usb_amd_quirk_pll(). In the mail thread on LKML (linux-watchdog and linux-usb were Cc-ed) I have said that it should be possible to implement a variant of request_muxed_region() that accepts a pointer to an already allocated resource structure which can possibly be on the stack instead of being allocated. That way the sole reason of failure, the allocation error in request_muxed_region() would be eliminated for the use case of these 3 drivers. No one replied to this idea, neither positively nor negatively. If we had such a request_muxed_region() variant that cannot fail and wait just like mutex_lock(), then the impact on these 3 drivers would be the same as with mutex_lock(). If this idea is acceptable, I can implement it.
There are several other possible solutions for the USB quirks problem. For example, the USB quirks code could read the data it needs from the ports once at startup instead of each time it is needed, or a dedicated non-failing API (to be used by all drivers) could be used to access the two ports. I can not comment on the merits of implementing a non-failing version of request_muxed_region(). However, its absence is not an acceptable reason for creating other problems (such as unnecessary inter-dependencies between modules), nor for holding up a fix for the bug discussed here. The synchronization problem in the current USB quirks code is orthogonal to the problems observed here: Those problems can be fixed without addressing the USB quirks problem. The port access race condition of various drivers against the USB code is a different problem that can be solved separately (not that this would be optimal).
Unfortunately the interdependency is already there because 3 different drivers want to use the same I/O port pair, possibly in parallel. It's how these drivers were written. It's the responsibility of the 3 subsystem developers to find a satisfactory resolution for the long term. This BZ ticket is just the messenger so don't shoot. :-) Maybe, just maybe the best way to eliminate the regression would still be to revert commit 2fee61d22e606fc99ade9079fda15fdee83ec33e as it caused the regression in the first place in 4.4-rc4. Almost two years ago and no one has responded to the bug reports. sp5100_tco was there first to request_region() for 0xcd6/0xcd7. The USB quirks code doesn't even do locking for these ports and it can get away with it. Perhaps the same should be true for i2c-piix4 as a bandaid. I don't want to create a common platform device with the sole purpose to provide a common lock for these drivers. The inter-dependency would also be there this way. As a more abstract example, you shouldn't be able to use more than one filesystem types on the same harddisk, because they use the same common hardware resource. It's an obvious interdependency between EXT4 and VFAT! In the meantime, I implemented the non-failing version of request_muxed_region() Actualy, I modified the current __request_region() so a new IORESOURCE_MUXED request always goes to sleep but complains loudly if a conflicting resource doesn't have the IORESOURCE_MUXED flag. This is one failure case. The advantage of this change is that we can know what causes a resource conflict. It works just like before when all users of the same I/O region use IORESOURCE_MUXED. The main parts of __request_region() is factored out into a new function that doesn't allocate, the new function accepts a struct resource *, the actual structure can be anywhere. It's compiling at the moment. I will report back after I installed and booted the patched 4.11.6 kernel. The patches were created against the current Linux GIT.
Yes, the interdependency is there. That doesn't mean it has to be fixed with a single patch or a single patch series. It would be much easier to address the immediate problem first (ie the watchdog and i2c drivers blocking each other) and then discuss a solution that fixes all three. That is, of course, just my personal opinion. All I can say is that I won't accept a watchdog driver change which declares an external variable in a source file, nor one that creates a dependency against a logically unrelated subsystem.
I agree with Roeck that implementing the mutex in USB_HOST is wrong. And while I agree that commit 2fee61d22e606fc99ade9079fda15fdee83ec33e is far from ideal, I also believe that reverting it is out of question at this point since that would itself be a regression (we shouldn't break current hw sensor support). I also agree that the optimal course of action now is to focus on sorting the i2c-piix4/sp5100_tco conflict as a priority. There's still a few days ahead before kernel 4.12-rc7. Regarding the proposed "bandaid", it is a bad idea indeed. Though I didn't know about declaring extern's in C files before, why wasn't the 'sb800_mutex' mutex declared in the header file? sp5100_tco.h exists (and i2c-piix4.h doesn't exist yet - but creating should be fine). Also, Böszörményi, Where is the review for the first version? I'd like to read it. If an of out-of-memory scenario can cause request_muxed_region() to fail, I'm willing to learn how others dealt with this elsewhere.
Created attachment 257101 [details] Patch 1/5 Extend the request_region infrastructure
Created attachment 257103 [details] Patch 2/5 Modify behaviour of request_*muxed_region
Created attachment 257105 [details] Patch 3/5 Use request_declared_muxed_region() in the USB PCI quirks
Created attachment 257107 [details] Patch 4/5 Use request_declared_muxed_region() in i2c-piix4
Created attachment 257109 [details] Patch 5/5 Use request_declared_muxed_region() in sp5100_tco
The new patch series extends and modifies the request_*region() handling the way I wrote above. The driver patches don't use a common mutex anymore, so there is no inter-dependency any more than with using request_region() to exclude each other from using the same I/O ports. They are using the new request_declared_muxed_region() macro and the pre-existing DEFINE_RES_IO_NAMED() static initializer macro for declaring their struct resource on the stack. It boots and works. [ 5.687884] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0 [ 5.687891] piix4_smbus 0000:00:14.0: Using register 0x2c for SMBus port selection [ 5.688266] piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 0xb20 [ 5.696899] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05 [ 5.696966] sp5100_tco: PCI Vendor ID: 0x1002, Device ID: 0x4385, Revision ID: 0x42 [ 5.697002] sp5100_tco: Using 0xfed80b00 for watchdog MMIO address [ 5.697012] sp5100_tco: Last reboot was not triggered by watchdog. [ 5.697776] sp5100_tco: initialized (0xffffba2f4192db00). heartbeat=60 sec (nowayout=0)
(In reply to Marcelo "Marc" Ranolfi from comment #27) > Also, Böszörményi, Where is the review for the first version? I'd like to > read it. If an of out-of-memory scenario can cause request_muxed_region() to > fail, I'm willing to learn how others dealt with this elsewhere. The first patch series I sent out was here: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1368060.html My memory was failing, the first version of the patches also used a common mutex.
Thanks! Looks good, I'm testing this first thing tomorrow. Can you please have a look at this question I opened earlier on Unix Stack Exchange? It's about version 2 of your patch series, but still. https://unix.stackexchange.com/questions/372376/watchdog-reboot-not-properly-reported
Sorry, I don't know what caused that message. I am using the Fedora kernel SRPM to test my patches and the v2 patchset was properly reporting that the last reboot was caused by the watchdog after I killed the simple watchdog daemon back then when I was testing it. This v3 should not be different in this regard.
Tested and working. On system reboot, this is reported: "sp5100_tco: Unexpected close, not stopping watchdog!" However, following a web search, I found this bug was present even before the offending commit 2fee61d22e606fc99ade9079fda15fdee83ec33e. Safe to ignore for now, since the machine reboots as expected. Only happens when rebooting (shutdown not affected). Some considerations about the code: patch #1 a) declarations in ioport.h:227 and ioport.h:230 should probably be switched (in order) for consistency. b) in resource.c, function __request_region() should probably be left where it was previously. The new function __request_declared_region() in lines 1124-1171 should be moved below that. patch # 2 c) a high-level description would be nice to have near lines 1117-1119. d) It appears to me that ` & IORESOURCE_MUXED` is redundant in the context of file resource.c:1154. `if (!conflict->flags)` should suffice. I'm not sure that's the condition you intended, though. patch #4 f) re-definition of `res` in file i2c-piix4.c:349, previously defined in line 263.
Gah, I should have remembered bitwise operators. Please disregard me previous point d). Also I went from d) to f), sorry about that too. I'll abstain from making further code reviews for now.
(In reply to Marcelo "Marc" Ranolfi from comment #37) > Tested and working. Thanks for testing. > On system reboot, this is reported: > "sp5100_tco: Unexpected close, not stopping watchdog!" > > However, following a web search, I found this bug was present even before > the offending commit 2fee61d22e606fc99ade9079fda15fdee83ec33e. Safe to > ignore for now, since the machine reboots as expected. Only happens when > rebooting (shutdown not affected). > > Some considerations about the code: > > patch #1 > a) declarations in ioport.h:227 and ioport.h:230 should probably be switched > (in order) for consistency. Done. Also added new macros to make the extended API complete: request_declared_region() and release_declared_region() > b) in resource.c, function __request_region() should probably be left where > it was previously. The new function __request_declared_region() in lines > 1124-1171 should be moved below that. Done. The code churn in the patch is a little less now. > > patch # 2 > c) a high-level description would be nice to have near lines 1117-1119. Done, added extra doc to the first patch for the extra macros and extended it in the second patch for the behaviour change. > d) It appears to me that ` & IORESOURCE_MUXED` is redundant in the context > of file resource.c:1154. `if (!conflict->flags)` should suffice. I'm not > sure that's the condition you intended, though. You already noticed that it's a bitwise AND and not a logical AND. "flags" has bits inside. > > patch #4 > f) re-definition of `res` in file i2c-piix4.c:349, previously defined in > line 263. Fixed, it's moved to a global static variable. This was done in all of the patches. The search for conflicting regions only checks the start and end values but not the addresses of resource descriptors so it should be safe(TM). It's building now and if testing shows no problems I will post new patches in mail and here, too.
Created attachment 257119 [details] Patch 1/5 Extend the request_region infrastructure
Created attachment 257121 [details] Patch 2/5 Modify behaviour of request_*muxed_region
Created attachment 257123 [details] Patch 3/5 Use request_declared_muxed_region() in the USB PCI quirks
Created attachment 257125 [details] Patch 4/5 Use request_declared_muxed_region() in i2c-piix4
Created attachment 257127 [details] Patch 5/5 Use request_declared_muxed_region() in sp5100_tco
Unfortunately I haven't had time to test the latest round of patches from Zoltan, due to personal circumstances, but I can do-so if this would still be helpful?
Created attachment 261229 [details] Patch 1/5 Extend the request_region infrastructure
Created attachment 261231 [details] Patch 2/5 Modify behaviour of request_*muxed_region
Created attachment 261233 [details] Patch 3/5 Use request_declared_muxed_region() in the USB PCI quirks
Created attachment 261235 [details] Patch 4/5 Use request_declared_muxed_region() in i2c-piix4
Created attachment 261237 [details] Patch 5/5 Use request_declared_muxed_region() in sp5100_tco
New patchset is attached that applies to 4.15-rc4.
I tried your patchset from 2017-12-18 on gentoo-sources kernel 4.15 and this was the result (I additionally activated i2c debug output): # cat dmesg-t | grep -e i2c -e piix4 -e sp5100 -e SB800 i2c-core: driver [dummy] registered i2c i2c-0: adapter [AMDGPU DM i2c hw bus 0] registered i2c i2c-1: adapter [dmdc] registered i2c i2c-2: adapter [AMDGPU DM i2c hw bus 1] registered i2c i2c-3: adapter [dmdc] registered i2c i2c-4: adapter [AMDGPU DM i2c hw bus 2] registered i2c i2c-5: adapter [AMDGPU DM i2c hw bus 3] registered i2c i2c-6: adapter [AMDGPU DM i2c hw bus 4] registered i2c i2c-6: master_xfer[0] W, addr=0x50, len=1 i2c i2c-6: master_xfer[1] R, addr=0x50, len=1 i2c i2c-6: master_xfer[0] W, addr=0x50, len=1 i2c i2c-6: master_xfer[1] R, addr=0x50, len=128 i2c i2c-6: master_xfer[0] W, addr=0x50, len=1 i2c i2c-6: master_xfer[1] R, addr=0x50, len=128 piix4_smbus 0000:00:14.0: Using SMI# for SMBus piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0 piix4_smbus 0000:00:14.0: Using register 0x02 for SMBus port selection i2c i2c-7: adapter [SMBus PIIX4 adapter port 0 at 0b00] registered i2c i2c-8: adapter [SMBus PIIX4 adapter port 2 at 0b00] registered i2c i2c-9: adapter [SMBus PIIX4 adapter port 3 at 0b00] registered i2c i2c-10: adapter [SMBus PIIX4 adapter port 4 at 0b00] registered sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05 sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x790b, Revision ID: 0x59 sp5100_tco: failed to find MMIO address, giving up. For references, this was before the patch (without i2c debug): # cat dmesg-t | grep -e i2c -e piix4 -e sp5100 -e SB800 i2c-core: driver [dummy] registered piix4_smbus 0000:00:14.0: Using SMI# for SMBus piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0 piix4_smbus 0000:00:14.0: Using register 0x02 for SMBus port selection i2c i2c-0: adapter [SMBus PIIX4 adapter port 0 at 0b00] registered i2c i2c-1: adapter [SMBus PIIX4 adapter port 2 at 0b00] registered i2c i2c-2: adapter [SMBus PIIX4 adapter port 3 at 0b00] registered i2c i2c-3: adapter [SMBus PIIX4 adapter port 4 at 0b00] registered sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05 sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x790b, Revision ID: 0x59 sp5100_tco: I/O address 0x0cd6 already in use # cat dmesg-t | grep -e 0cd6 system 00:05: [io 0x0cd6-0x0cd7] has been reserved sp5100_tco: I/O address 0x0cd6 already in use Now, piix4_smbus is not i2c_piix4... Still, address 0x0cd6 is part of the x86 system debug (which one? pci? pnp?), and even if I don't seem to grasp which driver is using this address, sp5100_tco remains unable to pick it up. So there are other problems as well. What can I do to help?
Fount it: it's smba_idx # cat /proc/ioports | grep -e 0cd6 0cd6-0cd7 : pnp 00:05 0cd6-0cd7 : smba_idx There's mothing in dmesg about smba_idx...
smba_idx is the i2c-piix4 driver. The statement "piix4_smbus is not i2c_piix4" does not really make sense since the i2c-piix4 driver uses piix4_smbus as its driver name. The out-of-tree i2c driver patches are still needed in v4.15 (and in ToT) since none of the patches submitted to fix the problem (neither Zoltan's patch nor mine) have been accepted upstream. The sp5100_tco watchdog driver was reworked substantially in the upstream kernel. I would suggest to use the latest version from ToT for any tests going forward.
Well, then the patches from Zoltan Boszormenyi from 2017-12-18 don't work for me: sp5100_tco: failed to find MMIO address, giving up. What's wrong? This is kernel 4.15 I'm using. My hardware is an ASUS PRIME X370-PRO mainboard and an AMD Ryzen 7 1800X processor. UEFI firmware is the latest version. Graphics card is an AMD Radeon RX Vega 64, driver AMDGPU+DC. What does ToT mean?
ToT means "Top of Tree" and refers to the upstream Linux kernel. As for "what is wrong", please refer to the post-v4.15 upstream commits in drivers/watchdog/sp5100_tco.c. In a nutshell, though, the driver in v4.15 does not support Ryzen CPUs. Note that even with Ryzen support there is no guarantee that the driver actually works. For example, it is disabled on various MSI boards such as B350M MORTAR or B350 TOMAHAWK.
(In reply to Guenter Roeck from comment #56) > ToT means "Top of Tree" and refers to the upstream Linux kernel. Thanks. > As for "what is wrong", please refer to the post-v4.15 upstream commits in > drivers/watchdog/sp5100_tco.c. In a nutshell, though, the driver in v4.15 > does not support Ryzen CPUs. Thanks for the summary. I'll just have to wait then... and hope that it will be made Ryzen compatible. > Note that even with Ryzen support there is no guarantee that the driver > actually works. For example, it is disabled on various MSI boards such as > B350M MORTAR or B350 TOMAHAWK. Yes, it already occured to me that it could also be dependant on the actual BIOS/UEFI implementation and/or version. Very sad that vendors don't test their firmware for Linux compatibility...
(In reply to Andreas from comment #57) > (In reply to Guenter Roeck from comment #56) > > ToT means "Top of Tree" and refers to the upstream Linux kernel. > > Thanks. > > > As for "what is wrong", please refer to the post-v4.15 upstream commits in > > drivers/watchdog/sp5100_tco.c. In a nutshell, though, the driver in v4.15 > > does not support Ryzen CPUs. > > Thanks for the summary. I'll just have to wait then... and hope that it will > be made Ryzen compatible. > It will be, in v4.16. However, you'll still need the piix4 i2c changes, which won't be in v4.16. > > Note that even with Ryzen support there is no guarantee that the driver > > actually works. For example, it is disabled on various MSI boards such as > > B350M MORTAR or B350 TOMAHAWK. > > Yes, it already occured to me that it could also be dependant on the actual > BIOS/UEFI implementation and/or version. Very sad that vendors don't test > their firmware for Linux compatibility... AFAICS that is on purpose. The MSI boards use the watchdog on the Super-IO chip.
(In reply to Guenter Roeck from comment #58) > It will be, in v4.16. However, you'll still need the piix4 i2c changes, > which won't be in v4.16. Guenter, so seeing in 4.16 dmesg output: "[ 14.571240] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver [ 14.571349] sp5100-tco sp5100-tco: Using 0xfed80b00 for watchdog MMIO address [ 14.571354] sp5100-tco sp5100-tco: Watchdog hardware is disabled" I should not worry about it since sp5100 was working previously on 4.15.14: "[ 13.507867] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05 [ 13.508519] sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x780b, Revision ID: 0x42 [ 13.508555] sp5100_tco: Using 0xfeb00000 for watchdog MMIO address [ 13.508562] sp5100_tco: Last reboot was triggered by watchdog. [ 13.509527] sp5100_tco: initialized (0x00000000b22ce291). heartbeat=60 sec (nowayout=0)" Gentoo ~amd64, kernel 4.16, Lenovo A6-6310 APU netbook, AMD SB700/SB800/Hudson-2/3, CONFIG_ATA_PIIX=y, CONFIG_I2C_PIIX4=y, CONFIG_SP5100_TCO=m. Thanks, Przemek.
Patch 5/5 fails on 4.16.0. This is my patch attempt with all the patches provided: kernel-patches/patch_1_of_5.patch patching file include/linux/ioport.h patching file kernel/resource.c Hunk #3 succeeded at 1134 (offset 1 line). Hunk #4 succeeded at 1163 (offset 1 line). Hunk #5 succeeded at 1172 (offset 1 line). Hunk #6 succeeded at 1220 (offset 1 line). kernel-patches/patch_2_of_5.patch patching file kernel/resource.c Hunk #1 succeeded at 1144 (offset 1 line). Hunk #2 succeeded at 1211 (offset 1 line). kernel-patches/patch_3_of_5.patch patching file drivers/usb/host/pci-quirks.c Hunk #1 succeeded at 315 (offset 17 lines). Hunk #2 succeeded at 352 (offset 17 lines). kernel-patches/patch_4_of_5.patch patching file drivers/i2c/busses/i2c-piix4.c kernel-patches/patch_5_of_5.patch patching file drivers/watchdog/sp5100_tco.c Hunk #1 FAILED at 48. Hunk #2 succeeded at 73 (offset 3 lines). Hunk #3 FAILED at 144. Hunk #4 FAILED at 155. Hunk #5 FAILED at 169. Hunk #6 FAILED at 366. Hunk #7 FAILED at 385. Hunk #8 FAILED at 405. Hunk #9 FAILED at 413. Hunk #10 FAILED at 434. Hunk #11 FAILED at 474. Hunk #12 FAILED at 522. Hunk #13 FAILED at 536. 12 out of 13 hunks FAILED -- saving rejects to file drivers/watchdog/sp5100_tco.c.rej How should I proceed?
(In reply to Przemek from comment #59) > (In reply to Guenter Roeck from comment #58) > > > It will be, in v4.16. However, you'll still need the piix4 i2c changes, > > which won't be in v4.16. > > Guenter, > so seeing in 4.16 dmesg output: > > "[ 14.571240] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver > [ 14.571349] sp5100-tco sp5100-tco: Using 0xfed80b00 for watchdog MMIO > address > [ 14.571354] sp5100-tco sp5100-tco: Watchdog hardware is disabled" > > I should not worry about it since sp5100 was working previously on 4.15.14: > > "[ 13.507867] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05 > [ 13.508519] sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x780b, > Revision ID: 0x42 > [ 13.508555] sp5100_tco: Using 0xfeb00000 for watchdog MMIO address > [ 13.508562] sp5100_tco: Last reboot was triggered by watchdog. > [ 13.509527] sp5100_tco: initialized (0x00000000b22ce291). heartbeat=60 > sec (nowayout=0)" > Was it _really_ working, or just loading ? What happens if you trigger a watchdog timeout with 4.15.14 ? Does it reboot the system ?
(In reply to Guenter Roeck from comment #61) > Was it _really_ working, or just loading ? What happens if you trigger a > watchdog timeout with 4.15.14 ? Does it reboot the system ? I think that you are correct. IMHO this is the case that is covered by this bug report and I think that my question was relevant if I should worry about message in dmesg that 'Watchdog hardware is disabled' or not. I just assumed that there is a big rework in sp5100 driver. I could not apply last 5th patch on 4.16 because of rejected hunks thus impossible for me to test it out. I am not in a hurry but wanted to make sure if I am not mistaken. Command outputs on 4.15.14: "wd_identify -v --config-file /root/.test.conf wd_identify: String 'watchdog-device' found as '/dev/watchdog' wd_identify: String 'log-dir' found as '/var/log/watchdog' wd_identify: Variable 'realtime' found as 'yes' = 1 wd_identify: Integer 'priority' found = 1 SP5100 TCO timer watchdog was set to 60 seconds" "watchdog -v --config-file /root/.test.conf watchdog: String 'watchdog-device' found as '/dev/watchdog' watchdog: String 'log-dir' found as '/var/log/watchdog' watchdog: Variable 'realtime' found as 'yes' = 1 watchdog: Integer 'priority' found = 1" "dmesg | grep sp5100 [ 13.136877] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05 [ 13.136977] sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x780b, Revision ID: 0x42 [ 13.137008] sp5100_tco: Using 0xfeb00000 for watchdog MMIO address [ 13.137015] sp5100_tco: Last reboot was triggered by watchdog. [ 13.137088] sp5100_tco: initialized (0x000000000e62c704). heartbeat=60 sec (nowayout=0) [ 1100.329890] sp5100_tco: Unexpected close, not stopping watchdog!" "wd_identify --config-file /root/.test.conf wd_identify: cannot open /dev/watchdog (errno = 16 = 'Device or resource busy')" The watchdog(daemon?) is still running according to 'top'. Thanks, Przemek.
(In reply to Przemek from comment #62) > (In reply to Guenter Roeck from comment #61) > > Was it _really_ working, or just loading ? What happens if you trigger a > > watchdog timeout with 4.15.14 ? Does it reboot the system ? > > I think that you are correct. IMHO this is the case that is covered by this > bug report and I think that my question was relevant if I should worry about > message in dmesg that 'Watchdog hardware is disabled' or not. I just assumed > that there is a big rework in sp5100 driver. > > I could not apply last 5th patch on 4.16 because of rejected hunks thus > impossible for me to test it out. > None of the patches submitted here will make it into the kernel. You'll need to apply commits 04b6fcaba346 and 0e89b2fec748 from mainline. ... > > The watchdog(daemon?) is still running according to 'top'. > You'll have to stop the watchdog daemon, then access /dev/watchdog.
Guenter, you said previously (2018-02-08) that your own patches to fix this one were not accepted upstream. Why weren't they? If this is still an ongoing issue I'd be willing to tackle on it.
The patches have now been accepted and are available in the mainline kernel. 04b6fcaba346 i2c: piix4: Use request_muxed_region 0e89b2fec748 i2c: piix4: Use usleep_range()
Just to inform you, it seems that I was totally wrong, and Guenter was right. If those are the only pathes that are missing from 4.16 to make sp5100 work, applying them does not make any change. Watchog hardware is still disabled, "/dev/watchdog" does not exist. In both kernels sp5100 diver is loaded. I apologize for hijacking this bug report. I was trying to trigger watchog timeout manually on 4.15.14 by "echo '\0x00' > /dev/watchdog" with watchdog daemon turned off, but received same message "sp5100_tco: Unexpected close, not stopping watchdog!" If I am doing everything right, it becoming obvious that this is impossible to manually play with this hardware. This is a Lenovo netbook, and there is anything in the machine bios to setup watchdog. If anyone has another idea and if I should open another bug report to work this out please let me know. lspci 4.16.2 with patches applied: "00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:780b] (rev 42) Subsystem: Lenovo FCH SMBus Controller [17aa:3801] Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Kernel modules: sp5100_tco" Once again I'm sorry for whole this mess. Thanks, Przemek.
The 4.15 message is ok and as expected. Question is if the system reboots subsequently.
(In reply to Guenter Roeck from comment #67) > The 4.15 message is ok and as expected. Question is if the system reboots > subsequently. No it doesn't. That's why I've reached a grim conclusion about it. Thanks, Przemek.