After suspend/resume on a recent AMD CPU, the rdrand instruction fails. The symptoms are that openssl fails to generate keys (when doing kernel module signing), and ssh to anywhere does not work. The problem was eventually diagnosed by disabling that instruction for ssh, i.e. "OPENSSL_ia32cap=~0x4000000000000000 ssh ..." works.
/proc/cpuinfo: https://bugzilla.redhat.com/attachment.cgi?id=944940
dmesg: https://bugzilla.redhat.com/attachment.cgi?id=944986
More details are described in: https://bugzilla.redhat.com/show_bug.cgi?id=1150286
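For anyone wondering where the 0x4000000000000000 in the workaround comes from, here is a small sketch of the arithmetic. It assumes the layout given in OpenSSL's OPENSSL_ia32cap documentation, where the 64-bit capability vector holds CPUID leaf 1 EDX in the lower 32 bits and ECX in the upper 32 bits, and RDRAND is reported in ECX bit 30. The helper names are made up for illustration and are not part of OpenSSL.

```python
# Sketch only: ia32cap_bit() and disable_mask() are hypothetical helpers,
# not OpenSSL functions.

RDRAND_ECX_BIT = 30  # CPUID leaf 1, ECX bit 30 reports RDRAND


def ia32cap_bit(ecx_bit):
    """Position of a CPUID.1:ECX bit inside the 64-bit OPENSSL_ia32cap vector.

    Assumes ECX occupies the upper 32 bits of the vector.
    """
    return 32 + ecx_bit


def disable_mask(ecx_bit):
    """Value to put after 'OPENSSL_ia32cap=' to clear one ECX capability bit."""
    return "~0x{:016x}".format(1 << ia32cap_bit(ecx_bit))


print(disable_mask(RDRAND_ECX_BIT))
```

The leading "~" asks OpenSSL to clear the given bit rather than to replace the whole capability vector, which is why the rest of the detected features stay enabled.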
The error message from generating the module signing key during a kernel build is:
Generating a 4096 bit RSA private key
.Error Generating Key
48012007979840:error:0307A071:bignum routines:BN_rand_range:too many iterations:bn_rand.c:269:
48012007979840:error:04081003:rsa routines:RSA_BUILTIN_KEYGEN:BN lib:rsa_gen.c:515:
make[1]: *** [signing_key.x509] Error 1
make: *** [kernel] Error 2
Still a problem with 3.17.3-200.fc20.x86_64. It is also affecting "wget https://..." now, e.g.
$ wget -m https://gmplib.org/~tege/x86-timing.pdf
--2014-11-22 23:25:00-- https://gmplib.org/~tege/x86-timing.pdf
Resolving gmplib.org (gmplib.org)... 37.252.124.96
Connecting to gmplib.org (gmplib.org)|37.252.124.96|:443... connected.
OpenSSL: error:0307A071:bignum routines:BN_rand_range:too many iterations
OpenSSL: error:1409802B:SSL routines:SSL3_SEND_CLIENT_KEY_EXCHANGE:reason(43)
Unable to establish SSL connection.
The workaround "OPENSSL_ia32cap=~0x4000000000000000 ..." continues to work; I am inclined to set it systemwide until a fix can be found... Is there anything I can help with? I am not against looking at the kernel source myself.
I don't know what changed, but I upgraded from Fedora 20 to Fedora 21 yesterday, and therefore switched from 3.17.6-200.fc20.x86_64 to 3.17.6-300.fc21.x86_64, and both ssh and generating the module signing key during a kernel build now work. As it happened, I had started to script/alias wget to always set OPENSSL_ia32cap=~0x4000000000000000 only half a day before upgrading, so it definitely did not work a mere 10 hours before I booted f21 for the first time. This can be closed, I think, though I would like to know whether it is userland or the kernel which affects it; I'll boot into the older f20 kernel with f21 userland soon, just to see which way it is.
Given that it kicks in after suspend/resume, it is probably a BIOS bug which calls for a kernel workaround. In particular, I am *guessing* that there is some AMD-specific feature control register (presumably something equivalent to MSR_IA32_MISC_ENABLE) which needs to be saved and restored. Booting the fc20 kernel with the fc21 userland would be useful, as would examining the third-party patches that go into the -300.fc21 kernel.
The fc20 kernel actually behaves correctly under an fc21 userland, strangely enough; and I am certain that it did not with the fc20 userland, as I had been on the same kernel for a week and had to do my usual OPENSSL_ia32cap=~0x4000000000000000 for wget/ssh until as recently as 10 hours before the upgrade, I think. The URL for the Fedora kernel patches is http://pkgs.fedoraproject.org/cgit/kernel.git/ .
That would imply that openssl contains either a fix or a workaround. It is somewhat odd, though: what kind of userspace change could fix a suspend/resume problem?
Oh no, openssl just disabled the use of RDRAND: https://software.intel.com/en-us/blogs/2014/10/03/changes-to-rdrand-integration-in-openssl So the problem still remains. It would be good if someone with access to the relevant AMD hardware and documentation could look at this.
That makes sense. It had crossed my mind to set OPENSSL_ia32cap=~0x4000000000000000 or equivalent early on in /etc/sysconfig, or to replace my /usr/lib64/libssl.so*, since I have been setting it manually for over two months. They have just beaten me to it... I suspect Intel won't mention or comment that their competitor doesn't implement that instruction correctly and/or needs a special save+reset, so I'd rather hear why the change was made from the openssl people directly. FWIW, I was on openssl-1.0.1e-40.fc20.x86_64 and am now on openssl-1.0.1j-1.fc21.x86_64, so indeed I have moved from pre-1.0.1f to post-1.0.1f. I'd be happy to test things if there is a simple way of testing this independent of openssl...
Someone with the proper permission bits may want to update this bug title to refer to AMD family 22 and up the priority of this bug. See https://github.com/systemd/systemd/issues/11810#issuecomment-490275562
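For anyone trying to work out whether their own machine is in the group discussed here, a rough sketch of a check against /proc/cpuinfo follows. It only looks at the vendor, the CPU family and the advertised rdrand flag; it cannot tell whether the instruction actually misbehaves after resume. The function names are made up for illustration, and the default family list (22 only) is an assumption taken from this thread, not a complete list of affected parts.

```python
# Sketch only: parse_cpuinfo() and rdrand_suspect() are hypothetical helpers.

def parse_cpuinfo(text):
    """Return vendor_id, cpu family and flags of the first CPU listed."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPUs
        key = key.strip()
        if key in ("vendor_id", "cpu family", "flags") and key not in info:
            info[key] = value.strip()
    return info


def rdrand_suspect(text, families=(22,)):
    """True if this looks like an AMD CPU of a listed family advertising rdrand.

    The default family list is an assumption based on this bug report.
    """
    info = parse_cpuinfo(text)
    return (info.get("vendor_id") == "AuthenticAMD"
            and int(info.get("cpu family", -1)) in families
            and "rdrand" in info.get("flags", "").split())
```

On a Linux box it could be called as rdrand_suspect(open("/proc/cpuinfo").read()).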
Also, there's this patch from https://paste.fedoraproject.org/paste/Qhao0f9NszPj8K9EgCSbnw referenced in https://github.com/systemd/systemd/issues/11810#issuecomment-490284361

diff --git a/src/basic/random-util.c b/src/basic/random-util.c
index ca25fd2420..b7cfc1bc2d 100644
--- a/src/basic/random-util.c
+++ b/src/basic/random-util.c
@@ -58,6 +58,8 @@ int rdrand(unsigned long *ret) {
         msan_unpoison(&err, sizeof(err));
         if (!err)
                 return -EAGAIN;
+        if (*ret == 0 || *ret == ULONG_MAX) /* filter out obvious crap, in case of AMD */
+                return -EAGAIN;
 
         return 0;
 #else
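To make the logic of that patch easier to play with, here is a small Python model of the same check. This is a sketch, not the systemd code: it assumes a 64-bit unsigned long, takes the success flag as an explicit argument instead of reading it from the instruction, and the names usable() and first_usable() are invented for illustration.

```python
# Sketch only: models the sanity check from the patch above in plain Python.

ULONG_MAX = (1 << 64) - 1  # assumption: 64-bit unsigned long, as on x86_64 Linux


def usable(value, carry_ok):
    """Mirror of the patched check: reject failures and the two suspicious values."""
    if not carry_ok:
        return False  # the instruction itself reported failure
    return value not in (0, ULONG_MAX)


def first_usable(attempts):
    """Return the first accepted value from (value, carry_ok) pairs, else None."""
    for value, carry_ok in attempts:
        if usable(value, carry_ok):
            return value
    return None
```

The trade-off is that two legitimate outputs are also rejected, which only costs a retry in the rare case a healthy CPU returns one of them. A CPU that is stuck on some other constant would still pass this filter.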
Changed the title (me being the original reporter...). Not sure if it is appropriate to change priority though - on the whole, I think the priority should be set by the people who are going to spend time attempting to fix the issue, rather than by the reporter(s)/users experiencing the issue.
The potential, well, fix: https://lkml.kernel.org/r/776cb5c2d33e7fd0d2893904724c0e52b394f24a.1565817448.git.thomas.lendacky@amd.com
Looks forgotten, feel free to reopen if there's still interest.
I checked a kernel source tree I have lying around, 5.4.94 (ubuntu raspberrypi), and the code changes from the patch in #12 look like they are included. So this is all fixed. Thanks for tracking the issue; I thought I had mentioned it, but apparently I did not: I have not had that hardware since 2017 or so. The old laptop broke and was beyond economical repair (i.e. getting a new laptop was cheaper than paying for parts...).