Bug 103351 - Machine check exception on Broadwell quad-core with SpeedStep enabled
Summary: Machine check exception on Broadwell quad-core with SpeedStep enabled
Status: CLOSED DOCUMENTED
Alias: None
Product: Power Management
Classification: Unclassified
Component: intel_idle
Hardware: Intel Linux
Importance: P1 normal
Assignee: Len Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-23 13:45 UTC by Kristóf Marussy
Modified: 2016-11-22 18:16 UTC
CC List: 27 users

See Also:
Kernel Version: 4.2rc8
Tree: Mainline
Regression: No


Attachments
attachment-9832-0.html (1.74 KB, text/html)
2015-09-30 13:20 UTC, matthew.dewitt
attachment-966-0.html (1.36 KB, text/html)
2015-10-01 18:59 UTC, matthew.dewitt
/proc/cpuinfo when booted with microcode update 0x12 (8.31 KB, application/octet-stream)
2015-10-07 11:40 UTC, Jonas Platte
/proc/cpuinfo for i5-5675C with the 0x12 ucode patch applied at early boot. (4.14 KB, text/plain)
2015-10-07 16:36 UTC, kernel@benjam.info
attachment-20547-0.html (1.15 KB, text/html)
2015-10-07 21:51 UTC, matthew.dewitt
attachment-2813-0.html (1.30 KB, text/html)
2015-10-08 18:02 UTC, matthew.dewitt
attachment-31416-0.html (19.96 KB, text/html)
2015-10-21 23:00 UTC, matthew.dewitt
attachment-5053-0.html (2.18 KB, text/html)
2016-11-22 17:02 UTC, matthew.dewitt
attachment-5257-0.html (3.09 KB, text/html)
2016-11-22 17:07 UTC, matthew.dewitt
attachment-2487-0.html (1.46 KB, text/html)
2016-11-22 17:57 UTC, matthew.dewitt

Description Kristóf Marussy 2015-08-23 13:45:11 UTC
On my MSI GE62 2QE Apache Pro laptop with an Intel Core i7-5700HQ processor, machine check exceptions appear randomly when SpeedStep is enabled in the BIOS. I can trigger them quite reliably by running M-x package-install in Emacs, but they also occur without any apparent trigger.

The bug can be triggered with the 4.2rc7-mainline kernel, as well as with the Arch Linux 4.1.6-1 and 3.14.51-1-lts kernels. Booting with intel_pstate=disable does not solve the problem; only disabling SpeedStep in the BIOS altogether does (but that also prevents CPU frequency scaling). After booting with mce=3 to ignore the machine check exception, I managed to capture the following error message:

CPU 2: Machine Check Exception: 5 Bank: 4 be00000000800400
RIP !INEXACT! 10:<ffffffff81045de2> {acpi_processor_ffh_cstate_enter+0x92/0xc0}
TSC 41979f424e MISC 7fc7ff2e82b9
PROCESSOR 0:40671 TIME 1439935114 SOCKET 0 APIC 4 microcode d

After this error, the affected CPU core reports numerous stalls, and the system remains unusable. Unfortunately, the output of mcelog ("Hardware event. This is not a software error.") did not enlighten me.
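For anyone else staring at the raw MCE line: the `be00000000800400` after `Bank: 4` is the IA32_MCi_STATUS register value. A minimal bash sketch that splits it into its architectural fields, assuming the bit layout from the Intel SDM (Vol. 3B, machine-check chapter); the function names are hypothetical and not part of mcelog:

```bash
#!/usr/bin/env bash
# Split an IA32_MCi_STATUS value (as printed after "Bank: N") into its
# architectural fields. Bit positions assume the Intel SDM Vol. 3B layout.
decode_mci_status() {
    local s=${1#0x}
    # Left-pad to 16 hex digits, then split into two 32-bit halves so the
    # shell never has to handle a value with bit 63 set.
    while [ "${#s}" -lt 16 ]; do s="0$s"; done
    local hi=$((16#${s:0:8})) lo=$((16#${s:8:8}))
    printf 'VAL=%d OVER=%d UC=%d EN=%d MISCV=%d ADDRV=%d PCC=%d ' \
        $(( (hi >> 31) & 1 )) $(( (hi >> 30) & 1 )) $(( (hi >> 29) & 1 )) \
        $(( (hi >> 28) & 1 )) $(( (hi >> 27) & 1 )) $(( (hi >> 26) & 1 )) \
        $(( (hi >> 25) & 1 ))
    # Bits 15:0 are the MCA error code, bits 31:16 the model-specific code.
    printf 'MCACOD=0x%04x MSCOD=0x%04x\n' $(( lo & 0xffff )) $(( (lo >> 16) & 0xffff ))
}

# Name the one architectural code seen in this bug; anything else is
# left for the SDM tables.
mcacod_name() {
    if [ $(( $1 & 0xffff )) -eq $(( 0x0400 )) ]; then
        echo "internal timer error"
    else
        echo "unknown (look up MCACOD in the SDM)"
    fi
}
```

For this report, `decode_mci_status be00000000800400` yields an uncorrected error (UC=1) with processor context corrupt (PCC=1) and architectural code 0x0400, which matches the "Processor context corrupt" and "MCA: Internal Timer error" lines that mcelog prints in a later comment. The model-specific MSCOD field (0x0080 here) is not decoded.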

The issue seems to be connected to the stability issue of the Core i7-5775C processor, which was reported by Phoronix along with a workaround at http://www.phoronix.com/scan.php?page=news_item&px=core-i7-5775c-oc-fixed-mode

I also tested the system with memtest86, which did not report any errors in 8 passes. Running Windows 10 x64 Education on the machine (and some CPU and memory intensive scientific computing on it) also remains stable without any anomalies.
Comment 1 Kristóf Marussy 2015-08-24 12:22:01 UTC
I managed to get a stable system by booting with the processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll and SpeedStep enabled. However, if I set idle=halt or idle=nomwait instead, the MCEs remain.

In Windows, the Performance monitor displays that the system utilizes C-states that are claimed to be C1, C2 and C3; so it is quite odd that C1 halt in Linux triggers an MCE...
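To confirm that the workaround parameters actually reached the running kernel (a common pitfall when editing boot loader configuration), here is a small bash sketch that checks a kernel command line, defaulting to /proc/cmdline. The function name is hypothetical:

```bash
#!/usr/bin/env bash
# Report which of the workaround parameters discussed in this bug are present
# on a kernel command line (defaults to the running kernel's /proc/cmdline).
# Returns non-zero if any of them is missing.
check_idle_workaround() {
    local cmdline=${1:-$(cat /proc/cmdline)} p missing=0
    for p in processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll; do
        # Pad with spaces so a parameter only matches as a whole word.
        case " $cmdline " in
            *" $p "*) echo "present: $p" ;;
            *) echo "missing: $p"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

On GRUB-based distributions the parameters are usually made persistent by appending them to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and regenerating the configuration (`update-grub` on Debian/Ubuntu, `grub-mkconfig -o /boot/grub/grub.cfg` elsewhere).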
Comment 2 Kristóf Marussy 2015-08-30 18:28:50 UTC
Today I tried compiling 4.2rc8 while systematically commenting out the C-states in intel_idle.c that are supported by Broadwell, that is, C1, C1E, C3 and C6. Unfortunately, I was able to produce an MCE and panic with each of the kernels I compiled, which leaves me clueless about what this bug could be.
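As an alternative to rebuilding intel_idle.c for each C-state (not what was done above), the cpuidle sysfs interface allows individual idle states to be inspected and disabled at runtime. A bash sketch, assuming the standard /sys/devices/system/cpu/cpuN/cpuidle/stateM/{name,disable} layout; the helper names are hypothetical:

```bash
#!/usr/bin/env bash
# Print "stateN <name> disable=<0|1>" for each cpuidle state of one CPU.
# The directory is a parameter (default: CPU 0) so a copy of the tree can be used.
list_cstates() {
    local root=${1:-/sys/devices/system/cpu/cpu0/cpuidle} d
    # Glob order is lexical, which matches numeric order for state0..state9.
    for d in "$root"/state*; do
        [ -d "$d" ] || continue
        printf '%s %s disable=%s\n' "${d##*/}" "$(cat "$d/name")" "$(cat "$d/disable")"
    done
}

# Ask cpuidle not to enter state N on every CPU (needs root on a real
# system; the setting does not survive a reboot).
disable_cstate() {
    local n=$1 base=${2:-/sys/devices/system/cpu} f
    for f in "$base"/cpu[0-9]*/cpuidle/state"$n"/disable; do
        if [ -e "$f" ]; then echo 1 > "$f"; fi
    done
}
```

Disabling one state at a time and re-running the reproducer narrows things down without a kernel rebuild per experiment.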
Comment 3 Ziyue Yang 2015-08-31 05:55:00 UTC
(In reply to Kristóf Marussy from comment #1)
> I managed to get a stable system by booting with the processor.max_cstate=0
> intel_idle.max_cstate=0 idle=poll and SpeedStep enabled. However, if I set
> idle=halt or idle=nomwait instead, the MCEs remain.
> 
> In Windows, the Performance monitor displays that the system utilizes
> C-states that are claimed to be C1, C2 and C3; so it is quite odd that C1
> halt in Linux triggers an MCE...

Same problem here with MSI GE62 2QE Apache Pro.
I tried both Debian 8.1 and Ubuntu 15.04 on the real machine and got the same kernel panics. Updating the kernel to 4.2 doesn't help.
I also tried those two systems in VMware Workstation under Windows 10 and ended up with BSODs saying "MACHINE_CHECK_EXCEPTIONS".
I then tried running Debian in VMware at runlevel 3. It seems to be stable. I don't know why.

I'm going to try your way to get a stable system. Will reply if anything new discovered.
Comment 4 Richard Liu 2015-08-31 08:15:14 UTC
I get the same problem on my ASUS Z97-A/USB 3.1.
CPU is an Intel Broadwell i7-5775C.

OS: Kubuntu 15.04, Linux kernel 4.2 (unstable version)

If SpeedStep and Turbo Mode are enabled, compiling software causes a hardware crash.

But if SpeedStep and Turbo Mode are disabled in the BIOS, this issue does not happen.
Comment 5 Ziyue Yang 2015-09-01 02:08:03 UTC
(In reply to Ziyue Yang from comment #3)

> 
> Same problem here with MSI GE62 2QE Apache Pro.
> I tried both debian8.1 and Ubuntu15.04 on the real machine and received same
> kernel panics. Updating kernels to 4.2 doesn't help.
> I also tried those two systems in VMware workstation under Windows 10 and
> ended up in BSODs saying "MACHINE_CHECK_EXCEPTIONS". 
> I then tried to use debian in VMware at runlevel=3. It seems to be stable. I
> don't know why. 
> 
> I'm going to try your way to get a stable system. Will reply if anything new
> discovered.

The "processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll" option did work for me. MCEs are still logged in kern.log, but the system didn't crash.

Since idle=poll produces much more heat, I tried disabling SpeedStep and removing the idle=poll option. That seems stable as well. Note that this time there are no MCE records in kern.log at all.
Comment 6 Richard Liu 2015-09-01 02:38:18 UTC
Update

Disabling SpeedStep only makes the system more stable than before; it does not solve the crash problem. It may just take longer to reproduce the issue.

Running several heavy processes simultaneously may reproduce it.
Comment 7 saunders.52 2015-09-03 22:22:59 UTC
Interestingly, while Fedora 22 is by and large unaffected by this issue for some reason, I CAN trigger it by running an affected kernel in a virtual machine: it causes the same crash under the same circumstances, i.e. when the VM is under load.
Comment 8 ed 2015-09-04 13:51:53 UTC
Hi,
Having the same problem on an i7-5775C with an ASUS ROG MAXIMUS VII HERO GAMING motherboard.
The system is not stable under load, even with "processor.max_cstate=0 intel_idle.max_cstate=0" and with SpeedStep and Turbo disabled.

Ubuntu 14.04.3 LTS

tried 
3.16.0-46
4.2rc8
4.2rc7
Comment 9 ed 2015-09-04 16:06:08 UTC
My mistake: until now I had tested without "processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll", and the system was unstable on all kernels. Now I'm trying with these boot parameters, and so far the system behaves stably on 4.2rc8 under heavy load (BOINC).
Comment 10 Ondřej Janás 2015-09-11 10:09:01 UTC
Hello,

I recently got this laptop, an MSI GS70-2QD-Stealth. I installed Ubuntu 14.04 LTS without any problems, and after disabling Intel SpeedStep, Secure Boot and Fast Boot in the BIOS, I have been running the system without problems so far (no GRUB parameters were added).
Comment 11 Jonas Platte 2015-09-14 00:32:37 UTC
Can reproduce with i7-5775C as well. Adding cpufreq.governor=performance to the kernel command line makes the system more stable, but doesn't completely make the problem go away.

I was also able to freeze Fedora 22, not with VM, but with chrooting into my arch system and running the configure script for glade (which is one of the things that reliably freezes my arch install) from there. Running the same script with the Fedora autotools and whatever is called by them doesn't freeze the system.

FWIW, a few other affected people and I are also discussing this on the Arch Linux forums (https://bbs.archlinux.org/viewtopic.php?id=201194), and the best guess so far is that this is caused by some exotic CPU instruction used by a core library like glibc, which is disabled in the Fedora build of that library.
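Selecting the performance governor can also be tried at runtime through the cpufreq sysfs files, without rebooting. A bash sketch, assuming the usual /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor layout; with intel_pstate active, only the "performance" and "powersave" governors are offered. The function names are hypothetical:

```bash
#!/usr/bin/env bash
# Show the current cpufreq governor of every CPU as "cpuN <governor>".
show_governors() {
    local base=${1:-/sys/devices/system/cpu} f cpu
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        cpu=${f#"$base"/}     # strip the base directory ...
        cpu=${cpu%%/*}        # ... and keep just "cpuN"
        echo "$cpu $(cat "$f")"
    done
}

# Select a governor on every CPU (needs root on a real system;
# not persistent across reboots).
set_governor() {
    local gov=$1 base=${2:-/sys/devices/system/cpu} f
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        if [ -w "$f" ]; then echo "$gov" > "$f"; fi
    done
}
```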
Comment 12 Alexey Dyachenko 2015-09-22 10:48:08 UTC
The i5-5675C is affected as well (with a GA-Z97-HD3 rev 2.0, BIOS F9; Arch Linux). I have disabled Turbo Boost, as the BIOS doesn't have a setting for SpeedStep; C1E, C3 and C6/C7 were also disabled, but the issue still occurs.

"processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll" does not help.

Speaking of instructions, these CPUs are indeed affected by the HLE bug:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6b877e0 in __lll_unlock_elision () from /usr/lib/libpthread.so.0

Thinking it's some instruction that causes the MCE, I recompiled glibc without lock elision and with -march=ivybridge, and installed linux-ck-ivybridge; however, the machine still hangs under load (like compiling glibc).
Comment 13 Adrienne Cohea 2015-09-26 03:03:32 UTC
This sounds a lot like what I am experiencing. I am on 4.1.6-1-ARCH #1 SMP PREEMPT.

I have 100% ability to reproduce the lock up by trying to compile metasploit. Everything I see in the systemd journal for the mcelog unit is of the form

Processor context corrupt
MCA: Internal Timer error
STATUS be00000000800400 MCGSTATUS 0

The CPU number on which the error occurs is generally 0 but not always.

I see a lot of MCGCAP lines that are the same across MCEs:

MCGCAP 1000c09 APICID 0 SOCKETID 0

APICID 0 (12 of 14 times)
APICID 2 (once)
APICID 4 (once)

Using "processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll" worked for me so far (but I only have a couple hours of up time; however, I'm now able to compile stuff without the MCEs that have been happening with extreme regularity during compiles).

I have not changed any settings in the BIOS, so SpeedStep is still on.
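Counting which APIC IDs the MCEs land on, as done by hand above, can be scripted. A bash sketch that reads mcelog text on stdin, assuming mcelog runs as a systemd unit named mcelog (so the input can come from `journalctl -u mcelog`); the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Count MCE records per APIC ID in mcelog output read from stdin.
# Prints "<apicid> <count>" lines sorted by APIC ID.
tally_mce_apicids() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "APICID") n[$(i + 1)]++ }
         END { for (id in n) print id, n[id] }' | sort -n
}
```

Usage: `journalctl -u mcelog | tally_mce_apicids`. To map an APIC ID back to a logical CPU number, compare it with the `apicid` field of each `processor` entry in /proc/cpuinfo.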
Comment 14 firefox.cat-fox 2015-09-28 14:23:47 UTC
Hello,

Same on a Schenker XMG P505 with an Intel i7-5700HQ. Tried with Kubuntu 15.04 and 15.10-beta1. Kernel version: 4.1.0-3.3 Ubuntu, from upstream version 4.1.3.

"processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll" can be used as a workaround, but the fan is quite noisy and erratic.

SpeedStep cannot be switched off in the BIOS.

best regards.
Comment 15 Crashbit 2015-09-28 14:58:37 UTC
I have similar problems with an i5-5675C. Windows runs without any problem, and Intel's diagnostics software reports the CPU as OK.

At first, I got random hangs with the Ubuntu 14.04 and 15.10 live USB images.
Eventually I managed to install Ubuntu 15.10 server.

The system works well without stress. After I install the desktop, random freezes and MCE errors appear.

mcelog only shows "Family 6 Model 47 CPU: Only decoding architectural errors".

I have been using the Intel microcode firmware from restricted drivers on the Ubuntu 15.10 daily build, with kernel 4.2.0-11-generic.

My motherboard is an Asus H97-Pro Gamer.

I tried disabling Intel SpeedStep, C-states, and Turbo Mode, and still get random freezes.
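The "Family 6 Model 47" string from mcelog, the `PROCESSOR 0:40671` field in the MCE log, and the `sig=0x40671` field in the microcode log all describe the same CPUID signature. A bash sketch of the standard family/model/stepping decoding (function name hypothetical):

```bash
#!/usr/bin/env bash
# Decode a CPUID leaf 1 signature (EAX), e.g. "40671" from "PROCESSOR 0:40671"
# or "0x40671" from "sig=0x40671", into family/model/stepping.
decode_cpuid_sig() {
    local sig=$(( 16#${1#0x} ))
    local stepping=$(( sig & 0xf ))
    local model=$(( (sig >> 4) & 0xf ))
    local family=$(( (sig >> 8) & 0xf ))
    local ext_model=$(( (sig >> 16) & 0xf ))
    local ext_family=$(( (sig >> 20) & 0xff ))
    # The extended model bits apply to families 6 and 15,
    # the extended family bits only to family 15.
    if [ "$family" -eq 6 ] || [ "$family" -eq 15 ]; then
        model=$(( (ext_model << 4) | model ))
    fi
    if [ "$family" -eq 15 ]; then
        family=$(( family + ext_family ))
    fi
    printf 'family=%d model=0x%02x stepping=%d\n' "$family" "$model" "$stepping"
}
```

`decode_cpuid_sig 40671` prints `family=6 model=0x47 stepping=1`, i.e. model 0x47 (71 decimal), the signature that appears in the logs from both the i7-5700HQ in the original description and the i5-5675C in comment 41.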
Comment 16 matthew.dewitt 2015-09-29 00:47:28 UTC
This is definitely a GCC/GLIBC issue/trigger:

- Slackware 14.1 w/ 4.2.1 kernel from kernel.org
- No issues.
- Compiles and runs without issue.

- Slackware 14.1 Stable (with/without 4.2.1, else stock 3.x 14.1 kernel huge.s)
- 'slackpkg upgrade-all' (exclude kernel, update slackpkg first, etc)
- Machine crashes halfway through the update (update of GCC?)

- Slackware -Current (as of 09/28/15)
- Machine is unstable, crashes during any large compile (recompile kernel)

- Fedora 22 - Stable. No issues. GCC not installed.

- Fedora 23 BETA (As of 09/28/15) does not even finish install from Live CD. Crashes.

System:

i7 5700HQ
Comment 17 sac 2015-09-29 16:49:15 UTC
Where is Intel on this one? They have a huge QA department and no one tests a new processor architecture on *nix? I replaced several 5675C (complete marathon & minidump described on https://www.virtualbox.org/ticket/14641 ), the whole line is affected, and I feel a little uncomfortable that any program can trigger MCEs on my machine.

So even if we find a workaround here for the kernel, it would be interesting to know what the long-term solution is. Do we have the next FDIV-like bug that we'll see in the news tomorrow, or can Intel fix this with a microcode update? I assume we need a new stepping (I doubt that the motherboard vendors can work around it, either). However, I was not able to find any place where I can address & report this to Intel as well :(

How can we debug this? All workarounds mentioned don't work / were falsified ("processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll", OC-Fixed-Mode from Phoronix not available on all MBs).
Comment 18 saunders.52 2015-09-29 18:21:53 UTC
Is there a way to forward this to Intel? Unfortunately, these are all laptops, so their customer support page says "contact your manufacturer", and my manufacturer doesn't support Linux.
Comment 19 kernel@benjam.info 2015-09-29 18:26:17 UTC
(In reply to saunders.52 from comment #18)
> Is there a way to forward this to Intel? Unfortunately, these are all
> laptops, so their customer support page says "contact your manufacturer",
> and my manufacturer doesn't support Linux.

This bug is confirmed on the 5700HQ, 5675C, and 5775C. The last two of those are socketed desktop processors.
Comment 20 saunders.52 2015-09-29 18:27:56 UTC
(In reply to kernel@benjam.info from comment #19)
> (In reply to saunders.52 from comment #18)
> > Is there a way to forward this to Intel? Unfortunately, these are all
> > laptops, so their customer support page says "contact your manufacturer",
> > and my manufacturer doesn't support Linux.
> 
> This bug is confirmed on the 5700HQ, 5675C, and 5775C. The last two of those
> are socketed desktop processors.

Ah. Sorry. I still sent an e-mail to an intel contact address I could find pointing them to this bug. Probably wasn't the right one (security) but it's one of the few public facing ones I could find that were tech oriented, and this does crash the system when running code with the affected instructions in a VM.
Comment 21 Adrienne Cohea 2015-09-29 18:43:14 UTC
(In reply to sac from comment #17)
> Where is Intel on this one? They have a huge QA department and noone tests a
> new processor architecture for *nix? I replaced several 5675C (complete
> marathon & minidump described on https://www.virtualbox.org/ticket/14641 ),
> the whole line is affected and I feel a little uncomfortable that every
> program can trigger MCEs on my machine.
> 
> So even if we find a workaround here for the Kernel, it would be interesting
> what's the long-term solution. Do we have the next FDIV like bug that we see
> in the news tomorrow or can Intel fix this with a Microcode update? I assume
> we need a new stepping (doubt that the MB vendors can work around, too).
> However I was not able to find any place where I can adress & report this to
> Intel as well :(
> 
> How can we debug this? All workarounds mentioned don't work / were falsified
> ("processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll", OC-Fixed-Mode
> from Phoronix not available on all MBs).

Please do not claim things like the workarounds are falsified. It's fine to say that they didn't work for *you*, but it is *absolutely* not true that the workaround doesn't work in *general*, and it's unhelpful to kernel maintainers to make positive assertions about other users' experiences that aren't true.

I have a Core i7-5700HQ with my CyberPowerPC Fangbook, and I was getting MCEs starting pretty much ever since I first booted from the Arch Linux Live USB, under various mysterious circumstances. As I mentioned earlier, I have 100% ability to reproduce the lockup: basically compile any larger project. The error is the same every time: MCA Internal Timer Error.

I used the kernel parameters you are saying are "falsified", and I have not experienced any MCEs at all, since then, and I have placed the system under considerable load and various different use cases.

I don't know how it's possible to get much more scientific than that. The parameters in the original bug report do in fact work, at least for some users.
Comment 22 kernel@benjam.info 2015-09-30 04:13:42 UTC
MSI claims to have a UEFI update for their 5700HQ-based machines that fixes this: https://forum-en.msi.com/index.php?topic=261054.msg1498718#msg1498718

My guess is that we need to collectively spam our motherboard manufacturers asking for a similar update.
Comment 23 saunders.52 2015-09-30 04:49:17 UTC
(In reply to kernel@benjam.info from comment #22)
> MSI claims to have a UEFI update for their 5700HQ-based machines that fixes
> this: https://forum-en.msi.com/index.php?topic=261054.msg1498718#msg1498718
> 
> My guess is that we need to collectively spam our motherboard manufacturers
> asking for a similar update.

Oooh. Mine's an MSI - I'll try that and see how it works.
Comment 24 saunders.52 2015-09-30 05:28:52 UTC
Installed the UEFI update to my MSI PX60-2QD, which has a 5750HQ. Tested it by doing a similar "install to a VirtualBox VM" test (with Fedora 23 Beta, mentioned above to be exceptionally bad for this) which triggered this before. (Fedora 23 then broke itself during an upgrade, but that wasn't this issue.)

Changelog for the BIOS update says that it upgraded the microcode to 13, so hopefully that Microcode will be available for loading at boot time sometime soon.
Comment 25 kernel@benjam.info 2015-09-30 05:43:34 UTC
I wrote a hacky script to extract microcode updates from binary blobs a few days ago. Here's what I was able to pull out of one of the MSI updates in question: http://benjam.info/downloads/5700hq-ucode.tar.gz

It obviously doesn't contain any updates for the 5675C or 5775C, but if someone has the 5700HQ on a non-MSI machine, they should be able to install the updates from that tarfile using iucode-tool, and hopefully it will work.

Installing microcode updates from arbitrary binaries should be pretty safe, since they're cryptographically verified by the processor and updates are reverted upon reboot.
Comment 26 matthew.dewitt 2015-09-30 13:20:43 UTC
Created attachment 189091
attachment-9832-0.html

Can someone post a link to clear instructions on the microcode update with
5700HQ, so folks who haven't done this before can get it resolved? Thanks!
Comment 27 kernel@benjam.info 2015-09-30 16:52:27 UTC
(In reply to matthew.dewitt from comment #26)
> Created attachment 189091 [details]
> attachment-9832-0.html
> 
> Can someone post a link to clear instructions on the microcode update with
> 5700HQ, so folks who haven't done this before can get it resolved? Thanks!

1. Check what microcode version you have by running `grep microcode /proc/cpuinfo`
2. Extract the tarfile.
3. Install iucode-tool.
4. For each .bin file that was in the tarfile, run iucode-tool over it with the -k option, which uploads the microcode to the kernel. For example, `sudo iucode-tool -k 2494656.bin`. You can pass multiple files at once to iucode-tool, so you could alternatively just run something like `sudo iucode-tool -k *.bin`
5. Check if your version of the microcode has changed with `grep microcode /proc/cpuinfo`
6. If your version of the microcode changed, then hopefully your issue is fixed. Make sure to re-run step 4 every time after you reboot, until your distribution's intel-microcode package is updated, or until you get a UEFI update.
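Steps 1, 4 and 5 can be wrapped in a small bash sketch. It assumes iucode-tool's `-k` option (upload to the kernel through /dev/cpu/microcode, on kernels that still provide that device) and the `microcode` field of /proc/cpuinfo; the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Print the microcode revision reported for the first CPU in a cpuinfo file.
microcode_rev() {
    awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Late-load the given .bin files and report the revision before and after.
# Needs root. Returns non-zero if iucode-tool fails or the revision is unchanged.
late_load_ucode() {
    local before after
    before=$(microcode_rev)
    iucode-tool -k "$@" || return
    after=$(microcode_rev)
    echo "microcode: $before -> $after"
    [ "$before" != "$after" ]
}
```

Run it as root from the extracted directory, e.g. `late_load_ucode *.bin`. This is a late update, so it has to be repeated after every reboot, as step 6 says.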
Comment 28 kernel@benjam.info 2015-10-01 02:17:36 UTC
I started looking into other MSI UEFI updates, and managed to extract an update for microcode version 0x12 (previously my machine was reporting 0x10) on my i5-5675C. Hopefully the updates will work for the i7-5775C as well. I've distilled this and the i7-5700HQ microcode into a GitHub repository, along with an install script.

After some stress-testing my machine seems stable, without any of the kernel flags and with default UEFI settings!

https://github.com/bgw/bdw-ucode-update-tool
Comment 29 firefox.cat-fox 2015-10-01 11:51:11 UTC
Thanks a lot, this works for me on the XMG P505 with the i7-5700HQ.
Comment 30 Jonas Platte 2015-10-01 17:04:28 UTC
Well, apparently that binary blob does contain microcode for other processors too. One of the microcode files on GitHub seems to have been applied successfully on my i7-5775C, updating the microcode from 0xd to 0x12.

I haven't tested it thoroughly yet, but a package build that always crashed the machine very quickly before has now been running for a few minutes. Seems to work! :)
Comment 31 Alexey Dyachenko 2015-10-01 18:58:17 UTC
(In reply to kernel@benjam.info from comment #28) 
> https://github.com/bgw/bdw-ucode-update-tool

Confirm this working for 5675C, thank you!
To think that we got an update fixing such a critical issue (and the HLE glibc crashes as well, btw) by hacking a vendor BIOS rather than from Intel itself is mind-boggling.
Comment 32 matthew.dewitt 2015-10-01 18:59:13 UTC
Created attachment 189201
attachment-966-0.html

Well, it still has to be applied at every reboot right?

Still, very awesome!
Comment 33 kernel@benjam.info 2015-10-01 19:01:33 UTC
(In reply to matthew.dewitt from comment #32)
> Well, it still has to be applied at every reboot right?
> 
> Still, very awesome!

If you use iucode-tool with the -K option, it should be able to copy files into the right locations in `/lib/firmware/intel-ucode` to make the fix persist. I'm going to experiment with it a bit later, and perhaps I'll add an option to the install script to automate it.
Comment 34 Kristóf Marussy 2015-10-01 20:09:00 UTC
This kind of fix should probably be applied before init runs, so that if some cpuinfo flags change, they are properly exposed even to early userspace. I think you should load the .bin with the kernel's early microcode loading facility: https://www.kernel.org/doc/Documentation/x86/early-microcode.txt

Note that multiple initrd files can be passed to the kernel, so there is no need to concatenate the microcode cpio with the original initrd, see e.g. https://wiki.archlinux.org/index.php/Microcode
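A bash sketch of building such a standalone microcode initrd, following the layout described in the kernel's early-microcode.txt: an uncompressed newc cpio with the data at kernel/x86/microcode/GenuineIntel.bin. It assumes the .bin files are in Intel's binary microcode format (which that document also concatenates from /lib/firmware/intel-ucode) and that cpio is installed; the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Build a standalone early-microcode cpio archive.
# Usage: build_ucode_cpio OUTPUT.img UCODE.bin [UCODE.bin ...]
build_ucode_cpio() {
    local out=$1 tmp rc
    shift
    [ "$#" -gt 0 ] || return 1          # need at least one microcode file
    tmp=$(mktemp -d) || return
    mkdir -p "$tmp/kernel/x86/microcode"
    # Binary-format Intel updates are concatenated into the single file the
    # early loader looks for.
    if ! cat "$@" > "$tmp/kernel/x86/microcode/GenuineIntel.bin"; then
        rm -rf "$tmp"
        return 1
    fi
    # The archive must be an uncompressed "newc" cpio. The redirection is
    # outside the subshell, so OUTPUT is resolved relative to the caller's cwd.
    ( cd "$tmp" && find . | cpio -o -H newc 2>/dev/null ) > "$out"
    rc=$?
    rm -rf "$tmp"
    return "$rc"
}
```

The resulting image (for example `build_ucode_cpio /boot/bdw-ucode.img *.bin`, where the output name is only an illustration) is then listed as the first initrd in the boot loader, ahead of the distribution's normal initramfs, as in the Arch wiki page above.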
Comment 35 Jonas Platte 2015-10-01 21:44:03 UTC
Thanks! I didn't patch my normal initrd like the kernel.org document says, but patched the /boot/intel-ucode.img file I had anyways, then added that as a first initrd. Works like a charm! :)
Comment 36 kernel@benjam.info 2015-10-01 23:36:58 UTC
I added experimental persistence support on Debian-based systems (using initramfs-tools) to the install script: https://github.com/bgw/bdw-ucode-update-tool#install-instructions

I welcome pull requests for other distributions.
Comment 37 Henrique de Moraes Holschuh 2015-10-07 10:42:41 UTC
It would be really helpful to have the output of /proc/cpuinfo on the systems with the fixed microcode, both Broadwell and Skylake.

Can someone with those processors and using the microcode provided in github please attach the output of cat /proc/cpuinfo here?
Comment 38 Jonas Platte 2015-10-07 11:40:24 UTC
Created attachment 189601
/proc/cpuinfo when booted with microcode update 0x12

Here you go.
Comment 39 Henrique de Moraes Holschuh 2015-10-07 12:57:51 UTC
Thank you.

The cpuinfo dump for rev. 0x12 shows HLE and RTM still enabled.  

Can you confirm through the kernel log that the microcode update was applied by the "early" microcode update driver (check the kernel log for "early" in the microcode update log entries) ?

Does glibc with lock elision enabled work properly with that microcode?
Comment 40 kernel@benjam.info 2015-10-07 16:36:39 UTC
Created attachment 189651
/proc/cpuinfo for i5-5675C with the 0x12 ucode patch applied at early boot.
Comment 41 kernel@benjam.info 2015-10-07 16:37:44 UTC
I can confirm on my system that with an "early" microcode update:

[    0.000000] microcode: CPU0 microcode updated early to revision 0x12, date = 2015-06-19
[    0.077976] microcode: CPU1 microcode updated early to revision 0x12, date = 2015-06-19
[    0.095826] microcode: CPU2 microcode updated early to revision 0x12, date = 2015-06-19
[    0.113756] microcode: CPU3 microcode updated early to revision 0x12, date = 2015-06-19
[    0.586568] microcode: CPU0 sig=0x40671, pf=0x2, revision=0x12
[    0.586573] microcode: CPU1 sig=0x40671, pf=0x2, revision=0x12
[    0.586577] microcode: CPU2 sig=0x40671, pf=0x2, revision=0x12
[    0.586582] microcode: CPU3 sig=0x40671, pf=0x2, revision=0x12
[    0.586611] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

HLE and RTM are still enabled.

I've attached /proc/cpuinfo for my machine. (Sorry for sending two emails, I'm still trying to grasp bugzilla)
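Whether a given microcode revision still advertises TSX can be checked from the `hle` and `rtm` entries in the cpuinfo flags. A bash sketch (function name hypothetical); since a late update does not refresh the kernel's feature flags, it is only meaningful after an early update:

```bash
#!/usr/bin/env bash
# Print the TSX-related flags ("hle", "rtm") found in the first "flags"
# line of a cpuinfo file; prints nothing if neither is advertised.
tsx_flags() {
    awk -F': ' '/^flags/ {
        n = split($2, f, " ")
        for (i = 1; i <= n; i++)
            if (f[i] == "hle" || f[i] == "rtm") print f[i]
        exit
    }' "${1:-/proc/cpuinfo}"
}
```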
Comment 42 Jonas Platte 2015-10-07 16:54:16 UTC
Henrique: Yes, I can confirm that the kernel log shows the microcode update:

[jonas@jp-desktop ~]$ dmesg | grep early
[    0.000000] microcode: CPU0 microcode updated early to revision 0x12, date = 2015-06-19
[    0.073758] microcode: CPU1 microcode updated early to revision 0x12, date = 2015-06-19
[    0.084952] microcode: CPU2 microcode updated early to revision 0x12, date = 2015-06-19
[    0.096162] microcode: CPU3 microcode updated early to revision 0x12, date = 2015-06-19

I don't know why you contacted me personally to test your glibc patch, but I don't think I can do that either way as I never experienced glibc SIGILL or SIGSEGV crashes, only kernel panics.
Comment 43 Henrique de Moraes Holschuh 2015-10-07 20:21:06 UTC
Jonas Platte: Thanks for the information.  Sorry, I didn't notice you were not one of the people who also reported glibc issues.

Benjan: Thanks for the information.

Alexey, Matthew: If at all possible, maybe you could repeat the glibc lock-elision testing with microcode 0x12, using a VM or chroot?
Comment 44 matthew.dewitt 2015-10-07 21:51:24 UTC
Created attachment 189671
attachment-20547-0.html

I'll do that tonight
Comment 45 Alexey Dyachenko 2015-10-08 10:09:28 UTC
Yup, getting elision lock crashes even with updated microcode 0x12.

Core was generated by `/usr/lib/gnome-session/gnome-session-check-accelerated'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007ff7c1e7c7e0 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
#1  0x00007ff7ba0bf42c in ?? () from /usr/lib/libEGL_nvidia.so.0
#2  0x00007ff7ba04e732 in ?? () from /usr/lib/libEGL_nvidia.so.0
#3  0x00007ffd2dc7c5d0 in ?? ()
#4  0x00007ff7ba0d3fa1 in ?? () from /usr/lib/libEGL_nvidia.so.0
#5  0x00007ffd2dc7c5d0 in ?? ()
#6  0x00007ff7c5cb8885 in _dl_fini () from /lib64/ld-linux-x86-64.so.2

When glibc compiled without lock-elision there is no crash and gdm starts normally.

I do the microcode update with the script from kernel@benjam.info, which I put into a systemd unit, so it's not an 'early' update.

Oct 08 09:52:21 yuuzora kernel: microcode: CPU0 sig=0x40671, pf=0x2, revision=0x10
Oct 08 09:52:21 yuuzora kernel: microcode: CPU0 updated to revision 0x12, date = 2015-06-19
Oct 08 09:52:21 yuuzora kernel: microcode: CPU1 sig=0x40671, pf=0x2, revision=0x10
Oct 08 09:52:21 yuuzora kernel: microcode: CPU1 updated to revision 0x12, date = 2015-06-19
Oct 08 09:52:21 yuuzora kernel: microcode: CPU2 sig=0x40671, pf=0x2, revision=0x10
Oct 08 09:52:21 yuuzora kernel: microcode: CPU2 updated to revision 0x12, date = 2015-06-19
Oct 08 09:52:21 yuuzora kernel: microcode: CPU3 sig=0x40671, pf=0x2, revision=0x10
Oct 08 09:52:21 yuuzora kernel: microcode: CPU3 updated to revision 0x12, date = 2015-06-19
Comment 46 Henrique de Moraes Holschuh 2015-10-08 11:47:32 UTC
Alexey,

Thank you very much for confirming that TSX-NI in Broadwell-H is still broken *and active* on microcode 0x12, dated 2015-06-19.

I highly recommend that you deploy early microcode updates on your system.  When you use the late microcode update mode, you are still at risk of hitting errata that would be fixed by the microcode update.

Also, when you don't do an "early" update, the kernel won't know about any "processor feature" flags change caused by the microcode update (yes, this is a known bug): it will continue to operate with the set of flags it got from the boot microcode for all intents and purposes (including, but not limited to /proc/cpuinfo "flags" information being stale).

I'd also start pestering the motherboard vendor for firmware updates at least every two months, or something to that effect :-(
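The early and late paths log differently, as the kernel messages in comments 41, 42 and 45 show ("updated early to revision" vs. "updated to revision"). A bash sketch that classifies those lines, assuming the message format of the kernels used in this thread (other kernel versions may word it differently); the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Read kernel log text on stdin and print "<cpu> <early|late> <revision>"
# for every microcode update message.
classify_ucode_log() {
    sed -n -E \
        -e 's/.*microcode: (CPU[0-9]+) microcode updated early to revision (0x[0-9a-f]+).*/\1 early \2/p' \
        -e 's/.*microcode: (CPU[0-9]+) updated to revision (0x[0-9a-f]+).*/\1 late \2/p'
}
```

Usage: `dmesg | classify_ucode_log`. Any "late" line means the feature flags the kernel uses were taken from the older boot microcode.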
Comment 47 Alexey Dyachenko 2015-10-08 11:53:35 UTC
I would gladly do early updates (as a matter of fact, I encountered panics during boot before the microcode update unit got triggered); it's just that I don't know how to do that with a bunch of *.bin files hacked out of another vendor's BIOS. Arch provides intel-ucode.img with the official microcode updates.

Intel should release a proper microcode update so that users who bought this broken hardware can actually use it.
Comment 48 Henrique de Moraes Holschuh 2015-10-08 16:36:06 UTC
Alexey,

I have no idea why it is taking so long for Intel to release a new "linux microcode update package", and I certainly agree that Intel dragging their feet on this is hurting end-users.  It is not like motherboard vendors do a proper job of issuing firmware updates.


Anyway, here's how to deploy early microcode updates, Arch-style.

1. Install iucode_tool.

If your Linux distro already has it, just use the one provided by your distro.

Otherwise, get the source code and compile it with "./configure ; make", and copy the iucode_tool binary to somewhere in your PATH (e.g. using "make install" as root).  It only needs glibc, gcc and make to build.

The iucode_tool source code tarball is available from:
https://gitlab.com/iucode-tool/releases/tree/latest


2. Update or create the early-initramfs image with the microcode update:

iucode-tool -Sl --overwrite --write-earlyfw=/boot/intel-ucode.img /lib/firmware/intel-ucode my-new-microcodes.bin

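It will consider any microcodes in /lib/firmware/intel-ucode, as well as any microcodes in the ".bin" files you give it on the command line (I used "my-new-microcodes.bin" in the example).  You can list as many files as you want.  The -S option makes it select only the microcodes that match the processors in this system, and -l lists the selected ones.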


3. Set up grub to load the microcodes as per the *Arch Linux* recommendation (i.e. using a separate initramfs image for microcode):

https://wiki.archlinux.org/index.php/Microcode


If your distro uses the "single initramfs" mode of early microcode updates (Debian, Ubuntu, Fedora), this should *override* the early-initramfs microcode update provided by the distro:  AFAIK, the kernel uses the first early microcode update datafile it finds.

For this reason, make sure to regenerate intel-ucode.img every time you get a new microcode update package from your distro.  You should probably stop using it and switch back to your distro's microcode package once it starts shipping recent enough microcode for your processor.
Comment 49 saunders.52 2015-10-08 17:22:03 UTC
(In reply to Henrique de Moraes Holschuh from comment #46)
> Alexey,
> 
> Thank you very much for confirming that TSX-NI in Broadwell-H is still
> broken *and active* on microcode 0x12, dated 2015-06-19.
> 

There's also a microcode 0x13, installed on my machine by the MSI update, that disables the TSX-related features according to /proc/cpuinfo: https://www.dropbox.com/s/dfjs8xbypx20vqm/msi-px60-2qd-microcode13-cpuinfo.txt?dl=0
Comment 50 Henrique de Moraes Holschuh 2015-10-08 17:48:04 UTC
Ah, that's very good news, indeed!

First, because anyone who has Broadwell-H with microcode 0x13 should be able to run any Linux distro without it crashing either the kernel or glibc...  At least as far as the two errata mentioned in this bug report are concerned.

Second, because it means the chance of TSX-NI (RTM) being fixed in Broadwell-H really should be non-existent, as the erratum text says, so blacklisting it becomes uncontroversial.
Comment 51 matthew.dewitt 2015-10-08 18:02:25 UTC
Created attachment 189761 [details]
attachment-2813-0.html

How do I get 0x13??

Comment 52 kernel@benjam.info 2015-10-08 23:21:59 UTC
(In reply to matthew.dewitt from comment #51)
> 
> How do I get 0x13??

I think the 5700HQ updates include microcode version 0x13, but the UEFI update I pulled the 5675C and 5775C updates from was older than the 5700HQ update. It's also possible that the different processors use different version numbers.

However, looking at the microcode signatures, the last 5x75C microcode update is dated 2015-06-19, while the last 5700HQ microcode update is dated 2015-03-27, so I don't really know. I really wish Intel would tell us something.

If anyone can find a 5675C or 5775C motherboard whose firmware installs a microcode version newer than 0x12, I'd be happy to extract the microcode updates and add them to the GitHub repository.
Comment 53 Henrique de Moraes Holschuh 2015-10-09 01:18:24 UTC
Broadwell-H i7-5700HQ has CPU signature 0x40671...

http://www.intel.com/content/www/us/en/processors/core/desktop-mobile-5th-gen-core-family-spec-update.html

And it looks like that's the CPU signature of the i5/7 5x75C/R as well.
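For anyone wondering how the single signature 0x40671 relates to the "cpu family", "model" and "stepping" lines in /proc/cpuinfo, here is a minimal Python sketch that decodes a CPUID leaf 1 EAX signature using Intel's documented field layout (stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16, extended family in 27:20). The function name decode_signature is made up for illustration.

```python
def decode_signature(sig: int) -> tuple:
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family field is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model field is prepended for base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) + base_model
    else:
        model = base_model
    return family, model, stepping
```

For 0x40671 this gives family 6, model 71 (0x47), stepping 1, which agrees with the i5-5675C and i7-5775C /proc/cpuinfo dumps posted in this thread.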
Comment 54 sac 2015-10-09 08:48:58 UTC
Unfortunately, not even the mainboard vendors' BIOS changelogs are reliable. The changelog for my ASRock H97 Performance lists BIOS 2.40 as "Update Microcode 19." (0x13); however, if you extract the microcode update with MMTool, you get microcode 16 (0x10), exactly the same one that was already included in BIOS 2.30.

In addition, I've read some guides where someone modded the vendor BIOS to include the latest microcode updates (I'm on Windows only for now, because all my Linux installs are currently broken by this old-microcode MCE :( )
http://donovan6000.blogspot.de/2013/06/insyde-bios-modding-cpu-microcodes.html

Has anyone successfully modded an AMI UEFI BIOS directly?
Comment 55 sac 2015-10-09 13:20:38 UTC
BTW: I was able to update the microcode in my ASRock H97 BIOS from 0x10 to 0x12 with UBU, and the MCEs on my i5-5675C seem to be gone. The system has been stable for 3 hours running a VM (where it previously crashed consistently after 2 minutes). I only found Windows tools to mod the BIOS. Please note that a microcode mod seems to be the most critical kind of BIOS mod, and not all tools work for all motherboard vendors. I only tried it because I have a dual BIOS that restores the original one after 5 failed boot attempts. Try at your own risk.

http://www.win-raid.com/t455f16-Guide-How-to-flash-a-modded-ASUS-or-ASRock-AMI-UEFI-BIOS.html
http://www.win-raid.com/t154f16-Tool-Guide-News-quot-UEFI-BIOS-Updater-quot-UBU.html
Comment 56 sac 2015-10-13 20:06:14 UTC
I haven't tested it, but someone on the VirtualBox forums also suggested setting the nosmap kernel option (if no updated microcode is available).
https://forums.virtualbox.org/viewtopic.php?f=6&t=71531&p=342062#p340624
Comment 57 Fredrik Atmer 2015-10-14 15:23:45 UTC
First, thank you, thank you, thank you =)

I've had my i5-5675C on an Asus H97I-Plus motherboard for some time now, and only now does it actually "work". Because of the random crashes, I was afraid something in my system was broken.

I first tried to install Ubuntu 15.04, but the installer kept crashing. So I tried 14.04, which worked. It turned out I needed 14.10 for the Intel graphics drivers. That worked as well. Well, Unity didn't, and I kind of like it... So I've been running Metacity since, with the occasional random crash. Then I realized 14.10 is no longer supported. Then at 3:30 AM I found this thread. I did sleep first, but the next day I was able to dist-upgrade my 14.10 system to 15.04 using the 0x12 microcode (I used to have 0xd).

This has to be the most haxxor thing I've ever done. Now I'm running Unity in 15.04 without problems so far, with SpeedStep and Turbo active. I always thought microcode was something embedded in the processor, not to be accessed from the outside world.

I'm using the script from the git repository. It works, but the --persist-debian option fails to make it persistent. I don't know if I'm using it incorrectly, and I'm happy to provide more information if requested. I also tried the method with iucode-tool described above, with the same result. The install.sh script does just that, right?
Comment 58 Henrique de Moraes Holschuh 2015-10-14 18:21:18 UTC
(In reply to Fredrik Atmer from comment #57)
> I've had my setup i5-5675C on an Asus H97I-Plus motherboard for some time

...

> first, but the day after I was able to do a dist-upgrade of my 14.10 system
> to 15.04 using the 0x12 microcode (I used to have 0xd).

Hmm, libpthreads (glibc POSIX threads support) in Ubuntu 15.04 is supposed to have lock elision enabled, and it should not be very happy with the i5-5675C on microcode 0x12 (you'd need microcode 0x13).

Is "rtm" listed in your /proc/cpuinfo "flags" line?

Can you run heavily threaded applications without crashes?
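Checking the long "flags" line by eye is error-prone, so here is a small Python sketch (helper names are hypothetical) that reports whether the kernel lists the Intel TSX flags hle and rtm, given the text of /proc/cpuinfo. As noted earlier in this thread, these flags can be stale if the microcode was loaded late rather than early.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flags from the first "flags" line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Require the key to be exactly "flags", so other keys containing
        # the word are not mistaken for it.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def tsx_flags(cpuinfo_text: str) -> dict:
    """Report whether the Intel TSX feature flags (hle, rtm) are listed."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in ("hle", "rtm")}
```

Usage: pass it the contents of /proc/cpuinfo, e.g. tsx_flags(open("/proc/cpuinfo").read()).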
Comment 59 kernel@benjam.info 2015-10-14 20:50:59 UTC
Great news! I tried some other MSI UEFI updates and managed to extract a microcode update that brings my 5675C up to version 0x13! Since the update came from a laptop UEFI update, and since the 5775C, 5675C, and 5700HQ all have the same signature (https://bugzilla.kernel.org/show_bug.cgi?id=103351#c53), I'm presuming this one update should work for everyone.

To be safe, I've added the new update to a separate "experimental" branch. Once people independently confirm that it works on all the different types of processors, I'll merge it to master.

To use the update in the experimental branch, simply run:

---
$ git clone https://github.com/bgw/bdw-ucode-update-tool.git
$ cd bdw-ucode-update-tool
$ git checkout experimental
$ ./install.sh
---

I've installed this persistently on my machine, and verified that it applies at early boot. My cpu flags with 0x13 are:

---
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
---

Which is exactly the same as it was with 0x12.
Comment 60 kernel@benjam.info 2015-10-14 20:55:23 UTC
(In reply to Fredrik Atmer from comment #57)
> I'm using the script from the git repository. It works, but the
> --persist-debian option fails to make it persistent. I don't know if I'm
> using it incorrectly, and I'm happy to provide more information if
> requested. I also tried the method with iucode-tool described above, with
> the same result. The install.sh script does just that, right?

If you're curious what the `install.sh` script does, you can just open it in a text editor and see for yourself. It uses `iucode-tool` with the `--write-firmware` flag to copy the file to `/lib/firmware/intel-ucode`, and then `update-initramfs` to rebuild the early-boot image.

I haven't tested this on Ubuntu (I'm using Debian Jessie), but I'll do some experimentation in a virtual machine.
Comment 61 Fredrik Atmer 2015-10-14 21:14:49 UTC
I thought I had posted this earlier, but here goes.

Don't go ruin my day again... Yes, rtm is in there. Everything I've run so far seems to be working. Is there a stress test for threading, or some suggested program I can run for testing? I tried running some sysbench tests, but I have no idea what I am looking for. They all bogged down the system completely, but everything returned to normal after they finished.

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 71
model name	: Intel(R) Core(TM) i5-5675C CPU @ 3.10GHz
stepping	: 1
microcode	: 0x12
cpu MHz		: 1399.601
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
bugs		:
bogomips	: 6185.82
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:
Comment 62 saunders.52 2015-10-14 21:16:45 UTC
(In reply to Fredrik Atmer from comment #61)
> I thought I had posted this earlier, but here goes.
> 
> Don't go ruin my day again... Yes, rtm is in there. Everything I've run so
> far seem to be working. Is there a stress test for threading or some
> suggested program I can run for testing? I tried running some sysbench test,
> but I have no idea what I am looking for. They all bogged down the system
> completely, but everything returned to normal after they finished.

That's so weird, given that my MSI laptop with UEFI-supplied microcode 13 doesn't have rtm listed. Huh.
Comment 63 Fredrik Atmer 2015-10-14 21:31:01 UTC
I just tried the experimental branch and it works for me. I'm at 0x13 =)

However, --persist-debian still doesn't persist: I'm back at 0xd after a reboot. This is my output from install.sh. I don't know whether update-initramfs looks in the wrong place or what.


*** Initial microcode version information:
microcode	: 0x13
microcode	: 0x13
microcode	: 0x13
microcode	: 0x13

*** Applying microcode updates
*** Setting up persistence
FYI: You can safely ignore warnings about files already existing.
/usr/sbin/iucode-tool: 06-47-01: cannot write to, or create file: File exists
*** Running update-initramfs
update-initramfs: Generating /boot/initrd.img-3.19.0-30-generic

*** Current microcode version information:
microcode	: 0x13
microcode	: 0x13
microcode	: 0x13
microcode	: 0x13

No errors encountered.
If the microcode version did not change, no update has been applied.
Comment 64 Fredrik Atmer 2015-10-14 21:43:58 UTC
I think I've got it. I moved all the files out of /lib/firmware/intel-ucode before running install.sh, and now when I reboot I get the early update (from dmesg | grep early):

[    0.000000] CPU0 microcode updated early to revision 0x13, date = 2015-08-03
[    0.080684] CPU1 microcode updated early to revision 0x13, date = 2015-08-03
[    0.098571] CPU2 microcode updated early to revision 0x13, date = 2015-08-03
[    0.116521] CPU3 microcode updated early to revision 0x13, date = 2015-08-03
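To double-check that every CPU really took the update from the early loader, here is a hedged Python sketch that pulls the "updated early to revision" lines out of dmesg output. The regular expression is based only on the log lines posted in this thread (some carry a "microcode: " prefix, some don't), and the helper names are illustrative.

```python
import re

# Matches e.g. "CPU0 microcode updated early to revision 0x13, date = 2015-08-03",
# with or without a dmesg timestamp or a leading "microcode: " prefix.
EARLY_RE = re.compile(
    r"CPU(\d+) microcode updated early to revision (0x[0-9a-fA-F]+), "
    r"date = (\d{4}-\d{2}-\d{2})"
)


def early_updates(dmesg_text: str) -> dict:
    """Map CPU number -> (revision, date) for each early-update log line."""
    return {
        int(cpu): (int(rev, 16), date)
        for cpu, rev, date in EARLY_RE.findall(dmesg_text)
    }


def early_revisions(dmesg_text: str) -> set:
    """Distinct revisions applied early; one entry means all logged CPUs agree."""
    return {rev for rev, _ in early_updates(dmesg_text).values()}
```

Note that in at least one log in this thread (an 8-thread i7-5775C) only CPU0 to CPU3 print an early-update line, so a missing CPU number should not by itself be taken as proof that the update failed.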
Comment 65 Henrique de Moraes Holschuh 2015-10-15 01:05:23 UTC
General Warning:

Some of the microcodes mentioned in this thread, particularly the revision 0x13 microcode for the Core i5/i7-5xxxHQ and Core i5/i7 5x75C, 5x75R, *have strict requirements* for safe installation.

These microcodes disable Intel TSX-NI when loaded.  On any system where glibc was compiled with lock elision enabled, these microcodes *must* be installed through either a BIOS update, or through the early microcode loader (at least for the time being).

Do *not* install these microcodes to a standard initramfs.  It can render the system unbootable should it crash systemd or udev inside the standard initramfs.  You *must* use an early-initramfs to safely apply such microcode updates.

For safety, I strongly recommend that they not be added to /lib/firmware/intel-ucode with standard names.  Debian and Ubuntu, for example, rename such microcode files by appending ".initramfs" to their names: iucode-tool won't care and will still install them to the early-initramfs, but the kernel microcode module will not find them, rendering them safe.

If you fail to install such microcode correctly on a system with a lock-elision-enabled glibc, it will instantly crash systemd, udev, and anything else linked to libpthreads when the microcode update is loaded.

Eventually, Debian, Ubuntu and Fedora are likely to blacklist lock elision for Broadwell processors in glibc, just like it was done for Haswell.  With that blacklist in place, glibc won't crash even if the update is applied at an inappropriate time (after the kernel booted).  But AFAIK nobody has updated the glibc blacklist to add Broadwell processors yet.

So, please use an early-initramfs for microcode updates, and enjoy a far more stable system ;-)
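A minimal Python sketch of why the ".initramfs" suffix keeps such files away from the late loader: the kernel's Intel microcode driver asks the firmware loader for a file named intel-ucode/FF-MM-SS (family, model and stepping as two-digit lowercase hex, e.g. 06-47-01 for these Broadwell-H parts), so a file with any other name is simply never requested. The function names are hypothetical, and this only models the file-name lookup, not the loader itself.

```python
from pathlib import Path


def late_loader_name(family: int, model: int, stepping: int) -> str:
    """Firmware path, relative to /lib/firmware, that the late loader requests."""
    return f"intel-ucode/{family:02x}-{model:02x}-{stepping:02x}"


def visible_to_late_loader(fw_root: Path, family: int, model: int, stepping: int) -> bool:
    """True if a file with exactly the requested name exists under fw_root."""
    return (fw_root / late_loader_name(family, model, stepping)).is_file()
```

For example, with only /lib/firmware/intel-ucode/06-47-01.initramfs present, visible_to_late_loader(Path("/lib/firmware"), 6, 71, 1) returns False, while iucode-tool can still be pointed at that file when building the early-initramfs.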
Comment 66 Henrique de Moraes Holschuh 2015-10-15 13:18:26 UTC
> I've installed this persistently on my machine, and verified that it applies
> at early boot. My cpu flags with 0x13 are:
> 
> ---
> fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
> clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
> constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
> aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3
> fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
> aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat pln pts dtherm
> intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle
> avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
> ---
> 
> Which is exactly the same as it was with 0x12.

Assuming the microcode was *really* applied by the early microcode loader (please double-check):

This is really strange, since it was reported to remove the rtm and hle flags on some processors, and the processor errata text is very specific that Intel TSX is NOT supposed to be available (even when it is, in fact, reported to be available by CPUID).  Argh.  Conflicting/incomplete information, and conflicting runtime behavior.  Thank you, Intel!

Maybe this thing is related to a boot-locked MSR?  The microcode might change the default on power-up, but still respect writes to such an MSR by the BIOS.  That would mean you really want a BIOS update to get everything right.

So, it boils down to whether people with microcode 0x13 usually have RTM enabled or not, and whether glibc with lock elision works fine on heavily threaded workloads for those that do have RTM enabled on microcode 0x13...
Comment 67 Henrique de Moraes Holschuh 2015-10-15 13:19:20 UTC
(In reply to Fredrik Atmer from comment #64)
> I got it, I think. I moved all files in /lib/firmware/intel-ucode before
> running install.sh and now when I reboot I get the early update, from dmesg
> | grep early
> 
> [    0.000000] CPU0 microcode updated early to revision 0x13, date =
> 2015-08-03

Fredrik, could you post your /proc/cpuinfo with revision 0x13, please?
Comment 68 Adrienne Cohea 2015-10-15 13:29:28 UTC
My /proc/cpuinfo flags didn't show rtm or hle with microcode 0xd (with all of my MCE system halts), nor with microcode 0x12 applied early. Is it possible that the microcode does something other than just disabling TSX-NI? (Though, without rtm or hle showing in /proc/cpuinfo, that shouldn't have been my problem anyway, right?)

All I know is that microcode 0x12 is working just fine.
Comment 69 Henrique de Moraes Holschuh 2015-10-15 14:07:04 UTC
(In reply to Adrienne Cohea from comment #68)
> I wasn't showing rtm or hle in my flags for /proc/cpuinfo on the microcode
> 0xd (with all of my MCE system halts) or in microcode (0x12) early applied.

Adrienne, can you post the output of:
cat /proc/cpuinfo
cat /sys/devices/system/cpu/cpu0/microcode/processor_flags    (requires root)?

I'd also appreciate the information above from people that had RTM/HLE enabled and got it disabled by microcode 0x13 or by microcode 0x12...

> Is it possible the microcode do something else other than just disable
> TSX-NI? (Though, without rtm or hle showing in /proc/cpuinfo, that shouldn't
> have been my problem anyway?)

Yes, it is possible.  We know microcode 0x12 addresses the issues that resulted in the hangs and MCEs, which appear to be errata BDD86/BDM101.  We don't know for sure which other errata it works around, and how.

And you're correct that, since your system never advertised HLE or RTM in the first place, Intel TSX should never be a problem for you.

> All I know is that microcode 0x12 is working just fine.

Indeed we don't really know what microcode 0x13 fixes that microcode 0x12 didn't already fix.

It looked like it was Intel TSX-NI (RTM), but now it looks like it is either something else, or that there are extra requirements we don't know about to get Intel TSX-NI disabled.
Comment 70 saunders.52 2015-10-15 17:04:05 UTC
(In reply to Henrique de Moraes Holschuh from comment #69)
> Adrienne, can you post the output of:
> /proc/cpuinfo
> cat /sys/devices/system/cpu/cpu0/microcode/processor_flags    (requires
> root)?
> 
> I'd also appreciate the information above from people that had RTM/HLE
> enabled and got it disabled by microcode 0x13 or by microcode 0x12...

I have an MSI laptop (PX60-2QD w/ i7-5700HQ) with BIOS-provided microcode 13.

The entire output of the processor flags file for me is "0x20".

/proc/cpuinfo is https://www.dropbox.com/s/dfjs8xbypx20vqm/msi-px60-2qd-microcode13-cpuinfo.txt?dl=0
flags list: "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt"
Comment 71 kernel@benjam.info 2015-10-15 18:06:36 UTC
(In reply to Henrique de Moraes Holschuh from comment #66)
> Assuming the microcode was *really* applied by the early microcode loader
> (please double-check):

It's really being applied by the early microcode loader:

---
[    0.000000] microcode: CPU0 microcode updated early to revision 0x13, date = 2015-08-03
[    0.078198] microcode: CPU1 microcode updated early to revision 0x13, date = 2015-08-03
[    0.096059] microcode: CPU2 microcode updated early to revision 0x13, date = 2015-08-03
[    0.113892] microcode: CPU3 microcode updated early to revision 0x13, date = 2015-08-03
[    0.590510] microcode: CPU0 sig=0x40671, pf=0x2, revision=0x13
[    0.590515] microcode: CPU1 sig=0x40671, pf=0x2, revision=0x13
[    0.590519] microcode: CPU2 sig=0x40671, pf=0x2, revision=0x13
[    0.590522] microcode: CPU3 sig=0x40671, pf=0x2, revision=0x13
[    0.590549] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
---

I don't understand it either, and I'm as frustrated as everyone else by the lack of communication from Intel.

Thanks for the warning about applying the updates after early boot. I still want to support the normal installation mechanism, because it's easier for some people, but I'll certainly look into adding more specific disclaimers and tweaking the way the installation works.
Comment 72 Henrique de Moraes Holschuh 2015-10-16 10:19:44 UTC
Well, FWIW, the processor flags (pf) reported are _different_.

So now we need a bit more data from others, to check whether processors with pf=0x20 end up with RTM disabled and those with pf=0x2 end up with RTM enabled, or whether there is no such connection.

I'd expect pf=0x20 to be the mobile processors (soldered), and pf=0x2 to be the desktop processors (socketed), BTW.

The "pf" can be read from either the kernel log (microcode: CPU0 sig=... line), or from "cat /sys/devices/system/cpu/cpu0/microcode/processor_flags".  It should be analyzed together with the output of "cat /proc/cpuinfo".
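To make it easier to collect the data requested above, here is a short Python sketch that extracts (signature, pf, revision) from the "microcode: CPUn sig=..., pf=..., revision=..." lines and reads the sysfs processor_flags file. The regular expression is inferred from the log lines posted in this thread, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# e.g. "microcode: CPU0 sig=0x40671, pf=0x2, revision=0x13"
SIG_RE = re.compile(
    r"CPU(\d+) sig=(0x[0-9a-fA-F]+), pf=(0x[0-9a-fA-F]+), revision=(0x[0-9a-fA-F]+)"
)


def parse_sig_lines(dmesg_text: str) -> dict:
    """Map CPU number -> (signature, pf, revision) as integers."""
    return {
        int(cpu): (int(sig, 16), int(pf, 16), int(rev, 16))
        for cpu, sig, pf, rev in SIG_RE.findall(dmesg_text)
    }


def sysfs_processor_flags(cpu: int = 0) -> int:
    """Read .../cpuN/microcode/processor_flags (may require root, as noted above)."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/microcode/processor_flags")
    # The file holds a hex string such as "0x20"; int() accepts the 0x prefix
    # in base 16 and ignores the trailing newline.
    return int(path.read_text(), 16)
```

Pairing each system's pf value with whether rtm appears in its /proc/cpuinfo flags is what would confirm or refute the pf=0x2 versus pf=0x20 hypothesis.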
Comment 73 Alexey Dyachenko 2015-10-16 10:24:38 UTC
microcode updated early
pf=0x2
cpuinfo http://pastebin.com/92EjQxsM
Comment 74 Jinserk Baik 2015-10-16 11:51:28 UTC
Hello, glad to have found this thread.

My system is an i5-5675C on an ASUS H97I-PLUS. I'm using Debian stretch (testing), with the experimental microcode 0x13 applied. Here is my status:

---
$ dmesg | grep microcode
[    0.000000] microcode: CPU0 microcode updated early to revision 0x13, date = 2015-08-03
[    0.174403] microcode: CPU1 microcode updated early to revision 0x13, date = 2015-08-03
[    0.200376] microcode: CPU2 microcode updated early to revision 0x13, date = 2015-08-03
[    0.226294] microcode: CPU3 microcode updated early to revision 0x13, date = 2015-08-03
[    1.377455] microcode: CPU0 sig=0x40671, pf=0x2, revision=0x13
[    1.377467] microcode: CPU1 sig=0x40671, pf=0x2, revision=0x13
[    1.377485] microcode: CPU2 sig=0x40671, pf=0x2, revision=0x13
[    1.377503] microcode: CPU3 sig=0x40671, pf=0x2, revision=0x13
[    1.377591] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
---
$ cat /proc/cpuinfo
....
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
...
---

It seems that hle and rtm are enabled.
Comment 75 Henrique de Moraes Holschuh 2015-10-16 13:19:08 UTC
(In reply to Alexey Dyachenko from comment #73)
> microcode updated early
> pf=0x2
> cpuinfo http://pastebin.com/92EjQxsM

HLE and RTM still enabled.  However, this is microcode 0x12, which we didn't expect to turn HLE and RTM off anyway from previous reports...

Alexey, maybe you could try microcode 0x13 which Benjan has added to his experimental branch?  It should not be any more dangerous than microcode 0x12, as you're using early updates...
Comment 76 Dimman 2015-10-16 15:43:16 UTC
My stats:

[    0.000000] DMI: ASUS All Series/Z97-PRO(Wi-Fi ac)/USB 3.1, BIOS 2401 04/27/2015

[    0.000000] microcode: CPU0 microcode updated early to revision 0x13, date = 2015-08-03
[    0.081596] microcode: CPU1 microcode updated early to revision 0x13, date = 2015-08-03
[    0.099552] microcode: CPU2 microcode updated early to revision 0x13, date = 2015-08-03
[    0.117352] microcode: CPU3 microcode updated early to revision 0x13, date = 2015-08-03
[    0.500798] microcode: CPU0 sig=0x40671, pf=0x2, revision=0x13
[    0.500802] microcode: CPU1 sig=0x40671, pf=0x2, revision=0x13
[    0.500807] microcode: CPU2 sig=0x40671, pf=0x2, revision=0x13
[    0.500811] microcode: CPU3 sig=0x40671, pf=0x2, revision=0x13
[    0.500816] microcode: CPU4 sig=0x40671, pf=0x2, revision=0x13
[    0.500820] microcode: CPU5 sig=0x40671, pf=0x2, revision=0x13
[    0.500824] microcode: CPU6 sig=0x40671, pf=0x2, revision=0x13
[    0.500829] microcode: CPU7 sig=0x40671, pf=0x2, revision=0x13
[    0.500856] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 71
model name	: Intel(R) Core(TM) i7-5775C CPU @ 3.30GHz
stepping	: 1
microcode	: 0x13
cpu MHz		: 3263.132
cache size	: 6144 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
bugs		:
bogomips	: 6595.67
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

So HLE and RTM remain for me with 0x13. I ripped the microcode myself from an MSI BIOS file (E16J1IMS.110.zip).
Comment 77 kernel@benjam.info 2015-10-16 15:57:42 UTC
(In reply to Dimman from comment #76)
> So HLE and RTM remains for me in 0x13. I ripped the microcode myself from a
> msi bios file. (E16J1IMS.110.zip)

Awesome! Can you post the SHA-1 sum of the microcode you ripped, so people have an easy way to verify that the microcode I'm providing isn't malicious (though an unsigned or modified microcode file would do nothing anyway)?
Comment 78 Dimman 2015-10-16 16:03:22 UTC
(In reply to kernel@benjam.info from comment #77)
> (In reply to Dimman from comment #76)
> > So HLE and RTM remains for me in 0x13. I ripped the microcode myself from a
> > msi bios file. (E16J1IMS.110.zip)
> 
> Awesome! Can you post the SHA-1 sum of the microcode you ripped, so people
> have an easy way to verify that the microcode I'm providing isn't malicious
> (though an unsigned or modified microcode file would do nothing anyways).

Sure, here you go b55c1f4d5858209b5a607377b83994c59c7fb0f3.
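For anyone who wants to compare a microcode file against a posted checksum, here is a minimal Python sketch using the standard hashlib module. The path in the usage note is a placeholder. Keep in mind that a digest only matches if it was computed over exactly the same bytes, so a raw blob ripped from a BIOS image and a file written by iucode_tool may not hash the same even when they contain the same microcode.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hex digest of a file, read in chunks so large images need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_matches(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Usage: digest_matches("path/to/ripped-microcode.bin", "b55c1f4d5858209b5a607377b83994c59c7fb0f3", "sha1").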
Comment 79 Henrique de Moraes Holschuh 2015-10-16 17:17:52 UTC
It is nearly impossible for someone not @intel to create malicious microcode.

It is trivial to lie to tools like iucode-tool and even the kernel... so you could create something that looks like microcode revision "A" but it is really microcode revision "B", or falsify the microcode date, so that it looks like it is newer or older.

But that doesn't make any difference to the processor. It doesn't care; in fact, the kernel driver doesn't even send it that data.  The processor itself validates the microcode signature and pf mask, which are part of the signed data that only Intel can create.  You cannot force an Intel processor to install unsuitable microcode.  You *can* cause it to crash if you do it in a very specific way (that cannot happen by accident), but that's it.

And once a microcode update is accepted, the processor *will* report its real revision, and that's what /proc/cpuinfo and the log messages will show.

That said, here are the SHA-256 sums of the microcodes relevant to this thread (in iucode_tool "--write-named-to" filename notation):

HASWELL (likely MCE fix):
449641e821abceb4b7321a4374be2e136e3a6c474c8ca2855ee387c889db3201  s000306C3_m00000032_r0000001D.fw

BROADWELL (likely MCE fix, maybe Intel TSX fix):
fed4431ae91f19bd9346428cc33b9ac6d4364c26b1ef221427738fce53d51525  s000306D4_m000000C0_r00000021.fw

BROADWELL-H (MCE fix, maybe Intel TSX fix):
b2fa8638b92fd3b99e6f29e485da38c9abc009ae003918c679fcd428ae1b3c64  s00040671_m00000022_r00000012.fw
e80e12dd77551813253903fc6da068faad32abb78424bf96b9765141a8dab2a1  s00040671_m00000022_r00000013.fw

SKYLAKE (MCE fix, Intel TSX fix, other fixes):
4784f5feb717aef4d750315f7a4d2d4d6f8fc67562b864c1ec40393a0705ca7a  s000506E3_m00000036_r00000034.fw
b20822210a31529135106b8822e84973d1446159a7473f6d5f839d6aa5a6d2df  s000506E3_m00000036_r0000003A.fw
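Assuming, as iucode_tool's naming scheme indicates, that the three fields are the processor signature (s), the processor flags mask (m) and the revision (r), a minimal Python sketch can parse these names and apply the basic match rule: same signature, and the CPU's pf bit set in the mask. It ignores the optional extended signature table that some microcode files carry, and the helper names are hypothetical. Under that rule, a mask of 0x22 covers both pf=0x2 and pf=0x20, which is consistent with both desktop (pf=0x2) and mobile (pf=0x20) Broadwell-H systems in this thread running revision 0x13.

```python
import re
from typing import NamedTuple

# iucode_tool "--write-named-to" style name: s<signature>_m<pf mask>_r<revision>.fw
NAME_RE = re.compile(r"s([0-9A-Fa-f]{8})_m([0-9A-Fa-f]{8})_r([0-9A-Fa-f]{8})\.fw$")


class Microcode(NamedTuple):
    signature: int
    pf_mask: int
    revision: int


def parse_named(filename: str) -> Microcode:
    """Parse the signature, pf mask and revision encoded in the file name."""
    m = NAME_RE.search(filename)
    if m is None:
        raise ValueError(f"not an s..._m..._r....fw name: {filename!r}")
    sig, mask, rev = (int(g, 16) for g in m.groups())
    return Microcode(sig, mask, rev)


def applies_to(mc: Microcode, cpu_sig: int, cpu_pf: int) -> bool:
    """Basic match: identical signature and the CPU's pf bit present in the mask."""
    return mc.signature == cpu_sig and (mc.pf_mask & cpu_pf) != 0
```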
Comment 80 Henrique de Moraes Holschuh 2015-10-16 17:29:54 UTC
Well, so far all desktop Broadwell-H (pf=0x2) still have RTM enabled even with microcode 0x13.  We don't have enough data for mobile Broadwell-H (pf=0x20).

Adrienne, it would be really helpful if you could tell us more about your system, since it has always had RTM disabled, even with microcode 0xd... could you tell us what system it is (make, model), and post the output of "dmesg | grep microcode" and /proc/cpuinfo, please?
Comment 81 Henrique de Moraes Holschuh 2015-10-16 17:43:31 UTC
Oh, and just in case a certain Skylake oddity might scare someone needlessly:

Skylake may report that the microcode revision running in the processor is one less than the version displayed by iucode-tool and any other such tool.  It does so when the new Intel SGX feature is enabled.

That's why, although Skylake microcode files have even revision numbers, some (most?) Skylake systems will report that they're running odd microcode revisions.
Comment 82 Fredrik Atmer 2015-10-16 21:09:35 UTC
Finally back home again. I see others have beaten me to it. 

$ dmesg | grep microcode
[    0.000000] CPU0 microcode updated early to revision 0x13, date = 2015-08-03
[    0.080692] CPU1 microcode updated early to revision 0x13, date = 2015-08-03
[    0.098581] CPU2 microcode updated early to revision 0x13, date = 2015-08-03
[    0.116531] CPU3 microcode updated early to revision 0x13, date = 2015-08-03
[    0.492240] microcode: CPU0 sig=0x40671, pf=0x2, revision=0x13
[    0.492245] microcode: CPU1 sig=0x40671, pf=0x2, revision=0x13
[    0.492249] microcode: CPU2 sig=0x40671, pf=0x2, revision=0x13
[    0.492253] microcode: CPU3 sig=0x40671, pf=0x2, revision=0x13
[    0.492282] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 71
model name	: Intel(R) Core(TM) i5-5675C CPU @ 3.10GHz
stepping	: 1
microcode	: 0x13
cpu MHz		: 3044.296
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
bugs		:
bogomips	: 6186.17
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:
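To check which revision every CPU ended up on, the "microcode: CPUn sig=..., pf=..., revision=..." lines can be parsed. A minimal Python sketch; the helper names are hypothetical, and the line format is assumed from the logs in this thread (it may differ on other kernel versions). The "updated early" lines are deliberately not matched, because they carry no sig= field:

```python
import re

# Matches kernel log lines such as:
# [    0.492240] microcode: CPU0 sig=0x40671, pf=0x2, revision=0x13
LINE_RE = re.compile(
    r"microcode: CPU(\d+) sig=(0x[0-9a-fA-F]+), pf=(0x[0-9a-fA-F]+), "
    r"revision=(0x[0-9a-fA-F]+)"
)


def microcode_by_cpu(dmesg_text):
    """Map CPU number -> (signature, pf, revision) parsed from dmesg output."""
    result = {}
    for match in LINE_RE.finditer(dmesg_text):
        cpu, sig, pf, rev = match.groups()
        result[int(cpu)] = (int(sig, 16), int(pf, 16), int(rev, 16))
    return result


def all_cpus_on_revision(dmesg_text, wanted):
    """True if at least one CPU was found and every CPU reports `wanted`."""
    cpus = microcode_by_cpu(dmesg_text)
    return bool(cpus) and all(rev == wanted for _, _, rev in cpus.values())
```

On a live system the text could come from subprocess.run(["dmesg"], capture_output=True, text=True).stdout; all_cpus_on_revision(text, 0x13) should then be True for a log like the one above.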
Comment 83 Adrienne Cohea 2015-10-17 05:03:04 UTC
(In reply to Henrique de Moraes Holschuh from comment #80)
> Adrienne, it could be really helpful if you could tell us more about your
> system, since it has always had RTM disabled, even with microcode 0xd...
> could you tell us what system is it (make, model), and the output of "dmesg
> | grep microcode" and /proc/cpuinfo, please?

Hey, everyone, I'm sorry it took me so long! This issue is important to everyone, so what follows is a big dump of information; I don't know which parts will be meaningful. It has four parts: 1) general information, 2) BIOS-reported information, 3) dmesg | grep microcode, and 4) cat /proc/cpuinfo.

1) General Information
OEM: CyberPowerPC
Model Marketing Name: Fangbook III BX6

2) BIOS-Reported Information

BIOS Version: E16J2ICP.10E
EC Version 16J2ECP1 Ver5.06
CPU: Intel Core i7-5700HQ
Stepping: 40671
Microcode patch: d (Note, before boot...)

3) dmesg | grep microcode

-- on microcode 0xd --
[    0.432493] microcode: CPU0 sig=0x40671, pf=0x20, revision=0xd
[    0.432498] microcode: CPU1 sig=0x40671, pf=0x20, revision=0xd
[    0.432503] microcode: CPU2 sig=0x40671, pf=0x20, revision=0xd
[    0.432508] microcode: CPU3 sig=0x40671, pf=0x20, revision=0xd
[    0.432513] microcode: CPU4 sig=0x40671, pf=0x20, revision=0xd
[    0.432518] microcode: CPU5 sig=0x40671, pf=0x20, revision=0xd
[    0.432522] microcode: CPU6 sig=0x40671, pf=0x20, revision=0xd
[    0.432528] microcode: CPU7 sig=0x40671, pf=0x20, revision=0xd
[    0.432559] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

-- on microcode 0x12 --
[    0.000000] microcode: CPU0 microcode updated early to revision 0x12, date = 2015-06-19
[    0.076100] microcode: CPU1 microcode updated early to revision 0x12, date = 2015-06-19
[    0.083977] microcode: CPU2 microcode updated early to revision 0x12, date = 2015-06-19
[    0.091894] microcode: CPU3 microcode updated early to revision 0x12, date = 2015-06-19
[    0.434008] microcode: CPU0 sig=0x40671, pf=0x20, revision=0x12
[    0.434014] microcode: CPU1 sig=0x40671, pf=0x20, revision=0x12
[    0.434019] microcode: CPU2 sig=0x40671, pf=0x20, revision=0x12
[    0.434023] microcode: CPU3 sig=0x40671, pf=0x20, revision=0x12
[    0.434028] microcode: CPU4 sig=0x40671, pf=0x20, revision=0x12
[    0.434033] microcode: CPU5 sig=0x40671, pf=0x20, revision=0x12
[    0.434037] microcode: CPU6 sig=0x40671, pf=0x20, revision=0x12
[    0.434043] microcode: CPU7 sig=0x40671, pf=0x20, revision=0x12
[    0.434075] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

4) cat /proc/cpuinfo (showing CPU 0 only)

-- on microcode 0xd --
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 71
model name	: Intel(R) Core(TM) i7-5700HQ CPU @ 2.70GHz
stepping	: 1
microcode	: 0xd
cpu MHz		: 2113.171
cache size	: 6144 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt
bugs		:
bogomips	: 5389.86
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

-- on microcode 0x12 --
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 71
model name	: Intel(R) Core(TM) i7-5700HQ CPU @ 2.70GHz
stepping	: 1
microcode	: 0x12
cpu MHz		: 2701.054
cache size	: 6144 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt
bugs		:
bogomips	: 5390.35
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

When I diffed the /proc/cpuinfo outputs I saved, the only differences were the microcode revision (of course), bogomips, and cpu MHz. The following commands have always returned no hits for me, even on the stock 0xd microcode:

grep flags /proc/cpuinfo | grep tsx
grep flags /proc/cpuinfo | grep rtm
grep flags /proc/cpuinfo | grep hle

If you need other information like the bridge, let me know. Thanks everyone so much for explaining the problem and providing a workaround. 0x12 works brilliantly for me.
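The TSX-related flag names the kernel prints in the dumps above are hle and rtm. A minimal Python sketch of the same check, comparing whole flag names rather than substrings; the function name is hypothetical:

```python
def tsx_flags(cpuinfo_text):
    """Return which of the TSX-related flags ('hle', 'rtm') appear on the
    first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags\t\t: fpu vme ... rtm ...".
            flags = set(line.split(":", 1)[1].split())
            return {name for name in ("hle", "rtm") if name in flags}
    # No flags line found.
    return set()
```

For example, tsx_flags(open("/proc/cpuinfo").read()) would return an empty set on the 0xd and 0x12 systems above and {"hle", "rtm"} on the desktop i5-5675C from comment 82.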
Comment 84 Fredrik Atmer 2015-10-18 18:23:14 UTC
I just did the craziest (probably) thing ever! 

(As per sac's comment https://bugzilla.kernel.org/show_bug.cgi?id=103351#c55)

I read out my BIOS with flashrom, updated the microcode in it to 0x13 using the UFU tool (in a virtual XP machine), flashed the new BIOS file back using flashrom, and now I believe I'm actually running 0x13 "natively".

That was probably reckless, don't do it! I just had to tell someone who can appreciate the beauty of it.. =D
Comment 85 Alexey Dyachenko 2015-10-19 10:16:23 UTC
(In reply to Henrique de Moraes Holschuh from comment #75)
> (In reply to Alexey Dyachenko from comment #73)
> > microcode updated early
> > pf=0x2
> > cpuinfo http://pastebin.com/92EjQxsM
> 
> HLE and RTM still enabled.  However, this is microcode 0x12, which we didn't
> expect to turn HLE and RTM off anyway from previous reports...
> 
> Alexey, maybe you could try microcode 0x13 which Benjan has added to his
> experimental branch?  It should not be any more dangerous than microcode
> 0x12, as you're using early updates...

I did not know there was a 0x13 microcode for my CPU. I tried it, and the cpuinfo diff shows no differences at all apart from the current MHz and, obviously, the microcode version.

On a related note, I have been experiencing random instant reboots while running 0x12. It may be some other erratum triggering, or some other hardware may be faulty; either way, nothing gets captured in the logs. I'll see how 0x13 fares.
Comment 86 Henrique de Moraes Holschuh 2015-10-20 12:11:33 UTC
Related information:

Debian will blacklist glibc lock elision (which uses Intel TSX-NI RTM, and does not use HLE) in Broadwell/Broadwell-H/Broadwell-DE processors.  This could change, but only if Intel itself decides to own up and publish correct, consistent information that we feel we can trust.

After all, even the current editions of the Broadwell-* specification updates are inconsistent with reality.  This makes it impossible to trust Intel TSX-NI on those processors ATM.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800574


Most of the SIGSEGVs observed by users in __lll_unlock_elision are software bugs (and the bug is NOT in glibc, either): code that attempts to unlock an already unlocked lock will crash when Intel TSX is being used.

(typical example: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750792)

IMO, we should instrument glibc to complain loudly (or even crash) whenever anything attempts to unlock an unlocked mutex: it will smoke out applications and libraries with shoddy locking really fast, and shoddy locking is known to cause subtle, hard-to-diagnose problems.
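As an analogy only (this is Python, not glibc's pthread_mutex_unlock), Python's threading.Lock already follows the "complain loudly" policy proposed here: its documentation states that releasing an unlocked lock raises RuntimeError. A minimal sketch of that behavior; the function name is hypothetical:

```python
import threading


def release_unlocked_is_reported():
    """Return True if releasing a lock that is not held raises an error."""
    lock = threading.Lock()
    try:
        lock.release()  # Bug on purpose: this lock was never acquired.
    except RuntimeError:
        # The misuse is reported immediately instead of being silently ignored.
        return True
    return False
```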

The MCE issues are not fixable by anything other than a microcode update, so it boils down to distros being able to ship that microcode update.
Comment 87 muriloq 2015-10-21 20:49:42 UTC
Does Skylake (specifically the i7-6700HQ) also have the same MCE issues? I was offered a replacement laptop with this processor (mine is a Eurocom M5 Pro 8A, a Clevo rebrand, with an i7-5700HQ) and I don't know whether I should accept it or not...
Comment 88 kernel@benjam.info 2015-10-21 20:56:46 UTC
muriloq, maybe: https://bugzilla.kernel.org/show_bug.cgi?id=103351#c79

That said, the microcode update and the glibc blacklist should solve all of the known issues.
Comment 89 kernel@benjam.info 2015-10-21 20:58:33 UTC
Correction: When I say the microcode update fixes things, I mean the microcode update on BDW solves it on BDW. The BDW microcode update won't do anything on SKL. I don't have a copy of the SKL microcode updates, but if someone has a UEFI binary and wants me to, I could extract them.
Comment 90 matthew.dewitt 2015-10-21 23:00:49 UTC
Created attachment 190781 [details]
attachment-31416-0.html

Hello friends...

This still isn't working for me. Can you point out my mistake?

bash-4.2# cat /usr/src/linux/.config | grep MICRO
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
CONFIG_MICROCODE_AMD=y
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_MICROCODE_INTEL_LIB=y
CONFIG_MICROCODE_INTEL_EARLY=y
CONFIG_MICROCODE_EARLY=y
CONFIG_NFC_MICROREAD=m
CONFIG_PATA_JMICRON=y
CONFIG_NET_VENDOR_STMICRO=y
CONFIG_HID_MICROSOFT=m
CONFIG_USB_MICROTEK=m
CONFIG_MEMSTICK_JMICRON_38X=m
bash-4.2#


I downloaded the 5700hq-ucode.tar.gz file from this thread. It contains four .bin files.


bash-4.2# cd
bash-4.2# cd microcode/
bash-4.2# ls -al
total 172
drwxr-xr-x 2 root root 4096 Oct 21 18:53 .
drwx--x--- 23 root root 4096 Oct 21 18:58 ..
-rw-r--r-- 1 1000 1000 21504 Sep 30 01:32 2494656.bin
-rw-r--r-- 1 1000 1000 22528 Sep 30 01:32 2516160.bin
-rw-r--r-- 1 1000 1000 24576 Sep 30 01:32 2538688.bin
-rw-r--r-- 1 1000 1000 10240 Sep 30 01:32 2563264.bin
-rw-r--r-- 1 root root 78384 Oct 21 18:46 5700hq-ucode.tar.gz
bash-4.2# iucode_tool -Sl -K --overwrite --write-earlyfw=/boot/intel-ucode.img *.bin
iucode_tool: system has processor(s) with signature 0x00040671
selected microcodes:
001: sig 0x00040671, pf mask 0x22, 2015-03-27, rev 0x000d, size 10240
iucode_tool: Writing selected microcodes to: /boot/intel-ucode.img
iucode_tool: Writing microcode firmware file(s) into /lib/firmware/intel-ucode
bash-4.2#


(I have also tried iucode_tool --scan-system --write-earlyfw=/boot/ucode.cpio /lib/firmware/ )


bash-4.2# cat /etc/lilo.conf | grep intel
initrd = /boot/intel-ucode.img

bash-4.2# lilo
Warning: LBA32 addressing assumed
Added Linux + *
Added Recovery
One warning was issued.

bash-4.2#


And when I reboot....


bash-4.2# dmesg | grep early
bash-4.2# dmesg | grep micro
[ 0.598123] microcode: CPU0 sig=0x40671, pf=0x2, revision=0x19
[ 0.598315] microcode: CPU1 sig=0x40671, pf=0x2, revision=0x19
[ 0.598501] microcode: CPU2 sig=0x40671, pf=0x2, revision=0x19
[ 0.599050] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
bash-4.2# grep microcode /proc/cpuinfo
microcode : 0x19
microcode : 0x19
microcode : 0x19
bash-4.2#

Comment 91 Henrique de Moraes Holschuh 2015-10-22 11:32:55 UTC
Matthew,

You need to load the early initramfs *and* a standard initramfs. If LILO can only handle one, append the standard initramfs to the early initramfs (i.e. the early initramfs must come first), and tell LILO to load the result.

It doesn't work without an initramfs.  You also need all initramfs support enabled in the kernel, of course.
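Appending is plain byte concatenation; the shell equivalent is cat early.img standard.img > combined.img. Below is a minimal Python sketch. The function name and the example paths are hypothetical. It also assumes the early image is the uncompressed cpio archive written by iucode_tool --write-earlyfw:

```python
import shutil


def combine_initramfs(early_path, standard_path, output_path):
    """Write early_path followed by standard_path into output_path.

    The early microcode archive must come first, so that the kernel finds it
    at the start of the combined image.
    """
    with open(output_path, "wb") as out:
        for part in (early_path, standard_path):  # order matters: early first
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

For example, combine_initramfs("/boot/intel-ucode.img", "/boot/initrd.gz", "/boot/initrd-ucode.img") (hypothetical paths), then point the initrd = line in /etc/lilo.conf at /boot/initrd-ucode.img and rerun lilo.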
Comment 92 kernel@benjam.info 2015-10-22 20:02:01 UTC
(In reply to matthew.dewitt from comment #90)
> downloaded the 5700hq-ucode.tar.gz file in this thread. This file contains
> four .bin files.

Use 0x13.bin from the GitHub repository. The old tar.gz file may not even contain a correct microcode update for the Broadwell processors. I've deleted the tarfile from my server to avoid further confusion in the future. All the files that were there are in the git history anyways.
Comment 93 Carlos Alberto Lopez Perez 2015-10-23 12:31:13 UTC
Hi,

I have an i7-5775C on an Asus Z97-P motherboard. I applied the 0x13 microcode:

# dmesg|grep -i microc
[    0.000000] microcode: CPU0 microcode updated early to revision 0x13, date = 2015-08-03
[    0.074647] microcode: CPU1 microcode updated early to revision 0x13, date = 2015-08-03
[    0.092415] microcode: CPU2 microcode updated early to revision 0x13, date = 2015-08-03
[    0.110143] microcode: CPU3 microcode updated early to revision 0x13, date = 2015-08-03
[    0.557526] microcode: CPU0 sig=0x40671, pf=0x2, revision=0x13
[    0.557531] microcode: CPU1 sig=0x40671, pf=0x2, revision=0x13
[    0.557534] microcode: CPU2 sig=0x40671, pf=0x2, revision=0x13
[    0.557538] microcode: CPU3 sig=0x40671, pf=0x2, revision=0x13
[    0.557543] microcode: CPU4 sig=0x40671, pf=0x2, revision=0x13
[    0.557547] microcode: CPU5 sig=0x40671, pf=0x2, revision=0x13
[    0.557552] microcode: CPU6 sig=0x40671, pf=0x2, revision=0x13
[    0.557557] microcode: CPU7 sig=0x40671, pf=0x2, revision=0x13
[    0.557585] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba


TSX is still there :(

# grep ^flags /proc/cpuinfo|head -1
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt

# cpuid | egrep '(HLE|RTM)'|head -2
      HLE hardware lock elision                = true
      RTM: restricted transactional memory     = true
Comment 94 Carlos Alberto Lopez Perez 2015-10-23 12:35:11 UTC
I was having crashes with the NVIDIA proprietary driver.
After applying a patched glibc that disabled TSX in libpthread <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800574> the crashes are gone.

However, TSX is still reported as available on the CPU, as in my previous comment.
Comment 95 Jinserk Baik 2015-11-03 12:51:26 UTC
I'm using an i5-5675C with an ASUS H97I-PLUS motherboard. I found that ASUS has released a new BIOS for this board, version 2605, on their official site. I installed it and found that, according to dmesg, the microcode is no longer updated early:

$ dmesg |grep microcode
[    1.414138] microcode: CPU0 sig=0x40671, pf=0x2, revision=0x13
[    1.414151] microcode: CPU1 sig=0x40671, pf=0x2, revision=0x13
[    1.414166] microcode: CPU2 sig=0x40671, pf=0x2, revision=0x13
[    1.414184] microcode: CPU3 sig=0x40671, pf=0x2, revision=0x13
[    1.414269] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba


but hle and rtm are still enabled in /proc/cpuinfo:
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
Comment 96 yjcoshc 2015-11-03 13:46:44 UTC
I am using an i7-5700HQ in an MSI GE62 2QD laptop. I installed the new BIOS from MSI's official website. It seems hle and rtm are successfully disabled, and everything works well now.

Running dmesg | grep GE62 shows:
[    0.000000] DMI: Micro-Star International Co., Ltd. GE62 2QD/MS-16J2, BIOS E16J2IMS.114 09/23/2015

microcode:
microcode: CPU0 sig=0x40671, pf=0x20, revision=0x13

/proc/cpuinfo:
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 71
model name      : Intel(R) Core(TM) i7-5700HQ CPU @ 2.70GHz
stepping        : 1
microcode       : 0x13
cpu MHz         : 2700.105
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt
bugs            :
bogomips        : 5387.54
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
Comment 97 okjwukwh 2015-11-09 12:10:25 UTC
Using a Broadwell i7-5600U on OpenSuse Leap 42.1 may also provoke the lock elision error.

At the moment I cannot tell how libc is built; from the contents of the rpms I would guess that --enable-lock-elision is used.

So far, crashes only happen when using the NVIDIA proprietary driver (tried versions 352* and 349*, same result), so it may be a problem with the driver. But according to [1] it should be OK with a version < 352*, which is not the case for me.

The microcode version is reported as 0x21, so the workaround with the MSI 0x13 update won't work. HLE is enabled according to /proc/cpuinfo (the flags line also lists rtm), although no flag named tsx is present.

Also note: It seemed to work with OpenSuse 13.2 - maybe hinting at the libc/lock-elision support.


processor       : 3 
vendor_id       : GenuineIntel 
cpu family      : 6 
model           : 61 
model name      : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz 
stepping        : 4 
microcode       : 0x21 
cpu MHz         : 2053.492 
cache size      : 4096 KB 
physical id     : 0 
siblings        : 4 
core id         : 1 
cpu cores       : 2 
apicid          : 3 
initial apicid  : 3 
fpu             : yes 
fpu_exception   : yes 
cpuid level     : 20 
wp              : yes 
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
bugs            : 
bogomips        : 5187.70 
clflush size    : 64 
cache_alignment : 64 
address sizes   : 39 bits physical, 48 bits virtual 
power management:


[1] https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-SKL-Latest-Woes
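The decimal family/model/stepping fields in /proc/cpuinfo map onto the sig= value used for microcode matching. Below is a minimal Python sketch, assuming the standard CPUID leaf 1 EAX layout with processor type 0. With it, the i7-5600U above (family 6, model 61, stepping 4) works out to sig 0x306d4, not the 0x40671 of the other processors in this thread. That is why the 0x40671 microcode cannot apply to it:

```python
def cpu_signature(family, model, stepping):
    """Rebuild the CPUID(1).EAX signature from /proc/cpuinfo's decimal fields.

    Assumes processor type 0. Families >= 15 are split into base 0xF plus an
    extended-family offset; the model's high nibble goes to the extended-model field.
    """
    if family >= 0xF:
        base_family, ext_family = 0xF, family - 0xF
    else:
        base_family, ext_family = family, 0
    base_model, ext_model = model & 0xF, model >> 4
    return ((ext_family << 20) | (ext_model << 16) | (base_family << 8)
            | (base_model << 4) | stepping)


print(hex(cpu_signature(6, 71, 1)))  # prints 0x40671 (i7-5700HQ, i5-5675C)
print(hex(cpu_signature(6, 61, 4)))  # prints 0x306d4 (i7-5600U)
```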
Comment 98 Carlos Alberto Lopez Perez 2015-11-09 12:39:39 UTC
(In reply to okjwukwh from comment #97)
> Using a Broadwell i7-5600U on OpenSuse Leap 42.1 may also provoke the lock
> elision error.
> 
> At the moment I cannot tell how libc is built; from the contents of the rpms
> I would guess that --enable-lock-elision is used.
> 
> Crashes so far only happen, when using a NVIDIA proprietary driver (tried
> version 352* and 349* - same result). So it may be a problem with the
> driver. But as of the [1] it should be ok when using a version < 352* which
> it's not in my case.
> 

I'm inclined to believe this is a bug in the proprietary driver.
You can probably work around it by building glibc without lock elision support.

I reported this to NVIDIA two weeks ago after the discussion at: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800574#77

No reply from NVIDIA so far.

Question Reference #151027-000203
---------------------------------------------------------------
  Product Level 1: GeForce graphics
     Date Created: 10/27/2015 09:29 AM
     Last Updated: 10/27/2015 09:29 AM
           Status: New
        Choose OS: Linux/Other Unix
     Product Name: NVIDIA Driver Linux x64
   Driver Version: 352.55
             Rank: 1
         Escalate: 
---------------------------------------------------------------
The driver crashed on EGL programs when glibc is built with lock elision support and
a CPU with TSX-NI instructions enabled is used (for example: i7-5775C).

The crash seems to be caused by the driver trying to unlock an already
unlocked lock. This is undefined behavior and a bug in the driver.
When TSX-NI is enabled in glibc, it leads to a crash.

How to reproduce it?
Try to run an EGL program, for example es2_info from mesa-demos (Debian/Ubuntu
package: mesa-utils-extra), on a Linux system (AMD64) with a glibc that enables
the usage of HLE (Hardware Lock Elision) by default for libpthread when the CPU
supports it (for example, i7-5775C).
Comment 99 Henrique de Moraes Holschuh 2015-11-09 19:05:29 UTC
Intel has issued a new public microcode update datafile, which should address the Broadwell MCE errata.

I will update and upload the Debian "intel-microcode" package in Debian unstable tonight, Canonical will either use that for Ubuntu, or update directly.  Eventually, these updates will be backported to our stable branches.

The new Intel microcode update package modifies these microcodes (compared to the update package from 2015-01-21):

+ sig 0x000306a9, pf mask 0x12, 2015-02-26, rev 0x001c, size 12288
+ sig 0x000306c3, pf mask 0x32, 2015-08-13, rev 0x001e, size 21504
+ sig 0x000306d4, pf mask 0xc0, 2015-09-11, rev 0x0022, size 16384
+ sig 0x000306f2, pf mask 0x6f, 2015-08-10, rev 0x0036, size 30720
+ sig 0x000306f4, pf mask 0x80, 2015-07-17, rev 0x0009, size 14336
+ sig 0x00040651, pf mask 0x72, 2015-08-13, rev 0x001d, size 20480
+ sig 0x00040671, pf mask 0x22, 2015-08-03, rev 0x0013, size 11264
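The pf mask column decides which platforms an entry applies to. This sketch assumes that the pf= value the kernel logs is already the single platform bit (1 << platform ID). An entry then applies when the signature matches and pf & mask is non-zero. Mask 0x22, for example, covers both the desktop pf=0x2 and the mobile pf=0x20 parts seen in this thread. The sketch is simplified (it ignores extended signature tables), and the function name is hypothetical:

```python
import re

# Matches iucode_tool-style listing entries such as:
# sig 0x00040671, pf mask 0x22, 2015-08-03, rev 0x0013, size 11264
ENTRY_RE = re.compile(
    r"sig (0x[0-9a-fA-F]+), pf mask (0x[0-9a-fA-F]+), "
    r"(\d{4}-\d{2}-\d{2}), rev (0x[0-9a-fA-F]+)"
)


def applicable_updates(listing, cpu_sig, cpu_pf):
    """Return (revision, date) for listed updates matching this processor.

    cpu_sig and cpu_pf are the sig= and pf= values from the kernel log.
    """
    matches = []
    for m in ENTRY_RE.finditer(listing):
        sig, mask, date, rev = m.groups()
        if int(sig, 16) == cpu_sig and int(mask, 16) & cpu_pf:
            matches.append((int(rev, 16), date))
    return matches
```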

If anyone has any sort of problems with these microcode updates, it would be really helpful if you report it to the distro's bugtracker (or even here, although really, this is way off-topic for the kernel bugzilla :-p).

BTW: it looks like Skylake users need to get up-to-date BIOSes from their vendors: they're still not covered by the public microcode update distribution...
Comment 100 tcl_de 2015-11-14 13:43:51 UTC
(In reply to sac from comment #54)
> Unfortunately not even the mainboard vendor BIOS changelogs are reliable. My
> Asrock H97 Performance lists an BIOS update with 2.40 "Update Microcode 19."
> (0x13h), however if you extract the microcode update with MMTool you get
> Microcode 16. (0x10h). Exactly the same that was already included in BIOS
> 2.30.

Just a quick remark: the 19 in "Update Microcode 19" is already hexadecimal and refers to the Haswell microcode revision.
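Read as hexadecimal, "19" is revision 0x19 (decimal 25), whereas the Broadwell revision discussed here, 0x13, is 19 in decimal. A tiny Python sketch of that reading, assuming the changelog label always ends in a hex revision (optionally followed by a period); the function name is hypothetical:

```python
def changelog_revision(label):
    """Read the trailing number of a label like "Update Microcode 19." as hex."""
    return int(label.rstrip(". ").split()[-1], 16)


haswell = changelog_revision("Update Microcode 19.")
print(hex(haswell), haswell)  # prints 0x19 25
print(0x13)                   # prints 19: Broadwell revision 0x13 in decimal
```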
Comment 101 sac 2015-11-15 15:35:24 UTC
(In reply to tcl_de from comment #100)
> (In reply to sac from comment #54)
> > Unfortunately not even the mainboard vendor BIOS changelogs are reliable. My
> > Asrock H97 Performance lists an BIOS update with 2.40 "Update Microcode 19."
> > (0x13h), however if you extract the microcode update with MMTool you get
> > Microcode 16. (0x10h). Exactly the same that was already included in BIOS
> > 2.30.
> 
> Just a quick remark: 19 in Update Microcode 19 is already hexadecimal and
> refers to the Haswell Microcode revision

Thanks, but for 1150 Asrock H97 Performance users it would be great to get a microcode update as well (and not to be stuck with 0671/10 from 2015/05/07, which suffers not only from these MCEs but also from lock elision crashes).

BTW: thanks to UBU I was able to inject microcode 0671/13 into the UEFI. The MCE is still fixed, but the lock elision crashes remain (as confirmed earlier in this thread for this microcode).
Comment 102 tcl_de 2015-11-15 15:52:54 UTC
(In reply to sac from comment #101)
> Thanks, but for 1150 Asrock H97 Performance users it would be great to get
> an Microcode update as well (and not to be stuck with 0671/10 from
> 2015/05/07 that suffers not only from these MCE but also elision lock
> crashes).

I fully agree and suggest that you contact Asrock Support about it. My experience with them has been quite good.
BTW: I'm not affiliated with Asrock, I just happen to own an Asrock H97 Haswell system.
Comment 103 Alessandro Belloni 2015-11-22 21:43:54 UTC
I have an i7-5775C with an ASRock Z97 Pro4 on the latest BIOS, 2.20. I'm unable to make it work with any Linux except Fedora 22, which works, as does Windows 10.

I would like to use my preferred distro, Ubuntu 15.10, but it seems impossible. Does anyone have any suggestions? I tried the OC fixed-mode fix shared by Phoronix, but it didn't work.

http://www.phoronix.com/scan.php?page=news_item&px=core-i7-5775c-oc-fixed-mode

Please help me.
Comment 104 kernel@benjam.info 2015-11-22 23:34:39 UTC
(In reply to Alessandro Belloni from comment #103)
> have an i7-5775c with an asrock z97 pro4 with latest bios 2.20, i'm unable
> to make it work with any linux except fedora 22 that works and with Windows
> 10.
> 
> Would like to use my preferred distro that is Ubuntu 15.10 but it seems
> impossible, does anyone has any suggestion? i did tried the FIX OC Core
> shared by Phoronix but won't work.

Assuming you can at least get it to boot, I'd recommend either:

a) Manually download and install the version of intel-microcode from Ubuntu 16.04: http://packages.ubuntu.com/xenial/intel-microcode
b) Follow the Linux instructions here: https://github.com/bgw/bdw-ucode-update-tool

Normally, I wouldn't recommend installing Ubuntu 16.04 packages on Ubuntu 15.10, but intel-microcode is just some firmware, shell scripts, and configuration files, so it should be pretty safe.
Comment 105 Alessandro Belloni 2015-11-25 18:49:49 UTC
After some emails to ASRock support, they provided me with the 2.40 version, and I was able to disable Intel SpeedStep and Turbo technology and to enable OC fixed mode. I will test Ubuntu next; I was able to see it working, at least from a USB key.
Comment 106 Bluhd 2016-01-09 23:40:17 UTC
Confirmed working for i7-5700HQ.
Thank you all so much!
Comment 107 Zhang Rui 2016-05-09 05:57:21 UTC
As there is quite a lot of discussion in this thread, and the original bug reporter has not responded since last October, my question, based on the previous comments, is: do we still have any other open issues? Can we close this bug report?
Comment 108 kernel@benjam.info 2016-05-10 04:26:59 UTC
(In reply to Zhang Rui from comment #107)
> As there is quite a lot of discussion in this thread, and the original bug
> reporter has no response since last Oct, according to the previous commment,
> my questions is "do we still have any other open issues? can we close this
> bug report?"

I think this bug was fixed by the microcode updates, which have now been officially packaged and released by Intel, and are being picked up by distributions. There was also some discussion about avoiding the bug with some glibc patches to avoid the suspected bad CPU instructions.

I don't think this was ever a kernel bug, and I think the problem and its solutions are pretty well documented at this point. I believe it's safe to close this issue.
Comment 109 Zhang Rui 2016-05-10 05:38:08 UTC
Great, thanks for your clarification.
Comment 110 Henrique de Moraes Holschuh 2016-11-22 16:52:27 UTC
(In reply to matthew.dewitt from comment #90)

> bash-4.2# dmesg | grep micro
> [ 0.598123] microcode: CPU0 sig=0x40671, pf=0x2, revision=0x19
> [ 0.598315] microcode: CPU1 sig=0x40671, pf=0x2, revision=0x19
> [ 0.598501] microcode: CPU2 sig=0x40671, pf=0x2, revision=0x19

Matthew, I know this is an unexpected reply to an old bug report, but I just now noticed something weird in your debug log.

Intel released microcode revision 0x16 for sig=0x40671 on 2016-04-29, and they have always issued microcode releases incrementally in the past.

However, your kernel log claims that you have processors running microcode revision 0x19 for sig=0x40671. This microcode is *not supposed to exist* in 2015-10, as far as I can tell.

Can you tell us what computer, motherboard and BIOS is this?

Were you running that Linux kernel in a virtual machine?
Comment 111 matthew.dewitt 2016-11-22 17:02:02 UTC
Created attachment 245701 [details]
attachment-5053-0.html

Hello,

Unfortunately, I have replaced the machine with a Skylake one, so I cannot check.

However, the machine in question was a Sager/Clevo. I don't remember the exact model, but I think it was the first iteration of the Sager NP8657-S (Clevo P650RE3), or it might have been the P650RE (not the RE3).

Comment 112 matthew.dewitt 2016-11-22 17:07:12 UTC
Created attachment 245711 [details]
attachment-5257-0.html

Hello,

The original machine was a SAGER NP8651-S (CLEVO P650SE).

http://forum.notebookreview.com/threads/sager-np8651-clevo-p650se-with-gtx-970m-htwingnuts-review.765376/

That review lists the detailed specs.

Hope this helps.

Comment 113 Henrique de Moraes Holschuh 2016-11-22 17:55:23 UTC
Thanks for the fast reply!

Unfortunately, I could not find a downloadable BIOS image for that machine anywhere.  It looks like we will be left wondering what the heck that strange microcode revision was...

For the record, the reason why you could not update your microcode was that the processor was already reporting that it had a higher revision (0x19) -- be it a bug or the real deal -- and Linux won't even *attempt* to downgrade microcode, so it would skip the microcode update process.
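The skip logic described above can be modelled as two steps: select a candidate update, then load it only if it is strictly newer. This is a simplified Python sketch, not the kernel's actual loader code. `MicrocodeBlob` and `pick_update` are hypothetical names, and matching a blob by signature plus a shared platform-flags bit is an assumption about how candidates are filtered.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MicrocodeBlob:
    # Hypothetical container for one microcode update: CPU signature,
    # platform-flags bitmask, and revision number.
    sig: int
    pf_mask: int
    rev: int


def pick_update(cpu_sig: int, cpu_pf: int, current_rev: int,
                blobs: Iterable[MicrocodeBlob]) -> Optional[MicrocodeBlob]:
    """Return the blob to load, or None if nothing newer than current_rev matches.

    Simplified model: a blob matches when its signature equals the CPU's and
    its platform mask shares a bit with the CPU's pf value. Only a strictly
    newer revision is ever chosen, so a downgrade is never attempted.
    """
    best = None
    for blob in blobs:
        if blob.sig != cpu_sig or not (blob.pf_mask & cpu_pf):
            continue  # not meant for this CPU model/platform
        if blob.rev <= current_rev:
            continue  # same or older revision: skip, never downgrade
        if best is None or blob.rev > best.rev:
            best = blob
    return best
```

When the CPU reports revision 0x19, a matching 0x16 blob for sig=0x40671 is rejected and `pick_update` returns `None`. That corresponds to the update process being skipped. The same blob would be chosen for a CPU still at a lower revision such as 0x12.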
Comment 114 matthew.dewitt 2016-11-22 17:57:25 UTC
Created attachment 245721 [details]
attachment-2487-0.html

Np, and thank you for the answer!

If I ever find more information on the BIOS, I'll let you know.

Comment 115 muriloq 2016-11-22 18:16:38 UTC
Henrique, please post here if you find anything. I've bricked a Clevo P650SG after flashing the wrong microcode update, and after getting a replacement I gave up trying to run Linux on it.
