BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
03:00.0 Network controller: Broadcom Corporation BCM4331 802.11a/b/g/n (rev 02)
With the latest Apple OS X and firmware installed, memory gets corrupted. The symptoms are many errors, including:
- 2 file system corruptions leaving the root fs unbootable in 1 month
- processes randomly segfault
- memory errors, especially at shutdown or reboot, resulting in hangs (postgresql "PANIC: stuck spinlock" etc.)
- BUG: Bad page map in process sshd (and others)
- BUG: Bad rss-counter state
- BUG: unable to handle kernel NULL pointer dereference
- swap_free: Bad swap file entry
- Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
After enabling IOMMU (iommu=force intel_iommu=on) the kernel reports DMAR errors on most boots:
[ 41.971617] DMAR: DRHD: handling fault status reg 3
[ 41.971629] DMAR: DMAR:[DMA Write] Request device [03:00.0] fault addr 85229000
DMAR:[fault reason 01] Present bit in root entry is clear
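To check many boot logs for the same fault, the "Request device" lines can be pulled out with a small script. This is only a sketch: the regular expression is written against the log format shown above, other kernel versions may print DMAR faults differently, and the names DMAR_FAULT and find_dmar_faults are made up for illustration.

```python
import re

# Matches lines of the form shown in the log above, e.g.
#   DMAR:[DMA Write] Request device [03:00.0] fault addr 85229000
# Assumption: other kernels may format DMAR faults differently.
DMAR_FAULT = re.compile(
    r"DMAR:\s*\[(?P<op>DMA (?:Read|Write))\]\s+"
    r"Request device \[(?P<dev>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\]\s+"
    r"fault addr (?P<addr>[0-9a-fA-F]+)"
)


def find_dmar_faults(log_text):
    """Return (operation, PCI device, fault address) for each DMAR fault line."""
    faults = []
    for line in log_text.splitlines():
        match = DMAR_FAULT.search(line)
        if match:
            faults.append(
                (match.group("op"), match.group("dev"), int(match.group("addr"), 16))
            )
    return faults


# The log excerpt from this report.
SAMPLE = """[ 41.971617] DMAR: DRHD: handling fault status reg 3
[ 41.971629] DMAR: DMAR:[DMA Write] Request device [03:00.0] fault addr 85229000
DMAR:[fault reason 01] Present bit in root entry is clear"""

if __name__ == "__main__":
    for op, dev, addr in find_dmar_faults(SAMPLE):
        print(f"{op} by {dev} at {addr:#x}")
```

A device address of 03:00.0 in the output points at the BCM4331 listed at the top of this report.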
One boot log shows this error occurring 0.7 seconds into the boot, so there is a strong possibility of memory corruption occurring before the wifi driver can turn off the DMA.
Apparently this bug was found before in 2012 - https://mjg59.dreamwidth.org/11235.html - comments suggest that Apple fixed it since then, so this could be a regression. On the other hand, maybe nobody noticed it since the bugs are so random and Intel IOMMU isn't enabled by default.
Obviously this is a bug in the firmware so the options are limited, though enabling IOMMU by default would help. Maybe there should be a policy to taint the kernel when a known bad firmware version is detected.
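As a rough userspace illustration of the "known bad firmware" idea (not the in-kernel taint mechanism itself), the BIOS version can be compared against a list. This is a sketch under assumptions: it relies on Linux exposing the DMI BIOS version at /sys/class/dmi/id/bios_version, the names KNOWN_BAD_BIOS, is_known_bad and check_running_firmware are hypothetical, and the only list entry is the version from this report.

```python
from pathlib import Path

# Hypothetical list; the single entry is the BIOS version from this report.
KNOWN_BAD_BIOS = {"MBP102.88Z.0106.B0A.1509130955"}

# Standard sysfs location of the DMI BIOS version on Linux.
DMI_BIOS_VERSION = Path("/sys/class/dmi/id/bios_version")


def is_known_bad(bios_version):
    """Return True if the given BIOS version string is on the known-bad list."""
    return bios_version.strip() in KNOWN_BAD_BIOS


def check_running_firmware(path=DMI_BIOS_VERSION):
    """Return True/False for the running machine, or None if DMI is unreadable."""
    try:
        version = path.read_text()
    except OSError:
        return None
    return is_known_bad(version)


if __name__ == "__main__":
    result = check_running_firmware()
    if result is None:
        print("could not read BIOS version")
    elif result:
        print("firmware is on the known-bad list")
    else:
        print("firmware is not on the known-bad list")
```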
It would be a new problem, though, if this is no longer true: "Yes, this seems to stop once the driver is loaded."
It does stop when the driver is loaded, but that can take several seconds, and until then memory can be corrupted. There is also the issue of people who blacklist or don't install the driver for whatever reason.
Fixed in Linux 4.7 with:
Fixed in stable kernels 4.6.6, 4.4.17, 4.1.30, 3.18.39
Fixed in upcoming stable kernels 3.16.39, 3.2.84
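The version list above can be turned into a quick check of whether a given kernel release should contain the fix. This is a sketch, and the names FIXED_STABLE and has_fix are made up for illustration. It only compares version numbers, so it cannot see distribution kernels that backport the fix under an older version, and it treats the "upcoming" 3.16 and 3.2 releases as if they shipped with the fix as announced.

```python
import re

# First mainline release with the fix, per the comments above.
FIRST_FIXED_MAINLINE = (4, 7)

# Stable series -> first patch level with the fix, per the comments above.
FIXED_STABLE = {
    (4, 6): 6,
    (4, 4): 17,
    (4, 1): 30,
    (3, 18): 39,
    (3, 16): 39,
    (3, 2): 84,
}


def has_fix(release):
    """Return True if the version number indicates the fix is present.

    Accepts strings such as "4.4.17" or "4.4.17-generic". Series not listed
    above and older than 4.7 are reported as not fixed.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    if (major, minor) >= FIRST_FIXED_MAINLINE:
        return True
    needed = FIXED_STABLE.get((major, minor))
    return needed is not None and patch >= needed


if __name__ == "__main__":
    import platform

    # On Linux, platform.release() returns the running kernel release.
    print(has_fix(platform.release()))
```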