Bug 18742 - PROBLEM: Kernel panic on 2.6.36-rc4 when loading intel_ips on Core i3 laptop
Status: CLOSED CODE_FIX
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: All Linux
Importance: P1 normal
Assigned To: platform_i386
Depends on:
Blocks: 16444
Reported: 2010-09-18 19:24 UTC by Maciej Rutecki
Modified: 2010-10-17 19:47 UTC

See Also:
Kernel Version: 2.6.36-rc4
Tree: Mainline
Regression: Yes


Attachments
don't toggle CPU turbo bit if not supported (2.15 KB, patch)
2010-09-23 22:18 UTC, Jesse Barnes

Description Maciej Rutecki 2010-09-18 19:24:42 UTC
Subject    : PROBLEM: Kernel panic on 2.6.36-rc4 when loading intel_ips on Core i3 laptop
Submitter  : infernix <infernix@infernix.net>
Date       : 2010-09-15 14:35
Message-ID : 4C90D998.6050103@infernix.net
References : http://marc.info/?l=linux-kernel&m=128456187928496&w=2

This entry is being used for tracking a regression from 2.6.35. Please don't
close it until the problem is fixed in the mainline.
Comment 1 infernix 2010-09-18 21:29:03 UTC
Here's the output with i915 driver loaded first:



[   18.643282] [drm] Initialized drm 1.1.0 20060810
[   18.833429] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[   18.928573] mtrr: no more MTRRs available
[   18.928712] [drm] MTRR allocation failed.  Graphics performance may suffer.
[   18.928866] [drm] detected 127M stolen memory, trimming to 32M
[   18.929075] [drm] set up 32M of stolen space
[   19.051336] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[   19.953428] Console: switching to colour frame buffer device 170x48
[   19.958369] fb0: inteldrmfb frame buffer device
[   19.958430] drm: registered panic notifier
[   19.979870] acpi device:01: registered as cooling_device1
[   19.981027] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input8
[   19.981222] ACPI: Video Device [GFX0] (multi-head: yes  rom: no post: no)
[   19.981553] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[   38.496996] intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value (found 11, expected 18)
[   38.497142] intel ips 0000:00:1f.6: PCI INT A -> GSI 21 (level, low) -> IRQ 21
[   38.497768] intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535
[   43.693383] general protection fault: 0000 [#1] PREEMPT SMP
[   43.693496] last sysfs file: /sys/bus/pci/drivers/intel ips/uevent
[   43.693571] CPU 0
[   43.693613] Modules linked in: intel_ips i915 drm_kms_helper drm i2c_algo_bit nfsd exportfs nfs lockd fscache auth_rpcgss sunrpc ipv6 fuse loop uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 option usb_wwan usb_storage usbserial arc4 snd_hda_codec_intelhdmi snd_hda_codec_realtek ecb snd_hda_intel snd_hda_codec iwlagn snd_hwdep iwlcore ehci_hcd mac80211 snd_pcm_oss snd_mixer_oss snd_pcm snd_timer cfg80211 usbcore snd psmouse i2c_i801 acer_wmi soundcore video evdev processor i2c_core serio_raw led_class pcspkr intel_agp battery backlight output ac nls_base snd_page_alloc rfkill button thermal wmi thermal_sys agpgart
[   43.703302]
[   43.705501] Pid: 952, comm: ips-adjust Not tainted 2.6.36-rc4 #4 Base Board Product Name/Aspire 1830T
[   43.707810] RIP: 0010:[<ffffffffa00750e9>]  [<ffffffffa00750e9>] do_disable_cpu_turbo+0x29/0x30 [intel_ips]
[   43.710193] RSP: 0018:ffff8801354f7e78  EFLAGS: 00010002
[   43.712561] RAX: 0000000100000009 RBX: ffffffffa00750c0 RCX: 0000000000000199
[   43.714967] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88013693c480
[   43.717384] RBP: ffff88013693c480 R08: ffffffff81655120 R09: 0000000000000000
[   43.719819] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   43.722266] R13: 0000000080000000 R14: 000000000000347c R15: 00000000000001ac
[   43.724719] FS:  0000000000000000(0000) GS:ffff880001800000(0000) knlGS:0000000000000000
[   43.727197] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   43.729687] CR2: 0000000001c32008 CR3: 0000000136861000 CR4: 00000000000006f0
[   43.732223] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   43.734780] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   43.737306] Process ips-adjust (pid: 952, threadinfo ffff8801354f6000, task ffff880134c23330)
[   43.739850] Stack:
[   43.742359]  ffffffff8104538c ffff88013693c480 ffff88013693c4c0 0000000000000000
[   43.742455] <0> ffffffffa00756ca ffff8801354f7f00 ffff88013693c480 ffffffffa0075410
[   43.745066] <0> ffff8801355a7c98 ffff8801354f7f00 ffff88013693c480 ffffffffa0075410
[   43.750191] Call Trace:
[   43.752711]  [<ffffffff8104538c>] ? on_each_cpu+0x3c/0x80
[   43.755232]  [<ffffffffa00756ca>] ? ips_adjust+0x2ba/0x3f0 [intel_ips]
[   43.757763]  [<ffffffffa0075410>] ? ips_adjust+0x0/0x3f0 [intel_ips]
[   43.760281]  [<ffffffffa0075410>] ? ips_adjust+0x0/0x3f0 [intel_ips]
[   43.762767]  [<ffffffff8105a846>] ? kthread+0x96/0xa0
[   43.765217]  [<ffffffff81003b94>] ? kernel_thread_helper+0x4/0x10
[   43.767642]  [<ffffffff8105a7b0>] ? kthread+0x0/0xa0
[   43.770054]  [<ffffffff81003b90>] ? kernel_thread_helper+0x0/0x10
[   43.772442] Code: 00 00 b9 99 01 00 00 0f 32 48 c1 e2 20 89 c0 48 09 c2 48 b8 00 00 00 00 01 00 00 00 48 85 c2 75 0c 48 09 d0 48 89 c2 48 c1 ea 20 <0f> 30 f3 c3 0f 1f 00 0f b7 c6 66 d1 ef 6b c0 64 d1 f8 8d 04 07
[   43.778529] RIP  [<ffffffffa00750e9>] do_disable_cpu_turbo+0x29/0x30 [intel_ips]
[   43.781293]  RSP <ffff8801354f7e78>
[   43.784029] ---[ end trace 43421f031df51cfd ]---
[   43.786774] note: ips-adjust[952] exited with preempt_count 1
Comment 2 infernix 2010-09-18 21:30:13 UTC
Note that the output in comment #1 was captured with maxcpus=1; with SMP enabled, the kernel panics.
Comment 3 infernix 2010-09-21 11:48:10 UTC
I don't think I have mentioned this, but I'm testing on amd64. I have not tried an i386 kernel build. The problem is still unresolved: nothing related to the intel_ips code has changed in 2.6.36-rc5.
Comment 4 Chuck Ebbert 2010-09-23 14:58:11 UTC
The failing insn is:
   0:	0f 30                	wrmsr

So it looks like it's trying to write to an MSR (0x199) that doesn't exist on that CPU.
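
For reference, the Code: bytes in comment 1 can be read by hand around the marked <0f> 30. My reading is: mov $0x199,%ecx; rdmsr; merge EDX:EAX into one 64-bit value; test it against 0x100000000; if that bit is clear, OR it in and issue wrmsr. The sketch below models that sequence in plain C so the register dump (RCX=0x199, RDX=1, RAX=0x100000009) can be checked against it. The macro and function names are made up for illustration and are not taken from intel_ips.c; this is a userspace model, not driver code. Keep in mind that wrmsr can raise a general protection fault either for an unimplemented MSR or for setting a reserved bit in an existing one, so the oops alone may not tell which case applies here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names (assumptions). Only the values 0x199 and 0x100000000
 * come from the oops in comment 1. */
#define PERF_CTL_MSR   0x199u
#define TURBO_DIS_BIT  (1ULL << 32)

/* rdmsr returns the MSR in EDX:EAX; the decoded code merges both halves. */
uint64_t msr_join(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/*
 * Model of the decoded sequence: if the bit is already set, no write is
 * issued (the jne goes straight to ret). Otherwise the bit is ORed in and
 * the function reports that a wrmsr would follow with *to_write.
 */
bool disable_turbo_step(uint64_t perf_ctl, uint64_t *to_write)
{
    assert(to_write != NULL);
    if (perf_ctl & TURBO_DIS_BIT)
        return false;
    *to_write = perf_ctl | TURBO_DIS_BIT;
    return true;
}
```

With a value of 0x9 read from the MSR, the model produces 0x100000009 as the value to write, which matches RAX in the dump, and the upper half (1) matches RDX.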
Comment 5 Jesse Barnes 2010-09-23 20:40:12 UTC
Just sent this to you via email; hope it works (testing on my machine now to make sure it doesn't disable turbo unnecessarily).

diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c
index ec72e80..dfa1587 100644
--- a/drivers/platform/x86/intel_ips.c
+++ b/drivers/platform/x86/intel_ips.c
@@ -1347,7 +1347,7 @@ static struct ips_mcp_limits *ips_detect_cpu(struct ips_dr
         * turbo will still be available.
         */
        if (!(misc_en & IA32_MISC_TURBO_EN))
-               ; /* add turbo MSR write allowed flag if necessary */
+               return NULL;
 
        if (strstr(boot_cpu_data.x86_model_id, "CPU       M"))
                limits = &ips_sv_limits;
Comment 6 Jesse Barnes 2010-09-23 22:18:19 UTC
Created attachment 31112
don't toggle CPU turbo bit if not supported

Here's the patch I posted to the x86 platform driver list.
Comment 7 infernix 2010-09-23 22:24:18 UTC
The patch in comment 5 just gives:

[  354.575841] intel ips 0000:00:1f.6: IPS not supported on this CPU

I don't think that's right, so I reverted it.

The patch in comment 6 gives the following result when applied:

[  597.819403] intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value (found 11, expected 18)
[  597.819454] intel ips 0000:00:1f.6: PCI INT A -> GSI 21 (level, low) -> IRQ 21
[  597.822532] intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535
[  603.019781] intel ips 0000:00:1f.6: MCP power or thermal limit exceeded

No kernel panic in SMP mode. So I suppose the problem is solved now. 

Thanks :)
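
The difference between the two results above can be sketched as follows. This is only a model of the observed behaviour, assuming the comment 5 change makes CPU detection fail outright and assuming the attachment (going by its title) keeps the driver loaded but skips the turbo MSR write. The struct, the function names, and the bit position used for the turbo flag are all hypothetical and are not copied from intel_ips.c or from the attachment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for IA32_MISC_TURBO_EN; the bit position is an
 * assumption made for this sketch. */
#define MISC_TURBO_FLAG (1ULL << 38)

struct ips_model {
    bool loaded;               /* did the driver bind to the device? */
    bool turbo_toggle_allowed; /* may the turbo MSR bit be written? */
    unsigned msr_writes;       /* number of turbo MSR writes attempted */
};

/* Comment 5 style: refuse to load when the flag is clear. */
bool probe_reject(struct ips_model *m, uint64_t misc_en)
{
    assert(m != NULL);
    m->loaded = (misc_en & MISC_TURBO_FLAG) != 0;
    m->turbo_toggle_allowed = m->loaded;
    return m->loaded;
}

/* Comment 6 style (assumed): load anyway, remember whether toggling is safe. */
bool probe_gate(struct ips_model *m, uint64_t misc_en)
{
    assert(m != NULL);
    m->loaded = true;
    m->turbo_toggle_allowed = (misc_en & MISC_TURBO_FLAG) != 0;
    return true;
}

/* The write that faulted in comment 1 is skipped unless it is allowed. */
void disable_cpu_turbo(struct ips_model *m)
{
    assert(m != NULL);
    if (!m->loaded || !m->turbo_toggle_allowed)
        return;
    m->msr_writes++;
}
```

With the flag clear, probe_reject() corresponds to the "IPS not supported on this CPU" message, while probe_gate() lets the driver initialize and simply never reaches the MSR write.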
Comment 8 Rafael J. Wysocki 2010-09-23 22:48:36 UTC
Handled-By : Jesse Barnes <jbarnes@virtuousgeek.org>
Patch : https://bugzilla.kernel.org/attachment.cgi?id=31112
Comment 9 Rafael J. Wysocki 2010-10-17 19:47:47 UTC
Fixed by commit 96f3823f537088c13735cfdfbf284436c802352a.
