Latest working kernel version: 2.6.26-rc5-git6 (maybe) Earliest failing kernel version: 2.6.26-rc6-git2 (problem encountered here) Distribution: Ubuntu 8.04 Hardware Environment: Lenovo 3000 Y200 (6469-22R) (more info in attachments) Software Environment: Linux version 2.6.26-rc6-git2-puuh (t-om@puuh) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP PREEMPT Sat Jun 14 23:56:01 MYT 2008 (more info in attachments) Problem Description: Jun 15 22:49:34 puuh kernel: ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16 Jun 15 22:49:34 puuh kernel: NetworkManager: page allocation failure. order:4, mode:0x40d0 Jun 15 22:49:34 puuh kernel: Pid: 14742, comm: NetworkManager Not tainted 2.6.26-rc6-git2-puuh #1 Jun 15 22:49:34 puuh kernel: Jun 15 22:49:34 puuh kernel: Call Trace: Jun 15 22:49:34 puuh kernel: [__alloc_pages_internal+0x38f/0x480] __alloc_pages_internal+0x38f/0x480 Jun 15 22:49:34 puuh kernel: [fuse:__get_free_pages+0x15/0x7e0] __get_free_pages+0x15/0x60 Jun 15 22:49:34 puuh kernel: [iwl3945:iwl3945_tx_queue_init+0x94/0x1d0] :iwl3945:iwl3945_tx_queue_init+0x94/0x1d0 Jun 15 22:49:34 puuh kernel: [iwl3945:iwl3945_txq_ctx_reset+0x372/0x580] :iwl3945:iwl3945_txq_ctx_reset+0x372/0x580 Jun 15 22:49:34 puuh kernel: [iwl3945:iwl3945_hw_nic_init+0xe7f/0x1360] :iwl3945:iwl3945_hw_nic_init+0xe7f/0x1360 Jun 15 22:49:34 puuh kernel: [iwl3945:__iwl3945_up+0x135/0xe30] :iwl3945:__iwl3945_up+0x135/0xe30 Jun 15 22:49:34 puuh kernel: [iwl3945:iwl3945_isr+0x0/0x380] :iwl3945:iwl3945_isr+0x0/0x380 Jun 15 22:49:34 puuh kernel: [firewire_ohci:request_irq+0xd5/0x9f40] request_irq+0xd5/0xf0 Jun 15 22:49:34 puuh kernel: [iwl3945:iwl3945_mac_start+0x940/0xdb0] :iwl3945:iwl3945_mac_start+0x940/0xdb0 Jun 15 22:49:34 puuh kernel: [source_load+0x34/0x70] source_load+0x34/0x70 Jun 15 22:49:34 puuh kernel: [acpi_cpufreq:__next_cpu+0x1a/0x30] __next_cpu+0x1a/0x30 Jun 15 22:49:34 puuh kernel: [find_busiest_group+0x1da/0x820] find_busiest_group+0x1da/0x820 Jun 15 22:49:34 puuh kernel: [mac80211:ieee80211_open+0x1a0/0x5c0] :mac80211:ieee80211_open+0x1a0/0x5c0 Jun 15 22:49:34 puuh kernel: [af_packet:local_bh_disable+0xe/0x20] local_bh_disable+0xe/0x20 Jun 15 22:49:34 puuh kernel: [mac80211:dev_open+0x59/0x710] dev_open+0x59/0xb0 Jun 15 22:49:34 puuh kernel: [dev_change_flags+0x9f/0x1d0] dev_change_flags+0x9f/0x1d0 Jun 15 22:49:34 puuh kernel: [do_setlink+0x304/0x3f0] do_setlink+0x304/0x3f0 Jun 15 22:49:34 puuh kernel: [cfg80211:nla_parse+0x2f/0xb10] nla_parse+0x2f/0xe0 Jun 15 22:49:34 puuh kernel: [rtnl_setlink+0xe1/0x130] rtnl_setlink+0xe1/0x130 Jun 15 22:49:34 puuh kernel: [rtnetlink_rcv_msg+0x0/0x240] rtnetlink_rcv_msg+0x0/0x240 Jun 15 22:49:34 puuh kernel: [netlink_rcv_skb+0x6c/0x90] netlink_rcv_skb+0x6c/0x90 Jun 15 22:49:34 puuh kernel: [rtnetlink_rcv+0x18/0x20] rtnetlink_rcv+0x18/0x20 Jun 15 22:49:34 puuh kernel: [cfg80211:netlink_unicast+0x298/0x1ec0] netlink_unicast+0x298/0x2d0 Jun 15 22:49:34 puuh kernel: [netlink_sendmsg+0x1fd/0x2b0] netlink_sendmsg+0x1fd/0x2b0 Jun 15 22:49:34 puuh kernel: [sock_sendmsg+0xcb/0x100] sock_sendmsg+0xcb/0x100 Jun 15 22:49:34 puuh kernel: [<ffffffff8024a3a0>] autoremove_wake_function+0x0/0x30 Jun 15 22:49:34 puuh kernel: [__link_path_walk+0xd75/0xf40] __link_path_walk+0xd75/0xf40 Jun 15 22:49:34 puuh kernel: [enqueue_hrtimer+0x6e/0x100] enqueue_hrtimer+0x6e/0x100 Jun 15 22:49:34 puuh kernel: [fget_light+0xbd/0xf0] fget_light+0xbd/0xf0 Jun 15 22:49:34 puuh kernel: [move_addr_to_kernel+0x2e/0x40] move_addr_to_kernel+0x2e/0x40 Jun 15 22:49:34 puuh kernel: [verify_iovec+0x2f/0xa0] verify_iovec+0x2f/0xa0 Jun 15 22:49:34 puuh kernel: [sys_sendmsg+0x20e/0x360] sys_sendmsg+0x20e/0x360 Jun 15 22:49:34 puuh kernel: [snd_hda_intel:_spin_unlock_irqrestore+0x12/0x40] _spin_unlock_irqrestore+0x12/0x40 Jun 15 22:49:34 puuh kernel: [netlink_insert+0x9a/0x160] netlink_insert+0x9a/0x160 Jun 15 22:49:34 puuh kernel: [fget_light+0xbd/0xf0] fget_light+0xbd/0xf0 Jun 15 22:49:34 puuh kernel: [move_addr_to_user+0x5d/0x70] move_addr_to_user+0x5d/0x70 Jun 15 22:49:34 puuh kernel: [sys_getsockname+0xd9/0xe0] sys_getsockname+0xd9/0xe0 Jun 15 22:49:34 puuh kernel: [system_call_after_swapgs+0x7b/0x80] system_call_after_swapgs+0x7b/0x80 Jun 15 22:49:34 puuh kernel: Jun 15 22:49:34 puuh kernel: Mem-info: Jun 15 22:49:34 puuh kernel: DMA per-cpu: Jun 15 22:49:34 puuh kernel: CPU 0: hi: 0, btch: 1 usd: 0 Jun 15 22:49:34 puuh kernel: CPU 1: hi: 0, btch: 1 usd: 0 Jun 15 22:49:34 puuh kernel: DMA32 per-cpu: Jun 15 22:49:34 puuh kernel: CPU 0: hi: 186, btch: 31 usd: 0 Jun 15 22:49:34 puuh kernel: CPU 1: hi: 186, btch: 31 usd: 0 Jun 15 22:49:34 puuh kernel: Active:169972 inactive:26410 dirty:280 writeback:0 unstable:0 Jun 15 22:49:34 puuh kernel: free:5323 slab:37217 mapped:20025 pagetables:4871 bounce:0 Jun 15 22:49:34 puuh kernel: DMA free:3984kB min:40kB low:48kB high:60kB active:4kB inactive:0kB present:11044kB pages_scanned:0 all_unreclaimable? no Jun 15 22:49:34 puuh kernel: lowmem_reserve[]: 0 984 984 984 Jun 15 22:49:34 puuh kernel: DMA32 free:17308kB min:3992kB low:4988kB high:5988kB active:679884kB inactive:105640kB present:1008352kB pages_scanned:0 all_unreclaimable? no Jun 15 22:49:34 puuh kernel: lowmem_reserve[]: 0 0 0 0 Jun 15 22:49:34 puuh kernel: DMA: 24*4kB 24*8kB 7*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3984kB Jun 15 22:49:34 puuh kernel: DMA32: 3863*4kB 90*8kB 22*16kB 4*32kB 3*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17356kB Jun 15 22:49:34 puuh kernel: 64999 total pagecache pages Jun 15 22:49:34 puuh kernel: Swap cache: add 41222, delete 27361, find 19238/22059 Jun 15 22:49:34 puuh kernel: Free swap = 1918280kB Jun 15 22:49:34 puuh kernel: Total swap = 2000052kB Jun 15 22:49:34 puuh kernel: 259698 pages of RAM Jun 15 22:49:34 puuh kernel: 5172 reserved pages Jun 15 22:49:34 puuh kernel: 107775 pages shared Jun 15 22:49:34 puuh kernel: 13861 pages swap cached Jun 15 22:49:43 puuh kernel: ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16 (more info in attachments) Steps to reproduce: Still working on it. More detailed information later.
Created attachment 16488 [details] Hardware info
Created attachment 16489 [details] Software info
Created attachment 16490 [details] /var/log/messages From boot until what was saved before lockup.
Created attachment 16491 [details] /var/log/kern.log From boot until what was saved before lockup.
This entry is being used for tracking a regression from 2.6.25. Please don't close it until the problem is fixed in the mainline.
Er.. just remembered that the kernel was not completely vanilla mainline; there was this patch from teknohog to modify acpi-cpufreq into a phc capable one for current kernels [0] applied and the voltages were set to "10 1" from the default "25 19" (due to apparently buggy ACPI BIOS there are only two very low frequency possibilities). These settings have not caused any trouble for a pretty long sequence of kernels. [0] http://koti.tnnet.fi/teknohog/hacks/linux-phc-kernel-vanilla-2.6.25-teknohog.patch It would have been very natural that the encountered problems were due to too low voltage settings that would have somehow been triggered in the most recent kernel. However, stress testing using two burnP6s and stress -vm 2 with voltage table settings stepped down over one hour time to even lower "0 0" did not bring up any similar page allocation or other problems.
Sigh, the previous stress test was done with the 2.6.26-rc5-git6 kernel and not on the problematic rc6-git2. Anyhow now it is more likely that rc5-git6 indeed is more stable one. Managed to trigger the problem two times again in rc6-git2 through similar normal use as on the first time. Investigating.. (As a side track, there is something odd going on in ACPI thermal probes. Sometimes on both rc5-git6 and rc6-git2 acpi -t shows 144°C while the computer feels normal temperature wise. In Ubuntu's precompiled kernel (2.6.24-19) acpi -t shows similarly odd 1°C. Normal values are around 50°C. This is correctly shown by the coretemp module. Mentioning this since the page-allocation related problem could also be because of overheating but this does not seem to the case.)
Please raise a separate report against acp for the thermal problems?
OK. The ACPI thermal reporting problem is in #10934 [0]. Have not seen those thermal reporting problems since though. [0] http://bugzilla.kernel.org/show_bug.cgi?id=10934 Could not reproduce the page allocation related problem in vanilla rc6-git3 (without phc patch) but tested without power saving optimizations in /etc/rc.local. Currently testing vanilla rc6-git5 (without phc patch) with power saving optimizations in /etc/rc.local which are as follows: -- echo 1 > /sys/devices/system/cpu/sched_mc_power_savings echo 1500 > /proc/sys/vm/dirty_writeback_centisecs echo 5 > /proc/sys/vm/laptop_mode for f in /sys/bus/usb/devices/*/power/autosuspend; do if [ -f "$f" ]; then echo 0 > $f; fi; done for f in /sys/bus/usb/devices/*/power/level; do if [ -f "$f" ]; then echo "auto" > $f; fi; done [ -d /sys/bus/pci/drivers/iwl3945 ] && echo 5 > /sys/bus/pci/drivers/iwl3945/0000:01:00.0/power_level [ -d /sys/module/snd_hda_intel ] && echo 10 > /sys/module/snd_hda_intel/parameters/power_save [ ! -z "`hciconfig`" ] && hciconfig hci0 down modprobe -Qr hci_usb firewire-ohci firewire-sbp2 b44 -- One hour gone and no similar problems have occurred.
Five hours gone, running vanilla 2.6.26-rc6-git5 with above power consumption optimizations. No similar problems have occurred. Tomorrow will test with the phc patch to see if the low voltage settings were the culprit after all.
Two sessions with 2.6.26-rc6-git6 with above power consumption optimizations and phc patch with voltage table settings at "0 0" gone without similar page allocation related problems even with stress tests alongside with normal use. I am inclined to feel that the cause for the problem has either been fixed or I am unable to reproduce the exact conditions for it to occur again. (As a side track, got a new kind of firmware error from iwl3945 but it was able to reset back to normal. It is now documented at #1686 of vendor's bugzilla [0].) [0] http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1686
Unable to reproduce, closing.