Latest working kernel version: 2.6.24.4 (from Debian)
Earliest failing kernel version: 2.6.25 (vanilla)
Distribution: Debian
Problem Description:

While booting the new 2.6.25 kernel, it enters an infinite loop displaying
"b44: eth0: powering down PHY".
The system isn't frozen, as the Magic SysRq keys work, but it just keeps
displaying those messages. However, I am unable to dump any information
using SysRq (the b44(...) messages scroll by too fast).

I will attach lspci output and my .config.
Created attachment 15794: current .config file
Created attachment 15795: lspci output
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 17 Apr 2008 17:08:27 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10473
>
> Summary: Infinite loop "b44: eth0: powering down PHY"
> Product: Drivers
> Version: 2.5
> KernelVersion: 2.6.25
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: Network
> AssignedTo: jgarzik@pobox.com
> ReportedBy: naoliv@gmail.com
>
> Latest working kernel version: 2.6.24.4 (from Debian)
> Earliest failing kernel version: 2.6.25 (vanilla)
> Distribution: Debian
> Problem Description:
>
> While booting the new 2.6.25 kernel, it enters an infinite loop displaying
> "b44: eth0: powering down PHY".
> [...]

Apparently a regression.
CCed Gary (the b44 maintainer). Not sure why I'm CCed myself, actually :)

On Friday 18 April 2008 02:12:15 Andrew Morton wrote:
> > While booting the new 2.6.25 kernel, it enters an infinite loop displaying
> > "b44: eth0: powering down PHY".
> > [...]
>
> Apparently a regression.

Can you add a dump_stack() call to the b44_halt() function and post the
resulting logs?
Hi!

On Fri, Apr 18, 2008 at 11:06 AM, Michael Buesch <mb@bu3sch.de> wrote:
> Can you add a dump_stack() call to the b44_halt() function and post the
> resulting logs?

What I get is:

Pid: 4, comm: ksoftirqd/0 Tainted: GF 2.6.25-naoliv1 #2
[<f8992420>] [<b0231ffd>] [<b02380b9>] [<b011ed60>] [<b01060f2>]
[<b011f059>] [<b011efe1>] [<b01297c8>] [<b0129790>] [<b010473f>]

Something tells me this won't help much, and that I probably need to
enable some debugging option (which one do I need to enable, please?)

BTW, is there a way to "pause" the messages after dump_stack()?

Thank you!

Best regards,
Nelson
On Friday 18 April 2008 17:23:06 Nelson A. de Oliveira wrote:
> What I get is:
>
> Pid: 4, comm: ksoftirqd/0 Tainted: GF 2.6.25-naoliv1 #2
> [<f8992420>] [<b0231ffd>] [<b02380b9>] [<b011ed60>] [<b01060f2>]
> [<b011f059>] [<b011efe1>] [<b01297c8>] [<b0129790>] [<b010473f>]

Ehm, please enable CONFIG_KALLSYMS.

> BTW, is there a way to "pause" the messages after dump_stack()?

mdelay(1000) will delay for one second. But it will basically kill the system.
Hi!

On Fri, Apr 18, 2008 at 12:32 PM, Michael Buesch <mb@bu3sch.de> wrote:
> Ehm, please enable CONFIG_KALLSYMS.

Right. Sorry. Here it is:

Pid: 4, comm: ksoftirqd/0 Tainted: GF 2.6.25-naoliv #4
[<f899df84>] b44_halt+0x68/0x7f [b44]
[<f899f432>] b44_poll+0x36a/0x405 [b44]
[<b02393ad>] net_rx_action+0x63/0x131
[<b011ee60>] __do_softirq+0x5a/0xa5
[<b01061e2>] do_softirq+0x52/0x84
[<b011f159>] ksoftirqd+0x78/0x110
[<b011f0e1>] ksoftirqd+0x0/0x110
[<b01298d4>] kthread+0x38/0x60
[<b012989c>] kthread+0x0/0x60
[<b010474b>] kernel_thread_helper+0x7/0x10

Anything else I can do to help, please?

Thank you!

Best regards,
Nelson
On Friday 18 April 2008 19:12:34 Nelson A. de Oliveira wrote:
> Anything else I can do to help, please?

Please apply this patch and send me the messages.

Index: wireless-testing/drivers/net/b44.c
===================================================================
--- wireless-testing.orig/drivers/net/b44.c	2008-04-15 12:40:17.000000000 +0200
+++ wireless-testing/drivers/net/b44.c	2008-04-18 19:18:02.000000000 +0200
@@ -866,6 +866,7 @@ static int b44_poll(struct napi_struct *
 	if (bp->istat & ISTAT_ERRORS) {
 		unsigned long flags;
 
+printk(KERN_ERR "b44_poll: istat = 0x%08X\n", bp->istat);
 		spin_lock_irqsave(&bp->lock, flags);
 		b44_halt(bp);
 		b44_init_rings(bp);
Hi!

On Fri, Apr 18, 2008 at 2:19 PM, Michael Buesch <mb@bu3sch.de> wrote:
> On Friday 18 April 2008 19:12:34 Nelson A. de Oliveira wrote:
> > Anything else I can do to help, please?
>
> Please apply this patch and send me the messages.

b44_poll: istat = 0x00000400
b44: eth0: powering down PHY
Pid: 0, comm: swapper Not tainted 2.6.25-naoliv1 #4
[<f899df84>] b44_halt+0x68/0x7f [b44]
[<f88f4440>] b44_poll+0x378/0x415 [b44]
[<b010453b>] common_interrupt+0x23/0x28
[<b02393ad>] net_rx_action+0x63/0x131
[<b011ee60>] __do_softirq+0x5a/0xa5
[<b01061e2>] do_softirq+0x52/0x84
[<b013d666>] handle_fasteoi_irq+0x0/0xad
[<b011ed3e>] irq_exit+0x35/0x76
[<b01062ad>] do_IRQ+0x99/0xb0
[<b010453b>] common_interrupt+0x23/0x28
[<f886700b>] acpi_idle_enter_bm+0x28c/0x2fd6 [processor]
[<b022602b>] cpuidle_idle_call+0x55/0x86
[<b0225fd6>] cpuidle_idle_call+0x0/0x86
[<b01028d3>] cpu_idle+0x8c/0xbc

I have increased the delay now. This is the first message that
appears. After some time it seems to start displaying the other
lines from my last email (Pid: 4, comm: ksoftirqd/0 ...).

Best regards,
Nelson
On Friday 18 April 2008 19:43:57 Nelson A. de Oliveira wrote:
> b44_poll: istat = 0x00000400

Hm, a descriptor error. Smells like my DMA fix actually broke this, dammit.
On which architecture are you running?

> I have increased the delay now. This is the first message that
> appears. After some time it seems to start displaying the other
> lines from my last email (Pid: 4, comm: ksoftirqd/0 ...).

I'm only ever interested in the first message of each type :)
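The istat value logged by the printk patch can be decoded mechanically. What follows is a userspace sketch, not driver code: the ISTAT_* values are recalled from drivers/net/b44.h rather than taken from this thread, so verify them against your own tree, and decode_istat_errors() is a hypothetical helper. Only the reading of 0x00000400 as a descriptor error is confirmed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Error bits of the b44 interrupt-status register.
 * ASSUMPTION: values recalled from drivers/net/b44.h; verify them. */
#define ISTAT_DSCE  0x00000400u /* descriptor error */
#define ISTAT_DATAE 0x00000800u /* data error */
#define ISTAT_DPE   0x00001000u /* descriptor protocol error */
#define ISTAT_RDU   0x00002000u /* receive descriptor underflow */
#define ISTAT_RFO   0x00004000u /* receive FIFO overflow */
#define ISTAT_TFU   0x00008000u /* transmit FIFO underflow */
#define ISTAT_ERRORS (ISTAT_DSCE | ISTAT_DATAE | ISTAT_DPE | \
                      ISTAT_RDU | ISTAT_RFO | ISTAT_TFU)

static const struct {
    uint32_t mask;
    const char *name;
} istat_error_bits[] = {
    { ISTAT_DSCE,  "descriptor error" },
    { ISTAT_DATAE, "data error" },
    { ISTAT_DPE,   "descriptor protocol error" },
    { ISTAT_RDU,   "receive descriptor underflow" },
    { ISTAT_RFO,   "receive FIFO overflow" },
    { ISTAT_TFU,   "transmit FIFO underflow" },
};

/* Hypothetical helper: writes a comma-separated list of the error bits
 * set in istat into buf and returns how many were found. A return of 0
 * means the (istat & ISTAT_ERRORS) branch in b44_poll() is not taken. */
int decode_istat_errors(uint32_t istat, char *buf, size_t len)
{
    int found = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof istat_error_bits / sizeof istat_error_bits[0]; i++) {
        if (istat & istat_error_bits[i].mask) {
            size_t used = strlen(buf);
            /* snprintf truncates safely if buf is too small. */
            snprintf(buf + used, len - used, "%s%s",
                     found ? ", " : "", istat_error_bits[i].name);
            found++;
        }
    }
    return found;
}
```

Called with 0x00000400, it reports a single error bit, "descriptor error", so that value alone is enough to send b44_poll() down the b44_halt() path that prints "powering down PHY" in the log above.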
On Fri, Apr 18, 2008 at 2:59 PM, Michael Buesch <mb@bu3sch.de> wrote:
> > b44_poll: istat = 0x00000400
>
> Hm, a descriptor error. Smells like my DMA fix actually broke this, dammit.
> On which architecture are you running?

i386 here.

> I'm only ever interested in the first message of each type :)

Right :-)

Best regards,
Nelson
On Friday 18 April 2008 20:09:36 Nelson A. de Oliveira wrote:
> On Fri, Apr 18, 2008 at 2:59 PM, Michael Buesch <mb@bu3sch.de> wrote:
> > Hm, a descriptor error. Smells like my DMA fix actually broke this, dammit.
> > On which architecture are you running?
>
> i386 here.

Hm, I tested my patch on i386.
So I'm not sure what's going on, actually. And the patch was pretty
trivial and I really can't find a bug in it.
So you say 2.6.24 was still working?
Hi!

On Fri, Apr 18, 2008 at 3:18 PM, Michael Buesch <mb@bu3sch.de> wrote:
> Hm, I tested my patch on i386.
> So I'm not sure what's going on, actually. And the patch was pretty
> trivial and I really can't find a bug in it.
> So you say 2.6.24 was still working?

Strange... I compiled 2.6.24.4, 2.6.24 and 2.6.23 here, and they all
stop at this:

b44: eth0: Link is up at 100 Mbps, full duplex.
b44: eth0: Flow control is off for TX and off for RX.

And it seems to keep waiting for something. The system isn't frozen
(CTRL+ALT+DEL kills the running processes and correctly reboots the
machine).

With Debian's 2.6.24.4 it works.
With vanilla 2.6.25 and my config it just enters an infinite loop of
"b44: eth0: powering down PHY".

Can different GCC versions cause this? Can a bad .config file cause
things like that? (I have been using this .config for a long time and it
has always worked correctly, at least until now.)

Thank you!

Best regards,
Nelson
On Friday 18 April 2008 21:02:37 Nelson A. de Oliveira wrote:
> Strange... I compiled 2.6.24.4, 2.6.24 and 2.6.23 here, and they all
> stop at this:
>
> b44: eth0: Link is up at 100 Mbps, full duplex.
> b44: eth0: Flow control is off for TX and off for RX.
>
> And it seems to keep waiting for something. The system isn't frozen
> (CTRL+ALT+DEL kills the running processes and correctly reboots the
> machine).

Well, 2.6.24 didn't have this message, but it could still have the actual
bug, of course. So can you try applying my printk patch to a broken 2.6.24
kernel and see whether it triggers the message or not? Under normal
circumstances this codepath should never trigger.

> With Debian's 2.6.24.4 it works.
> With vanilla 2.6.25 and my config it just enters an infinite loop of
> "b44: eth0: powering down PHY".

This message was added in 2.6.25. That doesn't mean the bug was also
added in 2.6.25, of course.

> Can different GCC versions cause this? Can a bad .config file cause
> things like that? (I have been using this .config for a long time and it
> has always worked correctly, at least until now.)

Well, it's possible, although unlikely.

Can you try bisecting the bug? Yeah, I know about the LWN article [1] that
says bisecting is baaaaaad (tm), but my opinion is different. :)
It's an excellent tool for efficiently finding the patches that caused bugs.
But take care to really check whether the device _works_ or not. Just
looking for the "powering down PHY" message will _not_ be enough, as that
message was only added recently, as I said.

[1] http://lwn.net/Articles/278137/
Hi!

On Fri, Apr 18, 2008 at 4:19 PM, Michael Buesch <mb@bu3sch.de> wrote:
> Well, 2.6.24 didn't have this message, but it could still have the actual
> bug, of course. So can you try applying my printk patch to a broken 2.6.24
> kernel and see whether it triggers the message or not? Under normal
> circumstances this codepath should never trigger.

No b44_poll message is printed when using your patch on 2.6.24, 2.6.23
and 2.6.21.

> Can you try bisecting the bug? [...]
> But take care to really check whether the device _works_ or not. Just
> looking for the "powering down PHY" message will _not_ be enough, as that
> message was only added recently, as I said.

Sure. I will do this when I get home. (Can you point me to a URL
explaining how to do the bisection, please?)

What I saw with 2.6.24, 2.6.23 and 2.6.21 is that the interface seems
to be up and gets an IP via DHCP (I can ping it from another machine),
but it keeps waiting for something after printing

b44: eth0: Link is up at 100 Mbps, full duplex.
b44: eth0: Flow control is off for TX and off for RX.

Thank you!

Best regards,
Nelson
(In reply to comment #15)

Hi Nelson,

here is a git bisect guide from kernel.org:
http://www.kernel.org/doc/local/git-quick.html#bisect

Thanks,

> Sure. I will do this when I get home. (Can you point me to a URL
> explaining how to do the bisection, please?)
Hi!

I have tried to do a bisect here (thank you, Jike Song, for the link).
I marked 2.6.20 as good and master as bad. On the first test, I got this:

(...)
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
b01b6265
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: b44(F) mousedev(F) iwl3945(F) ehci_hcd(F) mac80211(F) snd_hda_intel(F) thermal(F) i2c_i801(F) ac(F) ssb(F) snd_pcm(F) snd_timer(F) uhci_hcd(F) psmouse(F) evdev(F) battery(F) button(F) processor(F) mii(F) usbcore(F) snd(F) snd_page_alloc(F) sg(F) sr_mod(F) cdrom(F)
CPU: 0
EIP: 0060:[<b01b6265>] Tainted: GF VLI
EFLAGS: 00010246 (2.6.23-naoliv1 #1)
EIP is at strlen+0x8/0x11
eax: 00000000 ebx: f7429000 ecx: ffffffff edx: f76b6cb0
esi: 00000000 edi: 00000000 ebp: 00000000 esp: f76b6ca0
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process modprobe (pid: 692, ti=f76b6000 task=f76c6000 task.ti=f76b6000)
Stack: f75a2000 b01b3254 f785f200 b02d3e5b b02cb0da f7856200 b01b324a f7426688
       b02cb0da f7856200 f88f61f0 b0310c8c f785f200 f88f8e28 b0207d4e f74266a8
       f88f8d9c f7426600 f7426600 00000000 f7453400 b0206d8b f7426688 b02c4e14
Call Trace:
[<b01b3254>] kobject_uevent_env+0x276/0x383
[<b01b324a>] kobject_uevent_env+0x26c/0x383
[<b0207d4e>] bus_add_device+0xad/0xdc
[<b0206d8b>] device_add+0x2a0/0x45e
[<f88f36f1>] ssb_attach_queued_buses+0x1a2/0x297 [ssb]
[<f88f3b2f>] ssb_bus_register+0x120/0x185 [ssb]
[<f88f4ac2>] ssb_pci_get_invariants+0x0/0x281 [ssb]
[<f88f3bf3>] ssb_bus_pcibus_register+0x24/0x47 [ssb]
[<b01bb856>] pci_set_master+0x54/0x58
[<f88f52b1>] ssb_pcihost_probe+0x5e/0x89 [ssb]
[<b01bd0ff>] pci_device_probe+0x36/0x55
[<b020857e>] driver_probe_device+0xc5/0x148
[<b02890a5>] klist_next+0x58/0x6d
[<b02086dc>] __driver_attach+0x49/0x7f
[<b0207ba8>] bus_for_each_dev+0x35/0x57
[<b02083f2>] driver_attach+0x16/0x18
[<b0208693>] __driver_attach+0x0/0x7f
[<b0207e56>] bus_add_driver+0x6d/0x17d
[<b01bd249>] __pci_register_driver+0x55/0x81
[<f881d01f>] b44_init+0x1f/0x48 [b44]
[<b013cdcc>] sys_init_module+0x1545/0x1619
[<b0103e9a>] sysenter_past_esp+0x5f/0x85
=======================
Code: f0 48 5e c3 56 89 d1 89 c6 83 ec 04 31 d2 89 c8 88 c4 ac 38 e0 75 03 8d 56 ff 84 c0 75 f4 5e 89 d0 5e c3 57 83 c9 ff 89 c7 31 c0 <f2> ae f7 d1 49 5f 89 c8 c3 57 89 c7 89 d0 31 d2 85 c9 74 0c f2
EIP: [<b01b6265>] strlen+0x8/0x11 SS:ESP 0068:f76b6ca0
hub 1-2:1.0: hub_port_status failed (err = -71)
hub 1-2:1.0: hub_port_status failed (err = -71)
hub 1-2:1.0: hub_port_status failed (err = -71)
hub 1-2:1.0: hub_port_status failed (err = -71)
Clocksource tsc unstable (delta = -162081422 ns)
usb 5-2: new high speed USB device using ehci_hcd and address 2
usb 5-2: configuration #1 chosen from 1 choice
hub 5-2:1.0: USB hub found
hub 5-2:1.0: 4 ports detected
sysfs: duplicate filename 'bInterfaceNumber' can not be created
WARNING: at fs/sysfs/dir.c:425 sysfs_add_one()
[<b018bebc>] sysfs_add_one+0x54/0xb8
[<b018ba00>] sysfs_add_file+0x42/0x6a
[<b018d115>] sysfs_create_group+0x84/0xe7
[<b0206f3f>] device_add+0x454/0x45e
[<f88ca72a>] usb_create_sysfs_intf_files+0x24/0x98 [usbcore]
[<f88c7295>] usb_set_configuration+0x48f/0x4a9 [usbcore]
[<f88cdcdb>] generic_probe+0x50/0x91 [usbcore]
[<f88c8784>] usb_probe_device+0x32/0x37 [usbcore]
[<b020857e>] driver_probe_device+0xc5/0x148
[<b02890a5>] klist_next+0x58/0x6d
[<b0207aa8>] bus_for_each_drv+0x35/0x5c
[<b020867f>] device_attach+0x5e/0x72
[<b0208601>] __device_attach+0x0/0x5
[<b0207a24>] bus_attach_device+0x26/0x75
[<b0206d92>] device_add+0x2a7/0x45e
[<f88c2c1a>] usb_new_device+0x4d/0x8a [usbcore]
[<f88c3746>] hub_thread+0x702/0xa8f [usbcore]
[<b012fd84>] autoremove_wake_function+0x0/0x33
[<f88c3044>] hub_thread+0x0/0xa8f [usbcore]
[<b012fcb7>] kthread+0x38/0x5d
[<b012fc7f>] kthread+0x0/0x5d
[<b0104abb>] kernel_thread_helper+0x7/0x10
=======================
(...)

This one is probably 2.6.23.
After some time the system continued to boot, but without a network
interface. So I marked it as bad.
The next bisect step failed to compile, so I marked it bad. The following
two steps failed as well :-(

My git-bisect log is:

git-bisect start
# good: [62d0cfcb27cf755cebdc93ca95dabc83608007cd] Linux 2.6.20
git-bisect good 62d0cfcb27cf755cebdc93ca95dabc83608007cd
# bad: [3925e6fc1f774048404fdd910b0345b06c699eb4] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
git-bisect bad 3925e6fc1f774048404fdd910b0345b06c699eb4
# bad: [3749c66c67fb5c257771815c186bc32290cacf44] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
git-bisect bad 3749c66c67fb5c257771815c186bc32290cacf44
# bad: [b11115c15351faba978ce1b9e75068e77f6ef48d] serial_core.h: include <linux/sysrq.h>
git-bisect bad b11115c15351faba978ce1b9e75068e77f6ef48d
# bad: [1936502d00ae6c2aa3931c42f6cf54afaba094f2] [NET_SCHED] qdisc: avoid transmit softirq on watchdog wakeup
git-bisect bad 1936502d00ae6c2aa3931c42f6cf54afaba094f2

What else can I do, please?

Thank you very much!

Best regards,
Nelson
On Monday 21 April 2008 20:14:44 Nelson A. de Oliveira wrote:
> This one is probably 2.6.23.
> After some time the system continued to boot, but without a network
> interface. So I marked it as bad.

That was probably a mistake.

> What else can I do, please?

You can try the latest git. I was told it has a feature to tell bisect
"I don't know" instead of "good" or "bad". This can be used if a test
kernel doesn't compile, or fails because of some other bug.

You can also manually bisect the changes between your known-good version
of b44 and the bad one. There were only a couple of patches. You can
extract them with git, revert them one by one, and see when it starts
working again. I think it was something like 5 patches. Nothing too
time-consuming.
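The "I don't know" answer mentioned above is what git exposes as the git bisect skip command. The sketch below is a simplified model, not git's algorithm: it assumes a linear history (real git bisect walks a commit graph), bisect_linear() is a hypothetical helper, and the verdicts array stands in for building and booting each candidate kernel. It shows why marking an unbuildable commit "bad" can steer the search to the wrong commit, while skipping it does not.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of building and booting one candidate kernel. */
enum verdict { GOOD, BAD, SKIP };

/* Simplified bisection with "skip" over a LINEAR history (assumption).
 * verdicts[i] stands in for testing commit i; commit 0 is known good and
 * commit n - 1 known bad, so neither is consulted. Returns the index of
 * the first bad commit, or -1 if only skipped commits remain between the
 * last good and first bad ones (the culprit is then ambiguous). */
int bisect_linear(const enum verdict *verdicts, int n)
{
    int good = 0, bad = n - 1;

    assert(verdicts != NULL && n >= 2);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        int probe = -1;
        enum verdict v = SKIP;

        /* Test mid first; if it is untestable, walk outward from it. */
        for (int d = 0; v == SKIP; d++) {
            int lo = mid - d, hi = mid + d;

            if (lo <= good && hi >= bad)
                return -1; /* everything strictly between was skipped */
            if (lo > good && (v = verdicts[lo]) != SKIP) {
                probe = lo;
                break;
            }
            if (hi != lo && hi < bad && (v = verdicts[hi]) != SKIP) {
                probe = hi;
                break;
            }
        }
        if (v == GOOD)
            good = probe;
        else
            bad = probe;
    }
    return bad;
}
```

For example, with verdicts {GOOD, GOOD, SKIP, GOOD, BAD, BAD} the function returns 4. Relabeling the unbuildable commit as BAD, i.e. {GOOD, GOOD, BAD, GOOD, BAD, BAD}, makes it return 2 and blame an innocent commit.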
Hi!

Maybe this can help:
Using a new .config, I started to enable/disable options and test.
What I found is that if I select "3G/1G user/kernel split", the kernel
works (it boots normally, the network interface works, etc). If I
select "3G/1G user/kernel split (for full 1G low memory)" I get the
infinite loop of "b44: eth0: powering down PHY".

The working config file (on 2.6.25) is attached.
The diff to the non-working one is below:

--- working_config	2008-04-21 23:42:40.000000000 -0300
+++ not_working_config	2008-04-21 23:55:28.000000000 -0300
@@ -1,7 +1,7 @@
 #
 # Automatically generated make config: don't edit
 # Linux kernel version: 2.6.25
-# Mon Apr 21 23:28:49 2008
+# Mon Apr 21 23:43:04 2008
 #
 # CONFIG_64BIT is not set
 CONFIG_X86_32=y
@@ -228,12 +228,12 @@
 # CONFIG_NOHIGHMEM is not set
 CONFIG_HIGHMEM4G=y
 # CONFIG_HIGHMEM64G is not set
-CONFIG_VMSPLIT_3G=y
-# CONFIG_VMSPLIT_3G_OPT is not set
+# CONFIG_VMSPLIT_3G is not set
+CONFIG_VMSPLIT_3G_OPT=y
 # CONFIG_VMSPLIT_2G is not set
 # CONFIG_VMSPLIT_2G_OPT is not set
 # CONFIG_VMSPLIT_1G is not set
-CONFIG_PAGE_OFFSET=0xC0000000
+CONFIG_PAGE_OFFSET=0xB0000000
 CONFIG_HIGHMEM=y
 CONFIG_ARCH_FLATMEM_ENABLE=y
 CONFIG_ARCH_SPARSEMEM_ENABLE=y

Can this be the cause?

Thank you!

Best regards,
Nelson
On Tuesday 22 April 2008 05:01:54 Nelson A. de Oliveira wrote:
> Maybe this can help:
> Using a new .config, I started to enable/disable options and test.
> What I found is that if I select "3G/1G user/kernel split", the kernel
> works (it boots normally, the network interface works, etc). If I
> select "3G/1G user/kernel split (for full 1G low memory)" I get the
> infinite loop of "b44: eth0: powering down PHY".

Ah, so this bug isn't actually caused by a patch, but rather by a different
config option. I don't think we can do much about it at the moment. The
device has strange memory requirements, and changing the split actually
breaks it. This cannot be fixed until Andi Kleen's mask allocator is merged.
This "bug" has always been there.
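The "strange memory requirements" can be made concrete with some arithmetic. This sketch rests on two assumptions not stated in this thread: that the b44 core can only DMA below 1 GiB (a 30-bit address limit), and that i386 reserves roughly 128 MiB of kernel address space for vmalloc, so directly mapped low memory ends near 4 GiB - PAGE_OFFSET - 128 MiB. The helper names are hypothetical. It only matters on machines with more than about 1 GiB of RAM.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTIONS, not from this thread: i386 keeps about 128 MiB of kernel
 * virtual space for vmalloc and friends, and the b44 core can only reach
 * the first 1 GiB of physical memory (30-bit DMA addresses). */
#define FOUR_GIB        0x100000000ULL
#define VMALLOC_RESERVE 0x08000000ULL /* 128 MiB */
#define B44_DMA_LIMIT   0x40000000ULL /* 1 GiB */

/* Upper bound of directly mapped low memory for a given
 * CONFIG_PAGE_OFFSET (the real limit is lower on machines with less RAM). */
uint64_t lowmem_top(uint64_t page_offset)
{
    assert(page_offset < FOUR_GIB - VMALLOC_RESERVE);
    return FOUR_GIB - page_offset - VMALLOC_RESERVE;
}

/* Nonzero when ordinary kernel (lowmem) allocations can land above the
 * address range the device is assumed to reach. */
int lowmem_exceeds_dma_limit(uint64_t page_offset)
{
    return lowmem_top(page_offset) > B44_DMA_LIMIT;
}
```

Under these assumptions, lowmem_top(0xC0000000) is 0x38000000 (896 MiB), safely below the limit, while lowmem_top(0xB0000000) is 0x48000000 (1152 MiB), so with the 3G_OPT split ordinary kernel buffers may land where the chip cannot reach. That is consistent with only the 3G_OPT build breaking.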
I'm also getting this on 2.6.26-rc4 and -rc4-git3. I'm running on x86-64. 2.6.25 worked okay for me.
Created attachment 16339: config from 2.6.26-rc4-git3
> I'm also getting this on 2.6.26-rc4 and -rc4-git3. I'm running on x86-64.
> 2.6.25 worked okay for me.

Same symptoms here. I have tried to bisect the bug:

$ git bisect log
git-bisect start
# good: [4b119e21d0c66c22e8ca03df05d9de623d0eb50f] Linux 2.6.25
git-bisect good 4b119e21d0c66c22e8ca03df05d9de623d0eb50f
# bad: [38e80121bd7d0c493072442ac7eddcba165a07a8] Merge git://git.infradead.org/battery-2.6
git-bisect bad 38e80121bd7d0c493072442ac7eddcba165a07a8
# bad: [7ae44cfa7ab29b277691327e8de790d7b880722f] [ALSA] snd-powermac: style awacs.s and awacs.h
git-bisect bad 7ae44cfa7ab29b277691327e8de790d7b880722f
# good: [7cea51be4e91edad05bd834f3235b45c57783f0d] security: fix up documentation for security_module_enable
git-bisect good 7cea51be4e91edad05bd834f3235b45c57783f0d
# good: [8f19ca1341a6d89bd96e2e69e6e10f46d3258089] x86: unify gfp masks
git-bisect good 8f19ca1341a6d89bd96e2e69e6e10f46d3258089
# bad: [9a64388d83f6ef08dfff405a9d122e3dbcb6bf38] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
git-bisect bad 9a64388d83f6ef08dfff405a9d122e3dbcb6bf38
# bad: [85b375a613085b78531ec86369a51c2f3b922f95] Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm
git-bisect bad 85b375a613085b78531ec86369a51c2f3b922f95
# good: [d1964dab60ce7c104dd21590e987a8787db18051] Merge branches 'arm', 'at91', 'ep93xx', 'iop', 'ks8695', 'misc', 'mxc', 'ns9x', 'orion', 'pxa', 'sa1100', 's3c' and 'sparsemem' into devel
git-bisect good d1964dab60ce7c104dd21590e987a8787db18051
# good: [d1964dab60ce7c104dd21590e987a8787db18051] Merge branches 'arm', 'at91', 'ep93xx', 'iop', 'ks8695', 'misc', 'mxc', 'ns9x', 'orion', 'pxa', 'sa1100', 's3c' and 'sparsemem' into devel
git-bisect good d1964dab60ce7c104dd21590e987a8787db18051
# good: [486fdae21458bd9f4e125099bb3c38a4064e450e] sched: build fix
git-bisect good 486fdae21458bd9f4e125099bb3c38a4064e450e
# good: [fd9be4ce2e1eb407a8152f823698cc0d652bbec8] Merge branch 'ro-bind.b6' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
git-bisect good fd9be4ce2e1eb407a8152f823698cc0d652bbec8
# good: [32ab2cb9415f341913e3f33ef7566ca6e92ef283] ARM: OMAP2: Move clock.h to clock24xx.h
git-bisect good 32ab2cb9415f341913e3f33ef7566ca6e92ef283
# good: [3760d31f11bfbd0ead9eaeb8573e0602437a9d7c] ARM: OMAP2: New DPLL clock framework
git-bisect good 3760d31f11bfbd0ead9eaeb8573e0602437a9d7c
# good: [3760d31f11bfbd0ead9eaeb8573e0602437a9d7c] ARM: OMAP2: New DPLL clock framework
git-bisect good 3760d31f11bfbd0ead9eaeb8573e0602437a9d7c
# bad: [34d0559178393547505ec9492321255405f4e441] x86: UV startup of slave cpus
git-bisect bad 34d0559178393547505ec9492321255405f4e441
# bad: [da60cab4dd922cd933e82bace490f6155a32a90e] x86: return conditional to mmu
git-bisect bad da60cab4dd922cd933e82bace490f6155a32a90e

It seems related to DMA.

$ git bisect visualize --pretty=oneline | cat
da60cab4dd922cd933e82bace490f6155a32a90e x86: return conditional to mmu
aa99b16faadcc9a5b6bd9550fda117a8e9e46d26 x86: remove kludge from x86_64
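The bisect lands on x86 DMA-allocation changes. As a rough model of why a device like the b44 is sensitive to them on x86-64, assume the usual zone limits (ZONE_DMA below 16 MiB, ZONE_DMA32 below 4 GiB) and a 30-bit device mask: only ZONE_DMA is guaranteed to satisfy the device, so memory taken from ZONE_DMA32 has to be checked against the mask and, if it does not fit, reallocated lower. The names below are hypothetical; this is not the kernel's dma_alloc_coherent() code, and it makes no claim about what the patch referenced later in this thread changed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for a simplified x86-64 zone model.
 * ASSUMPTION: ZONE_DMA covers the first 16 MiB, ZONE_DMA32 the first 4 GiB. */
enum zone_pick { NO_ZONE = -1, PICK_DMA, PICK_DMA32, PICK_ANY };

#define ZONE_DMA_LAST   0x0000000000FFFFFFULL
#define ZONE_DMA32_LAST 0x00000000FFFFFFFFULL

/* True if [addr, addr + size) lies entirely at or below the device mask;
 * written so that addr + size cannot overflow. */
int buffer_fits_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
    assert(size > 0);
    return addr <= mask && size - 1 <= mask - addr;
}

/* Highest zone whose every address the device can reach. A 30-bit mask
 * only qualifies for ZONE_DMA; memory from ZONE_DMA32 may or may not fit,
 * so it would have to be checked with buffer_fits_mask() afterwards. */
enum zone_pick pick_zone(uint64_t mask)
{
    if (mask == UINT64_MAX)
        return PICK_ANY;
    if (mask >= ZONE_DMA32_LAST)
        return PICK_DMA32;
    if (mask >= ZONE_DMA_LAST)
        return PICK_DMA;
    return NO_ZONE;
}
```

For a 30-bit mask (0x3FFFFFFF), pick_zone() returns PICK_DMA, and a buffer at 0x80000000 fails buffer_fits_mask() even though it is well inside the 4 GiB DMA32 range.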
We have a bunch of fixes in the x86 tree that do not appear to be in
Linus' tree yet. Since you are using git anyway, would you mind testing
it? If it works, we might want to cherry-pick them for Linus, since
they'll be associated with a regression.

thanx
Fixed. Patch http://lkml.org/lkml/2008/6/12/227
> We have a bunch of fixes in the x86 tree that do not appear to be in
> Linus' tree yet. Since you are using git anyway, would you mind testing
> it? If it works, we might want to cherry-pick them for Linus, since
> they'll be associated with a regression.

To check that, pick up tip/master, as described in:

http://people.redhat.com/mingo/tip.git/README
Fedora 9, kernel 2.6.27-rc6, x86. Same problem with a Broadcom NIC.