Bug 11157 - R8102-E (Intel Atom, D945GCLF) and r8169 fails network boot - cannot get DHCP lease
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All
OS: Linux
Importance: P1 high
Assignee: Francois Romieu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-24 14:15 UTC by Alex Howells
Modified: 2008-10-11 10:12 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.26
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
kernel configuration, 2.6.26 (x86_64) (43.68 KB, application/octet-stream)
2008-07-24 14:17 UTC, Alex Howells
boot output, 2.6.26 (x86_64) (26.10 KB, application/octet-stream)
2008-07-24 14:18 UTC, Alex Howells
kernel configuration, 2.6.26 (x86) (43.28 KB, application/octet-stream)
2008-07-24 14:19 UTC, Alex Howells
boot output, 2.6.26 (x86_64) (28.40 KB, application/octet-stream)
2008-07-24 14:19 UTC, Alex Howells
hardware configuration (14.09 KB, application/octet-stream)
2008-07-24 14:25 UTC, Alex Howells
boot output, 2.6.26-git (x86_64) (37.74 KB, text/plain)
2008-07-25 04:31 UTC, Alex Howells

Description Alex Howells 2008-07-24 14:15:19 UTC
Operating System:  Both Debian GNU/Linux and Gentoo Linux
Latest working:    Any kernel compiled for i386 (x86) works fine,
                   yet 2.6.25/2.6.26 fail miserably when built for x86_64

More information over at -- http://bugs.gentoo.org/show_bug.cgi?id=232857

The board is an Intel D945GCLF with an Atom 230, 1 x 2GB DDR2 DIMM and 2 x 100GB SATA disks. I will attach the kernel configuration for the working 2.6.26 kernel (x86) and the configuration for our amd64 kernel, which fails to acquire a DHCP lease, plus the boot output of the box on both kernels.

Our environment is pretty simple. We network-boot a box via PXE, send the kernel via TFTP, then mount a "rescue environment" (a simple nfsroot) as the root filesystem. From there we do recovery work and operating system installations.

Happy to take any suggestions or patches and test them out.
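The report doesn't include the boot parameters. In a PXE + nfsroot setup like this one, though, the kernel usually obtains the DHCP lease itself, through its built-in IP autoconfiguration, before it can mount root over NFS. (`ip_auto_config` even appears in the trace in comment 8.) Below is a minimal Python sketch that extracts those settings from a kernel command line. It assumes the standard `root=/dev/nfs`, `nfsroot=` and `ip=` parameters from the kernel's nfsroot documentation. The function names, server address and export path are hypothetical and were not taken from this bug.

```python
from __future__ import annotations


def parse_cmdline(cmdline: str) -> dict[str, str | None]:
    """Split a kernel command line into {name: value}.

    Bare flags such as "ro" map to None. Quoted values containing spaces
    are not handled; this is a simplification for the sketch.
    """
    params: dict[str, str | None] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")  # split on the first '=' only
        params[name] = value if sep else None
    return params


def nfsroot_settings(cmdline: str) -> dict[str, str | None] | None:
    """Return the NFS-root and IP-autoconfig settings, or None if root is not NFS."""
    params = parse_cmdline(cmdline)
    if params.get("root") != "/dev/nfs":
        return None
    return {"nfsroot": params.get("nfsroot"), "ip": params.get("ip")}


if __name__ == "__main__":
    # Hypothetical command line a PXE loader might pass; not from the reporter.
    example = "root=/dev/nfs nfsroot=192.0.2.10:/srv/rescue ip=dhcp ro"
    print(nfsroot_settings(example))
    # {'nfsroot': '192.0.2.10:/srv/rescue', 'ip': 'dhcp'}
```

With `ip=dhcp`, the DHCP exchange runs inside the kernel before any userland exists. That is consistent with the r8169 problem stopping the whole boot rather than just one service.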
Comment 1 Alex Howells 2008-07-24 14:17:18 UTC
Created attachment 16967
kernel configuration, 2.6.26 (x86_64)
Comment 2 Alex Howells 2008-07-24 14:18:52 UTC
Created attachment 16968
boot output, 2.6.26 (x86_64)
Comment 3 Alex Howells 2008-07-24 14:19:21 UTC
Created attachment 16969
kernel configuration, 2.6.26 (x86)
Comment 4 Alex Howells 2008-07-24 14:19:49 UTC
Created attachment 16970
boot output, 2.6.26 (x86_64)
Comment 5 Alex Howells 2008-07-24 14:25:45 UTC
Created attachment 16971
hardware configuration

taken from `lshw` on a working 32-bit installation of Debian Etch
Comment 6 Francois Romieu 2008-07-24 23:50:35 UTC
2.6.26-git contains some 8101/8102-related changes.

Can you give its r8169 driver a try?

-- 
Ueimor
Comment 7 Alex Howells 2008-07-25 04:31:07 UTC
Created attachment 16984
boot output, 2.6.26-git (x86_64)

To confirm I've pulled the latest with:  git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6

It appears to get past the r8169 issue and *does* grab a DHCP lease, which is significant progress, but then we get a nasty trace involving SMP. Log attached.

I'm going to try taking 2.6.26-release and merging only your r8169 patches, rather than using the whole linux-2.6.git tree, to see if that works.
Comment 8 Alex Howells 2008-07-25 04:44:16 UTC
Same issue with just r8169.c merged into 2.6.26-release, but the trace is a tiny bit different; still pretty fatal though :)

[   11.172641] VFS: Mounted root (nfs filesystem) readonly.
[   11.178161] Freeing unused kernel memory: 320k freed
[   11.187383] Write protecting the kernel read-only data: 7108k
[   11.192652] BUG: unable to handle kernel paging request at ffffffff8101f239
[   11.192652] IP: [<ffffffff81019f43>] smp_call_function+0x1/0x1a
[   11.192652] PGD 1003067 PUD 1007063 PMD 7f239163 PTE 101f161
[   11.192652] Oops: 0003 [1] SMP
[   11.192652] CPU 0
[   11.192652] Modules linked in:
[   11.192652] Pid: 1, comm: swapper Not tainted 2.6.26-amd64 #1
[   11.192652] RIP: 0010:[<ffffffff81019f43>]  [<ffffffff81019f43>] smp_call_function+0x1/0x1a
[   11.192652] RSP: 0000:ffff81007f07fde0  EFLAGS: 00010246
[   11.192652] RAX: ffffffff81009000 RBX: 0000000000000000 RCX: 0000000000000001
[   11.192652] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff8101f239
[   11.192652] RBP: ffffffff81009000 R08: ffff810000000000 R09: 00003ffffffff000
[   11.192652] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[   11.192652] R13: ffffffff8101f239 R14: 0000000000000000 R15: 00000000000006f1
[   11.192652] FS:  0000000000000000(0000) GS:ffffffff8177e000(0000) knlGS:0000000000000000
[   11.192652] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[   11.192652] CR2: ffffffff8101f239 CR3: 0000000001001000 CR4: 00000000000006e0
[   11.192652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   11.192652] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   11.192652] Process swapper (pid: 1, threadinfo ffff81007f07e000, task ffff81007f07d750)
[   11.192652] Stack:  ffffffff81009000 ffffffff810345e7 0000000000000000 0000000000000000
[   11.192652]  0000000000000000 ffffffff8101fd9e ffffffff816fa000 0000000000000000
[   11.192652]  0000000000000002 0000000100000001 00000000000016f9 0000000000000246
[   11.192652] Call Trace:
[   11.192652]  [<ffffffff81009000>] ? run_init_process+0x0/0x1a
[   11.192652]  [<ffffffff810345e7>] ? on_each_cpu+0x10/0x22
[   11.192652]  [<ffffffff8101fd9e>] ? change_page_attr_set_clr+0x134/0x1ac
[   11.192652]  [<ffffffff81009000>] ? run_init_process+0x0/0x1a
[   11.192652]  [<ffffffff8101db64>] ? mark_rodata_ro+0x4a/0x6c
[   11.192652]  [<ffffffff817f0056>] ? ip_auto_config+0x0/0xde6
[   11.192652]  [<ffffffff8100902d>] ? init_post+0x13/0xf2
[   11.192652]  [<ffffffff817cd832>] ? kernel_init+0x281/0x292
[   11.192652]  [<ffffffff8100bdf8>] ? child_rip+0xa/0x12
[   11.192652]  [<ffffffff817cd5b1>] ? kernel_init+0x0/0x292
[   11.192652]  [<ffffffff8100bdee>] ? child_rip+0x0/0x12
[   11.192652]
[   11.192652]
[   11.192652] Code: 74 1d 48 8b 15 5f dd 78 00 48 63 c7 4c 8b 1d c5 3f 6e 00 48 8b 3c c2 4c 89 ca 41 58 41 ff e3 fa 48 89 d7 ff d6 fb 5a 31 c0 c3 48 <89> f8 48 89 f2 48 8b 3d e9 de 78 00 48 89 c6 4c 8b 1d 97 3f 6e
[   11.192652] RIP  [<ffffffff81019f43>] smp_call_function+0x1/0x1a
[   11.192652]  RSP <ffff81007f07fde0>
[   11.192652] CR2: ffffffff8101f239
[   11.192652] ---[ end trace b53d9db9c1b0ffdc ]---
[   11.192652] Kernel panic - not syncing: Attempted to kill init!
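Several fields in the Oops above can be decoded mechanically. The sketch below reads the page-fault error code in `Oops: 0003`, the flag bits of `PTE 101f161`, and the faulting address in `CR2`. It assumes three things: the standard x86 page-fault error-code bits, the x86_64 4 KiB page-table-entry flag layout, and a kernel-image mapping base of 0xffffffff80000000 (`__START_KERNEL_map` in kernels of this era). The helper names are ours, not the kernel's.

```python
from __future__ import annotations

# x86 page-fault error code bits: (bit, meaning if set, meaning if clear).
PF_BITS = [
    (0, "protection-violation", "page-not-present"),
    (1, "write", "read"),
    (2, "user-mode", "kernel-mode"),
]
PF_RSVD = 1 << 3   # a reserved bit was set in a paging-structure entry
PF_INSTR = 1 << 4  # the fault was an instruction fetch

# Flag bits in the low 12 bits of an x86_64 4 KiB page-table entry.
PTE_FLAGS = {0: "PRESENT", 1: "RW", 2: "USER", 3: "PWT", 4: "PCD",
             5: "ACCESSED", 6: "DIRTY", 7: "PAT", 8: "GLOBAL"}
PTE_NX = 1 << 63                # no-execute bit
PHYS_MASK = 0x000FFFFFFFFFF000  # bits 12..51 hold the physical frame address

START_KERNEL_MAP = 0xFFFFFFFF80000000  # assumed x86_64 kernel-image mapping base


def decode_pf_error(code: int) -> list[str]:
    """Translate an 'Oops: NNNN' page-fault error code into words."""
    words = [if_set if code & (1 << bit) else if_clear
             for bit, if_set, if_clear in PF_BITS]
    if code & PF_RSVD:
        words.append("reserved-bit")
    if code & PF_INSTR:
        words.append("instruction-fetch")
    return words


def decode_pte(pte: int) -> tuple[int, set[str]]:
    """Split a PTE into (physical frame address, set of flag names)."""
    flags = {name for bit, name in PTE_FLAGS.items() if pte & (1 << bit)}
    if pte & PTE_NX:
        flags.add("NX")
    return pte & PHYS_MASK, flags


def kernel_image_phys(vaddr: int) -> int:
    """Physical address behind a kernel-image virtual address."""
    return vaddr - START_KERNEL_MAP


if __name__ == "__main__":
    # Values copied from the trace in comment 8.
    cr2 = 0xFFFFFFFF8101F239
    print("Oops 0003:", decode_pf_error(0x0003))
    frame, flags = decode_pte(0x101F161)
    print(f"PTE frame {frame:#x}, flags {sorted(flags)}")
    print(f"CR2 page {kernel_image_phys(cr2) & ~0xFFF:#x}")
    print("write hit a read-only page:", "RW" not in flags)
```

Read this way, `0003` is a kernel-mode write that hit a present page. The PTE has PRESENT, ACCESSED, DIRTY and GLOBAL set but not RW, and its frame (0x101f000) matches the page behind CR2. So a write landed on a read-only page of the kernel image just after "Write protecting the kernel read-only data". The sketch only interprets the log; it does not explain why that write happened.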
Comment 9 Alex Howells 2008-07-28 06:03:09 UTC
Any tips on debug options I could enable, or extra statements to insert into the code to try and debug this?
Comment 10 Francois Romieu 2008-08-11 13:25:39 UTC
It could be interesting to check this thread:
http://thread.gmane.org/gmane.linux.kernel/718708

Otherwise the boot log shows that your 810x chipset is not completely
identified. I'd suggest applying #0001 to #0006 from
http://userweb.kernel.org/~romieu/r8169/2.6.27-rc2/20080808 on top
of 2.6.27-rc2.

-- 
Ueimor
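Comment 10 says the 810x chipset is "not completely identified". As we understand it, r8169 selects chip-specific code paths by matching a hardware version register against a table of (mask, value) entries, where the first match wins. A newer revision with no entry of its own can therefore fall through to a broader match. Below is a minimal Python sketch of that first-match idea. Every mask, value and chip name in it is a hypothetical placeholder, not the driver's real table.

```python
from __future__ import annotations

from typing import NamedTuple


class ChipEntry(NamedTuple):
    mask: int   # which register bits identify this entry
    value: int  # expected value of those bits
    name: str


# HYPOTHETICAL table: placeholder masks/values/names, most specific first.
CHIP_TABLE = [
    ChipEntry(0xFFF00000, 0x11100000, "chip-A rev 1"),
    ChipEntry(0xFFF00000, 0x11200000, "chip-A rev 2"),
    ChipEntry(0xFF000000, 0x11000000, "chip-A (unknown revision)"),
    ChipEntry(0x00000000, 0x00000000, "unknown chip (catch-all)"),
]


def identify(reg: int) -> str:
    """Return the first table entry whose masked bits match reg."""
    for entry in CHIP_TABLE:
        if (reg & entry.mask) == entry.value:
            return entry.name
    # Unreachable while the catch-all entry (mask 0, value 0) is present.
    raise LookupError(f"no entry for {reg:#010x}")


if __name__ == "__main__":
    for reg in (0x11123456, 0x11300000, 0x22000000):
        print(f"{reg:#010x} -> {identify(reg)}")
```

Entry order matters here: a broad entry placed before the specific ones would shadow them. Adding a precise entry for a new revision is the kind of change that stops a driver from treating the chip as a generic member of its family.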
Comment 11 Calorì Alessandro 2008-08-12 01:07:24 UTC
Realtek released a driver for NICs based on RTL8100E/RTL8101E/RTL8102E-GR chips. The driver is available at http://www.realtek.com.tw/Downloads/downloadsView.aspx?Langid=1&PNid=14&PFid=7&Level=5&Conn=4&DownTypeID=3&GetDown=false. Driver v.1.009 compiles cleanly against vanilla kernel 2.6.26 and works well on my Gentoo box. I hope it helps.
Comment 12 Alex Howells 2008-08-12 02:40:17 UTC
Hi Calori :)  The problem is fairly specific: the system fails to boot on x86_64 *only* (it works great on x86) because it cannot acquire a DHCP lease. This happens after the kernel has been sent over via TFTP but before userland has started, so the NFS root can never be mounted and the network boot cannot proceed!

Francois -- thanks for the new patches, I'll give those a shot today against 2.6.27-rc2 and report back if they resolved the issue. Should be in 3-4 hours.
Comment 13 Alex Howells 2008-09-02 06:19:13 UTC
2.6.27-rc4, built after a 'make mrproper && make defconfig' followed by a fresh configuration of the kernel, seems to have solved the issue :)  Woot.

Thanks for your time Francois.
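Comment 13 reports that a clean rebuild with a fresh configuration worked, but not which option changed. The kernel version changed too, so a config comparison alone cannot settle the question, though it can narrow it down. Below is a minimal Python sketch that compares two `.config` files. It assumes only the usual `CONFIG_FOO=value` and `# CONFIG_FOO is not set` line forms. The example fragments are made up and are not the reporter's configurations.

```python
from __future__ import annotations

import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")           # CONFIG_FOO=value
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")  # explicitly disabled


def parse_config(text: str) -> dict[str, str]:
    """Map each CONFIG_ symbol to its value; explicitly unset symbols map to 'n'."""
    opts: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SET_RE.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(old: str, new: str) -> dict[str, tuple[str | None, str | None]]:
    """Return {symbol: (old_value, new_value)} for every symbol that differs.

    None means the symbol does not appear in that file at all.
    """
    a, b = parse_config(old), parse_config(new)
    return {sym: (a.get(sym), b.get(sym))
            for sym in sorted(a.keys() | b.keys())
            if a.get(sym) != b.get(sym)}


if __name__ == "__main__":
    # Made-up fragments for illustration only.
    old = "CONFIG_R8169=m\n# CONFIG_IP_PNP_DHCP is not set\nCONFIG_SMP=y\n"
    new = "CONFIG_R8169=y\nCONFIG_IP_PNP_DHCP=y\nCONFIG_SMP=y\n"
    for sym, (before, after) in diff_configs(old, new).items():
        print(f"{sym}: {before} -> {after}")
```

Keeping "absent" (None) separate from "not set" ('n') matters when comparing configs from different kernel versions, because new or renamed symbols show up as absent on one side.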
Comment 14 Francois Romieu 2008-09-02 13:27:56 UTC
I will wait a few days before closing the bug in order to be sure
that it is reliably fixed.

-- 
Ueimor
Comment 15 Alex Howells 2008-09-02 14:00:36 UTC
Just did a build of 2.6.27-rc5: works fine!

http://kerneltrap.org/mailarchive/linux-kernel/2008/8/30/3139664/thread

That looks "interesting" but I'm not sure if it's related.
