Bug 7378 - r8169 driver compiled statically into linux 2.6.19-rc1 hangs when booting via PXE on PCI ID 8136 at 10/100. It works at 10/1000 without any problem.
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: i386 Linux
Importance: P2 high
Assignee: Francois Romieu
Reported: 2006-10-17 10:41 UTC by syed azam
Modified: 2006-10-20 16:04 UTC
Kernel Version: 2.6.19-rc1


Attachments
r1000 driver provided by realtek (17.70 KB, application/x-gzip-compressed)
2006-10-18 10:46 UTC, syed azam
Modify the cache line size PCI register (490 bytes, patch)
2006-10-18 15:14 UTC, Francois Romieu
There are 3 pictures, see them in order to get the boot sequence and failure (398.86 KB, application/x-gzip-compressed)
2006-10-19 08:43 UTC, syed azam
Disable the pause capability (461 bytes, patch)
2006-10-19 15:14 UTC, Francois Romieu
Reset the PHY at boot time (1.02 KB, patch)
2006-10-20 13:32 UTC, Francois Romieu

Description syed azam 2006-10-17 10:41:20 UTC
Most recent kernel where this bug did not occur:
Distribution:
Hardware Environment: RTL8111B/RTL8168B
Software Environment: Boot linux PXE
Problem Description: The boot hangs and times out because the NIC (eth0) goes down when a 10/100 link is used.

Steps to reproduce: Boot via PXE on a 10/100 link. On a 10/1000 link it works without any problem.
r8169.c has PCI ID 10ec:8136.

Realtek has provided an r1000 alpha5 driver, but it is not an in-kernel driver.
Comment 1 Francois Romieu 2006-10-17 11:55:47 UTC
The description is a bit terse. Let's rephrase:
- the linux kernel is loaded via PXE
- when connected to a gigabit switch, the boot proceeds without error
- when connected to a different, Fast Ethernet (100 Mbit/s) switch, the network adapter does not
negotiate the link correctly and the boot process takes ages

Questions:
- does the 8168 chipset sit on a removable PCI card or is it a LOM?
  Brand names would be welcome, both for the adapter/motherboard and for the switches
- is the kernel/the module given any extra parameter during boot?
- are you able to log in if you wait long enough?
- if so, does it make a difference if you kick the adapter with mii-tool?

Extra data:
- please send 'lspci -vvx'
Comment 2 syed azam 2006-10-17 15:15:01 UTC
It's not a removable PCI card. The system does not boot at all. The NIC LED shuts
off and the message on screen says RPC error. The NIC vendor is Realtek and its
PCI ID is 10ec:8136.

The PXE boot parameters used are as follows:

DEFAULT bzImage nfsroot=ia32 apm=off reboot=c,b console=tty3
IPAPPEND 1



Comment 3 syed azam 2006-10-18 10:46:48 UTC
Created attachment 9298
r1000 driver provided by realtek

Please adopt the new parameters into the r8169 driver or create another entry in the
drivers list for this driver. The chip is the RTL8101E (PCI Express).
Comment 4 Francois Romieu 2006-10-18 15:14:57 UTC
Created attachment 9302
Modify the cache line size PCI register
Comment 5 Francois Romieu 2006-10-18 15:20:31 UTC
Realtek's driver shows no significant difference when compared to the previous
version I have (read: nothing related to link management).

I'll work something out: the in-kernel driver does not take MII_LPA into account.
It seems rather stupid.

-- 
Ueimor
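
For readers unfamiliar with MII_LPA: it is the link partner ability register (MII register 5), which reports what the other end of the link advertised during autonegotiation. The following is only a minimal userspace sketch of how a 10/100 mode can be resolved from it, not the r8169 code. The bit values match the kernel's <linux/mii.h>; the struct and function names are made up for this illustration, and 100BASE-T4 is ignored.

```c
#include <stdint.h>

/* Bit values as in the kernel's <linux/mii.h>. The ADVERTISE_* (register 4)
 * and LPA_* (register 5) flags share the same bit positions, so one set of
 * constants is enough for this sketch. */
#define ADVERTISE_10HALF   0x0020
#define ADVERTISE_10FULL   0x0040
#define ADVERTISE_100HALF  0x0080
#define ADVERTISE_100FULL  0x0100

/* Hypothetical result type for this sketch; not a kernel structure. */
struct link_mode {
    int speed;        /* Mbit/s, 0 if there is no common mode */
    int full_duplex;  /* 1 = full duplex, 0 = half duplex */
};

/* Pick the best 10/100 mode that both ends advertise.
 * advertise: value of MII_ADVERTISE on the local PHY
 * lpa:       value of MII_LPA, i.e. what the link partner advertised */
struct link_mode resolve_10_100(uint16_t advertise, uint16_t lpa)
{
    uint16_t common = advertise & lpa;
    struct link_mode m = { 0, 0 };

    if (common & ADVERTISE_100FULL) {
        m.speed = 100;
        m.full_duplex = 1;
    } else if (common & ADVERTISE_100HALF) {
        m.speed = 100;
    } else if (common & ADVERTISE_10FULL) {
        m.speed = 10;
        m.full_duplex = 1;
    } else if (common & ADVERTISE_10HALF) {
        m.speed = 10;
    }
    return m;
}
```

A driver that ignores the partner's abilities can end up assuming a mode the switch never offered, which would be consistent with the failure seen only on the 10/100 switch.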
Comment 6 syed azam 2006-10-19 08:43:55 UTC
Created attachment 9305
There are 3 pictures, see them in order to get the boot sequence and failure
Comment 7 syed azam 2006-10-19 08:45:17 UTC
See the attached pictures. During PXE boot it gets an IP address, but then the link
shuts down, so no further booting occurs. The LEDs on the NIC also turn off at this point.
According to Realtek, the r1000 driver works as a module (r1000.ko) in Fedora 5.
Comment 8 syed azam 2006-10-19 08:47:56 UTC
The pictures were taken after I applied the patch "Modify the cache line size
PCI register".

Comment 9 syed azam 2006-10-19 11:53:14 UTC
Good news,

I have successfully booted on both 10/100 and 10/1000 in a test environment with
the following steps.

Steps to Success:
1) I took the r1000_v1.05Beta3-1 driver (as attached) and extracted only the
r1000_n.c and r1000_ioctl.c files.

2) I modified /usr/src/linux-2.6.19-rc2-git3/drivers/net/Makefile in my
kernel tree to compile these 2 C files instead of r8169.c. I assume that the same
could be done with rc1.

3) This compiled cleanly except for one warning, as follows:

drivers/net/r1000_n.c: In function 'r1000_open':
drivers/net/r1000_n.c:757: warning: passing argument 2 of 'request_irq' from incompatible pointer type

4) Other than that one warning, the driver seems to load and boot using PXE,
and I can see the network.

Can we just start using the Realtek implementation, or separate r8169 for
non-Realtek IDs only, making them mutually exclusive?
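
A note on the warning in step 3: starting with 2.6.19, interrupt handlers no longer receive a struct pt_regs * argument, so a handler written for older kernels no longer matches the type request_irq() expects. Assuming that this is what r1000_n.c runs into (the thread does not show the handler), the mismatch and one possible fix can be sketched in userspace as below. All types here are stand-ins so the example builds outside the kernel, and the function names are hypothetical.

```c
#include <stddef.h>

/* Stand-ins for kernel definitions so the sketch builds in userspace. */
typedef int irqreturn_t;
#define IRQ_HANDLED 1
struct pt_regs;

/* Handler type expected from 2.6.19 on: two arguments only. */
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

/* Pre-2.6.19 style handler with the extra pt_regs argument. Passing this
 * directly where irq_handler_t is expected yields the "incompatible
 * pointer type" warning. */
irqreturn_t old_style_isr(int irq, void *dev_id, struct pt_regs *regs)
{
    (void)irq;
    (void)dev_id;
    (void)regs;
    return IRQ_HANDLED;
}

/* Handler with the new prototype. Here it simply forwards to the old one
 * with a NULL regs pointer; a real port would drop the argument entirely. */
irqreturn_t new_style_isr(int irq, void *dev_id)
{
    return old_style_isr(irq, dev_id, NULL);
}

/* Hypothetical helper standing in for the kernel invoking a registered
 * handler; it only accepts the two-argument type. */
irqreturn_t demo_fire_irq(irq_handler_t handler, int irq, void *dev_id)
{
    return handler(irq, dev_id);
}
```

Calling demo_fire_irq(new_style_isr, 11, NULL) compiles without a warning, whereas passing old_style_isr would reproduce the diagnostic quoted in step 3.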
Comment 10 Francois Romieu 2006-10-19 12:20:09 UTC
> can we just start using Realtek implementation

In your own kernel? Sure. In mainline? I hope not.

> or separate the r8169 for non-Realtek IDs only, making them mutually exclusive.

See answer above and bug #7243.

-- 
Ueimor
Comment 11 Francois Romieu 2006-10-19 13:36:31 UTC
Can you send the output of 'mii-tool -vv ethX' for your device with the r1000 driver?

It would help to compare with different reports. Thanks in advance.

-- 
Ueimor
Comment 12 syed azam 2006-10-19 13:51:02 UTC
When mii-tool -vv eth0 is executed, it prints the following message:

SIOCGMIIPHY on 'eth0' failed: Operation not supported
Comment 13 Francois Romieu 2006-10-19 14:10:49 UTC
bugme-daemon@bugzilla.kernel.org <bugme-daemon@bugzilla.kernel.org> :
[...]
> When mii-tool -vv eth0 is executed it says the following msg
> 
> SIOCGMIIPHY on 'eth0' failed: Operation not supported


Sh*t, I forgot that Realtek's r1000 does not support the MII ioctl
(nor a lot of ethtool commands). Never mind.

Comment 14 Francois Romieu 2006-10-19 15:10:54 UTC
Can you try the new attached patch ?
Comment 15 Francois Romieu 2006-10-19 15:14:13 UTC
Created attachment 9309
Disable the pause capability
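
The patch itself is not reproduced in this thread. As a rough illustration of what disabling the pause capability generally means at the MII level, the sketch below clears the two flow-control bits from an autonegotiation advertisement value. The bit values match <linux/mii.h>; the function name is made up for this example, and the real patch may work differently.

```c
#include <stdint.h>

/* Flow-control bits of MII_ADVERTISE (register 4), as in <linux/mii.h>. */
#define ADVERTISE_PAUSE_CAP   0x0400  /* symmetric pause */
#define ADVERTISE_PAUSE_ASYM  0x0800  /* asymmetric pause */

/* Return the advertisement value with both pause bits cleared, so the PHY
 * no longer offers flow control to its link partner. Speed and duplex
 * bits are left untouched. */
uint16_t advertise_without_pause(uint16_t advertise)
{
    return advertise & (uint16_t)~(ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM);
}
```

For example, an advertisement of 0x0de1 becomes 0x01e1. The new value only takes effect once autonegotiation is restarted.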
Comment 16 syed azam 2006-10-19 15:44:49 UTC
No improvement with the last patch, "Disable the pause capability". The system
still locks up because the NIC shuts itself down. There is no LED activity after the
messages say "eth0: link down", and we fail with an RPC error.
Comment 17 Francois Romieu 2006-10-20 13:32:49 UTC
Created attachment 9312
Reset the PHY at boot time
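
The contents of this patch are not shown here either. The generic way to reset an MII PHY is to set the reset bit in the basic mode control register (MII_BMCR, bit 15) and poll until the PHY clears it again. The sketch below runs that sequence against a simulated PHY. MII_BMCR and BMCR_RESET match <linux/mii.h>; the fake_phy structure and all function names are hypothetical and exist only so the example is self-contained.

```c
#include <stdint.h>

#define MII_BMCR    0x00    /* basic mode control register */
#define BMCR_RESET  0x8000  /* self-clearing PHY reset bit */

/* Hypothetical simulated PHY: the reset bit stays set for a fixed number
 * of register reads, then clears itself like real hardware would. */
struct fake_phy {
    uint16_t bmcr;
    int reads_until_ready;
};

uint16_t fake_mdio_read(struct fake_phy *phy, int reg)
{
    if (reg != MII_BMCR)
        return 0;
    if (phy->bmcr & BMCR_RESET) {
        if (phy->reads_until_ready > 0)
            phy->reads_until_ready--;
        else
            phy->bmcr &= (uint16_t)~BMCR_RESET;
    }
    return phy->bmcr;
}

void fake_mdio_write(struct fake_phy *phy, int reg, uint16_t val)
{
    if (reg == MII_BMCR)
        phy->bmcr = val;
}

/* Request a reset and wait for it to complete.
 * Returns 0 on success, -1 if the bit is still set after max_polls reads. */
int phy_reset(struct fake_phy *phy, int max_polls)
{
    uint16_t bmcr = fake_mdio_read(phy, MII_BMCR);
    int i;

    fake_mdio_write(phy, MII_BMCR, (uint16_t)(bmcr | BMCR_RESET));
    for (i = 0; i < max_polls; i++) {
        if (!(fake_mdio_read(phy, MII_BMCR) & BMCR_RESET))
            return 0;
        /* A real driver would sleep briefly between polls. */
    }
    return -1;
}
```

Reporting the timeout case to the console is one plausible source of an extra driver message at boot, though whether the patch does this is an assumption.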
Comment 18 Francois Romieu 2006-10-20 13:35:31 UTC
Can you add the patch above on top of yesterday's one?

You should notice an extra info or error message related to the eth device in
the console.

-- 
Ueimor
Comment 19 syed azam 2006-10-20 13:45:06 UTC
hurray,

It worked: I can boot using 2.6.19-rc2+3rdpatch. The system boots to NFS and I can
see the network.

Thanks

If you need any logs let me know.
Comment 20 Francois Romieu 2006-10-20 15:38:44 UTC
syed.azam@hp.com :
[...]
> If you need any logs let me know.

lspci -vvx of the whole system + dmesg from boot + a detailed dump of the
registers through mii-tool would be welcome to keep a trace.

--
Ueimor

