Bug 67431

Summary: rtl8111d: network transfers fail/corrupted
Product: Drivers
Reporter: infove (info)
Component: Network
Assignee: drivers_network (drivers_network)
Status: NEW
Severity: blocking
CC: alan, romieu, szg00000
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.9 thru 3.11
Subsystem:
Regression: Yes
Bisected commit-id:

Description infove 2013-12-20 15:47:45 UTC
All these issues cropped up after upgrading from Fedora 14 64-bit to Fedora 19 64-bit on a box with a Gigabyte GA-890XA-UD3 motherboard, an onboard Realtek 8111D NIC, and a Phenom II 955 CPU.

1. CIFS shares cannot close files. While copying files to Windows machines, zero-size files get created; the process then writes some number of bytes to the file and freezes, so the machine has to be rebooted. I can reproduce this 100%. KIO seems a bit more stable, in the sense that while writing to a network folder in KDE the process still freezes but the operation can be cancelled. I am not 100% sure it can be cancelled every time: I have observed both behaviours with KDE, a freeze to death and the cancel button stopping the copy without a freeze.

At the same time, the same files can be pulled from a Windows machine using a Samba share on the F19 machine.

2. CUPS cannot print to network printers. The JetDirect HP LJ 4000N that worked fine under F14 now prints one empty sheet and a second sheet with an error message, while the LCD displays 'BAD TRANSMISSION' and the error LED flashes. I can reproduce this 100%.

3. Thunderbird versions 17 thru 26 cannot send HTML emails, or plain text emails longer than probably about 1 kB. New plain text emails go out, but replies to plain text emails seem to be affected too. I can reproduce this 100%.
Comment 1 infove 2013-12-20 15:50:45 UTC
A Fedora bug was filed for this as well, but this is apparently more than just a CIFS issue:

https://bugzilla.redhat.com/show_bug.cgi?id=1042778
Comment 2 infove 2013-12-20 15:52:51 UTC
Just to clarify: F19 was installed onto its own physical HDD, not on top of F14. When booting the machine into F14, everything mentioned above works fine.
Comment 3 infove 2013-12-21 00:56:41 UTC
Copy of the error message printed on the 2nd page of printouts by the LJ 4000N:

ERROR:
invalidaccess
OFFENDING COMMAND:
filter
STACK
/SubFileDecode
endstream
0
--nostringval--
--nostringval--
9
false
Comment 4 Alan 2013-12-21 01:17:13 UTC
There are lots of differences to consider here, but firstly: are the printer, SMB box and the PC on the same network segment with just a bridge, or is there stuff between them?

If there is, and you plug the two devices directly back to back, does it then just work?
Comment 5 infove 2013-12-21 01:21:40 UTC
The only difference I can see is whether the machine is booted into Fedora 14 with the 2.6.35-14 kernel, where everything works fine, or into Fedora 19 with the 3.11.10-200 kernel, where the above items are not working.

The machines are on a 100 Mbit switch at present (a 1 Gbit switch was tested with the same results).

The printer is connected to the same switch.

SMTP was tested against one of the machines connected to the same switch, as well as against the provider's SMTP over DSL, with the same results.
Comment 6 infove 2013-12-21 01:40:05 UTC
Just copied a file over to Win7 machine mounted with mount.cifs. The copy process is not in 'disk sleep' and cannot be killed. Ultimately I will have to kill reset switch as the system will not shutdown or reboot either.

Getting this in dmesg:

[ 3697.500814] CIFS VFS: Server 10.10.10.2 has not responded in 120 seconds. Reconnecting...

When trying to access the same Win7 box via KDE dolphin now, with a hung copy, I am getting a timeout, while normally it's instant.
Comment 7 infove 2013-12-21 15:08:08 UTC
Typos correction:

Just copied a file over to a Win7 machine mounted with mount.cifs. The copy process is now in 'disk sleep' and cannot be killed. Ultimately I will have to hit the reset switch, as the system will not shut down or reboot either.

Getting this in dmesg:

[ 3697.500814] CIFS VFS: Server 10.10.10.2 has not responded in 120 seconds. Reconnecting...

When trying to access the same Win7 box via KDE dolphin now, with a hung copy, I am getting a timeout, while normally it's instant.
Comment 8 Alan 2013-12-21 16:56:58 UTC
This actually sounds like some kind of low-level/networking problem rather than CIFS (CIFS is just showing up as a symptom).
Comment 9 infove 2013-12-23 15:46:05 UTC
Then I guess you can compare the network code from kernel 2.6 with that in 3.x and get to the bottom of it.
Comment 10 Alan 2013-12-23 17:04:57 UTC
Way too many differences for that. You've got a chance of being able to do that if you can find which actual kernel version was the first non-working and which is the first working, but otherwise no.
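
Narrowing down the kernel version as described above amounts to a binary search over an ordered list of kernels. A minimal sketch, assuming the breakage appeared once and stayed (every kernel after the first bad one is also bad). The function names and the intermediate versions in the list are hypothetical; only 2.6.35 (working) and 3.9 thru 3.11 (not working) come from this report, and the demo cut-off is invented purely to make the example run.

```python
from typing import Callable, Sequence, Tuple


def find_first_bad(versions: Sequence[str],
                   is_bad: Callable[[str], bool]) -> Tuple[str, str]:
    """Return (last_good, first_bad).

    Assumes versions[0] is known good, versions[-1] is known bad,
    and that once a version is bad all later ones are bad too.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return versions[lo], versions[hi]


# Hypothetical candidate list, oldest first.
candidates = ["2.6.35", "2.6.38", "3.1", "3.3", "3.6", "3.9", "3.11"]


def demo_is_bad(version: str) -> bool:
    # Stand-in for "boot this kernel and try the CIFS copy".
    # The cut-off index is invented for the demonstration.
    return candidates.index(version) >= 4


last_good, first_bad = find_first_bad(candidates, demo_is_bad)
print(last_good, first_bad)
```

Each call to `is_bad` stands for one boot-and-test cycle, so a list of N kernels needs roughly log2(N) tests rather than N.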
Comment 11 infove 2013-12-23 20:53:14 UTC
I'll put your answer on the wall of my cubicle and leave it at that.
Comment 12 Francois Romieu 2014-01-01 23:20:45 UTC
(In reply to infove from comment #11)
> I'll put your answer on the wall of my cubicle and leave it at that.

It's a bit sad, because coercing rpm to install Fedora's own kernels between
F14 and F19 on your F14 disk could help.

(I only speak for the r8169 part - you may have both r8169 and CIFS problems)

A dmesg or its XID line to identify your 8168d chipset will be welcome too.
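
A small sketch of pulling the XID line out of saved dmesg output. It assumes the r8169 probe message contains the driver name followed by "XID" and a hex value; the sample r8169 line below is illustrative only, with a made-up PCI address, MAC and an all-zero placeholder XID, not output from the reporter's machine. The helper names are hypothetical.

```python
import re
from typing import List

# Assumed shape of the probe message: "r8169 ... XID <hex> ..."
XID_RE = re.compile(r"r8169.*\bXID\s+([0-9a-fA-F]+)")


def find_xid_lines(dmesg_text: str) -> List[str]:
    """Return the dmesg lines that look like r8169 XID lines."""
    return [line for line in dmesg_text.splitlines() if XID_RE.search(line)]


def extract_xids(dmesg_text: str) -> List[str]:
    """Return just the hex XID values found in the text."""
    matches = (XID_RE.search(line) for line in dmesg_text.splitlines())
    return [m.group(1) for m in matches if m]


# Illustrative input; the first line is a placeholder, not real output.
sample = """\
[    9.000000] r8169 0000:03:00.0 eth0: RTL8168d/8111d at 0xffffc90000000000, 00:11:22:33:44:55, XID 00000000 IRQ 40
[ 3697.500814] CIFS VFS: Server 10.10.10.2 has not responded in 120 seconds. Reconnecting...
"""

for line in find_xid_lines(sample):
    print(line)
print(extract_xids(sample))
```

On a real system the text would come from the output of `dmesg` saved to a file and read in place of `sample`.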

-- 
Ueimor
Comment 13 infove 2014-01-09 00:42:42 UTC
No, this is CIFS only. Everything else works fine.