Bug 10694

Summary: Sis 191 not responding when mount.cifs
Product: Drivers
Reporter: Juan Jose Pablos (juanjo)
Component: Network
Assignee: Francois Romieu (romieu)
Status: RESOLVED OBSOLETE
Severity: normal
CC: alan, bunk, devzero, pterjan, romieu
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.32
Subsystem:
Regression: No
Bisected commit-id:
Attachments: EtherReal (Tshark) output with mtu 1500
sis190 noisy debug helper
sis190 noisy debug helper (against 2.6.25.3)
Dmesg output using default mtu until fail the mount command
Dmesg output using mtu 1492
More specific debug helper (against 2.6.25.3, on top of previous one)
Debug for the extra byte
Debug helper for the extra byte (against 2.6.25.3, on top of previous one)
Second debug for the extra byte
Second debug for the extra byte
Third debug for the extra byte

Description Juan Jose Pablos 2008-05-14 03:55:57 UTC
Latest working kernel version: none
Earliest failing kernel version: 2.6.24
Distribution: Other
Hardware Environment: Acer Extensa E261 (SIS191 on ISA bridge SIS968 with RTL8211BL transceiver)
Software Environment: linux boot to allow an unattended installation.
http://unattended.cvs.sourceforge.net/unattended/unattended/linuxboot/
Problem Description: mounting a network share with mount.cifs fails with an error about the server not responding. Changing the MTU of the ethernet interface makes it work.

Steps to reproduce:

mount.cifs \\ntinstall\install /z -o "username=guest,ro,nocase"

An error appears on the screen:
--------------------------------------
CIFS VFS: server not responding 
CIFS VFS: No response to cmd 46 mid 10
--------------------------------------
Then execute:
ifconfig eth0 mtu 1492

After that the network share is accessible and the error disappears.
Comment 1 Anonymous Emailer 2008-05-14 10:26:38 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 14 May 2008 03:55:57 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10694
> 
>            Summary: Sis 191 not responding when mount.cifs
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.25.3
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: jgarzik@pobox.com
>         ReportedBy: juanjo@apertus.es
> 
> 

How strange.  Good detective work, btw.

Did we a) change the MTU size with 2.6.25 or b) break larger MTUs in 2.6.25?

Can you find out what the MTU size was with 2.6.24?  And what size is
the MTU in 2.6.25 before you reset it?

Thanks.
Comment 2 Roland Kletzing 2008-05-14 11:06:21 UTC
what type of server is that?
(windows version/service pack?)

i had a very funny experience these days - some windows 2003 sp2 servers were working well if it was windows<->windows, but the cifs connection stalled completely. i found it was due to the sp2 feature "Scalable Networking Pack" and after disabling it, all was ok.

anyway, ntinstall doesn't sound like 2003 being used ;)
Comment 3 Juan Jose Pablos 2008-05-14 12:59:51 UTC
bugme-daemon@bugzilla.kernel.org wrote:
Comment 4 Juan Jose Pablos 2008-05-14 13:03:05 UTC
> ------- Comment #2 from devzero@web.de  2008-05-14 11:06 -------
> what type of server is that?
> (windows version/service pack?)
> 

It is a Debian 4.0 (etch) system and it has Samba 3.0.24-6etch9.

> i had a very funny experience these days - some windows 2003 sp2 servers were
> working well if it was windows<->windows, but the cifs connection stalled
> completely. i found it was due to the sp2 feature "Scalable Networking Pack"
> and after disabling it, all was ok.
> 
> anyway, ntinstall doesn't sound like 2003 being used ;)
> 
It is a system for building Windows systems.
Comment 5 Francois Romieu 2008-05-14 14:50:54 UTC
Anonymous Emailer:
[...]
> Did we a) change the MTU size with 2.6.25 or b) break larger MTUs in 2.6.25 ?

I would expect either to cause large scale bug reports.

Juan, can you send a raw ethereal capture file from the server when the
MTU is 1500 ? It could help.

-- 
Ueimor
Comment 6 Juan Jose Pablos 2008-05-15 01:33:48 UTC
Created attachment 16154 [details]
EtherReal (Tshark) output with mtu 1500

The system boots up using DHCP. This is the output from the following command on the server:
  tshark -i eth0 host 192.168.1.21 -w sis191-mtu1500
Comment 7 Francois Romieu 2008-05-15 13:46:07 UTC
Thanks Juan.

Can you send a capture file including the same sequence with a
working MTU ? It does not need to be too precise.

Reading your report I am not completely sure if you changed the MTU
of the client interface or the MTU of the server interface. Can you
enlighten me ?

-- 
Ueimor
Comment 8 Juan Jose Pablos 2008-05-15 15:26:47 UTC
The MTU change was on the client, after the interface was configured using
DHCP. I will forward a sequence in which the interface goes up, gives the
error, and then the MTU is changed to 1492. My problem now is that this
sequence is too big for the report (17M).

Is there a way I can search for the important bits?
Comment 9 Francois Romieu 2008-05-16 14:52:47 UTC
juanjo@apertus.es  2008-05-15 15:26 :
> The MTU change was on the client, after the interface was configured using
> DHCP. I will forward a sequence in which the interface goes up, gives the
> error, and then the MTU is changed to 1492. My problem now is that this
> sequence is too big for the report (17M).
> 
> Is there a way I can search for the important bits?

I have looked at the dump available at :
http://www.apertus.es/sis191-mtu1500-mtu1492.dat

Assuming an MTU of 1500, can the server reach the client with a
'ping -c 1 -s 1472 192.168.1.21' ?
Comment 10 Juan Jose Pablos 2008-05-16 15:49:54 UTC
> Assuming an MTU of 1500, can the server reach the client with a
> 'ping -c 1 -s 1472 192.168.1.21' ?
> 
> 
No:

cheche@ntinstall:~$ ping -c 1 -s 1472 192.168.1.21
PING 192.168.1.21 (192.168.1.21) 1472(1500) bytes of data.

--- 192.168.1.21 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

Sizes 1471 down to 1469 fail as well, but 1468 works:

cheche@ntinstall:~$ ping -c 1 -s 1468 192.168.1.21
PING 192.168.1.21 (192.168.1.21) 1468(1496) bytes of data.
1476 bytes from 192.168.1.21: icmp_seq=1 ttl=64 time=0.640 ms

--- 192.168.1.21 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.640/0.640/0.640/0.000 ms
Comment 11 Francois Romieu 2008-05-20 14:08:47 UTC
I would have expected a 1464-byte ping to be correctly
processed, as it matches the 1492 MTU, but 1468 is a bit surprising:
it would argue for a 1496-byte MTU. Can you apply the debug patch
to the client and send the log for a 1500-byte MTU and for
approximately the same sequence of packets with a 1492-byte MTU?
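
The size arithmetic behind this reasoning can be sketched as follows. This is only an illustration, assuming plain IPv4 with a 20-byte header (no IP options) and the 8-byte ICMP echo header; the function names are hypothetical and not part of any tool used in this report.

```python
# Header sizes assumed for plain IPv4 ICMP echo (no IP options).
IPV4_HEADER = 20
ICMP_HEADER = 8


def ip_packet_size(ping_payload: int) -> int:
    """Size of the IP packet produced by 'ping -s <ping_payload>'."""
    return ping_payload + ICMP_HEADER + IPV4_HEADER


def max_ping_payload(mtu: int) -> int:
    """Largest 'ping -s' value that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    for payload in (1472, 1468, 1464):
        print(f"ping -s {payload}: {ip_packet_size(payload)} byte IP packet")
    for mtu in (1500, 1496, 1492):
        print(f"MTU {mtu}: largest ping payload {max_ping_payload(mtu)}")
```

A payload of 1468 gives a 1496-byte IP packet, which is why a working 1468-byte ping points at an effective limit of 1496 rather than 1492.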

The plain output of 'ifconfig' for the client will be welcome to
figure the Rx/Tx client stats.

Out of curiosity: is the server an Intel e1000 ?

-- 
Ueimor
Comment 12 Francois Romieu 2008-05-20 14:09:25 UTC
Created attachment 16220 [details]
sis190 noisy debug helper
Comment 13 Juan Jose Pablos 2008-05-20 15:52:17 UTC
bugme-daemon@bugzilla.kernel.org wrote:
Comment 14 Juan Jose Pablos 2008-05-20 15:55:18 UTC
bugme-daemon@bugzilla.kernel.org wrote:
Comment 15 Francois Romieu 2008-05-21 12:50:38 UTC
Created attachment 16232 [details]
sis190 noisy debug helper (against 2.6.25.3)
Comment 16 Juan Jose Pablos 2008-05-22 02:01:31 UTC
Here is the ifconfig output:
eth0      Link encap:Ethernet  HWaddr 00:1C:25:2C:48:45
          inet addr:192.168.1.21  Bcast:192.168.1.254  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:29 errors:23 dropped:0 overruns:0 frame:23
          TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4662 (4.5 KiB)  TX bytes:5181 (5.0 KiB)
          Interrupt:19 Base address:0xdead
later:

eth0      Link encap:Ethernet  HWaddr 00:1C:25:2C:48:45
          inet addr:192.168.1.21  Bcast:192.168.1.254  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1492  Metric:1
          RX packets:1551 errors:37 dropped:0 overruns:0 frame:37
          TX packets:1069 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1778371 (1.6 MiB)  TX bytes:131192 (128.1 KiB)
          Interrupt:19 Base address:0xdead
Comment 17 Juan Jose Pablos 2008-05-22 02:08:36 UTC
Created attachment 16242 [details]
Dmesg output using default mtu until fail the mount command

This is the dmesg output from when the driver is loaded up to when the error is displayed.
Comment 18 Juan Jose Pablos 2008-05-22 02:12:31 UTC
Created attachment 16243 [details]
Dmesg output using mtu 1492

dmesg output when I run "ifconfig eth0 mtu 1492". The command does not take effect immediately; it takes a few seconds for the link to stop getting RX errors.
Comment 19 Francois Romieu 2008-05-22 15:22:34 UTC
juanjo@apertus.es  2008-05-22 02:08 :
[...]
> Created an attachment (id=16242)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=16242&action=view)
> Dmesg output using default mtu until fail the mount command
> 
> This is the dmesg output from when the driver is loaded up to when the
> error is displayed.

Interesting:

eth0: rx[16] 012005eb 76042000 349db010 00000600
                  ^^^
1515 bytes on the wire.
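
A small sketch of how that number is read from the debug line. It assumes, as the carets suggest, that the reported length sits in the low 16 bits of the first hex word; the mask name and helper functions are hypothetical and are not taken from the debug patch.

```python
# Debug line as printed by the helper patch (copied from the dmesg output).
LINE = "eth0: rx[16] 012005eb 76042000 349db010 00000600"

# Assumption: the low 16 bits of the first word hold the reported length.
SIZE_MASK = 0x0000FFFF


def first_word(debug_line: str) -> int:
    """Return the first hex word after the 'eth0: rx[N]' prefix."""
    fields = debug_line.split()  # ['eth0:', 'rx[16]', '012005eb', ...]
    return int(fields[2], 16)


def reported_size(word: int) -> int:
    """Extract the length field from the descriptor word."""
    return word & SIZE_MASK


if __name__ == "__main__":
    word = first_word(LINE)
    print(f"word=0x{word:08x} size=0x{reported_size(word):x} "
          f"({reported_size(word)} bytes)")
```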
Comment 20 Francois Romieu 2008-05-23 13:56:20 UTC
Created attachment 16259 [details]
More specific debug helper (against 2.6.25.3, on top of previous one)
Comment 21 Francois Romieu 2008-05-23 14:02:34 UTC
Juan, can you try the attached patch with a 1500 bytes MTU ?

It should be easy to compare the received data against your previous
ethereal dump then.

Btw, it may help to process the kernel log asynchronously through syslog.

-- 
Ueimor
Comment 22 Juan Jose Pablos 2008-05-24 03:34:52 UTC
Created attachment 16264 [details]
Debug for the extra byte 

A single sequence when  PSize == 0x012005e
Comment 23 Francois Romieu 2008-05-26 14:24:45 UTC
Created attachment 16286 [details]
Debug helper for the extra byte (against 2.6.25.3, on top of previous one)

Juan, can you apply the included patch in place of the previous (broken)
one and send the result with a 1500 bytes MTU ?

Sorry for the tediousness.

-- 
Ueimor
Comment 24 Juan Jose Pablos 2008-05-26 17:33:51 UTC
Created attachment 16288 [details]
Second debug for the extra byte

No problem reporting another sequence. I hope that this is enough for you.
Comment 25 Francois Romieu 2008-05-29 02:30:48 UTC
juanjo@apertus.es 2008-05-26 17:33 :
> Created an attachment (id=16288)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=16288&action=view)
> Second debug for the extra byte
> 
> No problem reporting another sequence. I hope that this is enough for you.

It still reports like the previous helper.

You should apply on top of 2.6.25.3:
- http://bugzilla.kernel.org/attachment.cgi?id=16232
- http://bugzilla.kernel.org/attachment.cgi?id=16286
Comment 26 Juan Jose Pablos 2008-06-01 17:38:52 UTC
Created attachment 16355 [details]
Second debug for the extra byte

Second debug with the extra byte; I hope this is in the right format. This sequence is from when the interface goes up.
Comment 27 Juan Jose Pablos 2008-06-01 17:40:01 UTC
Created attachment 16356 [details]
Third debug for the extra byte

This sequence is from when mount.cifs mounts a network share.
Comment 28 Francois Romieu 2008-06-02 14:48:52 UTC
juanjo@apertus.es  2008-06-01 17:38 :
> Created an attachment (id=16355)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=16355&action=view)
> Second debug for the extra byte
> 
> Second debug with the extra byte; I hope this is in the right format. This
> sequence is from when the interface goes up.

It is the right format.

Are Tx checksumming or segmentation offload disabled on the server ?
Comment 29 Juan Jose Pablos 2008-06-02 14:52:05 UTC
> 
> Are Tx checksumming or segmentation offload disabled on the server ?
> 
> 
how do I find out?
Comment 30 Francois Romieu 2008-06-03 00:02:24 UTC
juanjo@apertus.es  2008-06-02 14:52 :
> > 
> > Are Tx checksumming or segmentation offload disabled on the server ?
> > 
> > 
> how do I find out?

ethtool -k ethX
Comment 31 Juan Jose Pablos 2008-06-03 00:45:11 UTC
ntinstall:~# ethtool -k eth0
Offload parameters for eth0:
Cannot get device udp large send offload settings: Operation not supported
Cannot get device generic segmentation offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
Comment 32 Alan 2009-03-24 07:37:21 UTC
Is this problem still present ?
Comment 33 Juan Jose Pablos 2009-05-13 11:48:53 UTC
Yes, it is still present. I have experienced the same behaviour with 2.6.29.1, but I am not able to use the workaround "ifconfig eth0 mtu 1492", so I am stuck with this system.
The status is NEEDINFO; please let me know what information you need.
Comment 34 Juan Jose Pablos 2009-05-13 11:52:17 UTC
Forget what I said about the workaround; it still works. But the ethtool output is a bit different:

ntinstall:/home/cheche/ethtool# ./ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
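
The difference from the earlier ethtool output in this thread can be found mechanically. The sketch below is only an illustration: it parses the two pasted outputs as plain text (it does not call ethtool itself), and the function name is hypothetical.

```python
def parse_offloads(text: str) -> dict:
    """Collect 'name: on|off' lines; other lines (headers, errors) are skipped."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value in ("on", "off"):
            settings[name.strip()] = value
    return settings


# Output posted on 2008-06-03.
BEFORE = """Offload parameters for eth0:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off"""

# Output posted on 2009-05-13.
AFTER = """Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off"""

old, new = parse_offloads(BEFORE), parse_offloads(AFTER)
changed = {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}

if __name__ == "__main__":
    print(changed)
```

The only setting that differs between the two outputs is "tcp segmentation offload", which went from off to on.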
Comment 35 Pascal Terjan 2009-05-29 17:07:02 UTC
I have this issue here: with MTU=1500 I get PSize=012005eb, which triggers the LIMIT error.

By setting it to 1496 the PSize is 012005ea and everything works fine.
Comment 36 Pascal Terjan 2009-05-29 17:14:03 UTC
oops I mean 010105ea
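
The two PSize values can be compared flag by flag. This is a sketch under stated assumptions: the LIMIT name comes from the comment above, while the bit positions and the CRCOK name are assumptions that should be checked against the sis190 driver source before relying on them.

```python
# Assumed bit layout of the Rx descriptor PSize word (verify in the driver).
LIMIT = 0x00200000      # assumed: frame exceeded the receive size limit
CRCOK = 0x00010000      # assumed: frame passed the CRC check
SIZE_MASK = 0x0000FFFF  # assumed: reported length in bytes


def decode_psize(word: int) -> dict:
    """Split a PSize word into the reported size and the two flags of interest."""
    return {
        "size": word & SIZE_MASK,
        "limit": bool(word & LIMIT),
        "crc_ok": bool(word & CRCOK),
    }


if __name__ == "__main__":
    for label, word in (("MTU 1500", 0x012005EB), ("MTU 1496", 0x010105EA)):
        print(label, f"0x{word:08x}", decode_psize(word))
```

Under these assumptions the failing value has LIMIT set and a reported size of 1515, while the working value has LIMIT clear, CRCOK set and a reported size of 1514.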
Comment 37 Juan Jose Pablos 2010-02-08 20:12:12 UTC
Hi,
I have tested it on 2.6.32.7 and the problem persists. Is there anything I can report to shed some light on this?
Comment 38 Juan Jose Pablos 2010-02-18 21:22:13 UTC
Francois,
the bug is in the Need Info state. Please let me know what further information I can provide to help.
Comment 39 Alan 2012-08-29 17:05:49 UTC
If this is still seen in a modern kernel, please re-open/update. Thanks.