Kernel Bug Tracker – Bug 3328
Killing process using ncpfs makes mount unusable
Last modified: 2007-04-04 00:18:12 UTC
Distribution: Suse 9.1 (actually all as far as I know)
Hardware Environment: i386
When a process is killed with SIGKILL in the middle of an ncpfs operation, the
mount becomes unusable. Subsequent file operations return EIO.
Steps to reproduce:
One easy way to reproduce this is to run KDE 3.2 with the DCOP server authority
file on an ncpfs mount. The authority file can be set with the environment
variable DCOPAUTHORITY. When shutting down KDE, a process called
dcopserver_shutdown busy-loops on read(), which returns EIO. After killing
dcopserver_shutdown with SIGKILL, the NCP mount is in a non-working state.
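To check whether a mount has entered this state, a small probe along the
following lines could be used. This is only a sketch: probe_read is a
hypothetical helper name, not part of ncpfs or KDE. It does a single open() and
read() and reports the errno of whichever call failed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Try one read() on `path`.
 * Returns 0 if open() and read() both succeeded, otherwise the errno
 * value of the call that failed (EIO is what this report describes).
 * probe_read is a hypothetical helper, not an ncpfs interface.
 */
int probe_read(const char *path)
{
    char buf[64];
    int fd, err;
    ssize_t n;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    n = read(fd, buf, sizeof buf);
    err = (n < 0) ? errno : 0;   /* save errno before close() can change it */

    close(fd);
    return err;
}
```

On a healthy file the result is 0. On a mount in the state described above, the
expectation from this report is that it would be EIO.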
The ncpfs maintainer, Petr Vandrovec, has confirmed that this problem exists:
>> * When shutting down KDE, the file system starts reporting EIO
>> (Input/output error). It's also very unresponsive; if one process runs
>> read() (returning EIO) in a tight loop, other processes hang forever (well,
>> at least > 12 hours).
>Someone aborted process in the middle of ncpfs operation with SIGKILL.
>As ncp request/responses are handled in context of calling
>process, if you SIGKILL process after it sent request but before it
>received response, connection is invalidated. But it should not hang
>other processes which have nothing to do with ncpfs. If you run in a
>loop process which does some (failing) ncpfs operation, other processes
>which want to use that mountpoint of course blocks: Linux semaphores
>are not fair. If you do not want SIGKILL to abort connection, replace
>sigmask(SIGKILL) with sigmask(0) in fs/ncpfs/sock.c. But then you may
>get unkillable process in the case TCP connected server dies, or if you'll
>set infinite timeout for UDP/IPX servers.
He has also indicated that this problem might be solvable:
>Actually with 2.6.x fs/ncpfs/sock.c changing code to not call
>ncp_abort_connection() when something goes wrong should not be that
>hard. Only problem is that ncp_request_reply structure is allocated
>on caller stack, and so you must kill receiver (ncpdgram_rcv_proc/
>ncptcp_rcv_proc) before you release this structure.
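The lifetime constraint in the quote above can be illustrated with a user-space
analogy. This is not the code in fs/ncpfs/sock.c, and all names (struct request,
struct connection, deliver_reply, cancel_request, interrupted_call) are
hypothetical. The sketch also swaps "kill the receiver" for a simpler rule: the
caller unregisters its stack-allocated request under a lock, so the receiver
can no longer write to it once the caller returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from ncpfs. */
struct request {
    int done;    /* set by the receiver when a reply arrives */
    int reply;   /* reply payload */
};

struct connection {
    pthread_mutex_t lock;
    struct request *pending;   /* request the receiver may write to */
};

/* Receiver side: deliver a reply only if a request is still registered. */
int deliver_reply(struct connection *c, int value)
{
    int delivered = 0;

    pthread_mutex_lock(&c->lock);
    if (c->pending != NULL) {
        c->pending->reply = value;
        c->pending->done = 1;
        c->pending = NULL;
        delivered = 1;
    }
    pthread_mutex_unlock(&c->lock);
    return delivered;
}

/* Caller side: unregister `req` before its stack frame goes away. */
void cancel_request(struct connection *c, struct request *req)
{
    assert(c != NULL && req != NULL);

    pthread_mutex_lock(&c->lock);
    if (c->pending == req)
        c->pending = NULL;
    pthread_mutex_unlock(&c->lock);
}

/* Simulates a caller that is interrupted before any reply arrives. */
int interrupted_call(struct connection *c)
{
    struct request req = { 0, 0 };   /* lives on this stack frame */

    pthread_mutex_lock(&c->lock);
    c->pending = &req;
    pthread_mutex_unlock(&c->lock);

    /* ... the signal arrives here, before the receiver has replied ... */

    cancel_request(c, &req);   /* receiver can no longer reach req */
    return req.done;
}
```

After interrupted_call() returns, a late reply passed to deliver_reply() is
dropped instead of being written into a dead stack frame, and the connection
itself stays usable for the next caller.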
This is still a problem for us. Any progress?
Cendio has provided a patch which has been merged upstream (will be in 2.6.21).