Bug 3328 - Killing process using ncpfs makes mount unusable
Status: CLOSED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 normal
Assignee: Pierre Ossman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-09-01 13:27 UTC by Peter Astrand
Modified: 2007-04-04 00:18 UTC

See Also:
Kernel Version: 2.6.8.1
Subsystem:
Regression: ---
Bisected commit-id:


Description Peter Astrand 2004-09-01 13:27:50 UTC
Distribution: Suse 9.1 (actually all as far as I know)
Hardware Environment: i386

Problem Description:
When a process is killed with SIGKILL in the middle of an ncpfs operation, the
mount becomes unusable. Subsequent file operations return EIO.

Steps to reproduce:
One easy way to reproduce this is to run KDE 3.2 with the DCOP server authority
file on a ncpfs. The authority file can be set with the environment variable
DCOPAUTHORITY. When shutting down KDE, a process called dcopserver_shutdown
busy-loops on read(), which returns EIO. After killing dcopserver_shutdown with
SIGKILL, the NCP mount is in a non-working state.
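
To check whether a mount has reached this state without involving KDE, a small probe along the following lines may help. It is only a sketch: the function name count_eio and its parameters are made up for illustration, and it assumes a readable file on the ncpfs mount is passed in.

```python
import errno
import os


def count_eio(path, attempts=5, chunk=4096):
    """Try to read `path` several times; return how many attempts failed with EIO."""
    failures = 0
    for _ in range(attempts):
        try:
            fd = os.open(path, os.O_RDONLY)
            try:
                os.read(fd, chunk)
            finally:
                os.close(fd)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise  # some other problem (missing file, permissions, ...)
            failures += 1
    return failures
```

On a healthy mount the function should return 0. If every attempt fails with EIO after a process was killed, the mount is probably in the state described here.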

The ncpfs maintainer, Petr Vandrovec, has confirmed that this problem exists:

>> * When shutting down KDE, the file system starts reporting EIO
>> (Input/output error). It's also very unresponsive; if one process runs
>> read() (returning EIO) in a tight loop, other processes hang forever (well,
>> at least > 12 hours).
>
>Someone aborted process in the middle of ncpfs operation with SIGKILL.
>As ncp request/responses are handled in context of calling
>process, if you SIGKILL process after it sent request but before it
>received response, connection is invalidated.  But it should not hang 
>other processes which have nothing to do with ncpfs.  If you run in a 
>loop process which does some (failing) ncpfs operation, other processes 
>which want to use that mountpoint of course blocks: Linux semaphores 
>are not fair.  If you do not want SIGKILL to abort connection, replace
>sigmask(SIGKILL) with sigmask(0) in fs/ncpfs/sock.c.  But then you may
>get unkillable process in the case TCP connected server dies, or if you'll
>set infinite timeout for UDP/IPX servers
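
The failure mode described in the quote can be pictured with a toy model. This is not the kernel code: the class ToyNcpConnection and the killed_before_reply flag are invented for illustration, and the sketch only assumes what the quote says, namely that a kill between sending the request and receiving the reply invalidates the whole connection.

```python
import errno


class ToyNcpConnection:
    """Toy model only: mimics the behaviour described above, not the kernel code."""

    def __init__(self):
        self.valid = True

    def request(self, killed_before_reply=False):
        if not self.valid:
            # Every later operation on the mount fails like this.
            raise OSError(errno.EIO, "connection invalidated")
        # ... the request would be sent to the server here ...
        if killed_before_reply:
            # SIGKILL arrived between send and receive: the whole
            # connection is aborted, not just this one request.
            self.valid = False
            raise InterruptedError(errno.EINTR, "killed while waiting for reply")
        return "reply"
```

In this model one interrupted call is enough to make every later request() raise EIO, which matches the symptom reported above.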

He has also indicated that this problem might be solvable:

>Actually with 2.6.x fs/ncpfs/sock.c changing code to not call
>ncp_abort_connection() when something goes wrong should not be that
>hard. Only problem is that ncp_request_reply structure is allocated
>on caller stack, and so you must kill receiver (ncpdgram_rcv_proc/
>ncptcp_rcv_proc) before you release this structure.
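
The ordering constraint in that quote (detach the request from the receiver before releasing it, instead of aborting the connection) can be sketched as a toy model. All names here (ToyConnection, send, cancel, deliver) are hypothetical and do not correspond to kernel symbols; the sketch is not the patch that was eventually merged.

```python
import errno


class ToyConnection:
    """Toy model of the idea above; names are hypothetical, not kernel symbols."""

    def __init__(self):
        self.valid = True
        self.pending = None  # request the receiver would write a reply into

    def send(self, request):
        if not self.valid:
            raise OSError(errno.EIO, "connection invalidated")
        self.pending = request

    def cancel(self, request):
        # Detach the request from the receiver *before* the caller
        # releases it, so a late reply has nowhere stale to land.
        if self.pending is request:
            self.pending = None

    def deliver(self, data):
        # Called by the receiver when a reply arrives.
        if self.pending is None:
            return False  # late reply for a cancelled request: drop it
        self.pending["reply"] = data
        self.pending = None
        return True
```

In this model a killed caller invokes cancel() on its request, a late reply is dropped by deliver(), and the connection stays valid for the next caller.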
Comment 1 Peter Astrand 2005-11-10 06:39:10 UTC
This is still a problem for us. Any progress?
Comment 2 Pierre Ossman 2007-04-04 00:18:12 UTC
Cendio has provided a patch which has been merged upstream (will be in 2.6.21).
Closing bug.
