Summary: Killing process using ncpfs makes mount unusable
Product: File System
Reporter: Peter Astrand (astrand)
Component: Other
Assignee: Pierre Ossman (pierre-bugzilla)
Description Peter Astrand 2004-09-01 13:27:50 UTC
Distribution: Suse 9.1 (actually all as far as I know)
Hardware Environment: i386

Problem Description:
When a process is killed with SIGKILL in the middle of an ncpfs operation, the mount becomes unusable. Subsequent file operations return EIO.

Steps to reproduce:
One easy way to reproduce this is to run KDE 3.2 with the DCOP server authority file on an ncpfs mount. The location of the authority file can be set with the environment variable DCOPAUTHORITY. When shutting down KDE, a process called dcopserver_shutdown busy-loops on read(), which returns EIO. After killing dcopserver_shutdown with SIGKILL, the NCP mount is in a non-working state.

The ncpfs maintainer, Petr Vandrovec, has confirmed that this problem exists:

>> * When shutting down KDE, the file system starts reporting EIO
>> (Input/output error). It's also very un-responsive; if one process runs
>> read() (returning EIO) in a tight loop, other processes hangs forever (well,
>> at least > 12 hours).
>
>Someone aborted process in the middle of ncpfs operation with SIGKILL.
>As ncp request/responses are handled in context of calling
>process, if you SIGKILL process after it sent request but before it
>received response, connection is invalidated. But it should not hang
>other processes which have nothing to do with ncpfs. If you run in a
>loop process which does some (failing) ncpfs operation, other processes
>which want to use that mountpoint of course blocks: Linux semaphores are
>not fair. If you do not want SIGKILL to abort connection, replace
>sigmask(SIGKILL) with sigmask(0) in fs/ncpfs/sock.c. But then you may
>get unkillable process in the case TCP connected server dies, or if you'll
>set infinite timeout for UDP/IPX servers

He has also indicated that this problem might be solvable:

>Actually with 2.6.x fs/ncpfs/sock.c changing code to not call
>ncp_abort_connection() when something goes wrong should not be that
>hard. Only problem is that ncp_request_reply structure is allocated
>on caller stack, and so you must kill receiver (ncpdgram_rcv_proc/
>ncptcp_rcv_proc) before you release this structure.
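
To illustrate the lifetime constraint from the last quote, here is a minimal user-space sketch in C. It is not ncpfs code: struct request, struct conn, deliver_reply() and interrupted_call() are hypothetical stand-ins, and a pthread mutex takes the place of whatever locking the kernel code uses. It only shows the general idea that a request living on the caller's stack has to be detached from the receiver before the caller's frame goes away.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ncp_request_reply; not the real kernel layout. */
struct request {
    char reply[32];
    bool done;
};

/* Hypothetical connection state. The receiver only reaches the request
 * through 'pending', which may point into a caller's stack frame. */
struct conn {
    pthread_mutex_t lock;
    struct request *pending;
};

/* Receiver side: copy a reply into the pending request, if there is one.
 * Returns false when nobody is waiting, so a late reply is simply dropped. */
static bool deliver_reply(struct conn *c, const char *data)
{
    bool delivered = false;

    pthread_mutex_lock(&c->lock);
    if (c->pending != NULL) {
        strncpy(c->pending->reply, data, sizeof(c->pending->reply) - 1);
        c->pending->reply[sizeof(c->pending->reply) - 1] = '\0';
        c->pending->done = true;
        c->pending = NULL;
        delivered = true;
    }
    pthread_mutex_unlock(&c->lock);
    return delivered;
}

/* Caller side: register a stack-allocated request, then give up before any
 * reply arrives (modelling an interrupted caller). The request is detached
 * under the lock before returning, so the receiver can never write into a
 * dead stack frame. Returns whether a reply had been received. */
static bool interrupted_call(struct conn *c)
{
    struct request req = { .reply = "", .done = false };

    pthread_mutex_lock(&c->lock);
    assert(c->pending == NULL);   /* assumption: one request in flight */
    c->pending = &req;
    pthread_mutex_unlock(&c->lock);

    /* ... request sent; the caller is interrupted here ... */

    pthread_mutex_lock(&c->lock);
    if (c->pending == &req)
        c->pending = NULL;        /* detach before the frame is released */
    pthread_mutex_unlock(&c->lock);

    return req.done;
}
```

In this sketch a reply that arrives after the caller has given up is discarded instead of invalidating the whole connection, which is the behaviour the quote suggests should be achievable. Whether the real fix takes this shape is not stated here.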
Comment 1 Peter Astrand 2005-11-10 06:39:10 UTC
This is still a problem for us. Any progress?
Comment 2 Pierre Ossman 2007-04-04 00:18:12 UTC
Cendio has provided a patch which has been merged upstream (will be in 2.6.21). Closing bug.