Bug 10227

Summary: zisofs stuck on reading files
Product: File System Reporter: Christian Perle (chris)
Component: OtherAssignee: fs_other
Status: RESOLVED CODE_FIX    
Severity: normal CC: akpm, hidave.darkstar, jack
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.24 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: zisofs zisofs_readpage fix
zisofs zisofs_readpage invalid req fix
readahead ret value fix
Handle the over read of zisofs (Update)

Description Christian Perle 2008-03-11 13:36:33 UTC
Latest working kernel version: 2.6.22
Earliest failing kernel version: 2.6.23
Distribution: Ubuntu 7.04, but self-compiled vanilla 2.6.24 kernel, config
  is here: http://chris.silmor.de/config-2.6.24
Hardware Environment: Intel P4 1.8 GHz, 256MB ram (also happens in QEMU)
Software Environment: gcc 4.1.2
Problem Description: On reading certain files on a zisofs (iso9660 + rockridge
extensions and compression), the reading process first gets on I/O error. On
the second read attempt the process is stuck forever in read(). The problem
only happens on zisofs sizes around 30MB and larger. Afterwards, mount and
umount commands will also hang with status "D".

Steps to reproduce:
Create a zisofs containing the kernel source (this will be large enough to
trigger the bug):
  export LANG=C
  mkzftree linux-2.6.24 ztree
  mkisofs -R -z -hide-rr-moved -o testimg.iso ztree
  rm -rf ztree

Loop-mount the newly created image on a kernel with CONFIG_ZISOFS=y enabled:
  mount -o loop,ro testimg.iso /mnt/loop

Read all files on the mounted zisofs image:
  find /mnt/loop -type f -exec md5sum {} \;

At least one of the md5sum will fail to read. Now repeat the last step:
  find /mnt/loop -type f -exec md5sum {} \;

This time one md5sum process will hang in read() and stay in "D" status
forever.
Comment 1 Andrew Morton 2008-03-11 13:48:57 UTC
ug.  We haven't changed much in isofs in ages.

Can you please run sysrq-W when tasks are in D state so we can
see where they're stuck?  Either type alt-sysrq-W or run
`echo w > /proc/sysrq-trigger'

Thanks.
Comment 2 Christian Perle 2008-03-11 14:10:22 UTC
This is the output, produced on a live CD system i have made. It has the same
vanilla 2.6.24 and uses zisofs as its compressed loop image (instead of
cloop or squashfs -- i wanted something which is in the official kernel).
The kernel config is nearly the same as above:
http://chris.silmor.de/config-2.6.24-vdsl

 SysRq : Show Blocked State
   task                PC stack   pid father
 md5sum        D c481c5e4     0  1437   1339
        c3863560 00000086 c206be04 c481c5e4 0000000a c202500e c3863690 c3863690 
        c10865a0 c3913a00 c03b4020 c03b4020 c206be10 c380d120 c380d2a0 c03b4020 
        0000000a ffffb3fd c01762b0 c3707a08 c016f232 ffffffff c01d9cca c10865a0 
 Call Trace:
  [<c481c5e4>] isofs_dentry_cmp+0x0/0x14 [isofs]
  [<c01762b0>] dput+0x18/0xf0
  [<c016f232>] __link_path_walk+0x742/0xc44
  [<c01d9cca>] __copy_to_user_ll+0x3e/0x1a8
  [<c02cd9fe>] io_schedule+0x1a/0x24
  [<c0148bed>] sync_page+0x31/0x3c
  [<c02cdb4e>] __wait_on_bit_lock+0x3e/0x60
  [<c0148bbc>] sync_page+0x0/0x3c
  [<c0148bac>] __lock_page+0x58/0x60
  [<c0135134>] wake_bit_function+0x0/0x40
  [<c0149367>] do_generic_mapping_read+0x26f/0x46c
  [<c0148948>] file_read_actor+0x0/0xf0
  [<c014ad1b>] generic_file_aio_read+0xc7/0x1b4
  [<c0148948>] file_read_actor+0x0/0xf0
  [<c0166e59>] do_sync_read+0xd5/0x114
  [<c01350fc>] autoremove_wake_function+0x0/0x38
  [<c01573d6>] handle_mm_fault+0x15a/0x578
  [<c0167670>] vfs_read+0x88/0x120
  [<c0166d84>] do_sync_read+0x0/0x114
  [<c0167b00>] sys_read+0x3c/0x74
  [<c010454e>] syscall_call+0x7/0xb
  =======================
Comment 3 Anonymous Emailer 2008-03-11 14:17:51 UTC
Reply-To: akpm@linux-foundation.org

that trace is consistent with a device driver problem, or an interrupt
delivery problem, etc.

What makes you believe that it is due to isofs rather than to some
lower-level bug?

Thanks.
Comment 4 Christian Perle 2008-03-11 14:24:56 UTC
The problem is hardware inpendendent. It happens on real hardware as well
as in QEMU. When reading from a "normal" (uncompressed) iso9660+RR or ext2
or ext3 on the same physical block device, the problem does not occur.
Comment 5 Anonymous Emailer 2008-03-11 14:50:27 UTC
Reply-To: akpm@linux-foundation.org

ug, OK.

Could be that something broke around zisofs_readpage().

Are you able to prepare a simple step-by-step recipe which others
can use to reproduce this?

Thanks.
Comment 6 Christian Perle 2008-03-12 01:44:02 UTC
I thought the steps listed in Comment #0 would already provide that info.
What else is needed?
Comment 7 Andrew Morton 2008-03-12 01:59:12 UTC
Oh, OK, yes, I can reproduce it in 2.6.25-rc5-mm1.
Comment 8 Dave Young 2008-03-12 22:40:04 UTC
Created attachment 15234 [details]
zisofs zisofs_readpage fix

Hi, Could you try the attached patch, it works for me.

The problem is that the index could be >= maxpage, so we will get -EIO in zisofs_readpage

But there's still another problem, even if get -EIO we should not block there.
Comment 9 Christian Perle 2008-03-13 02:36:28 UTC
Yes, that fixes the problem. Thanks!

Although i don't understand why kernels up to 2.6.22 worked fine without this
patch.
Comment 10 Jan Kara 2008-03-13 03:15:48 UTC
Dave, I think when we took page after i_size, we never get to unlocking it, so next time someone tried to read it, he blocked on the page lock.

But what I don't understand in your fix is that you put requested page in pages[0] but then offset of the page doesn't correspond to the offset in the pages array. I'm not familiar with the code so it may work out fine in the and but it really looks strange. Shouldn't we rather bail out from readpage at the moment we see readpage request for page after i_size?
Comment 11 Dave Young 2008-03-13 18:29:15 UTC
(In reply to comment #10)
> Dave, I think when we took page after i_size, we never get to unlocking it,
> so
> next time someone tried to read it, he blocked on the page lock.
> 
> But what I don't understand in your fix is that you put requested page in
> pages[0] but then offset of the page doesn't correspond to the offset in the
> pages array. I'm not familiar with the code so it may work out fine in the
> and
> but it really looks strange. Shouldn't we rather bail out from readpage at
> the
> moment we see readpage request for page after i_size?

Sounds right, my patch might have problem.
I need to rethink about it.

> 
Comment 12 Dave Young 2008-03-14 02:43:03 UTC
Created attachment 15261 [details]
zisofs zisofs_readpage invalid req fix

Just unlock the page to fix it.

The -EIO happened with the file:
arch/mips/kernel/rtlx.c

This file i_size = 12288, after compressed size will be 3965
just cat it you could reproduce the bug

have some relation to readahead?
Comment 13 Dave Young 2008-03-14 03:39:09 UTC
Created attachment 15262 [details]
readahead ret value fix

Finally, I found the readahead function ret value is not correct if offset > end_index

This would be the right fix.
Comment 14 Dave Young 2008-03-14 17:11:49 UTC
Sorry please ignore the Comment #13, it's not right. Sorry...

The call path is like this:
(cat loop/arch/mips/kernel/rtlx.c)

##in __do_page_cache_readahead

Mar 14 18:34:39 darkstar kernel: [  117.837850] #0 page_offset : 0,
end_index : 2 ret: 0
Mar 14 18:34:39 darkstar kernel: [  117.837861] #0 page_offset : 1,
end_index : 2 ret: 1
Mar 14 18:34:39 darkstar kernel: [  117.837888] #0 page_offset : 2,
end_index : 2 ret: 2
Mar 14 18:34:39 darkstar kernel: [  117.837922] #0 page_offset : 3,
end_index : 2 ret: 3

##Above passed, then 

Mar 14 18:34:40 darkstar kernel: [  118.051620] #0 page_offset : 3,
end_index : 2 ret: 0

##page_offset as 3 __do_page_cache_readahead was called again

Mar 14 18:34:40 darkstar kernel: [  118.051626] zisofs : req invalid

##Here in isofs get invalid req

Actually, called from 
do_generic_file_read -> page_cache_sync_readahead -> __do_page_cahce_readahead

do_generic_file_read make it read a ppos beyond the i_size.

Any clue?
Comment 15 Dave Young 2008-03-17 03:22:02 UTC
Created attachment 15309 [details]
Handle the over read of zisofs (Update)

New patch for this bug, the read request over the i_size is normal, It will be dealed in do_generic_file_read. So we just return 0 to avoid getting -EIO as normal reading, let do_generic_file_read to do the rest.
Comment 16 Jan Kara 2008-03-17 06:02:26 UTC
Yes, this patch looks fine. You can add Acked-by: Jan Kara <jack@suse.cz>
when you submit it.