Bug 89131 - Hangs when checking torrent through libtorrent; kernel BUG at mm/iov_iter.c:219!
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-01 13:19 UTC by kokoko3k@gmail.com
Modified: 2016-03-20 11:20 UTC

See Also:
Kernel Version: 3.17.3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Backtraces from both the systems (40.66 KB, text/plain)
2014-12-01 13:19 UTC, kokoko3k@gmail.com
Kernel BUG dump (4.26 KB, text/plain)
2014-12-17 09:49 UTC, Lubos Dolezel
dumpe2fs (1.91 KB, text/plain)
2014-12-19 16:37 UTC, Lubos Dolezel
debugging patch (should apply for 3.17 and 3.18) (489 bytes, patch)
2015-01-19 17:36 UTC, Theodore Tso

Description kokoko3k@gmail.com 2014-12-01 13:19:19 UTC
Created attachment 159351
Backtraces from both the systems

When I use qBittorrent (libtorrent) with the OS cache explicitly disabled and run a hash check on one specific torrent's data (it does not happen with every torrent), Linux hangs, leaving me no choice but to force a reboot.

I managed to get two backtraces from completely different systems (an E7300 desktop and an Ivy Bridge laptop); they're attached.

For reference, I first discussed the issue here:
https://github.com/qbittorrent/qBittorrent/issues/2200#issuecomment-64951598

The issue seems to be caused by the use of the O_DIRECT flag.
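
For readers unfamiliar with what "disabling the OS cache" means at the system-call level, here is a minimal sketch in C. It is not code from libtorrent or qBittorrent: the helper name direct_read_file and the 4096-byte chunk size are assumptions made for illustration. It only shows the basic O_DIRECT requirements (an aligned buffer and block-sized reads) and is not known to trigger the hang.

```c
#define _GNU_SOURCE              /* glibc exposes O_DIRECT only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not from libtorrent): read a whole file with
 * O_DIRECT, bypassing the page cache.
 * `chunk` is both the read size and the buffer alignment; it must be a
 * power of two and a multiple of the device's logical block size
 * (4096 is assumed to be safe on most setups).
 * Returns the number of bytes read, or -1 with errno set. */
long long direct_read_file(const char *path, size_t chunk)
{
    assert(chunk >= 512 && (chunk & (chunk - 1)) == 0);

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;               /* errno set by open() */

    void *buf = NULL;
    int rc = posix_memalign(&buf, chunk, chunk);
    if (rc != 0) {
        close(fd);
        errno = rc;              /* posix_memalign returns the error code */
        return -1;
    }

    long long total = 0;
    int saved_errno = 0;
    for (;;) {
        ssize_t n = read(fd, buf, chunk);
        if (n < 0) {
            saved_errno = errno;
            total = -1;
            break;
        }
        total += n;
        /* A short (or zero-length) read is treated as end of file.
         * Stopping here also avoids issuing another direct read from an
         * unaligned file position. */
        if ((size_t)n < chunk)
            break;
    }

    free(buf);
    close(fd);
    if (total < 0)
        errno = saved_errno;
    return total;
}
```

Calling direct_read_file("/path/to/file", 4096) from a small main() gives roughly the same access pattern as a direct-I/O read with dd, so on its own it is not expected to reproduce the problem.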
Comment 1 Lubos Dolezel 2014-12-17 09:49:38 UTC
Created attachment 160921
Kernel BUG dump

I'm seeing the same kind of crash, 5 times just yesterday!

I'm using libtorrent to download a huge torrent (almost 50GB) with many files inside (BluRay dump). I haven't seen it with other torrents.
Comment 2 kokoko3k@gmail.com 2014-12-17 11:08:51 UTC
Depending on the client you're using, you should be able to enable the kernel cache so that it doesn't crash anymore.
Just a workaround, of course.
Comment 3 Lubos Dolezel 2014-12-17 11:20:07 UTC
Yep, this workaround helps.

As a little side note, using direct I/O with ext4 has resulted in many other crashes on my system including softlockups and various other oopses. ext4's direct involvement could be seen in ~50% of them.

So this "kernel BUG" is not the only possible outcome.
Comment 4 kokoko3k@gmail.com 2014-12-17 11:36:58 UTC
I wasn't able to reproduce the problem using dd with iflag=direct.
How did you do that?
It may be handy for developers to have a quicker way to test the issue.
Comment 5 Lubos Dolezel 2014-12-18 09:49:20 UTC
Sorry for the misunderstanding; I was only able to reproduce it with libtorrent.

Torrent checking resulted in the aforementioned BUGs, whereas torrent downloading (i.e. write operations) resulted in much more serious problems, e.g. lockups in kworker in kthread_data().
Comment 6 Theodore Tso 2014-12-19 16:02:38 UTC
A couple of questions. First of all, is there any chance you could try 3.18.0 and see if it fails there?

Also, can you try to get line numbers out of the stack dump?   

Also, how long does it take before you trigger the failure? I'm particularly interested in seeing where this comes from:

[ 4244.680376]  [<ffffffff81203313>] __blockdev_direct_IO+0x25e3/0x3630

Part of the problem here is that fs/direct-io.c has a bunch of inline functions, so it becomes tricky to figure out where the call to advance_iovec comes from. This looks like a bug in the core direct I/O code, but we do a fair amount of regression testing and this is a new one; it's a bit surprising that no one else found this when running xfstests.

To get the line numbers, compile the kernel with

CONFIG_DEBUG_INFO=y

(You can also set CONFIG_DEBUG_INFO_REDUCED to reduce the size of your kernel build.)

Then use the addr2line program to translate the address to a line number, i.e.:

addr2line -e /path/to/kernel/build/vmlinux -i -a ffffffff81203313

Finally, can you send me the output of dumpe2fs -h for your ext3/ext4 file system, so I can see which file system features are enabled on it?

Thanks!!
Comment 7 Lubos Dolezel 2014-12-19 16:30:46 UTC
I'd love to test it with 3.18.0, but I cannot due to #89881. But I'll try CONFIG_DEBUG_INFO=y.
Comment 8 Lubos Dolezel 2014-12-19 16:37:33 UTC
The kernel BUG typically happened pretty quickly when doing torrent checking (reading).

Other symptoms (more fatal crashes) when downloading and uploading happened within minutes/hours, but that's understandable, because the I/O is lower in this case.

Note that I haven't had this issue for weeks until I started downloading a particular torrent.
Comment 9 Lubos Dolezel 2014-12-19 16:37:50 UTC
Created attachment 161341
dumpe2fs
Comment 10 kokoko3k@gmail.com 2015-01-19 07:54:27 UTC
Unfortunately the crash still happens on 3.18.2
Comment 11 Theodore Tso 2015-01-19 17:36:59 UTC
Created attachment 163831
debugging patch (should apply for 3.17 and 3.18)

This BUG_ON is happening from the mm part of the direct_io code, which is supposed to be file system independent, and so I don't know that half of the DIO code as well.   Can you try applying the attached patch to see if it gets us any useful information when the bug triggers?

Also, at this point it would be *really* useful if we could get a reproduction case that doesn't require using rtorrent, i.e. something that uses just dd, or maybe a simple C program. Would it be possible to run a tty-mode torrent client on a serial console under strace, or some such, so we can figure out which system calls apparently triggered the BUG_ON?
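
As a starting point for the kind of simple C program asked for above, here is a sketch that reads a file with O_DIRECT through a multi-segment preadv() call. Everything specific in it is an assumption: the helper name direct_preadv_file, the 16 KiB segment size, and the idea that a vectored read resembles what the torrent client issues. None of this is confirmed by the thread, and the program has not been shown to trigger the BUG_ON; an strace of the real client would be needed to match its actual system calls.

```c
#define _GNU_SOURCE              /* for O_DIRECT and preadv() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_VECS 64
#define VEC_LEN  16384           /* assumed segment size, not from the thread */
#define ALIGN    4096            /* assumed to satisfy O_DIRECT alignment */

/* Hypothetical test helper: read `path` with O_DIRECT using preadv()
 * and `nvecs` separately allocated, aligned buffers per call.
 * Returns the number of bytes read, or -1 with errno set. */
long long direct_preadv_file(const char *path, int nvecs)
{
    assert(nvecs > 0 && nvecs <= MAX_VECS);

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;               /* errno set by open() */

    struct iovec iov[MAX_VECS];
    int allocated = 0;
    int saved_errno = 0;
    long long total = -1;

    /* Allocate one aligned buffer per iovec segment. */
    for (; allocated < nvecs; allocated++) {
        void *p = NULL;
        int rc = posix_memalign(&p, ALIGN, VEC_LEN);
        if (rc != 0) {
            saved_errno = rc;
            goto out;
        }
        iov[allocated].iov_base = p;
        iov[allocated].iov_len = VEC_LEN;
    }

    total = 0;
    off_t off = 0;
    const ssize_t full = (ssize_t)nvecs * VEC_LEN;
    for (;;) {
        ssize_t n = preadv(fd, iov, nvecs, off);
        if (n < 0) {
            saved_errno = errno;
            total = -1;
            break;
        }
        total += n;
        off += n;
        /* A short read is treated as end of file; continuing from an
         * unaligned offset could fail with EINVAL under O_DIRECT. */
        if (n < full)
            break;
    }

out:
    for (int i = 0; i < allocated; i++)
        free(iov[i].iov_base);
    close(fd);
    if (total < 0)
        errno = saved_errno;
    return total;
}
```

Calling direct_preadv_file() on one of the affected data files with different nvecs values (for example 1, 8 and 64) from a small main() would show whether the vectored path behaves differently from a plain dd-style read.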
