Bug 42872
Summary: | fstat()/ext3_iget() sometime takes over 2 minutes... | | |
---|---|---|---|
Product: | File System | Reporter: | Petr Vandrovec (petr) |
Component: | ext3 | Assignee: | fs_ext3 (fs_ext3) |
Status: | NEW --- | | |
Severity: | normal | CC: | jack, szg00000 |
Priority: | P1 | | |
Hardware: | All | | |
OS: | Linux | | |
Kernel Version: | 3.3.0-rc5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg from boot + hang | | |
Description
Petr Vandrovec
2012-03-05 18:52:57 UTC
Hmm, we are waiting for the buffer lock, so most likely someone is writing that inode out. It would be good to have stack traces of all hanging processes at that moment to tell what they are doing. So can you please do one more run, and when something like a suspiciously long fstat happens, run "echo w >/proc/sysrq-trigger" as fast as possible and attach the output here? Thanks.

Created attachment 72596
dmesg from boot + hang
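Catching a stall by hand "as fast as possible" is hard to time, so the request above could be automated. The following is only a minimal sketch, not something used in this report: the helper names `dump_blocked_tasks` and `timed_fstat` and the 5 second threshold are assumptions made for illustration. It arms a timer around an open()+fstat() pair and writes "w" to /proc/sysrq-trigger (which requires root) if the call is still running when the timer expires.

```python
import os
import threading
import time


def dump_blocked_tasks(trigger_path="/proc/sysrq-trigger"):
    """Write 'w' to the sysrq trigger so the kernel logs blocked tasks."""
    with open(trigger_path, "w") as trigger:
        trigger.write("w")


def timed_fstat(path, threshold_s=5.0, on_slow=dump_blocked_tasks):
    """open()+fstat() path; call on_slow() if that takes longer than threshold_s.

    threshold_s is a hypothetical value; pick whatever counts as
    "suspiciously long" on the affected machine.
    """
    timer = threading.Timer(threshold_s, on_slow)
    timer.daemon = True
    start = time.monotonic()
    timer.start()
    try:
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fstat(fd)
        finally:
            os.close(fd)
    finally:
        # Disarm the timer if the call returned before the threshold.
        timer.cancel()
    return time.monotonic() - start
```

The timer runs in a separate thread, so it can fire while the main thread is still blocked inside the system call. The resulting traces end up in the kernel log and can be collected with dmesg afterwards.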
I've uploaded the kernel log from today's occurrence. Apparently building code while find is running stresses the system too much. 'free' says:
             total       used       free     shared    buffers     cached
Mem:       8149468    7922940     226528          0     415524    5283992
-/+ buffers/cache:    2223424    5926044
Swap:      2040248      59792    1980456
which does not look that bad...
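For readers unfamiliar with the "-/+ buffers/cache" row, here is a short sketch of how it follows from the "Mem:" row. The numbers are copied from the output above; the variable names are illustrative, and the values are assumed to be in KiB, the default unit of 'free'.

```python
# Values copied from the 'free' output in this comment (assumed KiB).
total, used, free = 8149468, 7922940, 226528
buffers, cached = 415524, 5283992

# "-/+ buffers/cache" row: buffers and page cache are reclaimable,
# so they are subtracted from "used" and added to "free".
used_by_apps = used - buffers - cached
effectively_free = free + buffers + cached

print(used_by_apps, effectively_free)  # 2223424 5926044
print(f"{used_by_apps / total:.0%} of RAM used by applications")  # 27%
```

So only about a quarter of RAM is held by applications, and most of the rest is page cache.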
I did some testing of read latencies under heavy write load. Switching to data=writeback helped a lot (not that data=ordered shouldn't be fixed, but it's not so simple). So can you try whether it also helps in your case?

I've looked at my box, and it says:
[ 9.329354] EXT3-fs (sda5): mounted filesystem with writeback data mode
[ 69.269122] EXT3-fs (sda1): mounted filesystem with writeback data mode
[ 69.606184] EXT3-fs (sdb2): mounted filesystem with writeback data mode
[ 69.641482] EXT3-fs (sdf1): mounted filesystem with writeback data mode
[ 69.677489] EXT3-fs (sdg1): mounted filesystem with writeback data mode
[ 69.792985] EXT3-fs (sdh1): mounted filesystem with writeback data mode
[ 69.883098] EXT3-fs (sdc1): mounted filesystem with writeback data mode
So I'm already using writeback... But for some unknown reason the system did not reboot for the last 6 days and 5 hours.
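On a box with many ext3 mounts, the same check can be done mechanically. The following is a rough sketch, assuming the kernel log uses the message format quoted above; the function name `ext3_data_modes` is made up for illustration.

```python
import re

# Matches mount messages of the form quoted in this thread.
MOUNT_RE = re.compile(
    r"EXT3-fs \((?P<dev>[^)]+)\): mounted filesystem with (?P<mode>\w+) data mode"
)


def ext3_data_modes(log_text):
    """Return a {device: data mode} dict built from EXT3 mount messages."""
    return {m.group("dev"): m.group("mode") for m in MOUNT_RE.finditer(log_text)}


sample = """\
[ 9.329354] EXT3-fs (sda5): mounted filesystem with writeback data mode
[ 69.269122] EXT3-fs (sda1): mounted filesystem with writeback data mode
"""

modes = ext3_data_modes(sample)
# Devices that are NOT mounted with data=writeback.
not_writeback = sorted(dev for dev, mode in modes.items() if mode != "writeback")
print(modes)
print(not_writeback)
```

Feeding it the full dmesg output instead of `sample` would list every ext3 device together with its data mode.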