Bug 16431
Summary: | does not seem to shut down one of two md raid-1 arrays on hibernation | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Martin Steigerwald (martin.steigerwald) |
Component: | MD | Assignee: | Neil Brown (neilb) |
Status: | CLOSED UNREPRODUCIBLE | ||
Severity: | normal | CC: | maciej.rutecki, neilb, rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.34.1-workstation-toi-3.1.1.1-04990-g3a7d1f4 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216, 15310 | ||
Attachments: |
first occurence of the hang on shutting down one of the softraids
one other occurence of the hang where I waited a bit longer, an ext4 error occured |
Description
Martin Steigerwald
2010-07-21 13:08:48 UTC
Created attachment 27182 [details]
first occurence of the hang on shutting down one of the softraids
Some more information to add:
- It's a hang. The kernel does not skip shutting down the RAID array, but just hangs there.
- It only seems to happen with md1.
- It does not happen all of the time; sometimes the kernel just shuts down immediately.
Created attachment 27183 [details]
one other occurence of the hang where I waited a bit longer, an ext4 error occured
After about one or two minutes ext4 showed an error. But a fsck.ext4 -f from a grml 2010.04 did not find anything.
I just tested with 2.6.33.6 and it shut down both RAID arrays, i.e. put both of them to read only, also md1. Putting md1 to read only is missing in the cases where 2.6.34.1 hangs on shutdown.

I'm guessing that it is hanging in md_update_sb. It would be good if you could confirm that, by collecting the output of "alt-sysrq-T" or by adding some printk's to the code.

Why it would hang there I don't know. Maybe the device under the array is still suspended and isn't responding to IO?

The rare occasion that you don't get a hang would be explained by the 'safe_mode' timer having fired and the array being marked 'clean' before the suspend.

I assume that the suspend-to-disk image is not being written to the md array?

Maybe tux-on-ice needs to sync the md array before entering suspend, maybe after making the filesystem read-only if it does that... unfortunately it isn't easy to do that. Maybe "mdadm -r /dev/md1 ; mdadm -w /dev/md1" would do it.

Anyway, the first step is to confirm that md_update_sb is the culprit.

Thanks,
Neil.

Alt-SysRQ-T prints quite a long output. And it will be just prior to shutting the machine off. The only thing I noticed after shutting down the MDs are the messages "Synchronising SCSI caches" - with the working 2.6.33.6 kernel. I don't think I have network at that time anymore and the machine has no serial port either.

Is there some way to put some short "I am here" message into md_update_sb? Or a helpful SysRQ key combination that just outputs one page that I could take a photo of?

TuxOnIce is using a Swap-RAID 0 for its hibernation image, it doesn't use SoftRAID for it.

mango:~# swapon -s
Filename    Type       Size     Used  Priority
/dev/sda7   partition  7815580  8528  1
/dev/sdb7   partition  7815580  8448  1

I do not know whether it utilizes the Swap-RAID 0, AFAIK it should. I get the following image writing and reading speeds:

mango:~# grep -i "I/O speed" /var/log/syslog
Jul 20 16:37:04 mango kernel: - I/O speed: Write 73 MB/s, Read 207 MB/s.
Jul 21 09:26:49 mango kernel: - I/O speed: Write 71 MB/s, Read 204 MB/s.
Jul 21 09:42:03 mango kernel: - I/O speed: Write 65 MB/s, Read 188 MB/s.
Jul 21 15:25:52 mango kernel: - I/O speed: Write 89 MB/s, Read 217 MB/s.
Jul 26 08:21:34 mango kernel: - I/O speed: Write 78 MB/s, Read 216 MB/s.

But anyway it's not on the SoftRAID.

Handled-By : Neil Brown <neilb@suse.de>

Is the problem still present in 2.6.37?

I haven't seen this problem since I switched to in-mainline-kernel hibernation. It might have something to do with TuxOnIce then, or not. I do not remember exactly what I did back then anymore. 2.6.36 is working fine here. Thus closing. Thanks for the reminder.
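
For reference, the "I/O speed" lines quoted from syslog in this report can be summarized with a short script. This is only a sketch: the names `parse_speeds` and `summarize` are made up for illustration, and the regular expression assumes the TuxOnIce message keeps exactly the "Write N MB/s, Read M MB/s" wording seen in the quoted lines.

```python
import re
from statistics import mean

# Log lines copied verbatim from the report above.
SYSLOG_LINES = """\
Jul 20 16:37:04 mango kernel: - I/O speed: Write 73 MB/s, Read 207 MB/s.
Jul 21 09:26:49 mango kernel: - I/O speed: Write 71 MB/s, Read 204 MB/s.
Jul 21 09:42:03 mango kernel: - I/O speed: Write 65 MB/s, Read 188 MB/s.
Jul 21 15:25:52 mango kernel: - I/O speed: Write 89 MB/s, Read 217 MB/s.
Jul 26 08:21:34 mango kernel: - I/O speed: Write 78 MB/s, Read 216 MB/s.
""".splitlines()

# Assumption: the message format matches the lines quoted in this report.
SPEED_RE = re.compile(r"I/O speed: Write (\d+) MB/s, Read (\d+) MB/s")


def parse_speeds(lines):
    """Return a list of (write_mb_s, read_mb_s) tuples found in the lines."""
    speeds = []
    for line in lines:
        match = SPEED_RE.search(line)
        if match:
            speeds.append((int(match.group(1)), int(match.group(2))))
    return speeds


def summarize(speeds):
    """Return min, max and mean for the write and read speeds."""
    writes = [w for w, _ in speeds]
    reads = [r for _, r in speeds]
    return {
        "write": (min(writes), max(writes), mean(writes)),
        "read": (min(reads), max(reads), mean(reads)),
    }


if __name__ == "__main__":
    for kind, (lo, hi, avg) in summarize(parse_speeds(SYSLOG_LINES)).items():
        print(f"{kind}: min {lo} MB/s, max {hi} MB/s, mean {avg:.1f} MB/s")
```

To run it against a live log instead of the quoted lines, pass the lines of /var/log/syslog to `parse_speeds`; lines that do not match the pattern are skipped.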