Bug 9419 - RAID5 writer hanging after minor abuse (35GB-125GB of writes)
Status: CLOSED OBSOLETE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: MD
Hardware: All Linux
Importance: P1 high
Assignee: io_md
Reported: 2007-11-20 09:14 UTC by Philip Copeland
Modified: 2012-05-17 15:08 UTC
Kernel Version: 2.6.23.8 (vs 2.6.22.13)
Regression: Yes

Description Philip Copeland 2007-11-20 09:14:35 UTC
Most recent kernel where this bug did not occur:
2.6.22.13 (noapic acpi=off nmi_watchdog=1)

Distribution:
Mainline

Hardware Environment:
http://smolt.fedoraproject.org/show?UUID=7ed97b00-c21c-4892-94b5-0313c6649667

Software Environment:
F8

Problem Description:
*scratch* where to begin...
OK, I have a 3.6TB FS on RAID5 (5x 1TB drives).
The WRITE portion of the RAID code seems to lock up under load after writing anywhere from 35GB to 125GB of data.

Despite the lockup, anything READING the RAID5 data is fine, e.g. you can happily watch OLS videos (while straining to work out what's being said) all night even though the system has locked up on writing.

Unfortunately, when the lockup occurs, the keyboard works for a few minutes, then gets corrupted and refuses to respond any more, with a stuck character in the buffer.

Steps to reproduce:
Anyway, back to the WRITE issue. Basically, if I run 'tar -C /dir -cf - . | tar -C /raid_dev/backup -xf -', things proceed as expected until enough data has been copied that the writer soft-locks.
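
For anyone trying to pin down how much data gets written before the stall, the following is a minimal sketch, not part of the original report. It wraps the same tar pipeline and samples the destination size into a log file at a fixed interval, so the last log line gives a rough figure for where the writer stopped. The function names (tree_kb, copy_and_log), the log path and the sampling interval are all made up for illustration. It also assumes that du on the array keeps working during the stall (the report only says reads are fine) and that the log lives on a filesystem outside the RAID5 array.

```shell
#!/bin/sh
# Sketch only: names, paths and the interval below are hypothetical.

# Print the size of a directory tree in kilobytes.
tree_kb() {
    du -sk "$1" | cut -f1
}

# Copy SRC to DST with the tar pipeline from the report, appending a
# timestamped size sample of DST to LOG every INTERVAL seconds.
copy_and_log() {
    src=$1 dst=$2 log=$3 interval=${4:-60}

    # Background sampler; its stdout is detached so it never holds a pipe open.
    (
        while :; do
            printf '%s %s KB\n' "$(date '+%F %T')" "$(tree_kb "$dst")" >> "$log"
            sleep "$interval"
        done
    ) >/dev/null 2>&1 &
    sampler=$!

    # Same copy command as in the steps to reproduce.
    tar -C "$src" -cf - . | tar -C "$dst" -xf -
    status=$?

    # Stop the sampler and record the final size if the copy ever returns.
    kill "$sampler" 2>/dev/null || true
    wait "$sampler" 2>/dev/null || true
    printf '%s %s KB (copy finished)\n' "$(date '+%F %T')" "$(tree_kb "$dst")" >> "$log"
    return "$status"
}

# Example invocation (placeholder log path, kept off the array):
#   copy_and_log /dir /raid_dev/backup /var/tmp/raid5-copy.log 60
```

If the writer hangs, the "copy finished" line never appears, and the last sample in the log brackets the amount written to within one interval.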

I do have a sysrq trace of the state of the box here
http://zeniv.linux.org.uk/~bryce/trace1.txt

Ideas? The main annoyance is that, while the data isn't corrupted, after a reboot the RAID spends the next 3 hours resyncing itself.

Phil
=--=
Comment 1 Andrew Morton 2007-11-20 13:12:30 UTC
marked as a regression
Comment 2 Philip Copeland 2007-11-21 09:25:29 UTC
Not 100% sure, but I think this is actually the same issue as in this thread:

http://marc.info/?l=linux-raid&m=119502458615538&w=2
http://marc.info/?l=linux-raid&m=119503133922974&w=2
http://marc.info/?l=linux-raid&m=119503387426237&w=2
http://marc.info/?l=linux-raid&m=119503393926314&w=2
http://marc.info/?l=linux-raid&m=119508401413362&w=2
http://marc.info/?l=linux-raid&m=119508524014851&w=2

(which basically revolves around a never-ending write issue and some bio* patches to raid5)

I'll poke someone on the list to get an opinion

Phil
=--=
Comment 3 Philip Copeland 2007-11-23 03:04:41 UTC
Actually, the same problem shows up in 2.6.22.13, just nowhere near as often.
(damn,..)

Phil
=--=
Comment 4 Alan 2012-05-17 15:08:09 UTC
I'd blame Bryce personally ;-)

Closing as obsolete
