Bug 14893

Summary: Tasks stuck on I/O on armel with 2.6.32-rc8
Product: IO/Storage Reporter: Rafael J. Wysocki (rjw)
Component: OtherAssignee: io_other
Status: CLOSED UNREPRODUCIBLE    
Severity: normal CC: dkg
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32-rc8 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 14230    
Attachments: console log from second identical hang

Description Rafael J. Wysocki 2009-12-27 22:45:12 UTC
Subject    : scheduling failure on armel with 2.6.32-rc8
Submitter  : Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Date       : 2009-11-24 23:05
References : http://marc.info/?l=linux-kernel&m=125910438530554&w=4

This entry is being used for tracking a regression from 2.6.31.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Rafael J. Wysocki 2010-01-05 21:29:10 UTC
On Tuesday 05 January 2010, Daniel Kahn Gillmor wrote:
> Hi Rafael--
> 
> On 12/29/2009 10:28 AM, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14893
> > Subject             : Tasks stuck on I/O on armel with 2.6.32-rc8
> > Submitter   : Daniel Kahn Gillmor <dkg@fifthhorseman.net>
> > Date                : 2009-11-24 23:05 (36 days old)
> > References  : http://marc.info/?l=linux-kernel&m=125910438530554&w=4
> 
> The machine in question has been up continuously (though not under heavy
> load) since it was rebooted after the reported failure.  it has even
> withstood additional mpd index updates, all running the same kernel it
> crashed with before.
> 
> So i can't say that the problem is resolved, but i haven't replicated it
> either.  Wish i had more interesting details to report.
Comment 2 Daniel Kahn Gillmor 2010-01-19 19:57:25 UTC
Created attachment 24640 [details]
console log from second identical hang

well, the machine hung again.  this time it was in the middle of a fairly large "aptitude full-upgrade" run, and short on disk space.

it was still responding to pings over the network, but was not responsive for things like ssh (an active ssh session was totally hung, and new attempts to connect to port 22 yielded no response).

I connected to the console when this happened, and SysRq'ed it until i got some info.  attached is the console log.

Even the SysRq reset failed to fully reset the machine, and ultimately, i was forced to pull the power plug to get it restarted.

You can see that happen in the log where it says "SysRq: Resetting" followed by a null byte, followed by the redboot headers.

i'm unable to re-open this bug myself, but it has certainly been reproduced (though i haven't sorted out how to make it hang deliberately yet).