Bug 13841
Summary: | 2.6.31-rc4 boot failure | ||
---|---|---|---|
Product: | Other | Reporter: | Rafael J. Wysocki (rjw) |
Component: | Other | Assignee: | other_other |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | alan, gene.heskett |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.31-rc4 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 13615 |
Description
Rafael J. Wysocki
2009-07-26 21:41:58 UTC
On Monday 27 July 2009, Gene Heskett wrote:
> On Sunday 26 July 2009, Rafael J. Wysocki wrote:
> >This message has been generated automatically as a part of a report
> >of recent regressions.
> >
> >The following bug entry is on the current list of known regressions
> >from 2.6.30. Please verify if it still should be listed and let me know
> >(either way).
> >
> Yes. I have nuked the odd stuff in my rc.local file, and rebuilt this kernel
> with several mods that I thought might be related, but it is still failing in
> the same manner. The error messages (apparently from lp:, but long after cups
> has been started) do NOT make it to the messages log file either, so it's
> totally blown up at that point.
>
> Note that this is a regression from 2.6.31-rc3, which works fine, so the
> thing shouldn't be that hard to find. But in looking over the changelog,
> nothing obviously reaches out and grabs me.
>
> I'm not really equipped to do a bisect here either; my git from F10 is at least
> 2 versions old now. And I'm crippled by a way too small /boot partition which
> can't hold more than 12-14 kernels. The disk partitioning tool in F10 is
> nothing short of fscking broken IMO. But Fedora isn't interested in that
> either, cuz it's existed since at least Fedora 2. Here, Fedora is on its way
> out; 64 bit Mandriva sure looks nice. And DiskDrake Just Works(TM).
I have at least found the triggering script. It is actually a two-piece setup, where the second piece gets called only if there is something to print. As it's been working nicely for an extended period (at least 1.5 years), I did not initially suspect that it could be the problem.

The first script runs as a background daemon, listening to /dev/ttyUSB1, which sits on an extension USB hub with both a printer and a serial adapter plugged into it. Both the printer and the old computer on the other side of the FTDI RS-232 adapter are powered down 99% of the time, which is the present condition.

From the meager clues I obtained on this last boot, something has changed in how the kernel or the filesystem handles a query of the general form while [[ -f ${OutFile} ]]: it returns an I/O error at startup, and the script loops forever, creating, deleting and re-creating the 25 scratchpad files it uses on a round-robin basis. The disk where /tmp lives is being 'exercised' noticeably. That may not be where the error really lives, but it's the best I can deduce from the clues I have ATM. With the script killed, the machine is otherwise happily running 2.6.31-rc4 right now.

There may be an error in the script, but from 2.6.25 or so it has worked flawlessly, until 2.6.31-rc4. That tends to make me think something a lot closer to the filesystem core than bash has changed how it works. More if I get it figured out. Thanks.

If some bash script guru wants to look at it, yelp at me. Here is the progenitor line of my script, and an echo statement before it, which results in the I/O error that kills it, only for 2.6.31-rc4; rc3 and many previous kernels over the last 2 years work fine. From the script, lines 37-38:

----------
echo $InDev
exec 0< ${InDev} # changes input stream for while read inp below
----------

'inp' is the bash variable that holds the data captured from $InDev when it comes in, and is supposedly empty/null at that point.
Started without the daemonizing '&' as a line terminator:

----------------------
[root@coyote libexec]# /usr/local/libexec/cocod /dev/ttyUSB1 Brother-HL2140
/dev/ttyUSB1
/usr/local/libexec/cocod: line 38: /dev/ttyUSB1: Input/output error
----------------------

So $InDev is valid. The device exists; this listing was obtained while booted to 2.6.31-rc4:

----------------------
[root@coyote amanda]# ls -l /dev/ttyUSB*
crw-rw---- 1 root uucp 188, 0 2009-07-27 23:06 /dev/ttyUSB0
crw-rw---- 1 root uucp 188, 1 2009-07-27 22:28 /dev/ttyUSB1
----------------------

This is also an accepted way to permanently redirect an I/O stream in bash and has been used since forever, but 2.6.31-rc4 broke it. Whatever changed that is the regression.

Cheers, Gene.

Dup of 13821 I think

Ok, then if I do a git bisect reset master; git pull, I should have a fix?

And indeed it is fixed, it's running now. Many thanks, this opne I believe, can be closed.

s/opne/one/g
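For anyone hitting a similar symptom, here is a minimal sketch, assuming bash, of a more defensive variant of the `exec 0< ${InDev}` / `while read inp` pattern quoted above. Instead of redirecting stdin for the whole script, it attaches the redirect to the loop itself, so a failed open (such as the Input/output error on line 38) makes the loop command return non-zero without running, and the caller can stop instead of spinning. The function names `listen_on` and `handle_line` are hypothetical and are not taken from the cocod script; the actual fix for this bug was the kernel change from the duplicate.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: read lines from a device (or any file) and pass each
# one to a handler, returning an error if the device cannot be opened.
# listen_on and handle_line are illustrative names, not from cocod.

handle_line() {
    # Placeholder handler; a real daemon would queue the data for printing.
    printf 'got: %s\n' "$1"
}

listen_on() {
    local dev=$1
    local inp
    # The "< $dev" redirect applies only to this while loop. If the open
    # fails (ENOENT, EIO, ...), bash prints an error, skips the loop and
    # gives it a non-zero status, which the || branch turns into a return.
    while IFS= read -r inp; do
        handle_line "$inp"
    done < "$dev" || {
        echo "listen_on: cannot read from $dev" >&2
        return 1
    }
}
```

In the daemon this could be used as `listen_on /dev/ttyUSB1 || exit 1`, so an unopenable device ends the script with a message. Unlike `exec 0<`, this leaves the script's own stdin untouched after the loop finishes.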