Bug 9631

Summary: 2.6.24-rc6 2.6.24-rc3 qstor timeouts during probe
Product: IO/Storage Reporter: Alan Young (ayoung)
Component: Serial ATA Assignee: Mark Lord (mlord)
Status: CLOSED PATCH_ALREADY_AVAILABLE    
Severity: normal CC: bunk, mingo, mlord
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.24-rc6 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9243    
Attachments: dmesg from 2.6.24-rc2 boot
-rc3 dmesg log
-rc6 dmesg log
sata_qstor-use-hrst.patch

Description Alan Young 2007-12-24 15:17:39 UTC
Most recent kernel where this bug did not occur: 2.6.24-rc2

Distribution: fedora

Hardware Environment: AMD Athlon 64x2, Qmaster 2 port (U-30300)

Software Environment: 32 bit mode

Problem Description: Qstor fails to find all devices during boot.  There are three Maxtor 320GB drives on three of the four connectors.  The driver finds one drive, but the other two ports fail to probe correctly and error out with timeouts.

My production 2.6.22 kernel and a build of the 2.6.24-rc2 kernel both find all three devices.  Probing fails with -rc3 and -rc6.

I'll add dmesgs from the various tests.
Comment 1 Alan Young 2007-12-24 15:18:45 UTC
Created attachment 14174
dmesg from 2.6.24-rc2 boot
Comment 2 Alan Young 2007-12-24 15:19:04 UTC
Created attachment 14175
-rc3 dmesg log
Comment 3 Alan Young 2007-12-24 15:19:27 UTC
Created attachment 14176
-rc6 dmesg log
Comment 4 Tejun Heo 2008-01-01 23:50:04 UTC
Created attachment 14259
sata_qstor-use-hrst.patch

Please try this patch.
Comment 5 Ingo Molnar 2008-01-02 02:45:50 UTC
in case Tejun's patch does not do the trick:

there are 650 commits between rc2 and rc3, so you might be able to pinpoint the exact commit that causes the problem, by doing a bisection run of about 10 kernel rebuilds and reboots.
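
The estimate of about 10 rebuilds follows from halving the 650 candidate commits until one remains. A minimal sketch of that arithmetic, assuming a strictly linear history (real kernel history contains merges, so the actual count can differ); the function name bisect_steps is hypothetical:

```shell
# Count how many halvings it takes to narrow N candidate commits down to one.
# Assumes a linear history; git-bisect on a history with merges may need
# a few more or fewer steps.
bisect_steps() {
    remaining=$1
    steps=0
    while [ "$remaining" -gt 1 ]; do
        # each tested kernel rules out roughly half of the remaining commits
        remaining=$(( (remaining + 1) / 2 ))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 650   # prints 10
```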

git-bisection can take quite some time though. Here's a link that explains it:

http://kernel.org/pub/software/scm/git/docs/v1.4.4.4/howto/isolate-bugs-with-bisect.txt

here's a quickstart:

git-clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6.git
cd linux-2.6.git
git-bisect start
git-bisect good v2.6.24-rc2
git-bisect bad v2.6.24-rc3

then build the kernel that is suggested and boot into it - and check whether the
kernel worked fine or not. If the kernel worked then do this:

  git-bisect good

if it does not work then do this:

  git-bisect bad

repeat this, then after 10-15 rebuilds and reboots [ ouch! :-/ ] you'll finally
arrive at a point where git-bisect outputs a "bad commit" message to you. Paste
that result into this bugzilla. (also paste the contents of "git-bisect log" so
that we can see your bisection results)

you can then quit bisection via:

  git-bisect reset

and can utilize the git tree to track the latest upstream kernel by doing "git-pull"
in the linux-2.6.git directory.

NOTE: if you start seeing a long stream of 'good' or 'bad' bisection points,
chances are that you mis-identified one of the earlier bisection points. In that case
you can replay the 'git-bisect log' output up to the suspected mis-identification
point with 'git-bisect replay' - no need to redo the whole bisection.
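
For readers new to the procedure, the good/bad loop described in this comment can be simulated without building any kernels. This is only a sketch under stated assumptions: commits are numbered 1..N in a straight line after the known-good tag, and the hypothetical is_bad function stands in for "build the suggested kernel, boot it, and check whether all drives are found". The names is_bad and find_first_bad are illustrative and are not part of git.

```shell
# Hypothetical stand-in for the manual build/boot/check step:
# commit number $1 counts as bad if it is at or after the first bad commit $2.
is_bad() {
    [ "$1" -ge "$2" ]
}

# Find the first bad commit among commits 1..$1, given that commit 0
# (the "good" tag) works and commit $1 (the "bad" tag) fails.
# $2 is the hidden answer, consulted only through is_bad.
# Prints "<first bad commit> <number of kernels tested>".
find_first_bad() {
    good=0
    bad=$1
    tests=0
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))   # the commit suggested for testing
        tests=$((tests + 1))
        if is_bad "$mid" "$2"; then
            bad=$mid                  # corresponds to "git-bisect bad"
        else
            good=$mid                 # corresponds to "git-bisect good"
        fi
    done
    echo "$bad $tests"
}

# 650 commits between the tags, with the breakage assumed at commit 417
find_first_bad 650 417
```

A single wrong answer from is_bad sends every later step into the wrong half of the range, which is why a long run of identical 'good' or 'bad' results is a hint that an earlier point was mis-identified.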
Comment 6 Mark Lord 2008-01-02 07:12:29 UTC
git bisect would be a horrible waste of time for the bug reporter on this one.

There are only about 4 updates to sata_qstor, and Tejun has already identified the only one that could cause this problem.

Try the patch from Tejun, and report back ASAP, please!
Comment 7 Alan Young 2008-01-02 16:36:33 UTC
Tejun's patch does do the trick.  The system boots without any timeouts.  All three drives are recognized.  The drives are actually a software RAID-5 array which mdadm assembles ok.  And an fsck -f of the array's partition shows no errors.  Please let me know if you need anything else.

I'll save that git-bisect procedure away for future use. :-)

Thanks!
Comment 8 Tejun Heo 2008-01-03 07:24:00 UTC
Patch posted.  Feel free to close.  Thanks.
Comment 9 Adrian Bunk 2008-01-03 07:26:56 UTC
Please keep it open until the patch is in Linus' tree (that way more people will notice if it gets forgotten).
Comment 10 Mark Lord 2008-01-03 07:32:40 UTC
Changing status to resolved, but leaving it open until we see the patch in Linus's tree.

And.. wow.. somebody else out there actually uses a QStor card with Linux!
That makes (at least) two of us now.  :)
Comment 11 Adrian Bunk 2008-01-11 08:27:27 UTC
The fix is now commit b14dabcdb651ddd9f85c69c9042322c139e7da84 in Linus' tree.