Bug 14311 - Kernel panic with HighPoint RocketRaid 3120 and High Memory Support 64GB enabled
Status: RESOLVED CODE_FIX
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 blocking
Assigned To: Jeff Garzik
Reported: 2009-10-03 06:46 UTC by maierp
Modified: 2012-08-08 09:42 UTC

Kernel Version: 2.6.32-rc1
Tree: Mainline
Regression: No


Attachments
Kernel config file (78.69 KB, text/plain)
2009-10-03 06:46 UTC, maierp
Image 1 from the boot process (85.47 KB, image/jpeg)
2009-10-03 06:52 UTC, maierp
Image 2 from the boot process (87.53 KB, image/jpeg)
2009-10-03 06:52 UTC, maierp
Image 3 from the boot process (86.15 KB, image/jpeg)
2009-10-03 06:52 UTC, maierp
Image 4 (last) from the boot process (82.62 KB, image/jpeg)
2009-10-03 06:53 UTC, maierp
sb600-32bit-only.patch (415 bytes, patch)
2009-10-03 08:19 UTC, Tejun Heo
Output of dmidecode from the 2.6.30 kernel with 4G HighMemorySupport (18.60 KB, text/plain)
2009-10-03 09:20 UTC, maierp
sb600-32bit-only-by-default.patch (3.48 KB, patch)
2009-10-03 09:29 UTC, Tejun Heo
Last screen from patched kernel boot (106.87 KB, image/jpeg)
2009-10-03 09:53 UTC, maierp
Kernel-logs with dmesg (41.51 KB, application/octet-stream)
2009-10-03 10:27 UTC, maierp
Output of "lspci -nn" (2.26 KB, application/octet-stream)
2009-10-03 10:27 UTC, maierp
hptiop-32bit.patch (660 bytes, patch)
2009-10-04 01:20 UTC, Tejun Heo
hptiop-no-64bit-dma.patch (898 bytes, patch)
2009-12-24 07:37 UTC, Tejun Heo
Fix rr312x 64bit dma error (2.67 KB, patch)
2009-12-25 08:04 UTC, linux

Description maierp 2009-10-03 06:46:44 UTC
Created attachment 23239
Kernel config file

Hi,
I have a HighPoint RocketRaid 3120 controller installed in my system with a RAID 0 array, and 16GB of RAM in the computer.
Kernels with High Memory Support = 4G work.
But when I build a custom kernel with High Memory Support = 64G enabled, the system crashes during boot with "Kernel panic".

I've attached the last screens of the boot process and the kernel config file.

Here is the output of lspci:
===
00:00.0 Host bridge: ATI Technologies Inc RD780 Northbridge only dual slot PCI-e_GFX and HT1 K8 part
00:02.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 port A)
00:03.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 port B)
00:0a.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port F)
00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA
00:13.0 USB Controller: ATI Technologies Inc SB600 USB (OHCI0)
00:13.1 USB Controller: ATI Technologies Inc SB600 USB (OHCI1)
00:13.2 USB Controller: ATI Technologies Inc SB600 USB (OHCI2)
00:13.3 USB Controller: ATI Technologies Inc SB600 USB (OHCI3)
00:13.4 USB Controller: ATI Technologies Inc SB600 USB (OHCI4)
00:13.5 USB Controller: ATI Technologies Inc SB600 USB Controller (EHCI)
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 14)
00:14.1 IDE interface: ATI Technologies Inc SB600 IDE
00:14.3 ISA bridge: ATI Technologies Inc SB600 PCI to LPC Bridge
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
01:00.0 VGA compatible controller: nVidia Corporation G72 [GeForce 7300 SE/7200 GS] (rev a1)
02:00.0 RAID bus controller: HighPoint Technologies, Inc. Device 3120 (rev 02)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
===

Thanks a lot for helping
Patrick
Comment 1 maierp 2009-10-03 06:52:13 UTC
Created attachment 23240
Image 1 from the boot process
Comment 2 maierp 2009-10-03 06:52:33 UTC
Created attachment 23241
Image 2 from the boot process
Comment 3 maierp 2009-10-03 06:52:50 UTC
Created attachment 23242
Image 3 from the boot process
Comment 4 maierp 2009-10-03 06:53:10 UTC
Created attachment 23243
Image 4 (last) from the boot process
Comment 5 Tejun Heo 2009-10-03 08:19:29 UTC
Created attachment 23244
sb600-32bit-only.patch

Does this patch fix the problem?  Also, can you please post the output of dmidecode?
Comment 6 maierp 2009-10-03 09:20:09 UTC
Created attachment 23245
Output of dmidecode from the 2.6.30 kernel with 4G HighMemorySupport
Comment 7 Tejun Heo 2009-10-03 09:29:19 UTC
Created attachment 23247
sb600-32bit-only-by-default.patch

Please test this one instead.
Comment 8 maierp 2009-10-03 09:53:50 UTC
Created attachment 23248
Last screen from patched kernel boot
Comment 9 Tejun Heo 2009-10-03 10:04:36 UTC
Oh, this one is not about the sb600 controller.  My bad.  Can you please post a successful kernel boot log from the 4G kernel?  Also, please post the output of "lspci -nn".  Thanks.
Comment 10 maierp 2009-10-03 10:24:42 UTC
Sorry, that patch did not work here.
I'm on vacation for a week now, so I can only restart the server when I'm back next Saturday.
I'll read this thread during the vacation and can provide output from the running system, but I can't reboot it until I'm back.
For now many thanks for the fast response.

Greets
Patrick
Comment 11 maierp 2009-10-03 10:27:11 UTC
Created attachment 23250
Kernel-logs with dmesg
Comment 12 maierp 2009-10-03 10:27:37 UTC
Created attachment 23251
Output of "lspci -nn"
Comment 13 Tejun Heo 2009-10-04 01:20:41 UTC
Created attachment 23254
hptiop-32bit.patch

Please test this patch.
Comment 14 maierp 2009-10-05 19:37:55 UTC
This patch works for me.
Thank you very much!
Comment 15 Tejun Heo 2009-10-06 05:32:02 UTC
The question, now, is whether it's the motherboard or the controller.  Any chance you can try a sil3132 or 3124 controller in the same slot?
Comment 16 maierp 2009-10-06 12:13:24 UTC
Sorry, but I've no such controller. The HighPoint is the only RAID controller I have.
Comment 17 Tejun Heo 2009-10-13 03:10:29 UTC
Eh... the problem is that I can't tell which part to blacklist.  Can you please attach the output of "dmidecode", "lspci -nnvvv" and "lspci -tnnv"?

Also, can you be persuaded into buying a sil 3132 controller and trying it in the same slot?  It'll cost between 20 and 30USD and I can pay you via paypal if you wish.

Shane, is there any known problem with 64bit DMA on these configurations?  Could we be looking at a bridge / host controller problem?

Thanks.
Comment 18 Shane Huang 2009-10-13 14:08:21 UTC
Tejun,

"dmidecode" was already provided in comment #6 by maierp.

Except for the SB600 SATA 64-bit DMA issue we discussed before:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2fcad9d27168b287e3db61f6694254e0afa32f8c
I do NOT know of any other 64-bit DMA problem on these configurations,
especially for the HighPoint RAID controller.

I cannot find such a RAID controller here, nor the GA-MA790X-DS4
board. So I'm afraid I cannot help much on this issue...

Shane
Comment 19 Tejun Heo 2009-10-13 14:31:57 UTC
Shane, thanks for the input.  One question though.  Are you sure that the SB600 ahci DMA problem is caused by the ahci controller and not by the pci host or bridge controller?

Thanks.
Comment 20 Shane Huang 2009-10-14 05:42:17 UTC
Tejun,

No, I'm not sure.
As you know, although it is related to different BIOS releases on the ASUS M2A-VM,
we did NOT find the root cause for the SB600 SATA 64-bit DMA issue; our HW
engineer told me that we didn't see any SB600 SATA 64-bit DMA design issue.

As to a potential bridge/host controller problem, after checking with other guys,
we have NOT heard of such an issue either.

So, trying a different RAID controller on the same platform should help more.
Comment 21 Tejun Heo 2009-10-14 05:50:44 UTC
Thanks for the comment, Shane.  maierp, can you please try another 64bit capable controller at that slot?
Comment 22 maierp 2009-10-21 17:29:34 UTC
Tejun,
is this a controller with the right chipset?
http://www.planet4one.de/planet/wbcdirect.php?pid=74799
(DeLock SATA II PCI Express Card, 2 Port (70137))
I think this has a SIL 3121 chipset.
Comment 23 Tejun Heo 2009-10-26 15:14:56 UTC
Yeap, that's a sil3132.
Comment 24 maierp 2009-12-10 21:28:17 UTC
Over the last few days I had time to check it again.
I booted from one hdd attached to the DeLock card with the sil3132 chip, with 64GB high memory support and 16GB RAM installed. It worked. I then created a 10GB random file and copied it 3 times to the same hdd. The files all have the same MD5 checksum. So I think this works.
With the HighPoint RocketRaid 3120, the system failed to boot.

BUT when replacing the 4x4GB RAM with 4x2GB = 8GB RAM, the RocketRaid 3120 also boots without problems with the same kernel.

The RAM is working; I ran a MemTest.

Greets
Patrick
Comment 25 Tejun Heo 2009-12-15 04:39:07 UTC
maierp, can rocketraid copy large files without error on 8GB configuration too?
Comment 26 maierp 2009-12-16 09:00:23 UTC
Yes, there were no errors. All copied files have the same md5sum.
This check was done with 2x4GB = 8GB RAM, 64GB high memory support, and kernel 2.6.32.
Comment 27 Tejun Heo 2009-12-21 08:20:10 UTC
Thanks for verifying.  Pinging hpt again.
Comment 28 linux 2009-12-21 21:38:23 UTC
(In reply to comment #27)
> Thanks for verifying.  Pinging hpt again.

Dear Tejun Heo, please visit the HighPoint website, www.highpoint-tech.com, and download the firmware package (v1.2.25.8) for the RocketRAID 3120 controller from the Support section of the website.  Let us know the results after you have finished testing.

Thank you

HighPoint
Comment 29 Tejun Heo 2009-12-22 00:01:36 UTC
maierp, can you please try the newer firmware?  Also which firmware version are you currently on?

Thanks.
Comment 30 maierp 2009-12-23 15:56:29 UTC
(In reply to comment #29)
> Also which firmware version are you currently on?

It already has this "new" firmware v1.2.25.8
Comment 31 linux 2009-12-24 02:59:03 UTC
Firmware v1.2.25.8 fixed the 64-bit DMA issue. But this firmware can't support more than 12GB of memory if 64-bit DMA is enabled.
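
As a rough illustration of that limit, the C sketch below checks an installed RAM size against the 12GB ceiling. The helper name and the constants are hypothetical, and treating installed RAM as the highest DMA address is a simplifying assumption, since memory remapping can place RAM above its nominal size.

```c
#include <stdbool.h>
#include <stdint.h>

#define GIB (1024ULL * 1024ULL * 1024ULL)

/* Assumption: the 12GB ceiling stated in this comment, treated as an
 * inclusive limit on installed RAM. Real DMA addresses can sit above the
 * RAM size because of memory remapping, so this is only an approximation. */
#define FW_64BIT_DMA_LIMIT (12ULL * GIB)

/* Hypothetical helper: true if a host with this much RAM stays within
 * the firmware's 64-bit DMA limit. */
static bool ram_within_fw_limit(uint64_t ram_bytes)
{
    return ram_bytes <= FW_64BIT_DMA_LIMIT;
}
```

With the configurations from this report, `ram_within_fw_limit(8 * GIB)` is true and `ram_within_fw_limit(16 * GIB)` is false, which matches the 8GB setup booting and the 16GB setup panicking.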
Comment 32 Tejun Heo 2009-12-24 07:37:00 UTC
Created attachment 24282
hptiop-no-64bit-dma.patch

Then, the driver shouldn't mark the device as 64bit capable because it will break on larger machines, which will become more and more common.  I guess something like this patch is in order?

Thanks.
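
The idea can be sketched in plain C as a probe-time decision about the DMA mask width. This is not the attached patch or the hptiop driver code: the function names are hypothetical, 0x3120 is taken from the "Device 3120" entry in the lspci output above, and 0x3122 is only an assumed sibling ID for the RR312x family.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check for the affected controller family.
 * 0x3120 comes from the lspci output in this report;
 * 0x3122 is an assumption made for illustration. */
static bool is_rr312x(uint16_t device_id)
{
    return device_id == 0x3120 || device_id == 0x3122;
}

/* Pick the DMA address width a driver would request for a device.
 * Affected controllers are held to 32 bits even if the hardware
 * advertises 64-bit DMA; other devices keep what they advertise. */
static unsigned int choose_dma_mask_bits(uint16_t device_id, bool hw_claims_64bit)
{
    if (is_rr312x(device_id))
        return 32;
    return hw_claims_64bit ? 64 : 32;
}

/* Convert a width in bits into an address mask. The 64-bit case is
 * handled separately because shifting a 64-bit value by 64 is undefined. */
static uint64_t dma_mask_from_bits(unsigned int bits)
{
    return bits >= 64 ? UINT64_MAX : ((1ULL << bits) - 1);
}
```

Restricting the mask per device ID rather than for every controller keeps 64-bit DMA available to devices that handle it correctly, at the cost of forcing bounce buffering for the restricted ones on large-memory hosts.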
Comment 33 linux 2009-12-25 08:04:23 UTC
Created attachment 24303
Fix rr312x 64bit dma error
Comment 34 linux 2009-12-25 08:06:32 UTC
Comment on attachment 24303
Fix rr312x 64bit dma error

Only the RR312x has the 64-bit DMA issue. Please use this patch.
Comment 35 Tejun Heo 2009-12-25 13:08:54 UTC
Looks good to me but you're the maintainer of the driver.  Can you guys please push the patch upstream and to -stable?

Thanks.
Comment 36 Florian Mickler 2012-08-04 19:07:03 UTC
A patch referencing this bug report has been merged in Linux v3.6-rc1:

commit 23f0bb47a4ec4c662b2bbf0221d6289e91b06ece
Author: HighPoint Linux Team <linux@highpoint-tech.com>
Date:   Thu Jun 14 08:47:07 2012 +0100

    [SCSI] hptiop: fix RR312x in hosts with >12GB
