Bug 8861 - Intel ICH9 + any version of amule = total freeze of Debian after several hours
Summary: Intel ICH9 + any version of amule = total freeze of Debian after several hours
Status: CLOSED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assignee: Greg Kroah-Hartman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-08 01:27 UTC by Guy Debord
Modified: 2007-08-21 07:29 UTC

See Also:
Kernel Version: 2.6.23-rc2
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Guy's dmesg (30.18 KB, text/plain)
2007-08-08 11:38 UTC, Adrian Bunk
From /var/log/Kern.log (59.09 KB, text/plain)
2007-08-17 01:42 UTC, Guy Debord

Description Guy Debord 2007-08-08 01:27:34 UTC
Also happened with 2.6.22. Kernel 2.6.21 was not really compatible with ICH9 and G33 anyway.

Distribution: Debian Lenny
Hardware Environment: Seasonic S12 power supply (I tried another one before: it changed nothing) - Asus P5K-VM motherboard - 4 GB RAM - Core2Duo - D-Link DFE-530TX ethernet card - SATA hard drives. I do not use the integrated ethernet chip because it works very badly at the fsb speed of 400 I now use. I have no other problem at this fsb speed: memtest all night long reports zero errors, fsck of all my partitions finds no problems, and Ktorrent runs for days without problems.
Software Environment: KDE 3.5.7 - Amule
Problem Description: Debian totally freezes and I have to reboot several hours after starting amule. This did not happen with the exact same software and the same files on the exact same hard drive with my previous motherboard, an Asus P5B-VM. I purged amule and installed other versions, but Debian still freezes. The freezes come 8 times faster if I use TCP (eDonkey protocol) than if I use only UDP (Kademlia protocol), and they also come faster with many active downloads than with only a few. I talked with the amule developers and we agreed that the problem does not come from amule.

Steps to reproduce: use an Asus P5K-VM motherboard (or a motherboard with Intel ICH9 maybe), launch amule, download 20 or more big files, and wait.
Comment 1 Adrian Bunk 2007-08-08 09:56:11 UTC
- are there BIOS updates for your motherboard available?
- 2.6.23-rc2 still fails?
- please attach the output of "dmesg -s 1000000" to this bug
- can you describe what "it works very badly at the fsb speed of 400 I now use" is?
- "totally freezes" = even the magic SysRq key no longer works?
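
For the last question it helps to know beforehand whether the magic SysRq key is enabled at all on the running kernel. A minimal sketch in Python, assuming the usual /proc/sys/kernel/sysrq interface is present; the function name describe_sysrq is hypothetical and only used for illustration:

```python
from pathlib import Path


def describe_sysrq(path="/proc/sys/kernel/sysrq"):
    """Return a short description of the kernel's SysRq setting.

    Per the kernel's sysrq documentation, 0 disables the SysRq key,
    1 enables all functions, and larger values are a bitmask of the
    functions that are allowed.
    """
    value = int(Path(path).read_text().strip())
    if value == 0:
        return "disabled"
    if value == 1:
        return "all functions enabled"
    return f"partially enabled (bitmask {value})"
```

Calling describe_sysrq() before starting the test tells you whether a dead SysRq key during the freeze means anything: if the setting is "disabled", the key would not have worked even on a healthy system.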
Comment 2 Guy Debord 2007-08-08 11:34:41 UTC
Hello Adrian!

Thank you for your answer!

- I know there is at least one bios update available. I am going to install it in the next few hours, so I should have an update on my situation within 30 hours.

- Yes, 2.6.23-rc2 still fails with amule after some hours. It happened again several times.

- The integrated ethernet chip (Marvell 88E8056) works only when it wants to at fsb 400. The internet connection does not last more than a few minutes, and even typing dhclient in a console to get the connection working again only works at random.

- Yes, totally freezes means all the famous combinations with the SysRq key do not work.

Here is the .txt file containing the result of dmesg -s 1000000:
http://www.keepmyfile.com/download/7d52111793285
Comment 3 Adrian Bunk 2007-08-08 11:38:32 UTC
Created attachment 12323 [details]
Guy's dmesg

Please always attach data here at the Bugzilla.
Comment 4 Guy Debord 2007-08-08 12:07:51 UTC
Sorry, I did not see it was possible to attach files here.
I updated the bios of my Asus P5K-VM and I started amule. Now I am waiting.
So far, the only change I saw is the Core2Duo temperature being 10°C lower.
Comment 5 Guy Debord 2007-08-09 02:46:32 UTC
I got a total freeze with the new bios too, after 13 hours of amule activity.
I get freezes whether I am in front of my PC or not.
The freeze also happens when I download 0 files (0 files in the Temp folder), just by sharing files.
Now I brought the fsb down to 266 to see what happens.
Comment 6 Guy Debord 2007-08-09 22:49:43 UTC
Same freeze at fsb 266, after less than 24 hours, as usual.
Comment 7 Adrian Bunk 2007-08-11 12:49:31 UTC
The dmesg you attached was from 2.6.23-rc2?

I don't have any real clue, just suggestions that might help with finding the source of the problem.

Does enabling as many debugging options as possible (in the "kernel hacking" menu when configuring the kernel) result in any information?

Does booting with "acpi=off noapic" make a difference?

Please try to narrow down the source of the problem a bit:
- does it still happen if all amule files are on a RAM filesystem?
- do you have a spare network card you can try instead of the D-Link one?
- does it help to use a partition with a different filesystem?
Comment 8 Guy Debord 2007-08-11 14:53:59 UTC
Thank you for helping me Adrian!

Yes the dmesg was from 2.6.23-rc2.

Setting apic to disabled in the bios results in a catastrophic boot sequence, with several error messages.

I will try to answer your other questions after some googling, tomorrow or on Monday since I have a lot to do tomorrow.
Comment 9 Guy Debord 2007-08-15 04:37:21 UTC
I put the /home/me/.aMule directory in tmpfs (/dev/shm), and thanks to a link in my home directory amule worked all right. I verified that it was really in RAM: I saw the usage of /dev/shm grow and I kept it under 100% (under 2 GB). Still, I ended up with a total freeze of my Debian (when /dev/shm was around 50% full), after less than 24 hours, as usual.
I also noted that by closing amule regularly, every 8 hours, my Debian lasted for 3 days without freeze, so it is clear that closing amule resets the cumulative problem to zero.
I am now trying amule with the integrated Marvell 88E8056 ethernet chip of my P5K-VM; I had to go back to kernel 2.6.21 to do that because it does not work with kernels 2.6.22 and 2.6.23-rc2. I must write to Stephen Hemminger about that. I have now noticed that the connection did not last more than 30 minutes. With more recent kernels it lasts only 5 minutes. Very strange problem. So testing amule with this integrated ethernet chip may not be a good solution, which means I should buy another ethernet card...

Tomorrow I will enable more debugging options in the kernel, and I will try a partition with a different filesystem.
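
The manual check described in this comment, keeping /dev/shm below 100% while amule runs from RAM, could be automated along these lines. This is only a sketch: the helper names usage_percent and is_below_limit and the 90% default limit are assumptions for illustration, not something used in the test above.

```python
import shutil


def usage_percent(path="/dev/shm"):
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Avoid dividing by zero on filesystems that report no size.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_below_limit(path="/dev/shm", limit_percent=90.0):
    """Return True while the filesystem is below the chosen fill limit."""
    return usage_percent(path) < limit_percent
```

Logging usage_percent() with a timestamp every few minutes would also record how full the tmpfs was at the last point before a freeze, instead of relying on memory.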
Comment 10 Guy Debord 2007-08-15 15:19:41 UTC
I removed my PCI ethernet card, and this time my integrated Marvell ethernet card worked long enough for me to witness a total freeze an hour after starting amule.
So the problem does not come from the ethernet chips.
Comment 11 Adrian Bunk 2007-08-15 18:24:32 UTC
Greg, can you look at this bug?

Reproducible complete freeze after some time of amule usage.

Reproduced with data in tmpfs and reproduced with two different NICs.

Perhaps some kind of PCI problem?

dmesg is in comment #3.

TIA
Comment 12 Guy Debord 2007-08-15 23:56:12 UTC
I had another freeze during the night with the integrated NIC (with kernel 2.6.23-rc2; the previous one with this same integrated NIC was with kernel 2.6.22), and this time I saw there was no upload or download in amule at the time of the freeze. The files I was trying to download had a dozen sources but amule was not downloading from these sources at the time, there were zero clients in the upload section, and the global download and upload speeds in amule were 0.
Comment 13 Guy Debord 2007-08-17 01:42:31 UTC
Created attachment 12415 [details]
From /var/log/Kern.log
Comment 14 Guy Debord 2007-08-17 02:08:45 UTC
There were new error messages in kern.log (see the attachment in the previous message) when I was using the integrated Marvell NIC, probably linked to the two total freezes of Debian I experienced...without using amule. It never happened with the D-Link NIC when I had it in the PCI slot. When I use one NIC I remove the other (physically or in the bios).
I went back to my PCI D-Link NIC because with the Marvell NIC my internet connection gets disconnected all the time...which does not happen with the latest Knoppix 5.2, which uses an old kernel 2.6.19.5: it recognizes neither my LCD screen, nor my integrated sound chip, nor my harddisks, but internet runs smoothly for hours with the Marvell NIC.
Going back to my D-Link PCI NIC, I noticed that once in KDE, it took at least 5 minutes for internet to work. At first, dhclient in a root console did not work. I unplugged my modem and ran dhclient again after 2 minutes. Then some 2 minutes later internet worked. I have seen that before.

With this Asus P5K-VM motherboard I experience the same kind of annoying bug I had with the P5B-VM for months before it got resolved by a recent kernel: at one point Debian believes there is an audio CD in my empty Benq DVD-burner, and then I cannot use this DVD-burner anymore, unless I reboot...and I have to use it soon after getting into KDE.

I am now using KMLDonkey instead of amule to see if I will experience the same total freezes of my Debian.

I also had this error a few times in kern.log:

attempt to access beyond end of device
sda5: rw=0, want=134230088, limit=19534977

Each time only the value for want= changed.
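
To compare these messages across occurrences, the numbers can be pulled out of kern.log automatically. A minimal sketch, assuming want and limit are counted in 512-byte sectors (the block layer's usual unit); the names SECTOR_BYTES and parse_beyond_end are hypothetical:

```python
import re

# Assumption: want/limit in this message are 512-byte sectors.
SECTOR_BYTES = 512

PATTERN = re.compile(
    r"(?P<dev>\w+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def parse_beyond_end(line):
    """Parse one 'want=..., limit=...' line; return None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    want = int(match.group("want"))
    limit = int(match.group("limit"))
    return {
        "device": match.group("dev"),
        "want": want,
        "limit": limit,
        # How far past the end of the device the request reached.
        "overshoot_sectors": want - limit,
        # Device size implied by the limit, under the sector assumption.
        "limit_bytes": limit * SECTOR_BYTES,
    }
```

For the line quoted above, the requested sector is almost seven times the limit. That would point to a badly wrong block number rather than a request that is slightly off the end, though this alone does not say where the bad value came from.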

I wonder if my problems may come from my Corsair CAS5 memory. It is not in the list of the memory modules recommended by Asus for this motherboard. The CAS4 version is, though, but uses entirely different ram chips. I had no problem with that ram with my P5B-VM motherboard, and I did a memtest yesterday again, and still no errors on my current P5K-VM motherboard. But after some googling I read that someone, not with the same motherboard as mine, had lots of problems that were solved by replacing his ram modules (that showed no errors in memtest) with recommended ones. I hesitate to spend some more money on that problematic motherboard. I may rather go back to a P5B-VM, instead of experiencing problems for many months.
Comment 15 Guy Debord 2007-08-17 22:46:33 UTC
I experienced a total freeze of my Debian by using KMLDonkey instead of amule, after less than 24 hours, as usual.
Comment 16 Guy Debord 2007-08-21 07:28:31 UTC
The problem seems to have solved itself after a software update I installed with aptitude a couple of days ago; I do not know which one.
amule crashed without crashing my Debian.
And amule has now been running for more than 24 hours.
Sorry for taking some of your time!
