Bug 12272 - at random rmmod/insmod corrupts filesystem
Summary: at random rmmod/insmod corrupts filesystem
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Other
Classification: Unclassified
Component: Modules
Hardware: All
OS: Linux
Importance: P1 low
Assignee: other_modules
URL:
Keywords:
Depends on:
Blocks:
Reported: 2008-12-22 06:01 UTC by Folkert van Heusden
Modified: 2010-01-25 14:18 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.26 and 2.6.27
Tree: Mainline
Regression: No



Description Folkert van Heusden 2008-12-22 06:01:40 UTC
Latest working kernel version: 2.6.18
Earliest failing kernel version: 2.6.26
Distribution: Debian
Hardware Environment: P4 (with HT), IDE disk (PATA), 512MB ram
Software Environment: http://vanheusden.com/pyk/
Problem Description: filesystem corrupted:
[85202.195563] rtc0: alarms up to one month, y3k, hpet irqs
[85204.035802] journal_bmap: journal block not found at offset 2060 on dm-0
[85204.035818] Aborting journal on device dm-0.
[85311.242093] ext3_abort called.
[85311.242120] EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
[85311.242154] Remounting filesystem read-only
[86331.285847] EXT3-fs error (device dm-0): htree_dirblock_to_tree: bad entry in directory #24568: rec_len % 4 != 0 - offset=0, inode=4098364138, rec_len=59542, name_len=76
[89931.353863] EXT3-fs error (device dm-0): htree_dirblock_to_tree: bad entry in directory #24568: rec_len % 4 != 0 - offset=0, inode=4098364138, rec_len=59542, name_len=76
[92499.263276] attempt to access beyond end of device
[92499.263296] dm-2: rw=17, want=4177066240, limit=6004736
[92499.263311] Buffer I/O error on device dm-2, logical block 522133279
[92499.263333] lost page write due to I/O error on dm-2
[92499.263341] Aborting journal on device dm-2.
[92499.263504] ext3_abort called.
[92499.263515] EXT3-fs error (device dm-2): ext3_journal_start_sb: Detected aborted journal
[92499.263543] Remounting filesystem read-only
[93531.419902] EXT3-fs error (device dm-0): htree_dirblock_to_tree: bad entry in directory #24568: rec_len % 4 != 0 - offset=0, inode=4098364138, rec_len=59542, name_len=76
Steps to reproduce: run the pyk script, then run something like a git clone of the mainline kernel git tree, rm -rf the tree, touch /forcefsck, and reboot.
Comment 1 Roland Kletzing 2008-12-26 04:11:07 UTC
Please provide the output of the script and/or post a list of your modules.

Do you see any warnings/oopses in dmesg while running it?
Comment 2 Theodore Tso 2008-12-26 06:16:21 UTC
What does e2fsck report when you try running e2fsck on the filesystem?   Most of the errors indicate filesystem corruption which e2fsck should have complained vociferously about, and which it should have been able to fix if run manually.

I can't tell how big the filesystem is (I'd need the output of dumpe2fs /dev/hdXXX to determine that), but this:

[92499.263311] Buffer I/O error on device dm-2, logical block 522133279

indicates either a hardware error or a corrupted journal inode. In the latter case, e2fsck would have detected the problem and offered to fix it. In the former case, this isn't a kernel bug but rather a hardware problem....
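Ted's point about needing dumpe2fs can be made concrete: the `Block count:` and `Block size:` fields in the header printed by `dumpe2fs -h` bound the valid filesystem block numbers. The following Python sketch parses such a header and checks whether a logged block number falls inside the filesystem. The function names are hypothetical, and the sketch assumes the usual "Field name:  value" layout of that header.

```python
import re
from typing import Tuple

# Matches e.g. "Block count:              750592". The pattern is anchored at the
# start of the line, so "Reserved block count:" is not picked up by mistake.
_FIELD = re.compile(r"^(Block count|Block size):\s+(\d+)\s*$")

def parse_block_geometry(header: str) -> Tuple[int, int]:
    """Return (block_count, block_size) from `dumpe2fs -h`-style text.

    Raises KeyError if either field is missing.
    """
    fields = {}
    for line in header.splitlines():
        m = _FIELD.match(line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields["Block count"], fields["Block size"]

def block_in_range(logical_block: int, header: str) -> bool:
    """True if the filesystem block number lies inside the filesystem."""
    block_count, _ = parse_block_geometry(header)
    return 0 <= logical_block < block_count
```

With a saved header, `block_in_range(522133279, header_text)` would immediately show whether the block from the log can belong to the filesystem at all.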
Comment 3 Folkert van Heusden 2008-12-28 06:47:35 UTC
(In reply to comment #1)
> please provide the output of the script and/or post a list of your modules. 
> do you see any warnings/oopses in dmesg while running it ?

No oopses, no warnings.
The only odd messages I see are:
[  487.377387] bio too big device hda5 (8 > 0)
[  487.377860] bio too big device hda5 (8 > 0)

I will reboot tomorrow to see what filesystem errors there are.
Comment 4 Folkert van Heusden 2009-01-05 11:52:04 UTC
(In reply to comment #1)
> please provide the output of the script and/or post a list of your modules. 

The system starts with the following modules:
ac                      3264  0 
battery                 6272  0 
ipv6                  234724  12 
loop                   12812  0 
snd_intel8x0           26332  0 
snd_ac97_codec         89220  1 snd_intel8x0
i2c_i801                8336  0 
ac97_bus                1728  1 snd_ac97_codec
snd_pcm                63108  2 snd_intel8x0,snd_ac97_codec
i2c_core               20692  1 i2c_i801
snd_timer              18056  1 snd_pcm
snd                    45828  4 snd_intel8x0,snd_ac97_codec,snd_pcm,snd_timer
soundcore               6528  1 snd
snd_page_alloc          7400  2 snd_intel8x0,snd_pcm
floppy                 47812  0 
pcspkr                  2432  0 
iTCO_wdt                9668  0 
rng_core                4004  0 
parport_pc             22660  0 
parport                31180  1 parport_pc
shpchp                 25204  0 
pci_hotplug            23680  1 shpchp
container               3488  0 
button                  6096  0 
intel_agp              22844  1 
agpgart                29800  1 intel_agp
evdev                   8416  0 
joydev                  8608  0 
ext3                  106024  6 
jbd                    40820  1 ext3
mbcache                 7268  1 ext3
dm_mirror              15264  0 
dm_log                  8516  1 dm_mirror
dm_snapshot            15140  0 
dm_mod                 46696  16 dm_mirror,dm_log,dm_snapshot
ide_cd_mod             27172  0 
ide_disk               10592  3 
cdrom                  30016  1 ide_cd_mod
piix                    5864  2 
ide_core               84468  3 ide_cd_mod,ide_disk,piix
usbhid                 36000  0 
hid                    33792  1 usbhid
ff_memless              4456  1 usbhid
ata_generic             4676  0 
libata                144480  1 ata_generic
scsi_mod              130412  1 libata
dock                    8368  1 libata
e1000                 104708  0 
ehci_hcd               29132  0 
uhci_hcd               18864  0 
usbcore               120176  4 usbhid,ehci_hcd,uhci_hcd
thermal                15388  0 
processor              33516  1 thermal
fan                     4356  0 
thermal_sys            10760  3 thermal,processor,fan


> do you see any warnings/oopses in dmesg while running it ?

No. After a short while I see the following output:

[  209.038312] attempt to access beyond end of device
[  209.038377] dm-0: rw=0, want=1279882228, limit=565248   
[  209.038458] Buffer I/O error on device dm-0, logical block 639941113
[  209.038525] attempt to access beyond end of device
[  209.038585] dm-0: rw=0, want=5069976612, limit=565248
[  209.038643] Buffer I/O error on device dm-0, logical block 2534988305
[  209.038710] attempt to access beyond end of device
[  209.038763] dm-0: rw=0, want=2559708832, limit=565248
[  209.038816] Buffer I/O error on device dm-0, logical block 1279854415
[  209.038873] attempt to access beyond end of device
[  209.038936] dm-0: rw=0, want=877454918, limit=565248
[  209.038989] Buffer I/O error on device dm-0, logical block 438727458
[  209.039049] attempt to access beyond end of device
[  209.039102] dm-0: rw=0, want=616859760, limit=565248
[  209.039188] attempt to access beyond end of device
[  209.039242] dm-0: rw=0, want=1279882228, limit=565248
[  209.039294] Buffer I/O error on device dm-0, logical block 639941113
[  209.039350] attempt to access beyond end of device
[  209.039403] dm-0: rw=0, want=5069976612, limit=565248
[  209.039465] Buffer I/O error on device dm-0, logical block 2534988305
[  209.039522] attempt to access beyond end of device
[  209.039575] dm-0: rw=0, want=2559708832, limit=565248
[  209.039629] Buffer I/O error on device dm-0, logical block 1279854415
[  209.039694] attempt to access beyond end of device
[  209.039747] dm-0: rw=0, want=877454918, limit=565248
[  209.039799] Buffer I/O error on device dm-0, logical block 438727458
[  209.041515] processor: Unknown symbol thermal_cooling_device_register
[  209.043820] processor: Unknown symbol thermal_cooling_device_unregister
[  209.110502] attempt to access beyond end of device
[  209.110571] dm-0: rw=0, want=1279882228, limit=565248
[  209.110656] Buffer I/O error on device dm-0, logical block 639941113
[  209.110723] attempt to access beyond end of device
[  209.110785] dm-0: rw=0, want=5069976612, limit=565248
[  209.110840] Buffer I/O error on device dm-0, logical block 2534988305
[  209.110908] attempt to access beyond end of device
[  209.110963] dm-0: rw=0, want=2559708832, limit=565248
[  209.111017] attempt to access beyond end of device
[  209.111073] dm-0: rw=0, want=877454918, limit=565248

The modules then loaded are:

floppy                 47812  0 
i2c_i801                8336  0 
snd_intel8x0           26332  0 
button                  6096  0 
dm_snapshot            15140  0 
usbhid                 36000  0 
shpchp                 25204  0 
pci_hotplug            23680  1 shpchp
netconsole              7360  0 
configfs               21944  2 netconsole
ipv6                  234724  12 
snd_ac97_codec         89220  1 snd_intel8x0
ac97_bus                1728  1 snd_ac97_codec
snd_pcm                63108  2 snd_intel8x0,snd_ac97_codec
i2c_core               20692  1 i2c_i801
snd_timer              18056  1 snd_pcm
snd                    45828  4 snd_intel8x0,snd_ac97_codec,snd_pcm,snd_timer
soundcore               6528  1 snd
snd_page_alloc          7400  2 snd_intel8x0,snd_pcm
iTCO_wdt                9668  0 
parport_pc             22660  0 
parport                31180  1 parport_pc
intel_agp              22844  1 
agpgart                29800  1 intel_agp
ext3                  106024  6 
jbd                    40820  1 ext3
mbcache                 7268  1 ext3
dm_mirror              15264  0 
dm_log                  8516  1 dm_mirror
dm_mod                 46696  16 dm_snapshot,dm_mirror,dm_log
ide_disk               10592  3 
piix                    5864  4294967295 
ide_core               84468  2 ide_disk,piix
hid                    33792  1 usbhid
ff_memless              4456  1 usbhid
libata                144480  0 
scsi_mod              130412  1 libata
dock                    8368  1 libata
e1000                 104708  0 
ehci_hcd               29132  0 
uhci_hcd               18864  0 
usbcore               120176  4 usbhid,ehci_hcd,uhci_hcd
Comment 5 Folkert van Heusden 2009-01-05 12:11:17 UTC
After that I tried creating a file in each filesystem. When I hit /var I got the following dmesg error:

[  335.006864] journal_bmap: journal block not found at offset 524 on dm-0
[  335.006927] Aborting journal on device dm-0.
[  353.624981] ext3_abort called.
[  353.625058] EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
[  353.625211] Remounting filesystem read-only

After that no file was accessible. E.g.:
debian:/home/folkert# umount /var
bash: /bin/umount: cannot execute binary file
Comment 6 Roland Kletzing 2009-01-05 13:50:13 UTC
The Perl script does modprobe and rmmod in a loop. Typically it's insmod/rmmod or modprobe/modprobe -r.

I don't know, but I would try modprobe -r instead of rmmod, just to see if it makes a difference.

Furthermore, is there any chance to dig out whether there are one or more "offending" modules, i.e. can you find out if this still happens with the right modules excluded?

If you wrote the Perl script and the shell script yourself, I think you have some programming skills and can work out a strategy to find out which module is causing this issue. (I'd bisect the lsmod output appropriately and run that against pyk-perl.mod.)
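Roland's bisection suggestion can be sketched as follows. This is a minimal illustration, not part of the pyk scripts: it assumes a hypothetical `reproduces(modules)` callback that runs the load/unload loop restricted to the given modules and reports whether corruption appeared.

```python
from typing import Callable, List

def bisect_modules(modules: List[str],
                   reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Halve the candidate set while one half still reproduces the bug.

    If neither half reproduces on its own, the bug probably needs modules
    from both halves, so the current set is returned unchanged.
    """
    candidates = list(modules)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left, right = candidates[:mid], candidates[mid:]
        if reproduces(left):
            candidates = left
        elif reproduces(right):
            candidates = right
        else:
            break  # interaction across the halves: plain bisection stops here
    return candidates
```

The `break` branch matters: if the problem only shows up when modules from both halves are cycled together, plain halving cannot narrow it further, and a strategy that also tries removing pieces of the set is needed.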
Comment 7 Folkert van Heusden 2009-01-06 14:21:41 UTC
I tried my script with only these modules:
ide_cd_mod
ide_disk
cdrom
piix
ide_core
ata_generic
libata
scsi_mod
dock
I also tried it with all modules except the ones listed above, and tried it with only the USB modules and without the USB modules.
The problem arises only when running the script with all modules.
Comment 8 Roland Kletzing 2009-01-06 15:57:26 UTC
So it does NOT happen with some modules excluded, and it also does NOT happen when just trying the excluded modules?

That's weird.
Comment 9 Folkert van Heusden 2009-01-16 01:48:02 UTC
This problem is also reproducible with 2.6.28.
Comment 10 Folkert van Heusden 2009-01-16 11:37:39 UTC
Theodore,

> I can't tell how big the filesystem is (I'd need the output of dumpe2fs
> /dev/hdXXX) to detect that, but this:
> [92499.263311] Buffer I/O error on device dm-2, logical block 522133279

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          31      248976   83  Linux
/dev/hda2              32        4865    38829105    5  Extended
/dev/hda5              32        4865    38829073+  8e  Linux LVM

38829073 < 522133279, so it tries to reach beyond the physical boundaries of the disk.
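The numbers in these messages can be cross-checked with a little arithmetic. The sketch below assumes (an assumption, not something stated in the thread) that `want` and `limit` in the "attempt to access beyond end of device" message are counted in 512-byte sectors and that `want` is the sector just past the end of the failed request. Under that reading, the logged block numbers imply 4 KiB filesystem blocks on dm-2 and 1 KiB blocks on dm-0, and the dm-2 request ends about 2.1 TB into a logical volume of about 3.1 GB, on a disk of about 40 GB. The conclusion that the access is out of bounds therefore holds regardless of which units the partition table uses.

```python
import re
from typing import Tuple

SECTOR = 512  # assumption: 'want' and 'limit' are counted in 512-byte sectors

def parse_beyond_end(line: str) -> Tuple[str, int, int]:
    """Extract (device, want, limit) from a line like 'dm-2: rw=17, want=..., limit=...'."""
    m = re.search(r"(\S+): rw=\d+, want=(\d+), limit=(\d+)", line)
    if m is None:
        raise ValueError("not an 'attempt to access beyond end of device' detail line")
    return m.group(1), int(m.group(2)), int(m.group(3))

def implied_block_size(want: int, logical_block: int) -> int:
    """Block size in bytes, assuming want == (logical_block + 1) * sectors_per_block."""
    sectors_per_block, rem = divmod(want, logical_block + 1)
    if rem:
        raise ValueError("want is not a whole number of blocks")
    return sectors_per_block * SECTOR

# Numbers copied from the dm-2 messages in the original report.
dev, want, limit = parse_beyond_end("[92499.263296] dm-2: rw=17, want=4177066240, limit=6004736")
print(dev, implied_block_size(want, 522133279))  # dm-2 4096
print(want * SECTOR, limit * SECTOR)             # 2138657914880 3074424832
```

The dm-0 lines from comment 4 fit the same pattern. For example, `implied_block_size(1279882228, 639941113)` gives 1024.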

> Indicates either a hardware error, or a corrupted journal inode.

Possible, as one of the errors that pops up is aborted journals, with journals being taken offline and whatnot.

> In the latter case, e2fsck would have detected the problem, and offered to 
> fix it.

e2fsck finds millions of issues. I always give fixing a try, but after minutes of pressing Enter to answer fsck's questions I give up and reinstall Debian (takes 15 minutes).
Comment 11 Folkert van Heusden 2009-01-16 13:24:11 UTC
I cannot reproduce the problem with 2.6.26 anymore, but it is very easy to reproduce with 2.6.27 and 2.6.28.
Comment 12 Theodore Tso 2009-01-16 18:23:08 UTC
It's still not clear to me what you do to trigger the corruption.  What modules, specifically, are you removing and inserting?   Can you narrow it down to a single module?

The messages

[  487.377387] bio too big device hda5 (8 > 0)
[  487.377860] bio too big device hda5 (8 > 0)

... indicate that the block queue data structure has been corrupted (since queue->max_hw_sectors should never be zero). Bottom line, it sounds like *some* module is causing random memory corruption, leading to the kernel malfunctioning. The key is figuring out which kernel module or modules are involved.
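The reasoning about `max_hw_sectors` can be illustrated with a simplified Python model of the size check that produces this message. This is a sketch, not the kernel source. It assumes the block layer rejects a bio whose size in 512-byte sectors exceeds the queue's `max_hw_sectors` limit, and that it prints both values in the order shown in the log. A 4 KiB bio is 8 sectors, so "(8 > 0)" can only appear if the limit itself reads as zero. The `Queue` class and `check_bio` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Queue:
    max_hw_sectors: int  # per-request hardware limit, in 512-byte sectors

def check_bio(device: str, bio_bytes: int, q: Queue) -> Optional[str]:
    """Return the error message if the bio is rejected, else None (simplified model)."""
    nr_sectors = bio_bytes // 512  # assumes 512-byte sectors
    if nr_sectors > q.max_hw_sectors:
        return f"bio too big device {device} ({nr_sectors} > {q.max_hw_sectors})"
    return None

print(check_bio("hda5", 4096, Queue(max_hw_sectors=0)))    # bio too big device hda5 (8 > 0)
print(check_bio("hda5", 4096, Queue(max_hw_sectors=128)))  # None
```

With any sane non-zero limit, an ordinary 8-sector request passes. That is why a logged limit of 0 points at a clobbered queue structure rather than at an oversized request.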
Comment 13 Roland Kletzing 2009-01-17 02:29:44 UTC
I also think it's memory corruption that leads to the filesystem issues. Please concentrate on developing a strategy to find the offending module(s).
Comment 14 Roland Kletzing 2009-01-17 02:34:15 UTC
Besides the filesystem corruption, here is another sign of the memory corruption:


the modules then loaded are:

--snipp--
ide_disk               10592  3 
piix                    5864  4294967295    <-- !!!
ide_core               84468  2 ide_disk,piix
--snipp--
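The value 4294967295 is 2^32 - 1, which is how a 32-bit counter holding -1 looks when printed as unsigned. The short Python check below assumes, as the printed number suggests, a 32-bit use count. It shows the equivalence, which is consistent with the piix use count having been decremented below zero or otherwise overwritten.

```python
import struct

def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]

print(as_signed32(4294967295))  # -1
print((0 - 1) % 2**32)          # 4294967295: decrementing 0 in 32-bit unsigned arithmetic
```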
Comment 15 Folkert van Heusden 2009-01-17 04:59:55 UTC
Yes, well, do you guys have a suggestion? Each test cycle takes at least half an hour, as I need to reinstall Debian each time.
Comment 16 Theodore Tso 2009-01-17 05:47:45 UTC
Well, the first thing I would do is optimize the test cycle. I would partition the disk so you can install a stable Debian system (I think you said you were stable with 2.6.26?) and a fixed image of your test system (using 2.6.28 or 2.6.29-rc1), using the smallest possible system that still reproduces the problem. Then copy the fixed image of the test system to the scratch partition using dd, and rig up GRUB (where the menu.conf file is on your stable system) so you can boot the scratch partition. Hopefully that way you can cut your test cycle down to 5-10 minutes.
Comment 17 Folkert van Heusden 2009-01-17 06:12:28 UTC
I tried writing the modules that got rmmodded and insmodded to a file. But since writing to /root or any other ext3 filesystem fails massively, I inserted a memory stick and did a sync after each write. This killed 2 memory sticks. So I created a vfat filesystem, since FAT is really trustworthy for this kind of trick. And now I've got a list of modules! Unfortunately, I forgot to write to the file whether each one got insmodded or rmmodded. Luckily it'll be another 1.5 hours before $girlfriend gets here, so I'll try again.
...
Here's the list of commands performed:
modprobe Module
rmmod shpchp
rmmod piix
rmmod loop
rmmod dm_log
rmmod ide_generic
modprobe snd_ac97_codec
modprobe scsi_mod
rmmod dm_snapshot
rmmod ide_gd_mod
rmmod ide_core
rmmod dm_snapshot
modprobe agpgart
rmmod i2c_i801
rmmod pci_hotplug
rmmod fan
rmmod i2c_core
modprobe Module
modprobe parport
modprobe dm_mirror
modprobe thermal_sys
modprobe snd_timer
modprobe snd_ac97_codec
modprobe nls_base
modprobe scsi_mod
rmmod snd_intel8x0
rmmod snd_intel8x0
modprobe i2c_core
rmmod snd_timer
modprobe sg
modprobe jbd
modprobe ide_core
modprobe uhci_hcd
rmmod usbcore
modprobe pci_hotplug
rmmod loop
modprobe snd_ac97_codec
rmmod processor
modprobe vfat
modprobe parport_pc
modprobe snd_intel8x0
rmmod dm_region_hash
rmmod thermal_sys
modprobe loop
modprobe sd_mod
modprobe ext3
rmmod ext3
rmmod hid
rmmod loop
rmmod ehci_hcd
modprobe evdev
rmmod iTCO_wdt
rmmod nls_cp437
modprobe dm_region_hash
modprobe shpchp
modprobe ac97_bus
modprobe snd_pcsp
modprobe ehci_hcd
modprobe rng_core
modprobe evdev
rmmod dm_log
modprobe ext3
rmmod Module
rmmod rng_core
modprobe thermal_sys
modprobe shpchp
modprobe snd_pcsp
rmmod fan
modprobe ac97_bus
modprobe ac97_bus
modprobe thermal_sys
modprobe fat
modprobe dm_mod
rmmod snd_pcsp
rmmod fan
modprobe piix
modprobe snd_timer
modprobe ide_generic
modprobe usbcore
modprobe thermal
modprobe ac97_bus
modprobe mbcache
rmmod fan
rmmod usb_storage
rmmod dm_mirror
modprobe button
rmmod sd_mod
rmmod dm_log
rmmod i2c_core
rmmod dm_mod
rmmod snd_timer
modprobe vfat
modprobe ac97_bus
modprobe vfat
modprobe vfat
rmmod sd_mod
rmmod nls_cp437
modprobe nls_cp437
rmmod soundcore
modprobe jbd
rmmod ide_gd_mod
modprobe ac97_bus
rmmod parport_pc
modprobe i2c_i801
modprobe dm_log
rmmod sr_mod
rmmod intel_agp
rmmod sd_mod
modprobe rng_core
modprobe piix
modprobe crc_t10dif
rmmod ext3
modprobe soundcore
rmmod ide_generic
modprobe i2c_i801
rmmod ac
modprobe dm_mod
modprobe crc_t10dif
rmmod snd_page_alloc
rmmod sd_mod
modprobe ide_cd_mod
rmmod snd_page_alloc
modprobe ac
rmmod snd_page_alloc
modprobe evdev
modprobe ide_gd_mod
modprobe ide_generic
modprobe container
modprobe i2c_core
modprobe agpgart
modprobe sr_mod
rmmod ide_cd_mod
rmmod loop
modprobe jbd
rmmod i2c_i801
rmmod shpchp
rmmod shpchp
modprobe ide_gd_mod
rmmod usb_storage
rmmod libata
rmmod evdev
modprobe snd_timer
rmmod nls_base
modprobe snd_pcm
rmmod ac
modprobe thermal
rmmod snd
modprobe snd_ac97_codec
rmmod uhci_hcd
rmmod dm_mod
modprobe ac
rmmod thermal_sys
modprobe agpgart
rmmod ata_generic
modprobe jbd
modprobe hid
rmmod i2c_i801
modprobe ide_core
modprobe evdev
modprobe ext3
modprobe battery
rmmod agpgart
modprobe snd_pcsp
rmmod i2c_core
rmmod ata_generic
modprobe usbhid
modprobe evdev
rmmod ac
rmmod hid
modprobe container
rmmod vfat
modprobe ide_generic
rmmod sd_mod
rmmod piix
rmmod ipv6
modprobe snd_timer
modprobe iTCO_wdt
rmmod processor
modprobe ide_cd_mod
modprobe ehci_hcd
rmmod sr_mod
rmmod shpchp
rmmod snd_pcm
modprobe container
modprobe i2c_i801
modprobe ata_generic
modprobe snd_pcm
modprobe snd_timer
modprobe ide_gd_mod
modprobe ext3
modprobe sg
modprobe nls_cp437
rmmod nls_cp437
rmmod jbd
rmmod i2c_i801
rmmod piix
rmmod thermal_sys
rmmod mbcache
modprobe nls_utf8
rmmod thermal
modprobe hid
rmmod snd_page_alloc
modprobe nls_base
modprobe dm_log
modprobe joydev
rmmod dm_mirror
modprobe ide_gd_mod
modprobe shpchp
modprobe scsi_mod
modprobe loop
modprobe ac
modprobe iTCO_wdt
rmmod container
rmmod crc_t10dif
modprobe ata_generic
rmmod hid
rmmod nls_cp437
rmmod rng_core
rmmod soundcore
rmmod dm_log
modprobe piix
modprobe loop
modprobe fan
modprobe mbcache
rmmod usbhid
modprobe crc_t10dif
modprobe soundcore
modprobe sd_mod
rmmod processor
rmmod parport
modprobe snd_intel8x0
rmmod jbd
rmmod fat
rmmod thermal_sys
modprobe usbhid
rmmod evdev
modprobe scsi_mod
rmmod Module
modprobe dm_snapshot
rmmod sr_mod
modprobe battery
rmmod parport_pc
modprobe agpgart
modprobe dm_mod
modprobe i2c_i801
modprobe ac97_bus
rmmod uhci_hcd
modprobe snd_ac97_codec
rmmod evdev
modprobe parport
modprobe snd_timer
modprobe scsi_mod
modprobe evdev
modprobe nls_base
modprobe hid
modprobe nls_base
modprobe sr_mod
rmmod ehci_hcd
modprobe snd_pcm
modprobe parport_pc
rmmod battery
modprobe container
rmmod i2c_i801
rmmod fan
modprobe sd_mod
modprobe parport_pc
rmmod i2c_core
rmmod parport_pc
rmmod fan
modprobe intel_agp
modprobe vfat
rmmod piix
rmmod fan
modprobe button
rmmod nls_cp437
rmmod hid
modprobe i2c_core
modprobe usb_storage
rmmod loop
modprobe cdrom
modprobe iTCO_wdt
rmmod thermal
rmmod dm_log
modprobe libata
modprobe i2c_i801
modprobe ide_gd_mod
modprobe rng_core
modprobe nls_base
modprobe Module
modprobe dm_mirror
modprobe intel_agp
modprobe fat
rmmod iTCO_wdt
rmmod rng_core
rmmod soundcore
rmmod ide_core
modprobe usb_storage
rmmod fat
rmmod fat
rmmod usb_storage
modprobe iTCO_wdt
rmmod nls_utf8
modprobe rng_core
rmmod jbd
rmmod usb_storage
rmmod ipv6
rmmod nls_utf8
modprobe i2c_i801
rmmod nls_base
modprobe pci_hotplug
rmmod evdev
rmmod piix
rmmod usbcore
modprobe nls_base
rmmod snd
rmmod loop
modprobe ehci_hcd
rmmod snd_pcm
rmmod dm_mod
modprobe snd_timer
rmmod i2c_core
modprobe thermal_sys
rmmod cdrom
rmmod snd_pcsp
rmmod intel_agp
rmmod Module
rmmod ide_core
modprobe shpchp
rmmod libata
rmmod dm_mirror
modprobe ide_cd_mod
rmmod snd_timer
rmmod rng_core
rmmod dm_log
modprobe uhci_hcd
modprobe agpgart
modprobe Module
modprobe parport
modprobe ide_gd_mod
modprobe snd
modprobe i2c_i801
modprobe ipv6
rmmod mbcache
modprobe snd_page_alloc
modprobe ide_core
rmmod snd
modprobe loop
rmmod i2c_core
modprobe intel_agp
rmmod nls_utf8
modprobe joydev
modprobe dm_log
rmmod nls_cp437
rmmod iTCO_wdt
rmmod rng_core
rmmod nls_cp437
rmmod usb_storage
modprobe piix
modprobe ata_generic
rmmod usb_storage
rmmod parport
rmmod fat
modprobe hid
rmmod crc_t10dif
modprobe hid
modprobe joydev
rmmod crc_t10dif
rmmod battery
rmmod nls_base
rmmod intel_agp
rmmod loop
rmmod hid
rmmod battery
rmmod dm_snapshot
rmmod dm_log
modprobe shpchp
rmmod snd_pcsp
rmmod mbcache
modprobe sg
modprobe rng_core
modprobe ide_gd_mod
rmmod ide_generic
modprobe dm_region_hash
modprobe pci_hotplug
modprobe crc_t10dif
rmmod snd_page_alloc
rmmod ac
modprobe shpchp
modprobe ata_generic
rmmod parport_pc
rmmod loop
rmmod ac
rmmod nls_cp437
rmmod button
modprobe thermal
modprobe usbcore
rmmod container
rmmod ext3
rmmod parport_pc
modprobe ac
modprobe snd_page_alloc
modprobe loop
rmmod Module
modprobe snd_intel8x0
modprobe Module
rmmod sr_mod
rmmod ipv6
rmmod rng_core
rmmod nls_utf8
rmmod i2c_core
modprobe vfat
rmmod evdev
modprobe snd_page_alloc
modprobe thermal
modprobe evdev
modprobe iTCO_wdt
rmmod snd_pcsp
rmmod sg
modprobe snd_pcm
rmmod ide_cd_mod
rmmod rng_core
modprobe sg
rmmod sr_mod
rmmod soundcore
modprobe thermal
modprobe fan
modprobe dm_mod
rmmod nls_utf8
rmmod libata
modprobe loop
rmmod nls_cp437
rmmod i2c_i801
modprobe soundcore
rmmod libata
modprobe snd
rmmod i2c_core
rmmod sr_mod
modprobe thermal
modprobe button
rmmod ide_cd_mod
rmmod hid
modprobe agpgart
modprobe snd_pcsp
modprobe evdev
modprobe dm_log
modprobe snd_intel8x0
modprobe snd_timer
rmmod Module
rmmod piix
modprobe ide_generic
modprobe libata
modprobe snd_timer

Hopefully this is of some help.
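The logging approach described at the start of this comment can be sketched in Python. This is not the pyk script, and `run_stress` and its parameters are hypothetical. The sketch records both the action and the module name. It forces each log line to storage with `os.fsync` before the command runs, so the last entry has the best chance of surviving if the machine dies during that command. By default it is a dry run. Passing `execute=lambda argv: subprocess.run(argv)` would actually run the commands (as root, at your own risk). A fixed `seed` makes the random sequence repeatable.

```python
import os
import random
import tempfile
from typing import Callable, List, Optional

def run_stress(modules: List[str], steps: int, log_path: str,
               execute: Optional[Callable[[List[str]], object]] = None,
               seed: Optional[int] = None) -> List[str]:
    """Randomly load/unload modules, logging each command before running it."""
    rng = random.Random(seed)  # same seed -> same command sequence
    commands = []
    with open(log_path, "a") as log:
        for _ in range(steps):
            argv = [rng.choice(["modprobe", "rmmod"]), rng.choice(modules)]
            line = " ".join(argv)
            log.write(line + "\n")
            log.flush()             # push Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to write it to the device
            commands.append(line)
            if execute is not None:
                execute(argv)       # only runs when a callback is supplied
    return commands

if __name__ == "__main__":
    # Dry run: write five commands to a throwaway log and show them.
    path = os.path.join(tempfile.mkdtemp(), "pyk.log")
    for cmd in run_stress(["piix", "loop", "snd_pcm"], 5, path, seed=42):
        print(cmd)
```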
Comment 18 Roland Kletzing 2009-01-17 08:13:30 UTC
Can you always reproduce the problem if you run this a second time, i.e. if you load/unload the modules in the same order as listed?
Comment 19 Theodore Tso 2009-01-17 10:24:54 UTC
Um, why are you loading and unloading so many modules? Note that unloading modules is not necessarily guaranteed to be safe. In particular with network drivers, there are often race conditions that can crash your machine if you unload a module. Part of the problem is that some kernel maintainers don't believe that it is a valid or good thing to rmmod a module, and in practice it is often impossible to make module removal race-free. Some maintainers therefore don't take even basic precautions to avoid the most obvious race problems.

So if you have something that is automatically unloading modules: don't. It's not supported. If you can narrow it down to a single module that is racy on unload, and you can reproduce it, and you politely request help from the module maintainer to fix it, they might feel magnanimous and fix it for you. But be warned: there are some maintainers (davem comes to mind) who believe so strenuously that module unloading is evil and shouldn't be supported that even if you give them a patch to fix some module unload race condition, they may not accept it.
Comment 20 Theodore Tso 2009-01-17 10:27:49 UTC
P.S. There are a few modules that I manually unload, such as ehci_hcd and uhci_hcd, for power-management reasons, but that's basically because the USB folks haven't given us better ways of turning off USB or of better managing USB's power consumption when running on batteries. But it's better to consider that you have a list of modules that can be unloaded safely, and to unload them by hand rather than with some automatic program.

P.P.S. In any case, it's pretty clear this isn't a filesystem bug.
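Ted's suggestion of a hand-maintained safe list can be sketched as a filter over `lsmod` output. Only modules that are on the list and have a use count of 0 are offered for removal, and the actual rmmod is still left to the person at the keyboard. `SAFE_TO_UNLOAD` and the function names are hypothetical. A use count of 0 is necessary but not sufficient for a safe unload, as the comments above explain. The parser skips the `Module  Size  Used by` header line that `lsmod` prints first.

```python
from typing import Dict, List, Set

# Hypothetical allowlist; which modules belong here is a per-machine judgment call.
SAFE_TO_UNLOAD: Set[str] = {"ehci_hcd", "uhci_hcd"}

def parse_lsmod(text: str) -> Dict[str, int]:
    """Map module name -> use count from `lsmod` output."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Module":  # skip the header and blank lines
            continue
        counts[fields[0]] = int(fields[2])
    return counts

def unload_candidates(text: str, safe: Set[str] = SAFE_TO_UNLOAD) -> List[str]:
    """Allowlisted modules that currently have no users, sorted by name."""
    return sorted(name for name, count in parse_lsmod(text).items()
                  if name in safe and count == 0)
```

Keeping the decision with a human, as Ted recommends, means this only produces a short list to review, not a script that unloads anything by itself.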
Comment 21 Folkert van Heusden 2009-01-17 10:40:31 UTC
(In reply to comment #18)
> can you always reproduce the problem, if you let this run a second time, i.e.
> you load/unload the modules in the same order as listed ?

Yes, most definitely.
Comment 22 Folkert van Heusden 2009-01-17 10:43:33 UTC
(In reply to comment #20)
[ .. not guaranteed that removing a module works in all cases ]
> P.P.S. In any case, it's pretty clear this isn't a filesystem bug.

Ah, OK. I thought it was supposed to work in all cases and that I had just found a bug that should not be there.
Glad it is not a filesystem bug, as I had become a little afraid to upgrade to anything more recent than 2.6.26.

Thanks
Comment 23 Roland Kletzing 2009-01-17 10:46:08 UTC
>Note that it is not necessarily guaranteed to be safe to be unloading modules. 

If it's not safe, those unsafe modules should be marked appropriately so that they cannot be unloaded. At the very least they should spit out a warning that unloading should be avoided. There are already lots of modules that cannot be unloaded at all, so it's just a matter of goodwill to mark the others appropriately.

If a module can be unloaded, unloading it is unsafe, and the module doesn't tell the user so, then I'd call that a bug.
If it DOES tell the user and the kernel crashes, then it's user error.

Why do I think this way?
I have already done lots of module testing like this user is doing, i.e. lots of automated module loading and unloading, and found the odd bug that way, but this is the first time that someone has told me this should NOT be done because there are developers who do not support it.

I think it's good that we have people like Folkert reporting such issues, because it improves kernel quality.
Comment 24 Eric Sandeen 2009-01-17 10:56:14 UTC
(In reply to comment #21)
> (In reply to comment #18)
> > can you always reproduce the problem, if you let this run a second time,
> i.e.
> > you load/unload the modules in the same order as listed ?
> 
> Yes, most definately.
> 

Then I'd try to narrow down the simplest subset of that list that reproduces the problem...
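Eric's suggestion is a test-case minimization problem. Because the problem only appeared with the full module set (comment 7), plain halving can stall. Zeller's delta debugging algorithm (ddmin) also tries removing each chunk in turn, which handles bugs that need several commands to interact. The following is a minimal sketch. It assumes a hypothetical `fails(commands)` callback that replays the given commands in order, for example on a freshly restored test image, and reports whether corruption appeared. It also assumes the full list fails to begin with.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def ddmin(commands: List[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Shrink a failing command sequence while keeping it failing (order preserved).

    Precondition: fails(commands) is True.
    """
    n = 2  # number of chunks to split into
    while len(commands) >= 2:
        size = -(-len(commands) // n)  # ceiling division
        chunks = [commands[i:i + size] for i in range(0, len(commands), size)]
        reduced = False
        # 1) Does some chunk on its own still fail?
        for chunk in chunks:
            if fails(chunk):
                commands, n, reduced = chunk, 2, True
                break
        # 2) Does the sequence still fail with some chunk removed?
        if not reduced:
            for i in range(len(chunks)):
                rest = [c for j, ch in enumerate(chunks) if j != i for c in ch]
                if fails(rest):
                    commands, n, reduced = rest, max(n - 1, 2), True
                    break
        # 3) Otherwise split more finely, or stop at single-command granularity.
        if not reduced:
            if n >= len(commands):
                break
            n = min(2 * n, len(commands))
    return commands
```

Each call to `fails` here means a full replay on a restored system, and ddmin can need a number of replays quadratic in the list length in the worst case. That is one more reason to shorten the test cycle first, as suggested in comments 16 and 28.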
Comment 25 Theodore Tso 2009-01-17 11:01:05 UTC
I agree this would be a desirable thing to do, but it takes time for this to happen, and the people who are most interested in determining which modules run into problems when unloaded frequently are the ones who need to do this testing.

I'll note there are also modules that *can* be safely unloaded, as long as you ifconfig down the interface, make sure there are no active programs accessing them, wait a few seconds (save your files just in case), and then unload, while crossing your fingers. It's sufficiently useful to unload such a driver that, even though you have to be ***very*** careful when unloading it, I'd prefer that it not be made completely impossible to remove. Note that there already is a config option to prevent rmmod from working at all, that rmmod requires root privileges, and that we do expect root to have at least some background skills....
Comment 26 Roland Kletzing 2009-08-21 19:47:15 UTC
Ted, shouldn't modules that cannot be safely unloaded either be made impossible to unload or at least be marked appropriately (print a warning on unload), so that users know they are doing it at their own risk?

If module load works, so should unload, and if unload isn't safe to use, then it's a bug and the world (i.e. the end user) should know about it. So such problematic modules should print an appropriate message on unload, IMHO.
Comment 27 Roland Kletzing 2009-08-21 19:51:48 UTC
Bullshit, double post. I'm repeating myself with something I already said months ago. I'm getting old.... :D
Can someone tell me how to delete my own post in Bugzilla?
Comment 28 Roland Kletzing 2009-08-21 20:04:51 UTC
Folkert, we'd still be interested in which module is killing your system.

So if you can provide more input here, we should try to find the offending module.

If you don't want to do that due to lack of time (which is understandable, since not everybody likes bug hunting), we should just close this bug report. Otherwise it would be just another unresolved bug, and you would get asked about its status every couple of months or so.

One more thing: does this ticket relate to this post? http://marc.info/?l=linux-kernel&m=122841015111252&w=2

You say there that the problem doesn't exist with .28rc kernels, but in this bug tracker you say you could easily trigger it with the .28 kernel.

So, can you still trigger the problem with recent .30 or .31rc kernels?

One more recommendation: since you said you were constantly killing your system and needed to reinstall, I'd move testing into a virtual machine. That way you could keep a snapshot of the working system and always revert to a working state very quickly.
Comment 29 Alan 2010-01-25 14:17:51 UTC
No response, closing old stale bug
