Bug 16219
I have made a mistake in the bug description and cannot edit it :-( Both HDs in 'System2 (old computer)' have MBR partition tables! Please note:

  System1 (64 bits): 2 HDs with GUID partition tables
  System2 (32 bits): 2 HDs with MBR partition tables

I have always had problems with 'System1 (new computer)', so I was not sure whether it had a hardware or a software problem. I needed to find out in order to make use of the manufacturer (motherboard?) guarantee in case of a hardware problem. I suspected the problem was in hard drive access and decided to use the minimum software necessary to test it:

A stresslinux live CD created from the ISO images (http://www.stresslinux.org/sl/downloads):
  64-bit live CD: stresslinux_64bit.x86_64-0.5.113.iso.bz2, kernel 2.6.31.12.0.2-default
  32-bit live CD: stresslinux_32bit.i686-0.5.113.iso.bz2, kernel 2.6.31.12.02-pae

aio-stress, a program that tests random writes to HDs and is used by phoronix-test-suite. I have used version 0.21 of aio-stress compiled for x86_64 and x86_32. I have made attachments in case anyone wants to repeat my tests: aio-stress-bin_64bits & aio-stress-bin_32bits (rename to aio-stress-bin).

I have used a shell script to run 'aio-stress-bin -s 1g -r 64k -t 3 -o 2' (the same invocation phoronix-test-suite uses: random writes to a 1 GB file) 5 times. Also attached: aio-stress (the shell script).

aio-stress gives two numbers per run. I guess the first refers to the first time the 1 GB file is written and the second to the subsequent random writes to it.

Here are the results of my tests (each disk was tested in both systems):

System1, 64-bit live CD:
  /dev/sda of System1 (SATA 2 TB, GUID partition table, ReiserFS):   14.33 MB/s; totals:  6.34 MB/s
  /dev/sda of System2 (SATA 4.7 GB, MBR partition table, ReiserFS): 144.20 MB/s; totals: 20.77 MB/s

System1, 32-bit live CD:
  /dev/sda of System1 (SATA 2 TB, GUID partition table, ReiserFS):    2.01 MB/s; totals:  1.54 MB/s
  /dev/sda of System2 (SATA 4.7 GB, MBR partition table, ReiserFS):  14.12 MB/s; totals: 11.66 MB/s

System2, 32-bit live CD:
  /dev/sda of System1 (SATA 2 TB, GUID partition table, ReiserFS):    2.05 MB/s; totals:  1.56 MB/s
  /dev/sda of System2 (SATA 4.7 GB, MBR partition table, ReiserFS):  13.60 MB/s; totals: 11.11 MB/s

Note: other tests show that the results are similar for /dev/sda and /dev/sdb in both systems.

I have also tested an external disk connected to a USB port:

System1, 64-bit live CD:
  /dev/sdc external USB HD (PATA 120 GB, MBR partition table, ReiserFS): 184.54 MB/s; totals: 26.36 MB/s

System2, 32-bit live CD:
  /dev/sdc external USB HD (PATA 120 GB, MBR partition table, ReiserFS):  19.14 MB/s; totals: 15.56 MB/s

The results are better than those of the SATA drives, probably because a buffer is used.

Attached: ResultAverages.txt (totals and averages) and Log01.txt .. Log13.txt (outputs of the aio-stress shell script).

As you can see, the results for the disks with GUID partition tables are extremely bad compared to those of the disks with MBR partition tables, in both systems! My conclusion is that the Linux kernel (2.6.31 & 2.6.32) has very poor performance with GUID partition tables.

While I was running the tests in tty1, I was checking the results in tty2 with 'cat aio-stress-log.txt'. When I was testing the disk with a GUID partition table, and the current run was midway, the 'cat' command sometimes was blocked.

The obvious solution to my problem would be to reformat the drives in my new computer to MBR partition tables (2 TB is the limit for MBR), but I'd prefer it if this kernel bug were solved. Any help heartily welcome! Thanks in advance.

Created attachment 26786 [details]
aio-stress, 32 bits. Please rename to aio-stress-bin to use with the aio-stress shell script.
Created attachment 26787 [details]
aio-stress, 64 bits. Please rename to aio-stress-bin to use with the aio-stress shell script.
Created attachment 26788 [details]
Totals and averages
Created attachment 26789 [details]
Log01.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26790 [details]
Log02.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26791 [details]
Log03.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26792 [details]
Log04.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26793 [details]
Log05.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26794 [details]
Log06.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26795 [details]
Log07.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26796 [details]
Log08.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26797 [details]
Log09.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26798 [details]
Log10.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26799 [details]
Log11.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26800 [details]
Log12.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26801 [details]
Log13.txt: results of 5 runs (refer to ResultAverages.txt)
Created attachment 26802 [details]
aio-stress shell script
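(For readers who do not want to download the attachment: the wrapper is essentially a loop around the aio-stress invocation described above. The following is only a rough sketch of such a wrapper, not the attached script itself; the log-file name comes from the report, while the test-file location is an assumption.)

  #!/bin/sh
  # Run aio-stress-bin five times with the same parameters phoronix-test-suite
  # uses (-s 1g: 1 GB file, -r 64k: 64 KiB record size, -t 3: 3 threads,
  # -o 2: random-write stage) and collect the output in one log file.
  LOG=aio-stress-log.txt        # read from another tty with 'cat' during the runs
  TESTFILE=/mnt/test/aiofile    # assumed location of the 1 GB test file
  for run in 1 2 3 4 5; do
      echo "=== run $run ===" >> "$LOG"
      ./aio-stress-bin -s 1g -r 64k -t 3 -o 2 "$TESTFILE" >> "$LOG" 2>&1
  done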
Can you please attach a parted dump of the partition table? Please set the unit to sectors. Thanks.

parted -l

  Model: ATA WDC WD20EARS-00S (scsi)
  Disk /dev/sda: 2000GB
  Sector size (logical/physical): 512B/512B
  Partition Table: gpt

  Number  Start   End     Size    File system     Name             Flags
   1      17.4kB  6000MB  6000MB  linux-swap(v1)  MainLinux swap
   2      6000MB  26.0GB  20.0GB  reiserfs        MainLinux /      bios_grub
   3      26.0GB  46.0GB  20.0GB  reiserfs        MainLinux home
   4      46.0GB  2000GB  1954GB  reiserfs        MainLinux Store

  Model: ATA WDC WD20EARS-00S (scsi)
  Disk /dev/sdb: 2000GB
  Sector size (logical/physical): 512B/512B
  Partition Table: gpt

  Number  Start   End     Size    File system     Name                  Flags
   1      17.4kB  6000MB  6000MB  linux-swap(v1)  EmergencyLinux swap
   2      6000MB  29.0GB  23.0GB  reiserfs        EmergencyLinux /
   3      29.0GB  2000GB  1971GB  reiserfs        EmergencyLinux Store

fdisk -l -u /dev/sda

  WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.

  Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
  255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
  Units = sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0x00000000

     Device Boot      Start         End      Blocks   Id  System
  /dev/sda1               1  3907029167  1953514583+  ee  GPT

fdisk -l -u /dev/sdb

  WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.

  Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
  255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
  Units = sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0x00000000

     Device Boot      Start         End      Blocks   Id  System
  /dev/sdb1               1  3907029167  1953514583+  ee  GPT

No edit option :-( I want to add something to my previous message: "When I was running the tests in tty1, I was checking the results in tty2 with 'cat aio-stress-log.txt'. When I was testing the disk with a GUID partition table, and the current run was midway, the 'cat' command sometimes was blocked." In both systems!

You can tell parted to use sectors by running

  # parted /dev/sda
  (parted) u s
  (parted) p

Can you please get the output of the above command?

parted /dev/sda

  GNU Parted 2.2
  Using /dev/sda
  Welcome to GNU Parted! Type 'help' to view a list of commands.
  (parted) u s
  (parted) p
  Model: ATA WDC WD20EARS-00S (scsi)
  Disk /dev/sda: 3907029168s
  Sector size (logical/physical): 512B/512B
  Partition Table: gpt

  Number  Start      End          Size         File system     Name             Flags
   1      34s        11718784s    11718751s    linux-swap(v1)  MainLinux swap
   2      11718785s  50781285s    39062501s    reiserfs        MainLinux /      bios_grub
   3      50781286s  89843786s    39062501s    reiserfs        MainLinux home
   4      89843787s  3907027380s  3817183594s  reiserfs        MainLinux Store

  (parted) q

parted /dev/sdb

  GNU Parted 2.2
  Using /dev/sdb
  Welcome to GNU Parted! Type 'help' to view a list of commands.
  (parted) u s
  (parted) p
  Model: ATA WDC WD20EARS-00S (scsi)
  Disk /dev/sdb: 3907029168s
  Sector size (logical/physical): 512B/512B
  Partition Table: gpt

  Number  Start      End          Size         File system     Name                  Flags
   1      34s        11718784s    11718751s    linux-swap(v1)  EmergencyLinux swap
   2      11718785s  56640660s    44921876s    reiserfs        EmergencyLinux /
   3      56640661s  3907027379s  3850386719s  reiserfs        EmergencyLinux Store

  (parted) q

Thanks for helping Tejun! :-)

Hmmm... the WD20EARS is a 4k physical sector drive and your partition table isn't aligned, which will result in significantly lower write performance, and the green drives are not the fastest drives to begin with.
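(Tejun's diagnosis can be checked directly from the sector numbers above: on a drive with 4 KiB physical sectors, a partition is aligned only if its start, expressed in 512-byte logical sectors, is a multiple of 8. Here sda1 starts at sector 34 and sda2 at 11718785; neither is divisible by 8. A small shell check of this kind, not part of the original report, might look like:)

  # For each partition of /dev/sda, read its start sector from sysfs and
  # test whether it falls on a 4 KiB (8-sector) boundary.
  for part in /sys/block/sda/sda[0-9]*; do
      start=$(cat "$part/start")
      if [ $(( start % 8 )) -eq 0 ]; then
          echo "$(basename "$part"): start=${start}s, 4 KiB aligned"
      else
          echo "$(basename "$part"): start=${start}s, NOT 4 KiB aligned"
      fi
  done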
Can you please repartition so that sectors are properly aligned and see whether the problem goes away? For more info on 4k sector issues, see http://ata.wiki.kernel.org/index.php/ATA_4_KiB_sector_issues Thanks.

Thanks Tejun. I did the current partitioning with the Ubuntu 9.10 distro CD (64 bits). I don't think there was a sector-size option. I am not an expert, and backing up data and reformatting a 2 TB drive is not a trivial task. Would you please give me step-by-step instructions on how to do it? The sector size now seems to be 512 bytes; do you want me to change it to 4 KB? Should I use gparted/parted?

Heh... I can't really help you with backing up data, sorry. :-P But if you're on Ubuntu, the new release 10.04 should align partitions properly for 4k drives, so if you do a clean install (including repartitioning) of 10.04, it should work automatically. I'll check whether it actually works. Thanks.

I did not mean I needed help with backing up. I only wanted to make sure I did the partitioning correctly the first time, because it is so much work to back up and recover ;-) Are you going to check Ubuntu 10.04 partitioning? Shall I wait until you give me the OK? I read this in http://ata.wiki.kernel.org/index.php/ATA_4_KiB_sector_issues:

  "L-1. Kernel support ................... As of v2.6.33, Linux ATA drivers do not support drives with 4KiB logical sector size although there is a development branch containing experimental support[11]. For ATA drives connected via bridges to different buses - USB and IEEE 1394, as long as the bridges support 4KiB logical sector size correctly, the SCSI disk driver can handle them."

Okay, just checked 10.04 and yeap, it aligns partitions correctly, so if you delete the existing partitions from the 10.04 boot media and create new ones you should be fine. The L-1 is about logical sector size. There are no logical 4k drives on the market now; they're all logical 512b, physical 4k ones. Thanks.

Thanks a lot Tejun. Will test it right now and post results.

Hi Tejun! I have reformatted /dev/sdb with the Ubuntu 10.04 CD (fresh install) and I have made some tests before restoring data:

Before:
  System1, 64 bits, /dev/sda of System1 (SATA 2 TB, GUID partition table, ReiserFS):  14.33 MB/s; totals:  6.34 MB/s

Now:
  System1, 64 bits, /dev/sdb of System1 (SATA 2 TB, GUID partition table, ReiserFS): 187.45 MB/s; totals: 29.18 MB/s

As you can see there has been an enormous improvement :-) But the sector size is still 512 bytes:

parted /dev/sdb

  GNU Parted 2.2
  Using /dev/sdb
  Welcome to GNU Parted! Type 'help' to view a list of commands.
  (parted) u s
  (parted) p
  Model: ATA WDC WD20EARS-00S (scsi)
  Disk /dev/sdb: 3907029168s
  Sector size (logical/physical): 512B/512B
  Partition Table: gpt

  Number  Start      End          Size         File system     Name  Flags
   1      2048s      11718655s    11716608s    linux-swap(v1)
   2      11718656s  56641535s    44922880s    reiserfs
   3      56641536s  3907028991s  3850387456s  reiserfs

  (parted) q

And I could not get rid of a 1 MB gap between sdb1 (the swap partition) and sdb2. Do you think the current partitioning is all right? Should I continue with restoring data and reformatting /dev/sda? Thanks again for your help!

Oh, the visible block size is 512 bytes although they're backed by 4k physical sectors. All that's necessary is aligning the partition table to those 4k boundaries. parted now aligns everything to a 1M boundary (2048 sectors), so that gap is expected. Yeap, everything looks good to me. Good luck.
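(Ubuntu 10.04 did this automatically, but the same 1 MiB alignment can be obtained by hand with parted. The commands below are only an illustrative sketch with example sizes, not what the installer actually ran, and they would destroy all data on the disk.)

  # WARNING: destroys everything on /dev/sdb. Sizes are examples only.
  # Starting the first partition at 1MiB and using binary units keeps every
  # partition start on a 1 MiB (2048-sector) boundary, which is 4 KiB aligned.
  parted --script /dev/sdb mklabel gpt
  parted --script /dev/sdb mkpart swap linux-swap 1MiB 6GiB
  parted --script /dev/sdb mkpart root reiserfs 6GiB 29GiB
  parted --script /dev/sdb mkpart store reiserfs 29GiB 100%
  parted /dev/sdb unit s print    # verify: every Start value is divisible by 2048

(mkpart only records the partition and a filesystem-type hint; the swap area and filesystems still have to be created afterwards with mkswap and mkreiserfs.)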
Thanks to Tejun's repartitioning advice, access to both HDs in my system is much faster. These are the new aio-stress results:

/dev/sdb2 (new), AHCI, StressLinux live CD (tty1), formatted with Ubuntu 10.04
  MB/s      MB/s totals
  176.61    24.33
   96.37    23.78
  126.21    19.94
  125.57    22.02
  412.49    55.81
  Averages: 187.45    29.18

/dev/sda3 (new), AHCI, StressLinux live CD (tty1), formatted with Ubuntu 10.04
  MB/s      MB/s totals
  126.50    19.91
  105.49    19.41
   88.63    17.92
   98.46    18.33
  310.04    46.40
  Averages: 145.82    24.39

sda seems to be slower than sdb. Could it be because sda has more partitions (sdb: swap + 2; sda: swap + 3)?

It is too early to say, but everything seems to go faster and I have not experienced any significant slowdowns or freezes. :-D

But there is still a problem: 'umount -l' sometimes is blocked for a long time. I use shell scripts to run some tasks periodically (I have used them without problems for many years). If I am using Linux on sda, some of these tasks imply mounting sdb partitions, reading/writing in those partitions and then umounting them. As several processes may need to access the same partition at the same time, they use a 'semaphore' script to mount/umount the partitions. It is when these scripts invoke 'umount -l' that, sometimes, umount hangs. The last time it happened, one of the scripts was calling 'umount -l' to unmount partition sdb2 while another was also calling 'umount -l' to umount sdb3. 'umount' took more than 10 minutes to complete both tasks!

Here are the corresponding messages in kern.log:

  INFO: task umount:13856 blocked for more than 120 seconds.
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  umount        D 00000000ffffffff     0 13856  13855 0x00000000
   ffff88008c841d98 0000000000000086 0000000000015bc0 0000000000015bc0
   ffff8800986e1ab0 ffff88008c841fd8 0000000000015bc0 ffff8800986e16f0
   0000000000015bc0 ffff88008c841fd8 0000000000015bc0 ffff8800986e1ab0
  Call Trace:
   [<ffffffff811650d0>] ? bdi_sched_wait+0x0/0x20
   [<ffffffff811650de>] bdi_sched_wait+0xe/0x20
   [<ffffffff8153f40f>] __wait_on_bit+0x5f/0x90
   [<ffffffff811650d0>] ? bdi_sched_wait+0x0/0x20
   [<ffffffff8153f4b8>] out_of_line_wait_on_bit+0x78/0x90
   [<ffffffff81085360>] ? wake_bit_function+0x0/0x40
   [<ffffffff81165094>] ? bdi_queue_work+0xa4/0xe0
   [<ffffffff8116640f>] bdi_sync_writeback+0x6f/0x80
   [<ffffffff81166440>] sync_inodes_sb+0x20/0x30
   [<ffffffff81169f12>] __sync_filesystem+0x82/0x90
   [<ffffffff81169ff9>] sync_filesystems+0xd9/0x130
   [<ffffffff8115ece1>] sys_umount+0xb1/0xd0
   [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
  INFO: task umount:19440 blocked for more than 120 seconds.
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  umount        D 00000000ffffffff     0 19440  19439 0x00000000
   ffff8800985ade88 0000000000000082 0000000000015bc0 0000000000015bc0
   ffff8800986e31a0 ffff8800985adfd8 0000000000015bc0 ffff8800986e2de0
   0000000000015bc0 ffff8800985adfd8 0000000000015bc0 ffff8800986e31a0
  Call Trace:
   [<ffffffff8153f8c7>] __mutex_lock_slowpath+0xe7/0x170
   [<ffffffff8153f7bb>] mutex_lock+0x2b/0x50
   [<ffffffff81169f3f>] sync_filesystems+0x1f/0x130
   [<ffffffff8115ecd7>] sys_umount+0xa7/0xd0
   [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
  INFO: task umount:19440 blocked for more than 120 seconds.
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  umount        D 00000000ffffffff     0 19440  19439 0x00000000
   ffff8800985ade88 0000000000000082 0000000000015bc0 0000000000015bc0
   ffff8800986e31a0 ffff8800985adfd8 0000000000015bc0 ffff8800986e2de0
   0000000000015bc0 ffff8800985adfd8 0000000000015bc0 ffff8800986e31a0
  Call Trace:
   [<ffffffff8153f8c7>] __mutex_lock_slowpath+0xe7/0x170
   [<ffffffff8153f7bb>] mutex_lock+0x2b/0x50
   [<ffffffff81169f3f>] sync_filesystems+0x1f/0x130
   [<ffffffff8115ecd7>] sys_umount+0xa7/0xd0
   [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b

Before repartitioning I had also observed that sometimes 'umount -l' hangs for a long time when it is called by one of the mentioned processes while another is waiting for its chance (in a sleep loop) to mount a partition. All processes are spawned from a parent process (run by cron), and I am wondering if there is some scheduling problem that does not assign CPU time to 'umount -l' because another sibling process is running. This task-running scheme (I call it CommonCron) runs perfectly on my old computer (2.6.31-17-generic kernel, 32 bits). Any ideas or suggestions?
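(The CommonCron scripts themselves are not attached to this bug. The sketch below is a hypothetical reconstruction of the kind of lock-protected mount/umount helper described above, with invented names and paths; it is only meant to show how two such jobs can end up issuing concurrent 'umount -l' calls, which matches the traces: one umount stuck in writeback via sync_filesystems, the other waiting on a mutex inside sync_filesystems.)

  #!/bin/sh
  # Hypothetical mount/umount helper guarded by a per-partition lock.
  # Usage: mount-job.sh sdb2
  PART=$1
  LOCKDIR=/var/lock/commoncron.$PART    # one lock per partition

  # mkdir is atomic, so it works as a simple semaphore; sleep-loop until free.
  while ! mkdir "$LOCKDIR" 2>/dev/null; do
      sleep 5
  done

  mount "/dev/$PART" "/mnt/$PART"
  # ... periodic task reads/writes files under /mnt/$PART ...
  umount -l "/mnt/$PART"    # lazy umount; this is the call seen hanging above
  rmdir "$LOCKDIR"

Two such jobs working on sdb2 and sdb3 hold different locks, so nothing stops their two umount calls from running at the same time against the same disk.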
(cc'd Andrew) The performance difference is most likely due to the different location and size of those partitions. ISTR recently reading something about fixing a bug which makes umount take a very long time, but I haven't really followed it. Andrew, does anything ring a bell? Thanks.

Faster access to the HDs makes a big difference, but I am disappointed that some of the problems described above still persist :-( This morning, soon after booting, the system froze. I had to wait a long time until I could switch to a tty, and the system was so slow that I could only check that the CPUs had low loads (top). The system unfroze somewhat after a few minutes. kern.log shows the usual 'blocked for more than 120 seconds' messages plus some dealing with DMA and swap memory. I don't know much about the Linux kernel and would be grateful if somebody could tell me what is going on. I'll attach the corresponding kern.log (Kern19.06.10.log). Thanks in advance.

Created attachment 26861 [details]
kern.log after system freeze.
Does it still happen without amule running? It looks like amule is leaking memory very quickly, filling up the memory and swap and leading to massive thrashing. Some distros limit the amount of memory a process can consume these days; it looks like Ubuntu isn't doing that. Anyway, with a memory bomb running without any resource limit, the kernel can only do so much. Thanks.

Thank you Tejun. I'll check whether amule is running the next time my system freezes. Normally amule does not use any swap memory. It is running now, and this is the memory use:

free

               total       used       free     shared    buffers     cached
  Mem:       4055212    3984380      70832          0     123036    3178964
  -/+ buffers/cache:     682380    3372832
  Swap:      5858296          0    5858296

There has only been the one freeze mentioned above since I repartitioned the HDs. I am happy now; using the computer before repartitioning was a nightmare. Thank you again.

*** Bug 16000 has been marked as a duplicate of this bug. ***
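(Regarding the per-process memory limits Tejun mentions: one way such a limit could be imposed is sketched below. The 2 GB figure is arbitrary and the approach is generic, not a recommendation from this bug.)

  # Cap the virtual address space of amule (and anything else started from this
  # shell) at roughly 2 GB so a runaway allocation fails instead of swapping
  # the whole machine to death. ulimit -v takes kilobytes.
  ulimit -v 2097152
  amule &

  # A persistent per-user limit can be set via PAM in /etc/security/limits.conf:
  #   username  hard  as  2097152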
Created attachment 26783 [details] kern.log May 20, 2010, after a bad system freeze.

System1 (new computer)
  Motherboard: Asus P7P55D LE, BIOS updated 2 times (same problems with all versions)
  CPU: Intel Core i5 661, 3.33 GHz
  RAM: 4 GB
  Graphics card: Asus GeForce GT 220, 1 GB
  2x SATA HD: Western Digital Caviar Green, 2 terabytes
    /dev/sda GUID partition table
      /dev/sda1 swap 6GB
      /dev/sda2 ReiserFS 20GB
      /dev/sda3 ReiserFS 20GB
      /dev/sda4 ReiserFS 1816GB
    /dev/sdb GUID partition table
      /dev/sdb1 swap 6GB
      /dev/sdb2 ReiserFS 23GB
      /dev/sdb3 ReiserFS 1833GB
  OS: Ubuntu 10.04, 2.6.32-22-generic kernel, 64 bits (same problems with 2.6.31)
  Desktop: Gnome

System2 (old computer; no problems, used for testing)
  Motherboard: Abit IC7-G
  CPU: Pentium 4 with Hyperthreading
  RAM: 1 GB
  Graphics card: Ati Radeon 920
  2x SATA HD: Maxtor 300GB + Seagate 250GB
    /dev/sda GUID partition table
      /dev/sda1 swap 1.86GB
      /dev/sda2 ReiserFS 9.4GB
      /dev/sda3 ReiserFS 4.7GB
      /dev/sda4 ReiserFS 283GB
    /dev/sdb GUID partition table
      /dev/sdb1 swap 0.5GB
      /dev/sdb2 ReiserFS 9.4GB
      /dev/sdb3 ReiserFS 224GB
  OS: Ubuntu 10.04, 2.6.31-17-generic kernel, 32 bits
  Desktop: Gnome

Problems (only with System1):
- Temporary freezes when:
  - opening applications
  - closing applications
  - opening applications' dialogs (especially open/save document dialogs)
  - opening and saving files (gedit)
  - refreshing nautilus
  - audio playing (it stops temporarily)
- Disk operations seem to take a long time, especially 'umount'.
- Occasionally shutdown takes 10..15 minutes, apparently waiting for 'umount'.
- Freezes can be produced by copying big files:
  - The slowdown of the system seems to be exponential: the longer it takes to copy, the more the system slows down; the more the system slows down, the longer it takes to copy... etc.
  - When copying in nautilus, the progress dialog starts moving fast, then slows down and almost hangs at the end.
- Sometimes when I double-click on a text file in nautilus, both the gedit and nautilus windows go grey.
- Sometimes when I double-click on an audio file in nautilus and alsaplayer starts, I cannot change focus to another file in the nautilus window until alsaplayer finishes playing the audio file.

Checked:
- RAM memory checked several times: OK
- CPU checked by running several stress tests: OK
- When the problems appear, all 4 CPUs (2 cores + hyperthreading):
  - have low loads (10..20%)
  - have a low temperature (sensors): 32°C
- Changed some settings in the BIOS without any improvement:
  - drives set to AHCI or IDE
  - Intel SpeedStep (CPU speed controlled by OS) disabled
  - virtualization tech. disabled
- Hard disks are OK: tested them with palimpsest and smartctl (typical commands of this kind are sketched after this description).
- Same problem booting from both disks (both have Linux).
- Problems happen with both the free and the non-free nvidia drivers.
- No error entries in the logs except for blocked-process warnings (idle-priority scheduling bug?):

    INFO: task application-name:1878 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    .........
    Call Trace:
    .........

- acpi does not seem to be functional:

    acpi -V
    Cooling 0: Processor 0 of 0
    Cooling 1: Processor 0 of 0
    Cooling 2: Processor 0 of 0
    Cooling 3: Processor 0 of 0

    acpidump > /tmp/table.txt
    Wrong checksum for OEMB!

  Warning in the kernel log at boot:

    ACPI Warning: Incorrect checksum in table [OEMB] - BA, should be B9 (20090903/tbutils-314)

  This 'wrong checksum' warning has happened with 3 different versions of the BIOS software.

I have attached kernel messages corresponding to a day when there was a particularly bad freeze soon after boot (it usually happens soon after booting, when anacron runs /etc/cron.daily).
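(The description above says the disks were tested with palimpsest and smartctl but does not list the exact commands; the following are typical checks of that kind, given here only as a hedged example, not as what the reporter actually ran.)

  smartctl -H /dev/sda    # overall SMART health verdict
  smartctl -a /dev/sda    # full attribute and error-log listing
  hdparm -t /dev/sda      # quick buffered sequential-read timing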