Bug 2493
| Summary: | hard freeze on heavy load with external firewire hard drive | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | thomas |
| Component: | IEEE1394 | Assignee: | Stefan Richter (stefanr) |
| Status: | REJECTED INSUFFICIENT_DATA | | |
| Severity: | high | CC: | bunk |
| Priority: | P2 | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.4 and 2.6.5 | Subsystem: | |
| Regression: | --- | Bisected commit-id: | |
Description
thomas
2004-04-11 04:58:17 UTC
It's not the first time this has happened. Also, the drive's node changes while it's mounted; maybe the two problems are related. I hope the following dmesg output helps:

ieee1394: Error parsing configrom for node 0-00:1023
ieee1394: Node changed: 0-00:1023 -> 0-01:1023
ieee1394: Node added: ID:BUS[0-00:1023] GUID[0010b92000c66716]
scsi1 : SCSI emulation for IEEE-1394 SBP-2 Devices
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node 0-00:1023: Max speed [S400] - Max payload [2048]
Vendor: Maxtor Model: 5000DV Rev: 0100
Type: Direct-Access ANSI SCSI revision: 06
SCSI device sda: 320171008 512-byte hdwr sectors (163928 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
/dev/scsi/host1/bus0/target1/lun0: p1 p2 p3
Attached scsi disk sda at scsi1, channel 0, id 1, lun 0
Attached scsi generic sg1 at scsi1, channel 0, id 1, lun 0, type 0
found reiserfs format "3.6" with standard journal
Reiserfs journal params: device sda1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
reiserfs: checking transaction log (sda1) for (sda1)
reiserfs: replayed 22 transactions in 1 seconds
Using r5 hash to sort names
ieee1394: Node changed: 0-01:1023 -> 0-00:1023
ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0010b92000c66716]
ieee1394: sbp2: Logged out of SBP-2 device
scsi1 (1:0): rejecting I/O to dead device
Buffer I/O error on device sda1, logical block 6514
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 6515
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 6516
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 6517
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 6518
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 6519
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 6520
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 6521
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 6522
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 6523
lost page write due to I/O error on sda1
journal-601, buffer write failed
------------[ cut here ]------------
kernel BUG at fs/reiserfs/prints.c:339!
invalid operand: 0000 [#1]
PREEMPT
CPU: 0
EIP: 0060:[<c01a7f28>] Tainted: PF
EFLAGS: 00010282 (2.6.5)
EIP is at reiserfs_panic+0x38/0x70
eax: 00000024 ebx: d679a000 ecx: 00000001 edx: c04a10f8
esi: e2e3e5e4 edi: 00000000 ebp: c17f1cd4 esp: c17f1cc4
ds: 007b es: 007b ss: 0068
Process knodemgrd_0 (pid: 12, threadinfo=c17f0000 task=dfc0b160)
Stack: c043ca6f c057c6a0 e2e3e5e4 d679a000 c17f1d18 c01b34de d679a000 c0448b60
       00001000 c17f1d18 d58ab000 dec15ce0 0000197c 0000000e 0000000c 00000000
       0000000b caa5e8b0 d679a000 00000001 00000004 c17f1d80 c01b7abc d679a000
Call Trace:
[<c01b34de>] flush_commit_list+0x2ee/0x470
[<c01b7abc>] do_journal_end+0x60c/0xc50
[<c01b6d9d>] flush_old_commits+0x13d/0x1a0
[<c01a4bb1>] reiserfs_write_super+0x81/0x90
[<c015a3c7>] fsync_super+0xb7/0xc0
[<c015a3f5>] fsync_bdev+0x25/0x60
[<c0172948>] __invalidate_device+0x68/0x70
[<c0160a70>] bdev_set+0x0/0x20
[<c02bf43f>] invalidate_partition+0x3f/0x55
[<c018dc2c>] del_gendisk+0x2c/0xf0
[<c02f3fe0>] sd_remove+0x20/0x40
[<c02b66f6>] device_release_driver+0x66/0x70
[<c02b6870>] bus_remove_device+0x70/0xc0
[<c02b566f>] device_del+0x6f/0xb0
[<c02f13fe>] scsi_remove_device+0x5e/0xb0
[<c02f0834>] scsi_forget_host+0x44/0x90
[<c02ea56a>] scsi_remove_host+0x2a/0x60
[<c030d7e9>] sbp2_remove_device+0x1b9/0x1e0
[<c030cf85>] sbp2_remove+0x25/0x30
[<c02b66f6>] device_release_driver+0x66/0x70
[<c030343f>] nodemgr_suspend_ne+0x10f/0x140
[<c0303710>] nodemgr_probe_ne+0x60/0x90
[<c03037a6>] nodemgr_node_probe+0x66/0xb0
[<c0303b61>] nodemgr_host_thread+0x171/0x1a0
[<c03039f0>] nodemgr_host_thread+0x0/0x1a0
[<c01072c5>] kernel_thread_helper+0x5/0x10
Code: 0f 0b 53 01 58 0e 44 c0 c7 04 24 e0 75 44 c0 b8 71 0a 44 c0
initializing firmware upload

I'm currently seeing this problem here too with 2.6.4 and 2.6.5, using an external ieee1394 hard drive via a "FireWire (IEEE 1394): Texas Instruments PCI4451 IEEE-1394 Controller". I'm trying to use rsync to back up my OS and have been running it with nice -n 19, to no avail. I'm working on my laptop remotely and cannot connect to an external box to log the dump to syslog. I can only say that I'm not seeing any "buffer i/o errors", merely the PREEMPT line, and nothing with reiserfs in it either. (I am using reiserfs on the external HDD.)

And here too, with:
Kernel (from kernel.org) 2.6.4 and gcc version 3.3.3 (Debian 20040401)
WDIGTL Model: WDCFireWireHD-Ox Rev: 2.70 (120 GB ext. HDD)
Debian 'unstable'
When using rsync or unison-gtk, the box freezes up completely after a few minutes.

The problem persists with the 2.6.8 and 2.6.9 kernels from Debian. I have tried an old 2.6.3 kernel and it works without any problem.

Is this problem still present in a vanilla 2.6.12.2 ftp.kernel.org kernel without any external modules loaded since the last boot?

To everyone who commented here: Is this problem still present with 2.6.12.x? Does one of the hints given at http://www.linux1394.org/faq.php#sbp2abort help?

Although the problems reported by the commenters seem to be the same, they may actually have different causes. In the case of Thomas' report, the first sign of trouble is:
> ieee1394: Node changed: 0-01:1023 -> 0-00:1023
> ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0010b92000c66716]
> ieee1394: sbp2: Logged out of SBP-2 device
This means that the FireWire controller told the drivers that the disk was disconnected. I assume you did not trip over the cable, so there may be a hardware problem like an overheated physical interface chip in the disk enclosure or in the PC. If so, then we can of course not fix it in software. But we have to fix the occurrence of DoS situations at least.

Do the other commenters see the "node changed"/"suspended"/"logged out" messages too?

2.6.14-rc3 contains two sbp2 fixes that may be related to the reports here. Please test.

Please test Linux 2.6.14.
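
For commenters who want to check their own logs for the "node changed"/"suspended"/"logged out" sequence asked about above, here is a minimal Python sketch. It is only an illustration: the function names are made up for this example, and the patterns are copied from the log lines quoted in this report, so other kernel versions may word these messages differently.

```python
import re

# Patterns copied from the messages quoted in this report (assumption:
# your kernel prints them in the same form).
PATTERNS = {
    "node_changed": re.compile(r"ieee1394: Node changed: (\S+) -> (\S+)"),
    "node_suspended": re.compile(r"ieee1394: Node suspended:"),
    "logged_out": re.compile(r"ieee1394: sbp2: Logged out of SBP-2 device"),
}


def find_disconnect_events(log_text):
    """Return (line_number, kind, line) for each disconnect-related message."""
    events = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                events.append((number, kind, line.strip()))
                break
    return events


def saw_disconnect(log_text):
    """True if a 'Node suspended' message is followed by an sbp2 logout."""
    kinds = [kind for _, kind, _ in find_disconnect_events(log_text)]
    if "node_suspended" not in kinds:
        return False
    return "logged_out" in kinds[kinds.index("node_suspended"):]


# Excerpt from the dmesg output in the original description.
SAMPLE = """\
ieee1394: Node changed: 0-01:1023 -> 0-00:1023
ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0010b92000c66716]
ieee1394: sbp2: Logged out of SBP-2 device
scsi1 (1:0): rejecting I/O to dead device
"""

if __name__ == "__main__":
    for number, kind, line in find_disconnect_events(SAMPLE):
        print(number, kind, line)
    print("disconnect seen:", saw_disconnect(SAMPLE))
```

To use it on a real log, replace SAMPLE with the saved text of your dmesg output. If saw_disconnect returns False, the freeze probably has a different cause than the one in Thomas' report.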