Bug 3225 - sbp2 feature: integrate with scsi_wait_scan module? (was: Failure to re-scan SCSI devices after firewire modules loaded, doesn't see firewire disks)
Status: REJECTED DOCUMENTED
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: i386 Linux
Importance: P2 low
Assignee: Stefan Richter
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-08-16 14:42 UTC by Don Krajewski
Modified: 2007-09-20 23:01 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.7-r14
Subsystem:
Regression: ---
Bisected commit-id:


Attachments

Description Don Krajewski 2004-08-16 14:42:14 UTC
Distribution: Gentoo
Hardware Environment:
Dell Inspiron 8200
1.9 GHz Pentium 4
30 GB Hard Disk (100 MB boot Ext2 partition, rest NTFS WindowsXP)
Western Digital 120 GB Firewire Drive (2 GB Swap, rest ext3 partitions - root,
var, user, home, data)
512 MB RAM
Nvidia Video (GeForce 440 2 Go 64MB)
Software Environment:
GCC 3.3.3r6
Gentoo-dev-sources (2.6.7-r14)

Problem Description:
Upon boot, the SCSI modules are loaded first, then the others as hardware is
found. The SCSI bus is not rescanned afterwards, which matters for SCSI
emulation and recognition of USB or Firewire drives: the modules load but don't
see disks attached via USB or firewire.
Note: you can run 'echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi' or
"rescan-scsi-bus.sh" after boot, and the device works fine. The problem occurs
when booting from the firewire device, before proc is mounted or bash is
available.
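
For reference, the manual re-scan from the note above could be wrapped in a
small shell function along these lines. This is only a sketch: the function
name scsi_add_single_device and the SCSI_PROC_FILE override are hypothetical
(the override exists so the function can be tried against a scratch file), and
the four numbers are host, channel, id and lun.

```shell
# Hypothetical helper: ask the SCSI mid-layer to probe one host/channel/id/lun.
# SCSI_PROC_FILE is an assumed override for testing; by default the request
# goes to /proc/scsi/scsi, which requires proc to be mounted.
scsi_add_single_device() {
    proc_file="${SCSI_PROC_FILE:-/proc/scsi/scsi}"
    if [ "$#" -ne 4 ]; then
        echo "usage: scsi_add_single_device HOST CHANNEL ID LUN" >&2
        return 2
    fi
    # The command text has to be written into the file, hence the echo.
    echo "scsi add-single-device $1 $2 $3 $4" > "$proc_file"
}
```

Calling "scsi_add_single_device 0 0 0 0" as root then corresponds to the
command in the note. It does not help in the initrd case as long as proc is
not mounted there.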

I have attempted to modify the initrd.scripts so that it executes other scripts
(copied by gen_initrd.sh during gentoo's genkernel process) as the last step of
the "modules_scan" procedure at boot time. Nothing I have inserted seems to
work, as the resources are only found after booting the system. Gentoo's
kernel team indicates the bug is upstream from them and asked me to file a bug
report at kernel.org (here).

I am hoping that someone can point me in the right direction: either unload and
reload the SCSI emulation module if it was already loaded, or defer loading it
until after all other modules so that all SCSI buses are scanned, or some other
workaround that works reliably.

An article at IBM's developerWorks describes the problem on the 2.4.X kernel:
http://www-106.ibm.com/developerworks/linux/library/l-fireboot.html?ca=dgr-lnxw06FireBoot


Steps to reproduce:
1) Install Linux with a 100 MB boot partition on the primary hard disk and the
remaining linux partitions on the firewire drive.
2) Install a kernel with initrd boot (initial ramdisk).
3) Build the firewire drivers as modules (building them into the kernel doesn't
help, either).
4) Boot the system - it loads the firewire modules, but doesn't see any of the
/dev/sda disks.
5) The system reports the sda disk as invalid ("the device /dev/sda1 is invalid")
and then asks for a root partition or offers to drop into a shell. If SCSI
emulation rescanned the bus, it should see the disk.
Comment 1 Adrian Bunk 2006-12-06 01:48:45 UTC
Is this issue still present with kernel 2.6.19?
Comment 2 Stefan Richter 2006-12-06 05:08:04 UTC
The problem still exists in principle. Note, it's not just a FireWire thing; all
more modern SCSI transports, such as SAS, iSCSI, and USB storage, have to deal
with it one way or another. The process of device discovery is practically
non-deterministic, unlike with the old parallel SCSI hardware.

The FireWire driver stack really only deals with the hot-plug case and has no
special provisions for cold-plugging. Note though that some distributors have
successfully set up environments for "install from FireWire disk" or even "root
filesystem on FireWire disk". The way to go is a basic hot-plug capable
environment in an initrd.

Recently though there was infrastructure added to the SCSI stack for
asynchronous parallelized SCSI scanning. I intend to look into its usefulness
for sbp2 once I have time for actual feature additions. AFAICS we would have to
let the administrator specify a timeout and/or a minimum number of SBP-2 units
to wait for.

I suggest we keep this bug entry as a reminder for integration of sbp2 with
async SCSI scanning, although it is not an ieee1394 subsystem bug per se.

Two side notes:
 - The "scsi add-single-device" method is not useful for FireWire disks under
Linux 2.6 anymore, it's only a necessary hack for Linux 2.4.
 - Of course each bugfix related to device recognition in the hot-plug case is
relevant to the cold-plug case too. Needless to say, there have been a lot of
improvements in this department since 2004.
Comment 3 Don Krajewski 2006-12-06 05:56:28 UTC
The final solution was to write a custom initrd for the system where two scans
are performed (not just one), with a bit of testing to make sure the timing
would always work. I haven't tried setting up booting from a firewire or USB
drive recently, but I know of people who currently do something similar and
don't have this problem with the 2.6.19 kernel. Thank you for reclassifying
this as a "low" level problem. DK
Comment 4 Don Krajewski 2007-09-04 08:02:51 UTC
I will be out of the office until Monday, 10 Sept. 2007. I will not be able to check e-mail while gone and will respond as soon as I am able to do so.

If you need a quicker response, please contact CNC through the web service request form on the CNC web site (http://www.engr.iupui.edu/cnc).

Thanks,

Don Krajewski
Comment 5 Natalie Protasevich 2007-09-19 05:41:32 UTC
Stefan, is this "project" material?
Thanks.
Comment 6 Stefan Richter 2007-09-19 06:32:40 UTC
I have listed it at http://wiki.linux1394.org/ToDo which can be reached via http://kernelnewbies.org/KernelProjects.

If bugzilla.kernel.org is meant to be narrowed down to real bugs (as opposed to missing functionality, or "missed" functionality), then this bug entry should be "rejected" as "invalid" or maybe "documented".
Comment 7 Don Krajewski 2007-09-20 16:38:55 UTC

Maybe the question to ask is whether this is a problem with the kernel module
or with other packages? If the answer is that this is a problem with the kernel
module, was this intended or something that was "missed" in design? Finally, is
it still relevant today?

From the various projects I have worked on over the past year, this hasn't
shown up and I haven't been forced to use an initrd (workaround) for systems
like this. On the other hand, I haven't tested this fully to know if this
still is or isn't a problem. 


My $0.02: Close this issue as "rejected" and mark resolution as
"documented". This issue is probably no longer relevant, and this was
"missed" functionality from my understanding of design at the time the bug
was reported. 

DK
Comment 8 Stefan Richter 2007-09-20 23:01:26 UTC
Distros which support booting from FireWire typically have it solved in initrd by (a) loading the required drivers, (b) periodically looking if the desired block device is there (I guess by means of sysfs), and (c) proceeding to mount the root filesystem.
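
As an illustration of steps (a) to (c), a minimal initrd-style sketch could look like the following. It is not taken from any particular distro: the function names wait_for_path and mount_firewire_root, the module list (ohci1394, sbp2, sd_mod), the sda/sda1 defaults, the 30 second timeout and the /newroot mount point are all assumptions, and it presumes that sysfs is mounted and that a /dev node for the partition exists.

```shell
# Hypothetical helper: poll until a path exists or a timeout (in seconds)
# expires. Returns 0 if the path appeared, 1 on timeout.
wait_for_path() {
    path="$1"
    timeout="${2:-30}"
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Sketch of the three steps. Module names, device names and the mount point
# are illustrative assumptions and differ per system.
mount_firewire_root() {
    disk="${1:-sda}"      # assumed disk name
    part="${2:-sda1}"     # assumed root partition name
    # (a) load the required drivers
    modprobe ohci1394 && modprobe sbp2 && modprobe sd_mod || return 1
    # (b) periodically check sysfs for the desired block device
    wait_for_path "/sys/block/${disk}/${part}" 30 || return 1
    # (c) mount the root filesystem
    mount -o ro "/dev/${part}" /newroot
}
```

The timeout is the weak spot of this approach: since discovery is non-deterministic, any fixed value is a guess that has to be tuned per system.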

The new infrastructure for parallelized SCSI scanning which I mentioned in comment #2 has a side product, the scsi_wait_scan kernel module.  AFAIU, "modprobe scsi_wait_scan" will look up which scsi low-level drivers support it and then sleep until all these low-level drivers tell it that they are finished with (parallelized asynchronous) scanning.  I.e. the SCSI folks thus got the synchronization point back which they had before they implemented parallelized scanning.
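
A sketch of how an initrd script might use this synchronization point, assuming the behaviour described above. The wrapper name wait_for_scsi_scan is hypothetical, and the rmmod step rests on the assumption that the module has to be unloaded before it can be used for another wait.

```shell
# Sketch: block until asynchronous SCSI scans that are in progress finish.
wait_for_scsi_scan() {
    # Loading the module is the wait itself; modprobe is expected to return
    # only once the participating low-level drivers are done scanning.
    modprobe scsi_wait_scan || return 1
    # Assumption: unload the module again so that a later modprobe waits anew.
    rmmod scsi_wait_scan
}
```

This only covers low-level drivers that take part in the mechanism, so for sbp2 it is a sketch of the idea rather than something that works today.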

So, this scsi_wait_scan side product could be useful for boot from FireWire too.  But since it isn't essential, and since it would be a new feature and is tracked as such on the linux1394 TODO list, I close this bug entry as rejected/documented.
