Latest working kernel version: 2.6.13
Earliest failing kernel version: 2.6.16 (the earliest I've been able to test; it may be 2.6.14)
Distribution: Red Hat Enterprise 4 and 5, Fedora 9, SuSE 9 and 10, SuSE Enterprise 10, and Ubuntu 8.04 tested
Hardware Environment: both x86 and amd64 on server, Quantum CD160UE-SST tape drive with 80/160 GB tape
Software Environment: Linux
Problem Description: Any attempt to read() or write() more than 122880 bytes at a time to the USB tape device results in EBUSY.
Steps to reproduce: This happened inside our application when attempting to do a backup with a large write size. However, it can be reproduced with tar:
- tar -b 240 -cf /dev/st0 foo (works)
- tar -b 241 -cf /dev/st0 foo
  tar: /dev/st0: Cannot write: Device or resource busy
  tar: Error is not recoverable: exiting now
The reads and writes worked fine in 2.6.13 and earlier.
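The tar reproduction comes down to one write() per record. A minimal C sketch of the same test without tar; write_one_record() is a hypothetical helper, and /dev/st0 is assumed to be the USB drive in variable-block mode. Because each write() becomes one tape record, the buffer is handed over in exactly one call and never split.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write `len` zero bytes to `path` in ONE write() call.
 * Returns 0 on success, or the errno value on failure. */
static int write_one_record(const char *path, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno;

    char *buf = calloc(1, len);
    if (buf == NULL) {
        close(fd);
        return ENOMEM;
    }

    ssize_t n = write(fd, buf, len);   /* one call == one tape record */
    int err = 0;
    if (n < 0)
        err = errno;                   /* saved before free()/close() */
    else if ((size_t)n != len)
        err = EIO;                     /* short write: treat as failure */

    free(buf);
    close(fd);
    return err;
}
```

With the drive attached, write_one_record("/dev/st0", 240 * 512) corresponds to tar -b 240 and write_one_record("/dev/st0", 241 * 512) to the failing tar -b 241 case; per this report, the second call returns EBUSY on the affected kernels.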
Ok, so the earliest you found the bug is 2.6.16, but I assume from the comments about distributions that you also tried recent kernels and saw the same? The cases where /dev/st will return EBUSY are pretty limited:
- an asynchronous command already being active
- out of control blocks for the tape
- buffer/request too large (which seems the likely candidate)
Yes, I've tried right up to 2.6.27 with the same failure. I did not try 2.6.14 or 2.6.15 because I don't have a machine on hand with those kernels, but 2.6.13 worked fine. Based on the behaviour I'm seeing, I'm guessing it's the third thing you listed, but the limit appears to have changed to something that's unreasonably small for a tape device.
I have been testing large blocks with 2.6.27.7 and I was able to reach 6 MB without any tricks. The difference between your system and mine is that my tape is connected with SCSI and yours is with USB.

Looking at linux/drivers/usb/storage/scsiglue.c I find the following:

	/* limit the total size of a transfer to 120 KB */
	.max_sectors = 240,

Maybe the USB experts can tell what is the reason for this.
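The numbers in the report line up exactly with that setting: max_sectors is counted in 512-byte sectors, and tar -b N writes records of N × 512 bytes, so max_sectors = 240 caps one transfer at the same 122880 bytes that tar -b 240 produces. A small C check of that arithmetic; the function names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512u   /* max_sectors is counted in 512-byte sectors */
#define TAR_BLOCK   512u   /* tar -b N means N blocks of 512 bytes per record */

/* Largest single transfer allowed by a given max_sectors value, in bytes. */
static size_t max_transfer_bytes(unsigned max_sectors)
{
    return (size_t)max_sectors * SECTOR_SIZE;
}

/* Record size written by `tar -b blocking_factor`, in bytes. */
static size_t tar_record_bytes(unsigned blocking_factor)
{
    return (size_t)blocking_factor * TAR_BLOCK;
}

/* True if one tar record fits in one transfer under this limit. */
static bool tar_record_fits(unsigned blocking_factor, unsigned max_sectors)
{
    return tar_record_bytes(blocking_factor) <= max_transfer_bytes(max_sectors);
}
```

With max_sectors = 240, blocking factor 240 fits exactly and 241 (123392 bytes) does not, which matches the works/fails boundary in the original report.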
Reply-To: oliver@neukum.org

On Tuesday, 23 December 2008 08:30:49, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12207
>
> ------- Comment #3 from kai.makisara@kolumbus.fi 2008-12-22 23:30 -------
> Looking at linux/drivers/usb/storage/scsiglue.c I find the following:
>
> /* limit the total size of a transfer to 120 KB */
> .max_sectors = 240,
>
> Maybe the USB experts can tell what is the reason for this.

Many USB devices fail larger transfers.

Regards
Oliver
Many don't, and this causes a serious tape drive regression. It has crippled working hardware for the benefit of junk.
You can change the max_sectors setting through sysfs. However the last time I looked, the block layer limited max_sectors to 512 KB or something on that order, so you can't get too much improvement. Why is a limit of 120 KB unreasonably small? All it means is that you have to use more system calls to transfer the same amount of data. Is anything wrong with that?
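Changing the limit through sysfs amounts to writing a decimal number to usb-storage's max_sectors attribute on the SCSI device. A minimal C sketch; sysfs_write_uint() is a hypothetical helper, and the example path below (SCSI address 6:0:0:0) is an assumption, so substitute your own device's H:C:T:L address. The kernel may still clamp or reject the value.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: write an unsigned value to a sysfs attribute such as
 * /sys/bus/scsi/devices/<H:C:T:L>/max_sectors.
 * Returns 0 on success or an errno value. */
static int sysfs_write_uint(const char *attr_path, unsigned value)
{
    FILE *f = fopen(attr_path, "w");
    if (f == NULL)
        return errno;

    int rc = 0;
    if (fprintf(f, "%u\n", value) < 0)
        rc = errno ? errno : EIO;
    /* stdio buffers the text, so a rejected value shows up when fclose()
     * flushes it to the attribute */
    if (fclose(f) != 0 && rc == 0)
        rc = errno ? errno : EIO;
    return rc;
}
```

Usage, as root: sysfs_write_uint("/sys/bus/scsi/devices/6:0:0:0/max_sectors", 2048).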
Tapes don't work that way. Each call to write() creates one block (record) on the tape, and the tape has to be read back in exactly the same way, one block per read(). Since older kernels had no problem with the larger block size, and no other operating system has this limit, all existing tapes written with larger block sizes are now unreadable. We routinely write blocks of up to 2 MB at a time during our backup procedure, which makes those backups impossible to restore on recent Linux systems. Rather than limiting the transfer size for everyone, maybe you should detect the failure on the devices that do fail and report that.
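In sectors, a 2 MB record is far beyond the current cap: it needs 4096 sectors in a single transfer, about 17 times the 240-sector limit and also above the roughly 512 KB (1024-sector) block-layer figure mentioned above. A small C sketch of that conversion; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512u

/* 512-byte sectors needed to carry one tape record of `bytes` bytes.
 * Rounded up, because the record must go out as a single transfer. */
static size_t sectors_for_record(size_t bytes)
{
    return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* True if a record of `bytes` fits under a max_sectors-style limit. */
static bool record_fits(size_t bytes, unsigned max_sectors)
{
    return sectors_for_record(bytes) <= max_sectors;
}
```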
Also note - the tape drives don't go via the block layer so aren't subjected to the block limits.
Reply-To: bharrosh@panasas.com

bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #6 from stern@rowland.harvard.edu 2008-12-23 05:59 -------
> You can change the max_sectors setting through sysfs. However the last time I
> looked, the block layer limited max_sectors to 512 KB or something on that
> order, so you can't get too much improvement.

This is not true: BIOs can be chained, and SGs can also be chained. You should be able to configure a very large transfer.

Boaz
Even if this is true, it does not work out of the box. Is the intent to make things that people want to do, and that are trivial on all other OSes, harder (non-obvious and complicated)? Remember, this used to work trivially... it's a regression. If the only excuse for the limit is that some devices don't like large transfers, I don't buy it. Those devices should be the exception, not the rule.
Reply-To: James.Bottomley@HansenPartnership.com

> ------- Comment #6 from stern@rowland.harvard.edu 2008-12-23 05:59 -------
> Why is a limit of 120 KB unreasonably small? All it means is that you have to
> use more system calls to transfer the same amount of data. Is anything wrong
> with that?

Tapes need large block sizes.

We can accommodate both: just check for TYPE_TAPE in slave_configure() and bump the limit back up. Any USB tape that doesn't do large block transfers will be truly broken.

James
Reply-To: James.Bottomley@HansenPartnership.com

On Tue, 2008-12-23 at 08:55 -0600, James Bottomley wrote:
> Tapes need large block sizes.
>
> We can accommodate both: Just check for TYPE_TAPE in the
> slave_configure() and bump the limit back up. Any USB tape that doesn't
> do large block transfers will be truly broken.

Following up on this, does this fix it? I notice that Linus was the one who actually committed this change in 2.6.0-test10, so it's been in the entire 2.6 release.

James

---
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 09779f6..ae4b01c 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -127,7 +127,12 @@ static int slave_configure(struct scsi_device *sdev)
 		if (sdev->request_queue->max_sectors > max_sectors)
 			blk_queue_max_sectors(sdev->request_queue,
 					      max_sectors);
-	}
+	} else if (sdev->type == TYPE_TAPE)
+		/* Tapes need much higher max sector transfers, so just
+		 * raise it to the maximum possible and let the queue
+		 * segment size sort out the real limit
+		 */
+		blk_queue_max_sectors(sdev->request_queue, 0xFFFF);
 
 	/* We can't put these settings in slave_alloc() because that gets
 	 * called before the device type is known.  Consequently these
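A user-space model of the policy this patch introduces, leaving out the quirk branch that precedes it. The function is hypothetical; 240 is the host-template value quoted from scsiglue.c above, 0xFFFF is the value the patch passes for tapes, and TYPE_DISK/TYPE_TAPE carry the SCSI peripheral device type codes 0x00 and 0x01. It deliberately ignores any further clamping done by the block layer.

```c
#include <stdint.h>

#define TYPE_DISK 0x00   /* SCSI peripheral device type: direct-access block device */
#define TYPE_TAPE 0x01   /* SCSI peripheral device type: sequential-access (tape) */

#define USB_STOR_TEMPLATE_MAX_SECTORS 240u     /* ".max_sectors = 240" in scsiglue.c */
#define TAPE_MAX_SECTORS              0xFFFFu  /* value the patch requests for tapes */

/* Hypothetical model: the max_sectors a usb-storage device asks for after
 * the patch, for devices without a transfer-size quirk. */
static unsigned model_max_sectors(int scsi_type)
{
    if (scsi_type == TYPE_TAPE)
        return TAPE_MAX_SECTORS;           /* tapes: raise the limit */
    return USB_STOR_TEMPLATE_MAX_SECTORS;  /* everything else keeps 120 KB */
}

/* The same limit expressed in bytes (512-byte sectors). */
static uint64_t model_max_bytes(int scsi_type)
{
    return (uint64_t)model_max_sectors(scsi_type) * 512u;
}
```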
What may have happened is that something else changed and triggered enforcement of that limit on non-block paths?
On Tue, 23 Dec 2008, bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #13 from alan@lxorguk.ukuu.org.uk 2008-12-23 08:18 -------
> What may have happened is that something else changed and triggered
> enforcement of that limit on non-block paths?

Something like this ;-) Since 2.6.16, st.c has used scsi_execute_async(), which sends the request to the block layer.

Kai
Reply-To: James.Bottomley@HansenPartnership.com

On Tue, 2008-12-23 at 18:30 +0200, Kai Makisara wrote:
> Something like this ;-) Since 2.6.16, st.c has used scsi_execute_async(),
> which sends the request to the block layer.

That's probably it!

Realistically, though, allowing st to override the block limits was wrong. Most drivers (except USB) don't set these arbitrarily; they usually represent fundamental hardware limits. If you force down a transaction that's larger than they declared themselves capable of, they'll do strange things like wrap descriptors or truncate the transaction, which will cause silent data corruption.

Hopefully we can figure out how to get USB working.

James
USB is working fine, and the patch you proposed should take care of tape devices okay. However the block layer still limits max_sectors to BLK_DEF_MAX_SECTORS (see the code for blk_queue_max_sectors), which is defined as 1024. Thus there's no way to request a transfer of more than 512 KB, unless st.c does some fancy footwork.
Reply-To: fujita.tomonori@lab.ntt.co.jp

On Tue, 23 Dec 2008 18:56:21 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #16 from stern@rowland.harvard.edu 2008-12-23 18:56 -------
> USB is working fine, and the patch you proposed should take care of tape
> devices okay. However the block layer still limits max_sectors to
> BLK_DEF_MAX_SECTORS (see the code for blk_queue_max_sectors), which is defined
> as 1024. Thus there's no way to request a transfer of more than 512 KB, unless
> st.c does some fancy footwork.

st uses pc requests, so max_hw_sectors is what matters. st.c can use large block sizes without any fancy footwork.
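The two replies fit together once you see how blk_queue_max_sectors() in kernels of this period splits a requested value: below BLK_DEF_MAX_SECTORS (1024) both limits receive it; otherwise max_sectors is capped at 1024 while max_hw_sectors keeps the full value. Per Fujita's reply, st's pc (pass-through) requests are checked against max_hw_sectors. A user-space model based on our reading of that function; the struct and function names are hypothetical, and the function's minimum-one-page check is left out.

```c
#define BLK_DEF_MAX_SECTORS 1024u  /* value quoted in this thread (512 KB) */

struct queue_limits_model {
    unsigned max_sectors;     /* cap for normal filesystem requests */
    unsigned max_hw_sectors;  /* cap for pass-through (pc) requests, e.g. st */
};

/* Hypothetical model of how blk_queue_max_sectors() splits a request. */
static struct queue_limits_model model_set_max_sectors(unsigned requested)
{
    struct queue_limits_model q;

    if (requested < BLK_DEF_MAX_SECTORS) {
        q.max_sectors = requested;
        q.max_hw_sectors = requested;
    } else {
        q.max_sectors = BLK_DEF_MAX_SECTORS;  /* fs requests stay at 512 KB */
        q.max_hw_sectors = requested;         /* pc requests get the full value */
    }
    return q;
}

/* Does a pc request of `bytes` fit?  Checked against max_hw_sectors. */
static int pc_request_fits(struct queue_limits_model q, unsigned long bytes)
{
    return bytes <= (unsigned long)q.max_hw_sectors * 512ul;
}
```

Under this model, the patch's 0xFFFF leaves a 2 MB st record within max_hw_sectors even though max_sectors reads back as 1024.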
Does James's patch fix the problem? If it does, would we be better off making 0xFFFF the default max_sectors value and then explicitly decreasing it to 240 only for disk-type devices? Or would that be liable to cause problems for things like CD writers and cdrecord?

Alan Stern

PS: Why is 0xFFFF the accepted maximum value? Historical reasons? In theory, the real maximum should be one less than the number of sectors in 4 GB, i.e., (1 << (32-9)) - 1, or 0x7FFFFF.
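Alan's arithmetic spelled out: 4 GB is 2^32 bytes, or 2^23 sectors of 512 bytes, so one less is 0x7FFFFF, while 0xFFFF sectors is 33553920 bytes, just under 32 MB. A C check of those constants; the macro and function names are ours, not the kernel's.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9u  /* 512-byte sectors */

/* One less than the number of 512-byte sectors in 4 GB: (1 << (32 - 9)) - 1. */
#define MAX_SECTORS_BELOW_4G ((UINT32_C(1) << (32u - SECTOR_SHIFT)) - 1u)

/* Bytes covered by a given sector count. */
static uint64_t sectors_to_bytes(uint32_t sectors)
{
    return (uint64_t)sectors << SECTOR_SHIFT;
}
```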
Created attachment 20126: Increase max_sectors for tape drives

Phil, are you still there? This is the patch in the form I intend to submit. Does it fix your problem?
Yes, sorry. I meant to get back to you but time has been tight. As such, I still haven't had a chance to recompile a kernel and try it out. I agree that the patch will probably fix the problem. I'll do my best to find some time to try it out within the next week.
Still no response. Should I forget about the patch?
*ugh* sorry, I've been really busy. Not surprisingly, I'm not set up to actually build Linux kernels as I'm not a Linux kernel developer and don't generally care to do so. I've done it before, and I don't mind doing it for this, but it disrupts the rest of my work, so I have to find appropriate time. In any case, I'll do my best to get there soon. Thank you for your patience. Keep in mind, wrt the Linux kernel, I'm just an end-user.
Okay, I *finally* managed to try it out. It does seem to solve the problem. Thanks, Phil
5c16034d73da2c1b663aa25dedadbc533b3d811c
Created attachment 20859: Following up circumstances

Dear fellows,

Thank you very much for maintaining the Linux kernel and for releasing a version that contains this fix.

I have now attached another patch that adds more features:
- reading max_sectors returns max_hw_sectors
- max_sectors is limited by a special value for usb-storage

I think these features are required. They are certainly useful to me: my code may run on many kernel versions, so max_sectors should be checkable and changeable at any time. Please discuss them.

In addition, my PC can send/receive 4 MB blocks, but with 8 MB the ioctl() reports ENOMEM. The PC usually runs with 1 GB of RAM, and adding more RAM does not help. I'm sorry, my target is only SCSI Generic, so I don't know how SCSI Tape behaves.

Sincerely,
Teruo Oshida
On Tue, 7 Apr 2009 bugzilla-daemon@bugzilla.kernel.org wrote:

> I have now attached another patch that adds more features:
>
> - reading max_sectors returns max_hw_sectors

Why do you want to do this? The attribute is named "max_sectors", so shouldn't it return the value of max_sectors?

> - max_sectors is limited by a special value for usb-storage

Where did your limit come from? I agree, the value should be limited, but the limit should be 4 GB - 1, not 8 MB.

> I think these features are required. They are certainly useful to me: my
> code may run on many kernel versions, so max_sectors should be checkable
> and changeable at any time.

Did you know that max_sectors can also be changed through the block interface? For example, under /sys/block/sda/queue/ you can write to max_sectors_kb and you can read max_hw_sectors_kb.

> In addition, my PC can send/receive 4 MB blocks, but with 8 MB the ioctl()
> reports ENOMEM. The PC usually runs with 1 GB of RAM, and adding more RAM
> does not help.

You'll have to trace this down by yourself.

Alan Stern
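For a disk, the block-interface files Alan mentions can be read like any other sysfs attribute. A C sketch that reads both and checks that max_sectors_kb does not exceed max_hw_sectors_kb; the helper names are hypothetical, "sda" is only an example, and, as the rest of this thread establishes, these files do not appear for character devices such as tapes.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: read one unsigned decimal value from a sysfs file.
 * Returns 0 on success, or an errno-style value. */
static int sysfs_read_ulong(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return errno;

    int rc = (fscanf(f, "%lu", out) == 1) ? 0 : EINVAL;
    fclose(f);
    return rc;
}

/* Check a block device's queue limits, e.g. dev = "sda".
 * Returns 1 if max_sectors_kb <= max_hw_sectors_kb, 0 if not, -1 on error. */
static int queue_limits_consistent(const char *dev)
{
    char path[256];
    unsigned long kb, hw_kb;

    snprintf(path, sizeof path, "/sys/block/%s/queue/max_sectors_kb", dev);
    if (sysfs_read_ulong(path, &kb) != 0)
        return -1;
    snprintf(path, sizeof path, "/sys/block/%s/queue/max_hw_sectors_kb", dev);
    if (sysfs_read_ulong(path, &hw_kb) != 0)
        return -1;
    return kb <= hw_kb;
}
```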
Thanks for reading.

Please note that this is just for tape devices.

> You'll have to trace this down by yourself.

Yes, of course, but this is not a problem for me because I was satisfied with the 4 MB limit.
# Via a SAS interface, the block size limit for tape devices is 4 MB without any configuration.
# That is another reason I chose this value.
## On Solaris 10 it is larger than 4 MB; I have not found the limit.

> Where did your limit come from?

Just the experiments above. But 4 MB (or double that, 8 MB) is not my recommendation; for standardization, I hope you will decide what a reasonable value is.

Generally, no device can operate with a block size larger than 0xffffff bytes. This comes from the SCSI tape device specification (T10/SSC).

> Why do you want to do this? The attribute is named "max_sectors", so
> shouldn't it return the value of max_sectors?

I can agree with your suggestion, but then I would need to declare a read-only attribute named "max_hw_sectors". I don't know the restrictions around the SCSI driver stack, e.g. what max_sectors is used for and what max_hw_sectors is used for. But in simple terms, if an effective value can be set on a variable, it should be verifiable by reading the same variable.

> Did you know that max_sectors can also be changed through the block
> interface?

I'm sorry, I did not. Thanks for teaching me. But I could not find the device (the tape drive, of course) under that tree. I think it is only for the kinds of block device that can be driven with the T10/SBC command set.

Sincerely,
Teruo Oshida
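The 0xffffff figure is the largest value of a 3-byte (24-bit) length field, the width SSC uses for the READ(6)/WRITE(6) transfer length and for the maximum block length reported by READ BLOCK LIMITS. A C sketch relating it to the limits in this thread (names hypothetical): the largest such block needs 32768 sectors, which still fits under the 0xFFFF sectors the patch gives tapes, while the 4 MB and 8 MB blocks from these experiments need 8192 and 16384 sectors.

```c
#include <stdint.h>

#define SECTOR_SIZE         512u
#define SSC_MAX_BLOCK_BYTES 0xFFFFFFu  /* largest value of a 24-bit length field */

/* Decode a big-endian 3-byte length field as it appears in a CDB
 * or in READ BLOCK LIMITS data. */
static uint32_t be24_to_u32(const uint8_t b[3])
{
    return ((uint32_t)b[0] << 16) | ((uint32_t)b[1] << 8) | (uint32_t)b[2];
}

/* 512-byte sectors needed to move one block of `bytes`, rounded up. */
static uint32_t sectors_per_block(uint32_t bytes)
{
    return (bytes + SECTOR_SIZE - 1u) / SECTOR_SIZE;
}
```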
On Fri, 10 Apr 2009 bugzilla-daemon@bugzilla.kernel.org wrote:

> --- Comment #27 from oshida@bb-next.net 2009-04-10 09:48:44 ---
> Thanks for reading.
>
> Please note that this is just for tape devices.

Your patch affects the max_sectors attribute file for all devices, not just for tape devices.

> > Where did your limit come from?
>
> Just the experiments above. But 4 MB (or double that, 8 MB) is not my
> recommendation; for standardization, I hope you will decide what a
> reasonable value is.
>
> Generally, no device can operate with a block size larger than 0xffffff
> bytes. This comes from the SCSI tape device specification (T10/SSC).

So there is no _tape_ device which can operate with a larger block size. But maybe a non-tape device can. Besides, the block size isn't the same as max_sectors. max_sectors is allowed to be larger than the block size (but it mustn't be smaller).

> > Why do you want to do this? The attribute is named "max_sectors", so
> > shouldn't it return the value of max_sectors?
>
> I can agree with your suggestion, but then I would need to declare a
> read-only attribute named "max_hw_sectors".

Why?

> I don't know the restrictions around the SCSI driver stack, e.g. what
> max_sectors is used for and what max_hw_sectors is used for. But in simple
> terms, if an effective value can be set on a variable, it should be
> verifiable by reading the same variable.

max_hw_sectors is supposed to be the largest transfer size supported by the hardware. max_sectors is supposed to be the largest transfer size the kernel will use. Therefore we should always have max_sectors <= max_hw_sectors.

With USB mass-storage devices this is difficult, because the driver doesn't know what transfer sizes are supported by the hardware. The USB protocol doesn't provide this information.

> > Did you know that max_sectors can also be changed through the block
> > interface?
>
> I'm sorry, I did not. Thanks for teaching me. But I could not find the
> device (the tape drive, of course) under that tree. I think it is only for
> the kinds of block device that can be driven with the T10/SBC command set.

No, it is for all block devices. If you can't find your device under /sys/block/ then look for it somewhere else, such as under /sys/bus/scsi/devices/.

Alan Stern
On Fri, 10 Apr 2009, bugzilla-daemon@bugzilla.kernel.org wrote:

> --- Comment #28 from Alan Stern <stern@rowland.harvard.edu> 2009-04-10 15:19:23 ---
> No, it is for all block devices. If you can't find your device under
> /sys/block/ then look for it somewhere else, such as under
> /sys/bus/scsi/devices/.

Maybe you should check your facts before telling others where to look. max_sectors_kb etc. exist only for block devices (in the Unix sense). Tapes are character devices.

Kai
On Fri, 10 Apr 2009 bugzilla-daemon@bugzilla.kernel.org wrote:

> Maybe you should check your facts before telling others where to look.
> max_sectors_kb etc. exist only for block devices (in the Unix sense).
> Tapes are character devices.

There's no need to be rude. Besides, how would you suggest I check my facts (bearing in mind that I don't have access to any SCSI or USB tape drives)?

As the author of st.c, you certainly are the authority on how this works. And it's definitely true that tapes are char devices.

But isn't it also true that SCSI tapes, like all SCSI devices, have their device queue set up by scsi_alloc_queue(), which calls blk_init_queue(), which registers the queue's kobject using blk_queue_ktype, whose default attributes include both queue_max_sectors_entry.attr and queue_max_hw_sectors_entry.attr? And doesn't this mean that SCSI tapes _do_ have max_sectors_kb etc. attributes?

Alan Stern
On Sat, 11 Apr 2009, bugzilla-daemon@bugzilla.kernel.org wrote:

> --- Comment #30 from Alan Stern <stern@rowland.harvard.edu> 2009-04-11 02:38:27 ---
> There's no need to be rude. Besides, how would you suggest I check my
> facts (bearing in mind that I don't have access to any SCSI or USB tape
> drives)?

I am sorry. My words were inappropriate. But even without a tape drive it is possible to read the code.

> As the author of st.c, you certainly are the authority on how this
> works. And it's definitely true that tapes are char devices.
>
> But isn't it also true that SCSI tapes, like all SCSI devices, have
> their device queue set up by scsi_alloc_queue(), which calls
> blk_init_queue(), which registers the queue's kobject using
> blk_queue_ktype, whose default attributes include both
> queue_max_sectors_entry.attr and queue_max_hw_sectors_entry.attr? And
> doesn't this mean that SCSI tapes _do_ have max_sectors_kb etc.
> attributes?

Yes, SCSI uses the (Linux) block layer and the attributes do exist within the kernel. They just are not visible in sysfs for character devices. Last night I read the code to find out why. Your analysis is correct, but it only shows that the kobject exists. The kobject is added to sysfs using kobject_add() in blk_register_queue(), called by add_disk(), which is only called by disk drivers.

Whether character devices should add the kobject somewhere in sysfs is getting beyond the original topic of this bugzilla entry. This is a more general problem because there is no /sys/char directory. My current opinion is that it would be nice to see these attributes somewhere, but probably not worth the trouble.

Kai
> Yes, SCSI uses the (Linux) block layer and the attributes do exist within
> the kernel. They just are not visible in sysfs for character devices. Last
> night I read the code to find out why. Your analysis is correct but it
> only shows that the kobject exists. The kobject is added to sysfs using
> kobject_add() in blk_register_queue() called by add_disk(), which is only
> called by disk drivers.

You are right. And this makes me feel better -- I had thought that usb-storage's max_sectors attribute was redundant. Now I know that for some SCSI-over-USB devices, the information is not accessible anywhere else.

> Whether character devices should add the kobject somewhere in sysfs is
> getting beyond the original topic in this bugzilla entry. This is a more
> general problem because there is no /sys/char directory. My current
> opinion is that it would be nice to see these attributes somewhere, but
> probably not worth the trouble.

Until somebody asks for them, we're okay. However it shouldn't be very hard to write a function like blk_register_queue() that accepts a non-disk device as parent. (And likewise for blk_unregister_queue, of course.)

Alan Stern
Dear all,

Thanks for the discussion. Contrary to my expectations, it turned toward a more general goal, which is fine.

> Your patch affects the max_sectors attribute file for all devices, not just
> for tape devices.

Yes, I know my code is experimental. It should check the type of the target device so that it affects only tape devices.

> So there is no _tape_ device which can operate with a larger block size.
> But maybe a non-tape device can.

But usually, accepting a very large block size is done to improve performance with a tape device.

So, at least in my circumstances, changing only scsiglue.c is a nice solution.

Teruo
On Wed, 15 Apr 2009 bugzilla-daemon@bugzilla.kernel.org wrote:

> So, at least in my circumstances, changing only scsiglue.c is a nice
> solution.

This sort of issue should be discussed by email, not on Bugzilla. If you want to continue talking about it, post your messages to <usb-storage@lists.one-eyed-alien.net> and CC: anyone who might be particularly interested.

Alan Stern