Latest working kernel version: N/A (never worked afaik)
Earliest failing kernel version: 2.6.24.2 and 2.6.25.4 both have this issue
Distribution: Debian
Hardware Environment: Panasonic Toughbook CF-18 (not platform specific)
Software Environment: Debian Etch + latest vanilla kernel from kernel.org

Problem Description:

The check_partition() function in fs/partitions/check.c fails to identify the partition table on commonly available pre-formatted USB keys. This might be an issue with the partition table generated in the USB flash disk manufacturing process. That said, there are two Linux command-line utilities (fdisk and disktype) which can correctly identify the partition table. It seems silly to take the stand that the USB key has an invalid partition table, and that the kernel should therefore ignore it and require the user to repartition and reformat the device before using it under Linux. This partition and file system work fine under Windows and Mac OS.

Steps to reproduce:

1) Take the attached compressed dd image and write it to a removable USB block device.

2) Remove and plug in the block device.

3) Run dmesg and confirm you get something like this:

    # dmesg
    ...
    usb-storage: device found at 5
    usb-storage: waiting for device to settle before scanning
    usb-storage: device scan complete
    scsi 5:0:0:0: Direct-Access CBM Flash Disk 4.00 PQ: 0 ANSI: 2
    sd 5:0:0:0: [sdb] 127971 512-byte hardware sectors (66 MB)
    sd 5:0:0:0: [sdb] Write Protect is off
    sd 5:0:0:0: [sdb] Mode Sense: 00 00 00 00
    sd 5:0:0:0: [sdb] Assuming drive cache: write through
    sd 5:0:0:0: [sdb] 127971 512-byte hardware sectors (66 MB)
    sd 5:0:0:0: [sdb] Write Protect is off
    sd 5:0:0:0: [sdb] Mode Sense: 00 00 00 00
    sd 5:0:0:0: [sdb] Assuming drive cache: write through
     sdb: unknown partition table
    sd 5:0:0:0: [sdb] Attached SCSI removable disk

4) Run the fdisk command and verify the partition is really there:

    # fdisk -l /dev/sdb

    Disk /dev/sdb: 65 MB, 65521152 bytes
    4 heads, 32 sectors/track, 999 cylinders
    Units = cylinders of 128 * 512 = 65536 bytes
    Disk identifier: 0x00000000

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1   *           1        1000       63969+   6  FAT16

5) Run the disktype command and confirm the partition and file system are really there:

    # disktype /dev/sdb

    --- /dev/sdb
    Block device, size 62.49 MiB (65521152 bytes)
    DOS/MBR partition map
    Partition 1: 62.47 MiB (65504768 bytes, 127939 sectors from 32, bootable)
      Type 0x06 (FAT16)
      Windows NTLDR boot loader
      FAT16 file system (hints score 4 of 5)
        Volume size 62.21 MiB (65229824 bytes, 63701 clusters of 1 KiB)

6) Set up a loop device to access the file system directly:

    # modprobe loop
    # losetup -o 16384 /dev/loop0 /dev/sd?    <-- your device node goes here
    # mount -t vfat /dev/loop0 /mnt
    # ls /mnt
    fat16.txt
    # cat /mnt/fat16.txt
    This is a fat16 file system on a usb key.

This particular key came with a CelleBrite cell phone dumping appliance that runs Windows CE, but I've seen this issue on a dozen or so brand new USB keys I've bought over the last year or so. I'll attach the disk image after I create this issue.
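The 16384-byte offset passed to losetup in step 6 follows from the disktype output: the partition starts at sector 32, and 32 * 512 = 16384. A minimal Python sketch of that calculation is below. It assumes the classic MBR layout (four 16-byte entries at offset 0x1BE, signature 0x55AA at offset 510); the function names and the synthetic example sector are illustrative only and are not taken from the attached image.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 0x1BE   # first of the four 16-byte partition entries
ENTRY_SIZE = 16


def parse_mbr(sector):
    """Return the used partition entries of a 512-byte MBR sector."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for slot in range(4):
        start = TABLE_OFFSET + slot * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        # Layout: boot flag, CHS start (skipped), type, CHS end (skipped),
        # start LBA, sector count. Multi-byte fields are little-endian.
        boot_ind, ptype, start_lba, num_sectors = struct.unpack("<B3xB3xII", raw)
        if ptype == 0 or num_sectors == 0:
            continue  # unused slot
        entries.append({
            "slot": slot + 1,
            "boot_ind": boot_ind,
            "type": ptype,
            "start_lba": start_lba,
            "num_sectors": num_sectors,
        })
    return entries


def loop_offset(entry):
    """Byte offset to hand to `losetup -o` for this partition."""
    return entry["start_lba"] * SECTOR_SIZE


def example_sector():
    """Synthetic sector using the values disktype reported above (hypothetical helper)."""
    sector = bytearray(SECTOR_SIZE)
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = struct.pack(
        "<B3xB3xII", 0x80, 0x06, 32, 127939)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)
```

To run it against a real device, read the first 512 bytes of the device node (for example with open("/dev/sdb", "rb").read(512), which normally needs root) and pass them to parse_mbr().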
Created attachment 16299 [details] This is a compressed 64MB image from a USB key that exhibits the issue.
Created attachment 16321 [details] Kernel config (yes, MSDOS partitioning is enabled)
Created attachment 16324 [details] debug patch

Unfortunately, I couldn't reproduce this on my USB storage with your data, and the partition table looks good. Can you try the attached patch to debug this?
I'll apply your patch and run it now, but I've got an update. I've chased the issue down to the code in msdos.c that does the boot indicator test:

    p = (struct partition *) (data + 0x1be);
    for (slot = 1; slot <= 4; slot++, p++) {
            if (p->boot_ind != 0 && p->boot_ind != 0x80) {
                    put_dev_sector(sect);
                    return 0;
            }
    }

For some reason the first p->boot_ind test comes back with a bogus value of 111 (0x6F). Verifying the image using hexedit shows the value on the disk is indeed 128 (0x80). Obviously this code works when I boot from my hard drive and for USB keys partitioned/formatted under Linux, so I have no idea why it doesn't work in this case. I'll apply your patch and get you the output in a few minutes.
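For checking a dumped sector by hand, the loop quoted above can be mirrored in userspace. The following Python sketch only reproduces the boot indicator test (each of the four entries must have a flag of 0x00 or 0x80); it is not the kernel's full msdos_partition() logic, and the function names are made up for illustration.

```python
TABLE_OFFSET = 0x1BE   # same offset as `data + 0x1be` in the kernel excerpt
ENTRY_SIZE = 16        # size of one partition table entry


def boot_indicators(sector):
    """Return the boot_ind byte of each of the four partition entries."""
    if len(sector) < 512:
        raise ValueError("need a full 512-byte sector")
    return [sector[TABLE_OFFSET + i * ENTRY_SIZE] for i in range(4)]


def passes_boot_ind_check(sector):
    """True if every entry's boot flag is 0x00 or 0x80, as the quoted loop requires."""
    return all(b in (0x00, 0x80) for b in boot_indicators(sector))
```

Feeding it the sector as read from the device and the same sector taken from the dd image shows quickly whether the two copies disagree at offset 0x1BE.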
Created attachment 16326 [details] dmesg output of patch
Created attachment 16327 [details] dmesg output from another key insertion event
Created attachment 16330 [details] hexedit output of the boot sector from the usb key (not all 80s)
BTW, this part of your patch was rejected and I couldn't find the referenced code to resolve it manually:

    @@ -453,10 +472,11 @@ int msdos_partition(struct parsed_partit
                     fb = (struct fat_boot_sector *) data;
                     if (slot == 1 && fb->reserved && fb->fats && fat_valid_media(fb->media)) {
    -                        printk("\n");
    +                        printk("seems fat\n");
                             put_dev_sector(sect);
                             return 1;
                     } else {
    +                        printk("not fat\n");
                             put_dev_sector(sect);
                             return 0;
                     }

I'm testing this on 2.6.25.4 right now. If you have a different version you want me to use, please let me know.
Sorry to keep posting like this, but I just wanted to point out that the first dmesg output (which still exhibits the unknown partition error) does not have the dd image I submitted initially. The second dmesg output *is* from a key with that image, as is the hexedit output I posted after that.
> 1) Take the attached compressed dd image and write it to a removable >USB block device. or maybe "modprobe g_file_storage file=/path/to/diskimage" ? (if g_file_storage worked correctly - see bug http://bugzilla.kernel.org/show_bug.cgi?id=10834 , found because of this bug)
Thanks. The rejected part of the patch can be ignored; it's not necessary. (BTW, my kernel is current git.)

The dump in the second dmesg is the first sector of this partition, and it seems to be all 0x80. The cause of the problem seems to be strange data from the driver or something.

Does running "blockdev --rereadpt /dev/sdb" help at all? Also, if you write re-formatted data (not the pre-formatted data) to the device and repeat the same steps, does it work?
Note: "write re-formatted data" does not mean running fdisk. It means replacing only the data.
Created attachment 16336 [details] dmesg from another insertion + ran the "blockdev --rereadpt /dev/sdb" command
As you suspected, it looks like the problem may be in the driver... Can anyone suggest next steps? Do I need to reassign this bug to Drivers/USB?
I'll CC this to linux-usb.
Although I don't know much about usb stuff, CONFIG_USB_STORAGE_DEBUG may help debug this.
A better alternative may be to use usbmon (see Documentation/usb/usbmon.txt).
Created attachment 16377 [details] usbmon output from inserting the key

Let me know if these attachments give the information needed to debug this.
Created attachment 16378 [details] usbmon output from dd'ing the master boot record
Created attachment 16379 [details] usbmon output from removing the usb key
Created attachment 16380 [details] Output of the dd command as text via hd
The usbmon log in comment 18 is very interesting. It includes two reads of the master boot record: the first at timestamp 4166876664 (this is the system reading the partition table) and the second at timestamp 4167036093 (this is some application program reading the partition table).

Although the commands sent to the device are identical except for the Tag value, the responses are different. The first response starts with:

    80808080 80808080 80808080 80808080 80808080 80808080

(which looks like garbage), and the second response starts with:

    fa33c08e d0bc007c 8bf45007 501ffbfc bf0006b9 0001f2a5

(which looks like valid data).

Evidently this device is buggy. Maybe it never responds correctly to the first read, or maybe it needs a longer time delay before it will work (although there is already a 5-second delay). At this point the best solution seems to be the "blockdev --rereadpt" command mentioned in comment 11.
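The "looks like garbage" judgement can be made mechanical when comparing traces. The Python sketch below converts usbmon-style hex words into bytes and applies two simple checks: whether the data is one byte value repeated, and whether a full sector carries the 0x55AA signature. Both checks are heuristics chosen for illustration (an assumption of this sketch), not something the kernel or usbmon performs.

```python
def words_to_bytes(text):
    """Convert space-separated hex words, as printed by usbmon, into bytes."""
    return bytes.fromhex("".join(text.split()))


def is_uniform_fill(data):
    """True if the buffer is non-empty and consists of a single repeated byte."""
    return len(data) > 0 and len(set(data)) == 1


def has_mbr_signature(sector):
    """True if a full 512-byte sector ends with the 0x55AA boot signature."""
    return len(sector) >= 512 and sector[510:512] == b"\x55\xaa"


# The two response prefixes quoted from the usbmon log.
first_read = words_to_bytes("80808080 80808080 80808080 80808080 80808080 80808080")
second_read = words_to_bytes("fa33c08e d0bc007c 8bf45007 501ffbfc bf0006b9 0001f2a5")

print("first read uniform fill:", is_uniform_fill(first_read))
print("second read uniform fill:", is_uniform_fill(second_read))
```

Only the first 24 bytes of each response are quoted in the thread, so has_mbr_signature() is useful only when the whole sector is available, for example from the dd output in the later attachment.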
I appreciate the responses and the time spent investigating this issue. I was able to get the partitions to show up by running the "blockdev --rereadpt" command as suggested. The partitions even got propagated through hal properly. I just wish we had something other than the dmesg output to go on to indicate this error occurred...

My question at this point is why this device's partitions get detected properly under Windows and Mac OS and not Linux. If this was a completely isolated case I'd agree that it is a bad device and leave it at that. Unfortunately, I have to work with vendor supplied equipment, and out of the dozens of different vendors' USB keys I've seen in the last year or two, I have seen approximately 10 keys with this same kind of symptom under Linux... and they all work with Windows and Mac systems. And outside of this one issue, they are fast and reliable storage devices. I'll spend some time tracking down the bad keys to get better data on the failures.

Does Linux have a more restrictive timeout value for allowing new USB devices to settle than other operating systems? What about retrying the initial failed reads once or twice before writing off a new device? Assuming everyone is developing to the same USB spec (not a conclusion I'm willing to make), why does a given product work perfectly well on the two other major operating systems, but not Linux? It can't be some conspiracy.

Are there any other suggestions for diagnosis at this point? Would someone be interested in seeing this key for a closer inspection? I'd be happy to drop it in the mail if you want to contact me privately with an address.

Thanks again.
I can only speculate about the cause. My guess is that the device always returns garbage for the first read and then starts working correctly. (Note: The read doesn't fail! It just returns garbage. The kernel has no way of knowing that the read should be retried.) As for the other USB keys you've seen with the same symptoms, their firmware was probably written by the same manufacturer, leading them all to behave the same way.

Linux's timeout values are not more restrictive than other OSs', as far as I know. In fact the default settling time is unreasonably long: 5 seconds. You can see this in the usbmon output from inserting the key, if you know what to look for. There's a 5-second gap in the timestamps (second column of the output) between 4161840720 (when the device enumeration finished) and 4166840528 (when usb-storage started communicating with the device) -- note that these values are in microseconds.

Why does a given product work perfectly well on the two other major operating systems but not Linux? The answer is very simple: Manufacturers _test_ their products with Windows (almost always) and OS-X (sometimes), but only a few test with Linux. So of course the devices don't work.

I know from my own tests that Windows reads the partition sector multiple times. It probably ignores the garbage data and uses the valid information that comes from the later reads. (I don't know how Mac OS-X works and I haven't tried testing it.) Linux doesn't operate that way; it reads the partition sector just once and believes what the device tells it.

You can test this guess by running a USB sniffer program under Windows (for example, SnoopyPro). I bet it shows the same garbage data coming through on the first read of the partition sector.
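The timing claims can be checked directly from the timestamps quoted in this thread, since usbmon reports them in microseconds. A short Python sketch of the arithmetic follows; the variable names are illustrative.

```python
def gap_us(start_us, end_us):
    """Difference between two usbmon timestamps, in microseconds."""
    return end_us - start_us


def gap_seconds(start_us, end_us):
    """Same difference, converted to seconds."""
    return gap_us(start_us, end_us) / 1_000_000


# Settling delay: end of enumeration to the first usb-storage traffic.
ENUMERATION_DONE = 4161840720
STORAGE_STARTS = 4166840528
settle_gap = gap_us(ENUMERATION_DONE, STORAGE_STARTS)   # 4999808 us, about 5 s

# Time between the two reads of the master boot record (from the earlier comment).
FIRST_MBR_READ = 4166876664
SECOND_MBR_READ = 4167036093
reread_gap = gap_us(FIRST_MBR_READ, SECOND_MBR_READ)    # 159429 us, about 0.16 s

print("settling delay: %.6f s" % gap_seconds(ENUMERATION_DONE, STORAGE_STARTS))
print("gap between MBR reads: %.6f s" % gap_seconds(FIRST_MBR_READ, SECOND_MBR_READ))
```

So the device had roughly 5 seconds to settle before the first (garbage) read, and the second (valid) read came only about 0.16 seconds later.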
Created attachment 16451 [details] A very simple (ugly) work-around using hal and blockdev

Okay, I give. I've just created a work-around for me and any others that have this issue. It uses hal to identify devices that have storage.partition_scheme = 'none' and runs blockdev --rereadpt against the device node. This works for me, but I haven't really looked at the wider implications of doing this. Use at your own risk, disclaimer, etc.
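The attached script itself is not reproduced in this thread. As a rough illustration of the same idea without hal, the Python sketch below looks in sysfs for partitions the kernel already knows about and, if there are none, asks for a re-read with blockdev --rereadpt. The sysfs-based detection and the function names are assumptions of this sketch, not the contents of the attachment, and the same "use at your own risk" caveat applies.

```python
import os
import subprocess


def kernel_partitions(disk, sys_block="/sys/block"):
    """List partitions the kernel has registered for `disk` (e.g. "sdb")."""
    base = os.path.join(sys_block, disk)
    # Partitions appear as subdirectories named after the disk (sdb1, sdb2, ...).
    return sorted(
        name for name in os.listdir(base)
        if name.startswith(disk) and os.path.isdir(os.path.join(base, name))
    )


def reread_if_unpartitioned(disk, sys_block="/sys/block", run=subprocess.run):
    """Run `blockdev --rereadpt` if the kernel sees no partitions on `disk`.

    Returns True if a re-read was requested. `run` is injectable so the
    logic can be exercised without touching a real device.
    """
    if kernel_partitions(disk, sys_block):
        return False
    run(["blockdev", "--rereadpt", "/dev/" + disk], check=True)
    return True
```

Calling reread_if_unpartitioned("sdb") as root after plugging in the key would trigger the re-read only when no sdb1, sdb2, ... entries exist. It will also fire for devices that genuinely have no partition table, in which case the re-read simply finds nothing new.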
If your solution works, you may as well close out this bug report. You can mark it REJECTED, since it isn't a bug in the kernel but rather a bug in the device. On the other hand, if enough other people experience this same sort of problem then we'll end up implementing a workaround eventually...
Per your advice, I am closing this issue. The solution I found is good enough for now. Thanks everyone for your help on this.
brand new FLASH Drive AU_USB20, same problem, ubuntu 2.6.24-25-generic, no i'm not bothering to try a more recent kernel, if it's been fixed THANK YOU. if not, here's evidence of more brand new rogue flash drives that work fine in windows. drivers work around rogue hardware all the time, that's life. ignore if you like, that just helps keep linux as an also-ran.
Greg, can you attach a usbmon trace showing what happens when you plug in your new flash drive?
Created attachment 24488 [details] usbmon while plugging in flashdrive

usbmon as requested by Alan Stern (Comment #29)

at the same time this appears in /var/log/messages:

    Jan 9 03:00:48 nana kernel: [3574285.570294] usb 1-2: new full speed USB device using uhci_hcd and address 5
    Jan 9 03:00:48 nana kernel: [3574285.738726] usb 1-2: configuration #1 chosen from 1 choice
    Jan 9 03:00:48 nana kernel: [3574285.751800] scsi5 : SCSI emulation for USB Mass Storage devices
    Jan 9 03:00:53 nana kernel: [3574290.761850] scsi 5:0:0:0: Direct-Access FLASH Drive AU_USB20 8.07 PQ: 0 ANSI: 2
    Jan 9 03:00:53 nana kernel: [3574290.783720] sd 5:0:0:0: [sdb] 8216576 512-byte hardware sectors (4207 MB)
    Jan 9 03:00:53 nana kernel: [3574290.786711] sd 5:0:0:0: [sdb] Write Protect is off
    Jan 9 03:00:53 nana kernel: [3574290.798704] sd 5:0:0:0: [sdb] 8216576 512-byte hardware sectors (4207 MB)
    Jan 9 03:00:53 nana kernel: [3574290.801701] sd 5:0:0:0: [sdb] Write Protect is off
    Jan 9 03:00:54 nana kernel: [3574290.801793] sdb: unknown partition table
    Jan 9 03:00:54 nana kernel: [3574291.286955] sd 5:0:0:0: [sdb] Attached SCSI removable disk
    Jan 9 03:00:54 nana kernel: [3574291.287205] sd 5:0:0:0: Attached scsi generic sg2 type 0
Created attachment 24489 [details] usbmon while mounting flashdrive

and fwiw here is usbmon while mounting the just-inserted flashdrive
The usbmon trace definitely shows that your device doesn't have the same bug as Anthony's, because unlike his, your device returns valid data the first time it is read.

One more test: Let's see the output from

    dd if=/dev/sdb count=1 | hexdump -C
Created attachment 24495 [details] dd if=/dev/sdb count=1 | hexdump -C
That looks clear enough. The kernel reports "unknown partition table" because the drive doesn't _have_ a partition table! Instead of trying to mount /dev/sdb1, you should just mount /dev/sdb.
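A drive like this carries a file system starting at sector 0, with no partition table in front of it. The Python sketch below applies the same test as the fat_boot_sector check quoted from the debug patch earlier in this thread (non-zero reserved sector count, non-zero FAT count, valid media byte) to a dumped first sector. The field offsets follow the standard FAT boot sector layout; the function names are illustrative, and the heuristic is weak, so treat a positive result as a hint rather than proof.

```python
import struct


def fat_valid_media(media):
    """Media descriptor values accepted for FAT: 0xF0 or 0xF8..0xFF."""
    return media >= 0xF8 or media == 0xF0


def looks_like_fat_boot_sector(sector):
    """Mirror of `fb->reserved && fb->fats && fat_valid_media(fb->media)`."""
    if len(sector) < 512:
        return False
    reserved = struct.unpack_from("<H", sector, 14)[0]  # reserved sector count
    fats = sector[16]                                   # number of FATs
    media = sector[21]                                  # media descriptor byte
    return bool(reserved and fats and fat_valid_media(media))


def whole_device_mount_plausible(sector):
    """True if sector 0 resembles a FAT boot sector, i.e. try /dev/sdX, not /dev/sdX1."""
    return looks_like_fat_boot_sector(sector)
```

The input can be captured with the same dd command used above, redirected to a file instead of hexdump, and read back in binary mode.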
ok, fair enough, i read between the lines here that what's lacking are some heuristics/hacks somewhere that notice this lack of partition table, and mount accordingly, i'll venture a guess that's the domain of some userland usb hotplug system, hence not the kernel. sorry to bother. nice work.