Bug 201609

Summary: sysfs duplicate filename on driver loading Adaptec AIC-9410W
Product: SCSI Drivers Reporter: Vitaly Vlasov (vnigtha)
Component: AIC94XXAssignee: scsi_drivers-aic94xx
Status: NEW ---    
Severity: normal CC: bjorn, bvanassche, emil.l.velikov, pablo.ruiz, vnigtha
Priority: P1    
Hardware: x86-64   
OS: Linux   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1443678
Kernel Version: 4.18.16 Subsystem:
Regression: No Bisected commit-id:

Description Vitaly Vlasov 2018-11-03 09:56:34 UTC
Kernel can't load aic94xx driver

Motherboard: Supermicro X7DA8/X7DA8, BIOS 2.0b 03/20/2008
RAID controller: AIC-9410W SAS/SATA Host Adapter, device 0000:07:02.0

lspci:
00:00.0 Host bridge: Intel Corporation 5000X Chipset Memory Controller Hub (rev 31)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 2 (rev 31)
00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 3 (rev 31)
00:04.0 PCI bridge: Intel Corporation 5000X Chipset PCI Express x16 Port 4-7 (rev 31)
00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev 31)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 31)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 31)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 31)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 31)
00:1d.0 USB controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.3 USB controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09)
00:1d.7 USB controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1e.2 Multimedia audio controller: Intel Corporation 631xESB/632xESB AC '97 Audio Controller (rev 09)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.2 IDE interface: Intel Corporation 631xESB/632xESB/3100 Chipset SATA IDE Controller (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
01:00.1 Audio device: NVIDIA Corporation High Definition Audio Controller (rev a1)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
02:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
03:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
03:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
04:00.0 PCI bridge: Intel Corporation 6702PXH PCI Express-to-PCI Bridge A (rev 09)
06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
07:02.0 RAID bus controller: Adaptec AIC-9410W SAS (Razor ASIC RAID) (rev 09)


dmesg:
[66831.520219] aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.3 loaded
[66831.520555] aic94xx: found Adaptec AIC-9410W SAS/SATA Host Adapter, device 0000:07:02.0
[66831.520559] scsi host2: aic94xx
[66831.521049] aic94xx: BIOS present (1,1), 1822
[66831.521050] aic94xx: ue num:1, ue size:88
[66831.533221] aic94xx: manuf sect SAS_ADDR 500304800050ad20
[66831.533223] aic94xx: manuf sect PCBA SN ORG
[66831.533224] aic94xx: ms: num_phy_desc: 8
[66831.533224] aic94xx: ms: phy0: ENABLED
[66831.533225] aic94xx: ms: phy1: ENABLED
[66831.533226] aic94xx: ms: phy2: ENABLED
[66831.533226] aic94xx: ms: phy3: ENABLED
[66831.533227] aic94xx: ms: phy4: ENABLED
[66831.533228] aic94xx: ms: phy5: ENABLED
[66831.533228] aic94xx: ms: phy6: ENABLED
[66831.533229] aic94xx: ms: phy7: ENABLED
[66831.533230] aic94xx: ms: max_phys:0x8, num_phys:0x8
[66831.533231] aic94xx: ms: enabled_phys:0xff
[66831.541491] aic94xx: ctrla: phy0: sas_addr: 500304800050ad20, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[66831.541493] aic94xx: ctrla: phy1: sas_addr: 500304800050ad20, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[66831.541495] aic94xx: ctrla: phy2: sas_addr: 500304800050ad20, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[66831.541496] aic94xx: ctrla: phy3: sas_addr: 500304800050ad20, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[66831.541498] aic94xx: ctrla: phy4: sas_addr: 500304800050ad20, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[66831.541499] aic94xx: ctrla: phy5: sas_addr: 500304800050ad20, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[66831.541501] aic94xx: ctrla: phy6: sas_addr: 500304800050ad20, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[66831.541502] aic94xx: ctrla: phy7: sas_addr: 500304800050ad20, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
[66831.541507] aic94xx: max_scbs:512, max_ddbs:128
[66831.541510] aic94xx: setting phy0 addr to 500304800050ad20
[66831.541511] aic94xx: setting phy1 addr to 500304800050ad20
[66831.541512] aic94xx: setting phy2 addr to 500304800050ad20
[66831.541513] aic94xx: setting phy3 addr to 500304800050ad20
[66831.541514] aic94xx: setting phy4 addr to 500304800050ad20
[66831.541518] aic94xx: setting phy5 addr to 500304800050ad20
[66831.541519] aic94xx: setting phy6 addr to 500304800050ad20
[66831.541519] aic94xx: setting phy7 addr to 500304800050ad20
[66831.541543] aic94xx: num_edbs:21
[66831.541545] aic94xx: num_escbs:3
[66831.541611] aic94xx: Found sequencer Firmware version 1.1 (V30)
[66831.541612] aic94xx: downloading CSEQ...
[66831.541628] aic94xx: dma-ing 8192 bytes
[66831.543732] aic94xx: verified 8192 bytes, passed
[66831.543733] aic94xx: downloading LSEQs...
[66831.543746] aic94xx: dma-ing 14336 bytes
[66831.547447] aic94xx: LSEQ0 verified 14336 bytes, passed
[66831.550965] aic94xx: LSEQ1 verified 14336 bytes, passed
[66831.554477] aic94xx: LSEQ2 verified 14336 bytes, passed
[66831.557990] aic94xx: LSEQ3 verified 14336 bytes, passed
[66831.561505] aic94xx: LSEQ4 verified 14336 bytes, passed
[66831.565019] aic94xx: LSEQ5 verified 14336 bytes, passed
[66831.568531] aic94xx: LSEQ6 verified 14336 bytes, passed
[66831.572044] aic94xx: LSEQ7 verified 14336 bytes, passed
[66831.584868] aic94xx: max_scbs:446
[66831.584869] aic94xx: first_scb_site_no:0x20
[66831.584870] aic94xx: last_scb_site_no:0x1fe
[66831.584890] aic94xx: First SCB dma_handle: 0x22dbe4000
[66831.585312] aic94xx: device 0000:07:02.0: SAS addr 500304800050ad20, PCBA SN ORG, 8 phys, 8 enabled phys, flash present, BIOS build 1822
[66831.585333] aic94xx: posting 3 escbs
[66831.585334] aic94xx: escbs posted
[66831.585342] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:03.0/0000:02:00.3/0000:07:02.0/revision'
[66831.585345] CPU: 3 PID: 2167 Comm: modprobe Not tainted 4.18.16-arch1-1-ARCH #1
[66831.585346] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.0b 03/20/2008
[66831.585347] Call Trace:
[66831.585357]  dump_stack+0x5c/0x80
[66831.585362]  sysfs_warn_dup.cold.0+0x17/0x2a
[66831.585366]  sysfs_add_file_mode_ns+0x147/0x170
[66831.585372]  asd_pci_probe.cold.12+0x8c0/0xb7f [aic94xx]
[66831.585377]  local_pci_probe+0x41/0x90
[66831.585380]  pci_device_probe+0x115/0x1a0
[66831.585384]  driver_probe_device+0x2da/0x450
[66831.585387]  __driver_attach+0xdd/0x110
[66831.585389]  ? driver_probe_device+0x450/0x450
[66831.585391]  ? driver_probe_device+0x450/0x450
[66831.585393]  bus_for_each_dev+0x76/0xc0
[66831.585396]  bus_add_driver+0x152/0x230
[66831.585398]  ? 0xffffffffc0541000
[66831.585400]  driver_register+0x6b/0xb0
[66831.585402]  ? 0xffffffffc0541000
[66831.585406]  aic94xx_init+0xf4/0x1000 [aic94xx]
[66831.585409]  do_one_initcall+0x46/0x1f5
[66831.585414]  ? free_unref_page_commit+0x70/0xf0
[66831.585418]  ? kmem_cache_alloc_trace+0x181/0x1d0
[66831.585421]  ? do_init_module+0x22/0x210
[66831.585424]  do_init_module+0x5a/0x210
[66831.585426]  load_module+0x2384/0x2590
[66831.585431]  ? vmap_page_range_noflush+0x23f/0x350
[66831.585434]  ? __se_sys_init_module+0x10c/0x170
[66831.585436]  __se_sys_init_module+0x10c/0x170
[66831.585440]  do_syscall_64+0x5b/0x170
[66831.585444]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[66831.585447] RIP: 0033:0x7fa71570c56e
[66831.585448] Code: 48 8b 0d f5 18 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c2 18 0c 00 f7 d8 64 89 01 48 
[66831.585484] RSP: 002b:00007fff1d8cd2d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[66831.585486] RAX: ffffffffffffffda RBX: 0000560df87a8c60 RCX: 00007fa71570c56e
[66831.585488] RDX: 0000560df87a9640 RSI: 0000000000041a39 RDI: 0000560df8ffa080
[66831.585489] RBP: 0000560df87a9640 R08: 0000000000000001 R09: 0000000000000000
[66831.585490] R10: 0000560df87a8010 R11: 0000000000000246 R12: 0000560df8ffa080
[66831.585491] R13: 0000560df87a8d90 R14: 0000000000040000 R15: 0000560df87a8c60
[66831.624342] aic94xx: probe of 0000:07:02.0 failed with error -17
Comment 1 Vitaly Vlasov 2018-11-03 10:03:53 UTC
I found same error on Redhat bugtracker:
https://bugzilla.redhat.com/show_bug.cgi?id=1443678

Some people says that kernel 4.9.x is not affected
Comment 2 Pablo Ruiz García 2019-01-26 03:13:23 UTC
Hi,

I've stumbled upon this very same issue, and have created a github repo with a fix for the actual issue (see git commit's description for bug details).

https://github.com/pruiz/aic94xx-hotfix
https://github.com/pruiz/aic94xx-hotfix/commit/01898ea7dff8eb52f43773ea3b59ab41ef2482c8

Hope it helps someone.

Best Regards
Comment 3 Bart Van Assche 2019-01-26 04:12:09 UTC
Please consider to do a root-cause analysis, to check which patch introduced this bug and to send the patch upstream according to the instructions in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst.
Comment 4 James.Bottomley 2019-01-26 04:44:33 UTC
On Sat, 2019-01-26 at 03:13 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=201609
> 
> Pablo Ruiz García (pablo.ruiz@gmail.com) changed:
> 
>            What    |Removed                     |Added
> -------------------------------------------------------------------
> ---------
>                  CC|                            |pablo.ruiz@gmail.com
> 
> --- Comment #2 from Pablo Ruiz García (pablo.ruiz@gmail.com) ---
> Hi,
> 
> I've stumbled upon this very same issue, and have created a github
> repo with a
> fix for the actual issue (see git commit's description for bug
> details).
> 
> https://github.com/pruiz/aic94xx-hotfix
> https://github.com/pruiz/aic94xx-hotfix/commit/01898ea7dff8eb52f43773
> ea3b59ab41ef2482c8
> 
> Hope it helps someone.

I think the correct fix is to rename the ai94xx revision file (say to
aic94x_revision). However I've got to say that commit

commit 702ed3be1b1bf4dea05954168321741c0910c645
Author: Emil Velikov <emil.velikov@collabora.com>
Date:   Mon Nov 21 16:24:49 2016 -0600

    PCI: Create revision file in sysfs
  
Was a blatently silly thing to do without trying to namespace the added
file (say calling it something less generic like pci_revision).  I
assume we're lucky and nothing in userspace depends on the 'revision'
sysfs file of the aic driver, but technically I should request
reversion of this patch on the grounds that it's broken our ABI (since
that's what exported sysfs files are).

James
Comment 5 Emil Velikov 2019-01-30 11:43:34 UTC
The other five files ({subsystem_,}{vendor,device} and class) do not have a prefix, so the more reasonable thing was it not add one for the revision file.

That said, I did spend a lot of time going through the Documentation looking for guidelines, rules or restrictions that should be applied and could not find any.
The PCI subsystem maintainer did not suggest adding a prefix either, so I would not use "blatently silly" here.

On a more practical side - the aic94xx driver custom revision file solves the exact same problem my patch does. Admittedly, aic94xx lacks the leading "0x" for the hexadecimal number provided.

My personal inclination is that we'd want to check if existing userspace requires the leading 0x and if so to what extend:
 - cosmetic - remove the aic94xx code, update userspace
 - unused - remove the aic94xx code
 - major impact - the aic94xx driver removes the generic revision file, before setting it's own

Does that sound reasonable, am I missing something?
Comment 6 Emil Velikov 2019-01-30 13:38:12 UTC
I stand corrected:
The aic94xx revision file exposes a string (which look like hexadecimal number) based on the revision number.

Regardless, the ideas proposed earlier are still applicable.
Comment 7 Bjorn Helgaas 2019-01-30 14:21:58 UTC
I don't know what the sysfs namespace rules are (if there are any).  It doesn't sound ideal for the PCI core to prefix all the files in a PCI device directory with "pci_" or similar, since we already know by the path (/sys/devices/pci0000:00, /sys/bus/pci/devices, etc) that they are PCI-related).
Comment 8 James.Bottomley 2019-01-30 15:12:18 UTC
On Wed, 2019-01-30 at 14:21 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=201609
> 
> Bjorn Helgaas (bhelgaas@google.com) changed:
> 
>            What    |Removed                     |Added
> -------------------------------------------------------------------
> ---------
>                  CC|                            |bhelgaas@google.com
> 
> --- Comment #7 from Bjorn Helgaas (bhelgaas@google.com) ---
> I don't know what the sysfs namespace rules are (if there are
> any).  It doesn't sound ideal for the PCI core to prefix all the
> files in a PCI device directory with "pci_" or similar, since we
> already know by the path (/sys/devices/pci0000:00,
> /sys/bus/pci/devices, etc) that they are PCI-related).

In the early days, we all simply grabbed files in the directory.  This
resulted in a complete lack of namespacing for everything.  Arguably
this was the really silly thing to do but we stuck with it.  What it
means is that now we have to be very careful about adding files because
things like this can easily happen.  In particular, if you add a
generic name like "revision" there's a really good chance some driver
already has it.

THere's nothing we can do about the prior lack of namespacing, but if
you don't want to check every driver in the kernel, using a namespace
prefix is a good way of avoiding clashes like this.  You can still add
a non-namespaced file, it's just you're on the hook to check every
driver to prevent cockups like this.

The key point is that if the file you clashed with is in use by
userspace, we can't rename it because it's an ABI, so now the conflict
is unresolvable.  I don't think this applies in the aic driver, but
it's the reason why you need to be very careful about adding files.

James
Comment 9 James.Bottomley 2019-01-30 15:17:16 UTC
On Wed, 2019-01-30 at 11:43 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=201609
> 
> Emil Velikov (emil.l.velikov@gmail.com) changed:
> 
>            What    |Removed                     |Added
> -------------------------------------------------------------------
> ---------
>                  CC|                            |emil.l.velikov@gmail
> .com
> 
> --- Comment #5 from Emil Velikov (emil.l.velikov@gmail.com) ---
> The other five files ({subsystem_,}{vendor,device} and class) do not
> have a prefix, so the more reasonable thing was it not add one for
> the revision file.

No, it wasn't.  This is the problem.  In the early days we all grabbed
un-namespaced names in this directory.  If you add another un-
namespaced file, a clash is likely to happen which has serious
consequences.  If you want to add a generically named file, you have to
check all the drivers before doing it.  If you clash with a file that's
in use by userspace, your new addition gets reverted because we can't
rename an in-use file because it's an exported ABI.

> That said, I did spend a lot of time going through the Documentation
> looking for guidelines, rules or restrictions that should be applied
> and could not find any. The PCI subsystem maintainer did not suggest
> adding a prefix either, so I would not use "blatently silly" here.

Either you prefix the addition or you check every driver ... neither
happened in this case.

> On a more practical side - the aic94xx driver custom revision file
> solves the exact same problem my patch does. Admittedly, aic94xx
> lacks the leading "0x" for the hexadecimal number provided.
> 
> My personal inclination is that we'd want to check if existing
> userspace
> requires the leading 0x and if so to what extend:
>  - cosmetic - remove the aic94xx code, update userspace
>  - unused - remove the aic94xx code
>  - major impact - the aic94xx driver removes the generic revision
> file, before setting it's own
> 
> Does that sound reasonable, am I missing something?

I think we're safe and the aic file isn't used from userspace, so just
renaming the aic revision file should work.

James
Comment 10 Emil Velikov 2019-01-30 15:46:30 UTC
(In reply to James.Bottomley from comment #9)
> On Wed, 2019-01-30 at 11:43 +0000, bugzilla-daemon@bugzilla.kernel.org
> wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=201609
> > 
> > Emil Velikov (emil.l.velikov@gmail.com) changed:
> > 
> >            What    |Removed                     |Added
> > -------------------------------------------------------------------
> > ---------
> >                  CC|                            |emil.l.velikov@gmail
> > .com
> > 
> > --- Comment #5 from Emil Velikov (emil.l.velikov@gmail.com) ---
> > The other five files ({subsystem_,}{vendor,device} and class) do not
> > have a prefix, so the more reasonable thing was it not add one for
> > the revision file.
> 
> No, it wasn't.  This is the problem.  In the early days we all grabbed
> un-namespaced names in this directory.  If you add another un-
> namespaced file, a clash is likely to happen which has serious
> consequences.  If you want to add a generically named file, you have to
> check all the drivers before doing it.  If you clash with a file that's
> in use by userspace, your new addition gets reverted because we can't
> rename an in-use file because it's an exported ABI.
> 
I see thank you. Can we get this and the (check every driver) documented please? Otherwise we'll end up in the same situation more often than needed.

The usual regex in MAINTAINERS is unlikely to catch such patches, so it will be hard for experienced developers/maintainers to catch these easily :-\

Fwiw many platforms depend on the revision entry that I've introduced. From desktops (using the open-source graphics stack) to servers depending on libvirt.

> > That said, I did spend a lot of time going through the Documentation
> > looking for guidelines, rules or restrictions that should be applied
> > and could not find any. The PCI subsystem maintainer did not suggest
> > adding a prefix either, so I would not use "blatently silly" here.
> 
> Either you prefix the addition or you check every driver ... neither
> happened in this case.
> 
Ack. Going through the kernel:

$ git grep "\<DEVICE_ATTR\>.*\<revision\>"

drivers/base/soc.c:static DEVICE_ATTR(revision, S_IRUGO, soc_info_get,  NULL);
drivers/scsi/aic94xx/aic94xx_init.c:static DEVICE_ATTR(revision, S_IRUGO, asd_show_dev_rev, NULL);
drivers/video/fbdev/gbefb.c:static DEVICE_ATTR(revision, S_IRUGO, gbefb_show_rev, NULL);


First is safe as is the last one (a platform_device). So only aic needs updating.

> > On a more practical side - the aic94xx driver custom revision file
> > solves the exact same problem my patch does. Admittedly, aic94xx
> > lacks the leading "0x" for the hexadecimal number provided.
> > 
> > My personal inclination is that we'd want to check if existing
> > userspace
> > requires the leading 0x and if so to what extend:
> >  - cosmetic - remove the aic94xx code, update userspace
> >  - unused - remove the aic94xx code
> >  - major impact - the aic94xx driver removes the generic revision
> > file, before setting it's own
> > 
> > Does that sound reasonable, am I missing something?
> 
> I think we're safe and the aic file isn't used from userspace, so just
> renaming the aic revision file should work.
> 
Ack. Are you going to write a patch or shall I?
Any name recommendations - aic94xx_revision, aic_revision, other?

Thanks
Emil