Bug 217317

Summary: net, pci: 6.3-rc1-6 hangs during boot on PowerEdge R620 with igb
Product: Drivers
Component: PCI
Reporter: Donald Hunter (donald.hunter)
Assignee: drivers_pci (drivers_pci)
CC: vkuznets
Status: NEW
Severity: blocking
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: v6.3-rc1
Subsystem:
Regression: Yes
Bisected commit-id: 6fffbc7ae1373e10b989afe23a9eeb9c49fe15c3
Attachments:
  dmesg log for igb
  Output from acpidump
  Output from lspci -vv
  Full dmesg with dyndbg="file drivers/acpi/* +p"

Description Donald Hunter 2023-04-10 15:02:08 UTC
Created attachment 304107
dmesg log for igb

The 6.3-rc1 and later release candidates hang during boot on our
Dell PowerEdge R620 servers with Intel I350 NICs (igb).

After bisecting from v6.2 to v6.3-rc1, I isolated the problem to:

> [6fffbc7ae1373e10b989afe23a9eeb9c49fe15c3] PCI: Honor firmware's device
> disabled status
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 1779582fb500..b1d80c1d7a69 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1841,6 +1841,8 @@ int pci_setup_device(struct pci_dev *dev)
>
>         pci_set_of_node(dev);
>         pci_set_acpi_fwnode(dev);
> +       if (dev->dev.fwnode && !fwnode_device_is_available(dev->dev.fwnode))
> +               return -ENODEV;
>
>         pci_dev_assign_slot(dev);
>


I have verified that reverting 6fffbc7ae1373e10b989afe23a9eeb9c49fe15c3
resolves the issue on v6.3-rc4.
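
For anyone reading along without the kernel source to hand, the effect of
the two added lines can be modelled in user space roughly as below. This is
only a sketch: `model_fwnode`, `model_pci_dev` and the `model_*` functions
are hypothetical stand-ins, not kernel APIs, and the real
fwnode_device_is_available() asks the firmware description (ACPI or
devicetree) rather than reading a boolean field.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fwnode_handle: it only records whether
 * firmware reports the device as available. */
struct model_fwnode {
    bool available;
};

/* Hypothetical stand-in for struct pci_dev; fwnode is NULL when firmware
 * does not describe the device at all. */
struct model_pci_dev {
    const struct model_fwnode *fwnode;
};

/* Models fwnode_device_is_available(); the real helper queries firmware. */
bool model_fwnode_is_available(const struct model_fwnode *fwnode)
{
    return fwnode->available;
}

/* Models only the check added to pci_setup_device() by the bisected commit:
 * a device that has a firmware node marked unavailable is rejected with
 * -ENODEV; every other device continues with setup (0 here). */
int model_pci_setup_device(const struct model_pci_dev *dev)
{
    assert(dev != NULL);

    if (dev->fwnode && !model_fwnode_is_available(dev->fwnode))
        return -ENODEV;

    return 0;
}
```

In this model a device with no firmware node, or with a node reported as
available, is set up as before; only a device whose node is reported as
unavailable takes the new early return.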

I have attached the dmesg log from v6.3.0-rc1.

Comment 1 Donald Hunter 2023-04-10 15:04:20 UTC
Created attachment 304108
Output from acpidump

Comment 2 Donald Hunter 2023-04-10 15:05:43 UTC
Created attachment 304109
Output from lspci -vv

Comment 3 Donald Hunter 2023-04-11 08:14:47 UTC
Created attachment 304115
Full dmesg with dyndbg="file drivers/acpi/* +p"

Comment 4 vkuznets 2023-04-11 10:10:19 UTC
Not sure how this is related, but I've discovered that AWS Xen instances also crash on boot with:

[    3.635376] ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver
[    3.635379] ixgbevf: Copyright (c) 2009 - 2018 Intel Corporation.
[    3.673083] Invalid max_queues (4), will use default max: 2.
[    3.680817] BUG: kernel NULL pointer dereference, address: 0000000000000004
[    3.687649] #PF: supervisor read access in kernel mode
[    3.692673] #PF: error_code(0x0000) - not-present page
[    3.697690] PGD 0 P4D 0 
[    3.700522] Oops: 0000 [#1] PREEMPT SMP PTI
[    3.704778] CPU: 1 PID: 62 Comm: xenwatch Not tainted 5.14.0+ #58
[    3.710675] Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
[    3.715333] ixgbevf 0000:00:03.0: 06:5f:49:6d:1b:92
[    3.717220] RIP: 0010:get_free_entries+0xc3/0x2f0
[    3.717228] Code: e8 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 05 7c 7f 34 02 41 89 de 31 d2 44 8b 3d 90 7f 34 02 45 29 e6 <8b> 48 04 41 8d 44 0e ff f7 f1 45 8d 2c 07 41 89 c4 e8 17 fe ff ff
[    3.717231] RSP: 0018:ffffb340402c3d48 EFLAGS: 00010002
[    3.717235] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffb340402c3df8
[    3.717236] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa05fb184
[    3.717237] RBP: 0000000000000286 R08: ffffd6f9020f3880 R09: 0000000000000000
[    3.717239] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    3.718023] ixgbevf 0000:00:03.0: MAC: 1
[    3.718768] R13: 0000000000083ce2 R14: 0000000000000001 R15: 0000000000000000
[    3.718770] FS:  0000000000000000(0000) GS:ffff90266fa40000(0000) knlGS:0000000000000000
[    3.720480] ixgbevf 0000:00:03.0: Intel(R) 82599 Virtual Function
[    3.721255] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.721258] CR2: 0000000000000004 CR3: 0000000033152006 CR4: 00000000001706e0
[    3.721260] Call Trace:
[    3.721262]  <TASK>
[    3.721265]  gnttab_grant_foreign_access+0x1c/0x70
[    3.822046]  xenbus_grant_ring+0x56/0x120
[    3.826044]  setup_blkring+0x1c8/0x460 [xen_blkfront]
[    3.830977]  talk_to_blkback+0xb8/0x910 [xen_blkfront]
[    3.835967]  ? xenbus_dev_request_and_reply+0x80/0x80
[    3.840848]  xenwatch_thread+0x94/0x180
[    3.844670]  ? cpuacct_percpu_seq_show+0x10/0x10
[    3.849264]  kthread+0xe0/0x100
[    3.852553]  ? kthread_complete_and_exit+0x20/0x20
[    3.857311]  ret_from_fork+0x22/0x30
[    3.860972]  </TASK>
[    3.863405] Modules linked in: xen_blkfront(+) serio_raw ghash_clmulni_intel(+) ixgbevf dm_mirror dm_region_hash dm_log dm_mod
[    3.873844] CR2: 0000000000000004
[    3.877518] ---[ end trace 5fa245c60a20f54e ]---
[    3.877520] RIP: 0010:get_free_entries+0xc3/0x2f0
[    3.877528] Code: e8 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 05 7c 7f 34 02 41 89 de 31 d2 44 8b 3d 90 7f 34 02 45 29 e6 <8b> 48 04 41 8d 44 0e ff f7 f1 45 8d 2c 07 41 89 c4 e8 17 fe ff ff
[    3.877530] RSP: 0018:ffffb340402c3d48 EFLAGS: 00010002
[    3.877532] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffb340402c3df8
[    3.877534] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa05fb184
[    3.877535] RBP: 0000000000000286 R08: ffffd6f9020f3880 R09: 0000000000000000
[    3.877536] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    3.877538] R13: 0000000000083ce2 R14: 0000000000000001 R15: 0000000000000000
[    3.877539] FS:  0000000000000000(0000) GS:ffff90266fa40000(0000) knlGS:0000000000000000
[    3.877541] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.877542] CR2: 0000000000000004 CR3: 0000000033152006 CR4: 00000000001706e0
[    3.877544] Kernel panic - not syncing: Fatal exception
[    3.944455] Kernel Offset: 0x1cc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

and I've bisected the problem to the same commit, 6fffbc7ae1373e10b989afe23a9eeb9c49fe15c3.