Bug 217379 - Latency issues in irq_bypass_register_consumer
Summary: Latency issues in irq_bypass_register_consumer
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: Intel Linux
Importance: P3 normal
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-04-28 07:27 UTC by zhuangel
Modified: 2023-05-11 09:59 UTC

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description zhuangel 2023-04-28 07:27:37 UTC
We found some latency issues in high-density, high-concurrency scenarios.
We are using Cloud Hypervisor as the VMM for lightweight VMs, with VIRTIO net
and block devices for the VMs. In our tests, we saw about 50ms to 100ms+ of
latency when creating a VM and registering irqfds. After tracing with
funclatency (a tool from bcc-tools, https://github.com/iovisor/bcc), we found
the latency is introduced by the following function:

- irq_bypass_register_consumer introduces more than 60ms per VM.
  This function is called when registering an irqfd; it registers the irqfd
  as a consumer with irqbypass and waits for a connection from irqbypass
  producers such as VFIO or VDPA. In our test, one irqfd registration takes
  about 4ms, so 5 devices with 16 irqfds in total introduce more than 60ms
  of latency.

Here is a simple test case that can emulate the latency issue (the real
latency is larger). The test creates 800 VMs in the background that do
nothing, then repeatedly creates 20 VMs and destroys them after 400ms. Every
VM does a simple thing: it creates an in-kernel irqchip and registers 15
irqfds (emulating 5 devices with 3 irqfds each). Just trace the
"irq_bypass_register_consumer" latency and you will reproduce this kind of
latency issue; a condensed sketch of the per-VM setup is shown after the
links below. Here is a trace log on a Xeon(R) Platinum 8255C server (96C,
2 sockets) running Linux 6.2.20.

Reproduce Case
https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/kvm_irqfd_fork.c
Reproduce log
https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/test.log
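
For reference, a condensed sketch of the per-VM portion of the reproducer
(not the full test program; the function name setup_one_vm and the GSI
choice are illustrative, and error handling is omitted):

#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Each test VM creates an in-kernel irqchip and registers 15 irqfds
 * (5 emulated devices x 3 irqfds each).  Every KVM_IRQFD call below
 * ends up in irq_bypass_register_consumer().  kvm_fd comes from
 * open("/dev/kvm", O_RDWR).
 */
static void setup_one_vm(int kvm_fd)
{
	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

	ioctl(vm_fd, KVM_CREATE_IRQCHIP, 0);

	for (int i = 0; i < 15; i++) {
		struct kvm_irqfd irqfd = {
			.fd  = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC),
			.gsi = i,	/* GSIs covered by the default routing */
		};

		ioctl(vm_fd, KVM_IRQFD, &irqfd);
	}
}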

To fix these latencies, I don't have a graceful method; a simple idea is to
give userspace a chance to avoid them, for example a new flag to disable
irqbypass for each irqfd.

Any suggestion to fix the issue is welcome.
Comment 1 Sean Christopherson 2023-05-01 16:51:30 UTC
On Fri, Apr 28, 2023, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217379
> 
>             Bug ID: 217379
>            Summary: Latency issues in irq_bypass_register_consumer
>            Product: Virtualization
>            Version: unspecified
>           Hardware: Intel
>                 OS: Linux
>             Status: NEW
>           Severity: normal
>           Priority: P3
>          Component: kvm
>           Assignee: virtualization_kvm@kernel-bugs.osdl.org
>           Reporter: zhuangel570@gmail.com
>         Regression: No
> 
> [...]

Looking at the code, it's not surprising that irq_bypass_register_consumer() can
exhibit high latencies.  The producers and consumers are stored in simple linked
lists, and a single mutex is held while traversing the lists *and* connecting
a consumer to a producer (and vice versa).
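
For context, that registration flow looks roughly like the following
(a simplified paraphrase based on the description above, not a verbatim
copy of virt/lib/irqbypass.c; the real code also invokes the producer-side
and start/stop callbacks):

#include <linux/irqbypass.h>
#include <linux/list.h>
#include <linux/mutex.h>

/* One global mutex guards global lists of all producers and consumers. */
static DEFINE_MUTEX(lock);
static LIST_HEAD(producers);
static LIST_HEAD(consumers);

static int register_consumer_simplified(struct irq_bypass_consumer *consumer)
{
	struct irq_bypass_consumer *tmp;
	struct irq_bypass_producer *producer;
	int ret = 0;

	mutex_lock(&lock);

	/* Linear walk to reject a duplicate token. */
	list_for_each_entry(tmp, &consumers, node) {
		if (tmp->token == consumer->token) {
			ret = -EBUSY;
			goto out;
		}
	}

	/*
	 * Second linear walk to find a matching producer; the connect
	 * callback also runs under the same global mutex, so every other
	 * registration in the system waits behind it.
	 */
	list_for_each_entry(producer, &producers, node) {
		if (producer->token == consumer->token) {
			ret = consumer->add_producer(consumer, producer);
			if (ret)
				goto out;
			break;
		}
	}

	list_add(&consumer->node, &consumers);
out:
	mutex_unlock(&lock);
	return ret;
}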

There are two obvious optimizations that can be done to reduce latency in
irq_bypass_register_consumer():

   - Use a different data type to track the producers and consumers so that lookups
     don't require a linear walk.  AIUI, the "tokens" used to match producers and
     consumers are just kernel pointers, so I _think_ XArray would perform reasonably
     well (a rough sketch of this idea follows after this list).

   - Connect producers and consumers outside of a global mutex.
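
A minimal sketch of the first idea, keying an XArray by the token pointer
(the names consumers_xa, consumer_insert and consumer_lookup are
illustrative, not existing kernel code, and the connect step is left out
since that is the hard part discussed below):

#include <linux/gfp.h>
#include <linux/irqbypass.h>
#include <linux/xarray.h>

/* Illustrative only: an XArray indexed by the token pointer value. */
static DEFINE_XARRAY(consumers_xa);

static int consumer_insert(struct irq_bypass_consumer *consumer)
{
	/* Tokens are kernel pointers, so their value serves as the index. */
	unsigned long index = (unsigned long)consumer->token;

	/*
	 * xa_insert() returns -EBUSY if an entry already exists at the
	 * index, which replaces the linear duplicate-token walk.
	 */
	return xa_insert(&consumers_xa, index, consumer, GFP_KERNEL);
}

static struct irq_bypass_consumer *consumer_lookup(void *token)
{
	/* Tree lookup instead of walking a list under the global mutex. */
	return xa_load(&consumers_xa, (unsigned long)token);
}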

Unfortunately, because .add_producer() and .add_consumer() can fail, and because
connections can be established by adding a consumer _or_ a producer, getting the
locking right without a global mutex is quite difficult.  It's certainly doable
to move the (dis)connect logic out of a global lock, but it's going to require a
dedicated effort, i.e. not something that can be sketched out in a few minutes
(I played around with the code for the better part of an hour trying to do just
that and kept running into edge case race conditions).
Comment 2 zhuangel 2023-05-11 09:59:40 UTC
Thanks for the suggestion, Sean.

Before we have a complete optimization of irq_bypass, do you think this kind of fix would be acceptable? It gives the VMM the ability to turn off the irq_bypass feature for devices that do not need irq_bypass:


diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 737318b1c1d9..a7a018ce954a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1292,7 +1292,7 @@ struct kvm_xen_hvm_config {
 };
 #endif

-#define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+#define KVM_IRQFD_FLAG_DEASSIGN		(1 << 0)
 /*
  * Available with KVM_CAP_IRQFD_RESAMPLE
  *
@@ -1300,7 +1300,8 @@ struct kvm_xen_hvm_config {
  * the irqfd to operate in resampling mode for level triggered interrupt
  * emulation.  See Documentation/virt/kvm/api.rst.
  */
-#define KVM_IRQFD_FLAG_RESAMPLE (1 << 1)
+#define KVM_IRQFD_FLAG_RESAMPLE		(1 << 1)
+#define KVM_IRQFD_FLAG_NO_BYPASS	(1 << 2)

 struct kvm_irqfd {
 	__u32 fd;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index b0af834ffa95..90fa203d7ef3 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -425,7 +425,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 		schedule_work(&irqfd->inject);

 #ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
-	if (kvm_arch_has_irq_bypass()) {
+	if (!(args->flags & KVM_IRQFD_FLAG_NO_BYPASS) && kvm_arch_has_irq_bypass()) {
 		irqfd->consumer.token = (void *)irqfd->eventfd;
 		irqfd->consumer.add_producer = kvm_arch_irq_bypass_add_producer;
 		irqfd->consumer.del_producer = kvm_arch_irq_bypass_del_producer;
@@ -587,7 +587,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 int
 kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
-	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_RESAMPLE))
+	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_RESAMPLE
+				| KVM_IRQFD_FLAG_NO_BYPASS))
 		return -EINVAL;

 	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
--
2.31.1
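
If a flag along these lines were accepted, the VMM could opt out of bypass
per irqfd at registration time. An illustrative userspace sketch
(KVM_IRQFD_FLAG_NO_BYPASS only exists with the patch above, so it is
defined locally here):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Only present with the proposed patch; defined here for illustration. */
#ifndef KVM_IRQFD_FLAG_NO_BYPASS
#define KVM_IRQFD_FLAG_NO_BYPASS	(1 << 2)
#endif

static int register_irqfd_no_bypass(int vm_fd, int efd, unsigned int gsi)
{
	struct kvm_irqfd irqfd = {
		.fd    = efd,	/* eventfd to attach */
		.gsi   = gsi,	/* guest interrupt line */
		.flags = KVM_IRQFD_FLAG_NO_BYPASS,
	};

	/* Returns 0 on success, -1 with errno set on failure. */
	return ioctl(vm_fd, KVM_IRQFD, &irqfd);
}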
