Re: [RFC v3 0/6] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-04-15 Thread Lei Yang
QE tested this series with packed=on/off, in_order=true and vhost=off
under regression tests; everything works fine.

Tested-by: Lei Yang 

On Mon, Apr 8, 2024 at 11:34 PM Jonah Palmer  wrote:
>
> The goal of these patches is to add support to a variety of virtio and
> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> indicates that all buffers are used by the device in the same order in
> which they were made available by the driver.
>
> These patches attempt to implement a generalized, non-device-specific
> solution to support this feature.
>
> The core feature behind this solution is a buffer mechanism in the form
> of a VirtQueue's used_elems VirtQueueElement array. This allows devices
> that always use buffers in-order by default to have a minimal overhead
> impact. Devices that may not always use buffers in-order will likely
> experience a performance hit; how large that hit is will depend on how
> frequently elements are completed out-of-order.
>
> A VirtQueue whose device uses this feature will use its used_elems
> VirtQueueElement array to hold used VirtQueueElements. The index at which
> a used element is placed in used_elems is the same index on the
> used/descriptor ring that would satisfy the in-order requirement. In
> other words, used elements are placed in their in-order locations on
> used_elems and are only written to the used/descriptor ring once all
> elements before them have also been used, preserving the expected order.
>
> To differentiate between a "used" and "unused" element on the used_elems
> array (a "used" element being an element that has returned from
> processing and an "unused" element being an element that has not yet
> been processed), we added a boolean 'filled' member to the
> VirtQueueElement struct. This flag is set to true when the element comes
> back from processing (virtqueue_ordered_fill) and then set back to false
> once it's been written to the used/descriptor ring
> (virtqueue_ordered_flush).
>
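For reference, here is a minimal, self-contained sketch of the bookkeeping
described above. The names (used_elems, filled, ordered_fill, ordered_flush)
mirror the description but are simplified stand-ins, not the actual QEMU
virtqueue code:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative sketch only; the real QEMU structures and helpers differ. */
#define QUEUE_SIZE 8

struct SketchElem {
    bool filled;        /* returned from processing, not yet flushed */
    unsigned len;       /* bytes written, reported on the used ring  */
};

struct SketchVQ {
    struct SketchElem used_elems[QUEUE_SIZE];
    unsigned used_idx;  /* next in-order slot to publish */
};

/* The device completed the element that was popped into 'slot',
 * possibly out of order (cf. virtqueue_ordered_fill). */
static void ordered_fill(struct SketchVQ *vq, unsigned slot, unsigned len)
{
    vq->used_elems[slot].len = len;
    vq->used_elems[slot].filled = true;
}

/* Publish the longest contiguous run of filled elements, preserving the
 * in-order requirement (cf. virtqueue_ordered_flush). */
static void ordered_flush(struct SketchVQ *vq)
{
    while (vq->used_elems[vq->used_idx % QUEUE_SIZE].filled) {
        unsigned slot = vq->used_idx % QUEUE_SIZE;

        printf("write used[%u], len=%u\n", slot, vq->used_elems[slot].len);
        vq->used_elems[slot].filled = false;
        vq->used_idx++;
    }
}

int main(void)
{
    struct SketchVQ vq = {0};

    ordered_fill(&vq, 1, 64);  /* element 1 completes first...         */
    ordered_flush(&vq);        /* ...but nothing can be published yet  */
    ordered_fill(&vq, 0, 32);  /* element 0 completes                  */
    ordered_flush(&vq);        /* slots 0 and 1 are published in order */
    return 0;
}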
> ---
> v3: Add elements to used_elems during virtqueue_split/packed_pop
> Replace current_seq_idx usage with vq->last_avail_idx
> Remove used_seq_idx, leverage used_idx and last_avail_idx for
> searching used_elems
> Remove seq_idx in VirtQueueElement
> Add boolean to VirtQueueElement to signal element status
> Add virtqueue_ordered_fill/flush functions for ordering
>
> v2: Use a VirtQueue's used_elems array as a buffer mechanism
>
> v1: Implement custom GLib GHashTable as a buffer mechanism
>
> Jonah Palmer (6):
>   virtio: Add bool to VirtQueueElement
>   virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support
>   virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support
>   virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support
>   vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
>   virtio: Add VIRTIO_F_IN_ORDER property definition
>
>  hw/block/vhost-user-blk.c|   1 +
>  hw/net/vhost_net.c   |   2 +
>  hw/scsi/vhost-scsi.c |   1 +
>  hw/scsi/vhost-user-scsi.c|   1 +
>  hw/virtio/vhost-user-fs.c|   1 +
>  hw/virtio/vhost-user-vsock.c |   1 +
>  hw/virtio/virtio.c   | 118 ++-
>  include/hw/virtio/virtio.h   |   5 +-
>  net/vhost-vdpa.c |   1 +
>  9 files changed, 127 insertions(+), 4 deletions(-)
>
> --
> 2.39.3
>




Re: [PATCH v6] virtio-pci: Fix the crash that the vector was used after released.

2024-04-14 Thread Lei Yang
QE tested this patch with regression tests, everything works fine.

Tested-by: Lei Yang 

On Fri, Apr 12, 2024 at 2:37 PM Cindy Lu  wrote:
>
> Hi All
> I apologize for bothering you again
> I'm sending the new patch because I found that the functions
> kvm_virtio_pci_vector_use_one/kvm_virtio_pci_vector_release_one
> can only change a vector that is already set on the device:
> 
>   ret = virtio_pci_get_notifier(proxy, queue_no, &n, &vector);
>   if (ret < 0) {
>       return;
>   }
> ...
> So I moved the vector setting into the function
> virtio_pci_set_and_change_vector();
> the other parts are the same.
>
> The sanity test passes and the qemu qtest also passes.
>
> Thanks
> Cindy
>
> On Fri, Apr 12, 2024 at 2:28 PM Cindy Lu  wrote:
> >
> > During the booting process of the non-standard image, the behavior of the
> > called function in qemu is as follows:
> >
> > 1. vhost_net_stop() was triggered by the guest image. This calls
> > virtio_pci_set_guest_notifiers() with assign=false, which releases the
> > irqfd for vector 0.
> >
> > 2. virtio_reset() was triggered; this sets the config vector to
> > VIRTIO_NO_VECTOR.
> >
> > 3. vhost_net_start() was called (at this point the config vector is
> > still VIRTIO_NO_VECTOR) and then calls virtio_pci_set_guest_notifiers()
> > with assign=true, so the irqfd for vector 0 is still not initialized
> > during this process.
> >
> > 4. The system continues to boot and sets the vector back to 0. After
> > that, msix_fire_vector_notifier() is triggered to unmask vector 0 and
> > hits the crash.
> >
> > To fix the issue, we need to support changing the vector after 
> > VIRTIO_CONFIG_S_DRIVER_OK is set.
> >
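A conceptual, self-contained sketch of the fix idea described above (release
the irqfd tied to the old vector and attach one for the new vector once the
driver is running). The helper and structure names here are hypothetical and
do not correspond to the actual QEMU functions:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VIRTIO_NO_VECTOR 0xffff

struct sketch_dev {
    bool driver_ok;          /* VIRTIO_CONFIG_S_DRIVER_OK has been set */
    uint16_t config_vector;
};

static void release_vector_irqfd(struct sketch_dev *d, uint16_t v)
{
    (void)d;
    printf("release irqfd for vector %u\n", v);
}

static void use_vector_irqfd(struct sketch_dev *d, uint16_t v)
{
    (void)d;
    printf("set up irqfd for vector %u\n", v);
}

/* Change the config vector even after DRIVER_OK, tearing down the old
 * irqfd (if any) and setting up the new one. */
static void set_and_change_vector(struct sketch_dev *d, uint16_t new_vector)
{
    uint16_t old_vector = d->config_vector;

    if (old_vector == new_vector) {
        return;
    }
    if (d->driver_ok && old_vector != VIRTIO_NO_VECTOR) {
        release_vector_irqfd(d, old_vector);
    }
    d->config_vector = new_vector;
    if (d->driver_ok && new_vector != VIRTIO_NO_VECTOR) {
        use_vector_irqfd(d, new_vector);
    }
}

int main(void)
{
    /* Mirrors the sequence in the report: reset leaves the vector at
     * VIRTIO_NO_VECTOR, then the guest sets it back to 0 after DRIVER_OK. */
    struct sketch_dev d = { .driver_ok = true,
                            .config_vector = VIRTIO_NO_VECTOR };
    set_and_change_vector(&d, 0);
    return 0;
}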
> > (gdb) bt
> > 0  __pthread_kill_implementation (threadid=, 
> > signo=signo@entry=6, no_tid=no_tid@entry=0)
> > at pthread_kill.c:44
> > 1  0x7fc87148ec53 in __pthread_kill_internal (signo=6, 
> > threadid=) at pthread_kill.c:78
> > 2  0x7fc87143e956 in __GI_raise (sig=sig@entry=6) at 
> > ../sysdeps/posix/raise.c:26
> > 3  0x7fc8714287f4 in __GI_abort () at abort.c:79
> > 4  0x7fc87142871b in __assert_fail_base
> > (fmt=0x7fc8715bbde0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
> > assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> > "../accel/kvm/kvm-all.c", line=1837, function=) at 
> > assert.c:92
> > 5  0x7fc871437536 in __GI___assert_fail
> > (assertion=0x5606413efd53 "ret == 0", file=0x5606413ef87d 
> > "../accel/kvm/kvm-all.c", line=1837, function=0x5606413f06f0 
> > <__PRETTY_FUNCTION__.19> "kvm_irqchip_commit_routes") at assert.c:101
> > 6  0x560640f884b5 in kvm_irqchip_commit_routes (s=0x560642cae1f0) at 
> > ../accel/kvm/kvm-all.c:1837
> > 7  0x560640c98f8e in virtio_pci_one_vector_unmask
> > (proxy=0x560643c65f00, queue_no=4294967295, vector=0, msg=..., 
> > n=0x560643c6e4c8)
> > at ../hw/virtio/virtio-pci.c:1005
> > 8  0x560640c99201 in virtio_pci_vector_unmask (dev=0x560643c65f00, 
> > vector=0, msg=...)
> > at ../hw/virtio/virtio-pci.c:1070
> > 9  0x560640bc402e in msix_fire_vector_notifier (dev=0x560643c65f00, 
> > vector=0, is_masked=false)
> > at ../hw/pci/msix.c:120
> > 10 0x560640bc40f1 in msix_handle_mask_update (dev=0x560643c65f00, 
> > vector=0, was_masked=true)
> > at ../hw/pci/msix.c:140
> > 11 0x560640bc4503 in msix_table_mmio_write (opaque=0x560643c65f00, 
> > addr=12, val=0, size=4)
> > at ../hw/pci/msix.c:231
> > 12 0x560640f26d83 in memory_region_write_accessor
> > (mr=0x560643c66540, addr=12, value=0x7fc86b7bc628, size=4, shift=0, 
> > mask=4294967295, attrs=...)
> > at ../system/memory.c:497
> > 13 0x560640f270a6 in access_with_adjusted_size
> >
> >  (addr=12, value=0x7fc86b7bc628, size=4, access_size_min=1, 
> > access_size_max=4, access_fn=0x560640f26c8d , 
> > mr=0x560643c66540, attrs=...) at ../system/memory.c:573
> > 14 0x560640f2a2b5 in memory_region_dispatch_write (mr=0x560643c66540, 
> > addr=12, data=0, op=MO_32, attrs=...)
> > at ../system/memory.c:1521
> > 15 0x560640f37bac in flatview_write_continue
> > (fv=0x7fc65805e0b0, addr=4273803276, attrs=..., ptr=0x7fc871e9c028, 
> > len=4, addr1=12, l=4, mr=0x560643c66540)
> > at ../system/physmem.c:2714
> > 16 0x560640f37d0f in flatview_write
> > (fv=0x7fc65805e0b0, addr=4273803276

Re: [PATCH v1 0/8] virtio, vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-08 Thread Lei Yang
Hi Jonah

QE tested this series v1 with a tap device with vhost=off under
regression tests; everything works fine. QE also added
"notification_data=true" to the qemu command line and got "1" when
performing the command [1] inside the guest.
[1] cut -c39 /sys/devices/pci:00/:00:01.3/:05:00.0/virtio1/features

Tested-by: Lei Yang 

On Thu, Mar 7, 2024 at 7:18 PM Eugenio Perez Martin  wrote:
>
> On Wed, Mar 6, 2024 at 8:34 AM Michael S. Tsirkin  wrote:
> >
> > On Wed, Mar 06, 2024 at 08:07:31AM +0100, Eugenio Perez Martin wrote:
> > > On Wed, Mar 6, 2024 at 6:34 AM Jason Wang  wrote:
> > > >
> > > > On Tue, Mar 5, 2024 at 3:46 AM Jonah Palmer  
> > > > wrote:
> > > > >
> > > > > The goal of these patches is to add support to a variety of virtio
> > > > > and vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport
> > > > > feature. This feature indicates that a driver will pass extra data
> > > > > (instead of just a virtqueue's index) when notifying the
> > > > > corresponding device.
> > > > >
> > > > > The data passed in by the driver when this feature is enabled varies
> > > > > in format depending on whether the device is using a split or packed
> > > > > virtqueue layout:
> > > > >
> > > > >  Split VQ
> > > > >   - Upper 16 bits: shadow_avail_idx
> > > > >   - Lower 16 bits: virtqueue index
> > > > >
> > > > >  Packed VQ
> > > > >   - Upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
> > > > >   - Lower 16 bits: virtqueue index
> > > > >
> > > > > Also, because ioeventfd is not able to carry the extra data provided
> > > > > by the driver, ioeventfd is left disabled for any devices using this
> > > > > feature.
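For reference, a small standalone sketch of the 32-bit notification payload
layout described above; the field and function names are illustrative, not
taken from the patches:

#include <stdint.h>
#include <stdio.h>

struct notify_data {
    uint16_t vq_index;    /* lower 16 bits */
    uint16_t next_avail;  /* upper 16 bits: shadow_avail_idx; for packed
                           * VQs the top bit is the wrap counter        */
};

static uint32_t encode_split(uint16_t vq_index, uint16_t shadow_avail_idx)
{
    return ((uint32_t)shadow_avail_idx << 16) | vq_index;
}

static uint32_t encode_packed(uint16_t vq_index, uint16_t shadow_avail_idx,
                              int wrap_counter)
{
    uint16_t upper = (uint16_t)((shadow_avail_idx & 0x7fff) |
                                ((wrap_counter & 1) << 15));
    return ((uint32_t)upper << 16) | vq_index;
}

static struct notify_data decode(uint32_t data)
{
    struct notify_data d = { .vq_index = (uint16_t)(data & 0xffff),
                             .next_avail = (uint16_t)(data >> 16) };
    return d;
}

int main(void)
{
    struct notify_data s = decode(encode_split(2, 0x0042));
    struct notify_data p = decode(encode_packed(3, 0x0123, 1));

    printf("split:  vq=%u avail=0x%04x\n", s.vq_index, s.next_avail);
    printf("packed: vq=%u avail=0x%04x\n", p.vq_index, p.next_avail);
    return 0;
}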
> > > >
> > > > Is there any method to overcome this? This might help for vhost.
> > > >
> > >
> > > As a half-baked idea, read(2)ing an eventfd descriptor returns an
> > > 8-byte integer already. The returned value of read depends on eventfd
> > > flags, but both have to do with the number of writes of the other end.
> > >
> > > My proposal is to replace this value with the last value written by
> > > the guest, so we can extract the virtio notification data from there.
> > > The behavior of read is similar to not-EFD_SEMAPHORE, reading a value
> > > and then blocking if read again without writes. The behavior of KVM
> > > writes is different, as it is not a counter anymore.
> > >
> > > Thanks!
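As a concrete reference point for the semantics being discussed, here is a
small standalone demo of standard eventfd counter behaviour (plain C, not
QEMU/KVM code): read(2) returns the accumulated sum of prior writes rather
than the last value written, which is exactly what would have to change for
an eventfd to carry notification data:

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
    /* Counter mode (no EFD_SEMAPHORE): writes accumulate into one counter. */
    int fd = eventfd(0, 0);
    if (fd < 0) {
        perror("eventfd");
        return 1;
    }

    uint64_t val = 3;
    if (write(fd, &val, sizeof(val)) != sizeof(val) ||   /* first "kick"  */
        write(fd, &val, sizeof(val)) != sizeof(val)) {   /* second "kick" */
        perror("write");
        return 1;
    }

    uint64_t out = 0;
    if (read(fd, &out, sizeof(out)) == sizeof(out)) {
        /* Prints 6 (the summed counter), not 3 (the last value written). */
        printf("read back %llu\n", (unsigned long long)out);
    }
    close(fd);
    return 0;
}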
> >
> >
> > I doubt you will be able to support this in ioeventfd...
>
> I agree.
>
> > But vhost does not really need the value at all.
> > So why mask out ioeventfd with vhost?
>
> The interface should not be able to start with vhost-kernel because
> the feature is not offered by the vhost-kernel device. So ioeventfd is
> always enabled with vhost-kernel.
>
> Or do you mean we should allow it and let vhost-kernel fetch data from
> the avail ring as usual? I'm ok with that but then the guest can place
> any value to it, so the driver cannot be properly "validated by
> software" that way.
>
> > vhost-vdpa is probably the only one that might need it...
>
> Right, but vhost-vdpa already supports doorbell memory regions so I
> guess it has little use, isn't it?
>
> Thanks!
>
> >
> >
> >
> > > > Thanks
> > > >
> > > > >
> > > > > A significant aspect of this effort has been to maintain compatibility
> > > > > across different backends. As such, the feature is offered by backend
> > > > > devices only when supported, with fallback mechanisms where backend
> > > > > support is absent.
> > > > >
> > > > > Jonah Palmer (8):
> > > > >   virtio/virtio-pci: Handle extra notification data
> > > > >   virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
> > > > >   virtio-mmio: Handle extra notification data
> > > > >   virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
> > > > >   virtio-ccw: Handle extra notification data
> > > > >   virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
> > > > >   vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature 
> > > > > bits
> > > > >   virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition
> > > > >
> > > > >  hw/block/vhost-user-blk.c|  1 +
> > > > >  hw/net/vhost_net.c   |  2 ++
> > > > >  hw/s390x/s390-virtio-ccw.c   | 16 
> > > > >  hw/s390x/virtio-ccw.c|  6 --
> > > > >  hw/scsi/vhost-scsi.c |  1 +
> > > > >  hw/scsi/vhost-user-scsi.c|  1 +
> > > > >  hw/virtio/vhost-user-fs.c|  2 +-
> > > > >  hw/virtio/vhost-user-vsock.c |  1 +
> > > > >  hw/virtio/virtio-mmio.c  | 15 +++
> > > > >  hw/virtio/virtio-pci.c   | 16 +++-
> > > > >  hw/virtio/virtio.c   | 18 ++
> > > > >  include/hw/virtio/virtio.h   |  5 -
> > > > >  net/vhost-vdpa.c |  1 +
> > > > >  13 files changed, 68 insertions(+), 17 deletions(-)
> > > > >
> > > > > --
> > > > > 2.39.3
> > > > >
> > > >
> >
>
>




Re: [syzbot] [virtualization?] linux-next boot error: WARNING: refcount bug in __free_pages_ok

2024-02-21 Thread Lei Yang
Hi All

I hit a similar issue when doing regression testing on my side.
For the error messages, please review the attachment.

The latest commit:
commit c02197fc9076e7d991c8f6adc11759c5ba52ddc6 (HEAD -> master,
origin/master, origin/HEAD)
Merge: f2667e0c3240 0846dd77c834
Author: Linus Torvalds 
Date:   Sat Feb 17 16:59:31 2024 -0800

Merge tag 'powerpc-6.8-3' of
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "This is a bit of a big batch for rc4, but just due to holiday hangover
  and because I didn't send any fixes last week due to a late revert
  request. I think next week should be back to normal.

Regards
Lei

On Mon, Feb 19, 2024 at 3:35 PM Michael S. Tsirkin  wrote:
>
> On Sun, Feb 18, 2024 at 09:06:18PM -0800, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:d37e1e4c52bc Add linux-next specific files for 20240216
> > git tree:   linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=171ca65218
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=4bc446d42a7d56c0
> > dashboard link: https://syzkaller.appspot.com/bug?extid=6f3c38e8a6a0297caa5a
> > compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for 
> > Debian) 2.40
> >
> > Downloadable assets:
> > disk image: 
> > https://storage.googleapis.com/syzbot-assets/14d0894504b9/disk-d37e1e4c.raw.xz
> > vmlinux: 
> > https://storage.googleapis.com/syzbot-assets/6cda61e084ee/vmlinux-d37e1e4c.xz
> > kernel image: 
> > https://storage.googleapis.com/syzbot-assets/720c85283c05/bzImage-d37e1e4c.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+6f3c38e8a6a0297ca...@syzkaller.appspotmail.com
> >
> > Key type pkcs7_test registered
> > Block layer SCSI generic (bsg) driver version 0.4 loaded (major 239)
> > io scheduler mq-deadline registered
> > io scheduler kyber registered
> > io scheduler bfq registered
> > input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
> > ACPI: button: Power Button [PWRF]
> > input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
> > ACPI: button: Sleep Button [SLPF]
> > ioatdma: Intel(R) QuickData Technology Driver 5.00
> > ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > virtio-pci :00:03.0: virtio_pci: leaving for legacy driver
> > ACPI: \_SB_.LNKD: Enabled at IRQ 10
> > virtio-pci :00:04.0: virtio_pci: leaving for legacy driver
> > ACPI: \_SB_.LNKB: Enabled at IRQ 10
> > virtio-pci :00:06.0: virtio_pci: leaving for legacy driver
> > virtio-pci :00:07.0: virtio_pci: leaving for legacy driver
> > N_HDLC line discipline registered with maxframe=4096
> > Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
> > 00:05: ttyS2 at I/O 0x3e8 (irq = 6, base_baud = 115200) is a 16550A
> > 00:06: ttyS3 at I/O 0x2e8 (irq = 7, base_baud = 115200) is a 16550A
> > Non-volatile memory driver v1.3
> > Linux agpgart interface v0.103
> > ACPI: bus type drm_connector registered
> > [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
> > [drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1
> > Console: switching to colour frame buffer device 128x48
> > platform vkms: [drm] fb0: vkmsdrmfb frame buffer device
> > usbcore: registered new interface driver udl
> > brd: module loaded
> > loop: module loaded
> > zram: Added device: zram0
> > null_blk: disk nullb0 created
> > null_blk: module loaded
> > Guest personality initialized and is inactive
> > VMCI host device registered (name=vmci, major=10, minor=118)
> > Initialized host personality
> > usbcore: registered new interface driver rtsx_usb
> > usbcore: registered new interface driver viperboard
> > usbcore: registered new interface driver dln2
> > usbcore: registered new interface driver pn533_usb
> > nfcsim 0.2 initialized
> > usbcore: registered new interface driver port100
> > usbcore: registered new interface driver nfcmrvl
> > Loading iSCSI transport class v2.0-870.
> > virtio_scsi virtio0: 1/0/0 default/read/poll queues
> > [ cut here ]
> > refcount_t: decrement hit 0; leaking memory.
> > WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 
> > refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
> > Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc4-next-20240216-syzkaller 
> > #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> > Google 01/25/2024
> > RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
> > Code: b2 00 00 00 e8 b7 94 f0 fc 5b 5d c3 cc cc cc cc e8 ab 94 f0 fc c6 05 
> > c6 16 ce 0a 01 90 48 c7 c7 a0 5a fe 8b e8 67 69 b4 fc 90 <0f> 0b 90 90 eb 
> > d9 e8 8b 94 f0 fc c6 05 a3 16 ce 0a 01 90 48 c7 c7
> > RSP: :c9066e10 EFLAGS: 00010246
> > RAX: 15c2c224c9b50400 RBX: 

Re: [PATCH v2] vdpa/mlx5: Allow CVQ size changes

2024-02-18 Thread Lei Yang
QE tested this patch's V2; qemu no longer prints the error message
"qemu-system-x86_64: Insufficient written data (0)" after
enabling/disabling multiqueue multiple times inside the guest. Tests pass
both with "x-svq=on" and without it.

Tested-by: Lei Yang 

On Fri, Feb 16, 2024 at 10:25 PM Jonah Palmer  wrote:
>
> The MLX driver was not updating its control virtqueue size at set_vq_num
> and instead always initialized to MLX5_CVQ_MAX_ENT (16) at
> setup_cvq_vring.
>
> QEMU would try to set the size to 64 by default. However, because the
> CVQ size was always initialized to 16, an error would be thrown when
> sending >16 control messages (as used-ring entry 17 is initialized to 0).
> For example, starting a guest with x-svq=on and then executing the
> following command would produce the error below:
>
>  # for i in {1..20}; do ifconfig eth0 hw ether XX:xx:XX:xx:XX:XX; done
>
>  qemu-system-x86_64: Insufficient written data (0)
>  [  435.331223] virtio_net virtio0: Failed to set mac address by vq command.
>  SIOCSIFHWADDR: Invalid argument
>
> Acked-by: Dragos Tatulea 
> Acked-by: Eugenio Pérez 
> Signed-off-by: Jonah Palmer 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 778821bab7d9..ecfc16151d61 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -151,8 +151,6 @@ static void teardown_driver(struct mlx5_vdpa_net *ndev);
>
>  static bool mlx5_vdpa_debug;
>
> -#define MLX5_CVQ_MAX_ENT 16
> -
>  #define MLX5_LOG_VIO_FLAG(_feature)                                          \
>         do {                                                                  \
>                 if (features & BIT_ULL(_feature))                             \
> @@ -2276,9 +2274,16 @@ static void mlx5_vdpa_set_vq_num(struct vdpa_device *vdev, u16 idx, u32 num)
>         struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>         struct mlx5_vdpa_virtqueue *mvq;
>
> -       if (!is_index_valid(mvdev, idx) || is_ctrl_vq_idx(mvdev, idx))
> +       if (!is_index_valid(mvdev, idx))
>                 return;
>
> +       if (is_ctrl_vq_idx(mvdev, idx)) {
> +               struct mlx5_control_vq *cvq = &mvdev->cvq;
> +
> +               cvq->vring.vring.num = num;
> +               return;
> +       }
> +
>         mvq = &ndev->vqs[idx];
>         mvq->num_ent = num;
>  }
> @@ -2963,7 +2968,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>         u16 idx = cvq->vring.last_avail_idx;
>
>         err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
> -                               MLX5_CVQ_MAX_ENT, false,
> +                               cvq->vring.vring.num, false,
>                                 (struct vring_desc *)(uintptr_t)cvq->desc_addr,
>                                 (struct vring_avail *)(uintptr_t)cvq->driver_addr,
>                                 (struct vring_used *)(uintptr_t)cvq->device_addr);
> --
> 2.39.3
>




[jira] [Updated] (HDFS-17341) Support dedicated user queues in Namenode FairCallQueue

2024-01-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17341:

Description: 
Some service users in the namenode today, like ETL, metrics collection, and 
ad-hoc users that run business-critical jobs, account for much of the namenode 
traffic and shouldn't be throttled the same way as other individual users in 
FCQ.

There is a [feature|https://issues.apache.org/jira/browse/HADOOP-17165] in the 
namenode to always prioritize some service users so they are not subject to FCQ 
scheduling (those users are always p0), but it is not perfect and it doesn't 
account for traffic surges from those users.

The idea is to allocate dedicated rpc queues for those service users with 
bounded queue capacity and allocate a processing weight for those users. If a 
queue is full, those users are expected to back off and retry.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
For instance, consider an FCQ with 4 priority levels and 2 reserved users (a, b).

The FCQ would look like:

 
{code:java}
P0: shared queue
P1: shared queue
P2: shared queue
P3: shared queue
P4: dedicated for user a
P5: dedicated for user b{code}
The Multiplexer would have the following weights:

shared queue default weights: [8, 4, 2, 1]

reserved queue weights = [3, 2]

Out of a total weight of 8 + 4 + 2 + 1 + 3 + 2 = 20, user a gets 3/20 = 15% of 
total cycles and user b gets 2/20 = 10% of total cycles.
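For illustration, a tiny standalone simulation (not Hadoop code) of how
weighted round-robin over these six queues yields those shares:

#include <stdio.h>

int main(void)
{
    /* P0..P3 are the shared queues, P4 is user a, P5 is user b. */
    const int weights[6] = {8, 4, 2, 1, 3, 2};
    int served[6] = {0};
    int total = 0;

    for (int q = 0; q < 6; q++) {              /* one full multiplexer cycle */
        for (int i = 0; i < weights[q]; i++) {
            served[q]++;                       /* take the next call from queue q */
            total++;
        }
    }
    for (int q = 0; q < 6; q++) {
        /* Prints P4: 3/20 = 15% and P5: 2/20 = 10%. */
        printf("P%d: %d/%d = %.0f%%\n", q, served[q], total,
               100.0 * served[q] / total);
    }
    return 0;
}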

 

 

  was:
Some service users today in namenode like ETL, metrics collection, ad-hoc users 
that are critical to run business critical job accounts for many traffic in 
namenode and shouldn't be throttled the same way as other individual users in 
FCQ.

There is feature in namenode to always prioritize some service users to not 
subject to FCQ scheduling. (Those users are always p0) but it is not perfect 
and it doesn't account for traffic surge from those users.

The idea is to allocate dedicated rpc queues for those service users with 
bounded queue capacity and allocate processing weight for those users. If queue 
is full, those users are expected to backoff and retry.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
For instance, for a FCQ with 4 priority levels, 2 reserved users(a, b)

FCQ would look like:

 
{code:java}
P0: shared queue
P1: shared queue
P2: shared queue
P3: shared queue
P4: dedicated for user a
P5: dedicated for user b{code}
{color:#172b4d}The Multiplexer would have following weights{color}

{color:#172b4d}shared queue default weights: [8, 4, 2, 1]{color}

{color:#172b4d}reserved queue weights=[3, 2]{color}

{color:#172b4d}So user a gets 15% of total cycles, user b gets 10% of total 
cycles.{color}

 

 


> Support dedicated user queues in Namenode FairCallQueue
> ---
>
> Key: HDFS-17341
> URL: https://issues.apache.org/jira/browse/HDFS-17341
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Some service users today in namenode like ETL, metrics collection, ad-hoc 
> users that are critical to run business critical job accounts for many 
> traffic in namenode and shouldn't be throttled the same way as other 
> individual users in FCQ.
> There is [feature|https://issues.apache.org/jira/browse/HADOOP-17165] in 
> namenode to always prioritize some service users to not subject to FCQ 
> scheduling. (Those users are always p0) but it is not perfect and it doesn't 
> account for traffic surge from those users.
> The idea is to allocate dedicated rpc queues for those service users with 
> bounded queue capacity and allocate processing weight for those users. If 
> queue is full, those users are expected to backoff and retry.
>  
> New configs:
> {code:java}
> "faircallqueue.reserved.users"; // list of service users that are assigned to 
> dedicated queue
> "faircallqueue.reserved.users.ma

[jira] [Commented] (HDFS-17341) Support dedicated user queues in Namenode FairCallQueue

2024-01-18 Thread Lei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808392#comment-17808392
 ] 

Lei Yang commented on HDFS-17341:
-

[~hexiaoqiao] Thanks for your comment. 
{quote}One concerns, we evaluate the request if high- or low- priority based on 
user only, but not all requests from this user are always high or low priority 
in fact.
{quote}
Not sure I understand this. The idea is to exempt some critical service users 
from the existing FCQ mechanism to make sure they are not throttled in the same 
way as regular users in the shared queues. Meanwhile, those users should not 
flood the entire queue if there is a traffic surge 
(https://issues.apache.org/jira/browse/HADOOP-17165 can assign a service user 
to p0 but it cannot solve the traffic surge from those users). We can assign 
weights to those users to ensure they do not exceed a certain % of total 
processing cycles.

 

> Support dedicated user queues in Namenode FairCallQueue
> ---
>
> Key: HDFS-17341
> URL: https://issues.apache.org/jira/browse/HDFS-17341
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Some service users today in namenode like ETL, metrics collection, ad-hoc 
> users that are critical to run business critical job accounts for many 
> traffic in namenode and shouldn't be throttled the same way as other 
> individual users in FCQ.
> There is feature in namenode to always prioritize some service users to not 
> subject to FCQ scheduling. (Those users are always p0) but it is not perfect 
> and it doesn't account for traffic surge from those users.
> The idea is to allocate dedicated rpc queues for those service users with 
> bounded queue capacity and allocate processing weight for those users. If 
> queue is full, those users are expected to backoff and retry.
>  
> New configs:
> {code:java}
> "faircallqueue.reserved.users"; // list of service users that are assigned to 
> dedicated queue
> "faircallqueue.reserved.users.max"; // max number of service users allowed
> "faircallqueue.reserved.users.capacities"; // custom queue capacities for 
> each service user
> "faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
> dedicated queue{code}
> For instance, for a FCQ with 4 priority levels, 2 reserved users(a, b)
> FCQ would look like:
>  
> {code:java}
> P0: shared queue
> P1: shared queue
> P2: shared queue
> P3: shared queue
> P4: dedicated for user a
> P5: dedicated for user b{code}
> {color:#172b4d}The Multiplexer would have following weights{color}
> {color:#172b4d}shared queue default weights: [8, 4, 2, 1]{color}
> {color:#172b4d}reserved queue weights=[3, 2]{color}
> {color:#172b4d}So user a gets 15% of total cycles, user b gets 10% of total 
> cycles.{color}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17341) Support dedicated user queues in Namenode FairCallQueue

2024-01-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17341:

Description: 
Some service users today in namenode like ETL, metrics collection, ad-hoc users 
that are critical to run business critical job accounts for many traffic in 
namenode and shouldn't be throttled the same way as other individual users in 
FCQ.

There is feature in namenode to always prioritize some service users to not 
subject to FCQ scheduling. (Those users are always p0) but it is not perfect 
and it doesn't account for traffic surge from those users.

The idea is to allocate dedicated rpc queues for those service users with 
bounded queue capacity and allocate processing weight for those users. If queue 
is full, those users are expected to backoff and retry.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
For instance, for a FCQ with 4 priority levels, 2 reserved users(a, b)

FCQ would look like:

 
{code:java}
P0: shared queue
P1: shared queue
P2: shared queue
P3: shared queue
P4: dedicated for user a
P5: dedicated for user b{code}
{color:#172b4d}The Multiplexer would have following weights{color}

{color:#172b4d}shared queue default weights: [8, 4, 2, 1]{color}

{color:#172b4d}reserved queue weights=[3, 2]{color}

{color:#172b4d}So user a gets 15% of total cycles, user b gets 10% of total 
cycles.{color}

 

 

  was:
Some service users today in namenode like ETL, metrics collection, ad-hoc users 
that are critical to run business critical job accounts for many traffic in 
namenode and shouldn't be throttled the same way as other individual users in 
FCQ.

There is feature in namenode to always prioritize some service users to not 
subject to FCQ scheduling. (Those users are always p0) but it is not perfect 
and it doesn't account for traffic surge from those users.

The idea is to allocate dedicated rpc queues for those service users with 
bounded queue capacity and allocate processing weight for those users. If queue 
is full, those users are expected to backoff and retry.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
For instance, for a FCQ with 4 priority levels, 2 reserved users(a, b)

FCQ would look like:

 
{code:java}
P0: shared queue
P1: shared queue
P2: shared queue
P3: shared queue
P4: dedicated for user a
P5: dedicated for user b{code}
{color:#172b4d}The WRM would have following weights{color}

{color:#172b4d}shared queue default weights: [8, 4, 2, 1]{color}

{color:#172b4d}reserved queue weights=[3, 2]{color}

{color:#172b4d}So user a gets 15% of total cycles, user b gets 10% of total 
cycles.{color}

 

 


> Support dedicated user queues in Namenode FairCallQueue
> ---
>
> Key: HDFS-17341
> URL: https://issues.apache.org/jira/browse/HDFS-17341
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Some service users today in namenode like ETL, metrics collection, ad-hoc 
> users that are critical to run business critical job accounts for many 
> traffic in namenode and shouldn't be throttled the same way as other 
> individual users in FCQ.
> There is feature in namenode to always prioritize some service users to not 
> subject to FCQ scheduling. (Those users are always p0) but it is not perfect 
> and it doesn't account for traffic surge from those users.
> The idea is to allocate dedicated rpc queues for those service users with 
> bounded queue capacity and allocate processing weight for those users. If 
> queue is full, those users are expected to backoff and retry.
>  
> New configs:
> {code:java}
> "faircallqueue.reserved.users"; // list of service users that are assigned to 
> dedicated queue
> "faircallqueue.reserved.users.max"; // max number of service users allowed
> "faircallqueue.reserved.users.capacities"; // custom queue 

[jira] [Updated] (HDFS-17341) Support dedicated user queues in Namenode FairCallQueue

2024-01-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17341:

Description: 
Some service users today in namenode like ETL, metrics collection, ad-hoc users 
that are critical to run business critical job accounts for many traffic in 
namenode and shouldn't be throttled the same way as other individual users in 
FCQ.

There is feature in namenode to always prioritize some service users to not 
subject to FCQ scheduling. (Those users are always p0) but it is not perfect 
and it doesn't account for traffic surge from those users.

The idea is to allocate dedicated rpc queues for those service users with 
bounded queue capacity and allocate processing weight for those users. If queue 
is full, those users are expected to backoff and retry.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
For instance, for a FCQ with 4 priority levels, 2 reserved users(a, b)

FCQ would look like:

 
{code:java}
P0: shared queue
P1: shared queue
P2: shared queue
P3: shared queue
P4: dedicated for user a
P5: dedicated for user b{code}
{color:#172b4d}The WRM would have following weights{color}

{color:#172b4d}shared queue default weights: [8, 4, 2, 1]{color}

{color:#172b4d}reserved queue weights=[3, 2]{color}

{color:#172b4d}So user a gets 15% of total cycles, user b gets 10% of total 
cycles.{color}

 

 

  was:
Some service users today in namenode like ETL, metrics collection, ad-hoc users 
that are critical to run business critical job accounts for many traffic in 
namenode and shouldn't be throttled the same way as other individual users in 
FCQ.

There is feature in namenode to always prioritize some service users to not 
subject to FCQ scheduling. (Those users are always p0) but it is not perfect 
and it doesn't account for traffic surge from those users.

The idea is to allocate dedicated rpc queues for those service users with 
bounded queue capacity and allocate processing weight for those users. If queue 
is full, those users are expected to backoff and retry.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
 


> Support dedicated user queues in Namenode FairCallQueue
> ---
>
> Key: HDFS-17341
> URL: https://issues.apache.org/jira/browse/HDFS-17341
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Some service users today in namenode like ETL, metrics collection, ad-hoc 
> users that are critical to run business critical job accounts for many 
> traffic in namenode and shouldn't be throttled the same way as other 
> individual users in FCQ.
> There is feature in namenode to always prioritize some service users to not 
> subject to FCQ scheduling. (Those users are always p0) but it is not perfect 
> and it doesn't account for traffic surge from those users.
> The idea is to allocate dedicated rpc queues for those service users with 
> bounded queue capacity and allocate processing weight for those users. If 
> queue is full, those users are expected to backoff and retry.
>  
> New configs:
> {code:java}
> "faircallqueue.reserved.users"; // list of service users that are assigned to 
> dedicated queue
> "faircallqueue.reserved.users.max"; // max number of service users allowed
> "faircallqueue.reserved.users.capacities"; // custom queue capacities for 
> each service user
> "faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
> dedicated queue{code}
> For instance, for a FCQ with 4 priority levels, 2 reserved users(a, b)
> FCQ would look like:
>  
> {code:java}
> P0: shared queue
> P1: shared queue
> P2: shared queue
> P3: shared queue
> P4: dedicated for user a
> P5: dedicated for user b{code}
> {color:#172b4d}The WRM would have following weights{color}
> {color:#

[jira] [Updated] (HDFS-17341) Support dedicated user queues in Namenode FairCallQueue

2024-01-17 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17341:

Description: 
Some service users today in namenode like ETL, metrics collection, ad-hoc users 
that are critical to run business critical job accounts for many traffic in 
namenode and shouldn't be throttled the same way as other individual users in 
FCQ.

There is feature in namenode to whitelist some service users to not subject to 
FCQ scheduling. (Those users are always p0) but it is not perfect and it 
doesn't account for traffic surge from those users.

The idea is to allocate dedicated rpc queues for those service users with 
bounded queue capacity and allocate processing weight for those users. If queue 
is full, those users are expected to backoff and retry.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
 

  was:
Some service users today in namenode like ETL, metrics collection accounts for 
many traffic and shouldn't be throttled the same way as other individual users 
in FCQ.

There is feature in namenode to whitelist some service users to not subject to 
FCQ scheduling. (Those users are always p0) but it is not perfect and it 
doesn't account for traffic surge from those users.

The idea is to allocate dedicated rpc queues for those service users with 
bounded queue capacity and allocate processing weight for those users. If queue 
is full, those users are expected to backoff and retry.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
 


> Support dedicated user queues in Namenode FairCallQueue
> ---
>
> Key: HDFS-17341
> URL: https://issues.apache.org/jira/browse/HDFS-17341
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Priority: Major
>
> Some service users today in namenode like ETL, metrics collection, ad-hoc 
> users that are critical to run business critical job accounts for many 
> traffic in namenode and shouldn't be throttled the same way as other 
> individual users in FCQ.
> There is feature in namenode to whitelist some service users to not subject 
> to FCQ scheduling. (Those users are always p0) but it is not perfect and it 
> doesn't account for traffic surge from those users.
> The idea is to allocate dedicated rpc queues for those service users with 
> bounded queue capacity and allocate processing weight for those users. If 
> queue is full, those users are expected to backoff and retry.
>  
> New configs:
> {code:java}
> "faircallqueue.reserved.users"; // list of service users that are assigned to 
> dedicated queue
> "faircallqueue.reserved.users.max"; // max number of service users allowed
> "faircallqueue.reserved.users.capacities"; // custom queue capacities for 
> each service user
> "faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
> dedicated queue{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17341) Support dedicated user queues in Namenode FairCallQueue

2024-01-17 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17341:

Description: 
Some service users today in namenode like ETL, metrics collection, ad-hoc users 
that are critical to run business critical job accounts for many traffic in 
namenode and shouldn't be throttled the same way as other individual users in 
FCQ.

There is feature in namenode to always prioritize some service users to not 
subject to FCQ scheduling. (Those users are always p0) but it is not perfect 
and it doesn't account for traffic surge from those users.

The idea is to allocate dedicated rpc queues for those service users with 
bounded queue capacity and allocate processing weight for those users. If queue 
is full, those users are expected to backoff and retry.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
 

  was:
Some service users today in namenode like ETL, metrics collection, ad-hoc users 
that are critical to run business critical job accounts for many traffic in 
namenode and shouldn't be throttled the same way as other individual users in 
FCQ.

There is feature in namenode to whitelist some service users to not subject to 
FCQ scheduling. (Those users are always p0) but it is not perfect and it 
doesn't account for traffic surge from those users.

The idea is to allocate dedicated rpc queues for those service users with 
bounded queue capacity and allocate processing weight for those users. If queue 
is full, those users are expected to backoff and retry.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
 


> Support dedicated user queues in Namenode FairCallQueue
> ---
>
> Key: HDFS-17341
> URL: https://issues.apache.org/jira/browse/HDFS-17341
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Priority: Major
>
> Some service users today in namenode like ETL, metrics collection, ad-hoc 
> users that are critical to run business critical job accounts for many 
> traffic in namenode and shouldn't be throttled the same way as other 
> individual users in FCQ.
> There is feature in namenode to always prioritize some service users to not 
> subject to FCQ scheduling. (Those users are always p0) but it is not perfect 
> and it doesn't account for traffic surge from those users.
> The idea is to allocate dedicated rpc queues for those service users with 
> bounded queue capacity and allocate processing weight for those users. If 
> queue is full, those users are expected to backoff and retry.
>  
> New configs:
> {code:java}
> "faircallqueue.reserved.users"; // list of service users that are assigned to 
> dedicated queue
> "faircallqueue.reserved.users.max"; // max number of service users allowed
> "faircallqueue.reserved.users.capacities"; // custom queue capacities for 
> each service user
> "faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
> dedicated queue{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17341) Support dedicated user queues in Namenode FairCallQueue

2024-01-17 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17341:

Description: 
Some service users today in namenode like ETL, metrics collection accounts for 
many traffic and shouldn't be throttled the same way as other individual users 
in FCQ.

There is feature in namenode to whitelist some service users to not subject to 
FCQ scheduling. (Those users are always p0) but it is not perfect and it 
doesn't account for traffic surge from those users.

The idea is to allocate dedicated rpc queues for those service users with 
bounded queue capacity and allocate processing weight for those users. If queue 
is full, those users are expected to backoff and retry.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
 

  was:
Some service users today in namenode like ETL, metrics collection accounts for 
many traffic and shouldn't be throttled the same way as other individual users 
in FCQ.

The idea is to allocate dedicated rpc queues for those service users and 
allocate processing weight for those users.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
 


> Support dedicated user queues in Namenode FairCallQueue
> ---
>
> Key: HDFS-17341
> URL: https://issues.apache.org/jira/browse/HDFS-17341
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Priority: Major
>
> Some service users today in namenode like ETL, metrics collection accounts 
> for many traffic and shouldn't be throttled the same way as other individual 
> users in FCQ.
> There is feature in namenode to whitelist some service users to not subject 
> to FCQ scheduling. (Those users are always p0) but it is not perfect and it 
> doesn't account for traffic surge from those users.
> The idea is to allocate dedicated rpc queues for those service users with 
> bounded queue capacity and allocate processing weight for those users. If 
> queue is full, those users are expected to backoff and retry.
>  
> New configs:
> {code:java}
> "faircallqueue.reserved.users"; // list of service users that are assigned to 
> dedicated queue
> "faircallqueue.reserved.users.max"; // max number of service users allowed
> "faircallqueue.reserved.users.capacities"; // custom queue capacities for 
> each service user
> "faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
> dedicated queue{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17341) Support dedicated user queues in Namenode FairCallQueue

2024-01-17 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17341:

Affects Version/s: 2.10.0
   3.4.0

> Support dedicated user queues in Namenode FairCallQueue
> ---
>
> Key: HDFS-17341
> URL: https://issues.apache.org/jira/browse/HDFS-17341
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Priority: Major
>
> Some service users today in namenode like ETL, metrics collection accounts 
> for many traffic and shouldn't be throttled the same way as other individual 
> users in FCQ.
> The idea is to allocate dedicated rpc queues for those service users and 
> allocate processing weight for those users.
>  
> New configs:
> {code:java}
> "faircallqueue.reserved.users"; // list of service users that are assigned to 
> dedicated queue
> "faircallqueue.reserved.users.max"; // max number of service users allowed
> "faircallqueue.reserved.users.capacities"; // custom queue capacities for 
> each service user
> "faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
> dedicated queue{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17341) Support dedicated user queues in Namenode FairCallQueue

2024-01-17 Thread Lei Yang (Jira)
Lei Yang created HDFS-17341:
---

 Summary: Support dedicated user queues in Namenode FairCallQueue
 Key: HDFS-17341
 URL: https://issues.apache.org/jira/browse/HDFS-17341
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Lei Yang


Some service users in the namenode today, such as ETL and metrics-collection
accounts, generate a large amount of traffic and shouldn't be throttled the
same way as other individual users in FCQ.

The idea is to allocate dedicated RPC queues for those service users and to
assign a processing weight to each of them.

 

New configs:
{code:java}
"faircallqueue.reserved.users"; // list of service users that are assigned to 
dedicated queue
"faircallqueue.reserved.users.max"; // max number of service users allowed
"faircallqueue.reserved.users.capacities"; // custom queue capacities for each 
service user
"faircallqueue.multiplexer.reserved.weights"; // processing weights for each 
dedicated queue{code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2024-01-08 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17290:

Fix Version/s: 3.4.0

> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher priority queues when connection between client and 
> namenode remains open. Currently IPC server just emits a single metrics for 
> all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1
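As a rough illustration of what a dedicated metric for scenario #1 could look
like (the counter and method names below are hypothetical, not the identifiers
introduced by the actual change), a minimal sketch:
{code:java}
// Hypothetical sketch -- names and the decision flag are illustrative only.
import java.util.concurrent.atomic.LongAdder;

class BackoffMetricsSketch {
  private final LongAdder allBackoffs = new LongAdder();        // existing aggregate counter
  private final LongAdder disconnectBackoffs = new LongAdder(); // scenario #1 only (proposed)

  /**
   * @param enqueuedAtLowestPriority true for scenario #1: the call targeted the lowest
   *        priority queue directly, so a full queue also drops the client connection.
   */
  void onBackoff(boolean enqueuedAtLowestPriority) {
    allBackoffs.increment();
    if (enqueuedAtLowestPriority) {
      disconnectBackoffs.increment();
    }
  }

  long totalBackoffs()          { return allBackoffs.sum(); }
  long backoffsWithDisconnect() { return disconnectBackoffs.sum(); }
}
{code}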



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2024-01-08 Thread Lei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804483#comment-17804483
 ] 

Lei Yang commented on HDFS-17290:
-

[~simbadzina]  Thanks for reviewing and merging the PR.

> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher priority queues when connection between client and 
> namenode remains open. Currently IPC server just emits a single metrics for 
> all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



Re: [PATCH for 9.0 00/12] Map memory at destination .load_setup in vDPA-net migration

2023-12-24 Thread Lei Yang
QE tested this series with regression tests; there are no new regression issues.

Tested-by: Lei Yang 



On Sat, Dec 16, 2023 at 1:28 AM Eugenio Pérez  wrote:
>
> Current memory operations like pinning may take a lot of time at the
> destination.  Currently they are done after the source of the migration is
> stopped, and before the workload is resumed at the destination.  This is a
> period where neither traffic can flow nor the VM workload can continue
> (downtime).
>
> We can do better as we know the memory layout of the guest RAM at the
> destination from the moment the migration starts.  Moving that operation 
> allows
> QEMU to communicate the kernel the maps while the workload is still running in
> the source, so Linux can start mapping them.
>
> Also, the migration of the guest memory may finish before the destination
> QEMU maps all the memory.  In this case, the rest of the memory will be mapped
> at the same time as before applying this series, when the device is starting.
> So we're only improving with this series.
>
> If the destination has the switchover_ack capability enabled, the destination
> holds the migration until all the memory is mapped.
>
> This needs to be applied on top of [1]. That series performs some code
> reorganization that allows mapping the guest memory without knowing the queue
> layout the guest configures on the device.
>
> This series reduced the downtime in the stop-and-copy phase of the live
> migration from 20s~30s to 5s, with a 128G mem guest and two mlx5_vdpa devices,
> per [2].
>
> Future directions on top of this series may include:
> * Iterative migration of virtio-net devices, as it may reduce downtime per 
> [3].
>   vhost-vdpa net can apply the configuration through CVQ in the destination
>   while the source is still migrating.
> * Move more things ahead of migration time, like DRIVER_OK.
> * Check that the devices of the destination are valid, and cancel the 
> migration
>   in case it is not.
>
> v1 from RFC v2:
> * Hold on migration if memory has not been mapped in full with switchover_ack.
> * Revert map if the device is not started.
>
> RFC v2:
> * Delegate map to another thread so it does no block QMP.
> * Fix not allocating iova_tree if x-svq=on at the destination.
> * Rebased on latest master.
> * More cleanups of current code, that might be split from this series too.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01986.html
> [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg00909.html
> [3] 
> https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/
>
> Eugenio Pérez (12):
>   vdpa: do not set virtio status bits if unneeded
>   vdpa: make batch_begin_once early return
>   vdpa: merge _begin_batch into _batch_begin_once
>   vdpa: extract out _dma_end_batch from _listener_commit
>   vdpa: factor out stop path of vhost_vdpa_dev_start
>   vdpa: check for iova tree initialized at net_client_start
>   vdpa: set backend capabilities at vhost_vdpa_init
>   vdpa: add vhost_vdpa_load_setup
>   vdpa: approve switchover after memory map in the migration destination
>   vdpa: add vhost_vdpa_net_load_setup NetClient callback
>   vdpa: add vhost_vdpa_net_switchover_ack_needed
>   virtio_net: register incremental migration handlers
>
>  include/hw/virtio/vhost-vdpa.h |  32 
>  include/net/net.h  |   8 +
>  hw/net/virtio-net.c|  48 ++
>  hw/virtio/vhost-vdpa.c | 274 +++--
>  net/vhost-vdpa.c   |  43 +-
>  5 files changed, 357 insertions(+), 48 deletions(-)
>
> --
> 2.39.3
>
>




Re: [PATCH v4 01/13] vdpa: add VhostVDPAShared

2023-12-22 Thread Lei Yang
QE tested v4 of this series with regression tests. It fixes the QEMU core
dump issue hit last time, and everything works fine.

Tested-by: Lei Yang 




On Fri, Dec 22, 2023 at 1:43 AM Eugenio Pérez  wrote:
>
> It will hold properties shared among all vhost_vdpa instances associated
> with of the same device.  For example, we just need one iova_tree or one
> memory listener for the entire device.
>
> Next patches will register the vhost_vdpa memory listener at the
> beginning of the VM migration at the destination. This enables QEMU to
> map the memory to the device before stopping the VM at the source,
> instead of doing while both source and destination are stopped, thus
> minimizing the downtime.
>
> However, the destination QEMU is unaware of which vhost_vdpa struct will
> register its memory_listener.  If the source guest has CVQ enabled, it
> will be the one associated with the CVQ.  Otherwise, it will be the
> first one.
>
> Save the memory operations related members in a common place rather than
> always in the first / last vhost_vdpa.
>
> Signed-off-by: Eugenio Pérez 
> Acked-by: Jason Wang 
> ---
>  include/hw/virtio/vhost-vdpa.h |  5 +
>  net/vhost-vdpa.c   | 24 ++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 5407d54fd7..eb1a56d75a 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -30,6 +30,10 @@ typedef struct VhostVDPAHostNotifier {
>  void *addr;
>  } VhostVDPAHostNotifier;
>
> +/* Info shared by all vhost_vdpa device models */
> +typedef struct vhost_vdpa_shared {
> +} VhostVDPAShared;
> +
>  typedef struct vhost_vdpa {
>  int device_fd;
>  int index;
> @@ -46,6 +50,7 @@ typedef struct vhost_vdpa {
>  bool suspended;
>  /* IOVA mapping used by the Shadow Virtqueue */
>  VhostIOVATree *iova_tree;
> +VhostVDPAShared *shared;
>  GPtrArray *shadow_vqs;
>  const VhostShadowVirtqueueOps *shadow_vq_ops;
>  void *shadow_vq_ops_opaque;
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index d0614d7954..8b661b9e6d 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -240,6 +240,10 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
>  qemu_close(s->vhost_vdpa.device_fd);
>  s->vhost_vdpa.device_fd = -1;
>  }
> +if (s->vhost_vdpa.index != 0) {
> +return;
> +}
> +g_free(s->vhost_vdpa.shared);
>  }
>
>  /** Dummy SetSteeringEBPF to support RSS for vhost-vdpa backend  */
> @@ -1661,6 +1665,7 @@ static NetClientState 
> *net_vhost_vdpa_init(NetClientState *peer,
> bool svq,
> struct vhost_vdpa_iova_range 
> iova_range,
> uint64_t features,
> +   VhostVDPAShared *shared,
> Error **errp)
>  {
>  NetClientState *nc = NULL;
> @@ -1696,6 +1701,7 @@ static NetClientState 
> *net_vhost_vdpa_init(NetClientState *peer,
>  if (queue_pair_index == 0) {
>  vhost_vdpa_net_valid_svq_features(features,
> &s->vhost_vdpa.migration_blocker);
> +s->vhost_vdpa.shared = g_new0(VhostVDPAShared, 1);
>  } else if (!is_datapath) {
>  s->cvq_cmd_out_buffer = mmap(NULL, vhost_vdpa_net_cvq_cmd_page_len(),
>   PROT_READ | PROT_WRITE,
> @@ -1708,11 +1714,16 @@ static NetClientState 
> *net_vhost_vdpa_init(NetClientState *peer,
>  s->vhost_vdpa.shadow_vq_ops_opaque = s;
>  s->cvq_isolated = cvq_isolated;
>  }
> +if (queue_pair_index != 0) {
> +s->vhost_vdpa.shared = shared;
> +}
> +
>  ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>  if (ret) {
>  qemu_del_net_client(nc);
>  return NULL;
>  }
> +
>  return nc;
>  }
>
> @@ -1824,17 +1835,26 @@ int net_init_vhost_vdpa(const Netdev *netdev, const 
> char *name,
>  ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>
>  for (i = 0; i < queue_pairs; i++) {
> +VhostVDPAShared *shared = NULL;
> +
> +if (i) {
> +shared = DO_UPCAST(VhostVDPAState, nc, 
> ncs[0])->vhost_vdpa.shared;
> +}
>  ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>   vdpa_device_fd, i, 2, true, opts->x_svq,
> -

[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17290:

Description: 
Clients back off when RPCs cannot be enqueued. However, there are different
scenarios in which backoff can happen. Currently there is no way to
differentiate whether a backoff happened due to lowest priority + disconnection
or due to queue overflow from higher priority queues while the connection
between client and namenode remains open. Currently the IPC server just emits a
single metric for all backoffs.

Example:
 # Clients are directly enqueued into the lowest priority queue and back off
when the lowest queue is full. Clients are expected to disconnect from the
namenode.
 # Clients are enqueued into a non-lowest priority queue, overflow all the way
down to the lowest priority queue, and back off. In this case, the connection
between client and namenode remains open.

We would like to add a metric for #1
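For context, today only a single aggregated backoff counter is visible from the
IPC server metrics; the short JMX read below illustrates why it cannot
distinguish scenario #1 from #2. The object name and attribute name follow the
usual NameNode RPC metrics layout and are assumptions for this sketch (it would
also need to run inside, or connect remotely to, the NameNode JVM):
{code:java}
// Sketch: read the existing aggregate backoff counter over JMX. The object name
// and attribute name are assumptions for illustration.
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ReadBackoffMetric {
  public static void main(String[] args) throws Exception {
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    ObjectName rpc = new ObjectName("Hadoop:service=NameNode,name=RpcActivityForPort8020");
    Object backoffs = mbs.getAttribute(rpc, "RpcClientBackoff");
    // One number for all backoffs: lowest-queue-plus-disconnect (#1) and
    // overflow-from-higher-queues (#2) are indistinguishable here.
    System.out.println("Total client backoffs: " + backoffs);
  }
}
{code}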

  was:
Clients back off when RPCs cannot be enqueued. However, there are different
scenarios in which backoff can happen. Currently there is no way to
differentiate whether a backoff happened due to lowest priority + disconnection
or due to queue overflow from higher ones. The IPC server just emits a
monolithic metric for all backoffs.

Example:
 # Clients are directly enqueued into the lowest priority queue and back off
when the lowest queue is full. Clients are expected to disconnect from the
namenode.
 # Clients are enqueued into a non-lowest priority queue, overflow all the way
down to the lowest priority queue, and back off. In this case, the connection
between client and namenode remains open.

We would like to add a metric for #1


> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher priority queues when connection between client and 
> namenode remains open. Currently IPC server just emits a single metrics for 
> all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798382#comment-17798382
 ] 

Lei Yang edited comment on HDFS-17290 at 12/18/23 11:48 PM:


[~goiri]  [~simbadzina] Can you please review this and let me know if you have 
any concerns?

fyi [~mccormickt12] 


was (Author: JIRAUSER286942):
[~goiri]  [~simbadzina] Can you please review this and let me know if you have 
any concerns?

> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher ones. IPC server just emits a monolithic metrics 
> for all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17290:

Description: 
Clients back off when RPCs cannot be enqueued. However, there are different
scenarios in which backoff can happen. Currently there is no way to
differentiate whether a backoff happened due to lowest priority + disconnection
or due to queue overflow from higher ones. The IPC server just emits a
monolithic metric for all backoffs.

Example:
 # Clients are directly enqueued into the lowest priority queue and back off
when the lowest queue is full. Clients are expected to disconnect from the
namenode.
 # Clients are enqueued into a non-lowest priority queue, overflow all the way
down to the lowest priority queue, and back off. In this case, the connection
between client and namenode remains open.

We would like to add a metric for #1

  was:
Clients back off when RPCs cannot be enqueued. However, there are different
scenarios in which backoff can happen. Currently there is no way to
differentiate whether a backoff happened due to lowest priority + disconnection
or due to queue overflow from higher ones.

Example:
 # Clients are directly enqueued into the lowest priority queue and back off
when the lowest queue is full. Clients are expected to disconnect from the
namenode.
 # Clients are enqueued into a non-lowest priority queue, overflow all the way
down to the lowest priority queue, and back off. In this case, the connection
between client and namenode remains open.

We would like to add a metric for #1


> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher ones. IPC server just emits a monolithic metrics 
> for all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798382#comment-17798382
 ] 

Lei Yang commented on HDFS-17290:
-

[~goiri]  [~simbadzina] Can you please review this and let me know if you have 
any concerns?

> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher ones. IPC server just emits a monolithic metrics 
> for all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17290:

Description: 
Clients back off when RPCs cannot be enqueued. However, there are different
scenarios in which backoff can happen. Currently there is no way to
differentiate whether a backoff happened due to lowest priority + disconnection
or due to queue overflow from higher ones.

Example:
 # Clients are directly enqueued into the lowest priority queue and back off
when the lowest queue is full. Clients are expected to disconnect from the
namenode.
 # Clients are enqueued into a non-lowest priority queue, overflow all the way
down to the lowest priority queue, and back off. In this case, the connection
between client and namenode remains open.

We would like to add a metric for #1

  was:
Clients back off when RPCs cannot be enqueued. However, there are cases when
this could happen. Example assumes prio

 
 # Clients are directly enqueued into the lowest priority queue and back off
when the lowest queue is full. Clients are expected to disconnect from the
namenode.
 # Clients are enqueued into a non-lowest priority queue, overflow all the way
down to the lowest priority queue, and back off. In this case, the connection
between client and namenode remains open.


> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.3.6
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher ones.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17290:

Affects Version/s: 3.4.0
   (was: 3.3.6)

> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher ones.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2023-12-18 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-17290:

Summary: HDFS: add client rpc backoff metrics due to disconnection from 
lowest priority queue  (was: HDFS: add client rpc backoff metrics due to 
throttling from lowest priority queue)

> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.3.6
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are cases 
> when this could happen. Example assumes prio
>  
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17290) HDFS: add client rpc backoff metrics due to throttling from lowest priority queue

2023-12-14 Thread Lei Yang (Jira)
Lei Yang created HDFS-17290:
---

 Summary: HDFS: add client rpc backoff metrics due to throttling 
from lowest priority queue
 Key: HDFS-17290
 URL: https://issues.apache.org/jira/browse/HDFS-17290
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.3.6, 2.10.0
Reporter: Lei Yang
Assignee: Lei Yang


Clients back off when RPCs cannot be enqueued. However, there are cases when
this could happen. Example assumes prio

 
 # Clients are directly enqueued into the lowest priority queue and back off
when the lowest queue is full. Clients are expected to disconnect from the
namenode.
 # Clients are enqueued into a non-lowest priority queue, overflow all the way
down to the lowest priority queue, and back off. In this case, the connection
between client and namenode remains open.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [PATCH 9.0 00/13] Consolidate common vdpa members in VhostVDPAShared

2023-11-30 Thread Lei Yang
On Thu, Nov 30, 2023 at 3:38 PM Eugenio Perez Martin
 wrote:
>
> On Thu, Nov 30, 2023 at 4:22 AM Lei Yang  wrote:
> >
> > Hi Eugenio
> >
> > QE performed regression testing after applying this patch. This series
> > patch introduced a qemu core dump bug, for the core dump information
> > please review the attached file.
> >
>
> Hi Lei, thank you very much for the testing!
>
Hi Eugenio

> Can you describe the test steps that lead to the crash? I think you
> removed the vdpa device via QMP, but I'd like to be sure.

Yes, you're right. The core dump occurs when hot-unplugging the NIC; please
review the following simple test steps:
Test Steps:
1. Create two vdpa devices (vdpa0 and vdpa1) with multiple queues
2. Boot a guest with vdpa0
3. set_link false for vdpa0
4. Hot-plug vdpa1
5. Stop and resume the guest via QMP
6. Hot-unplug vdpa1; the core dump is hit at this point

Thanks
Lei
>
> Thanks!
>
> > Tested-by: Lei Yang 
> >
> >
> >
> >
> > On Sat, Nov 25, 2023 at 1:14 AM Eugenio Pérez  wrote:
> > >
> > > Current memory operations like pinning may take a lot of time at the
> > > destination.  Currently they are done after the source of the migration is
> > > stopped, and before the workload is resumed at the destination.  This is a
> > > period where neither traffic can flow nor the VM workload can continue
> > > (downtime).
> > >
> > > We can do better as we know the memory layout of the guest RAM at the
> > > destination from the moment the migration starts.  Moving that operation 
> > > allows
> > > QEMU to communicate the kernel the maps while the workload is still 
> > > running in
> > > the source, so Linux can start mapping them.  Ideally, all IOMMU is 
> > > configured,
> > > but if the vDPA parent driver uses on-chip IOMMU and .set_map we're still
> > > saving all the pinning time.
> > >
> > > This is a first required step to consolidate all the members in a common
> > > struct.  This is needed because the destination does not know what 
> > > vhost_vdpa
> > > struct will have the registered listener member, so it is easier to place 
> > > them
> > > in a shared struct rather to keep them in vhost_vdpa struct.
> > >
> > > v1 from RFC:
> > > * Fix vhost_vdpa_net_cvq_start checking for always_svq instead of
> > >   shadow_data.  This could cause CVQ not being shadowed if
> > >   vhost_vdpa_net_cvq_start was called in the middle of a migration.
> > >
> > > Eugenio Pérez (13):
> > >   vdpa: add VhostVDPAShared
> > >   vdpa: move iova tree to the shared struct
> > >   vdpa: move iova_range to vhost_vdpa_shared
> > >   vdpa: move shadow_data to vhost_vdpa_shared
> > >   vdpa: use vdpa shared for tracing
> > >   vdpa: move file descriptor to vhost_vdpa_shared
> > >   vdpa: move iotlb_batch_begin_sent to vhost_vdpa_shared
> > >   vdpa: move backend_cap to vhost_vdpa_shared
> > >   vdpa: remove msg type of vhost_vdpa
> > >   vdpa: move iommu_list to vhost_vdpa_shared
> > >   vdpa: use VhostVDPAShared in vdpa_dma_map and unmap
> > >   vdpa: use dev_shared in vdpa_iommu
> > >   vdpa: move memory listener to vhost_vdpa_shared
> > >
> > >  include/hw/virtio/vhost-vdpa.h |  36 +---
> > >  hw/virtio/vdpa-dev.c   |   7 +-
> > >  hw/virtio/vhost-vdpa.c | 160 +
> > >  net/vhost-vdpa.c   | 117 
> > >  hw/virtio/trace-events |  14 +--
> > >  5 files changed, 174 insertions(+), 160 deletions(-)
> > >
> > > --
> > > 2.39.3
> > >
> > >
>




Re: [PATCH 9.0 00/13] Consolidate common vdpa members in VhostVDPAShared

2023-11-29 Thread Lei Yang
Hi Eugenio

QE performed regression testing after applying this patch. This series
introduced a QEMU core dump bug; for the core dump information,
please review the attached file.

Tested-by: Lei Yang 




On Sat, Nov 25, 2023 at 1:14 AM Eugenio Pérez  wrote:
>
> Current memory operations like pinning may take a lot of time at the
> destination.  Currently they are done after the source of the migration is
> stopped, and before the workload is resumed at the destination.  This is a
> period where neither traffic can flow nor the VM workload can continue
> (downtime).
>
> We can do better as we know the memory layout of the guest RAM at the
> destination from the moment the migration starts.  Moving that operation 
> allows
> QEMU to communicate the kernel the maps while the workload is still running in
> the source, so Linux can start mapping them.  Ideally, all IOMMU is 
> configured,
> but if the vDPA parent driver uses on-chip IOMMU and .set_map we're still
> saving all the pinning time.
>
> This is a first required step to consolidate all the members in a common
> struct.  This is needed because the destination does not know what vhost_vdpa
> struct will have the registered listener member, so it is easier to place them
> in a shared struct rather to keep them in vhost_vdpa struct.
>
> v1 from RFC:
> * Fix vhost_vdpa_net_cvq_start checking for always_svq instead of
>   shadow_data.  This could cause CVQ not being shadowed if
>   vhost_vdpa_net_cvq_start was called in the middle of a migration.
>
> Eugenio Pérez (13):
>   vdpa: add VhostVDPAShared
>   vdpa: move iova tree to the shared struct
>   vdpa: move iova_range to vhost_vdpa_shared
>   vdpa: move shadow_data to vhost_vdpa_shared
>   vdpa: use vdpa shared for tracing
>   vdpa: move file descriptor to vhost_vdpa_shared
>   vdpa: move iotlb_batch_begin_sent to vhost_vdpa_shared
>   vdpa: move backend_cap to vhost_vdpa_shared
>   vdpa: remove msg type of vhost_vdpa
>   vdpa: move iommu_list to vhost_vdpa_shared
>   vdpa: use VhostVDPAShared in vdpa_dma_map and unmap
>   vdpa: use dev_shared in vdpa_iommu
>   vdpa: move memory listener to vhost_vdpa_shared
>
>  include/hw/virtio/vhost-vdpa.h |  36 +---
>  hw/virtio/vdpa-dev.c   |   7 +-
>  hw/virtio/vhost-vdpa.c | 160 +
>  net/vhost-vdpa.c   | 117 
>  hw/virtio/trace-events |  14 +--
>  5 files changed, 174 insertions(+), 160 deletions(-)
>
> --
> 2.39.3
>
>


core_dump_info
Description: Binary data


Re: [PATCH 0/3] vhost: clean up device reset

2023-10-02 Thread Lei Yang
QE ran regression testing on this series with a vhost-vdpa device;
everything is working fine.

Tested-by: Lei Yang 

On Thu, Sep 28, 2023 at 6:40 PM Eugenio Perez Martin
 wrote:
>
> On Wed, Sep 27, 2023 at 9:27 PM Stefan Hajnoczi  wrote:
> >
> > Stateful vhost devices may need to free resources or clear device state upon
> > device reset. The vhost-user protocol has a VHOST_USER_RESET_DEVICE message 
> > for
> > this and vDPA has SET_STATUS 0, but only QEMU's vhost-user-scsi device 
> > actually
> > implements this today.
> >
> > This patch series performs device reset across all device types. When
> > virtio_reset() is called, the associated vhost_dev's ->vhost_reset_device() 
> > is
> > called. vhost-user-scsi's one-off implementation is obsoleted and removed.
> >
> > This patch affects behavior as follows:
> > - vhost-kernel: no change in behavior. No ioctl calls are made.
> > - vhost-user: back-ends that negotiate
> >   VHOST_USER_PROTOCOL_F_RESET_DEVICE now receive a
> >   VHOST_USER_DEVICE_RESET message upon device reset. Otherwise there is
> >   no change in behavior. DPDK, SPDK, libvhost-user, and the
> >   vhost-user-backend crate do not negotiate
> >   VHOST_USER_PROTOCOL_F_RESET_DEVICE automatically.
> > - vhost-vdpa: an extra SET_STATUS 0 call is made during device reset.
> >
> > I have tested this series with vhost-net (kernel), vhost-user-blk, and
> > vhost-user-fs (both Rust and legacy C).
> >
>
> Acked-by: Eugenio Pérez 
>
> > Stefan Hajnoczi (3):
> >   vhost-user: do not send RESET_OWNER on device reset
> >   vhost-backend: remove vhost_kernel_reset_device()
> >   virtio: call ->vhost_reset_device() during reset
> >
> >  include/hw/virtio/vhost.h |  3 +++
> >  hw/scsi/vhost-user-scsi.c | 20 
> >  hw/virtio/vhost-backend.c |  6 --
> >  hw/virtio/vhost-user.c| 13 +
> >  hw/virtio/vhost.c |  9 +
> >  hw/virtio/virtio.c|  4 
> >  6 files changed, 25 insertions(+), 30 deletions(-)
> >
> > --
> > 2.41.0
> >
>
>




Re: [PATCH v2 0/3] Follow VirtIO initialization properly at vdpa net cvq isolation probing

2023-09-20 Thread Lei Yang
QE tested this series with regression testing; everything works fine.

Tested-by: Lei Yang 

On Sat, Sep 16, 2023 at 1:08 AM Eugenio Pérez  wrote:
>
> This series solves a few issues.  The most obvious is that the feature set was
> done previous to ACKNOWLEDGE | DRIVER status bit set.  Current vdpa devices 
> are
> permissive with this, but it is better to follow the standard.
>
> Apart from that it fixes two issues reported by Peter Maydell:
> * Stop probing CVQ isolation if cannot set features (goto missed).
> * Fix the incorrect error message stating "error setting features", while it
> should say status.
>
> v2: add forgotten Fixes tag
>
> Eugenio Pérez (3):
>   vdpa net: fix error message setting virtio status
>   vdpa net: stop probing if cannot set features
>   vdpa net: follow VirtIO initialization properly at cvq isolation
> probing
>
>  net/vhost-vdpa.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> --
> 2.39.3
>
>




Re: [PATCH] vdpa net: zero vhost_vdpa iova_tree pointer at cleanup

2023-09-14 Thread Lei Yang
QE tested this patch with a real NIC; the guest works well after
cancelling the migration.

Tested-by: Lei Yang 

On Thu, Sep 14, 2023 at 11:23 AM Jason Wang  wrote:
>
> On Wed, Sep 13, 2023 at 8:34 PM Eugenio Pérez  wrote:
> >
> > Not zeroing it causes a SIGSEGV if the live migration is cancelled, at
> > net device restart.
> >
> > This is caused because CVQ tries to reuse the iova_tree that is present
> > in the first vhost_vdpa device at the end of vhost_vdpa_net_cvq_start.
> > As a consequence, it tries to access an iova_tree that has been already
> > free.
> >
> > Fixes: 00ef422e9fbf ("vdpa net: move iova tree creation from init to start")
> > Reported-by: Yanhui Ma 
> > Signed-off-by: Eugenio Pérez 
>
> Acked-by: Jason Wang 
>
> Thanks
>
> > ---
> >  net/vhost-vdpa.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 34202ca009..1714ff4b11 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -385,6 +385,8 @@ static void vhost_vdpa_net_client_stop(NetClientState 
> > *nc)
> >  dev = s->vhost_vdpa.dev;
> >  if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >  g_clear_pointer(>vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > +} else {
> > +s->vhost_vdpa.iova_tree = NULL;
> >  }
> >  }
> >
> > --
> > 2.39.3
> >
>




Re: [PATCH v3 0/5] Enable vdpa net migration with features depending on CVQ

2023-08-28 Thread Lei Yang
QE tested this series with MAC and MQ changes, and the guest migrated
successfully with "x-svq=off" or without this parameter.

Tested-by: Lei Yang 


On Tue, Aug 22, 2023 at 4:53 PM Eugenio Pérez  wrote:
>
> At this moment the migration of net features that depends on CVQ is not
> possible, as there is no reliable way to restore the device state like mac
> address, number of enabled queues, etc to the destination.  This is mainly
> caused because the device must only read CVQ, and process all the commands
> before resuming the dataplane.
>
> This series lift that requirement, sending the VHOST_VDPA_SET_VRING_ENABLE
> ioctl for dataplane vqs only after the device has processed all commands.
> ---
> v3:
> * Fix subject typo and expand message of patch ("vdpa: move
>   vhost_vdpa_set_vring_ready to the caller").
>
> v2:
> * Factor out VRING_ENABLE ioctls from vhost_vdpa_dev_start to the caller,
>   instead of providing a callback to know if it must be called or not.
> * at https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg05447.html
>
> RFC:
> * Enable vqs early in case CVQ cannot be shadowed.
> * at https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg01325.html
>
> Eugenio Pérez (5):
>   vdpa: use first queue SVQ state for CVQ default
>   vdpa: export vhost_vdpa_set_vring_ready
>   vdpa: rename vhost_vdpa_net_load to vhost_vdpa_net_cvq_load
>   vdpa: move vhost_vdpa_set_vring_ready to the caller
>   vdpa: remove net cvq migration blocker
>
>  include/hw/virtio/vhost-vdpa.h |  1 +
>  hw/virtio/vdpa-dev.c   |  3 ++
>  hw/virtio/vhost-vdpa.c | 22 +-
>  net/vhost-vdpa.c   | 75 +++---
>  hw/virtio/trace-events |  2 +-
>  5 files changed, 57 insertions(+), 46 deletions(-)
>
> --
> 2.39.3
>
>




Re: [PATCH v3 0/4] Vhost-vdpa Shadow Virtqueue VLAN support

2023-08-02 Thread Lei Yang
QE tested v3 of this series using the test steps provided by Hawkins
and everything works fine.

Tested-by: Lei Yang 

On Sun, Jul 23, 2023 at 8:10 PM Hawkins Jiawei  wrote:
>
> This series enables shadowed CVQ to intercept VLAN commands
> through shadowed CVQ, update the virtio NIC device model
> so qemu send it in a migration, and the restore of that
> VLAN state in the destination.
>
> ChangeLog
> =
> v3:
>  - remove the extra "From" line in patch 1
> "virtio-net: do not reset vlan filtering at set_features"
>
> v2: https://lore.kernel.org/all/cover.1690100802.git.yin31...@gmail.com/
>  - remove the extra line pointed out by Eugenio in patch 3
> "vdpa: Restore vlan filtering state"
>
> v1: https://lore.kernel.org/all/cover.1689690854.git.yin31...@gmail.com/
>  - based on patch "[PATCH 0/3] Vhost-vdpa Shadow Virtqueue VLAN support"
> at https://lists.gnu.org/archive/html/qemu-devel/2022-09/msg01016.html
>  - move `MAX_VLAN` macro to include/hw/virtio/virtio-net.h
> instead of net/vhost-vdpa.c
>  - fix conflicts with the master branch
>
>
> TestStep
> 
> 1. test the migration using vp-vdpa device
>   - For L0 guest, boot QEMU with two virtio-net-pci net device with
> `ctrl_vq`, `ctrl_vlan` features on, command line like:
>   -device virtio-net-pci,disable-legacy=on,disable-modern=off,
> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
> indirect_desc=off,queue_reset=off,ctrl_vlan=on,...
>
>   - For L1 guest, apply the patch series and compile the source code,
> start QEMU with two vdpa device with svq mode on, enable the `ctrl_vq`,
> `ctrl_vlan` features on, command line like:
>   -netdev type=vhost-vdpa,x-svq=true,...
>   -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
> ctrl_vlan=on,...
>
>   - For L2 source guest, run the following bash command:
> ```bash
> #!/bin/sh
>
> for idx in {1..4094}
> do
>   ip link add link eth0 name vlan$idx type vlan id $idx
> done
> ```
>
>   - gdb attaches the L2 dest VM and break at the
> vhost_vdpa_net_load_single_vlan(), and execute the following
> gdbscript
> ```gdbscript
> ignore 1 4094
> c
> ```
>
>   - Execute the live migration in L2 source monitor
>
>   - Result
> * with this series, gdb can hit the breakpoint and continue
> the executing without triggering any error or warning.
>
> Eugenio Pérez (1):
>   virtio-net: do not reset vlan filtering at set_features
>
> Hawkins Jiawei (3):
>   virtio-net: Expose MAX_VLAN
>   vdpa: Restore vlan filtering state
>   vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ
>
>  hw/net/virtio-net.c|  6 +
>  include/hw/virtio/virtio-net.h |  6 +
>  net/vhost-vdpa.c   | 49 ++
>  3 files changed, 56 insertions(+), 5 deletions(-)
>
> --
> 2.25.1
>
>




Re: [PATCH 0/4] Vhost-vdpa Shadow Virtqueue VLAN support

2023-07-20 Thread Lei Yang
QE tested this series with real hardware; it supports setting up a VLAN
for the NIC, and the VLAN ID can still be found after the migration
finishes. In addition, this series also helped to test another patch and
got the expected result.

Tested-by: Lei Yang 

On Wed, Jul 19, 2023 at 6:54 PM Hawkins Jiawei  wrote:
>
> 在 2023/7/19 15:47, Hawkins Jiawei 写道:
> > This series enables shadowed CVQ to intercept VLAN commands
> > through shadowed CVQ, update the virtio NIC device model
> > so qemu send it in a migration, and the restore of that
> > VLAN state in the destination.
>
> This patch series is based on
> "[PATCH 0/3] Vhost-vdpa Shadow Virtqueue VLAN support" at [1],
> with these changes:
>
>   - move `MAX_VLAN` macro to include/hw/virtio/virtio-net.h
> instead of net/vhost-vdpa.c
>   - fix conflicts with the master branch
>
> Thanks!
>
> [1]. https://lists.gnu.org/archive/html/qemu-devel/2022-09/msg01016.html
>
>
> >
> > TestStep
> > 
> > 1. test the migration using vp-vdpa device
> >- For L0 guest, boot QEMU with two virtio-net-pci net device with
> > `ctrl_vq`, `ctrl_vlan` features on, command line like:
> >-device virtio-net-pci,disable-legacy=on,disable-modern=off,
> > iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
> > indirect_desc=off,queue_reset=off,ctrl_vlan=on,...
> >
> >- For L1 guest, apply the patch series and compile the source code,
> > start QEMU with two vdpa device with svq mode on, enable the `ctrl_vq`,
> > `ctrl_vlan` features on, command line like:
> >-netdev type=vhost-vdpa,x-svq=true,...
> >-device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
> > ctrl_vlan=on,...
> >
> >- For L2 source guest, run the following bash command:
> > ```bash
> > #!/bin/sh
> >
> > for idx in {1..4094}
> > do
> >ip link add link eth0 name vlan$idx type vlan id $idx
> > done
> > ```
> >
> >- gdb attaches the L2 dest VM and break at the
> > vhost_vdpa_net_load_single_vlan(), and execute the following
> > gdbscript
> > ```gdbscript
> > ignore 1 4094
> > c
> > ```
> >
> >- Execute the live migration in L2 source monitor
> >
> >- Result
> >  * with this series, gdb can hit the breakpoint and continue
> > the executing without triggering any error or warning.
> >
> > Eugenio Pérez (1):
> >virtio-net: do not reset vlan filtering at set_features
> >
> > Hawkins Jiawei (3):
> >virtio-net: Expose MAX_VLAN
> >vdpa: Restore vlan filtering state
> >vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ
> >
> >   hw/net/virtio-net.c|  6 +---
> >   include/hw/virtio/virtio-net.h |  6 
> >   net/vhost-vdpa.c   | 50 ++
> >   3 files changed, 57 insertions(+), 5 deletions(-)
> >
>




Re: [PATCH v3 0/8] vdpa: Send all CVQ state load commands in parallel

2023-07-20 Thread Lei Yang
Following the steps provided by Hawkins, I tested two cases, with and
without this series applied. All tests were run on real hardware.
Case 1, without this series:
Source: qemu-system-x86_64: vhost_vdpa_net_load() = 23308 us
Dest: qemu-system-x86_64: vhost_vdpa_net_load() = 23296 us

Case 2, with this series applied:
Source: qemu-system-x86_64: vhost_vdpa_net_load() = 6558 us
Dest: qemu-system-x86_64: vhost_vdpa_net_load() = 6539 us

Tested-by: Lei Yang 


On Thu, Jul 20, 2023 at 6:54 AM Lei Yang  wrote:
>
> On Wed, Jul 19, 2023 at 11:25 PM Hawkins Jiawei  wrote:
> >
> > 在 2023/7/19 20:44, Lei Yang 写道:
> > > Hello Hawkins and Michael
> > >
> > > Looks like there are big changes about vp_vdpa, therefore, if needed,
> > > QE can test this series in QE's environment before the patch is
> >
> > Hi Lei,
> >
> > This patch series does not modify the code of vp_vdpa. Instead, it only
> > modifies how QEMU sends SVQ control commands to the vdpa device.
> >
> Hi Hawkins
>
> > Considering that the behavior of the vp_vdpa device differs from that
> > of real vdpa hardware, would it be possible for you to test this patch
> > series on a real vdpa device?
>
> Yes, there is a hardware device to test it , I will update the test
> results ASAP.
>
> BR
> Lei
> >
> > Thanks!
> >
> >
> > > merged, and provide the result.
> > >
> > > BR
> > > Lei
> > >
> > >
> > > On Wed, Jul 19, 2023 at 8:37 PM Hawkins Jiawei  wrote:
> > >>
> > >> 在 2023/7/19 17:11, Michael S. Tsirkin 写道:
> > >>> On Wed, Jul 19, 2023 at 03:53:45PM +0800, Hawkins Jiawei wrote:
> > >>>> This patchset allows QEMU to delay polling and checking the device
> > >>>> used buffer until either the SVQ is full or control commands shadow
> > >>>> buffers are full, instead of polling and checking immediately after
> > >>>> sending each SVQ control command, so that QEMU can send all the SVQ
> > >>>> control commands in parallel, which have better performance 
> > >>>> improvement.
> > >>>>
> > >>>> I use vp_vdpa device to simulate vdpa device, and create 4094 VLANS in
> > >>>> guest to build a test environment for sending multiple CVQ state load
> > >>>> commands. This patch series can improve latency from 10023 us to
> > >>>> 8697 us for about 4099 CVQ state load commands, about 0.32 us per 
> > >>>> command.
> > >>>
> > >>> Looks like a tiny improvement.
> > >>> At the same time we have O(n^2) behaviour with memory mappings.
> > >>
> > >> Hi Michael,
> > >>
> > >> Thanks for your review.
> > >>
> > >> I wonder why you say "we have O(n^2) behaviour on memory mappings" here?
> > >>
> > >>   From my understanding, QEMU maps two page-size buffers as control
> > >> commands shadow buffers at device startup. These buffers then are used
> > >> to cache SVQ control commands, where QEMU fills them with multiple SVQ 
> > >> control
> > >> commands bytes, flushes them when SVQ descriptors are full or these
> > >> control commands shadow buffers reach their capacity.
> > >>
> > >> QEMU repeats this process until all CVQ state load commands have been
> > >> sent in loading.
> > >>
> > >> In this loading process, only control commands shadow buffers
> > >> translation should be relative to memory mappings, which should be
> > >> O(log n) behaviour to my understanding(Please correct me if I am wrong).
> > >>
> > >>> Not saying we must not do this but I think it's worth
> > >>> checking where the bottleneck is. My guess would be
> > >>> vp_vdpa is not doing things in parallel. Want to try fixing that
> > >>
> > >> As for "vp_vdpa is not doing things in parallel.", do you mean
> > >> the vp_vdpa device cannot process QEMU's SVQ control commands
> > >> in parallel?
> > >>
> > >> In this situation, I will try to use real vdpa hardware to
> > >> test the patch series performance.
> > >>
> > >>> to see how far it can be pushed?
> > >>
> > >> Currently, I am involved in the "Add virtio-net Control Virtqueue state
> > >> restore support" project in Google Summer of Code now. Because I am
> > 

Re: [PATCH v3 0/8] vdpa: Send all CVQ state load commands in parallel

2023-07-19 Thread Lei Yang
On Wed, Jul 19, 2023 at 11:25 PM Hawkins Jiawei  wrote:
>
> 在 2023/7/19 20:44, Lei Yang 写道:
> > Hello Hawkins and Michael
> >
> > Looks like there are big changes about vp_vdpa, therefore, if needed,
> > QE can test this series in QE's environment before the patch is
>
> Hi Lei,
>
> This patch series does not modify the code of vp_vdpa. Instead, it only
> modifies how QEMU sends SVQ control commands to the vdpa device.
>
Hi Hawkins

> Considering that the behavior of the vp_vdpa device differs from that
> of real vdpa hardware, would it be possible for you to test this patch
> series on a real vdpa device?

Yes, there is a hardware device to test it; I will update the test
results ASAP.

BR
Lei
>
> Thanks!
>
>
> > merged, and provide the result.
> >
> > BR
> > Lei
> >
> >
> > On Wed, Jul 19, 2023 at 8:37 PM Hawkins Jiawei  wrote:
> >>
> >> 在 2023/7/19 17:11, Michael S. Tsirkin 写道:
> >>> On Wed, Jul 19, 2023 at 03:53:45PM +0800, Hawkins Jiawei wrote:
> >>>> This patchset allows QEMU to delay polling and checking the device
> >>>> used buffer until either the SVQ is full or control commands shadow
> >>>> buffers are full, instead of polling and checking immediately after
> >>>> sending each SVQ control command, so that QEMU can send all the SVQ
> >>>> control commands in parallel, which have better performance improvement.
> >>>>
> >>>> I use vp_vdpa device to simulate vdpa device, and create 4094 VLANS in
> >>>> guest to build a test environment for sending multiple CVQ state load
> >>>> commands. This patch series can improve latency from 10023 us to
> >>>> 8697 us for about 4099 CVQ state load commands, about 0.32 us per 
> >>>> command.
> >>>
> >>> Looks like a tiny improvement.
> >>> At the same time we have O(n^2) behaviour with memory mappings.
> >>
> >> Hi Michael,
> >>
> >> Thanks for your review.
> >>
> >> I wonder why you say "we have O(n^2) behaviour on memory mappings" here?
> >>
> >>   From my understanding, QEMU maps two page-size buffers as control
> >> commands shadow buffers at device startup. These buffers then are used
> >> to cache SVQ control commands, where QEMU fills them with multiple SVQ 
> >> control
> >> commands bytes, flushes them when SVQ descriptors are full or these
> >> control commands shadow buffers reach their capacity.
> >>
> >> QEMU repeats this process until all CVQ state load commands have been
> >> sent in loading.
> >>
> >> In this loading process, only control commands shadow buffers
> >> translation should be relative to memory mappings, which should be
> >> O(log n) behaviour to my understanding(Please correct me if I am wrong).
> >>
> >>> Not saying we must not do this but I think it's worth
> >>> checking where the bottleneck is. My guess would be
> >>> vp_vdpa is not doing things in parallel. Want to try fixing that
> >>
> >> As for "vp_vdpa is not doing things in parallel.", do you mean
> >> the vp_vdpa device cannot process QEMU's SVQ control commands
> >> in parallel?
> >>
> >> In this situation, I will try to use real vdpa hardware to
> >> test the patch series performance.
> >>
> >>> to see how far it can be pushed?
> >>
> >> Currently, I am involved in the "Add virtio-net Control Virtqueue state
> >> restore support" project in Google Summer of Code now. Because I am
> >> uncertain about the time it will take to fix that problem in the vp_vdpa
> >> device, I prefer to complete the gsoc project first.
> >>
> >> Thanks!
> >>
> >>
> >>>
> >>>
> >>>> Note that this patch should be based on
> >>>> patch "Vhost-vdpa Shadow Virtqueue VLAN support" at [1].
> >>>>
> >>>> [1]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03719.html
> >>>>
> >>>> TestStep
> >>>> 
> >>>> 1. regression testing using vp-vdpa device
> >>>> - For L0 guest, boot QEMU with two virtio-net-pci net device with
> >>>> `ctrl_vq`, `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
> >>>> -device virtio-net-pci,disable-legacy=on,disable-modern=off,
> >>>> iommu_platform=on,

Re: [PATCH v3 0/8] vdpa: Send all CVQ state load commands in parallel

2023-07-19 Thread Lei Yang
Hello Hawkins and Michael

Looks like there are big changes about vp_vdpa, therefore, if needed,
QE can test this series in QE's environment before the patch is
merged, and provide the result.

BR
Lei


On Wed, Jul 19, 2023 at 8:37 PM Hawkins Jiawei  wrote:
>
> > On 2023/7/19 17:11, Michael S. Tsirkin wrote:
> > On Wed, Jul 19, 2023 at 03:53:45PM +0800, Hawkins Jiawei wrote:
> >> This patchset allows QEMU to delay polling and checking the device
> >> used buffer until either the SVQ is full or control commands shadow
> >> buffers are full, instead of polling and checking immediately after
> >> sending each SVQ control command, so that QEMU can send all the SVQ
> >> control commands in parallel, which have better performance improvement.
> >>
> >> I use vp_vdpa device to simulate vdpa device, and create 4094 VLANS in
> >> guest to build a test environment for sending multiple CVQ state load
> >> commands. This patch series can improve latency from 10023 us to
> >> 8697 us for about 4099 CVQ state load commands, about 0.32 us per command.
> >
> > Looks like a tiny improvement.
> > At the same time we have O(n^2) behaviour with memory mappings.
>
> Hi Michael,
>
> Thanks for your review.
>
> I wonder why you say "we have O(n^2) behaviour on memory mappings" here?
>
>  From my understanding, QEMU maps two page-size buffers as control
> commands shadow buffers at device startup. These buffers then are used
> to cache SVQ control commands, where QEMU fills them with multiple SVQ control
> commands bytes, flushes them when SVQ descriptors are full or these
> control commands shadow buffers reach their capacity.
>
> QEMU repeats this process until all CVQ state load commands have been
> sent in loading.
>
> In this loading process, only control commands shadow buffers
> translation should be relative to memory mappings, which should be
> O(log n) behaviour to my understanding(Please correct me if I am wrong).
>
> > Not saying we must not do this but I think it's worth
> > checking where the bottleneck is. My guess would be
> > vp_vdpa is not doing things in parallel. Want to try fixing that
>
> As for "vp_vdpa is not doing things in parallel.", do you mean
> the vp_vdpa device cannot process QEMU's SVQ control commands
> in parallel?
>
> In this situation, I will try to use real vdpa hardware to
> test the patch series performance.
>
> > to see how far it can be pushed?
>
> Currently, I am involved in the "Add virtio-net Control Virtqueue state
> restore support" project in Google Summer of Code now. Because I am
> uncertain about the time it will take to fix that problem in the vp_vdpa
> device, I prefer to complete the gsoc project first.
>
> Thanks!
>
>
> >
> >
> >> Note that this patch should be based on
> >> patch "Vhost-vdpa Shadow Virtqueue VLAN support" at [1].
> >>
> >> [1]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03719.html
> >>
> >> TestStep
> >> 
> >> 1. regression testing using vp-vdpa device
> >>- For L0 guest, boot QEMU with two virtio-net-pci net device with
> >> `ctrl_vq`, `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
> >>-device virtio-net-pci,disable-legacy=on,disable-modern=off,
> >> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
> >> indirect_desc=off,queue_reset=off,ctrl_rx=on,ctrl_rx_extra=on,...
> >>
> >>- For L1 guest, apply the patch series and compile the source code,
> >> start QEMU with two vdpa device with svq mode on, enable the `ctrl_vq`,
> >> `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
> >>-netdev type=vhost-vdpa,x-svq=true,...
> >>-device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
> >> ctrl_rx=on,ctrl_rx_extra=on...
> >>
> >>- For L2 source guest, run the following bash command:
> >> ```bash
> >> #!/bin/sh
> >>
> >> for idx1 in {0..9}
> >> do
> >>for idx2 in {0..9}
> >>do
> >>  for idx3 in {0..6}
> >>  do
> >>ip link add macvlan$idx1$idx2$idx3 link eth0
> >> address 4a:30:10:19:$idx1$idx2:1$idx3 type macvlan mode bridge
> >>ip link set macvlan$idx1$idx2$idx3 up
> >>  done
> >>done
> >> done
> >> ```
> >>- Execute the live migration in L2 source monitor
> >>
> >>- Result
> >>  * with this series, QEMU should not trigger any error or warning.
> >>
> >>
> >>
> >> 2. perf using vp-vdpa device
> >>- For L0 guest, boot QEMU with two virtio-net-pci net device with
> >> `ctrl_vq`, `ctrl_vlan` features on, command line like:
> >>-device virtio-net-pci,disable-legacy=on,disable-modern=off,
> >> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
> >> indirect_desc=off,queue_reset=off,ctrl_vlan=on,...
> >>
> >>- For L1 guest, apply the patch series, then apply an additional
> >> patch to record the load time in microseconds as following:
> >> ```diff
> >> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >> index 6b958d6363..501b510fd2 100644
> >> --- a/hw/net/vhost_net.c
> >> +++ b/hw/net/vhost_net.c
> >> @@ -295,7 
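
The timing hunk quoted above is truncated; a hedged sketch of the idea only
(microsecond timing around the state-load path using GLib's monotonic clock;
fake_load_state() is a placeholder, not a real hw/net/vhost_net.c symbol):

#include <glib.h>
#include <stdio.h>

static int fake_load_state(void)
{
    g_usleep(1000);            /* pretend the CVQ state load takes ~1 ms */
    return 0;
}

int main(void)
{
    gint64 start = g_get_monotonic_time();      /* microseconds */
    int r = fake_load_state();
    gint64 elapsed = g_get_monotonic_time() - start;

    printf("state load returned %d after %" G_GINT64_FORMAT " us\n", r, elapsed);
    return 0;
}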

Re: [PATCH v3 0/3] vdpa: Return -EIO if device ack is VIRTIO_NET_ERR

2023-07-06 Thread Lei Yang
On Wed, Jul 5, 2023 at 7:03 PM Hawkins Jiawei  wrote:
>
> On 2023/7/5 15:59, Lei Yang wrote:
> > Hello Hawkins
> >
> > QE can help test this series before it is merged into master. Could you
> > let me know what test steps would cover the scenarios related to this series?
> >
>
> Hi, I would like to suggest the following steps to test this patch series:
>
> 1.  Modify the QEMU source code to make the device return a
> VIRTIO_NET_ERR for the CVQ command. Please apply the patch
> provided below:
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 373609216f..58ade6d4e0 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -642,7 +642,7 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n)
>      if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
>          ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC,
>                                                        VIRTIO_NET_CTRL_MAC_ADDR_SET,
> -                                                      n->mac, sizeof(n->mac));
> +                                                      n->mac, sizeof(n->mac) - 1);
>          if (unlikely(dev_written < 0)) {
>              return dev_written;
>          }
>
> 2. Start QEMU with the vdpa device in default state.
> Without the patch series, QEMU should not trigger any errors or warnings.
> With the series applied, QEMU should trigger the warning like
> "qemu-system-x86_64: unable to start vhost net: 5: falling back on
> userspace virtio".

Based on the above steps, QE first tested it without the above patch,
and it did not trigger any errors or warnings. Then QE manually applied
the above patch and booted the guest again; it triggered the warning
"qemu-system-x86_64: unable to start vhost net: 5: falling back on
userspace virtio", which is the expected result.

Tested-by: Lei Yang 

BR
Lei

>
> Thanks!
>
>
> > Thanks
> > Lei
> >
> > On Tue, Jul 4, 2023 at 11:36 AM Hawkins Jiawei  wrote:
> >>
> >> According to VirtIO standard, "The class, command and
> >> command-specific-data are set by the driver,
> >> and the device sets the ack byte.
> >> There is little it can do except issue a diagnostic
> >> if ack is not VIRTIO_NET_OK."
> >>
> >> Therefore, QEMU should stop sending the queued SVQ commands and
> >> cancel the device startup if the device's ack is not VIRTIO_NET_OK.
> >>
> >> Yet the problem is that, vhost_vdpa_net_load_x() returns 1 based on
> >> `*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR.
> >> As a result, net->nc->info->load() also returns 1, this makes
> >> vhost_net_start_one() incorrectly assume the device state is
> >> successfully loaded by vhost_vdpa_net_load() and return 0, instead of
> >> goto `fail` label to cancel the device startup, as vhost_net_start_one()
> >> only cancels the device startup when net->nc->info->load() returns a
> >> negative value.
> >>
> >> This patchset fixes this problem by returning -EIO when the device's
> >> ack is not VIRTIO_NET_OK.
> >>
> >> Changelog
> >> =
> >> v3:
> >>   - split the fixes suggested by Eugenio
> >>   - return -EIO suggested by Michael
> >>
> >> v2: 
> >> https://lore.kernel.org/all/69010e9ebb5e3729aef595ed92840f43e48e53e5.1687875592.git.yin31...@gmail.com/
> >>   - fix the same bug in vhost_vdpa_net_load_offloads()
> >>
> >> v1: https://lore.kernel.org/all/cover.1686746406.git.yin31...@gmail.com/
> >>
> >> Hawkins Jiawei (3):
> >>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mac()
> >>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mq()
> >>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_offloads()
> >>
> >>   net/vhost-vdpa.c | 15 +++
> >>   1 file changed, 11 insertions(+), 4 deletions(-)
> >>
> >> --
> >> 2.25.1
> >>
> >>
> >
>




Re: [PATCH v3 0/3] vdpa: Return -EIO if device ack is VIRTIO_NET_ERR

2023-07-05 Thread Lei Yang
Hello Hawkins

QE can help test this series before it is merged into master. Could you
let me know what test steps would cover the scenarios related to this series?

Thanks
Lei

On Tue, Jul 4, 2023 at 11:36 AM Hawkins Jiawei  wrote:
>
> According to VirtIO standard, "The class, command and
> command-specific-data are set by the driver,
> and the device sets the ack byte.
> There is little it can do except issue a diagnostic
> if ack is not VIRTIO_NET_OK."
>
> Therefore, QEMU should stop sending the queued SVQ commands and
> cancel the device startup if the device's ack is not VIRTIO_NET_OK.
>
> Yet the problem is that, vhost_vdpa_net_load_x() returns 1 based on
> `*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR.
> As a result, net->nc->info->load() also returns 1, this makes
> vhost_net_start_one() incorrectly assume the device state is
> successfully loaded by vhost_vdpa_net_load() and return 0, instead of
> goto `fail` label to cancel the device startup, as vhost_net_start_one()
> only cancels the device startup when net->nc->info->load() returns a
> negative value.
>
> This patchset fixes this problem by returning -EIO when the device's
> ack is not VIRTIO_NET_OK.
>
> Changelog
> =
> v3:
>  - split the fixes suggested by Eugenio
>  - return -EIO suggested by Michael
>
> v2: 
> https://lore.kernel.org/all/69010e9ebb5e3729aef595ed92840f43e48e53e5.1687875592.git.yin31...@gmail.com/
>  - fix the same bug in vhost_vdpa_net_load_offloads()
>
> v1: https://lore.kernel.org/all/cover.1686746406.git.yin31...@gmail.com/
>
> Hawkins Jiawei (3):
>   vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mac()
>   vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mq()
>   vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_offloads()
>
>  net/vhost-vdpa.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> --
> 2.25.1
>
>
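
A hedged sketch of the return-value problem described in the cover letter
above; fake_load() and fake_start_one() are simplified stand-ins for
net->nc->info->load() and vhost_net_start_one(), not the real functions:

#include <errno.h>
#include <stdio.h>

static int fake_load(int device_acked_ok, int fixed)
{
    if (device_acked_ok) {
        return 0;
    }
    return fixed ? -EIO : 1;   /* old behaviour: a positive value on error */
}

static int fake_start_one(int fixed)
{
    int r = fake_load(0 /* device acked VIRTIO_NET_ERR */, fixed);
    if (r < 0) {               /* only a negative value cancels the startup */
        printf("startup cancelled (r=%d)\n", r);
        return r;
    }
    printf("startup continues despite the device error (r=%d)\n", r);
    return 0;
}

int main(void)
{
    fake_start_one(0);   /* without the series: the error is silently ignored */
    fake_start_one(1);   /* with the series: -EIO cancels the startup */
    return 0;
}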




Re: [PATCH v3 0/2] vdpa: Refactor vdpa_feature_bits array

2023-07-03 Thread Lei Yang
QE tested this series with regression tests and a live migration test on
the vhost_vdpa device; there are no new problems.

Tested-by: Lei Yang 


On Fri, Jun 30, 2023 at 9:27 PM Hawkins Jiawei  wrote:
>
> This patchset removes the duplicated VIRTIO_NET_F_RSS entry
> in vdpa_feature_bits array and sorts the vdpa_feature_bits array
> alphabetically in ascending order to avoid future duplicates.
>
> Changelog
> =
> v3:
>   - sort array alphabetically suggested by Philippe
>
> v2: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg06764.html
>   - resolve conflicts with the master branch
>
> v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg01583.html
>
> Hawkins Jiawei (2):
>   vdpa: Delete duplicated VIRTIO_NET_F_RSS in vdpa_feature_bits
>   vdpa: Sort vdpa_feature_bits array alphabetically
>
>  net/vhost-vdpa.c | 40 +++-
>  1 file changed, 23 insertions(+), 17 deletions(-)
>
> --
> 2.25.1
>
>




Re: [PATCH] vdpa: mask _F_CTRL_GUEST_OFFLOADS for vhost vdpa devices

2023-06-07 Thread Lei Yang
QE ran sanity testing for this patch on the vhost_vdpa device;
everything works fine.

Tested-by: Lei Yang 

On Tue, Jun 6, 2023 at 9:33 AM Jason Wang  wrote:
>
> On Sat, Jun 3, 2023 at 1:33 AM Eugenio Pérez  wrote:
> >
> > QEMU does not emulate it so it must be disabled as long as the backend
> > does not support it.
> >
> > Signed-off-by: Eugenio Pérez 
>
> Acked-by: Jason Wang 
>
> Thanks
>
> > ---
> >  net/vhost-vdpa.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 5360924ba0..427a57dd6f 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -54,6 +54,7 @@ const int vdpa_feature_bits[] = {
> >  VIRTIO_F_VERSION_1,
> >  VIRTIO_NET_F_CSUM,
> >  VIRTIO_NET_F_GUEST_CSUM,
> > +VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
> >  VIRTIO_NET_F_GSO,
> >  VIRTIO_NET_F_GUEST_TSO4,
> >  VIRTIO_NET_F_GUEST_TSO6,
> > --
> > 2.31.1
> >
>




Re: [PATCH] vdpa: fix not using CVQ buffer in case of error

2023-06-07 Thread Lei Yang
QE ran sanity testing for this patch on the vhost_vdpa device;
everything works fine.

Tested-by: Lei Yang 


On Tue, Jun 6, 2023 at 9:32 AM Jason Wang  wrote:
>
> On Sat, Jun 3, 2023 at 1:35 AM Eugenio Pérez  wrote:
> >
> > Bug introduced when refactoring.  Otherwise, the guest never received
> > the used buffer.
> >
> > Fixes: be4278b65fc1 ("vdpa: extract vhost_vdpa_net_cvq_add from 
> > vhost_vdpa_net_handle_ctrl_avail")
> > Signed-off-by: Eugenio Pérez 
>
> Acked-by: Jason Wang 
>
> Thanks
>
> > ---
> >  net/vhost-vdpa.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 16d47f7b3c..5360924ba0 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -807,7 +807,7 @@ static int 
> > vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
> >  }
> >
> >  if (*s->status != VIRTIO_NET_OK) {
> > -return VIRTIO_NET_ERR;
> > +goto out;
> >  }
> >
> >  status = VIRTIO_NET_ERR;
> > --
> > 2.31.1
> >
>




Re: [PATCH v2 1/3] vdpa: do not block migration if device has cvq and x-svq=on

2023-06-02 Thread Lei Yang
QE tested this series with vhost_vdpa and x-svq=on; the guest can migrate
and everything works well.

Tested-by: Lei Yang 

On Fri, Jun 2, 2023 at 10:39 PM Eugenio Pérez  wrote:
>
> It was a mistake to forbid it in all cases, as SVQ is already able to send
> all the CVQ messages before it starts forwarding data vqs.  It actually
> caused a regression, making it impossible to migrate a previously
> migratable device.
>
> Fixes: 36e4647247f2 ("vdpa: add vhost_vdpa_net_valid_svq_features")
> Signed-off-by: Eugenio Pérez 
> ---
>  net/vhost-vdpa.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 37cdc84562..c63456ff7c 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -837,13 +837,16 @@ static NetClientState 
> *net_vhost_vdpa_init(NetClientState *peer,
>  s->vhost_vdpa.shadow_vq_ops_opaque = s;
>
>  /*
> - * TODO: We cannot migrate devices with CVQ as there is no way to set
> - * the device state (MAC, MQ, etc) before starting the datapath.
> + * TODO: We cannot migrate devices with CVQ and no x-svq enabled as
> + * there is no way to set the device state (MAC, MQ, etc) before
> + * starting the datapath.
>   *
>   * Migration blocker ownership now belongs to s->vhost_vdpa.
>   */
> -error_setg(&s->vhost_vdpa.migration_blocker,
> -   "net vdpa cannot migrate with CVQ feature");
> +if (!svq) {
> +error_setg(&s->vhost_vdpa.migration_blocker,
> +   "net vdpa cannot migrate with CVQ feature");
> +}
>  }
>  ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>  if (ret) {
> --
> 2.31.1
>




Re: [PATCH v4 0/6] Vhost-vdpa Shadow Virtqueue Offloads support

2023-06-02 Thread Lei Yang
QE tested v4 of this series on the vp_vdpa device with the following
scenarios: reboot/shutdown, hotplug/unplug, nic driver load/unload,
ping, and vDPA control virtqueue (changing the mac address); everything is
working fine. The L1 guest no longer shows the error message "vdpa svq
does not work with features 0x4" after applying this series of patches.

Tested-by: Lei Yang 


On Fri, Jun 2, 2023 at 7:55 PM Hawkins Jiawei  wrote:
>
> This series enables shadowed CVQ to intercept Offloads commands
> through shadowed CVQ, update the virtio NIC device model so qemu
> send it in a migration, and the restore of that Offloads state
> in the destination.
>
> Changelog
> =
> v4:
>   - refactor the commit message suggested by Eugenio in patch#4
> "virtio-net: expose virtio_net_supported_guest_offloads()"
>   - fix the wrong "cpu_to_le64()" pointed out by Eugenio in patch$5
> "vdpa: Add vhost_vdpa_net_load_offloads()"
>   - refactor the comment in patch#5
> "vdpa: Add vhost_vdpa_net_load_offloads()"
>
> v3: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg00206.html
>
> v2: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg00044.html
>
> v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg07198.html
>
> Hawkins Jiawei (6):
>   include/hw/virtio: make some VirtIODevice const
>   vdpa: reuse virtio_vdev_has_feature()
>   hw/net/virtio-net: make some VirtIONet const
>   virtio-net: expose virtio_net_supported_guest_offloads()
>   vdpa: Add vhost_vdpa_net_load_offloads()
>   vdpa: Allow VIRTIO_NET_F_CTRL_GUEST_OFFLOADS in SVQ
>
>  hw/net/virtio-net.c|  2 +-
>  include/hw/virtio/virtio-net.h |  1 +
>  include/hw/virtio/virtio.h |  2 +-
>  net/vhost-vdpa.c   | 49 +++---
>  4 files changed, 48 insertions(+), 6 deletions(-)
>
> --
> 2.25.1
>
>




Re: [PATCH v4 0/2] Move ASID test to vhost-vdpa net initialization

2023-06-02 Thread Lei Yang
QE did a sanity test on v4 of this series using the vdpa_sim
device; everything is working fine.

Tested-by: Lei Yang 

On Fri, May 26, 2023 at 11:31 PM Eugenio Pérez  wrote:
>
> QEMU v8.0 is able to switch dynamically between vhost-vdpa passthrough
> and SVQ mode as long as the net device does not have CVQ.  The net device
> state followed (and migrated) by CVQ requires special care.
>
> A pre-requisite to add CVQ to that framework is to determine if devices with
> CVQ are migratable or not at initialization time.  The solution to it is to
> always shadow only CVQ, and vq groups and ASID are used for that.
>
> However, current qemu version only checks ASID at device start (as "driver set
> DRIVER_OK status bit"), not at device initialization.  A check at
> initialization time is required.  Otherwise, the guest would be able to set
> and remove migration blockers at will [1].
>
> This series is a requisite for migration of vhost-vdpa net devices with CVQ.
> However it already makes sense by its own, as it reduces the number of ioctls
> at migration time, decreasing the error paths there.
>
> [1] 
> https://lore.kernel.org/qemu-devel/2616f0cd-f9e8-d183-ea78-db1be4825...@redhat.com/
> ---
> v4:
> * Only probe one of MQ or !MQ.
> * Merge vhost_vdpa_cvq_is_isolated in vhost_vdpa_probe_cvq_isolation
> * Call ioctl directly instead of adding functions.
>
> v3:
> * Only record cvq_isolated, true if the device have cvq isolated in both !MQ
> * and MQ configurations.
> * Drop the cache of cvq group, it can be done on top
>
> v2:
> * Take out the reset of the device from vhost_vdpa_cvq_is_isolated
>   (reported by Lei Yang).
> * Expand patch messages by Stefano G. questions.
>
> Eugenio Pérez (2):
>   vdpa: return errno in vhost_vdpa_get_vring_group error
>   vdpa: move CVQ isolation check to net_init_vhost_vdpa
>
>  net/vhost-vdpa.c | 147 ---
>  1 file changed, 112 insertions(+), 35 deletions(-)
>
> --
> 2.31.1
>
>




Re: [PATCH v3 0/6] Vhost-vdpa Shadow Virtqueue Offloads support

2023-06-02 Thread Lei Yang
Hello Hawkins

QE used the qemu command line [1] to test this series with the
following scenarios: reboot, shutdown, hotplug/unplug, ping, and
offloads (tx, sg, tso, gso, gro); everything is working fine. Note that
even without applying your patch, testing offloads does not produce an
error like "vdpa svq is not available for feature 4".

[1] -device '{"driver": "virtio-net-pci", "mac": "00:11:22:33:44:00",
"id": "net0", "netdev": "hostnet0", "ctrl_guest_offloads": true,
"bus": "pcie-root-port-3", "addr": "0x0"}'  \
-netdev vhost-vdpa,id=hostnet0,vhostdev=/dev/vhost-vdpa-0,x-svq=on \

Tested-by: Lei Yang 




On Thu, Jun 1, 2023 at 9:49 PM Hawkins Jiawei  wrote:
>
> This series enables shadowed CVQ to intercept Offloads commands
> through shadowed CVQ, update the virtio NIC device model so qemu
> send it in a migration, and the restore of that Offloads state
> in the destination.
>
> Changelog
> =
> v3:
>   - refactor the commit message in patch
> "virtio-net: expose virtio_net_supported_guest_offloads()"
>
> v2: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg00044.html
>
> v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg07198.html
>
> Hawkins Jiawei (6):
>   include/hw/virtio: make some VirtIODevice const
>   vdpa: reuse virtio_vdev_has_feature()
>   hw/net/virtio-net: make some VirtIONet const
>   virtio-net: expose virtio_net_supported_guest_offloads()
>   vdpa: Add vhost_vdpa_net_load_offloads()
>   vdpa: Allow VIRTIO_NET_F_CTRL_GUEST_OFFLOADS in SVQ
>
>  hw/net/virtio-net.c|  2 +-
>  include/hw/virtio/virtio-net.h |  1 +
>  include/hw/virtio/virtio.h |  2 +-
>  net/vhost-vdpa.c   | 45 +++---
>  4 files changed, 44 insertions(+), 6 deletions(-)
>
> --
> 2.25.1
>
>




Re: [PATCH v2 0/6] Vhost-vdpa Shadow Virtqueue Offloads support

2023-06-01 Thread Lei Yang
I'm a QE responsible for vhost_vdpa parts. Could you please provide me
with the test steps for this series? I can test it in my environment
and update the test results.





On Thu, Jun 1, 2023 at 4:29 PM Hawkins Jiawei  wrote:
>
> This series enables shadowed CVQ to intercept Offloads commands
> through shadowed CVQ, update the virtio NIC device model so qemu
> send it in a migration, and the restore of that Offloads state
> in the destination.
>
> Changelog
> =
> v2:
>   - make some function arguments const
>   - reuse virtio_vdev_has_feature() suggested by Eugenio and Jason
>   - avoid sending CVQ command in default state suggested by Eugenio
>
> v1: https://lore.kernel.org/all/cover.1685359572.git.yin31...@gmail.com/
>
> Hawkins Jiawei (6):
>   include/hw/virtio: make some VirtIODevice const
>   vdpa: reuse virtio_vdev_has_feature()
>   hw/net/virtio-net: make some VirtIONet const
>   virtio-net: expose virtio_net_supported_guest_offloads()
>   vdpa: Add vhost_vdpa_net_load_offloads()
>   vdpa: Allow VIRTIO_NET_F_CTRL_GUEST_OFFLOADS in SVQ
>
>  hw/net/virtio-net.c|  2 +-
>  include/hw/virtio/virtio-net.h |  1 +
>  include/hw/virtio/virtio.h |  2 +-
>  net/vhost-vdpa.c   | 45 +++---
>  4 files changed, 44 insertions(+), 6 deletions(-)
>
> --
> 2.25.1
>
>




Re: [PATCH v3 0/5] Move ASID test to vhost-vdpa net initialization

2023-05-17 Thread Lei Yang
QE tested this series with sanity testing on the vdpa_sim device;
everything works fine and there are no new regression problems.

Tested-by: Lei Yang 



On Tue, May 9, 2023 at 11:44 PM Eugenio Pérez  wrote:
>
> QEMU v8.0 is able to switch dynamically between vhost-vdpa passthrough
> and SVQ mode as long as the net device does not have CVQ.  The net device
> state followed (and migrated) by CVQ requires special care.
>
> A pre-requisite to add CVQ to that framework is to determine if devices with
> CVQ are migratable or not at initialization time.  The solution to it is to
> always shadow only CVQ, and vq groups and ASID are used for that.
>
> However, current qemu version only checks ASID at device start (as "driver set
> DRIVER_OK status bit"), not at device initialization.  A check at
> initialization time is required.  Otherwise, the guest would be able to set
> and remove migration blockers at will [1].
>
> This series is a requisite for migration of vhost-vdpa net devices with CVQ.
> However it already makes sense by its own, as it reduces the number of ioctls
> at migration time, decreasing the error paths there.
>
> [1] 
> https://lore.kernel.org/qemu-devel/2616f0cd-f9e8-d183-ea78-db1be4825...@redhat.com/
> ---
> v3:
> * Only record cvq_isolated, true if the device have cvq isolated in both !MQ
> * and MQ configurations.
> * Drop the cache of cvq group, it can be done on top
>
> v2:
> * Take out the reset of the device from vhost_vdpa_cvq_is_isolated
>   (reported by Lei Yang).
> * Expand patch messages by Stefano G. questions.
>
> Eugenio Pérez (5):
>   vdpa: Remove status in reset tracing
>   vdpa: add vhost_vdpa_reset_status_fd
>   vdpa: add vhost_vdpa_set_dev_features_fd
>   vdpa: return errno in vhost_vdpa_get_vring_group error
>   vdpa: move CVQ isolation check to net_init_vhost_vdpa
>
>  include/hw/virtio/vhost-vdpa.h |   2 +
>  hw/virtio/vhost-vdpa.c |  78 ++-
>  net/vhost-vdpa.c   | 171 ++---
>  hw/virtio/trace-events |   2 +-
>  4 files changed, 192 insertions(+), 61 deletions(-)
>
> --
> 2.31.1
>
>




Re: [PATCH RESEND] vhost: fix possible wrap in SVQ descriptor ring

2023-05-10 Thread Lei Yang
QE applied this patch to do sanity testing on vhost-vdpa; there is no
regression problem.

Tested-by: Lei Yang 


On Wed, May 10, 2023 at 9:32 AM Lei Yang  wrote:

> QE applied this patch to do sanity testing on vhost-net; there is no
> regression problem.
>
> Tested-by: Lei Yang 
>
>
>
> On Tue, May 9, 2023 at 1:28 AM Eugenio Perez Martin 
> wrote:
> >
> > On Sat, May 6, 2023 at 5:01 PM Hawkins Jiawei 
> wrote:
> > >
> > > QEMU invokes vhost_svq_add() when adding a guest's element into SVQ.
> > > In vhost_svq_add(), it uses vhost_svq_available_slots() to check
> > > whether QEMU can add the element into the SVQ. If there is
> > > enough space, then QEMU combines some out descriptors and
> > > some in descriptors into one descriptor chain, and add it into
> > > svq->vring.desc by vhost_svq_vring_write_descs().
> > >
> > > Yet the problem is that, `svq->shadow_avail_idx - svq->shadow_used_idx`
> > > in vhost_svq_available_slots() return the number of occupied elements,
> > > or the number of descriptor chains, instead of the number of occupied
> > > descriptors, which may cause wrapping in SVQ descriptor ring.
> > >
> > > Here is an example. In vhost_handle_guest_kick(), QEMU forwards
> > > as many available buffers to device by virtqueue_pop() and
> > > vhost_svq_add_element(). virtqueue_pop() return a guest's element,
> > > and use vhost_svq_add_element(), a wrapper to vhost_svq_add(), to
> > > add this element into SVQ. If QEMU invokes virtqueue_pop() and
> > > vhost_svq_add_element() `svq->vring.num` times,
> vhost_svq_available_slots()
> > > thinks QEMU just ran out of slots and everything should work fine.
> > > But in fact, virtqueue_pop() returns `svq->vring.num` elements or
> > > descriptor chains, more than `svq->vring.num` descriptors, due to
> > > guest memory fragmentation, and this causes wrapping in the SVQ descriptor ring.
> > >
> >
> > The bug is valid even before marking the descriptors used. If the
> > guest memory is fragmented, SVQ must add chains so it can try to add
> > more descriptors than possible.
> >
> > > Therefore, this patch adds `num_free` field in VhostShadowVirtqueue
> > > structure, updates this field in vhost_svq_add() and
> > > vhost_svq_get_buf(), to record the number of free descriptors.
> > > Then we can avoid wrap in SVQ descriptor ring by refactoring
> > > vhost_svq_available_slots().
> > >
> > > Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
> > > Signed-off-by: Hawkins Jiawei 
> > > ---
> > >  hw/virtio/vhost-shadow-virtqueue.c | 9 -
> > >  hw/virtio/vhost-shadow-virtqueue.h | 3 +++
> > >  2 files changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c
> b/hw/virtio/vhost-shadow-virtqueue.c
> > > index 8361e70d1b..e1c6952b10 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > @@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features,
> Error **errp)
> > >   */
> > >  static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue
> *svq)
> > >  {
> > > -return svq->vring.num - (svq->shadow_avail_idx -
> svq->shadow_used_idx);
> > > +return svq->num_free;
> > >  }
> > >
> > >  /**
> > > @@ -263,6 +263,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const
> struct iovec *out_sg,
> > >  return -EINVAL;
> > >  }
> > >
> > > +/* Update the size of SVQ vring free descriptors */
> > > +svq->num_free -= ndescs;
> > > +
> > >  svq->desc_state[qemu_head].elem = elem;
> > >  svq->desc_state[qemu_head].ndescs = ndescs;
> > >  vhost_svq_kick(svq);
> > > @@ -450,6 +453,9 @@ static VirtQueueElement
> *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
> > >  svq->desc_next[last_used_chain] = svq->free_head;
> > >  svq->free_head = used_elem.id;
> > >
> > > +/* Update the size of SVQ vring free descriptors */
> >
> > No need for this comment.
> >
> > Apart from that,
> >
> > Acked-by: Eugenio Pérez 
> >
> > > +svq->num_free += num;
> > > +
> > >  *len = used_elem.len;
> > >  return g_steal_pointer(&svq->desc_state[used_elem.id].elem);
> > >  }
> > > @@ -659,6 +665,

Re: [PATCH RESEND] vhost: fix possible wrap in SVQ descriptor ring

2023-05-09 Thread Lei Yang
QE applied this patch to do sanity testing on vhost-net; there is no
regression problem.

Tested-by: Lei Yang 



On Tue, May 9, 2023 at 1:28 AM Eugenio Perez Martin  wrote:
>
> On Sat, May 6, 2023 at 5:01 PM Hawkins Jiawei  wrote:
> >
> > QEMU invokes vhost_svq_add() when adding a guest's element into SVQ.
> > In vhost_svq_add(), it uses vhost_svq_available_slots() to check
> > whether QEMU can add the element into the SVQ. If there is
> > enough space, then QEMU combines some out descriptors and
> > some in descriptors into one descriptor chain, and add it into
> > svq->vring.desc by vhost_svq_vring_write_descs().
> >
> > Yet the problem is that, `svq->shadow_avail_idx - svq->shadow_used_idx`
> > in vhost_svq_available_slots() return the number of occupied elements,
> > or the number of descriptor chains, instead of the number of occupied
> > descriptors, which may cause wrapping in SVQ descriptor ring.
> >
> > Here is an example. In vhost_handle_guest_kick(), QEMU forwards
> > as many available buffers to device by virtqueue_pop() and
> > vhost_svq_add_element(). virtqueue_pop() return a guest's element,
> > and use vhost_svq_add_element(), a wrapper to vhost_svq_add(), to
> > add this element into SVQ. If QEMU invokes virtqueue_pop() and
> > vhost_svq_add_element() `svq->vring.num` times, vhost_svq_available_slots()
> > thinks QEMU just ran out of slots and everything should work fine.
> > But in fact, virtqueue_pop() returns `svq->vring.num` elements or
> > descriptor chains, more than `svq->vring.num` descriptors, due to
> > guest memory fragmentation, and this causes wrapping in the SVQ descriptor ring.
> >
>
> The bug is valid even before marking the descriptors used. If the
> guest memory is fragmented, SVQ must add chains so it can try to add
> more descriptors than possible.
>
> > Therefore, this patch adds `num_free` field in VhostShadowVirtqueue
> > structure, updates this field in vhost_svq_add() and
> > vhost_svq_get_buf(), to record the number of free descriptors.
> > Then we can avoid wrap in SVQ descriptor ring by refactoring
> > vhost_svq_available_slots().
> >
> > Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
> > Signed-off-by: Hawkins Jiawei 
> > ---
> >  hw/virtio/vhost-shadow-virtqueue.c | 9 -
> >  hw/virtio/vhost-shadow-virtqueue.h | 3 +++
> >  2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > b/hw/virtio/vhost-shadow-virtqueue.c
> > index 8361e70d1b..e1c6952b10 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features, Error 
> > **errp)
> >   */
> >  static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
> >  {
> > -return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
> > +return svq->num_free;
> >  }
> >
> >  /**
> > @@ -263,6 +263,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const 
> > struct iovec *out_sg,
> >  return -EINVAL;
> >  }
> >
> > +/* Update the size of SVQ vring free descriptors */
> > +svq->num_free -= ndescs;
> > +
> >  svq->desc_state[qemu_head].elem = elem;
> >  svq->desc_state[qemu_head].ndescs = ndescs;
> >  vhost_svq_kick(svq);
> > @@ -450,6 +453,9 @@ static VirtQueueElement 
> > *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
> >  svq->desc_next[last_used_chain] = svq->free_head;
> >  svq->free_head = used_elem.id;
> >
> > +/* Update the size of SVQ vring free descriptors */
>
> No need for this comment.
>
> Apart from that,
>
> Acked-by: Eugenio Pérez 
>
> > +svq->num_free += num;
> > +
> >  *len = used_elem.len;
> >  return g_steal_pointer(&svq->desc_state[used_elem.id].elem);
> >  }
> > @@ -659,6 +665,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, 
> > VirtIODevice *vdev,
> >  svq->iova_tree = iova_tree;
> >
> >  svq->vring.num = virtio_queue_get_num(vdev, 
> > virtio_get_queue_index(vq));
> > +svq->num_free = svq->vring.num;
> >  driver_size = vhost_svq_driver_area_size(svq);
> >  device_size = vhost_svq_device_area_size(svq);
> >  svq->vring.desc = qemu_memalign(qemu_real_host_page_size(), 
> > driver_size);
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
> > b/hw/virtio/vhost-shadow-virtqueue.h
> > index 926a4897b1..6efe051a70 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -107,6 +107,9 @@ typedef struct VhostShadowVirtqueue {
> >
> >  /* Next head to consume from the device */
> >  uint16_t last_used_idx;
> > +
> > +/* Size of SVQ vring free descriptors */
> > +uint16_t num_free;
> >  } VhostShadowVirtqueue;
> >
> >  bool vhost_svq_valid_features(uint64_t features, Error **errp);
> > --
> > 2.25.1
> >
>
>
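
A small self-contained sketch of the accounting issue discussed in this
thread, with illustrative numbers only (a fixed fragmentation of three
descriptors per element is an assumption, not taken from the patch):

#include <stdint.h>
#include <stdio.h>

#define RING_NUM 256

int main(void)
{
    uint16_t shadow_avail_idx = 0, shadow_used_idx = 0; /* chains made available/used */
    uint16_t num_free = RING_NUM;       /* descriptors actually free (the fix) */
    const int ndescs = 3;               /* assumed: 3 descriptors per element */
    int elem;

    for (elem = 0; ; elem++) {
        if (num_free < ndescs) {        /* per-descriptor check: ring really full */
            break;
        }
        shadow_avail_idx++;             /* one more chain in flight */
        num_free -= ndescs;             /* three more descriptors consumed */
    }

    printf("ring full after %d elements: num_free=%u, yet the per-chain check "
           "still reports %d free slots\n",
           elem, num_free, RING_NUM - (shadow_avail_idx - shadow_used_idx));
    return 0;
}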




Re: [PATCH v5 00/14] Dynamically switch to vhost shadow virtqueues at vdpa net migration

2023-03-07 Thread Lei Yang
QE tested v5 of this series again: created two vdpa_sim devices and booted
two VMs without shadow virtqueues. The migration was successful and
everything worked fine.

Tested-by: Lei Yang 

On Sat, Mar 4, 2023 at 01:24, Eugenio Pérez wrote:

> It's possible to migrate vdpa net devices if they are shadowed from the
> start.  But to always shadow the dataplane is to effectively break its host
> passthrough, so its not efficient in vDPA scenarios.
>
> This series enables dynamically switching to shadow mode only at
> migration time.  This allows full data virtqueues passthrough all the
> time qemu is not migrating.
>
> In this series only net devices with no CVQ are migratable.  CVQ adds
> additional state that would make the series bigger and still had some
> controversy on previous RFC, so let's split it.
>
> Successfully tested with vdpa_sim_net with patch [1] applied and with the
> qemu
> emulated device with vp_vdpa with some restrictions:
> * No CVQ. No feature that didn't work with SVQ previously (packed, ...)
> * VIRTIO_RING_F_STATE patches implementing [2].
>
> Previous versions were tested by many vendors. Not carrying Tested-by
> because
> of code changes, so re-testing would be appreciated.
>
> A ready to clone tag named vdpa.d/stop-nocvq-v5 with this version of the
> series
> is available at https://gitlab.com/eperezmartin/qemu-kvm.git, with the
> commit
> 863d54ff00c558ffe54ed2c7ee148ab7f89d4864 ("vdpa: return VHOST_F_LOG_ALL in
> vhost-vdpa devices").
>
> Comments are welcome.
>
> v5:
> - Reverse SUSPEND polarity check, as qemu was never suspending devices with
>   suspend capability.
> - Reorder suspend patch so it comes before the reset reorder after
>   get_vring_base.
> - Remove patch to stop SVQ at vdpa stop, already present in staging
>
> v4:
> - Recover used_idx from guest's vring if device cannot suspend.
> - Fix starting device in the middle of a migration.  Removed some
>   duplication in setting / clearing enable_shadow_vqs and shadow_data
>   members of vhost_vdpa.
> - Fix (again) "Check for SUSPEND in vhost_dev.backend_cap, as
>   .backend_features is empty at the check moment.". It was reverted by
>   mistake in v3.
> - Fix memory leak of iova tree.
> - Properly rewind SVQ as in flight descriptors were still being accounted
>   in vq base.
> - Expand documentation.
>
> v3:
> - Start datapatch in SVQ in device started while migrating.
> - Properly register migration blockers if device present unsupported
> features.
> - Fix race condition because of not stopping the SVQ until device cleanup.
> - Explain purpose of iova tree in the first patch message.
> - s/dynamycally/dynamically/ in cover letter.
> - at
> lore.kernel.org/qemu-devel/20230215173850.298832-14-epere...@redhat.com
>
> v2:
> - Check for SUSPEND in vhost_dev.backend_cap, as .backend_features is
> empty at
>   the check moment.
> - at
> https://lore.kernel.org/all/20230208094253.702672-12-epere...@redhat.com/T/
>
> v1:
> - Omit all code working with CVQ and block migration if the device supports
>   CVQ.
> - Remove spurious kick.
> - Move all possible checks for migration to vhost-vdpa instead of the net
>   backend. Move them to init code from start code.
> - Suspend on vhost_vdpa_dev_start(false) instead of in vhost-vdpa net
> backend.
> - Properly split suspend after geting base and adding of status_reset
> patches.
> - Add possible TODOs to points where this series can improve in the future.
> - Check the state of migration using migration_in_setup and
>   migration_has_failed instead of checking all the possible migration
> status in
>   a switch.
> - Add TODO with possible low hand fruit using RESUME ops.
> - Always offer _F_LOG from virtio/vhost-vdpa and let migration blockers do
>   their thing instead of adding a variable.
> - RFC v2 at
> https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02574.html
>
> RFC v2:
> - Use a migration listener instead of a memory listener to know when
>   the migration starts.
> - Add stuff not picked with ASID patches, like enable rings after
>   driver_ok
> - Add rewinding on the migration src, not in dst
> - RFC v1 at
> https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg01664.html
>
> [1]
> https://lore.kernel.org/lkml/20230203142501.300125-1-epere...@redhat.com/T/
> [2]
> https://lists.oasis-open.org/archives/virtio-comment/202103/msg00036.html
>
> Eugenio Pérez (14):
>   vdpa net: move iova tree creation from init to start
>   vdpa: Remember last call fd set
>   vdpa: Negotiate _F_SUSPEND feature
>   vdpa: rewind at get_base, not set_base
>   vdpa: add vhost_vdpa->suspended parameter
>   vdpa: add vhost_vdpa_suspend
>   vdpa

Re: [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration

2023-02-16 Thread Lei Yang
QE tested v3 of this series again: created two vdpa_sim devices and
booted two VMs without shadow virtqueues. The migration was successful and
everything worked fine.

Tested-by: Lei Yang 

On Thu, Feb 16, 2023 at 02:41, Eugenio Perez Martin wrote:
>
> On Fri, Feb 10, 2023 at 1:58 PM Gautam Dawar  wrote:
> >
> > Hi Eugenio,
> >
> > I've tested this patch series on Xilinx/AMD SN1022 device without
> > control vq and VM Live Migration between two hosts worked fine.
> >
> > Tested-by: Gautam Dawar 
> >
>
> Thanks for the testing!
>
> >
> > Here is some minor feedback:
> >
> > Pls fix the typo (Dynamycally -> Dynamically) in the Subject.
> >
> > On 2/8/23 15:12, Eugenio Pérez wrote:
> > > CAUTION: This message has originated from an External Source. Please use 
> > > proper judgment and caution when opening attachments, clicking links, or 
> > > responding to this email.
> > >
> > >
> > > It's possible to migrate vdpa net devices if they are shadowed from the
> > >
> > > start.  But to always shadow the dataplane is to effectively break its 
> > > host
> > >
> > > passthrough, so its not convenient in vDPA scenarios.
> > I believe you meant efficient instead of convenient.
> > >
> > >
> > >
> > > This series enables dynamically switching to shadow mode only at
> > >
> > > migration time.  This allows full data virtqueues passthrough all the
> > >
> > > time qemu is not migrating.
> > >
> > >
> > >
> > > In this series only net devices with no CVQ are migratable.  CVQ adds
> > >
> > > additional state that would make the series bigger and still had some
> > >
> > > controversy on previous RFC, so let's split it.
> > >
> > >
> > >
> > > The first patch delays the creation of the iova tree until it is really 
> > > needed,
> > >
> > > and makes it easier to dynamically move from and to SVQ mode.
> > It would help adding some detail on the iova tree being referred to here.
> > >
> > >
> > >
> > > Next patches from 02 to 05 handle the suspending and getting of vq state 
> > > (base)
> > >
> > > of the device at the switch to SVQ mode.  The new _F_SUSPEND feature is
> > >
> > > negotiated and stop device flow is changed so the state can be fetched 
> > > trusting
> > >
> > > the device will not modify it.
> > >
> > >
> > >
> > > Since vhost backend must offer VHOST_F_LOG_ALL to be migratable, last 
> > > patches
> > >
> > > but the last one add the needed migration blockers so vhost-vdpa can 
> > > offer it
> >
> > "last patches but the last one"?
> >
>
> I think I solved all of the above in v3, thanks for notifying them!
>
> Would it be possible to test with v3 too?
>
> > Thanks.
> >
> > >
> > > safely.  They also add the handling of this feature.
> > >
> > >
> > >
> > > Finally, the last patch makes virtio vhost-vdpa backend to offer
> > >
> > > VHOST_F_LOG_ALL so qemu migrate the device as long as no other blocker 
> > > has been
> > >
> > > added.
> > >
> > >
> > >
> > > Successfully tested with vdpa_sim_net with patch [1] applied and with the 
> > > qemu
> > >
> > > emulated device with vp_vdpa with some restrictions:
> > >
> > > * No CVQ. No feature that didn't work with SVQ previously (packed, ...)
> > >
> > > * VIRTIO_RING_F_STATE patches implementing [2].
> > >
> > > * Expose _F_SUSPEND, but ignore it and suspend on ring state fetch like
> > >
> > >DPDK.
> > >
> > >
> > >
> > > Comments are welcome.
> > >
> > >
> > >
> > > v2:
> > >
> > > - Check for SUSPEND in vhost_dev.backend_cap, as .backend_features is 
> > > empty at
> > >
> > >the check moment.
> > >
> > >
> > >
> > > v1:
> > >
> > > - Omit all code working with CVQ and block migration if the device 
> > > supports
> > >
> > >CVQ.
> > >
> > > - Remove spurious kick.
> > Even with the spurious kick, datapath didn't resume at destination VM
> > after LM as kick happened before DRIVER_OK. So IMO, it will be required
> > that the vdpa parent driver s

Re: [PATCH] virtio-net: clear guest_announce feature if no cvq backend

2023-02-14 Thread Lei Yang
QE used the vdpa_sim device to test this patch and added "ctrl_vq=off"
to the qemu command line. The guest can find this device, there are no
error messages in the guest dmesg, and it can migrate successfully.

Tested-by: Lei Yang 

On Tue, Feb 14, 2023 at 14:53, Eugenio Perez Martin wrote:
>
> On Tue, Jan 24, 2023 at 5:32 PM Eugenio Pérez  wrote:
> >
> > Since GUEST_ANNOUNCE is emulated the feature bit could be set without
> > backend support.  This happens in the vDPA case.
> >
> > However, backend vDPA parent may not have CVQ support.  This causes an
> > incoherent feature set, and the driver may refuse to start.  This
> > happens in virtio-net Linux driver.
> >
> > This may be solved differently in the future.  Qemu is able to emulate a
> > CVQ just for guest_announce purposes, helping guest to notify the new
> > location with vDPA devices that does not support it.  However, this is
> > left as a TODO as it is way more complex to backport.
> >
> > Tested with vdpa_net_sim, toggling manually VIRTIO_NET_F_CTRL_VQ in the
> > driver and migrating it with x-svq=on.
> >
>
> Friendly ping about this patch, as it fell through the cracks if I'm not 
> wrong.
>
> Thanks!
>
> > Fixes: 980003debddd ("vdpa: do not handle VIRTIO_NET_F_GUEST_ANNOUNCE in 
> > vhost-vdpa")
> > Reported-by: Dawar, Gautam 
> > Signed-off-by: Eugenio Pérez 
> > ---
> >  hw/net/virtio-net.c | 15 +++
> >  1 file changed, 15 insertions(+)
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 3ae909041a..09d5c7a664 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -820,6 +820,21 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
> >  features |= (1ULL << VIRTIO_NET_F_MTU);
> >  }
> >
> > +/*
> > + * Since GUEST_ANNOUNCE is emulated the feature bit could be set without
> > + * enabled. This happens in the vDPA case.
> > + *
> > + * Make sure the feature set is not incoherent, as the driver could refuse
> > + * to start.
> > + *
> > + * TODO: QEMU is able to emulate a CVQ just for guest_announce purposes,
> > + * helping guest to notify the new location with vDPA devices that does not
> > + * support it.
> > + */
> > +if (!virtio_has_feature(vdev->backend_features, VIRTIO_NET_F_CTRL_VQ)) {
> > +virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ANNOUNCE);
> > +}
> > +
> >  return features;
> >  }
> >
> > --
> > 2.31.1
> >
> >
>




Re: [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration

2023-02-09 Thread Lei Yang
QE tested this series on RHEL: created two vdpa_sim devices and
booted two VMs without shadow vq. The migration was successful and
everything worked fine.

Tested-by: Lei Yang 

On Wed, Feb 8, 2023 at 18:29, Alvaro Karsz wrote:
>
> HI Eugenio, thanks for the series!
>
> I tested the series with our DPU, SolidNET.
>
> The test went as follow:
>
> - Create 2 virtio net vdpa devices, every device in a separated VF.
> - Start 2 VMs with the vdpa device as a single network device, without
> shadow vq.
>The source VM with "-netdev
> vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=hostnet0"
>The destination VM with "-netdev
> vhost-vdpa,vhostdev=/dev/vhost-vdpa-1,id=hostnet0"
> - Boot the source VM, test the network by pinging.
> - Migrate
> - Test the destination VM network.
>
> Everything worked fine.
>
> Tested-by: Alvaro Karsz 
>




[UAI] [Call for Participation] IEEE MSN 2022 (Virtual) Dec. 14-16 Exciting Keynotes, Panel, and Paper Sessions

2022-12-09 Thread Lei Yang
*** Please accept our apologies if you receive multiple copies of this call
for participation***

It is our great pleasure to invite you to participate in the IEEE MSN 2022
this year.
Due to the COVID-19 pandemic, MSN 2022 will be held virtually during Dec.
14 – Dec. 16, 2022.
https://ieee-msn.org/2022/

The registration and program information is provided as follows.
Register at: https://ieee-msn.org/2022/registration.php
Non-author Registration Fee:  $100
Registration Deadline: December 12
Program At Glance: https://ieee-msn.org/2022/glance.php
Online Program: https://msn2022.info/

We are excited to introduce the technical program of MSN 2022, which is
dedicated to exploring new, innovative and diverse directions in the fields
of mobility, sensing and networking. We received 246 technical submissions,
which includes 227 regular submissions and 19 invited submissions. Each
submission was reviewed by at least three technical program committee (TPC)
members or selected external reviewers. Additional reviews were solicited
as needed.
Among the regular submissions, we have selected 62 full papers. Hence the
acceptance rate is 27%. In addition, we have selected 29 short papers for
inclusion in the technical program of the main conference. Together with
the 19 invited papers, the technical program thus includes 110 papers and
covers a wide range of topics, including mobile/edge/fog computing,
networking, algorithms, ubiquitous sensing, big data and AI, security,
trust and privacy, and experiments.

In the main conference, three exciting keynote speeches and one panel
discussion are arranged. Prof. Daqing Zhang, Prof. Falko Dressler and Prof.
Guoliang Xing deliver the exciting keynotes on sensing limits, virtualized
edge computing and real-time AI, respectively.
Besides, the conference has six tracks this year: Track 1: Mobile &
Wireless Sensing and Networking; Track 2: Edge Computing, IoT and Digital
Twins; Track 3: Security, Privacy, Trust, and Blockchain; Track 4: Big Data
and AI; Track 5: Systems, Tools, Testbed; Track 6: Applications in Smart
Cities, Healthcare and Other Areas.
For all tracks, 111 accepted papers have been arranged in 24 technical
sessions in main conference.
Additionally, we selected five workshops: The 1st International Workshop on
Cryptographic Security and Information Hiding Technology for IoT System
(CSIHTIS 2022), The 3rd International Workshop on Ubiquitous Electric
Internet of Things (UEIoT 2022), The 4th International Workshop on Edge
Computing and Artificial Intelligence based Sensor-Cloud System (ECAISS
2022), The 4th International Workshop on Network Meets Intelligent
Computations (NMIC 2022), The 4th International Workshop on Artificial
Intelligence Applications in Internet of Things (AI2OT 2022). There is a
total of 41 technical papers to be presented in the workshops.

Looking forward to seeing you at the MSN 2022!

IEEE MSN 2022 Organizing Committee

TPC Co-Chairs:
Giuseppe Anastasi, University of Pisa, Italy
Weigang Wu, Sun Yat-sen University, China

General Co-Chairs:
Michel Raynal,  IRISA, University of Rennes, France
Jie Wu,  Temple University, USA

Steering Committee Co-Chairs:
Jiannong Cao, The Hong Kong Polytechnic University, HK
Xiaohua Jia, City University of Hong Kong, HK
___
uai mailing list
uai@engr.orst.edu
https://it.engineering.oregonstate.edu/mailman/listinfo/uai


[Tinyos-help] [Call for Participation] IEEE MSN 2022 (Virtual) Dec. 14-16 Exciting Keynotes, Panel, and Paper Sessions

2022-12-07 Thread Lei Yang
*** Please accept our apologies if you receive multiple copies of this call
for participation***

It is our great pleasure to invite you to participate in the IEEE MSN 2022
this year.
Due to the COVID-19 pandemic, MSN 2022 will be held virtually during Dec.
14 – Dec. 16, 2022.
https://ieee-msn.org/2022/

The registration and program information is provided as follows.
Register at: https://ieee-msn.org/2022/registration.php
Non-author Registration Fee:  $100
Registration Deadline: December 12
Program At Glance: https://ieee-msn.org/2022/glance.php
Online Program: https://msn2022.info/

We are excited to introduce the technical program of MSN 2022, which is
dedicated to exploring new, innovative and diverse directions in the fields
of mobility, sensing and networking. We received 246 technical submissions,
which includes 227 regular submissions and 19 invited submissions. Each
submission was reviewed by at least three technical program committee (TPC)
members or selected external reviewers. Additional reviews were solicited
as needed.
Among the regular submissions, we have selected 62 full papers. Hence the
acceptance rate is 27%. In addition, we have selected 29 short papers for
inclusion in the technical program of the main conference. Together with
the 19 invited papers, the technical program thus includes 110 papers and
covers a wide range of topics, including mobile/edge/fog computing,
networking, algorithms, ubiquitous sensing, big data and AI, security,
trust and privacy, and experiments.

In the main conference, three exciting keynote speeches and one panel
discussion are arranged. Prof. Daqing Zhang, Prof. Falko Dressler and Prof.
Guoliang Xing deliver the exciting keynotes on sensing limits, virtualized
edge computing and real-time AI, respectively.
Besides, the conference has six tracks this year: Track 1: Mobile &
Wireless Sensing and Networking; Track 2: Edge Computing, IoT and Digital
Twins; Track 3: Security, Privacy, Trust, and Blockchain; Track 4: Big Data
and AI; Track 5: Systems, Tools, Testbed; Track 6: Applications in Smart
Cities, Healthcare and Other Areas.
For all tracks, 111 accepted papers have been arranged in 24 technical
sessions in main conference.
Additionally, we selected five workshops: The 1st International Workshop on
Cryptographic Security and Information Hiding Technology for IoT System
(CSIHTIS 2022), The 3rd International Workshop on Ubiquitous Electric
Internet of Things (UEIoT 2022), The 4th International Workshop on Edge
Computing and Artificial Intelligence based Sensor-Cloud System (ECAISS
2022), The 4th International Workshop on Network Meets Intelligent
Computations (NMIC 2022), The 4th International Workshop on Artificial
Intelligence Applications in Internet of Things (AI2OT 2022). There is a
total of 41 technical papers to be presented in the workshops.

Looking forward to seeing you at the MSN 2022!

IEEE MSN 2022 Organizing Committee

TPC Co-Chairs:
Giuseppe Anastasi, University of Pisa, Italy
Weigang Wu, Sun Yat-sen University, China

General Co-Chairs:
Michel Raynal,  IRISA, University of Rennes, France
Jie Wu,  Temple University, USA

Steering Committee Co-Chairs:
Jiannong Cao, The Hong Kong Polytechnic University, HK
Xiaohua Jia, City University of Hong Kong, HK
___
Tinyos-help mailing list
Tinyos-help@millennium.berkeley.edu
https://www.millennium.berkeley.edu/cgi-bin/mailman/listinfo/tinyos-help

Re: [PATCH 0/4] Endianess and coding style fixes for SVQ event idx support

2022-11-17 Thread Lei Yang
QE tested the parameter "event_idx=on". In both environments,
"virtio-vdpa + vp_vdpa" and "vhost_vdpa + vp_vdpa", there is no
network connectivity issue after the guest boots up.

Tested-by: Lei Yang 


> From: Jason Wang 
> Date: Tue, Nov 1, 2022 at 10:42 AM
> Subject: Re: [PATCH 0/4] Endianess and coding style fixes for SVQ
> event idx support
> To: Eugenio Pérez 
> Cc: , Stefan Hajnoczi ,
> Michael S. Tsirkin 
>
>
> On Sat, Oct 29, 2022 at 12:02 AM Eugenio Pérez  wrote:
> >
> > Some fixes that did not get in time for the last net pull request.
> >
> > Eugenio Pérez (4):
> >   vhost: Delete useless casting
> >   vhost: convert byte order on SVQ used event write
> >   vhost: Fix lines over 80 characters
> >   vhost: convert byte order on avail_event read
>
> I've queued this for rc1.
>
> Thanks
>
> >
> >  hw/virtio/vhost-shadow-virtqueue.c | 12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > --
> > 2.31.1
> >
> >
>




[jira] [Comment Edited] (HDFS-16836) StandbyCheckpointer can still trigger rollback fs image after RU is finalized

2022-11-15 Thread Lei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17634479#comment-17634479
 ] 

Lei Yang edited comment on HDFS-16836 at 11/15/22 6:32 PM:
---

Thanks [~hexiaoqiao] for taking a look. doCheckpoint() can fail for many reasons,
and it does two things: 1. replay the edit log to generate an fsimage on the
standby; 2. upload it to the active.

What we have seen is that #1 succeeded but #2 failed, so needRollbackCheckpoint is
never set back to false and all subsequent checkpoints keep triggering a rollback
fsimage for RU even after RU is finalized. This bypasses the checkpoint period and
threshold checks.

The current logic only resets needRollbackCheckpoint to false when both of the
following are met:
 # doCheckpoint() succeeds
 # RU is in progress. (RU finalize renames the fsimage from IMAGE_ROLLBACK to
IMAGE, which means *namesystem.getFSImage().hasRollbackFSImage()* would be false.)

 
{code:java}
if (needRollbackCheckpoint && namesystem.getFSImage().hasRollbackFSImage()) 
{code}
 

 

After RU is finalized, rollback cannot happen. It would not make sense to 
generate a rollback image after RU is finalized, right?

 

 
{code:java}
private void doWork()
...
boolean needCheckpoint = needRollbackCheckpoint;
if (needCheckpoint) {
  LOG.info("Triggering a rollback fsimage for rolling upgrade.");
} else if (uncheckpointed >= checkpointConf.getTxnCount()) {
  LOG.info("Triggering checkpoint because there have been " + 
  uncheckpointed + " txns since the last checkpoint, which " +
  "exceeds the configured threshold " +
  checkpointConf.getTxnCount());
  needCheckpoint = true;
} else if (secsSinceLast >= checkpointConf.getPeriod()) {
  LOG.info("Triggering checkpoint because it has been " +
  secsSinceLast + " seconds since the last checkpoint, which " +
  "exceeds the configured interval " + checkpointConf.getPeriod());
  needCheckpoint = true;
} 

if (needCheckpoint) {
// on all nodes, we build the checkpoint. However, we only ship the 
checkpoint if have a
// rollback request, are the checkpointer, are outside the quiet period.
doCheckpoint();

// reset needRollbackCheckpoint to false only when we finish a ckpt
// for rollback image
if (needRollbackCheckpoint
&& namesystem.getFSImage().hasRollbackFSImage()) {
  namesystem.setCreatedRollbackImages(true);
  namesystem.setNeedRollbackFsImage(false);
}
lastCheckpointTime = now;
  }
} catch (Throwable t) {
  LOG.error("Exception in doCheckpoint", t);
}{code}
 

 

 


was (Author: JIRAUSER286942):
Thanks [~hexiaoqiao] for taking look.  docheckpoint() can fail for many reasons 
and it does two things: 1. replay edit log to generate fsimage in standby; 2. 
upload to active.

What we have seen is #1 succeeded but #2 failed so needRollbackCheckpoint is 
never set back to false and all the subsequent checkpointings are just 
continuously triggering rollback fsimage for RU even after RU is finalized. 
This bypasses the checkpoint period and threshold check.

The current logic only reset needRollbackCheckpoint to false if 1 and 2 are all 
met:
 # docheckpoint succeeds
 # RU is in progress. Because RU finalize would rename the fsimage from 
IMAGE_ROLLBACK to IMAGE which means 
*namesystem.getFSImage().hasRollbackFSImage()* would be false.

 
{code:java}
if (needRollbackCheckpoint && namesystem.getFSImage().hasRollbackFSImage()) 
{code}
 

 

After RU is finalized, rollback cannot happen. It would not make sense to 
generate a rollback image after RU is finalized, right?

 

 
{code:java}
private void doWork()
...
boolean needCheckpoint = needRollbackCheckpoint;
if (needCheckpoint) {
  LOG.info("Triggering a rollback fsimage for rolling upgrade.");
} else if (uncheckpointed >= checkpointConf.getTxnCount()) {
  LOG.info("Triggering checkpoint because there have been " + 
  uncheckpointed + " txns since the last checkpoint, which " +
  "exceeds the configured threshold " +
  checkpointConf.getTxnCount());
  needCheckpoint = true;
} else if (secsSinceLast >= checkpointConf.getPeriod()) {
  LOG.info("Triggering checkpoint because it has been " +
  secsSinceLast + " seconds since the last checkpoint, which " +
  "exceeds the configured interval " + checkpointConf.getPeriod());
  needCheckpoint = true;
} 

if (needCheckpoint) {
// on all nodes, we build the checkpoint. However, we only ship the 
checkpoint if have a
// rollback request, are the checkpointer, are outside the quiet period.
doCheckpoint();

// reset needRollbackCheckpoint to false only when we finish a ckpt
// for rollback image
if (needRollbackCheckpoint
&& namesystem.

[jira] [Commented] (HDFS-16836) StandbyCheckpointer can still trigger rollback fs image after RU is finalized

2022-11-15 Thread Lei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17634479#comment-17634479
 ] 

Lei Yang commented on HDFS-16836:
-

Thanks [~hexiaoqiao] for taking a look. doCheckpoint() can fail for many reasons,
and it does two things: 1. replay the edit log to generate an fsimage on the
standby; 2. upload it to the active.

What we have seen is that #1 succeeded but #2 failed, so needRollbackCheckpoint is
never set back to false and all subsequent checkpoints keep triggering a rollback
fsimage for RU even after RU is finalized. This bypasses the checkpoint period and
threshold checks.

The current logic only resets needRollbackCheckpoint to false when both of the
following are met:
 # doCheckpoint() succeeds
 # RU is in progress. (RU finalize renames the fsimage from IMAGE_ROLLBACK to
IMAGE, which means *namesystem.getFSImage().hasRollbackFSImage()* would be false.)

 
{code:java}
if (needRollbackCheckpoint && namesystem.getFSImage().hasRollbackFSImage()) 
{code}
 

 

After RU is finalized, rollback cannot happen. It would not make sense to 
generate a rollback image after RU is finalized, right?

 

 
{code:java}
private void doWork()
...
boolean needCheckpoint = needRollbackCheckpoint;
if (needCheckpoint) {
  LOG.info("Triggering a rollback fsimage for rolling upgrade.");
} else if (uncheckpointed >= checkpointConf.getTxnCount()) {
  LOG.info("Triggering checkpoint because there have been " + 
  uncheckpointed + " txns since the last checkpoint, which " +
  "exceeds the configured threshold " +
  checkpointConf.getTxnCount());
  needCheckpoint = true;
} else if (secsSinceLast >= checkpointConf.getPeriod()) {
  LOG.info("Triggering checkpoint because it has been " +
  secsSinceLast + " seconds since the last checkpoint, which " +
  "exceeds the configured interval " + checkpointConf.getPeriod());
  needCheckpoint = true;
} 

if (needCheckpoint) {
// on all nodes, we build the checkpoint. However, we only ship the 
checkpoint if have a
// rollback request, are the checkpointer, are outside the quiet period.
doCheckpoint();

// reset needRollbackCheckpoint to false only when we finish a ckpt
// for rollback image
if (needRollbackCheckpoint
&& namesystem.getFSImage().hasRollbackFSImage()) {
  namesystem.setCreatedRollbackImages(true);
  namesystem.setNeedRollbackFsImage(false);
}
lastCheckpointTime = now;
  }
} catch (Throwable t) {
  LOG.error("Exception in doCheckpoint", t);
}{code}
 

 

The point is that RU finalize terminates the RU, and no rollback can happen after 
that.
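
To make the flag leak concrete, here is a minimal, self-contained sketch of the 
control flow described above. The class name, fields, and the extra 
rollingUpgradeInProgress guard are hypothetical illustrations, not the actual 
StandbyCheckpointer code or a proposed patch:
{code:java}
// Self-contained sketch of the flag leak described above.
// Hypothetical names; this is NOT the actual StandbyCheckpointer code.
public class RollbackFlagLeakSketch {

    static boolean needRollbackCheckpoint = false; // set during edit log replay when RU starts
    static boolean rollingUpgradeInProgress = false;
    static boolean hasRollbackFSImage = false;     // only true while RU is in progress

    // Simulates one iteration of the checkpointer's doWork() loop.
    static void doWork(boolean uploadFails) {
        boolean needCheckpoint = needRollbackCheckpoint; // bypasses period/threshold checks
        try {
            if (needCheckpoint) {
                doCheckpoint(uploadFails);
                // Current condition: the flag is only cleared while a rollback image
                // exists, which can never be true once RU has been finalized.
                if (needRollbackCheckpoint && hasRollbackFSImage) {
                    needRollbackCheckpoint = false;
                }
                // Hypothetical extra guard: clear the flag once RU is no longer in
                // progress, since a rollback image can no longer be produced.
                if (needRollbackCheckpoint && !rollingUpgradeInProgress) {
                    needRollbackCheckpoint = false;
                }
            }
        } catch (RuntimeException e) {
            System.out.println("doCheckpoint failed: " + e.getMessage());
        }
    }

    static void doCheckpoint(boolean uploadFails) {
        // Step 1: replay the edit log and save the (rollback) image locally.
        hasRollbackFSImage = rollingUpgradeInProgress;
        // Step 2: upload to the active; this is the step that failed in our case.
        if (uploadFails) {
            throw new RuntimeException("upload to active failed");
        }
    }

    public static void main(String[] args) {
        rollingUpgradeInProgress = true;   // 1. RU starts
        needRollbackCheckpoint = true;     //    flag set during edit log replay
        doWork(true);                      // 2. doCheckpoint() fails on upload
        rollingUpgradeInProgress = false;  // 3. RU is finalized
        hasRollbackFSImage = false;        //    rollback image no longer exists
        doWork(false);                     // 4. next checkpoint attempt
        System.out.println("needRollbackCheckpoint = " + needRollbackCheckpoint);
    }
}
{code}
Run as-is, the sketch prints needRollbackCheckpoint = false only because of the 
extra rolling-upgrade check; with the hasRollbackFSImage()-only condition the flag 
would stay true indefinitely, which is exactly the behavior reported above. Where 
such a reset should live in the real code (doWork(), the RU finalize path, or 
elsewhere) is a separate design question.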

 

> StandbyCheckpointer can still trigger rollback fs image after RU is finalized
> -
>
> Key: HDFS-16836
> URL: https://issues.apache.org/jira/browse/HDFS-16836
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> StandbyCheckpointer trigger rollback fsimage when RU is started.
> When ru is started, a flag (needRollbackImage) was set to true during edit 
> log replay.
> And it only gets reset to false when doCheckpoint() succeeded.
> Think about following scenario:
>  # Start RU, needRollbackImage is set to true.
>  # doCheckpoint() failed.
>  # RU is finalized.
>  # namesystem.getFSImage().hasRollbackFSImage() is always false since 
> rollback image cannot be generated once RU is over.
>  # needRollbackImage was never set to false.
>  # Checkpoints threshold(1m txns) and period(1hr) are not honored.
> {code:java}
> StandbyCheckpointer:
> void doWork() {
>  
>   doCheckpoint();
>   // reset needRollbackCheckpoint to false only when we finish a ckpt
>   // for rollback image
>   if (needRollbackCheckpoint
>   && namesystem.getFSImage().hasRollbackFSImage()) {
> namesystem.setCreatedRollbackImages(true);
> namesystem.setNeedRollbackFsImage(false);
>   }
>   lastCheckpointTime = now;
> } {code}






[jira] [Commented] (HDFS-16836) StandbyCheckpointer can still trigger rollback fs image after RU is finalized

2022-11-14 Thread Lei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17634022#comment-17634022
 ] 

Lei Yang commented on HDFS-16836:
-

[~hexiaoqiao]  PR: [https://github.com/apache/hadoop/pull/5135]

I mean doCheckpoint can fail and throw exception hence needRollbackImage is 
never reset to false and can leak after RU is done.

 

> StandbyCheckpointer can still trigger rollback fs image after RU is finalized
> -
>
> Key: HDFS-16836
> URL: https://issues.apache.org/jira/browse/HDFS-16836
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>    Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> StandbyCheckpointer trigger rollback fsimage when RU is started.
> When ru is started, a flag (needRollbackImage) was set to true during edit 
> log replay.
> And it only gets reset to false when doCheckpoint() succeeded.
> Think about following scenario:
>  # Start RU, needRollbackImage is set to true.
>  # doCheckpoint() failed.
>  # RU is finalized.
>  # namesystem.getFSImage().hasRollbackFSImage() is always false since 
> rollback image cannot be generated once RU is over.
>  # needRollbackImage was never set to false.
>  # Checkpoints threshold(1m txns) and period(1hr) are not honored.
> {code:java}
> StandbyCheckpointer:
> void doWork() {
>  
>   doCheckpoint();
>   // reset needRollbackCheckpoint to false only when we finish a ckpt
>   // for rollback image
>   if (needRollbackCheckpoint
>   && namesystem.getFSImage().hasRollbackFSImage()) {
> namesystem.setCreatedRollbackImages(true);
> namesystem.setNeedRollbackFsImage(false);
>   }
>   lastCheckpointTime = now;
> } {code}






[jira] [Updated] (HDFS-16836) StandbyCheckpointer can still trigger rollback fs image after RU is finalized

2022-11-09 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16836:

Summary: StandbyCheckpointer can still trigger rollback fs image after RU 
is finalized  (was: StandbyCheckpointer can still trigger rollback fs image 
after RU is finalized, leaving checkpoint threshold/period were missing.)

> StandbyCheckpointer can still trigger rollback fs image after RU is finalized
> -
>
> Key: HDFS-16836
> URL: https://issues.apache.org/jira/browse/HDFS-16836
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>    Reporter: Lei Yang
>Priority: Major
>
> StandbyCheckpointer trigger rollback fsimage when RU is started.
> When ru is started, a flag (needRollbackImage) was set to true during edit 
> log replay.
> And it only gets reset to false when doCheckpoint() succeeded.
> Think about following scenario:
>  # Start RU, needRollbackImage is set to true.
>  # doCheckpoint() failed.
>  # RU is finalized.
>  # namesystem.getFSImage().hasRollbackFSImage() is always false since 
> rollback image cannot be generated once RU is over.
>  # needRollbackImage was never set to false.
>  # Checkpoints threshold(1m txns) and period(1hr) are not honored.
> {code:java}
> StandbyCheckpointer:
> void doWork() {
>  
>   doCheckpoint();
>   // reset needRollbackCheckpoint to false only when we finish a ckpt
>   // for rollback image
>   if (needRollbackCheckpoint
>   && namesystem.getFSImage().hasRollbackFSImage()) {
> namesystem.setCreatedRollbackImages(true);
> namesystem.setNeedRollbackFsImage(false);
>   }
>   lastCheckpointTime = now;
> } {code}






[jira] [Updated] (HDFS-16836) StandbyCheckpointer can still trigger rollback fs image after RU is finalized, leaving checkpoint threshold/period were missing.

2022-11-09 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16836:

Summary: StandbyCheckpointer can still trigger rollback fs image after RU 
is finalized, leaving checkpoint threshold/period were missing.  (was: 
StandbyCheckpointer can still trigger rollback fs image after RU is finalized)

> StandbyCheckpointer can still trigger rollback fs image after RU is 
> finalized, leaving checkpoint threshold/period were missing.
> 
>
> Key: HDFS-16836
> URL: https://issues.apache.org/jira/browse/HDFS-16836
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Lei Yang
>Priority: Major
>
> StandbyCheckpointer trigger rollback fsimage when RU is started.
> When ru is started, a flag (needRollbackImage) was set to true during edit 
> log replay.
> And it only gets reset to false when doCheckpoint() succeeded.
> Think about following scenario:
>  # Start RU, needRollbackImage is set to true.
>  # doCheckpoint() failed.
>  # RU is finalized.
>  # namesystem.getFSImage().hasRollbackFSImage() is always false since 
> rollback image cannot be generated once RU is over.
>  # needRollbackImage was never set to false.
>  # Checkpoints threshold(1m txns) and period(1hr) are not honored.
> {code:java}
> StandbyCheckpointer:
> void doWork() {
>  
>   doCheckpoint();
>   // reset needRollbackCheckpoint to false only when we finish a ckpt
>   // for rollback image
>   if (needRollbackCheckpoint
>   && namesystem.getFSImage().hasRollbackFSImage()) {
> namesystem.setCreatedRollbackImages(true);
> namesystem.setNeedRollbackFsImage(false);
>   }
>   lastCheckpointTime = now;
> } {code}






[jira] [Updated] (HDFS-16836) StandbyCheckpointer can still trigger rollback fs image after RU is finalized

2022-11-09 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16836:

Description: 
StandbyCheckpointer triggers a rollback fsimage when RU is started.

When RU is started, a flag (needRollbackImage) is set to true during edit log
replay.

It only gets reset to false when doCheckpoint() succeeds.

Consider the following scenario:
 # Start RU; needRollbackImage is set to true.
 # doCheckpoint() fails.
 # RU is finalized.
 # namesystem.getFSImage().hasRollbackFSImage() is always false since a rollback
image cannot be generated once RU is over.
 # needRollbackImage is never set back to false.
 # The checkpoint threshold (1M txns) and period (1 hr) are not honored.

{code:java}
StandbyCheckpointer:
void doWork() {
 
  doCheckpoint();

  // reset needRollbackCheckpoint to false only when we finish a ckpt
  // for rollback image
  if (needRollbackCheckpoint
  && namesystem.getFSImage().hasRollbackFSImage()) {
namesystem.setCreatedRollbackImages(true);
namesystem.setNeedRollbackFsImage(false);
  }
  lastCheckpointTime = now;
} {code}

  was:
StandbyCheckpointer trigger rollback fsimage when RU is started.

When ru is started, a flag (needRollbackImage) was set to true during edit log 
replay.

And it only gets reset to false when doCheckpoint() succeeded.

Think about following scenario:
 # Start RU, needRollbackImage is set to true.
 # doCheckpoint() failed.  needRollbackImage was never set to false.
 # RU is finalized.
 # namesystem.getFSImage().hasRollbackFSImage() is always false since rollback 
image cannot be generated once RU is over.

{code:java}
StandbyCheckpointer:
void doWork() {
 
  doCheckpoint();

  // reset needRollbackCheckpoint to false only when we finish a ckpt
  // for rollback image
  if (needRollbackCheckpoint
  && namesystem.getFSImage().hasRollbackFSImage()) {
namesystem.setCreatedRollbackImages(true);
namesystem.setNeedRollbackFsImage(false);
  }
  lastCheckpointTime = now;
} {code}


> StandbyCheckpointer can still trigger rollback fs image after RU is finalized
> -
>
> Key: HDFS-16836
> URL: https://issues.apache.org/jira/browse/HDFS-16836
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Lei Yang
>Priority: Major
>
> StandbyCheckpointer trigger rollback fsimage when RU is started.
> When ru is started, a flag (needRollbackImage) was set to true during edit 
> log replay.
> And it only gets reset to false when doCheckpoint() succeeded.
> Think about following scenario:
>  # Start RU, needRollbackImage is set to true.
>  # doCheckpoint() failed.
>  # RU is finalized.
>  # namesystem.getFSImage().hasRollbackFSImage() is always false since 
> rollback image cannot be generated once RU is over.
>  # needRollbackImage was never set to false.
>  # Checkpoints threshold(1m txns) and period(1hr) are not honored.
> {code:java}
> StandbyCheckpointer:
> void doWork() {
>  
>   doCheckpoint();
>   // reset needRollbackCheckpoint to false only when we finish a ckpt
>   // for rollback image
>   if (needRollbackCheckpoint
>   && namesystem.getFSImage().hasRollbackFSImage()) {
> namesystem.setCreatedRollbackImages(true);
> namesystem.setNeedRollbackFsImage(false);
>   }
>   lastCheckpointTime = now;
> } {code}






[jira] [Updated] (HDFS-16836) StandbyCheckpointer can still trigger rollback fs image after RU is finalized

2022-11-09 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16836:

Description: 
StandbyCheckpointer triggers a rollback fsimage when RU is started.

When RU is started, a flag (needRollbackImage) is set to true during edit log
replay.

It only gets reset to false when doCheckpoint() succeeds.

Consider the following scenario:
 # Start RU; needRollbackImage is set to true.
 # doCheckpoint() fails. needRollbackImage is never set back to false.
 # RU is finalized.
 # namesystem.getFSImage().hasRollbackFSImage() is always false since a rollback
image cannot be generated once RU is over.

{code:java}
StandbyCheckpointer:
void doWork() {
 
  doCheckpoint();

  // reset needRollbackCheckpoint to false only when we finish a ckpt
  // for rollback image
  if (needRollbackCheckpoint
  && namesystem.getFSImage().hasRollbackFSImage()) {
namesystem.setCreatedRollbackImages(true);
namesystem.setNeedRollbackFsImage(false);
  }
  lastCheckpointTime = now;
} {code}

  was:
StandbyCheckpointer trigger rollback fsimage when RU is started.

When ru is started, a flag (needRollbackImage) was set to true during edit log 
replay.

And it only gets reset to false when doCheckpoint() succeeded.

Think about following scenario:
 # Start RU, needRollbackImage is set to true.
 # doCheckpoint() failed.  needRollbackImage was never set to false.
 # RU is finalized.
 # needRollbackImage is still true so the checkpoint period and threshold were 
not honored.

{code:java}
StandbyCheckpointer:
void doWork() {
 
  doCheckpoint();

  // reset needRollbackCheckpoint to false only when we finish a ckpt
  // for rollback image
  if (needRollbackCheckpoint
  && namesystem.getFSImage().hasRollbackFSImage()) {
namesystem.setCreatedRollbackImages(true);
namesystem.setNeedRollbackFsImage(false);
  }
  lastCheckpointTime = now;
} {code}


> StandbyCheckpointer can still trigger rollback fs image after RU is finalized
> -
>
> Key: HDFS-16836
> URL: https://issues.apache.org/jira/browse/HDFS-16836
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Lei Yang
>Priority: Major
>
> StandbyCheckpointer trigger rollback fsimage when RU is started.
> When ru is started, a flag (needRollbackImage) was set to true during edit 
> log replay.
> And it only gets reset to false when doCheckpoint() succeeded.
> Think about following scenario:
>  # Start RU, needRollbackImage is set to true.
>  # doCheckpoint() failed.  needRollbackImage was never set to false.
>  # RU is finalized.
>  # namesystem.getFSImage().hasRollbackFSImage() is always false since 
> rollback image cannot be generated once RU is over.
> {code:java}
> StandbyCheckpointer:
> void doWork() {
>  
>   doCheckpoint();
>   // reset needRollbackCheckpoint to false only when we finish a ckpt
>   // for rollback image
>   if (needRollbackCheckpoint
>   && namesystem.getFSImage().hasRollbackFSImage()) {
> namesystem.setCreatedRollbackImages(true);
> namesystem.setNeedRollbackFsImage(false);
>   }
>   lastCheckpointTime = now;
> } {code}






[jira] [Created] (HDFS-16836) StandbyCheckpointer can still trigger rollback fs image after RU is finalized

2022-11-09 Thread Lei Yang (Jira)
Lei Yang created HDFS-16836:
---

 Summary: StandbyCheckpointer can still trigger rollback fs image 
after RU is finalized
 Key: HDFS-16836
 URL: https://issues.apache.org/jira/browse/HDFS-16836
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Lei Yang


StandbyCheckpointer triggers a rollback fsimage when RU is started.

When RU is started, a flag (needRollbackImage) is set to true during edit log
replay.

It only gets reset to false when doCheckpoint() succeeds.

Consider the following scenario:
 # Start RU; needRollbackImage is set to true.
 # doCheckpoint() fails. needRollbackImage is never set back to false.
 # RU is finalized.
 # needRollbackImage is still true, so the checkpoint period and threshold are
not honored.

{code:java}
StandbyCheckpointer:
void doWork() {
 
  doCheckpoint();

  // reset needRollbackCheckpoint to false only when we finish a ckpt
  // for rollback image
  if (needRollbackCheckpoint
  && namesystem.getFSImage().hasRollbackFSImage()) {
namesystem.setCreatedRollbackImages(true);
namesystem.setNeedRollbackFsImage(false);
  }
  lastCheckpointTime = now;
} {code}








[jira] [Updated] (HDFS-16828) Fsck doesn't count orphaned missing blocks.

2022-10-28 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16828:

Description: 
Fsck uses corruptedFiles.size instead of blockManager.getMissingBlocksCount() to
get the number of missing blocks. This creates an inconsistency with metasave.

When an orphaned block is present, metasave and fsck show different
results:

metasave shows:

 
{code:java}
Metasave: Blocks currently missing: 1
[orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 e: 
0){code}
 

but fsck -list-corruptfileblocks  shows different count:
{noformat}
The filesystem under path '/' has 0 CORRUPT files
{noformat}
This also creates an inconsistency in the dfshealth UI:
 * The missing blocks count comes from blockManager.getMissingBlocksCount(), which
is 1.
 * The corrupted files list comes from fsck (NamenodeFsck.listCorruptFileBlocks()),
which doesn't count orphaned blocks, so it is empty.

!image-2022-10-28-15-18-24-944.png!

  was:
Fsck use corrputedFIles.size instead of blockManager.getMissinngBlockCount() to 
get missing blocks. This creates inconsistency with metasave.

In the case where orphaned block is present, metasave and fsck show different 
result:

metasave shows:

 
{code:java}
Metasave: Blocks currently missing: 1
[orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 e: 
0){code}
 

but fsck -list-corruptfileblocks  shows different count:
{noformat}
The filesystem under path '/' has 0 CORRUPT files
{noformat}
This also created inconsistency between dfshealth ui. In dfshealth UI:
 * Missing blocks count comes from blockManager.getMissinngBlockCount()
 * Corrupted file comes from fsck: NamenodeFsck.listCorruptFileBlocks() which 
doesn't count orphaned blocks.

!image-2022-10-28-15-18-24-944.png!


> Fsck doesn't count orphaned missing blocks.
> ---
>
> Key: HDFS-16828
> URL: https://issues.apache.org/jira/browse/HDFS-16828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>    Reporter: Lei Yang
>Priority: Minor
> Attachments: image-2022-10-28-15-18-24-944.png
>
>
> Fsck use corrputedFIles.size instead of blockManager.getMissinngBlockCount() 
> to get missing blocks. This creates inconsistency with metasave.
> In the case where orphaned block is present, metasave and fsck show different 
> result:
> metasave shows:
>  
> {code:java}
> Metasave: Blocks currently missing: 1
> [orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 
> e: 0){code}
>  
> but fsck -list-corruptfileblocks  shows different count:
> {noformat}
> The filesystem under path '/' has 0 CORRUPT files
> {noformat}
> This also created inconsistency in dfshealth ui. In dfshealth UI:
>  * Missing blocks count comes from blockManager.getMissinngBlockCount() which 
> is 1
>  * Corrupted file comes from fsck: NamenodeFsck.listCorruptFileBlocks() which 
> doesn't count orphaned blocks so it is empty.
> !image-2022-10-28-15-18-24-944.png!
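
To make the mismatch concrete, here is a minimal, self-contained sketch of the two
counting paths. OrphanedBlockCountSketch and its helpers are hypothetical
stand-ins, not the actual BlockManager/NamenodeFsck code; they only model why a
missing block with no owning file is visible to the block-level count but not to
the file-level one:
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the fsck vs. metasave mismatch described above.
// Hypothetical model classes; not the actual Hadoop NameNode code.
public class OrphanedBlockCountSketch {

    // block id -> owning file path, or null if the block is orphaned (no file)
    static Map<String, String> blockToFile = new HashMap<>();
    static List<String> missingBlocks = new ArrayList<>();

    // Analogue of the block-level metric: counts every missing block.
    static int missingBlocksCount() {
        return missingBlocks.size();
    }

    // Analogue of the fsck / corrupt-file view: counts files that own a missing
    // block, so an orphaned missing block contributes nothing.
    static int corruptFilesCount() {
        List<String> corruptFiles = new ArrayList<>();
        for (String blk : missingBlocks) {
            String file = blockToFile.get(blk);
            if (file != null && !corruptFiles.contains(file)) {
                corruptFiles.add(file);
            }
        }
        return corruptFiles.size();
    }

    public static void main(String[] args) {
        // One missing block that no file references (the "[orphaned]" case above).
        blockToFile.put("blk_106452613228_105447711565", null);
        missingBlocks.add("blk_106452613228_105447711565");

        // metasave / dfshealth-UI style count vs. fsck style count:
        System.out.println("Blocks currently missing: " + missingBlocksCount()); // 1
        System.out.println("CORRUPT files: " + corruptFilesCount());             // 0
    }
}
{code}
Because the orphaned block has no owning file, the file-oriented count used by
fsck (and therefore the corrupt-files panel in the UI) stays at 0, while the
block-oriented count reported by metasave and the missing-blocks metric is 1.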






[jira] [Updated] (HDFS-16828) Fsck doesn't count orphaned missing blocks.

2022-10-28 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16828:

Description: 
Fsck uses corruptedFiles.size instead of blockManager.getMissingBlocksCount() to
get the number of missing blocks. This creates an inconsistency with metasave.

When an orphaned block is present, metasave and fsck show different
results:

metasave shows:

 
{code:java}
Metasave: Blocks currently missing: 1
[orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 e: 
0){code}
 

but fsck -list-corruptfileblocks  shows different count:
{noformat}
The filesystem under path '/' has 0 CORRUPT files
{noformat}
This also creates an inconsistency in the dfshealth UI:
 * The missing blocks count comes from blockManager.getMissingBlocksCount().
 * The corrupted files list comes from fsck (NamenodeFsck.listCorruptFileBlocks()),
which doesn't count orphaned blocks.

!image-2022-10-28-15-18-24-944.png!

  was:
Fsck use corrputedFIles.size instead of blockManager.getMissinngBlockCount() to 
get missing blocks. This creates inconsistency with metasave.

In the case where orphaned block is present, metasave and fsck show different 
result:

metasave shows:

 
{code:java}
Metasave: Blocks currently missing: 1
[orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 e: 
0){code}
 

but fsck -list-corruptfileblocks  shows different count:
{noformat}
The filesystem under path '/' has 0 CORRUPT files
{noformat}
 


> Fsck doesn't count orphaned missing blocks.
> ---
>
> Key: HDFS-16828
> URL: https://issues.apache.org/jira/browse/HDFS-16828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>    Reporter: Lei Yang
>Priority: Minor
> Attachments: image-2022-10-28-15-18-24-944.png
>
>
> Fsck use corrputedFIles.size instead of blockManager.getMissinngBlockCount() 
> to get missing blocks. This creates inconsistency with metasave.
> In the case where orphaned block is present, metasave and fsck show different 
> result:
> metasave shows:
>  
> {code:java}
> Metasave: Blocks currently missing: 1
> [orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 
> e: 0){code}
>  
> but fsck -list-corruptfileblocks  shows different count:
> {noformat}
> The filesystem under path '/' has 0 CORRUPT files
> {noformat}
> This also created inconsistency between dfshealth ui. In dfshealth UI:
>  * Missing blocks count comes from blockManager.getMissinngBlockCount()
>  * Corrupted file comes from fsck: NamenodeFsck.listCorruptFileBlocks() which 
> doesn't count orphaned blocks.
> !image-2022-10-28-15-18-24-944.png!






[jira] [Updated] (HDFS-16828) Fsck doesn't count orphaned missing blocks.

2022-10-28 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16828:

Attachment: image-2022-10-28-15-18-24-944.png

> Fsck doesn't count orphaned missing blocks.
> ---
>
> Key: HDFS-16828
> URL: https://issues.apache.org/jira/browse/HDFS-16828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>    Reporter: Lei Yang
>Priority: Minor
> Attachments: image-2022-10-28-15-18-24-944.png
>
>
> Fsck use corrputedFIles.size instead of blockManager.getMissinngBlockCount() 
> to get missing blocks. This creates inconsistency with metasave.
> In the case where orphaned block is present, metasave and fsck show different 
> result:
> metasave shows:
>  
> {code:java}
> Metasave: Blocks currently missing: 1
> [orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 
> e: 0){code}
>  
> but fsck -list-corruptfileblocks  shows different count:
> {noformat}
> The filesystem under path '/' has 0 CORRUPT files
> {noformat}
>  






[jira] [Updated] (HDFS-16828) Fsck doesn't count orphaned missing blocks.

2022-10-28 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16828:

Affects Version/s: 2.10.0

> Fsck doesn't count orphaned missing blocks.
> ---
>
> Key: HDFS-16828
> URL: https://issues.apache.org/jira/browse/HDFS-16828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>    Reporter: Lei Yang
>Priority: Minor
>
> Fsck use corrputedFIles.size instead of blockManager.getMissinngBlockCount() 
> to get missing blocks. This creates inconsistency with metasave.
> In the case where orphaned block is present, metasave and fsck show different 
> result:
> metasave shows:
>  
> {code:java}
> Metasave: Blocks currently missing: 1
> [orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 
> e: 0){code}
>  
> but fsck -list-corruptfileblocks  shows different count:
> {noformat}
> The filesystem under path '/' has 0 CORRUPT files
> {noformat}
>  






[jira] [Created] (HDFS-16828) Fsck doesn't count orphaned missing blocks.

2022-10-28 Thread Lei Yang (Jira)
Lei Yang created HDFS-16828:
---

 Summary: Fsck doesn't count orphaned missing blocks.
 Key: HDFS-16828
 URL: https://issues.apache.org/jira/browse/HDFS-16828
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lei Yang


Fsck uses corruptedFiles.size instead of blockManager.getMissingBlocksCount() to
get the number of missing blocks. This creates an inconsistency with metasave.

When an orphaned block is present, metasave and fsck show different
results:

metasave shows:

 
{code:java}
Metasave: Blocks currently missing: 1
[orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 e: 
0){code}
 

but fsck -list-corruptfileblocks  shows different count:
{noformat}
The filesystem under path '/' has 0 CORRUPT files
{noformat}
 









Re: [PULL 3/3] vdpa: Fix memory listener deletions of iova tree

2022-07-28 Thread Lei Yang
Jason Wang  wrote on Thu, Jul 28, 2022 at 14:27:

> On Thu, Jul 28, 2022 at 2:14 PM Lei Yang  wrote:
> >
> > I tried manually changing this line and then tested this branch on a local
> host. After the migration succeeds, a qemu core dump occurs on
> shutdown inside the guest.
> >
> > Compiled qemu Steps:
> > # git clone https://gitlab.com/eperezmartin/qemu-kvm.git
> > # cd qemu-kvm/
> > # mkdir build
> > # cd build/
> > # git checkout bd85496c2a8c1ebf34f908fca2be2ab9852fd0e9
>
> I got this
>
> fatal: reference is not a tree: bd85496c2a8c1ebf34f908fca2be2ab9852fd0e9
>
> and my HEAD is:
>
> commit 7b17a1a841fc2336eba53afade9cadb14bd3dd9a (HEAD -> master, tag:
> v7.1.0-rc0, origin/master, origin/HEAD)
> Author: Richard Henderson 
> Date:   Tue Jul 26 18:03:16 2022 -0700
>
> Update version for v7.1.0-rc0 release
>
> Signed-off-by: Richard Henderson 
>

I tried to recompile it using the commit you mentioned, but the problem is
reproduced again:
# git clone git://git.qemu.org/qemu.git
# cd qemu/
# git log
# mkdir build
# cd build/
# vim /root/qemu/hw/virtio/vhost-vdpa.c
# ../configure --target-list=x86_64-softmmu --enable-debug
# make

Latest commit:
commit 7b17a1a841fc2336eba53afade9cadb14bd3dd9a (HEAD -> master, tag:
v7.1.0-rc0, origin/master, origin/HEAD)
Author: Richard Henderson 
Date:   Tue Jul 26 18:03:16 2022 -0700

Update version for v7.1.0-rc0 release

Signed-off-by: Richard Henderson 

>
> > # vim /root/qemu-kvm/hw/virtio/vhost-vdpa.c
> > (Change "vhost_iova_tree_remove(v->iova_tree, _region);" to
> "vhost_iova_tree_remove(v->iova_tree, result);")
>
> Any reason you need to manually change the line since it has been merged?
>
> > # ../configure --target-list=x86_64-softmmu --enable-debug
> > # make
>
> So if I understand you correctly, you meant the issue is not fixed?
>

From my side, this is a new issue, because the guest can boot up normally
and complete the migration. It is just that after the migration succeeds
and the guest is shut down, a core dump occurs.

Thanks

>
> Thanks
>
> >
> > Core dump messages:
> > # gdb /root/qemu-kvm/build/x86_64-softmmu/qemu-system-x86_64
> core.qemu-system-x86.7419
> > (gdb) bt full
> > #0  0x56107c19afa9 in vhost_vdpa_listener_region_del
> (listener=0x7ff9a9c691a0, section=0x7ffd3889ad20)
> > at ../hw/virtio/vhost-vdpa.c:290
> > result = 0x0
> > vaddr = 0x7ff29be0
> > mem_region = {iova = 0, translated_addr = 140679973961728, size
> = 30064771071, perm = IOMMU_NONE}
> > v = 0x7ff9a9c69190
> > iova = 4294967296
> > llend = 34359738368
> > llsize = 30064771072
> > ret = 32765
> > __func__ = "vhost_vdpa_listener_region_del"
> > #1  0x56107c1ca915 in listener_del_address_space
> (listener=0x7ff9a9c691a0, as=0x56107cccbc00 )
> > at ../softmmu/memory.c:2939
> > section =
> >   {size = 30064771072, mr = 0x56107e116270, fv = 0x7ff1e02a4090,
> offset_within_region = 2147483648, offset_within_address_space =
> 4294967296, readonly = false, nonvolatile = false}
> > view = 0x7ff1e02a4090
> > fr = 0x7ff1e04027f0
> > #2  0x56107c1cac39 in memory_listener_unregister
> (listener=0x7ff9a9c691a0) at ../softmmu/memory.c:2989
> > #3  0x56107c19d007 in vhost_vdpa_dev_start (dev=0x56107e126ea0,
> started=false) at ../hw/virtio/vhost-vdpa.c:1134
> > v = 0x7ff9a9c69190
> > ok = true
> > #4  0x56107c190252 in vhost_dev_stop (hdev=0x56107e126ea0,
> vdev=0x56107f40cb50) at ../hw/virtio/vhost.c:1828
> > i = 32761
> > __PRETTY_FUNCTION__ = "vhost_dev_stop"
> > #5  0x56107bebe26c in vhost_net_stop_one (net=0x56107e126ea0,
> dev=0x56107f40cb50) at ../hw/net/vhost_net.c:315
> > file = {index = 0, fd = -1}
> > __PRETTY_FUNCTION__ = "vhost_net_stop_one"
> > #6  0x56107bebe6bf in vhost_net_stop (dev=0x56107f40cb50,
> ncs=0x56107f421850, data_queue_pairs=1, cvq=0)
> > at ../hw/net/vhost_net.c:425
> > qbus = 0x56107f40cac8
> > vbus = 0x56107f40cac8
> > k = 0x56107df1a220
> > n = 0x56107f40cb50
> > peer = 0x7ff9a9c69010
> > total_notifiers = 2
> > nvhosts = 1
> > i = 0
> > --Type  for more, q to quit, c to continue without paging--
> > r = 32765
> > __PRETTY_FUNCTION__ = "vhost_net_stop"
> > #7  0x56107c14af24 in virtio_net_vhost_status (n=0x56107f40cb50,
> status=15 '\01

Re: [PULL 3/3] vdpa: Fix memory listener deletions of iova tree

2022-07-28 Thread Lei Yang
 fill the first region with .iova = 0, causing a mapping
>   with the same iova and device complains, if the next action is a map.
> * Next unmap will cause to try to unmap again iova = 0, causing the
>   device to complain that no region was mapped at iova = 0.
>
> Fixes: 34e3c94edaef ("vdpa: Add custom IOTLB translations to SVQ")
> Reported-by: Lei Yang 
> Signed-off-by: Eugenio Pérez 
> Signed-off-by: Jason Wang 
> ---
>  hw/virtio/vhost-vdpa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index bce64f4..3ff9ce3 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -290,7 +290,7 @@ static void
> vhost_vdpa_listener_region_del(MemoryListener *listener,
>
>  result = vhost_iova_tree_find_iova(v->iova_tree, _region);
>  iova = result->iova;
> -vhost_iova_tree_remove(v->iova_tree, _region);
> +vhost_iova_tree_remove(v->iova_tree, result);
>  }
>  vhost_vdpa_iotlb_batch_begin_once(v);
>  ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
> --
> 2.7.4
>
>


[Tinyos-help] [CFP] MSN 2022 Submission Deadline Extended to July 31, 2022 (Firm Deadline)

2022-07-15 Thread Lei Yang
[Apologies if you receive multiple copies of this message]

[Firm deadline: July 31, AoE time]
[Hybrid of in-person and virtual]

The 18th International Conference on Mobility, Sensing and Networking (MSN
2022)
December 14-16, 2022 - Guangzhou, China
https://ieee-msn.org/2022

MSN 2022 provides a forum for academic researchers and industry
practitioners to present research progresses, exchange new ideas, and
identify future directions in the field of Mobility, Sensing and
Networking. MSN 2022 is technically sponsored by IEEE.


SCOPES & OBJECTIVES

Mobility, sensing and networking are the key areas of enabling technologies
for the next-generation networks, Internet of Things and Cyber-Physical
Systems.
Recent years have witnessed the increasing convergence of algorithms,
protocols, and applications for mobility, sensing and networking in a range
of applications including connected vehicles, smart cities, smart
manufacturing, smart healthcare, smart agriculture, and digital twins.
Building on the past 17 years of success, the 18th International Conference
on Mobility, Sensing and Networking (MSN 2022) provides a forum for
academic researchers and industry practitioners to exchange new research
ideas, present their progress, and identify future directions in the field
of mobility, sensing and networking.
The conference solicits submissions from all research areas related to
mobility, sensing and networking, as well as their corresponding systems
and applications. Topics of interests are covered by the following tracks:

- Mobile & Wireless Sensing and Networking
- Edge Computing, IoT and Digital Twins
- Security, Privacy, Trust, and Blockchain
- Big Data and AI
- Systems, Tools and Testbeds
- Applications in Smart Cities, Healthcare, and Other Areas


SUBMISSION PROCEDURE
All manuscripts need to be submitted via EasyChair (
https://easychair.org/conferences/?conf=msn2022).
Submitted manuscripts must be prepared according to IEEE Computer Society
Proceedings Format (double column, 10pt font, letter paper) and submitted
in the PDF format. The manuscript submitted for review should be no longer
than 8 pages. After the manuscript is accepted, the camera-ready paper may
have up to 10 pages, subject to an additional fee per extra page.
Manuscripts should be submitted to one of the research tracks. Submitted
manuscripts must not contain previously published material or be under
consideration for publication in another conference or journal at the time
of submission. The accepted papers will be included in IEEE Xplore.


IMPORTANT DATES
Submission due: Jul 31, 2022 (extended)
Notification: Sep 15, 2022
Camera-ready due: Oct 15, 2022
Conference date: Dec 14-16, 2022

[Tinyos-help] [CFP] The 18th Int'l Conf. on Mobility, Sensing and Networking (MSN 2022) Dec 14-16, 2022, Guangzhou, China

2022-07-13 Thread Lei Yang
[Apologies if you receive multiple copies of this message]

[Last Three Days! Deadline: July 15, AoE time]
[Hybrid of in-person and virtual]

The 18th International Conference on Mobility, Sensing and Networking (MSN
2022)
December 14-16, 2022 - Guangzhou, China
https://ieee-msn.org/2022

MSN 2022 provides a forum for academic researchers and industry
practitioners to present research progresses, exchange new ideas, and
identify future directions in the field of Mobility, Sensing and
Networking. MSN 2022 is technically sponsored by IEEE.


SCOPES & OBJECTIVES

Mobility, sensing and networking are the key areas of enabling technologies
for the next-generation networks, Internet of Things and Cyber-Physical
Systems.
Recent years have witnessed the increasing convergence of algorithms,
protocols, and applications for mobility, sensing and networking in a range
of applications including connected vehicles, smart cities, smart
manufacturing, smart healthcare, smart agriculture, and digital twins.
Building on the past 17 years of success, the 18th International Conference
on Mobility, Sensing and Networking (MSN 2022) provides a forum for
academic researchers and industry practitioners to exchange new research
ideas, present their progress, and identify future directions in the field
of mobility, sensing and networking.
The conference solicits submissions from all research areas related to
mobility, sensing and networking, as well as their corresponding systems
and applications. Topics of interests are covered by the following tracks:

- Mobile & Wireless Sensing and Networking
- Edge Computing, IoT and Digital Twins
- Security, Privacy, Trust, and Blockchain
- Big Data and AI
- Systems, Tools and Testbeds
- Applications in Smart Cities, Healthcare, and Other Areas


SUBMISSION PROCEDURE
All manuscripts need to be submitted via EasyChair (
https://easychair.org/conferences/?conf=msn2022).
Submitted manuscripts must be prepared according to IEEE Computer Society
Proceedings Format (double column, 10pt font, letter paper) and submitted
in the PDF format. The manuscript submitted for review should be no longer
than 8 pages. After the manuscript is accepted, the camera-ready paper may
have up to 10 pages, subject to an additional fee per extra page.
Manuscripts should be submitted to one of the research tracks. Submitted
manuscripts must not contain previously published material or be under
consideration for publication in another conference or journal at the time
of submission. The accepted papers will be included in IEEE Xplore.


IMPORTANT DATES
Submission due: Jul 15, 2022 (extended)
Notification: Sep 15, 2022
Camera-ready due: Oct 15, 2022
Conference date: Dec 14-16, 2022



[Tinyos-help] [CFP] MSN 2022 Submission Deadline Extended to July 15, 2022

2022-06-30 Thread Lei Yang
[Apologies if you receive multiple copies of this message]


The 18th International Conference on Mobility, Sensing and Networking (MSN
2022)
December 14-16, 2022 - Guangzhou, China
https://ieee-msn.org/2022

MSN 2022 provides a forum for academic researchers and industry
practitioners to present research progresses, exchange new ideas, and
identify future directions in the field of Mobility, Sensing and
Networking. MSN 2022 is technically sponsored by IEEE.


SCOPES & OBJECTIVES

Mobility, sensing and networking are the key areas of enabling technologies
for the next-generation networks, Internet of Things and Cyber-Physical
Systems.
Recent years have witnessed the increasing convergence of algorithms,
protocols, and applications for mobility, sensing and networking in a range
of applications including connected vehicles, smart cities, smart
manufacturing, smart healthcare, smart agriculture, and digital twins.
Building on the past 17 years of success, the 18th International Conference
on Mobility, Sensing and Networking (MSN 2022) provides a forum for
academic researchers and industry practitioners to exchange new research
ideas, present their progress, and identify future directions in the field
of mobility, sensing and networking.
The conference solicits submissions from all research areas related to
mobility, sensing and networking, as well as their corresponding systems
and applications. Topics of interests are covered by the following tracks:

- Mobile & Wireless Sensing and Networking
- Edge Computing, IoT and Digital Twins
- Security, Privacy, Trust, and Blockchain
- Big Data and AI
- Systems, Tools and Testbeds
- Applications in Smart Cities, Healthcare, and Other Areas


SUBMISSION PROCEDURE
All manuscripts need to be submitted via EasyChair (
https://easychair.org/conferences/?conf=msn2022).
Submitted manuscripts must be prepared according to IEEE Computer Society
Proceedings Format (double column, 10pt font, letter paper) and submitted
in the PDF format. The manuscript submitted for review should be no longer
than 8 pages. After the manuscript is accepted, the camera-ready paper may
have up to 10 pages, subject to an additional fee per extra page.
Manuscripts should be submitted to one of the research tracks. Submitted
manuscripts must not contain previously published material or be under
consideration for publication in another conference or journal at the time
of submission. The accepted papers will be included in IEEE Xplore.


IMPORTANT DATES
Submission due: July 15, 2022 (extended)
Notification: Sep 15, 2022
Camera-ready due: Oct 15, 2022
Conference date: Dec 14-16, 2022



[Tinyos-help] [CFP] The 18th Int'l Conf. on Mobility, Sensing and Networking (MSN 2022) Dec 14-16, 2022, Guangzhou, China

2022-06-21 Thread Lei Yang
[Apologies if you receive multiple copies of this message]


The 18th International Conference on Mobility, Sensing and Networking (MSN
2022)
December 14-16, 2022 - Guangzhou, China
https://ieee-msn.org/2022

MSN 2022 provides a forum for academic researchers and industry
practitioners to present research progresses, exchange new ideas, and
identify future directions in the field of Mobility, Sensing and
Networking. MSN 2022 is technically sponsored by IEEE.


SCOPES & OBJECTIVES

Mobility, sensing and networking are the key areas of enabling technologies
for the next-generation networks, Internet of Things and Cyber-Physical
Systems.
Recent years have witnessed the increasing convergence of algorithms,
protocols, and applications for mobility, sensing and networking in a range
of applications including connected vehicles, smart cities, smart
manufacturing, smart healthcare, smart agriculture, and digital twins.
Building on the past 17 years of success, the 18th International Conference
on Mobility, Sensing and Networking (MSN 2022) provides a forum for
academic researchers and industry practitioners to exchange new research
ideas, present their progress, and identify future directions in the field
of mobility, sensing and networking.
The conference solicits submissions from all research areas related to
mobility, sensing and networking, as well as their corresponding systems
and applications. Topics of interest are covered by the following tracks:

- Mobile & Wireless Sensing and Networking
- Edge Computing, IoT and Digital Twins
- Security, Privacy, Trust, and Blockchain
- Big Data and AI
- Systems, Tools and Testbeds
- Applications in Smart Cities, Healthcare, and Other Areas


SUBMISSION PROCEDURE
All manuscripts need to be submitted via EasyChair (
https://easychair.org/conferences/?conf=msn2022).
Submitted manuscripts must be prepared according to IEEE Computer Society
Proceedings Format (double column, 10pt font, letter paper) and submitted
in PDF format. The manuscript submitted for review should be no longer
than 8 pages. After the manuscript is accepted, the camera-ready paper may
have up to 10 pages, subject to an additional fee per extra page.
Manuscripts should be submitted to one of the research tracks. Submitted
manuscripts must not contain previously published material or be under
consideration for publication in another conference or journal at the time
of submission. The accepted papers will be included in IEEE Xplore.


IMPORTANT DATES
Submission due: Jul 1, 2022
Notification: Sep 15, 2022
Camera-ready due: Oct 15, 2022
Conference date: Dec 14-16, 2022

[Tinyos-help] [CFP] The 18th Int'l Conf. on Mobility, Sensing and Networking (MSN 2022) Dec 14-16, 2022, Guangzhou, China

2022-06-03 Thread Lei Yang
[Apologies if you receive multiple copies of this message]


The 18th International Conference on Mobility, Sensing and Networking (MSN
2022)
December 14-16, 2022 - Guangzhou, China
https://ieee-msn.org/2022

MSN 2022 provides a forum for academic researchers and industry
practitioners to present research progress, exchange new ideas, and
identify future directions in the field of Mobility, Sensing and
Networking. MSN 2022 is technically sponsored by IEEE.


SCOPES & OBJECTIVES

Mobility, sensing and networking are the key areas of enabling technologies
for the next-generation networks, Internet of Things and Cyber-Physical
Systems.
Recent years have witnessed the increasing convergence of algorithms,
protocols, and applications for mobility, sensing and networking in a range
of applications including connected vehicles, smart cities, smart
manufacturing, smart healthcare, smart agriculture, and digital twins.
Building on the past 17 years of success, the 18th International Conference
on Mobility, Sensing and Networking (MSN 2022) provides a forum for
academic researchers and industry practitioners to exchange new research
ideas, present their progress, and identify future directions in the field
of mobility, sensing and networking.
The conference solicits submissions from all research areas related to
mobility, sensing and networking, as well as their corresponding systems
and applications. Topics of interest are covered by the following tracks:

- Mobile & Wireless Sensing and Networking
- Edge Computing, IoT and Digital Twins
- Security, Privacy, Trust, and Blockchain
- Big Data and AI
- Systems, Tools and Testbeds
- Applications in Smart Cities, Healthcare, and Other Areas


SUBMISSION PROCEDURE
All manuscripts need to be submitted via EasyChair (
https://easychair.org/conferences/?conf=msn2022).
Submitted manuscripts must be prepared according to IEEE Computer Society
Proceedings Format (double column, 10pt font, letter paper) and submitted
in PDF format. The manuscript submitted for review should be no longer
than 8 pages. After the manuscript is accepted, the camera-ready paper may
have up to 10 pages, subject to an additional fee per extra page.
Manuscripts should be submitted to one of the research tracks. Submitted
manuscripts must not contain previously published material or be under
consideration for publication in another conference or journal at the time
of submission. The accepted papers will be included in IEEE Xplore.


IMPORTANT DATES
Submission due: Jul 1, 2022
Notification: Sep 15, 2022
Camera-ready due: Oct 15, 2022
Conference date: Dec 14-16, 2022

[UAI] [CFP] The 18th Int'l Conf. on Mobility, Sensing and Networking (MSN 2022) Dec 14-16, 2022, Guangzhou, China

2022-05-09 Thread Lei Yang
Dear All,

[Apologies if you receive multiple copies of this email.]

You are welcome to submit papers to the 18th International Conference on
Mobility, Sensing and Networking (MSN 2022) December 14-16, 2022 ·
Guangzhou, China.
https://ieee-msn.org/2022

MSN 2022 provides a forum for academic researchers and industry
practitioners to present research progress, exchange new ideas, and
identify future directions in the field of Mobility, Sensing and
Networking. MSN 2022 is technically sponsored by IEEE.

[Scope and Objectives]
Mobility, sensing and networking are the key areas of enabling technologies
for the next-generation networks, Internet of Things and Cyber-Physical
Systems. Recent years have witnessed the increasing convergence of
algorithms, protocols, and applications for mobility, sensing and
networking in a range of applications including connected vehicles, smart
cities, smart manufacturing, smart healthcare, smart agriculture, and
digital twins. Building on the past 17 years of success, the 18th
International Conference on Mobility, Sensing and Networking (MSN 2022)
provides a forum for academic researchers and industry practitioners to
exchange new research ideas, present their progress, and identify future
directions in the field of mobility, sensing and networking.
The conference solicits submissions from all research areas related to
mobility, sensing and networking, as well as their corresponding systems
and applications.
Topics of interest are covered by the following tracks:
• Mobile & Wireless Sensing and Networking
• Edge Computing, IoT and Digital Twins
• Security, Privacy, Trust, and Blockchain
• Big Data and AI
• Systems, Tools and Testbed
• Applications in Smart Cities, Healthcare and Other Areas

[Submission Procedures]
Submitted manuscripts must be prepared according to IEEE Computer Society
Proceedings Format (double column, 10pt font, letter paper) and submitted
in PDF format. The manuscript submitted for review should be no longer
than 8 pages. After the manuscript is accepted, the camera-ready paper may
have up to 10 pages, subject to an additional fee per extra page.
Manuscripts should be submitted to one of the research tracks. Submitted
manuscripts must not contain previously published material or be under
consideration for publication in another conference or journal at the time
of submission. The accepted papers will be included in IEEE Xplore.

[Important Dates]
Submission due:  Jul 1, 2022
Notification: Sep 15, 2022
Camera-ready due: Oct 15, 2022
Conference date: Dec 14-16, 2022


Best Regards,
Lei Yang


[Tinyos-help] [CFP] The 18th Int'l Conf. on Mobility, Sensing and Networking (MSN 2022) Dec 14-16, 2022, Guangzhou, China

2022-05-06 Thread Lei Yang
Dear All,

[Apologies if you receive multiple copies of this email.]

You are welcome to submit papers to the 18th International Conference on
Mobility, Sensing and Networking (MSN 2022) December 14-16, 2022 ·
Guangzhou, China.
https://ieee-msn.org/2022

MSN 2022 provides a forum for academic researchers and industry
practitioners to present research progress, exchange new ideas, and
identify future directions in the field of Mobility, Sensing and
Networking. MSN 2022 is technically sponsored by IEEE.

[Scope and Objectives]
Mobility, sensing and networking are the key areas of enabling technologies
for the next-generation networks, Internet of Things and Cyber-Physical
Systems. Recent years have witnessed the increasing convergence of
algorithms, protocols, and applications for mobility, sensing and
networking in a range of applications including connected vehicles, smart
cities, smart manufacturing, smart healthcare, smart agriculture, and
digital twins. Building on the past 17 years of success, the 18th
International Conference on Mobility, Sensing and Networking (MSN 2022)
provides a forum for academic researchers and industry practitioners to
exchange new research ideas, present their progress, and identify future
directions in the field of mobility, sensing and networking.
The conference solicits submissions from all research areas related to
mobility, sensing and networking, as well as their corresponding systems
and applications.
Topics of interest are covered by the following tracks:
• Mobile & Wireless Sensing and Networking
• Edge Computing, IoT and Digital Twins
• Security, Privacy, Trust, and Blockchain
• Big Data and AI
• Systems, Tools and Testbed
• Applications in Smart Cities, Healthcare and Other Areas

[Submission Procedures]
Submitted manuscripts must be prepared according to IEEE Computer Society
Proceedings Format (double column, 10pt font, letter paper) and submitted
in PDF format. The manuscript submitted for review should be no longer
than 8 pages. After the manuscript is accepted, the camera-ready paper may
have up to 10 pages, subject to an additional fee per extra page.
Manuscripts should be submitted to one of the research tracks. Submitted
manuscripts must not contain previously published material or be under
consideration for publication in another conference or journal at the time
of submission. The accepted papers will be included in IEEE Xplore.

[Important Dates]
Submission due:  Jul 1, 2022
Notification: Sep 15, 2022
Camera-ready due: Oct 15, 2022
Conference date: Dec 14-16, 2022


Best Regards,
Lei Yang

[jira] [Updated] (HADOOP-18193) Support nested mount points in INodeTree

2022-05-03 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HADOOP-18193:
--
Description: 
Defining the following client mount table config is not supported in INodeTree
and will throw a FileAlreadyExistsException:

 
{code:java}
fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
{code}
INodeTree has two methods that need changes to support nested mount points:
{code:java}
createLink(): build INodeTree during fs init.
resolve(): resolve path in INodeTree with viewfs apis.
{code}
ViewFileSystem and ViewFs each maintain an INodeTree instance (fsState) and
call fsState.resolve(..) to resolve a path to a specific mount point.
INodeTree.resolve encapsulates the logic of nested mount point resolution, so
no changes are expected in either class.
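
To make the intended behaviour concrete, here is a minimal, hypothetical sketch
(not part of the patch; the mount-table name "default" and the full property
prefix are assumptions, since the config keys above omit the table name):
{code:java}
// Hedged sketch only: illustrates the intent of nested mount points, not the patch itself.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NestedMountPointSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // A parent link and a nested child link under the same prefix.
    conf.set("fs.viewfs.mounttable.default.link./foo", "hdfs://nn02/foo");
    conf.set("fs.viewfs.mounttable.default.link./foo/bar", "hdfs://nn1/foo/bar");

    // Today INodeTree.createLink() throws FileAlreadyExistsException while the
    // tree is built during fs init, before resolve() is ever reached.
    FileSystem viewFs = FileSystem.get(URI.create("viewfs://default/"), conf);

    // With nested mount points, resolution should prefer the longest matching
    // mount prefix: /foo/bar/x -> hdfs://nn1/foo/bar/x, /foo/baz/y -> hdfs://nn02/foo/baz/y.
    viewFs.getFileStatus(new Path("/foo/bar"));
  }
}
{code}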

AC:
 # INodeTree.createLink should support creating nested mount points (INodeTree
is constructed during fs init).
 # INodeTree.resolve should support resolving paths based on nested mount points
(INodeTree.resolve is used in viewfs APIs).
 # No regressions in existing ViewFileSystem and ViewFs APIs.
 # Ensure important APIs are not broken with nested mount points (rename,
getContentSummary, listStatus, ...).

 

Spec:

Please review the attached PDF for the spec of this feature.

  was:
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

 
{code:java}
fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
{code}
 

 

INodeTree has 2 methods that need change to support nested mount points.

 
{code:java}
createLink(): build INodeTree during fs init.
resolve(): resolve path in INodeTree with viewfs apis.
{code}
 

 

ViewFileSystem and ViewFs maintains an INodeTree instance fsState in both 
classes and call fsState.resolve(..) to resolve path to specific mount point. 
INodeTree.resolve encapsulates the logic of nested mount point resolving. So no 
changes are expected in both classes. 

 

AC:
 # INodeTree.createlink should support creating nested mount points.(INodeTree 
is constructed during fs init)
 # INodeTree.resolve should support resolve path based on nested mount points. 
(INodeTree.resolve is used in viewfs apis)
 # No regression in existing ViewFileSystem and ViewFs apis.
 # Ensure some important apis are not broken with nested mount points. (Rename, 
getContentSummary, listStatus...)

 

Spec:

Please review attached pdf for spec about this feature.


> Support nested mount points in INodeTree
> 
>
> Key: HADOOP-18193
> URL: https://issues.apache.org/jira/browse/HADOOP-18193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: viewfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Nested Mount Point in ViewFs.pdf
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Defining following client mount table config is not supported in  INodeTree 
> and will throw FileAlreadyExistsException
>  
> {code:java}
> fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
> fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
> {code}
> INodeTree has 2 methods that need change to support nested mount points.
> {code:java}
> createLink(): build INodeTree during fs init.
> resolve(): resolve path in INodeTree with viewfs apis.
> {code}
> ViewFileSystem and ViewFs maintains an INodeTree instance(fsState) in both 
> classes and call fsState.resolve(..) to resolve path to specific mount point. 
> INodeTree.resolve encapsulates the logic of nested mount point resolving. So 
> no changes are expected in both classes. 
> AC:
>  # INodeTree.createlink should support creating nested mount 
> points.(INodeTree is constructed during fs init)
>  # INodeTree.resolve should support resolve path based on nested mount 
> points. (INodeTree.resolve is used in viewfs apis)
>  # No regression in existing ViewFileSystem and ViewFs apis.
>  # Ensure some important apis are not broken with nested mount points. 
> (Rename, getContentSummary, listStatus...)
>  
> Spec:
> Please review attached pdf for spec about this feature.






[jira] [Updated] (HADOOP-18193) Support nested mount points in INodeTree

2022-05-03 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HADOOP-18193:
--
Description: 
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

 
{code:java}
fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
{code}
 

 

INodeTree has 2 methods that need change to support nested mount points.

 
{code:java}
createLink(): build INodeTree during fs init.
resolve(): resolve path in INodeTree with viewfs apis.
{code}
 

 

ViewFileSystem and ViewFs each maintain an INodeTree instance (fsState) and
call fsState.resolve(..) to resolve a path to a specific mount point.
INodeTree.resolve encapsulates the logic of nested mount point resolution, so
no changes are expected in either class.

 

AC:
 # INodeTree.createlink should support creating nested mount points.(INodeTree 
is constructed during fs init)
 # INodeTree.resolve should support resolve path based on nested mount points. 
(INodeTree.resolve is used in viewfs apis)
 # No regression in existing ViewFileSystem and ViewFs apis.
 # Ensure some important apis are not broken with nested mount points. (Rename, 
getContentSummary, listStatus...)

 

Spec:

Please review attached pdf for spec about this feature.

  was:
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 

INodeTree has 2 methods that need change to support nested mount points.

createLink(..): build INodeTree during fs init.

resolve(..): resolve path in INodeTree with viewfs apis.

 

ViewFileSystem and ViewFs referes INodeTree.resolve(..) to resolve path to 
specific mount point. No changes are expected in both classes. However, we need 
to support existing use cases and make sure no regression are caused.

 

AC:
 # INodeTree.createlink should support creating nested mount points.(INodeTree 
is constructed during fs init)
 # INodeTree.resolve should support resolve path based on nested mount points. 
(INodeTree.resolve is used in viewfs apis)
 # No regression in existing ViewFileSystem and ViewFs apis.
 # Ensure some important apis are not broken with nested mount points. (Rename, 
getContentSummary, listStatus...)

 

Spec:

Please review attached pdf for spec about this feature.


> Support nested mount points in INodeTree
> 
>
> Key: HADOOP-18193
> URL: https://issues.apache.org/jira/browse/HADOOP-18193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: viewfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Nested Mount Point in ViewFs.pdf
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Defining following client mount table config is not supported in  INodeTree 
> and will throw FileAlreadyExistsException
>  
> {code:java}
> fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
> fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
> {code}
>  
>  
> INodeTree has 2 methods that need change to support nested mount points.
>  
> {code:java}
> createLink(): build INodeTree during fs init.
> resolve(): resolve path in INodeTree with viewfs apis.
> {code}
>  
>  
> ViewFileSystem and ViewFs maintains an INodeTree instance fsState in both 
> classes and call fsState.resolve(..) to resolve path to specific mount point. 
> INodeTree.resolve encapsulates the logic of nested mount point resolving. So 
> no changes are expected in both classes. 
>  
> AC:
>  # INodeTree.createlink should support creating nested mount 
> points.(INodeTree is constructed during fs init)
>  # INodeTree.resolve should support resolve path based on nested mount 
> points. (INodeTree.resolve is used in viewfs apis)
>  # No regression in existing ViewFileSystem and ViewFs apis.
>  # Ensure some important apis are not broken with nested mount points. 
> (Rename, getContentSummary, listStatus...)
>  
> Spec:
> Please review attached pdf for spec about this feature.






[jira] [Updated] (HADOOP-18193) Support nested mount points in INodeTree

2022-05-03 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HADOOP-18193:
--
Description: 
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 

INodeTree has 2 methods that need change to support nested mount points.

createLink(..): build INodeTree during fs init.

resolve(..): resolve path in INodeTree with viewfs apis.

 

ViewFileSystem and ViewFs call INodeTree.resolve(..) to resolve a path to a
specific mount point. No changes are expected in either class. However, we need
to support existing use cases and make sure no regressions are caused.

 

AC:
 # INodeTree.createlink should support creating nested mount points.(INodeTree 
is constructed during fs init)
 # INodeTree.resolve should support resolve path based on nested mount points. 
(INodeTree.resolve is used in viewfs apis)
 # No regression in existing ViewFileSystem and ViewFs apis.
 # Ensure some important apis are not broken with nested mount points. (Rename, 
getContentSummary, listStatus...)

 

Spec:

Please review attached pdf for spec about this feature.

  was:
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 

INodeTree has 2 methods that need change to support nested mount points.

createLink(..): build INodeTree during fs init.

resolve(..): resolve path in INodeTree with viewfs apis.

 

ViewFileSystem and ViewFs referes INodeTree.resolve(..) to resolve path to 
specific mount point. No changes are expected in both classes. However, we need 
to support existing use cases and make sure no regression are caused.

 

AC:
 # INodeTree.createlink should support creating nested mount points.(INodeTree 
is constructed during fs init)
 # INodeTree.resolve should support resolve path based on nested mount points. 
(INodeTree.resolve is used in viewfs apis)
 # No regression in existing ViewFileSystem and ViewFs apis.
 # Ensure some important apis are not broken with nested mount points. (Rename, 
getContentSummary, listStatus...)


> Support nested mount points in INodeTree
> 
>
> Key: HADOOP-18193
> URL: https://issues.apache.org/jira/browse/HADOOP-18193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: viewfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Nested Mount Point in ViewFs.pdf
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Defining following client mount table config is not supported in  INodeTree 
> and will throw FileAlreadyExistsException
> fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
> fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
>  
> INodeTree has 2 methods that need change to support nested mount points.
> createLink(..): build INodeTree during fs init.
> resolve(..): resolve path in INodeTree with viewfs apis.
>  
> ViewFileSystem and ViewFs referes INodeTree.resolve(..) to resolve path to 
> specific mount point. No changes are expected in both classes. However, we 
> need to support existing use cases and make sure no regression are caused.
>  
> AC:
>  # INodeTree.createlink should support creating nested mount 
> points.(INodeTree is constructed during fs init)
>  # INodeTree.resolve should support resolve path based on nested mount 
> points. (INodeTree.resolve is used in viewfs apis)
>  # No regression in existing ViewFileSystem and ViewFs apis.
>  # Ensure some important apis are not broken with nested mount points. 
> (Rename, getContentSummary, listStatus...)
>  
> Spec:
> Please review attached pdf for spec about this feature.






[jira] [Updated] (HADOOP-18193) Support nested mount points in INodeTree

2022-05-02 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HADOOP-18193:
--
Attachment: Nested Mount Point in ViewFs.pdf

> Support nested mount points in INodeTree
> 
>
> Key: HADOOP-18193
> URL: https://issues.apache.org/jira/browse/HADOOP-18193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: viewfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Nested Mount Point in ViewFs.pdf
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Defining following client mount table config is not supported in  INodeTree 
> and will throw FileAlreadyExistsException
> fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
> fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
>  
> INodeTree has 2 methods that need change to support nested mount points.
> createLink(..): build INodeTree during fs init.
> resolve(..): resolve path in INodeTree with viewfs apis.
>  
> ViewFileSystem and ViewFs referes INodeTree.resolve(..) to resolve path to 
> specific mount point. No changes are expected in both classes. However, we 
> need to support existing use cases and make sure no regression are caused.
>  
> AC:
>  # INodeTree.createlink should support creating nested mount 
> points.(INodeTree is constructed during fs init)
>  # INodeTree.resolve should support resolve path based on nested mount 
> points. (INodeTree.resolve is used in viewfs apis)
>  # No regression in existing ViewFileSystem and ViewFs apis.
>  # Ensure some important apis are not broken with nested mount points. 
> (Rename, getContentSummary, listStatus...)






[jira] [Updated] (HADOOP-18193) Support nested mount points in INodeTree

2022-04-15 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HADOOP-18193:
--
Description: 
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 

INodeTree has 2 methods that need change to support nested mount points.

createLink(..): build INodeTree during fs init.

resolve(..): resolve path in INodeTree with viewfs apis.

 

ViewFileSystem and ViewFs call INodeTree.resolve(..) to resolve a path to a
specific mount point. No changes are expected in either class. However, we need
to support existing use cases and make sure no regressions are caused.

 

AC:
 # INodeTree.createlink should support creating nested mount points.(INodeTree 
is constructed during fs init)
 # INodeTree.resolve should support resolve path based on nested mount points. 
(INodeTree.resolve is used in viewfs apis)
 # No regression in existing ViewFileSystem and ViewFs apis.
 # Ensure some important apis are not broken with nested mount points. (Rename, 
getContentSummary, listStatus...)

  was:
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 

INodeTree has 2 methods that need change to support nested mount points.

createLink(..): build INodeTree during fs init.

resolve(..): resolve path in INodeTree with viewfs apis.

 

ViewFileSystem and ViewFs referes INodeTree.resolve(..) to resolve path to 
specific mount point. No changes are expected in both classes. However, we need 
to support existing use cases and make sure no regression.

 

AC:
 # INodeTree.createlink should support creating nested mount points.(INodeTree 
is constructed during fs init)
 # INodeTree.resolve should support resolve path based on nested mount points. 
(INodeTree.resolve is used in viewfs apis)
 # No regression in existing ViewFileSystem and ViewFs apis.
 # Ensure some important apis are not broken with nested mount points. (Rename, 
getContentSummary, listStatus...)


> Support nested mount points in INodeTree
> 
>
> Key: HADOOP-18193
> URL: https://issues.apache.org/jira/browse/HADOOP-18193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: viewfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Defining following client mount table config is not supported in  INodeTree 
> and will throw FileAlreadyExistsException
> fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
> fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
>  
> INodeTree has 2 methods that need change to support nested mount points.
> createLink(..): build INodeTree during fs init.
> resolve(..): resolve path in INodeTree with viewfs apis.
>  
> ViewFileSystem and ViewFs referes INodeTree.resolve(..) to resolve path to 
> specific mount point. No changes are expected in both classes. However, we 
> need to support existing use cases and make sure no regression are caused.
>  
> AC:
>  # INodeTree.createlink should support creating nested mount 
> points.(INodeTree is constructed during fs init)
>  # INodeTree.resolve should support resolve path based on nested mount 
> points. (INodeTree.resolve is used in viewfs apis)
>  # No regression in existing ViewFileSystem and ViewFs apis.
>  # Ensure some important apis are not broken with nested mount points. 
> (Rename, getContentSummary, listStatus...)






[jira] [Updated] (HADOOP-18193) Support nested mount points in INodeTree

2022-04-13 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HADOOP-18193:
--
Description: 
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 

INodeTree has 2 methods that need change to support nested mount points.

createLink(..)

resolve(..)

 

ViewFileSystem and ViewFs call INodeTree.resolve(..) to resolve a path to a
specific mount point. No changes are expected in either class. However, we need
to support existing use cases and make sure there are no regressions.

 

AC:
 # INodeTree.createlink should support creating nested mount points.(INodeTree 
is constructed during fs init)
 # INodeTree.resolve should support resolve path based on nested mount points. 
(INodeTree.resolve is used in viewfs apis)
 # No regression in existing ViewFileSystem and ViewFs apis.
 # Ensure some important apis are not broken with nested mount points. (Rename, 
getContentSummary, listStatus...)

  was:
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 

INodeTree has 2 methods that need change to support nested mount points.

createLink(..)

resolve(..)

 

ViewFileSystem and ViewFs referes INodeTree.resolve(..) to resolve path to 
specific mount point. No changes are expected in both classes. 


> Support nested mount points in INodeTree
> 
>
> Key: HADOOP-18193
> URL: https://issues.apache.org/jira/browse/HADOOP-18193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: viewfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> Defining following client mount table config is not supported in  INodeTree 
> and will throw FileAlreadyExistsException
> fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
> fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
>  
> INodeTree has 2 methods that need change to support nested mount points.
> createLink(..)
> resolve(..)
>  
> ViewFileSystem and ViewFs referes INodeTree.resolve(..) to resolve path to 
> specific mount point. No changes are expected in both classes. However, we 
> need to support existing use cases and make sure no regression.
>  
> AC:
>  # INodeTree.createlink should support creating nested mount 
> points.(INodeTree is constructed during fs init)
>  # INodeTree.resolve should support resolve path based on nested mount 
> points. (INodeTree.resolve is used in viewfs apis)
>  # No regression in existing ViewFileSystem and ViewFs apis.
>  # Ensure some important apis are not broken with nested mount points. 
> (Rename, getContentSummary, listStatus...)






[jira] [Updated] (HADOOP-18193) Support nested mount points in INodeTree

2022-04-13 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HADOOP-18193:
--
Description: 
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 

INodeTree has 2 methods that need change to support nested mount points.

createLink(..): build INodeTree during fs init.

resolve(..): resolve path in INodeTree with viewfs apis.

 

ViewFileSystem and ViewFs call INodeTree.resolve(..) to resolve a path to a
specific mount point. No changes are expected in either class. However, we need
to support existing use cases and make sure there are no regressions.

 

AC:
 # INodeTree.createlink should support creating nested mount points.(INodeTree 
is constructed during fs init)
 # INodeTree.resolve should support resolve path based on nested mount points. 
(INodeTree.resolve is used in viewfs apis)
 # No regression in existing ViewFileSystem and ViewFs apis.
 # Ensure some important apis are not broken with nested mount points. (Rename, 
getContentSummary, listStatus...)

  was:
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 

INodeTree has 2 methods that need change to support nested mount points.

createLink(..)

resolve(..)

 

ViewFileSystem and ViewFs referes INodeTree.resolve(..) to resolve path to 
specific mount point. No changes are expected in both classes. However, we need 
to support existing use cases and make sure no regression.

 

AC:
 # INodeTree.createlink should support creating nested mount points.(INodeTree 
is constructed during fs init)
 # INodeTree.resolve should support resolve path based on nested mount points. 
(INodeTree.resolve is used in viewfs apis)
 # No regression in existing ViewFileSystem and ViewFs apis.
 # Ensure some important apis are not broken with nested mount points. (Rename, 
getContentSummary, listStatus...)


> Support nested mount points in INodeTree
> 
>
> Key: HADOOP-18193
> URL: https://issues.apache.org/jira/browse/HADOOP-18193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: viewfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> Defining following client mount table config is not supported in  INodeTree 
> and will throw FileAlreadyExistsException
> fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
> fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
>  
> INodeTree has 2 methods that need change to support nested mount points.
> createLink(..): build INodeTree during fs init.
> resolve(..): resolve path in INodeTree with viewfs apis.
>  
> ViewFileSystem and ViewFs referes INodeTree.resolve(..) to resolve path to 
> specific mount point. No changes are expected in both classes. However, we 
> need to support existing use cases and make sure no regression.
>  
> AC:
>  # INodeTree.createlink should support creating nested mount 
> points.(INodeTree is constructed during fs init)
>  # INodeTree.resolve should support resolve path based on nested mount 
> points. (INodeTree.resolve is used in viewfs apis)
>  # No regression in existing ViewFileSystem and ViewFs apis.
>  # Ensure some important apis are not broken with nested mount points. 
> (Rename, getContentSummary, listStatus...)






[jira] [Updated] (HADOOP-18193) Support nested mount points in INodeTree

2022-04-13 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HADOOP-18193:
--
Description: 
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 

INodeTree has 2 methods that need change to support nested mount points.

createLink(..)

resolve(..)

 

ViewFileSystem and ViewFs call INodeTree.resolve(..) to resolve a path to a
specific mount point. No changes are expected in either class.

  was:
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 


> Support nested mount points in INodeTree
> 
>
> Key: HADOOP-18193
> URL: https://issues.apache.org/jira/browse/HADOOP-18193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: viewfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> Defining following client mount table config is not supported in  INodeTree 
> and will throw FileAlreadyExistsException
> fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
> fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
>  
> INodeTree has 2 methods that need change to support nested mount points.
> createLink(..)
> resolve(..)
>  
> ViewFileSystem and ViewFs referes INodeTree.resolve(..) to resolve path to 
> specific mount point. No changes are expected in both classes. 






[jira] [Created] (HADOOP-18193) Support nested mount points in INodeTree

2022-04-05 Thread Lei Yang (Jira)
Lei Yang created HADOOP-18193:
-

 Summary: Support nested mount points in INodeTree
 Key: HADOOP-18193
 URL: https://issues.apache.org/jira/browse/HADOOP-18193
 Project: Hadoop Common
  Issue Type: Improvement
  Components: viewfs
Affects Versions: 2.10.0
Reporter: Lei Yang


Defining following mount table config is not supported in INodeTree. 

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 






[jira] [Updated] (HADOOP-18193) Support nested mount points in INodeTree

2022-04-05 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HADOOP-18193:
--
Description: 
Defining following client mount table config is not supported in  INodeTree and 
will throw FileAlreadyExistsException

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 

  was:
Defining following mount table config is not supported in INodeTree. 

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 


> Support nested mount points in INodeTree
> 
>
> Key: HADOOP-18193
> URL: https://issues.apache.org/jira/browse/HADOOP-18193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: viewfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> Defining following client mount table config is not supported in  INodeTree 
> and will throw FileAlreadyExistsException
> fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
> fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
>  






[jira] [Created] (HADOOP-18193) Support nested mount points in INodeTree

2022-04-05 Thread Lei Yang (Jira)
Lei Yang created HADOOP-18193:
-

 Summary: Support nested mount points in INodeTree
 Key: HADOOP-18193
 URL: https://issues.apache.org/jira/browse/HADOOP-18193
 Project: Hadoop Common
  Issue Type: Improvement
  Components: viewfs
Affects Versions: 2.10.0
Reporter: Lei Yang


Defining following mount table config is not supported in INodeTree. 

fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar

fs.viewfs.mounttable.link./foo=hdfs://nn02/foo

 






[jira] [Updated] (HDFS-16518) KeyProviderCache close cached KeyProvider with Hadoop ShutdownHookManager

2022-03-28 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Fix Version/s: 2.10.0

> KeyProviderCache close cached KeyProvider with Hadoop ShutdownHookManager
> -
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.10.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> KeyProvider implements Closable interface but some custom implementation of 
> KeyProvider also needs explicit close in KeyProviderCache. An example is to 
> use custom KeyProvider in DFSClient to integrate read encrypted file on HDFS. 
> KeyProvider  currently gets closed in KeyProviderCache only when cache entry 
> is expired or invalidated. In some cases, this is not happening. This seems 
> related to guava cache.
> This patch is to use hadoop JVM shutdownhookManager to globally cleanup cache 
> entries and thus close KeyProvider using cache hook right after filesystem 
> instance gets closed in a deterministic way.
> {code:java}
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
> We could have made a new function KeyProviderCache#close, have each DFSClient 
> call this function and close KeyProvider at the end of each DFSClient#close 
> call but it will expose another problem to potentially close global cache 
> among different DFSClient instances.
>  






[jira] [Updated] (HDFS-16518) KeyProviderCache close cached KeyProvider with Hadoop ShutdownHookManager

2022-03-28 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
KeyProvider implements the Closeable interface, but some custom implementations
of KeyProvider also need an explicit close in KeyProviderCache. An example is
using a custom KeyProvider in DFSClient to read encrypted files on HDFS.

KeyProvider currently gets closed in KeyProviderCache only when a cache entry is
expired or invalidated. In some cases this does not happen, which seems related
to the Guava cache.

This patch uses the Hadoop JVM ShutdownHookManager to globally clean up cache
entries, and thus close each cached KeyProvider via the cache's removal hook,
right after the filesystem instance gets closed, in a deterministic way.
{code:java}
class KeyProviderCache
...
  public KeyProviderCache(long expiryMs) {
    cache = CacheBuilder.newBuilder()
        .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
        // Type parameters <URI, KeyProvider> restored here; the archive copy
        // lost the angle brackets.
        .removalListener(new RemovalListener<URI, KeyProvider>() {
          @Override
          public void onRemoval(
              @Nonnull RemovalNotification<URI, KeyProvider> notification) {
            try {
              assert notification.getValue() != null;
              notification.getValue().close();
            } catch (Throwable e) {
              LOG.error(
                  "Error closing KeyProvider with uri ["
                      + notification.getKey() + "]", e);
            }
          }
        })
        .build();
  }{code}
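As a rough, hypothetical sketch of the shutdown-hook side of this change (not
the committed patch; the SHUTDOWN_HOOK_PRIORITY constant and the
Cache<URI, KeyProvider> field named 'cache' are assumptions), the hook only
needs to drain the cache so the removal listener above closes every cached
KeyProvider:
{code:java}
// Hedged sketch, not the committed patch.
ShutdownHookManager.get().addShutdownHook(new Runnable() {
  @Override
  public void run() {
    cache.invalidateAll();  // fires onRemoval() -> KeyProvider#close() for every entry
    cache.cleanUp();        // force the Guava cache to process the removals now
  }
}, SHUTDOWN_HOOK_PRIORITY);
{code}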
We could have added a new function, KeyProviderCache#close, had each DFSClient
call it, and closed the KeyProvider at the end of each DFSClient#close call, but
that would expose another problem: it could potentially close a global cache
shared among different DFSClient instances.

 

  was:
KeyProvider implements Closable interface but some custom implementation of 
KeyProvider also needs explicit close in KeyProviderCache. An example is to use 
custom KeyProvider in DFSClient to integrate read encrypted file on HDFS. 

KeyProvider  currently gets closed in KeyProviderCache only when cache entry is 
expired or invalidated. In some cases, this is not happening. This seems 
related to guava cache.

This patch is to use hadoop JVM shutdownhookManager to globally cleanup cache 
entries and thus close KeyProvider using cache hook right after filesystem 
instance gets closed in a deterministic way.
{code:java}
Class KeyProviderCache

...
 public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); 
}{code}
We could have made a new function KeyProviderCache#close, have each DFSClient 
call this function and close KeyProvider at the end of each DFSClient#close 
call but it will expose another problem to potentially close global cache among 
different DFSClient instances or make the KeyProvider unusable.

 


> KeyProviderCache close cached KeyProvider with Hadoop ShutdownHookManager
> -
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> KeyProvider implements Closable interface but some custom implementation of 
> KeyProvider also needs explicit close in KeyProviderCache. An example is to 
> use custom KeyProvider in DFSClient to integrate read encrypted file on HDFS. 
> KeyProvider  currently gets closed in KeyProviderCache only when cache entry 
> is expired or invalidated. In some cases, this is not happening. This seems 
> related to guava cache.
> This patch is to use hadoop JVM shutdownhookManager to globally cleanup cache 
> entries and thus close KeyProvider using cache hook right after filesystem 
> instance gets closed in a deterministic way.
> {code:java}
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getV
