This bug is missing log files that will aid in diagnosing the problem.
While running an Ubuntu kernel (not a mainline or third-party kernel)
please enter the following command in a terminal window:

apport-collect 1763269

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable
to run this command, please add a comment stating that fact and change
the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the
Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

** Tags added: bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1763269

Title:
  Mellanox [mlx5] [bionic] UBSAN: Undefined behaviour in
  ./include/linux/net_dim.h

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  UBSAN reports undefined behaviour in ./include/linux/net_dim.h:243:6.
  We saw the following trace while running traffic in a regression test:

  [12885.292500] UBSAN: Undefined behaviour in ./include/linux/net_dim.h:243:6
  [12885.296358] signed integer overflow:
  [12885.300100] 358869104 * 100 cannot be represented in type 'int'
  [12885.304001] CPU: 2 PID: 19630 Comm: sock_stream_tes Tainted: G           OE    4.15.0-rc8-for-upstream-dbg-2018-01-25_19-31-23-61 #1
  [12885.311856] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
  [12885.316091] Call Trace:
  [12885.320234]  <IRQ>
  [12885.324366]  dump_stack+0xd1/0x159
  [12885.328586]  ? dma_virt_map_sg+0x147/0x147
  [12885.332804]  ? val_to_string.constprop.4+0x88/0xd1
  [12885.337055]  ubsan_epilogue+0x9/0x49
  [12885.341345]  handle_overflow+0x15e/0x189
  [12885.345636]  ? __ubsan_handle_negate_overflow+0x108/0x108
  [12885.349891]  ? kvm_clock_read+0x1f/0x30
  [12885.354230]  ? ktime_get+0x18d/0x280
  [12885.358654]  ? getrawmonotonic64+0x320/0x320
  [12885.363116]  ? mark_lock+0x1cf/0xc50
  [12885.367624]  ? inet_recvmsg+0x121/0x4a0
  [12885.372114]  mlx5e_napi_poll+0x1199/0x15c0 [mlx5_core]
  [12885.376774]  ? mlx5e_rx_dim_work+0x160/0x160 [mlx5_core]
  [12885.381406]  ? print_irqtrace_events+0x120/0x120
  [12885.385907]  ? mark_held_locks+0x93/0x100
  [12885.392099]  ? print_irqtrace_events+0x120/0x120
  [12885.396589]  ? trace_hardirqs_on_caller+0x206/0x390
  [12885.401278]  ? kasan_slab_free+0x87/0xc0
  [12885.406000]  ? pvclock_clocksource_read+0x146/0x280
  [12885.410608]  ? mark_held_locks+0x71/0x100
  [12885.415251]  net_rx_action+0x58c/0x10a0
  [12885.419873]  ? napi_complete_done+0x3d0/0x3d0
  [12885.424385]  ? check_chain_key+0x150/0x260
  [12885.428784]  ? debug_check_no_locks_freed+0x200/0x200
  [12885.433041]  ? match_held_lock+0x8a/0x4f0
  [12885.437215]  ? match_held_lock+0x8a/0x4f0
  [12885.441249]  ? lock_downgrade+0x3e0/0x3e0
  [12885.445151]  ? do_raw_spin_unlock+0x14d/0x230
  [12885.448970]  ? save_trace+0x1f0/0x1f0
  [12885.452664]  ? save_trace+0x1f0/0x1f0
  [12885.456224]  ? match_held_lock+0xa2/0x4f0
  [12885.459668]  ? pvclock_clocksource_read+0x146/0x280
  [12885.463085]  ? save_trace+0x1f0/0x1f0
  [12885.466361]  ? preempt_count_sub+0x14/0xd0
  [12885.469566]  ? __lock_is_held+0x5d/0x110
  [12885.472665]  ? preempt_count_sub+0x14/0xd0
  [12885.475653]  ? __lock_is_held+0x5d/0x110
  [12885.478529]  ? mark_lock+0x1cf/0xc50
  [12885.481276]  ? match_held_lock+0xa2/0x4f0
  [12885.483984]  ? print_irqtrace_events+0x120/0x120
  [12885.486679]  ? save_trace+0x1f0/0x1f0
  [12885.490891]  ? irq_exit+0x150/0x150
  [12885.493454]  ? __napi_schedule+0x1ae/0x220
  [12885.495936]  ? netdev_master_upper_dev_link+0x20/0x20
  [12885.498402]  ? check_chain_key+0x150/0x260
  [12885.500774]  ? __tasklet_schedule+0x22/0xf0
  [12885.503086]  ? match_held_lock+0xa2/0x4f0
  [12885.505431]  ? mlx5_eq_int+0x821/0xb50 [mlx5_core]
  [12885.507775]  ? save_trace+0x1f0/0x1f0
  [12885.510082]  ? pvclock_clocksource_read+0x146/0x280
  [12885.512416]  ? pvclock_read_flags+0x80/0x80
  [12885.514705]  ? save_trace+0x1f0/0x1f0
  [12885.516995]  ? __handle_irq_event_percpu+0x1b0/0x800
  [12885.519305]  ? __lock_is_held+0x5d/0x110
  [12885.521630]  __do_softirq+0x248/0xba9
  [12885.523913]  ? __irqentry_text_end+0x1f8a70/0x1f8a70
  [12885.526234]  ? pvclock_clocksource_read+0x146/0x280
  [12885.528563]  ? pvclock_read_flags+0x80/0x80
  [12885.530843]  ? do_raw_spin_trylock+0x120/0x120
  [12885.533178]  ? kvm_clock_read+0x1f/0x30
  [12885.535432]  ? kvm_sched_clock_read+0x5/0x10
  [12885.537702]  ? sched_clock_cpu+0x14/0x1f0
  [12885.539968]  irq_exit+0xf4/0x150
  [12885.542186]  do_IRQ+0xe8/0x1e0
  [12885.544390]  common_interrupt+0xa2/0xa2
  [12885.546607]  </IRQ>
  There is an int overflow in include/linux/net_dim.h:
  #define IS_SIGNIFICANT_DIFF(val, ref) \
  (((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */

  
  The include/linux/net_dim.h header is new in kernel 4.16; in the 4.15
  kernel this code was in drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c.

  The upstream fix for this issue is:
  commit f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33
  Author: Tal Gilboa <ta...@mellanox.com>
  Date:   Thu Mar 29 13:53:52 2018 +0300

      net/dim: Fix int overflow

      When calculating difference between samples, the values
      are multiplied by 100. Large values may cause int overflow
      when multiplied (usually on first iteration).
      Fixed by forcing 100 to be of type unsigned long.

      Fixes: 4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
      Signed-off-by: Tal Gilboa <ta...@mellanox.com>
      Reviewed-by: Andy Gospodarek <go...@broadcom.com>
      Signed-off-by: David S. Miller <da...@davemloft.net>

  diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
  index bebeaad..29ed8fd 100644
  --- a/include/linux/net_dim.h
  +++ b/include/linux/net_dim.h
  @@ -231,7 +231,7 @@ static inline void net_dim_exit_parking(struct net_dim *dim)
   }

   #define IS_SIGNIFICANT_DIFF(val, ref) \
  -   (((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
  + (((100UL * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */

   static inline int net_dim_stats_compare(struct net_dim_stats *curr,
                                          struct net_dim_stats *prev)


  I will send a patch to the Ubuntu kernel mailing list with the fix
  backported to the old location.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1763269/+subscriptions

