On 07/26/2012 11:55 PM, Pekka Riikonen wrote:
>
> On Thu, 26 Jul 2012, Alexander Duyck wrote:
>> I suspect the reason why you are seeing so many interrupts is simply
>> because you have so many queues. At 16 queues per port with 12 ports
>> you are looking at 192 queues. If they are all active you are going to
>> see hundreds of thousands of interrupts per second simply due to the
>> fact that the work is spread so thin. You would likely be much better
>> served by cutting the number of queues in half and possibly even
>> disabling hyper threading on the system.
>>
> You have a good point. Indeed 192 queues is too many and when the
> number of ports goes up we should tweak the number of queues down.
> There are a couple of issues here, though.
>
> Currently, if we cap the queues to 8, it also caps the Tx queues to 8.
> This is a no-no for us, has always been. What we want is that each core
> always has a Tx queue available so that packets don't go to another
> core's Tx queue (and don't take the spinlock of another core's Txq, etc.).
> One of our modifications to ixgbe has been to always create
> num_online_cpus() (or max 64) many Tx queues (in
> ixgbe_set_rss_queues). When different ports are affinitized
> differently, forwarding from one port to another still has a Tx queue
> available in the destination device.
>
> And because the ports are affinitized differently, another
> modification (to ixgbe and kernel) is a Tx CPU rmap, similar to what
> Linux has for Rx. This way we can just say
>
>         if (dev->tx_cpu_rmap)
>                 queue_index = cpu_rmap_lookup_index(dev->tx_cpu_rmap,
>                                                     smp_processor_id());
>
> and get the correct queue on that device.
>
> Why is the num_tx_queues by default always same as num_rx_queues in
> ixgbe?
>
>     Pekka

The main reason is the ATR (Application Targeted Routing) feature. It
expects us to be able to receive a flow's packets on the same queue
index as the queue we transmitted them on.

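To illustrate the ATR constraint just described, here is a minimal
user-space sketch. It is not ixgbe code: struct port_model and both
helper functions are hypothetical names invented for this example, and
the only behaviour modelled is the one stated above (a flow sent on Tx
queue N is expected back on Rx queue N).

```c
#include <stdbool.h>

/* Hypothetical model of one port's queue counts; not an ixgbe structure. */
struct port_model {
	unsigned int num_tx_queues;
	unsigned int num_rx_queues;
};

/*
 * Assumption for this sketch: ATR steers the return traffic of a flow to
 * the Rx queue with the same index as the Tx queue the flow was sent on.
 * Returns true and stores that index if such an Rx queue exists.
 */
static bool atr_rx_queue_for_tx(const struct port_model *p,
				unsigned int tx_index,
				unsigned int *rx_index)
{
	if (tx_index >= p->num_tx_queues || tx_index >= p->num_rx_queues)
		return false;

	*rx_index = tx_index;
	return true;
}

/* Number of Tx queues that have no Rx queue with a matching index. */
static unsigned int atr_count_unpaired_tx(const struct port_model *p)
{
	if (p->num_tx_queues > p->num_rx_queues)
		return p->num_tx_queues - p->num_rx_queues;

	return 0;
}
```

With 64 Tx queues and 8 Rx queues in this model, atr_count_unpaired_tx()
reports 56 Tx queues without a same-index Rx partner, which is the
situation the default num_tx_queues == num_rx_queues avoids.
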
Another option, if you cannot lower the number of queues, would be to
reduce the number of q_vectors. You could modify
ixgbe_set_interrupt_capability so that, instead of limiting itself to
the number of CPUs, it simply limits itself to 8. The number of queues
would remain the same, but the number of interrupt vectors would drop
to 8. You could then distribute the interrupt vectors over both
sockets, which would greatly reduce the interrupt workload on the
system while still allowing you a transmit queue per CPU.

Thanks,

Alex

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
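
As a rough illustration of the q_vector suggestion above, the arithmetic
can be sketched in user-space C. All function names here are
hypothetical, the round-robin placement is only an assumption made for
the example (the driver's real ring-to-vector mapping may differ), and
the limit of 8 is assumed to apply per port.

```c
/*
 * Hypothetical round-robin placement of queues onto vectors.
 * Returns 0 when num_vectors is 0 to avoid dividing by zero.
 */
static unsigned int queue_to_vector(unsigned int queue,
				    unsigned int num_vectors)
{
	return num_vectors ? queue % num_vectors : 0;
}

/* Hypothetical alternation of vectors across CPU sockets. */
static unsigned int vector_to_socket(unsigned int vector,
				     unsigned int num_sockets)
{
	return num_sockets ? vector % num_sockets : 0;
}

/* How many queues share one vector under the placement above. */
static unsigned int queues_on_vector(unsigned int vector,
				     unsigned int num_queues,
				     unsigned int num_vectors)
{
	unsigned int q, count = 0;

	for (q = 0; q < num_queues; q++)
		if (queue_to_vector(q, num_vectors) == vector)
			count++;

	return count;
}

/* Interrupt vectors across the whole system. */
static unsigned int total_vectors(unsigned int num_ports,
				  unsigned int vectors_per_port)
{
	return num_ports * vectors_per_port;
}
```

For the numbers in this thread, 12 ports with one vector per queue give
total_vectors(12, 16) == 192, while capping at 8 vectors gives
total_vectors(12, 8) == 96, with each vector servicing 2 of the 16
queues on its port.
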