> On 30 Aug 2016, at 16:31, Mark Kettenis <mark.kette...@xs4all.nl> wrote:
> 
>> Date: Tue, 30 Aug 2016 07:48:09 +0200
>> From: Mike Belopuhov <m...@belopuhov.com>
>> 
>> On Tue, Aug 30, 2016 at 09:58 +1000, David Gwynne wrote:
>>> On Mon, Aug 29, 2016 at 08:30:37PM +0200, Alexander Bluhm wrote:
>>>> On Mon, Aug 29, 2016 at 07:10:48PM +0200, Mike Belopuhov wrote:
>>>>> Due to a recent change in -current the socket sending routine
>>>>> has started producing small data packets crossing memory page
>>>>> boundary.  This is not supported by Xen and kernels with this
>>>>> change will experience broken bulk TCP transmit behaviour.
>>>>> We're working on fixing it.
>>>> 
>>>> For the same reason some old i386 machines from 2006 and 2005 have
>>>> performance problems when sending data with tcpbench.
>>>> 
>>>> em 82573E drops to 200 MBit/sec output, and 82546GB and 82540EM
>>>> now do only 10 MBit/sec.
>>>> 
>>>> With the patch below I get 946, 642, 422 MBit/sec output performance
>>>> over these chips respectively.
>>>> 
>>>> I don't know whether PAGE_SIZE is the correct fix, as I think the
>>>> problem is more related to the network chip than to the processor's
>>>> page size.
>>> 
>>> does this diff help those chips?
>>> 
>> 
>> This diff defeats the purpose of the sosend change by punishing
>> every other chip not suffering from the aforementioned problem.
>> Lots of packets from the bulk TCP transfer will have to be
>> defragmented for no good reason.

reverting the sosend change that demonstrated the performance difference 
would punish all chips, not just the em(4) chips.

the em diff is quick and simple, so we can see whether the driver can be 
fixed without having to revert sosend. if it does work on bluhm's test 
systems, i was going to make the change apply only to the specific chips in 
question.

> 
> No, this em diff will still do proper scatter/gather.  It might
> consume more descriptors as it will use two descriptors for packets
> crossing a page boundary.  But the fact that we collect more data into
> an mbuf will actually reduce the number of descriptors in other cases.
> 
> Regarding the xnf(4) issue; I think any driver that can't properly
> deal with an mbuf crossing a page boundary is broken.  I can't think
> of any modern dma engine that can't handle that properly, or doesn't
> at least support scatter/gather of some sort.  There may be old crufty
> stuff though that can't deal with it, but those probably already have
> "bcopy" drivers.  Now there may be drivers that don't enforce the
> boundary properly.  Those will mysteriously stop working.  Will we be
> able to fix all of those before 6.1 gets released?

if a single packet can use multiple descriptors but each descriptor cannot 
cross a page boundary, then bus_dma is able to represent that just fine. if xnf 
can only use a single descriptor per packet, then it deserves bcopy.

dlg

> 
>>> Index: if_em.c
>>> ===================================================================
>>> RCS file: /cvs/src/sys/dev/pci/if_em.c,v
>>> retrieving revision 1.331
>>> diff -u -p -r1.331 if_em.c
>>> --- if_em.c 13 Apr 2016 10:34:32 -0000      1.331
>>> +++ if_em.c 29 Aug 2016 23:52:07 -0000
>>> @@ -2134,7 +2134,7 @@ em_setup_transmit_structures(struct em_s
>>>             pkt = &sc->sc_tx_pkts_ring[i];
>>>             error = bus_dmamap_create(sc->sc_dmat, MAX_JUMBO_FRAME_SIZE,
>>>                 EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
>>> -               MAX_JUMBO_FRAME_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);
>>> +               MAX_JUMBO_FRAME_SIZE, 4096, BUS_DMA_NOWAIT, &pkt->pkt_map);
>>>             if (error != 0) {
>>>                     printf("%s: Unable to create TX DMA map\n",
>>>                         DEVNAME(sc));
