Hello!
> However, could we not have dev_queue_xmit behave as such (not free
> frame on failure)?
If you need to hold original skb, you may hold its refcnt.
However, this feature inevitably results in big troubles: dev_queue_xmit()
is allowed to change skb and you cannot assume anything about
Hello!
Some short comments on the patch:
> - dev_queue_xmit(skb);
> + /* The skb we are to transmit may be a copy (see above). If
> + * this fails, then the caller is responsible for the original
> + * skb, otherwise we must free it. Also if this fails we must
> + * free
Hello!
> Soft irqs should definitely not be much heavier than an irq handler,
> if they are then we have implemented them wrongly somehow.
For example, all the networking nicely fits to this class. :-)
Alexey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body
Hello!
> 2) no significant restrictions (==this)
When user asks to create some object, the only required thing
of any reasonable interface is to return an error when the object
is not added.
KAME's one is broken, ours is _one_ of right ones.
Another example of bad mistake is mine: I have mad
Hello!
> Note that using dev->name during probe was always incorrect. Think
> about the error case:
...
> So, using interface name in this manner was always buggy because it
> conveys no useful information to the user.
I used to think about cases of success. 8)
In any case the question follows:
Hello!
> Jeff has introduced `alloc_etherdev()' which allocates storage
> for a netdev but doesn't register it. The one quirk with this
> approach (and why it's vastly simpler than my thing)
I do not see where it is simpler. The only difference is that
name is unknown. 8)
> Not many drivers
Hello!
> Hmmm... I don't see how not touching buffer values can solve his
> problem at all. His MTU is really HUGE, and in this case 300 byte
> packet eats 10k or so space in receive buffer.
Default rcvbuf is ~64K; it is enough to receive with an mtu of up to a bit less than 64K.
When application says rcvbu
Hello!
> > Any suggestions on heuristics for this ?
Not to set rcvbuf to ridiculously low values. The best variant is not
to touch SO_*BUF options at all.
Alexey
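
The advice above (leave the SO_*BUF options alone) can be checked from user space by asking the kernel what it picked for a fresh socket. This is only a minimal sketch: the helper name `default_tcp_rcvbuf` is made up for illustration, and the value returned depends on the running kernel and its sysctl settings.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return the SO_RCVBUF that the kernel assigned to a
 * freshly created TCP socket, or -1 on error. Nothing is set here; the
 * socket is only queried. */
int default_tcp_rcvbuf(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int size = 0;
    socklen_t len = sizeof(size);

    if (fd < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
        size = -1;
    close(fd);
    return size;
}
```

An application that never calls setsockopt(SO_RCVBUF) keeps this default.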
Hello!
> I believe these events get sent to the cardmgr daemon and it does
> all the ifconf magic to change the device state.
Compare this also to the situation with netif_present().
After Linus said that it is called from thread context, I prepared
corresponding code for netif_present (and for
Hello!
> It appears you can add _exactly_ same IPv6 address on an interface many
> times:
Yes. BTW, look here:
kuznet@dust:~ # ip -6 a ls sit0
7: sit0@NONE: mtu 1480 qdisc noqueue
inet6 ::127.0.0.1/96 scope host
inet6 ::193.233.7.100/96 scope global
inet6 ::193.233.7.100/96
Hello!
> If send_head doesn't point to skb then it is before it (and it cannot
> advance under us of course because we hold the sock lock) and so in such
> case we didn't clobbered the send_head at all in skb_entail, and so we
> don't need to touch send_head in order to undo (we only need to unli
Hello!
> zero and we are running in such slow path, it is obvious the send_head
> _was_ NULL when we entered the critical section, so it's perfectly fine
It is not only not obvious, it is not true almost always.
On normally working tcp send_head is almost never NULL,
it is NULL only when applica
Hello!
> this is the strict fix:
Andrea, you caught the problem!
The fix is not right though (it is equivalent to straight
tp->send_head=NULL, as you noticed. It also corrupts queue in
an opposite manner.) Right fix is appended.
Explanation: in do_fault we must undo effect of enqueueing new se
Hello!
> My current theory is that tcpblast does something erratic when the
> error occurs.
It has buffer size of 32K, so that it faults at large enough chunk sizes.
Erratic errno is because this applet prints errno on partial write.
Oops is apparently because I did something wrong in do_fault
Hello!
> In my case P-MTU discovery
Sorry, I lied. Not pmtu discovery but exactly the opposite effect
is important here: collapsing of small frames to larger ones.
Each such merge results in loss of 1 "sack" in 2.2.
> I only wrote that it was active when got stuck. It may be idle before -
> I do
Hello!
> 1. for tp_frame_size, I dont want to truncate any data on ethernet, I
> need 1514 bytes, is this the best way to do it and not waste space?
To select small snapsize (obtained from later experiments),
to set PACKET_COPY_THRESH to read larger packets via recvmsg().
> 2. what is tp_block_
Hello!
>mtu 382 + keepalive yes -> loss
>mtu 382 + keepalive no -> ok
Well, I ignored this because it looked as full sense. Sorry. 8)
> such a picture? If the answer is "yes", I am almost satisfied. :-)
No, the answer is strict "no". Until keepalive is triggered the first
time, it ca
Hello!
> If your model does not cover such situation, pls, take it in mind. :)
Taken.
Is the machine UP? The only other known dubious place is smp specific...
BTW if that cursed socket is still alive, try to make the experiment
with filling window on it. It must get stuck, or my theory is complet
Hello!
> In my experiments linux simply sets mss=mtu-40 at the start of ethernet
> connections. I do not know why, but believe it's ok. How the version of
> kernel and configuration options can affect mss later?
You can figure out this yourself. In fact you measured this.
With mss=1460 the pr
Hello!
> At last, I tried several MTUs on 3d computer, running "right" 2.2.17, and
> could not find conditions, under which any loss of ACKs can be detected.
8)8)8)
ppp is also inclined to the mss/mtu bug, it allocates too large buffers
and never breaks them. The difference between kernels looks
Hello!
> > If my guess is right, you can easily put this socket to funny state
> > just catting a large file and kill -STOP'ing ssh. ssh will close window,
> > but sshd will not send zero probes.
>
> [1] I have checked your statement on 2 different machines, running 2.2.17.
> No confirmation.
Hello!
> In brief: a stale state of the tcp send queue was observed for 2.2.17
> while send-q counter and connection window sizes are not zero:
I think I pinned down this. The patch is appended.
> diagnostic, I'll try to get it. In any case, I plan to run something through
> this connecti
Hello!
> Btw, you don't schedule the ksoftirqd thread if do_softirq() returns
> from the 'if(in_interrupt())' check.
ksoftirqd will not be switched to before the first schedule
or ret from syscall, when softirqs will be processed in any case.
So, wake up in this case would be a mistake.
> I assu
Hello!
> But with a huge overhead. I'd prefer to call it directly from within the
> idle functions, the overhead of schedule is IMHO too high.
+ if (current->need_resched) {
+ return 0;
+ }
+ if (softirq_active(smp_processor_id()) & softi
Hello!
> empty, except for occasional ACKs. The utilization of the channel is about 4%.
1. tcpdump is required.
2. exact version of the kernel used is required too.
Alexey
Hello!
> packet in the queue. No other conditions i found. But i need repeatedly test
> the top packet in the queue.
>
> How to accomplish it?
Look into sch_tbf.c for example. Hint: timer.
Alexey
Hello!
> + if (softirq_active(smp_processor_id()) & softirq_mask(smp_processor_id())) {
> + do_softirq();
> + return 0;
BTW you may delete do_softirq()... schedule() will call this.
> + *
> + * Isn't this identical to default_idle with the 'no-hlt' boot
> + * option
Hello!
> I want to bind to non-local IP and send/receive UDP packets.
This is impossible, apparently.
> but in tcp_v4_connect:
> tmp = ip_route_connect(&rt, nexthop, sk->saddr,
> RT_TOS(sk->ip_tos)|RTO_CONN|sk->localroute, sk->bound_dev_if);
> ^^
Hello!
> Hm. But comment in linux/skbuff.h says:
The comment is about more difficult case: transmit path,
where cb is used both by top level protocol and lower layers:
f.e. TCP -> IP -> device. cb is dirty from the moment of skb
creation in this case.
Also, note that the second sentence in the
Hello!
>For now I workarounded it with filling skb->cb with zeroes before
>netif_rx(),
This is right. For other examples look into tunnels.
> but I believe it is a kludge and networking layer should be fixed instead.
No.
alloc_skb() creates skb with clean cb. ip_rcv() and other prot
Hello!
> Well, since I moved the rsync to 5pm, and then back to 9pm, I haven't
> seen this problem - everything is again working as expected (touch wood)
> with 2.2.15pre13 and 2.4.0.
>
> This is odd, since it wasn't a one-off problem, but something that happened
> each and every day of a partic
Hello!
> Sure, workarounds exist, but they just complicates
> things.
Working around --- what?
An example of application hitting the case is enough to make
me completely agreed.
But generally we are not going to match any os and even yourselves
yesterday or tomorrow in the cases when behaviour
Hello!
> True, this behavior was changed from 2.2.x. We now match the behavior
> of other svr4 systems, in particular Solaris.
Damn, we did not test behaviour on absolutely new clean never
connected socket... Solaris really may return 0 on it.
However, looking from other hand the issue looks a
Hello!
> freebsd-4.0 doesn't use direct transfers for PAGE_SIZE'd pipe write()s:
> it uses MINDIRECT=8192.
I see.
> (and PIPE_BUF is 512, so 4096 was possible for
> them)
8) I see.
Thank you for patience. 8)
Alexey
Hello!
> freebsd
Very funny, the idea is borrowed from there.
As you could understand your patch kills it. PAGE_SIZE is one of the most
frequently used transfer unit.
Alexey
Hello!
> It returns immediately on all unix platforms I tested
I see. It is essential moment. PAGE_SIZE was really bad threshold value.
Sigh and alas.
Alexey
PS BTW "all unix" is unlikely to include freebsd. 8)
Hello!
> * davem's patch breaks apps that assume that write(,PIPE_BUF) after
> poll(POLLOUT) never blocks, even for blocking pipes.
Pardon, but PIPE_BUF <= PAGE_SIZE still, so these fears have no grounds.
Alexey
Hello!
> I've scrolled through various code in net/ipv4, and I can't see how to query
> the TOS of an incoming TCP stream (or at the least, the TOS of the SYN which
> initiated the connection).
No way. Formally it is IP_RECVTOS, followed by IP_PKTOPTIONS.
But getting TOS via IP_PKTOPTIONS is no
Hello!
> The manual specifies the following flag to be returned by the
> kernel
> > #define POLLHUP 0x0010/* Hung up */
>
> Hanging up is ambiguous. Does it mean that the client is dead,
> that he closed his end of the socket, or that he shut down one or
> both directions of the data flo
Hello!
> this kernel was compiled with GCC 2.95.2,
This is a hint.
Could you make the following things:
1. to disassemble tcp_poll() (the easiest way is to gdb vmlinux, to
say x/i tcp_poll and to hold enter pressed long enough, copying screen
to file) and to send the result to me.
2. to
Hello!
> same means its not the same bug?
It is the same, I think.
> If you still insist that it is purely a 2.2.15pre13 bug
I never said this. I said that your strace is _wrong_, how can I be
sure that tcpdump is not wrong too? You could understand this. 8)
> together to put 2.2.18 on this
Hello!
> I've also reported
The report by Scott Laird is sane unlike your one.
It can be explained by bug rather than only by poltergeist. 8)
> Thanks for confirming that 2.2.15pre13 is not the cause.
Russel, you are warned that kernels<2.2.17 and rsync is an incompatible
combination.
Alexey
Hello!
> They know that iMimic's polymix performance on Linux 2.2.* is half what it is on
> BSD.
What is "iMimic's polymix"? I am almost sure, it is simply buggy
and was not _debugged_ under linux.
Alexey
Hello!
> I'll see if I can strace it from the start until it hangs tomorrow.
Please...
Also, try to make binary tcpdump.
> I was running at one point a 2.4.0-test kernel, but I didn't see these
Yes, it did not result in full stall. Lost wakeups were recovered
f.e. by any keyboard activity. 8
Hello!
> I've seen hanging rsync over ssh more than once, while sending much data
> from an x86 running Linux (late 2.3.x) to Sparc/Solaris2.5.1
I remember this your report. However, recent news force to suspect
that the reason was in Solaris yet. Actually, if you send tcpdump of
failed session,
Hello!
> netstat on isdn-gw shows the following:
>
> Proto Recv-Q Send-Q Local Address Foreign Address State
> tcp 72868 0 isdn-gw.piltdown.a:1023 pilt-gw.piltdown.at:ssh ESTABLISHED
plus
> select(4, [3], [3], NULL, NULL) = 2 (in [3], out [3])
> Ma
Hello!
> > 3) Enforce correct usage of it in all the networking :-)
>
> ,) -- the tricky part.
No tricks, IP[v6] is already enforced to be clever; all the rest are free
to do this, if they desire. And btw, driver need not to parse anything,
but its internal stuff and even aligning eth II header
Hello!
> Feb 23 12:42:30 rcc2 kernel: Warning: kfree_skb passed an skb still on a list (from
>c01f58dc).
BTW, that's a didactic example of a bug which results in similar behaviour.
Alexey
> From: [EMAIL PROTECTED] (Andrew Morton)
> Subject: Re: Failed assertion
> Date: 27 Feb 2001 04:15:01 +0300
Hello!
> > Yes its a SHOULD in RFC1122, but in any normal environment pretty much a
> > must and I know of no stack significantly violating it.
>
> I didn't know there was such a thing as a normal environment :)
Jokes apart, such "normal" environments are rare today.
From tcpdumps it is clear
Hello!
> OK! I actually expected 2.4 to be somewhat selftuning.
Defaults for these numbers (X,Y,Z) are very conservative.
> Interesting you say that, I looked at the logs and I see over 5000 sockets
> used, doesn't look peaceful to me. But you are absolutely right about the
> orphans. The erro
Hello!
> of errors a bit but I'm not sure I fully understand the implications of
> doing so.
As long as these numbers do not exceed the total amount of RAM, this is exactly
the action required in this case.
Dumps, which you sent to me, show nothing pathological. Actually,
they are made in some period of
Hello!
> We are implementing an IP stack.
Alan, please, tell me what is wrong. And we will repair this.
The implementation follows RFCs and even relaxes their requirements
in the cases, when they are far from reality.
Alexey
Hello!
> You are right - our sendfile() implementation is broken. I have fixed it
Thank you!
> Investigation shows that the Linux network layer is behaving oddly. It
> seems that we are writing 4096 bytes to a socket. This proceeds in 4096
> byte chunks until the send buffer on the socket is f
Hello!
> Please cite an exact RFC reference.
Imagine, I found this reference yet. This is rfc1191, of course. 8)
in the MSS option. The MSS option should be 40 octets less than the
size of the largest datagram the host is able to reassemble (MMS_R,
as defined in [1]); in many cases, t
Hello!
> This smells bad. Datagram protocol send sizes are only limited by
> socket buffer size, nothing more. Fragmentation makes it work.
The thread was started from the observation that fragmented frames
do _not_ pass through router. See? 8)
Path mtu discovery exists exactly to help to sol
Hello!
> Wouldn't it be simpler to just fix the bugs
There are no bugs.
There is a philosophical discussion about the current state of internet
communications.
Alexey
Hello!
> Message size != MTU.
Alan, you misunderstand _sense_ of the problem.
Fragmentation does _not_ work on the poor internet any more. At all.
Look at original report. It failed _only_ because his intermediate
node failed to forward fragmented packets.
Alexey
Hello!
> .. unless that page was partially written, in which case a short write
> count is returned (rather than a timeout error), and the loop goes around
> again.
sendfile() does not return on partial write and tries to push more
until error. On fast link it most likely succeeds, so that it is
Hello!
> So the actual timeout would be 2 * SO_SNDTIMEO.
It will timeout if write of some page blocks for SO_SNDTIMEO.
If transmission of any page never takes more than SO_SNDTIMEO it never
times out.
You can think about sendfile() as subroutine doing:
for (;;) {
read(4
Hello!
> Yes. The 5.6.7.8 machine is connected to the Internet via a Linksys
> "router" that is also performing masquerade.
>
> I will be very angry if this turns out to be the culprit.
I am afraid it is. It corrupts packets preserving their checksum.
Look:
> Trace taken from 1.2.3.4 machi
Hello!
> Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile():
None of the options apply to sendfile(). It is not socket level
operation. You have to use alarm for it.
BTW, if you have a fast enough network, you probably can observe
that sendfile() is even not interrupted by signals. 8)
Hello!
> Unfortunately, it seems to be very buggy. Here are two buggy scenarios.
--- ../vger3-010210/linux/net/ipv4/tcp.c Sat Feb 10 23:16:51 2001
+++ linux/net/ipv4/tcp.c Sat Feb 17 23:27:43 2001
@@ -691,6 +691,8 @@
set_current_state(TASK_INTERRUPTIBLE);
+
Hello!
> I ran DNS reliably over AX.25 networks. They have an MTU of 216. They work.
Please, Alan, distinguish two things: "works" and "works, until
I ask X". The second is equal to "does not".
512 is maximal message size, which is transmitted without troubles,
hardwired to almost all the datag
Hello!
> Maybe someone want to say me what does it mean and how serious it is?
It means that debugging messages are still not disabled in 2.4.x 8)
> Any fixes?
These ones can be ignored.
Alexey
Hello!
> Please cite an exact RFC reference.
No need to cite RFC, this is plain syllogism.
A. Datagram protocols do not work with mtus not allowing to send
512 byte frames (even DNS).
B. Accounting, classification, resource reservation do not work on
fragmented packets.
-> IP suite is n
Hello!
> Kernel 2.4.x apparently disregards my ppp options MTU setting of 552
> and sets mss=536 (=> MTU=576).
Yes, default configuration is not allowed to advertise mss<536.
The limit is controlled via /proc/sys/net/ipv4/route/min_adv_mss,
you can change it to 256.
Default of 536 is sadistic (
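
The sysctl mentioned above is an ordinary text file under /proc, so it can be read or changed from a program as well as from the shell. A minimal sketch, with the helper names `read_sysctl_long` and `write_sysctl_long` invented for illustration; writing to the real file needs root.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a sysctl-style text file.
 * Returns -1 if the file cannot be opened or parsed. */
long read_sysctl_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Hypothetical helper: write one integer to a sysctl-style text file.
 * Returns 0 on success, -1 on failure. */
int write_sysctl_long(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%ld\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, write_sysctl_long("/proc/sys/net/ipv4/route/min_adv_mss", 256) would apply the change described above.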
Hello!
> Then we did the following tuning attempt:
Please, cat /proc/net/tcp and /proc/net/sockstat
and send result to me (gzipped), together with /proc/sys/net/ipv4/tcp_*
values
> echo "20480" > /proc/sys/net/ipv4/tcp_max_orphans
Also, you want to increase memory allowed for TCP
echo "X/2 X/
Hello!
> I'm not seeing shutdown(2) block on a TCP socket. This is Linux kernel
> 2.2.16 (RH7.0). Is this a kernel bug, a documentation bug,
Man page is wrong.
What about the kernel... Hmm, actually, it is worth testing genuine bsd.
Such feature could be useful.
Alexey
Hello!
> Here is a patch which may not solve the underlying
This does not. refcnt cannot be <1 at this point.
> assuming that the latter messages aren't serious?
They are fatal. Machine must be rebooted after them.
> I hope the networking gurus can find the real bugs here.
Well, someone fo
Hello!
> 20:56:26.172532 ppp0 < mc105-v-2.royaume.com.6699 > ppp12.shiny.it.33148: .
> 88073:89533(1460) ack 77 win 8684 (DF)
> Ok, it has just received the missing part, so why it does not ack 98313 ?
Apparently, because this segment has not been received.
Look at checksum errors statisti
Hello!
>/* Not-A-Trick#2 : Classic rule... */
>if (tcp_fackets_out(tp) > tp->reordering)
> ^
>return 1;
...
> Shouldn't it be a >= instead of > ?
No. fackets_out is equivalent of Reno dupacks+1.
F.e. look at the most common case, where FACK is equiv
Hello!
> > How close is TCP_NOPUSH to behaving identically to TCP_CORK now?
They do not have much in common.
TCP_NOPUSH enables T/TCP and its presence used to mean that
T/TCP is possible on this system. Linux headers cannot
even contain TCP_NOPUSH.
Alexey
Hello!
> Looking at net/core/datagram.c:wait_for_packet the code will return 0
Yep... Damn, specially split errno and ready values and forgot to use
this. 8) Sorry.
Alexey
--- ../vger3-010130/linux/net/core/datagram.c Thu Dec 28 22:44:08 2000
+++ linux/net/core/datagram.c Thu Feb 1 22:4
Hello!
> 1. small socket send/receive buffers result in data not being transferred
I know why your test does not work in 2.4 (sort of bug, fix is appended).
But I have no idea, why it does not work with 2.2. Please, make
tcpdump in this case.
> and SO_RCVBUF (see small attached programs), I f
Hello!
> Has anyone decided to code a SFB (Stochastic Fair Blue) queue implementation
> for Linux?
I did not hear anything about this.
> (http://www.eecs.umich.edu/~wuchang/blue/). The paper for it shows it
> performing very well in comparison to RED.
Yes, the algorithm looks interesting.
Ale
Hello!
> verify this? The only way I can think of is to verify that the checksum
> field is zero initially, correct?
It is not zero. It contains checksum of pseudoheader.
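
For reference, the pseudoheader sum mentioned above can be computed as below. This is a user-space sketch with a hypothetical function name, taking host-byte-order inputs; whether the value is stored in the checksum field complemented or not is left open here, since the message does not say.

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit ones'-complement sum over the TCP/IPv4
 * pseudoheader (source address, destination address, zero byte plus
 * protocol, and TCP length). All inputs are in host byte order. */
uint16_t pseudo_header_sum(uint32_t saddr, uint32_t daddr,
                           uint8_t proto, uint16_t tcp_len)
{
    uint32_t sum = 0;

    sum += (saddr >> 16) & 0xffff;
    sum += saddr & 0xffff;
    sum += (daddr >> 16) & 0xffff;
    sum += daddr & 0xffff;
    sum += proto;      /* the zero byte contributes nothing */
    sum += tcp_len;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}
```

The sum over the TCP header and payload is then added on top of this value to obtain the final checksum.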
> fits the new Linux model a bit better, as it has one descriptor per
> packet, not one per fragment (like the current impl
Hello!
> Why is it a bug to accept the ACK from it? RFC793 page 69 says
>
> If the RCV.WND is zero, no segments will be acceptable, but
> special allowance should be made to accept valid ACKs, URGs and
> RSTs.
8) This obscure place is discussed for ages. The question is:
What is "
Hello!
> drivers use it at this time, I see a grand total of 2 (hamachi and hme) in
Plus acenic in zerocopy.
Plus patch to do this is available for eepro100.
> I'm just wondering, if a card supports sg but *not* TX csum, is it worth
> it to make use of sg? eepro100 falls into this category..
Hello!
> Starfire card does, maybe the 3com is different. :-)
3com _is_ different. 8)
It is not an issue, we do not make zerocopy on IP fragments.
> Are we even bothering with the partial checksums at this point, or
> are we falling back to CPU checksumming if the packet is fragmented?
Of cou
Hello!
> > What exactly were the issues with the intel cards and sg+csum?
> >
> > Any idea how much work it'd require to surmount them?
>
> Getting Intel to release full specs on how to make use of
> TX hardware checksum assist with the eepro100.
It simply does not exist for 82559* in all t
Hello!
> no problems. I simply mounted an NFS server with rsize=wsize=8192
> and read a few files - I assume this is sufficient?
This is orthogonal.
Only TCP uses this and you need not to do something special
to test it. Any TCP connection going through 3c tests it.
> rather than using the I
Hello!
I take my words back. Manfred is right, this requirement is not a MUST.
Real problem is much worse, and it is wholly on the shame of solaris.
Tcpdump shows at least two different bugs there.
2060 16:31:42.879337 eth0 < dynamic.ih.lucent.com.39406 > static.8664: . 67580:67580(0) ack
Hello!
> must be). Is there another RFC?
It is exactly this place.
As soon as BSD uses this feature, it is must for us.
> Could you check what happened in line 2066 of this tcpdump?
> 2066 16:31:43.108759 eth0 > static.8664 > dynamic.ih.lucent.com.39406:
> . 1583720:1583720(0) ack 69041
Hello!
> I read through the tcpdump, and it seems that Linux completely ignores
> packets with out-of-window sequence numbers:
Yes, Linux is __very__ not right doing this. RFC requires to accept
ACK, URG and RST on any segment adjacent to window, even if window
is zero.
Solaris also does thing,
Hello!
> So now the question is: when does this new nagle algorithm delay packets in the
> write queue? It _must_ do something, otherwise TCP_NODELAY would obviously be a
> noop.
It allows _one_ incomplete segment to fly. Minshall and BSD behave absolutely
similarly in all the circumstances exce
Hello!
> "struct page" tricks, some macros etc WILL NOT WORK. In particular, we do
> not currently have a good "page_to_bus/phys()" function. That means that
> anybody trying to do DMA to this page is currently screwed, simply because
> he has no good way of getting the physical address.
We alre
Hello!
> > write(10*MSS)
> > write(1)
> > write(1)
...
> As far as I can tell, the second "write(1)" will always merge with the
> first one
This would be true, if Andrea wrote not exactly 10*MSS,
but 10*MSS+1 or just write().
In some exceptional situations (sort of writi
Hello!
> So this mean if I do:
Yes. It is cost, which we have to pay. Look into Minshall's draft,
by the way (draft-minshall-nagle-*), it discusses pros and cons.
Much saner behaviour wrt latency (and perfect clarity) outweighs
a bit worse coalescing.
Alexey
Hello!
> semantics of snd_sml), maybe it makes the difference but then I don't see how.
It makes. One small packet is allowed to fly, not depending on packets_out.
This is the idea of Minshall.
"Classic" Nagle also does not prohibit this, but it is difficult
to formulate it in terms of presegmented
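
The Minshall rule described above (one small packet may be in flight, regardless of packets_out) can be modelled in user space as below. This is a simplified sketch, not the kernel code: the struct and function names are hypothetical, and only the snd_sml idea from the quoted discussion is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced sender state for illustration. */
struct nagle_state {
    uint32_t snd_una;  /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;  /* next sequence number to be sent */
    uint32_t snd_sml;  /* end sequence of the last small segment sent */
};

/* True if sequence number a is after b, with 32-bit wraparound. */
static bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* A small segment is still in flight if its end has not been acked yet. */
bool small_segment_in_flight(const struct nagle_state *s)
{
    return seq_after(s->snd_sml, s->snd_una) &&
           !seq_after(s->snd_sml, s->snd_nxt);
}

/* A new small segment may go out if Nagle is disabled or if no earlier
 * small segment is still unacknowledged. */
bool may_send_small_segment(const struct nagle_state *s, bool nodelay)
{
    return nodelay || !small_segment_in_flight(s);
}
```

Full-sized segments are outside this check; only sub-MSS segments are held back.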
Hello!
> Actually, as long as there is no "struct page" there _are_ problems.
> This is why the NUMA stuff was brought up - it would require that there
> be a mem_map for the PCI pages.. (to do ref-counting etc).
I see.
Is this strong "no-no-no"? What is obstacle to allow "struct page"
to sit o
Hello!
> is there really
> much value in the second request flowing to the server before the first
> byte of the reply has hit?
Yes, of course, it has lots of sense: f.e. all the icons, referenced
parent page are batched to single well-coales
Hello!
> My argument applies to 2.4. The uncork _won't_ push on the wire the last
> not mss-sized fragment until it's the last one in the write queue even once
> cwnd and receiver window allows that. I think
Look at the code again. You misread it.
> wouldn't be setting nonagle unconditionally
Hello!
> It's about direct i/o from/to pages,
Yes. Formally, there are no problems to send to tcp directly from io space.
But could someone explain me one thing. Does bus-mastering
from io really work? And if it does, is it fast enough?
At least, looking at my book on pci, I do not understand
Hello!
> the business about the last 1100ish bytes of a 4096 byte send being
> delayed by nagle only implies that the stack's implementation of nagle
> was broken and interpreting it on a per-segment rather than a per-send
> basis.
+
> software, or the host TCP stack. otherwise, the persistent c
Hello!
> The "uncork" won't push the last skb on the wire if there is not acknowledged
> data in the write_queue and the payload of the last skb in the write_queue
> isn't large MSS. This because the `uncork' will only re-evaluate the
> write_queue in function of the _nagle_ algorithm, quite corr
Hello!
> I thought setsockopt is meant to set an option in the socket,
It is not.
setsockopt() is simply a bit more clever extension to ioctl(),
which is adapted (in bsd style though) to understand layering
and has an explicit length to data.
It is preferred for all the operations on sockets,
a
Hello!
> So if I understand all this correctly...
>
> The difference in ACK generation
CORK does not affect receive direction and, hence, ACK generation.
The problem is that TCP does not know, when full request is received
and it must ack instantly at connection start and after some idle
peri
Hello!
> Doing PUSH from setsockopt(TCP_CORK) looked obviously wrong because it isn't
> setting any socket state,
? 8)
> and also because the SIOCPUSH has nothing specific
> with TCP_CORK, as said it can be useful also to flush the last fragment of data
> pending in the send queue without havi
Hello!
> Ingo, you should realize
Ingo did not try to argue. I do not too.
This is right, no doubts.
> Mantra: "everything is a stream of bytes". Repeat until enlightened.
... but devil invented record marks and pushes, seduced mankind
and we were evicted from the paradise. 8)
Alexey
Hello!
> I'm all for TCP_CORK but it has the disadvantage of two syscalls for doing the
MSG_MORE was invented to allow to collapse this to 0 syscalls. 8)
> A new ioctl on the socket should be able to do that (and ioctl looks lighter
> than a setsockopt, ok ignoring actually the VFS is grabbi