Alex Still wrote:
We're using tcpdump to try to diagnose an NFS performance problem.
We're seeing a lot of these :
[..]
16:01:19.890794 IP nfs_server.nfs > client_46.1205729087: reply ERR 1448
16:01:19.890831 IP nfs_server.nfs > client_46.3664003007: reply ERR 1448
[..]
That seems to be an RPC error,
It seems to be one, but it isn't necessarily one.
The code that printed that looked at the field that would be the reply
code *IF* the packet contained the beginning of an ONC RPC reply; it
printed "reply OK {length}" if that field was 0, and "reply ERR {length}"
if it was not.
However, not all packets from an ONC RPC server (such as an NFS server)
necessarily contain the beginning of an ONC RPC reply, as a reply (or
request) can require more than one link-layer packet - for example, an
NFS READ reply that returns more bytes than fit in one Ethernet packet
will, when sent over Ethernet, take more than one Ethernet packet.
If the request or reply is coming over UDP, that's handled by IP
fragmentation; the first packet of the reply will be the first fragment,
i.e. it will have a fragment offset of 0, and can be identified by
looking at the IP header. tcpdump only attempts to dissect the first
fragment as anything other than IP (for one thing, subsequent fragments
don't have UDP or TCP port numbers, so, unless it keeps track of the
fragments, it has no idea what port number any fragment other than the
first one was sent from or to), so it won't try to dissect the
subsequent fragments as if they contained the beginning of an ONC RPC
request or reply.
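
As a rough illustration of the first-fragment test described above, here is a minimal Python sketch. It assumes a raw IPv4 header is available as bytes; the function names are made up for illustration and this is not tcpdump's code.

```python
import struct

IP_MF = 0x2000        # "more fragments" flag
IP_OFFMASK = 0x1FFF   # 13-bit fragment offset, in units of 8 bytes


def fragment_info(ip_header: bytes):
    """Return (offset_in_bytes, more_fragments) for a raw IPv4 header."""
    if len(ip_header) < 20:
        raise ValueError("truncated IPv4 header")
    # Bytes 6-7 hold the 3 flag bits followed by the fragment offset.
    (flags_and_offset,) = struct.unpack_from("!H", ip_header, 6)
    offset = (flags_and_offset & IP_OFFMASK) * 8
    more_fragments = bool(flags_and_offset & IP_MF)
    return offset, more_fragments


def has_transport_header(ip_header: bytes) -> bool:
    """Only a packet with fragment offset 0 starts with the UDP/TCP header."""
    offset, _ = fragment_info(ip_header)
    return offset == 0
```

A packet for which has_transport_header() is false carries no port numbers of its own, so a dissector that doesn't track fragments has nothing to go on beyond the IP header.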
If, however, it's coming over TCP, that's handled by the RPC
implementation, as TCP merely provides a byte stream, and has no idea
what the message boundaries are for the protocols that run atop it.
Therefore, the current tcpdump code to dissect RPC requests and replies
doesn't identify TCP segments that are in the middle of an ONC RPC
request or reply, and tries to dissect them as if they were at the
beginning of the request or reply.
This can cause packets to be mis-dissected; the apparent RPC errors
you're seeing are probably examples of that.
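
For reference, ONC RPC over TCP marks message boundaries with "record marking": each record fragment is preceded by a 4-byte header whose top bit says whether it is the last fragment of the record and whose low 31 bits give the fragment length. The sketch below shows how boundaries could be recovered from a reassembled byte stream; split_rpc_records is a hypothetical helper, and it assumes the stream starts at a record boundary, which is exactly what a single mid-reply segment does not guarantee.

```python
import struct

LAST_FRAGMENT = 0x80000000


def split_rpc_records(stream: bytes):
    """Split a reassembled TCP byte stream into complete RPC messages.

    Returns (messages, leftover), where leftover is the trailing bytes that
    do not yet form a complete record.
    """
    messages = []
    current = b""      # fragments of the record being assembled
    pos = 0
    record_start = 0   # where the record being assembled begins
    while pos + 4 <= len(stream):
        (marker,) = struct.unpack_from("!I", stream, pos)
        length = marker & 0x7FFFFFFF
        if pos + 4 + length > len(stream):
            break  # fragment body not fully received yet
        current += stream[pos + 4 : pos + 4 + length]
        pos += 4 + length
        if marker & LAST_FRAGMENT:
            messages.append(current)
            current = b""
            record_start = pos
    return messages, stream[record_start:]
```

Without this kind of stream tracking, a dissector looking at one TCP segment in isolation cannot tell whether its first bytes are a record header or the middle of somebody's file data.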
That part of the code
has changed in tcpdump-3.9.7, which we installed, and now gives :
16:01:19.890794 IP nfs_server.nfs > client_46.1205729087: reply Unknown rpc response code=3128327487 1448
16:01:19.890831 IP nfs_server.nfs > client_46.3664003007: reply Unknown rpc response code=910276799 1448
It now prints the value of the field that, if the packet really *is* an
RPC error, would be the error code (the code is derived from
NetBSD's version of tcpdump). If the packet *isn't* an RPC error, the
value at the location that's interpreted as a reply code has a very good
chance of being bogus, as was the case here.
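
To make the effect concrete, here is a small sketch that loosely models the two behaviours described above. It is not tcpdump's actual code: the function names are hypothetical, the exact strings for the non-error cases in new_style are assumptions, and the 8-byte offset assumes the buffer starts at the RPC transaction ID (xid, message type, reply status).

```python
import struct

REPLY_STAT_OFFSET = 8  # assumed layout: xid (4), msg_type (4), reply_stat (4)


def old_style(buf: bytes, length: int) -> str:
    """Older behaviour as described above: zero means OK, anything else ERR."""
    (stat,) = struct.unpack_from("!I", buf, REPLY_STAT_OFFSET)
    return f"reply {'OK' if stat == 0 else 'ERR'} {length}"


def new_style(buf: bytes, length: int) -> str:
    """Newer behaviour: print the value when it is not a known reply status."""
    (stat,) = struct.unpack_from("!I", buf, REPLY_STAT_OFFSET)
    if stat == 0:   # MSG_ACCEPTED
        return f"reply OK {length}"
    if stat == 1:   # MSG_DENIED
        return f"reply ERR {length}"
    return f"reply Unknown rpc response code={stat} {length}"


# A segment from the middle of a READ reply: plain file data, no RPC header.
mid_reply = b"some file contents, not an RPC header"
print(old_style(mid_reply, 1448))
print(new_style(mid_reply, 1448))
```

Fed arbitrary file data, the old form reports an error and the new form reports whatever 32-bit number happens to sit at that offset.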
I don't understand what's happening here. Is it something wrong with our NFS
setup,
No.
or I'm misunderstanding the tcpdump output ?
Yes.
Wireshark/TShark should do a better job of this, for two reasons:
1) it does reassembly of ONC RPC requests and replies over TCP, as well
as handling TCP segments with multiple requests or replies;
2) it does some heuristics to check whether a packet *is* an ONC RPC
request or reply.
1) would probably be a major undertaking to do in tcpdump, but 2) could
be done:
for requests, check whether the RPC version field (not the NFS version)
has the value 2 (neither Sun nor anybody else has released any version
of ONC RPC other than 2), and check whether the program number in the
request is the program number for NFS (note that Sun also runs another
RPC protocol on port 2049 - a protocol to allow access to POSIX ACLs
over NFS - which will be misdissected if interpreted as NFS);
for replies, check whether the transaction ID of the reply matches the
transaction ID of a request we've already seen (tcpdump already does
that matching for replies without errors, as it needs to know what type
of request the reply is for).
I'll see whether that could be done.
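
The two checks outlined above might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, each buffer is assumed to start at the RPC transaction ID (any TCP record marker already skipped), and transaction IDs are tracked in a plain dictionary.

```python
import struct

CALL, REPLY = 0, 1
RPC_VERSION = 2
NFS_PROGRAM = 100003


class RpcHeuristics:
    def __init__(self):
        self.pending = {}  # xid -> (program, version, procedure)

    def looks_like_nfs_call(self, msg: bytes) -> bool:
        """Accept only CALLs with RPC version 2 and the NFS program number."""
        if len(msg) < 24:
            return False
        xid, msg_type, rpcvers, prog, vers, proc = struct.unpack_from("!6I", msg)
        if msg_type != CALL or rpcvers != RPC_VERSION or prog != NFS_PROGRAM:
            return False
        self.pending[xid] = (prog, vers, proc)  # remember for the reply
        return True

    def match_reply(self, msg: bytes):
        """Return the remembered call for this reply's xid, or None."""
        if len(msg) < 8:
            return None
        xid, msg_type = struct.unpack_from("!2I", msg)
        if msg_type != REPLY:
            return None
        return self.pending.pop(xid, None)
```

A real implementation would presumably also key the table on addresses and ports and expire old entries; those details are left out here to keep the sketch short.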
-
This is the tcpdump-workers list.