I have a Linux kernel 4.4 system hosting a number of KVM VMs. Physical 
interface eth0 connects to an 802.1Q trunk port on an external switch. Each VM 
has a virtual interface (e1000 or virtio-net) connected to the physical NIC 
through a macvtap interface and a VLAN interface; traffic between the external 
switch and the host is tagged with a per-VM tag. The only logic is 
demultiplexing incoming traffic by VLAN tag and stripping the tag, and adding 
the tag on outgoing traffic. Other than that, the eth0-VM datapath is a dumb 
pipe.

eth0 is assigned an IP address for host applications to send and receive 
untagged packets. For example, here's the setup with 2 VMs.

        +- (untagged) 192.168.0.2
  eth0 -+- (tag 1) --- eth0.1 --- macvtap1 --- VM1
        +- (tag 2) --- eth0.2 --- macvtap2 --- VM2
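For reference, the per-VM plumbing is set up roughly like this (interface 
names match the diagram; the exact commands are illustrative of my 
configuration, which uses macvtap passthrough mode):

```shell
# Per-VM datapath for VM1; VM2 is analogous with VLAN id 2.
ip link add link eth0 name eth0.1 type vlan id 1
ip link add link eth0.1 name macvtap1 type macvtap mode passthru
ip link set eth0.1 up
ip link set macvtap1 up
# The VM's e1000/virtio-net device is then backed by macvtap1's tap node.
```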

Various iptables rules filter the untagged packets received for host 
applications. The last rule in the INPUT chain logs incoming packets that don't 
match earlier rules:

  -A INPUT -m limit --limit 10/min -j LOG --log-prefix FilterInput

This all works, but I see occasional FilterInput messages for traffic received 
on eth0.1 and eth0.2: so far, only DHCP packets with destination MAC address 
ff:ff:ff:ff:ff:ff.

  FilterInput IN=eth0.1 OUT= MAC=ff:ff:ff:ff:ff:ff:00:01:02:03:04:05:08:00
  SRC=0.0.0.0 DST=255.255.255.255 LEN=328 TOS=0x10 PREC=0x00 TTL=128 ID=0
  PROTO=UDP SPT=68 DPT=67 LEN=308

Even though these are IP packets, I naively expected packets received on a 
VLAN interface that has no IP address to be either consumed by the attached 
macvtap or dropped before they trigger an iptables filter INPUT rule. It's a 
bit alarming to see packets destined for a VM being processed at all by the 
host IP stack.

Digging through the code, I find that the core packet receive function 
__netif_receive_skb_core() first gives master devices like bridges and 
macvlans/macvtaps a chance to consume the packet via their rx_handler; 
otherwise the packet is handed to every registered protocol handler, such as 
IPv4. The packet gets pretty far down the IP receive path before it's 
discovered that there's nowhere to route it and no local socket to deliver it 
to, and the iptables INPUT chain is invoked well before that happens. (As far 
as I can tell, the IP receive code never explicitly checks whether the 
receiving interface has an IP address.)

The macvlan rx_handler definitively consumes or drops unicast packets, 
depending on the destination MAC address. For broadcast packets, though, it 
delivers a copy to the attached VM interface and also tells the core receive 
function to continue processing the packet. Presumably this is to allow a 
macvlan to attach to one or more VMs while the underlying interface also has 
a local IP address.

The logic in the bridge driver is a bit different: it consumes all packets from 
the slave interface. This makes sense as only the bridge master interface can 
be assigned a local IP address.

However in my application, I'm setting up the macvtap interfaces in passthrough 
mode, which precludes assigning a local IP address, just like a bridge slave. 
So it stands to reason that for a macvlan in passthrough mode, its rx_handler 
should consume or drop all packets, and not allow broadcast packets to also be 
handled locally.

This one-line change seems to do the trick:

--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -411,7 +411,7 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
        rx_handler_result_t handle_res;

        port = macvlan_port_get_rcu(skb->dev);
-       if (is_multicast_ether_addr(eth->h_dest)) {
+       if (is_multicast_ether_addr(eth->h_dest) && !port->passthru) {
                skb = ip_check_defrag(dev_net(skb->dev), skb, IP_DEFRAG_MACVLAN);
                if (!skb)
                        return RX_HANDLER_CONSUMED;

Well, mostly. I still see FilterInput log messages in the brief window 
between creating the VLAN interface and attaching the macvtap to it, since 
during that window there's no rx_handler to consume the packets. Hooking the 
VLAN interface to a bridge rather than a macvtap suppresses local IP 
processing on the slave but enables it on the bridge master interface. 
Apparently any non-slave interface can handle IP traffic to some extent, even 
if it has no IP address.
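In the meantime, one protocol-specific stopgap would be to drop these packets 
in the iptables raw table. The PREROUTING hook runs in ip_rcv(), after the 
macvtap rx_handler has already delivered the broadcast copy to the VM, so the 
VM's traffic should be unaffected:

```shell
# Drop VM-VLAN traffic at the earliest iptables hook (IPv4 only).
# This also covers the window before the macvtap is attached.
iptables -t raw -A PREROUTING -i eth0.1 -j DROP
iptables -t raw -A PREROUTING -i eth0.2 -j DROP
```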

I worry that allowing any IP processing at all on eth0-VM traffic is a 
potential security hole: I'm one configuration typo away from letting one 
VM's traffic leak into another VM or a host application, and vice versa. And 
logging those FilterInput messages for non-local traffic just looks like 
sloppy security.

Is there some way to stop all local protocols from handling packets received on 
an interface--a protocol-agnostic equivalent of 
net.ipv6.conf.INTF.disable_ipv6? Would it be reasonable to implement one?
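
(For what it's worth, the existing IPv6 knob does apply to the VLAN 
interfaces; a dot in an interface name is written as a slash in sysctl 
syntax. But it covers IPv6 only, which is exactly the gap:)

```shell
# IPv6 only; there's no IPv4 or protocol-agnostic equivalent.
# Note the '/' standing in for the '.' in the interface name.
sysctl -w net.ipv6.conf.eth0/1.disable_ipv6=1
```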

--Ed
