--- /dev/null
+<?xml version="1.0"?>
+<html>
+ <body>
+ <h1>Firewall and network filtering in libvirt</h1>
+ <p>There are three pieces of libvirt functionality which do network
+ filtering of some type.
+ <br /><br />
+ At a high level they are:
+ </p>
+ <ul>
+ <li>The virtual network driver
+ <br /><br />
+ This provides an isolated bridge device (i.e. one with no physical NICs
+ enslaved). Guest TAP devices are attached to this bridge.
+ Guests can talk to each other and the host, and optionally the
+ wider world.
+ <br /><br />
+ </li>
+ <li>The QEMU driver MAC filtering
+ <br /><br />
+ This provides a generic filtering of MAC addresses to prevent
+ the guest spoofing its MAC address. This is mostly obsoleted by
+ the next item, so won't be discussed further.
+ <br /><br />
+ </li>
+ <li>The network filter driver
+ <br /><br />
+ This provides fully configurable, arbitrary network filtering
+ of traffic on guest NICs. Generic rulesets are defined at the
+ host level to control traffic in some manner. Rule sets are
+ then associated with individual NICs of a guest. While not as
+ expressive as directly using iptables/ebtables, this can still
+ do nearly everything you would want to on a guest NIC filter.
+ </li>
+ </ul>
+
+ <h3><a name="name-fw-virtual-network-driver"
+ id="id-fw-virtual-network-driver">The virtual network driver</a>
+ </h3>
+ <p>The typical configuration for guests is to use bridging of the
+ physical NIC on the host to connect the guest directly to the LAN.
+ In RHEL6 there is also the possibility of using macvtap/SR-IOV
+ and VEPA connectivity. None of these options work well with wireless
+ NICs, since they will typically silently drop any traffic with a
+ MAC address that doesn't match that of the physical NIC.
+ </p>
+ <p>Thus the virtual network driver in libvirt was invented. This takes
+ the form of an isolated bridge device (i.e. one with no physical NICs
+ enslaved). The TAP devices associated with the guest NICs are attached
+ to the bridge device. This immediately allows guests on a single host
+ to talk to each other and to the host OS (modulo host iptables rules).
+ </p>
+ <p>libvirt then uses iptables to control what further connectivity is
+ available. There are three configurations possible for a virtual
+ network at time of writing:
+ </p>
+ <ul>
+ <li>isolated: all off-node traffic is completely blocked</li>
+ <li>nat: outbound traffic to the LAN is allowed, but MASQUERADED</li>
+ <li>routed: outbound traffic to the LAN is allowed</li>
+ </ul>
+ <p>The last case, 'routed', requires the virtual network to be on a
+ separate subnet from the main LAN, and the LAN admin to have
+ configured routing for this subnet. In the future we intend to
+ add support for IP subnetting and/or proxy-arp. This would allow
+ the virtual network to use the same subnet as the main LAN and
+ should avoid the need for the LAN admin to configure special routing.
+ </p>
+ <p>Libvirt will optionally also provide DHCP services to the virtual
+ network using dnsmasq. In all cases, we need to allow DNS/DHCP
+ queries to the host OS. Since we can't predict whether the host
+ firewall setup is already allowing this, we insert four rules at
+ the head of the INPUT chain:
+ </p>
+ <pre>
+target prot opt in out source destination
+ACCEPT udp -- virbr0 * 0.0.0.0/0 0.0.0.0/0 udp dpt:53
+ACCEPT tcp -- virbr0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:53
+ACCEPT udp -- virbr0 * 0.0.0.0/0 0.0.0.0/0 udp dpt:67
+ACCEPT tcp -- virbr0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:67</pre>
+ <p>Note we have restricted our rules to just the bridge associated
+ with the virtual network, to avoid opening undesirable holes in
+ the host firewall with respect to the LAN/WAN.
+ </p>
+ <p>The next rules depend on the type of connectivity allowed, and go
+ in the main FORWARD chain:
+ </p>
+ <ul>
+ <li>type=isolated
+ <br /><br />
+Allow traffic between guests. Deny inbound. Deny outbound.
+ <pre>
+target prot opt in out source destination
+ACCEPT all -- virbr1 virbr1 0.0.0.0/0 0.0.0.0/0
+REJECT all -- * virbr1 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable
+REJECT all -- virbr1 * 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable</pre>
+ </li>
+ <li>type=nat
+ <br /><br />
+Allow inbound related to an established connection. Allow
+outbound, but only from our expected subnet. Allow traffic
+between guests. Deny all other inbound. Deny all other outbound.
+ <pre>
+target prot opt in out source destination
+ACCEPT all -- * virbr0 0.0.0.0/0 192.168.122.0/24 state RELATED,ESTABLISHED
+ACCEPT all -- virbr0 * 192.168.122.0/24 0.0.0.0/0
+ACCEPT all -- virbr0 virbr0 0.0.0.0/0 0.0.0.0/0
+REJECT all -- * virbr0 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable
+REJECT all -- virbr0 * 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable</pre>
+ </li>
+ <li>type=routed
+ <br /><br />
+Allow inbound, but only to our expected subnet. Allow
+outbound, but only from our expected subnet. Allow traffic
+between guests. Deny all other inbound. Deny all other outbound.
+ <pre>
+target prot opt in out source destination
+ACCEPT all -- * virbr2 0.0.0.0/0 192.168.124.0/24
+ACCEPT all -- virbr2 * 192.168.124.0/24 0.0.0.0/0
+ACCEPT all -- virbr2 virbr2 0.0.0.0/0 0.0.0.0/0
+REJECT all -- * virbr2 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable
+REJECT all -- virbr2 * 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable</pre>
+ </li>
+ <li>Finally, with type=nat, there is also an entry in the POSTROUTING
+chain to apply masquerading:
+ <pre>
+target prot opt in out source destination
+MASQUERADE all -- * * 192.168.122.0/24 !192.168.122.0/24</pre>
+ </li>
+ </ul>
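+ <p>The type=nat rules above are evaluated top to bottom, and the first
+ match decides. As a rough illustration only (this is a Python model of
+ the table, not how libvirt or the kernel implement it), the decision can
+ be sketched as below. The <code>Packet</code> class, the
+ <code>nat_forward_verdict</code> function and the "CONTINUE" result are
+ hypothetical names invented for this sketch; the bridge name and subnet
+ are taken from the listing.
+ </p>

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Values copied from the type=nat listing; everything else is illustrative.
BRIDGE = "virbr0"
SUBNET = ip_network("192.168.122.0/24")


@dataclass
class Packet:
    """Hypothetical stand-in for the fields the FORWARD rules look at."""
    in_if: str
    out_if: str
    src: str
    dst: str
    established: bool = False  # models conntrack state RELATED,ESTABLISHED


def nat_forward_verdict(pkt: Packet) -> str:
    """Walk the five type=nat FORWARD rules in order; first match wins."""
    src_in_subnet = ip_address(pkt.src) in SUBNET
    dst_in_subnet = ip_address(pkt.dst) in SUBNET

    # Rule 1: inbound traffic related to an established connection.
    if pkt.out_if == BRIDGE and dst_in_subnet and pkt.established:
        return "ACCEPT"
    # Rule 2: outbound traffic, but only from the expected subnet.
    if pkt.in_if == BRIDGE and src_in_subnet:
        return "ACCEPT"
    # Rule 3: traffic between guests on the same bridge.
    if pkt.in_if == BRIDGE and pkt.out_if == BRIDGE:
        return "ACCEPT"
    # Rules 4 and 5: reject everything else to or from the bridge.
    if pkt.out_if == BRIDGE or pkt.in_if == BRIDGE:
        return "REJECT"
    # Not matched by any of these rules; the rest of the chain decides.
    return "CONTINUE"
```

+ <p>For example, an unsolicited packet from the LAN to a guest falls
+ through to the first REJECT rule, while a reply to a connection the
+ guest opened is accepted by the first rule.
+ </p>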
+
+ <h3><a name="name-fw-network-filter-driver"
+ id="id-fw-network-filter-driver">The network filter driver</a>
+ </h3>
+ <p>This driver provides a fully configurable network filtering capability
+ that leverages ebtables, iptables and ip6tables. This was written by
+ the libvirt developers at IBM and although its XML schema is defined by libvirt,
+ the conceptual model is closely aligned with the DMTF CIM schema for
+ network filtering:
+ </p>
+ <p><a href="http://www.dmtf.org/standards/cim/cim_schema_v2230/CIM_Network.pdf">http://www.dmtf.org/standards/cim/cim_schema_v2230/CIM_Network.pdf</a></p>
+ <p>The filters are managed in libvirt as a top level, standalone object.
+ This allows the filters to then be referenced by any libvirt object
+ that requires their functionality, instead of tying them only to use
+ by guest NICs. In the current implementation, filters can be associated
+ with individual guest NICs via the libvirt domain XML format. In the
+ future we might allow filters to be associated with the virtual network
+ objects. Further we're expecting to define a new 'virtual switch' object
+ to remove the complexity of configuring bridge/sriov/vepa networking
+ modes. This may also end up making use of network filters.
+ </p>
+ <p>There is a new set of virsh commands for managing network filters:</p>
+ <ul>
+ <li>virsh nwfilter-define
+ <br /><br />
+ define or update a network filter from an XML file
+ <br /><br />
+ </li>
+ <li>virsh nwfilter-undefine
+ <br /><br />
+ undefine a network filter
+ <br /><br />
+ </li>
+ <li>virsh nwfilter-dumpxml
+ <br /><br />
+ network filter information in XML
+ <br /><br />
+ </li>
+ <li>virsh nwfilter-list
+ <br /><br />
+ list network filters
+ <br /><br />
+ </li>
+ <li>virsh nwfilter-edit
+ <br /><br />
+ edit XML configuration for a network filter
+ </li>
+ </ul>
+ <p>There are equivalently named C APIs for each of these commands.</p>
+ <p>As with all objects libvirt manages, network filters are configured
+using an XML format. At a high level the format looks like this:
+ </p>
+<pre>
+<filter name='no-spamming' chain='XXXX'>
+ <uuid>d217f2d7-5a04-0e01-8b98-ec2743436b74</uuid>
+
+ <rule ...>
+ ....
+ </rule>
+
+ <filterref filter='XXXX'/>
+</filter></pre>
+ <p>Every filter has a name and UUID which serve as unique identifiers.
+ A filter can have zero-or-more <code><rule></code> elements which
+ are used to actually define network controls. Filters can be arranged
+ into a DAG, so zero-or-more <code><filterref/></code> elements are
+ also allowed. Cycles in the graph are not allowed.
+ </p>
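+ <p>To make the "no cycles" requirement concrete, here is a minimal
+ Python sketch of a depth-first cycle check over filter references. It is
+ not libvirt's implementation; <code>find_cycle</code> is a hypothetical
+ helper, and it assumes the references have already been extracted into
+ a plain dictionary mapping each filter name to the names it references.
+ </p>

```python
def find_cycle(filterrefs):
    """Return a list of filter names forming a cycle, or None if acyclic.

    filterrefs maps a filter name to the list of filter names it
    references (an assumed, pre-parsed form of the filterref elements).
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(name, path):
        state[name] = IN_PROGRESS
        path.append(name)
        for ref in filterrefs.get(name, ()):
            ref_state = state.get(ref, UNVISITED)
            if ref_state == IN_PROGRESS:
                # ref is already on the current path: we found a loop.
                return path[path.index(ref):] + [ref]
            if ref_state == UNVISITED:
                cycle = visit(ref, path)
                if cycle:
                    return cycle
        path.pop()
        state[name] = DONE
        return None

    for name in filterrefs:
        if state.get(name, UNVISITED) == UNVISITED:
            cycle = visit(name, [])
            if cycle:
                return cycle
    return None
```

+ <p>A filter referenced from several places (a shared building block) is
+ fine, since that is still a DAG; only a chain of references that leads
+ back to its own start is reported.
+ </p>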
+ <p>The <code><rule></code> element is where all the interesting stuff
+ happens. It has three attributes: an action, a traffic direction and an
+ optional priority, e.g.:
+ </p>
+ <pre><rule action='drop' direction='out' priority='500'></pre>
+ <p>Within the rule there are a wide variety of elements allowed, which
+ do protocol specific matching. Supported protocols currently include
+ <code>mac</code>, <code>arp</code>, <code>rarp</code>, <code>ip</code>,
+ <code>ipv6</code>, <code>tcp/ip</code>, <code>icmp/ip</code>,
+ <code>igmp/ip</code>, <code>udp/ip</code>, <code>udplite/ip</code>,
+ <code>esp/ip</code>, <code>ah/ip</code>, <code>sctp/ip</code>,
+ <code>tcp/ipv6</code>, <code>icmp/ipv6</code>, <code>igmp/ipv6</code>,
+ <code>udp/ipv6</code>, <code>udplite/ipv6</code>, <code>esp/ipv6</code>,
+ <code>ah/ipv6</code>, <code>sctp/ipv6</code>. Each protocol defines what
+ is valid inside the <rule> element. The general pattern though is:
+ </p>
+ <pre>
+<protocol match='yes|no' attribute1='value1' attribute2='value2'/></pre>
+ <p>So, for example, a TCP match on source ports 0-1023 would be expressed as:</p>
+ <pre><tcp match='yes' srcportstart='0' srcportend='1023'/></pre>
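+ <p>If you generate filters from a script rather than writing the XML by
+ hand, the rule and protocol elements shown above can be assembled with
+ Python's standard <code>xml.etree.ElementTree</code> module. This is
+ only a sketch; <code>make_rule</code> is a hypothetical helper, and it
+ does not validate attribute names against what each protocol allows.
+ </p>

```python
import xml.etree.ElementTree as ET


def make_rule(action, direction, protocol, priority=None, **match_attrs):
    """Build a rule element holding a single protocol match element.

    The helper name and signature are illustrative, not a libvirt API.
    No checking is done that match_attrs are valid for the protocol.
    """
    rule_attrs = {"action": action, "direction": direction}
    if priority is not None:
        rule_attrs["priority"] = str(priority)
    rule = ET.Element("rule", rule_attrs)
    # Attribute values must be strings for ElementTree serialisation.
    ET.SubElement(rule, protocol, {k: str(v) for k, v in match_attrs.items()})
    return rule


# The example from the text: drop outgoing TCP with source ports 0-1023.
rule = make_rule("drop", "out", "tcp", priority=500,
                 match="yes", srcportstart=0, srcportend=1023)
rule_xml = ET.tostring(rule, encoding="unicode")
```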
+ <p>Attributes can include references to variables defined by the
+ object using the rule. For example, the guest XML format allows each NIC
+ to have a MAC address and IP address defined. These are made
+ available to filters via the variables <code><b>$IP</b></code> and
+ <code><b>$MAC</b></code>.
+ </p>
+ <p>So to define a filter that prevents IP address spoofing we can
+ simply match on source IP address <code>!= $IP</code> like this:
+ </p>
+ <pre>
+<filter name='no-ip-spoofing' chain='ipv4'>
+ <rule action='drop' direction='out'>
+ <ip match='no' srcipaddr='<b>$IP</b>' />
+ </rule>
+</filter></pre>
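+ <p>Conceptually, the variable references are filled in per NIC when the
+ filter is instantiated. The Python sketch below only models that idea
+ using <code>string.Template</code>, whose <code>$NAME</code> syntax
+ happens to match; it says nothing about how libvirt performs the
+ substitution internally, and <code>expand_attributes</code> is a
+ hypothetical name.
+ </p>

```python
from string import Template


def expand_attributes(attrs, variables):
    """Replace $IP, $MAC, ... in attribute values with per-NIC values.

    Raises KeyError if a rule references a variable the NIC does not
    define, which models "this filter cannot be instantiated yet".
    """
    return {name: Template(value).substitute(variables)
            for name, value in attrs.items()}


# Values taken from the interface example used elsewhere in this document.
nic_variables = {"IP": "10.33.8.131", "MAC": "52:54:00:56:44:32"}
expanded = expand_attributes({"match": "no", "srcipaddr": "$IP"}, nic_variables)
```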
+ <p>I'm not going to go into details on all the other protocol
+ matches you can do, because it'll take far too much space.
+ You can read about the options
+ <a href="formatnwfilter.html#nwfelemsRulesProto">here</a>.
+ </p>
+ <p>Out of the box in RHEL6/Fedora rawhide, libvirt ships with a
+ set of default useful rules:
+ </p>
+ <pre>
+# virsh nwfilter-list
+UUID Name
+----------------------------------------------------------------
+15b1ab2b-b1ac-1be2-ed49-2042caba4abb allow-arp
+6c51a466-8d14-6d11-46b0-68b1a883d00f allow-dhcp
+7517ad6c-bd90-37c8-26c9-4eabcb69848d allow-dhcp-server
+3d38b406-7cf0-8335-f5ff-4b9add35f288 allow-incoming-ipv4
+5ff06320-9228-2899-3db0-e32554933415 allow-ipv4
+db0b1767-d62b-269b-ea96-0cc8b451144e clean-traffic
+f88f1932-debf-4aa1-9fbe-f10d3aa4bc95 no-arp-spoofing
+772f112d-52e4-700c-0250-e178a3d91a7a no-ip-multicast
+7ee20370-8106-765d-f7ff-8a60d5aaf30b no-ip-spoofing
+d5d3c490-c2eb-68b1-24fc-3ee362fc8af3 no-mac-broadcast
+fb57c546-76dc-a372-513f-e8179011b48a no-mac-spoofing
+dba10ea7-446d-76de-346f-335bd99c1d05 no-other-l2-traffic
+f5c78134-9da4-0c60-a9f0-fb37bc21ac1f no-other-rarp-traffic
+7637e405-4ccf-42ac-5b41-14f8d03d8cf3 qemu-announce-self
+9aed52e7-f0f3-343e-fe5c-7dcb27b594e5 qemu-announce-self-rarp</pre>
+ <p>Most of these are just building blocks. The interesting one here
+ is 'clean-traffic'. This pulls together all the building blocks
+ into one filter that you can then associate with a guest NIC.
+ This stops the most common bad things a guest might try: IP
+ spoofing, ARP spoofing and MAC spoofing. To look at the rules for
+ any of these just do:
+ </p>
+ <pre>virsh nwfilter-dumpxml FILTERNAME|UUID</pre>
+ <p>They are all stored in <code>/etc/libvirt/nwfilter</code>, but don't
+ edit the files there directly. Use <code>virsh nwfilter-define</code>
+ to update them. This ensures the guests have their iptables/ebtables
+ rules recreated.
+ </p>
+ <p>To associate the clean-traffic filter with a guest, edit the
+ guest XML config and change the <code><interface></code> element
+ to include a <code><filterref></code> and also specify the
+ whitelisted <code><ip address/></code> the guest is allowed to
+ use:
+ </p>
+ <pre>
+<interface type='bridge'>
+ <mac address='52:54:00:56:44:32'/>
+ <source bridge='br1'/>
+ <ip address='10.33.8.131'/>
+ <target dev='vnet0'/>
+ <model type='virtio'/>
+ <filterref filter='clean-traffic'/>
+</interface></pre>
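+ <p>The same edit can be scripted. The following Python sketch operates
+ on an interface XML fragment as text and follows the layout shown above
+ (a filterref child plus an ip child). <code>attach_filter</code> is a
+ hypothetical helper, not part of libvirt or virsh, and getting the
+ result back into the guest config is left out.
+ </p>

```python
import xml.etree.ElementTree as ET


def attach_filter(interface_xml, filter_name, ip=None):
    """Return interface XML with exactly one filterref for filter_name.

    If ip is given and the interface has no ip element yet, one is
    added, mirroring the layout used in the example above.
    """
    iface = ET.fromstring(interface_xml)
    # Drop any existing reference so we never end up with two.
    for old in iface.findall("filterref"):
        iface.remove(old)
    ET.SubElement(iface, "filterref", {"filter": filter_name})
    if ip is not None and iface.find("ip") is None:
        ET.SubElement(iface, "ip", {"address": ip})
    return ET.tostring(iface, encoding="unicode")


original = ("<interface type='bridge'>"
            "<mac address='52:54:00:56:44:32'/>"
            "<source bridge='br1'/>"
            "</interface>")
updated = attach_filter(original, "clean-traffic", ip="10.33.8.131")
```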
+ <p>If no <code><ip address></code> is included, the network filter
+ driver will activate its 'learning mode'. This uses libpcap to snoop on
+ network traffic the guest sends and attempts to identify the
+ first IP address it uses. It then locks traffic to this address.
+ Obviously this isn't entirely secure, but it does offer some
+ protection against the guest being trojaned once up and running.
+ In the future we intend to enhance the learning mode so that it
+ looks for DHCPOFFERS from a trusted DHCP server and only allows
+ the offered IP address to be used.
+ </p>
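+ <p>The "lock onto the first address seen" behaviour, and why it is not
+ entirely secure, can be modelled in a few lines of Python. This is a
+ model of the decision only: the real driver snoops traffic with libpcap
+ and then installs filter rules, and the <code>LearnedAddressLock</code>
+ class is a hypothetical name used just for this sketch.
+ </p>

```python
class LearnedAddressLock:
    """Toy model of learning mode: the first source IP seen wins."""

    def __init__(self):
        self.learned_ip = None

    def allow(self, src_ip):
        """Return True if a packet with this source IP would be allowed."""
        if self.learned_ip is None:
            # Nothing learned yet: trust whatever the guest sends first.
            self.learned_ip = src_ip
            return True
        return src_ip == self.learned_ip


lock = LearnedAddressLock()
first = lock.allow("10.33.8.131")    # learned and allowed
same = lock.allow("10.33.8.131")     # matches the learned address
spoofed = lock.allow("10.33.8.200")  # different address, blocked
```

+ <p>The weakness is visible in the first call: whatever address the guest
+ uses first is trusted, so a guest that is already compromised at boot
+ gets to choose the address it is locked to.
+ </p>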
+ <p>Now, how is all this implemented...?</p>
+ <p>The network filter driver uses a combination of ebtables, iptables and
+ ip6tables, depending on which protocols are referenced in a filter. The
+ out of the box 'clean-traffic' filter rules only require use of
+ ebtables. If you want to do matching on TCP/UDP/etc. protocols (e.g. to add
+ a new filter 'no-email-spamming' to block port 25), then iptables will
+ also be used.
+ </p>
+ <p>The driver attempts to keep its rules separate from those that
+ the host admin might already have configured. So the first thing
+ it does with ebtables, is to add two hooks in POSTROUTING and
+ PREROUTING chains, to redirect traffic to custom chains. These
+ hooks match on the TAP device name of the guest NIC, so they
+ should not interact badly with any administrator defined rules:
+ </p>
+ <pre>
+Bridge chain: PREROUTING, entries: 1, policy: ACCEPT
+-i vnet0 -j libvirt-I-vnet0
+
+Bridge chain: POSTROUTING, entries: 1, policy: ACCEPT
+-o vnet0 -j libvirt-O-vnet0</pre>
+ <p>To keep things manageable and easy to follow, the driver will then
+ create further sub-chains for each protocol that it needs to match
+ against:
+ </p>
+ <pre>
+Bridge chain: libvirt-I-vnet0, entries: 5, policy: ACCEPT
+-p IPv4 -j I-vnet0-ipv4
+-p ARP -j I-vnet0-arp
+-p 0x8035 -j I-vnet0-rarp
+-p 0x835 -j ACCEPT
+-j DROP
+
+Bridge chain: libvirt-O-vnet0, entries: 4, policy: ACCEPT
+-p IPv4 -j O-vnet0-ipv4
+-p ARP -j O-vnet0-arp
+-p 0x8035 -j O-vnet0-rarp
+-j DROP</pre>
+ <p>Finally, here comes the actual implementation of the filters. This
+ example shows the 'clean-traffic' filter implementation.
+ I'm not going to explain what this is doing now. :-)
+ </p>
+ <pre>
+Bridge chain: I-vnet0-ipv4, entries: 2, policy: ACCEPT
+-s ! 52:54:0:56:44:32 -j DROP
+-p IPv4 --ip-src ! 10.33.8.131 -j DROP
+
+Bridge chain: O-vnet0-ipv4, entries: 1, policy: ACCEPT
+-j ACCEPT
+
+Bridge chain: I-vnet0-arp, entries: 6, policy: ACCEPT
+-s ! 52:54:0:56:44:32 -j DROP
+-p ARP --arp-mac-src ! 52:54:0:56:44:32 -j DROP
+-p ARP --arp-ip-src ! 10.33.8.131 -j DROP
+-p ARP --arp-op Request -j ACCEPT
+-p ARP --arp-op Reply -j ACCEPT
+-j DROP
+
+Bridge chain: O-vnet0-arp, entries: 5, policy: ACCEPT
+-p ARP --arp-op Reply --arp-mac-dst ! 52:54:0:56:44:32 -j DROP
+-p ARP --arp-ip-dst ! 10.33.8.131 -j DROP
+-p ARP --arp-op Request -j ACCEPT
+-p ARP --arp-op Reply -j ACCEPT
+-j DROP
+
+Bridge chain: I-vnet0-rarp, entries: 2, policy: ACCEPT
+-p 0x8035 -s 52:54:0:56:44:32 -d Broadcast --arp-op Request_Reverse --arp-ip-src 0.0.0.0 --arp-ip-dst 0.0.0.0 --arp-mac-src 52:54:0:56:44:32 --arp-mac-dst 52:54:0:56:44:32 -j ACCEPT
+-j DROP
+
+Bridge chain: O-vnet0-rarp, entries: 2, policy: ACCEPT
+-p 0x8035 -d Broadcast --arp-op Request_Reverse --arp-ip-src 0.0.0.0 --arp-ip-dst 0.0.0.0 --arp-mac-src 52:54:0:56:44:32 --arp-mac-dst 52:54:0:56:44:32 -j ACCEPT
+-j DROP</pre>
+ <p>NB, we would have liked to include the prefix 'libvirt-' in all
+ of our chain names, but unfortunately the kernel limits names
+ to a very short maximum length. So only the first two custom
+ chains can include that prefix. The others just include the
+ TAP device name + protocol name.
+ </p>
+ <p>If I define a new filter 'no-spamming' and then add this to the
+ 'clean-traffic' filter, I can illustrate how iptables usage works:
+ </p>
+ <pre>
+# cat > /root/spamming.xml <<EOF
+<filter name='no-spamming' chain='root'>
+ <uuid>d217f2d7-5a04-0e01-8b98-ec2743436b74</uuid>
+ <rule action='drop' direction='out' priority='500'>
+ <tcp dstportstart='25' dstportend='25'/>
+ </rule>
+</filter>
+EOF
+# virsh nwfilter-define /root/spamming.xml
+# virsh nwfilter-edit clean-traffic</pre>
+
+ <p>...add <code><filterref filter='no-spamming'/></code></p>
+ <p>All active guests immediately have their iptables/ebtables rules
+ rebuilt.
+ </p>
+ <p>The network filter driver deals with iptables in a very similar
+ way. First it separates out its rules from those the admin may
+ have defined, by adding a couple of hooks into the INPUT/FORWARD
+ chains:
+ </p>
+ <pre>
+Chain INPUT (policy ACCEPT 13M packets, 21G bytes)
+target prot opt in out source destination
+libvirt-host-in all -- * * 0.0.0.0/0 0.0.0.0/0
+
+Chain FORWARD (policy ACCEPT 5532K packets, 3010M bytes)
+target prot opt in out source destination
+libvirt-in all -- * * 0.0.0.0/0 0.0.0.0/0
+libvirt-out all -- * * 0.0.0.0/0 0.0.0.0/0
+libvirt-in-post all -- * * 0.0.0.0/0 0.0.0.0/0</pre>
+ <p>These custom chains then do matching based on the TAP device
+ name, so they won't open holes in the admin defined matches for
+ the LAN/WAN (if any).
+ </p>
+ <pre>
+Chain libvirt-host-in (1 references)
+ target prot opt in out source destination
+ HI-vnet0 all -- * * 0.0.0.0/0 0.0.0.0/0 [goto] PHYSDEV match --physdev-in vnet0
+
+Chain libvirt-in (1 references)
+ target prot opt in out source destination
+ FI-vnet0 all -- * * 0.0.0.0/0 0.0.0.0/0 [goto] PHYSDEV match --physdev-in vnet0
+
+Chain libvirt-in-post (1 references)
+ target prot opt in out source destination
+ ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 PHYSDEV match --physdev-in vnet0
+
+Chain libvirt-out (1 references)
+ target prot opt in out source destination
+ FO-vnet0 all -- * * 0.0.0.0/0 0.0.0.0/0 [goto] PHYSDEV match --physdev-out vnet0</pre>
+ <p>Finally, we can see the interesting bit which is the actual
+ implementation of my filter to block port 25 access:
+ </p>
+ <pre>
+Chain FI-vnet0 (1 references)
+ target prot opt in out source destination
+ DROP tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:25
+
+Chain FO-vnet0 (1 references)
+ target prot opt in out source destination
+ DROP tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp spt:25
+
+Chain HI-vnet0 (1 references)
+ target prot opt in out source destination
+ DROP tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:25</pre>
+ <p>One thing you may notice in looking at this is that if there
+ are many guests all using the same filters, we will be duplicating
+ the iptables rules over and over for each guest. This is merely a
+ limitation of the current rules engine implementation. At the libvirt
+ object modelling level you can clearly see we've designed the model
+ so filter rules are defined in one place, and indirectly referenced
+ by guests. Thus it should be possible to change the implementation in
+ the future so we can share the actual iptables/ebtables rules for
+ each guest to create a more scalable system. The code in current libvirt
+ is more or less the very first working implementation we've had of this,
+ so there's not been much optimization work done yet.
+ </p>
+ <p>Also notice that at the XML level we don't expose the fact we
+ are using iptables or ebtables at all. The rule definition is done in
+ terms of network protocols. Thus if we ever find a need, we could
+ plug in an alternative implementation that calls out to a different
+ firewall implementation instead of ebtables/iptables (providing that
+ implementation was suitably expressive of course).
+ </p>
+ <p>Finally, in terms of problems we have in deployment, the biggest
+ is that if the admin does <code>service iptables restart</code>
+ all our work gets blown away. We've experimented with using lokkit
+ to record our custom rules in a persistent config file, but that
+ caused a different problem. Admins who were not using lokkit for
+ their config found that all their own rules got blown away. So
+ we threw away our lokkit code. Instead we document that if you
+ run <code>service iptables restart</code>, you need to send SIGHUP to
+ libvirt to make it recreate its rules.
+ </p>
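+ <p>Sending the SIGHUP can be scripted, for instance from whatever wrapper
+ restarts the host firewall. The Python sketch below assumes the daemon
+ is libvirtd and that it writes its PID to
+ <code>/var/run/libvirtd.pid</code>; both are assumptions that may not
+ hold on your system, and the function names are hypothetical.
+ </p>

```python
import os
import signal


def read_pid(pid_file):
    """Read a daemon PID from a pid file containing a single integer."""
    with open(pid_file) as fh:
        return int(fh.read().strip())


def request_rule_reload(pid_file="/var/run/libvirtd.pid"):
    """Send SIGHUP so the daemon recreates its firewall rules.

    The default path is an assumption; pass the right one for your host.
    Needs permission to signal the daemon (typically root).
    """
    os.kill(read_pid(pid_file), signal.SIGHUP)
```

+ <p>You would call <code>request_rule_reload()</code> right after the
+ firewall restart has completed.
+ </p>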
+ <p>More in-depth documentation on this is <a href="formatnwfilter.html">here</a>.</p>
+ </body>
+</html>