[ovs-dev] tunneling: new GRE vport option and action
casado at nicira.com
Wed Oct 27 09:28:28 PDT 2010
The target use case for this patch is clearly very important to OVS.
However, I agree that if we're going to add support for encap/decap
used as stateless tags (similar to MPLS), it should fit within a more
general framework (something like what is outlined in the unimplemented
OF 1.1 draft).
Creating a vendor extension for a special case of GRE doesn't lend
itself well to a more generalized multi-table push/pop/swap tag
mechanism that extends to other stateless encap protocols with
ubiquitous hardware support, such as L2-in-L2 and L3-in-L3 -- L3-in-L3
being stateless only in that, today, fragmentation of the outer header
is not, in general, done by the hardware at the termination point.
This is definitely an area in which OVS could use contributions!
P.S. I don't understand why you think tunnels cannot be managed by a
controller. All tunnel provisioning and configuration is accessible
through the OVSDB configuration protocol.
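That provisioning path can be sketched with the standard CLI (the bridge
name, port name, addresses, and key below are illustrative; a controller
would make the equivalent change over the OVSDB protocol instead):

```shell
# Provision a GRE tunnel port on bridge br0. A controller can perform
# the same configuration remotely through the OVSDB protocol.
ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre \
    options:remote_ip=192.168.1.2 options:key=5
```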
P.S.II Sorry Ben for top posting. Justin did it first ... :)
> Hi, Romain. Thank you very much for contributing these patches. We're in agreement that this kind of functionality would be very useful.
>
> While working on the OVS architecture and applications, we've come to think of two broad approaches to these sorts of problems: virtual port-based solutions (tunnels) and match/action-based solutions (tags). We think both have important applications, but the current patches are a mix of the two, where we feel it is better to support each approach in a distinct way. In our view, a tunnel has state associated with it, while a tag is something that is slapped onto a packet.
> We would love to support multicast tunnels, but feel that it should be done in a way that is closer to the way Linux handles them natively (as described by Jesse).
>
> To support a tagging approach for lookup, there needs to be a way to match these tags and pop them off as they traverse some sort of pipeline. Within the next week, we plan to introduce an OpenFlow vendor extension to support TLV-based match definitions, which will allow a much more flexible method for defining flows. Once that is added, an action to pop tags off headers and reprocess the packet through a resubmit action (or, one day, multiple table support) should be sufficient to do what you want. This has the added benefit of caching out into a very fast flow lookup in the kernel, since it transforms into a single lookup.
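As a rough sketch of the pipeline described above (the tag match syntax
and the pop_tag action are hypothetical placeholders for the
not-yet-designed extension; resubmit is the existing Nicira extension
action):

```shell
# Hypothetical: match on a tag, strip it, and reprocess the packet
# through the pipeline via resubmit. "pop_tag" and the tun_id match
# syntax here are placeholders, not shipped functionality.
ovs-ofctl add-flow br0 "tun_id=5,actions=pop_tag,resubmit:1"
```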
> We'd be happy to collaborate with you on a design and delve into more details. This would make a great addition to OVS!
> [Ben's probably going to yell at me for top-posting on a threaded message. Egad!]
> On Oct 26, 2010, at 9:07 PM, Romain Lenglet wrote:
>> On 10/27/10 09:30, Jesse Gross wrote:
>>> On Tue, Oct 26, 2010 at 12:24 AM, <romain.lenglet at berabera.info> wrote:
>>>> This series of 2 patches extends the GRE vport implementation to:
>>>> 1) add an option to prevent matching received GRE packets on their
>>>> remote (source) IP address;
>>>> 2) add a vendor OpenFlow action to dynamically set the destination IP
>>>> address of GRE packets in a flow.
>>>> The code in those patches is Copyright (C) 2010 Midokura, Inc. and
>>>> under the same licenses as the modified files.
>>>> We're using the new match_in_remote_ip option (patch 1/2) for instance
>>>> to use GRE with IP Multicast addresses as destination addresses. In
>>>> this use case, OVS bridges on all hosts have one GRE port sending GRE
>>>> packets to the multicast IP address, and subscribe to that multicast
>>>> group to receive the GRE packets from all other hosts. Disabling
>>>> matching on the source IP address is necessary to receive GRE packets
>>>> from more than one host. In this case, GRE packets are matched on the
>>>> GRE key for each port.
>>> It's actually possible to implement IP multicast in a simpler manner.
>>> Since we can know whether a given destination address is multicast,
>>> it's not necessary to add additional wildcard matching in the receive
>>> port lookup function. In general I am trying to reduce this type of
>>> wildcarding since it adds complexity.
>> The problem solved by adding match_in_remote_ip is independent of multicast; multicast was just an example of usage (and a confusing one, I admit). The problem solved is: how do we receive GRE packets from many other hosts without having to create many GRE vports?
>> We have many hosts (we will have up to thousands of hosts) that interact directly with each other, as in a full mesh. Each host may send GRE packets to any other host with the same GRE key. Those GRE packets may be multicast or unicast: it doesn't matter and a host will receive both (we don't set the "local_ip" option, so any destination address is OK, multicast or not).
>> Using match_in_remote_ip, a single GRE vport in a datapath can receive packets from all other clients. Without it, each datapath would need one GRE vport for each other client, i.e. potentially thousands of vports. So yes, it seems necessary to us to add wildcard matching in the receive port lookup function, to avoid an unnecessary proliferation of vports.
>> And I don't think that this patch adds that much complexity:
>> datapath/tunnel.c | 25 ++++++++++++++++++++++---
>> include/openvswitch/tunnel.h | 1 +
>> lib/netdev-vport.c | 18 ++++++++++++++++++
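The single-vport setup described above might be configured like this
(match_in_remote_ip is the option proposed in patch 1/2, so its exact
name and syntax are per that patch; the multicast address and key are
illustrative):

```shell
# One GRE vport per datapath that accepts GRE packets from any sender,
# matching received packets on the GRE key only. match_in_remote_ip is
# the new option from patch 1/2, not an upstream option.
ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre \
    options:remote_ip=239.1.1.1 options:key=100 \
    options:match_in_remote_ip=false
```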
>>> If, for multicast packets, we change the semantics of the matching
>>> slightly by doing a lookup on the incoming destination address against
>>> the configured remote_ip then it will "just work" without needing any
>>> additional config. The Linux ip_gre module is an example of this.
>>> It's worth looking at for ideas, since it also contains a few other
>>> details, such as joining the appropriate multicast group, that are
>>> needed for correct operation.
>> Sorry, the wording in this cover email was wrong (although the patch summary is correct). The patch adds wildcard matching on the *source* address of incoming GRE packets (i.e. the "remote IP"), not the destination address. Wildcarding on the destination address of incoming GRE packets is already handled by the "local_ip" option.
>> There's no need to do anything more to support multicast. It's working fine already. We're managing memberships with an additional daemon in userspace, and that's fine for us. The problem solved by match_in_remote_ip is a different problem: how to receive GRE packets from many senders on a single GRE vport.
>>>> We're using the new set_tunnel_nw_dst action (patch 2/2) to
>>>> dynamically set the destination IP address of GRE packets separately
>>>> for each flow. This allows choosing the destination IP of tunnels
>>>> per-flow, instead of per-port. This is much more scalable and
>>>> practical than using one GRE port per flow. The only drawback is that
>>>> header caching has to be automatically disabled when that action is
>>>> applied to a packet.
>>> Can you describe your environment a bit and be a little more specific
>>> about the scalability problems that you are running into? In general,
>>> I think that the port per remote peer abstraction is fairly natural,
>>> so I would want to know more about the benefits before breaking that.
>> We need to control the destination of GRE packets for each flow, from an OpenFlow controller. We may have thousands of hosts/datapaths that forward flows to each other over GRE. We have two ways to do this:
>> - the current way: In every datapath, create one GRE vport for each other host/datapath. In the OpenFlow controller, for each flow, set the appropriate port in an "output:X" action. One then has to create thousands of GRE vports in every datapath, and must dynamically add or remove vports in every datapath whenever a host is added into or removed from the "pool." For N hosts, one needs to dynamically create and manage N*(N-1) ports.
>> - the set_tunnel_nw_dst way: In every datapath, create a single GRE vport. In the OpenFlow controller, for each flow, select the GRE endpoint destination for the flow and set it in a "set_tunnel_nw_dst:w.x.y.z" action, then output the flow to that single GRE vport with an "output:X" action, where X is the same port number for all flows. For N hosts, one only needs to statically create N ports (1 in each datapath). There's no need for dynamic reconfiguration of ports; all dynamic control is done by the OpenFlow controller.
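A flow under the proposed scheme might look as follows (set_tunnel_nw_dst
is the action added by patch 2/2, so its syntax here is an assumption;
set_tunnel is the existing Nicira extension action; the port numbers,
MAC, and address are illustrative):

```shell
# Forward this flow over the single shared GRE vport (port 2 here),
# choosing the tunnel key with the existing set_tunnel action and the
# tunnel destination with the proposed set_tunnel_nw_dst action.
ovs-ofctl add-flow br0 \
    "in_port=1,dl_dst=00:11:22:33:44:55,actions=set_tunnel:100,set_tunnel_nw_dst:10.0.0.7,output:2"
```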
>> So the problem we have with the current GRE vport implementation is that it doesn't scale with the number of GRE endpoints that communicate with each other: we need to create N*(N-1) GRE vports for N hosts to communicate with each other.
>> The number of GRE vports grows quadratically with the number of endpoints, so the cost of dynamically adding and removing GRE vports on every host/datapath, whenever a host is added or removed, becomes high.
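The scaling difference above is easy to quantify; N = 1000 here is just
an example deployment size:

```shell
# Vport count for a full mesh of N datapaths, under each approach.
N=1000
echo "one vport per remote peer: $((N * (N - 1)))"
echo "one shared vport per host: $N"
```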
>> Romain Lenglet
>> dev mailing list
>> dev at openvswitch.org