[ovs-dev] MPLS and VLAN QinQ patch
rkerur at gmail.com
Thu Jun 14 06:22:40 PDT 2012
On Thu, Jun 14, 2012 at 4:12 AM, Jesse Gross <jesse at nicira.com> wrote:
> On Thu, Jun 14, 2012 at 2:42 PM, ravi kerur <rkerur at gmail.com> wrote:
>> On Wed, Jun 13, 2012 at 8:38 PM, Jesse Gross <jesse at nicira.com> wrote:
>>> On Thu, Jun 14, 2012 at 9:37 AM, Ben Pfaff <blp at nicira.com> wrote:
>>>> On Wed, Jun 13, 2012 at 09:55:36PM +0900, Jesse Gross wrote:
>>>>> If we go with what you have now, I'm fairly confident that we will
>>>>> regret it in the future. The kernel code used to more directly
>>>>> implement various features and prior to upstreaming we broke many
>>>>> of them down. I'm happy with the results of that but this time we
>>>>> won't have the benefit of revising things later. This is
>>>>> particularly bad because it deviates from our usual model of
>>>>> userspace controlling everything and here userspace won't even know
>>>>> what the flow looks like. The effects of this tend to metastasize
>>>>> because when userspace doesn't know what the packet looks like it
>>>>> can't implement things that it might otherwise be able to do and
>>>>> more and more ends up in the kernel. The other thing, which is
>>>>> specific to MPLS, is that there is no inherent way to know the type
>>>>> of the payload. Userspace is vastly more likely to have this
>>>>> information in the event that we want to do something with the
>>>>> inner packet. In your patch the kernel is basically assuming that
>>>>> the type is IP (OpenFlow doesn't give us any additional information
>>>>> but it doesn't seem like a good idea in general).
>>>>>
>>>>> Using the approach that I had suggested largely avoids these
>>>>> problems and if we could go that path by adding say, the ability to
>>>>> match on another tag, then I think that would be
>>>>> reasonable. However, it still has a variation of the last problem,
>>>>> which is that although it may know what the inner type is, unless it
>>>>> can tell that to the kernel there is no way to get to the inner
>>>>> flow. This makes copying TTL inward to the IP header difficult. Perhaps we
>>>>> could add a mechanism for userspace to tell the kernel the type of
>>>>> the inner packet, although it might not know either.
>>>>>
>>>>> Potentially a third approach is to add a more native way to
>>>>> recirculate through the pipeline (somewhat like a patch port but
>>>>> lighter weight and with a different intention). In addition to the
>>>>> TTL issues it could be useful in situations where you want to parse
>>>>> deeper into the packet than the number of MPLS or vlan tags
>>>>> supported. Often, people might want to do L3 operations after
>>>>> popping off an MPLS tag and in that case you almost certainly end up
>>>>> doing something like this.
>>>>>
>>>>> Ben, do you have any thoughts?
>>>> I think I understand options 1 and 3 but I wasn't able to quickly
>>>> figure out what option 2 is.
>>> Option 2 would be to do something similar to what we did with IP TTL
>>> where you have a set operation rather than copy in/out/decrement/etc.
>>> It gives userspace much more control because it always knows what the
>>> packet looks like.
>> <rk> This will increase the data structure size for both struct flow
>> and sw_flow_key, but it's doable. Also note that skbs don't have
>> fields for MPLS offsets, so the offset needs to be calculated when
>> modifying, although the kernel code might get simplified a little
>> since it just needs to calculate the offset and update. I haven't
>> thought it through thoroughly, but I think that's the essence. Correct
>> me if I am wrong.
> These are implementation details and not particularly significant
> ones. I want to get the big picture right first.
<rk> At least I thought we were leaning towards option 2, until option 3 came up.
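For concreteness, the "set" style of option 2 can be sketched outside of OVS. An MPLS label stack entry (per RFC 3032) packs label, traffic class, bottom-of-stack, and TTL into 32 bits, and a set action simply overwrites the whole entry with a value userspace computed. The helper names below are invented for illustration; this is not OVS code:

```c
#include <stdint.h>

/* An MPLS label stack entry (RFC 3032) packs four fields into 32 bits:
 * label (20 bits) | traffic class (3 bits) | bottom-of-stack (1 bit) |
 * TTL (8 bits). */
#define MPLS_LABEL_SHIFT 12
#define MPLS_TC_SHIFT     9
#define MPLS_BOS_SHIFT    8

/* Build a label stack entry from its fields (helper name invented here). */
static uint32_t mpls_lse_build(uint32_t label, uint8_t tc, int bos, uint8_t ttl)
{
    return (label << MPLS_LABEL_SHIFT)
           | ((uint32_t)(tc & 7) << MPLS_TC_SHIFT)
           | ((uint32_t)(bos ? 1 : 0) << MPLS_BOS_SHIFT)
           | ttl;
}

/* The "set" style of action: userspace computes the entire new LSE (e.g.
 * with a decremented TTL) and the datapath only overwrites it, so the
 * kernel never has to interpret the fields itself. */
static void mpls_lse_set(uint32_t *lse, uint32_t new_lse)
{
    *lse = new_lse;
}
```

With this split, TTL decrement, TC rewrite, and label swap all reduce to the same single datapath primitive, which is the appeal of option 2: userspace always knows exactly what the packet looks like.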
>>>> Anyway, I mostly like option 3. One wrinkle that occurs to me (maybe
>>>> it's obvious?) is to separate defining the inner protocol from the
>>>> recirculation. In other words, you'd have an action,
>>>> e.g. OVS_ACTION_ATTR_SET_MPLS_INNER_PROTO, that tells the datapath
>>>> what to find beyond the MPLS label(s). Following that, you could
>>>> usefully tell the kernel module to copy TTL in/out, or you could
>>>> usefully do a separate recirculate action.
>>> I was thinking that the first pass would pop off the tags and then the
>>> second pass would process the inner packet. Since when popping off
>>> the last tag with MPLS you already set the EtherType, nothing more
>>> needs to be done specific to MPLS. Probably you would model these
>>> passes as connected to tables in userspace, so a common use case might
>>> be the first table does MPLS lookup/pop and the second table does IP
>>> processing. For that you just need a way to recirculate and know what
>>> pass you're on.
>>> It's somewhat more complicated to have userspace set up recirculation
>>> passes implicitly where it doesn't map directly to OpenFlow but in
>>> theory you can model it the same way for TTL copy or additional levels
>>> of tags.
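The pop-and-set-EtherType step described above can be sketched as follows. Since the MPLS stack carries no payload-type field, the new EtherType on bottom-of-stack must be supplied by the caller, which is exactly the information userspace is more likely to have. This is a toy illustration with made-up struct names, not datapath code:

```c
#include <stdint.h>

#define MPLS_BOS_MASK 0x00000100u
#define ETH_TYPE_MPLS 0x8847

/* Toy packet: an EtherType plus a small MPLS label stack (names invented
 * for illustration; this is not the OVS sk_buff layout). */
struct toy_pkt {
    uint16_t eth_type;
    uint32_t lse[4];   /* label stack, outermost entry first */
    int n_lse;
};

/* Pop the outermost label stack entry. The MPLS stack carries no payload
 * type, so when the popped entry was bottom-of-stack the caller must
 * supply the inner EtherType -- the piece of information only userspace
 * is likely to have. Returns 0 on success, -1 if the stack is empty. */
static int mpls_pop(struct toy_pkt *pkt, uint16_t inner_eth_type)
{
    if (pkt->n_lse == 0) {
        return -1;
    }
    int was_bos = (pkt->lse[0] & MPLS_BOS_MASK) != 0;
    for (int i = 1; i < pkt->n_lse; i++) {
        pkt->lse[i - 1] = pkt->lse[i];
    }
    pkt->n_lse--;
    pkt->eth_type = was_bos ? inner_eth_type : ETH_TYPE_MPLS;
    return 0;
}
```

After such a pop on bottom-of-stack, the packet is an ordinary L3 frame, so a second pass through the tables can do plain IP processing.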
>> <rk> Note that this is the case for pure MPLS and egress only. One
>> disadvantage in this scenario is that customers with overlapping IPs
>> cannot be supported; this will not be the case for MPLS/VPN, which
>> doesn't need to parse both MPLS and IP.
> I don't know why you think that overlapping IPs can't be supported. A
> design requirement would be that multiple passes over the same packet
> could be linked up in some manner. Also, if you don't need to parse
> the inner packet then you wouldn't run a second pass.
<rk> All packets at the last hop come with an explicit null label or
implicit null, depending on traffic-engineering parameters, and I am not
sure how you would differentiate them.
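For reference, the reserved labels in question are fixed by RFC 3032: IPv4 explicit null is 0, IPv6 explicit null is 2, and implicit null is 3, which is only ever signaled in the control plane and never appears on the wire. That is the crux of the differentiation problem: with implicit null there is no label left to test. A minimal sketch (illustrative helper names only):

```c
#include <stdint.h>

/* Reserved MPLS labels per RFC 3032. Implicit null (3) is only signaled
 * in the control plane: with penultimate-hop popping the label is removed
 * before the last hop, so it never shows up on the wire. */
#define MPLS_LABEL_IPV4_EXPLICIT_NULL 0u
#define MPLS_LABEL_IPV6_EXPLICIT_NULL 2u
#define MPLS_LABEL_IMPLICIT_NULL      3u

/* Extract the 20-bit label from a 32-bit label stack entry. */
static uint32_t mpls_label(uint32_t lse)
{
    return lse >> 12;
}

/* True if the outermost label is an explicit null, i.e. the packet
 * reached the egress router with a label that exists only to carry
 * TC/TTL. With implicit null there is no label left to test at all,
 * which is the differentiation problem raised above. */
static int mpls_is_explicit_null(uint32_t lse)
{
    uint32_t label = mpls_label(lse);
    return label == MPLS_LABEL_IPV4_EXPLICIT_NULL
        || label == MPLS_LABEL_IPV6_EXPLICIT_NULL;
}
```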
What you have described above on recirculation is what traditional
routers/switches do in both software and hardware. I found out OVS
couldn't do it when I was working on QinQ: after matching on the outer
tag and popping it, there was no way to match on the modified packet for
a second round on the inner tag, so I had to implement a single match on
both inner and outer tags. Anyway, the point is: if you decide to
finally support some sort of recirculation, I believe it won't be
specific to MPLS/IP but generic enough to handle any recirculation.
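A generic recirculation mechanism could, very roughly, amount to a flow key extended with a pass identifier, so that each traversal of the pipeline can match differently. The structure and names below are purely hypothetical, just to make the idea concrete:

```c
#include <stdint.h>
#include <string.h>

/* Toy flow key: a recirculation/pass id plus whatever fields the pipeline
 * matches on. All names here are invented for illustration. */
struct toy_key {
    uint32_t recirc_id;
    uint16_t eth_type;
};

/* Hypothetical lookup: dispatching on the pass id lets the same generic
 * mechanism serve MPLS, QinQ, or any other multi-pass processing. Pass 0
 * matches the outer tag and pops it; pass 1 then sees the modified
 * packet and can match the inner header. */
static const char *toy_lookup(const struct toy_key *key)
{
    if (key->recirc_id == 0) {
        return "pop outer tag, recirculate";
    }
    return "process inner packet";
}
```

Because the dispatch is on an opaque pass id rather than on anything MPLS- or VLAN-specific, the same primitive covers TTL copy, deeper tag stacks, and QinQ inner-tag matching alike.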