[ovs-dev] STT Implementation Thoughts

Jesse Gross jesse at nicira.com
Fri Mar 30 14:49:52 PDT 2012

On Tue, Mar 27, 2012 at 12:40 AM, Simon Horman <horms at verge.net.au> wrote:
> On Mon, Mar 26, 2012 at 02:36:31PM +0900, Simon Horman wrote:
>> On Wed, Mar 21, 2012 at 12:09:09PM -0700, Jesse Gross wrote:
>> > On Wed, Mar 21, 2012 at 1:52 AM, Simon Horman <horms at verge.net.au> wrote:
>> > > Hi Jesse, Hi all,
>> > >
>> > > I am currently investigating how STT[1] may be implemented for Open vSwitch.
>> > > In the course of this it has struck me that a property particular
>> > > to STT is that it uses the TCP IP protocol number but actually isn't TCP.
>> > > Which makes me wonder how the receive side may be hooked into the Linux
>> > > kernel.
>> > >
>> > > My thought so far is that some modifications may need to be made to
>> > > tcp_v4_rcv() and/or tcp_v6_rcv() to skip TCP processing of packets
>> > > for sockets that have been bound by STT. But I was wondering if you
>> > > have any thoughts on this.
>> >
>> > I agree it is an issue.  I think what we want is something analogous
>> > to encap_rcv in the UDP stack, which is used for exactly this purpose
>> > (for example, the OVS CAPWAP implementation uses it, as do L2TP and
>> > UDP encapsulated IPsec upstream).  It's perhaps somewhat more likely
>> > to be contentious since with UDP the packets are still processed by
>> > the UDP stack just without the userspace termination part whereas this
>> > takes the TCP state machine out of the picture but at least there is
>> > precedent.
>> Agreed, it does seem that there could be some cause for contention there.
>> The next thing that I am puzzling over is how to select a source port.
>> The draft makes reference to using a hash, which seems like a nice idea.
>> However I am concerned about ensuring that a) the port isn't already in use
>> and b) nothing else uses it while STT is using it. It seems to me that
>> an obvious but not necessarily very efficient way to do this would be to
>> create a sock using sock_create() and bind it using a modified
>> version of inet_bind() or similar. Do you have any thoughts on this?
> Ok, scratch that for the most part. I see that the CAPWAP implementation
> makes use of sock_create_kern() to create a socket. So it seems that it
> would be reasonable for an STT implementation to do similar.

I'm actually not sure that it's necessary to really allocate source
ports at all.  The only thing that needs to be unique is the
SIP/DIP/SPORT/DPORT 4-tuple.  It's important for the kernel to do port
allocation if you can have multiple userspace programs that are trying
to connect to the same remote IP/port combination but in this case,
STT effectively "owns" that remote peer (and any program that tries to
establish a TCP connection to it will fail anyways since the remote
will presumably expect it to be STT traffic).  As a result, as long as
there is only one STT stack running, nobody should stomp on that
unique identifier.
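
To make that concrete, here is a rough userspace sketch of picking the
source port purely from a hash of the inner flow, with no socket or
port allocation involved.  Everything in it is an assumption for
illustration only: struct flow_key, the FNV-1a hash, and the
49152-65535 range are hypothetical choices, not taken from the OVS
code, and a kernel implementation would presumably use an existing
flow hash instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key describing the inner (encapsulated) flow. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  proto;
};

/* Assumed ephemeral range for the outer source port. */
#define STT_PORT_MIN 49152u
#define STT_PORT_MAX 65535u

/* One FNV-1a (32-bit) pass over a byte buffer, continuing from h. */
static uint32_t fnv1a_step(uint32_t h, const void *data, size_t len)
{
    const uint8_t *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash each field separately so struct padding never enters the hash. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = 2166136261u;

    h = fnv1a_step(h, &k->src_ip, sizeof k->src_ip);
    h = fnv1a_step(h, &k->dst_ip, sizeof k->dst_ip);
    h = fnv1a_step(h, &k->src_port, sizeof k->src_port);
    h = fnv1a_step(h, &k->dst_port, sizeof k->dst_port);
    h = fnv1a_step(h, &k->proto, sizeof k->proto);
    return h;
}

/* Map the flow hash into the assumed port range.  Nothing is reserved:
 * two flows may land on the same port, which is harmless as long as a
 * single STT stack owns the traffic to that remote peer. */
static uint16_t stt_src_port(const struct flow_key *k)
{
    uint32_t range = STT_PORT_MAX - STT_PORT_MIN + 1;
    uint32_t port = STT_PORT_MIN + flow_hash(k) % range;

    assert(port >= STT_PORT_MIN && port <= STT_PORT_MAX);
    return (uint16_t)port;
}
```

The same inner flow always maps to the same outer source port, so
packets of one flow stay on one path, while different flows spread
across the range.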
