ovn-architecture(7)           Open vSwitch Manual          ovn-architecture(7)



NAME
       ovn-architecture - Open Virtual Network architecture

DESCRIPTION
       OVN,  the  Open Virtual Network, is a system to support virtual network
       abstraction.  OVN complements the existing capabilities of OVS  to  add
       native support for virtual network abstractions, such as virtual L2 and
       L3 overlays and security groups.  Services such as DHCP are also desir‐
        able features.  Just like OVS, OVN is designed to be a production-
        quality implementation that can operate at significant scale.

       An OVN deployment consists of several components:

              ·      A Cloud Management System (CMS), which is OVN’s  ultimate
                     client  (via its users and administrators).  OVN integra‐
                     tion  requires  installing  a  CMS-specific  plugin   and
                      related software (see below).  OVN initially targets
                      OpenStack as its CMS.

                     We generally speak of ``the’’ CMS, but  one  can  imagine
                     scenarios  in which multiple CMSes manage different parts
                     of an OVN deployment.

              ·      An OVN Database physical or virtual node (or, eventually,
                     cluster) installed in a central location.

              ·      One or more (usually many) hypervisors.  Hypervisors must
                     run Open vSwitch and implement the interface described in
                     IntegrationGuide.md in the OVS source tree.  Any hypervi‐
                     sor platform supported by Open vSwitch is acceptable.

              ·      Zero or more gateways.  A gateway extends a  tunnel-based
                     logical  network  into a physical network by bidirection‐
                     ally forwarding packets between tunnels  and  a  physical
                     Ethernet  port.   This allows non-virtualized machines to
                     participate in logical networks.   A  gateway  may  be  a
                     physical  host, a virtual machine, or an ASIC-based hard‐
                     ware switch that supports the vtep(5)  schema.   (Support
                     for the latter will come later in OVN implementation.)

                      Hypervisors and gateways are together called
                      transport nodes or chassis.

       The diagram below shows how the major components  of  OVN  and  related
       software interact.  Starting at the top of the diagram, we have:

              ·      The Cloud Management System, as defined above.

              ·      The  OVN/CMS  Plugin  is  the  component  of the CMS that
                     interfaces to OVN.  In OpenStack, this is a Neutron plug‐
                     in.   The plugin’s main purpose is to translate the CMS’s
                     notion of logical network configuration,  stored  in  the
                     CMS’s  configuration  database  in a CMS-specific format,
                     into an intermediate representation understood by OVN.

                     This component is  necessarily  CMS-specific,  so  a  new
                     plugin  needs  to be developed for each CMS that is inte‐
                     grated with OVN.  All of the components below this one in
                     the diagram are CMS-independent.

              ·      The  OVN  Northbound  Database  receives the intermediate
                     representation of logical  network  configuration  passed
                     down by the OVN/CMS Plugin.  The database schema is meant
                     to be ``impedance matched’’ with the concepts used  in  a
                     CMS,  so  that  it  directly  supports notions of logical
                     switches, routers, ACLs, and so on.   See  ovn-nb(5)  for
                     details.

                     The  OVN  Northbound  Database  has only two clients: the
                     OVN/CMS Plugin above it and ovn-northd below it.

              ·      ovn-northd(8) connects to  the  OVN  Northbound  Database
                     above  it  and  the OVN Southbound Database below it.  It
                     translates the logical network configuration in terms  of
                     conventional  network concepts, taken from the OVN North‐
                     bound Database, into logical datapath flows  in  the  OVN
                     Southbound Database below it.

              ·      The  OVN Southbound Database is the center of the system.
                      Its clients are ovn-northd(8) above it and
                      ovn-controller(8) on every transport node below it.

                     The OVN Southbound Database contains three kinds of data:
                     Physical Network (PN) tables that specify  how  to  reach
                     hypervisor  and  other nodes, Logical Network (LN) tables
                     that describe the logical network in terms  of  ``logical
                     datapath  flows,’’  and  Binding tables that link logical
                     network components’ locations to  the  physical  network.
                     The  hypervisors populate the PN and Port_Binding tables,
                     whereas ovn-northd(8) populates the LN tables.

                     OVN Southbound Database performance must scale  with  the
                     number of transport nodes.  This will likely require some
                     work on  ovsdb-server(1)  as  we  encounter  bottlenecks.
                     Clustering for availability may be needed.

       The remaining components are replicated onto each hypervisor:

              ·      ovn-controller(8)  is  OVN’s agent on each hypervisor and
                     software gateway.  Northbound, it  connects  to  the  OVN
                     Southbound  Database to learn about OVN configuration and
                      status and to populate the PN table and the Chassis
                      column in the Binding table with the hypervisor’s
                      status.
                     Southbound, it connects to ovs-vswitchd(8) as an OpenFlow
                     controller,  for control over network traffic, and to the
                     local ovsdb-server(1) to allow it to monitor and  control
                     Open vSwitch configuration.

              ·      ovs-vswitchd(8) and ovsdb-server(1) are conventional com‐
                     ponents of Open vSwitch.

                                         CMS
                                          |
                                          |
                              +-----------|-----------+
                              |           |           |
                              |     OVN/CMS Plugin    |
                              |           |           |
                              |           |           |
                              |   OVN Northbound DB   |
                              |           |           |
                              |           |           |
                              |       ovn-northd      |
                              |           |           |
                              +-----------|-----------+
                                          |
                                          |
                                +-------------------+
                                | OVN Southbound DB |
                                +-------------------+
                                          |
                                          |
                       +------------------+------------------+
                       |                  |                  |
         HV 1          |                  |    HV n          |
       +---------------|---------------+  .  +---------------|---------------+
       |               |               |  .  |               |               |
       |        ovn-controller         |  .  |        ovn-controller         |
       |         |          |          |  .  |         |          |          |
       |         |          |          |     |         |          |          |
       |  ovs-vswitchd   ovsdb-server  |     |  ovs-vswitchd   ovsdb-server  |
       |                               |     |                               |
       +-------------------------------+     +-------------------------------+

   Chassis Setup
       Each chassis in an OVN deployment  must  be  configured  with  an  Open
       vSwitch  bridge dedicated for OVN’s use, called the integration bridge.
       System startup  scripts  may  create  this  bridge  prior  to  starting
       ovn-controller if desired.  If this bridge does not exist when ovn-con‐
       troller starts, it will be created automatically with the default  con‐
       figuration  suggested  below.   The  ports  on  the  integration bridge
       include:

              ·      On any chassis, tunnel ports that OVN  uses  to  maintain
                     logical   network   connectivity.   ovn-controller  adds,
                     updates, and removes these tunnel ports.

              ·      On a hypervisor, any VIFs that are to be attached to log‐
                     ical networks.  The hypervisor itself, or the integration
                     between Open vSwitch and  the  hypervisor  (described  in
                     IntegrationGuide.md)  takes  care  of this.  (This is not
                     part of OVN or new to OVN; this is pre-existing  integra‐
                     tion  work that has already been done on hypervisors that
                     support OVS.)

              ·      On a gateway, the physical port used for logical  network
                     connectivity.   System  startup  scripts add this port to
                     the bridge prior to starting ovn-controller.  This can be
                     a  patch  port  to  another bridge, instead of a physical
                     port, in more sophisticated setups.

       Other ports should not be attached to the integration bridge.  In  par‐
       ticular, physical ports attached to the underlay network (as opposed to
       gateway ports, which are physical ports attached to  logical  networks)
       must  not  be  attached  to  the integration bridge.  Underlay physical
       ports should instead be attached to  a  separate  Open  vSwitch  bridge
       (they need not be attached to any bridge at all, in fact).

       The  integration  bridge  should be configured as described below.  The
       effect   of   each    of    these    settings    is    documented    in
       ovs-vswitchd.conf.db(5):

              fail-mode=secure
                     Avoids  switching  packets  between isolated logical net‐
                     works before ovn-controller starts  up.   See  Controller
                     Failure Settings in ovs-vsctl(8) for more information.

              other-config:disable-in-band=true
                     Suppresses  in-band  control  flows  for  the integration
                     bridge.  It would be unusual for such flows  to  show  up
                     anyway,  because OVN uses a local controller (over a Unix
                     domain socket) instead of a remote controller.  It’s pos‐
                     sible,  however, for some other bridge in the same system
                     to have an in-band remote controller, and  in  that  case
                     this  suppresses  the  flows  that  in-band control would
                     ordinarily set up.  See In-Band Control in DESIGN.md  for
                     more information.

       The  customary  name  for the integration bridge is br-int, but another
       name may be used.
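
        The sketch below is an illustration only, not something OVN itself
        requires: a startup script could create the integration bridge with
        the settings above by invoking ovs-vsctl, here wrapped in Python’s
        subprocess module.  The bridge name br-int is the customary default
        described above.

               import subprocess

               # Minimal sketch: create the integration bridge (if it does
               # not already exist) and apply the settings recommended
               # above.  ovn-controller creates an equivalent bridge
               # automatically if none exists, so this is purely
               # illustrative.
               subprocess.check_call([
                   "ovs-vsctl", "--may-exist", "add-br", "br-int",
                   "--", "set-fail-mode", "br-int", "secure",
                   "--", "set", "Bridge", "br-int",
                   "other-config:disable-in-band=true",
               ])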

   Logical Networks
        Logical networks implement the same concepts as physical networks,
        but they are insulated from the physical network by tunnels or
        other encapsulations.  This allows logical networks to have
        separate IP and other address spaces that overlap, without
        conflicting, with those used for physical networks.  Logical
        network topologies can be arranged without regard for the
        topologies of the physical networks on which they run.

       Logical network concepts in OVN include:

              ·      Logical  switches,  the  logical  version   of   Ethernet
                     switches.

              ·      Logical routers, the logical version of IP routers.  Log‐
                     ical switches and routers can be connected into sophisti‐
                     cated topologies.

              ·      Logical  datapaths are the logical version of an OpenFlow
                     switch.  Logical switches and  routers  are  both  imple‐
                     mented as logical datapaths.

   Life Cycle of a VIF
       Tables and their schemas presented in isolation are difficult to under‐
       stand.  Here’s an example.

        A VIF on a hypervisor is a virtual network interface attached
        either to a VM or to a container running directly on that
        hypervisor (this is different from the interface of a container
        running inside a VM).

       The steps in this example refer often to details of  the  OVN  and  OVN
       Northbound  database  schemas.   Please  see  ovn-sb(5)  and ovn-nb(5),
       respectively, for the full story on these databases.

              1.
                A VIF’s life cycle begins when a CMS administrator  creates  a
                new  VIF  using the CMS user interface or API and adds it to a
                switch (one implemented by OVN as a logical switch).  The  CMS
                 updates its own configuration.  This includes associating
                 a unique, persistent identifier vif-id and an Ethernet
                 address mac with the VIF.

              2.
                The  CMS plugin updates the OVN Northbound database to include
                the new VIF, by adding a row to the  Logical_Port  table.   In
                the  new row, name is vif-id, mac is mac, switch points to the
                OVN logical switch’s Logical_Switch record, and other  columns
                are initialized appropriately.

              3.
                ovn-northd  receives  the  OVN Northbound database update.  In
                turn, it makes the corresponding updates to the OVN Southbound
                 database, by adding rows to the OVN Southbound database
                 Logical_Flow table to reflect the new port, e.g. add a flow
                 to
                recognize  that packets destined to the new port’s MAC address
                should be delivered to it, and update the flow  that  delivers
                broadcast  and  multicast packets to include the new port.  It
                also creates a record in the Binding table and  populates  all
                its columns except the column that identifies the chassis.

              4.
                On  every hypervisor, ovn-controller receives the Logical_Flow
                table updates that ovn-northd made in the previous  step.   As
                 long as the VM that owns the VIF is powered off,
                 ovn-controller cannot do much; it cannot, for example,
                 arrange to send packets to or receive packets from the VIF,
                 because the VIF does not actually exist anywhere.

              5.
                 Eventually, a user powers on the VM that owns the VIF.  On
                 the hypervisor where the VM is powered on, the integration
                 between the hypervisor and Open vSwitch (described in
                 IntegrationGuide.md) adds the VIF to the OVN integration
                 bridge and stores vif-id in external-ids:iface-id to
                 indicate that the interface is an instantiation of the new
                 VIF, as sketched after this list.  (None of this code is
                 new in OVN; this is pre-existing integration work that has
                 already been done on hypervisors that support OVS.)

              6.
                On  the  hypervisor where the VM is powered on, ovn-controller
                notices  external-ids:iface-id  in  the  new  Interface.    In
                response, it updates the local hypervisor’s OpenFlow tables so
                that packets to and from the VIF are properly handled.  After‐
                ward, in the OVN Southbound DB, it updates the Binding table’s
                chassis column for the row that links the  logical  port  from
                external-ids:iface-id to the hypervisor.

              7.
                Some  CMS  systems, including OpenStack, fully start a VM only
                when its networking is ready.   To  support  this,  ovn-northd
                 notices the chassis column updated for the row in the
                 Binding table and pushes this upward by updating the up
                 column in the
                OVN  Northbound database’s Logical_Port table to indicate that
                the VIF is now up.  The CMS, if it uses this feature, can then
                react by allowing the VM’s execution to proceed.

              8.
                On  every  hypervisor  but  the  one  where  the  VIF resides,
                ovn-controller notices the completely  populated  row  in  the
                Binding  table.   This  provides  ovn-controller  the physical
                location of the logical port, so  each  instance  updates  the
                OpenFlow tables of its switch (based on logical datapath flows
                in the OVN DB Logical_Flow table) so that packets to and  from
                the VIF can be properly handled via tunnels.

              9.
                Eventually,  a  user  powers off the VM that owns the VIF.  On
                the hypervisor where the  VM  was  powered  off,  the  VIF  is
                deleted from the OVN integration bridge.

              10.
                On the hypervisor where the VM was powered off, ovn-controller
                notices that the VIF was deleted.  In response, it removes the
                Chassis  column  content  in the Binding table for the logical
                port.

              11.
                On every hypervisor, ovn-controller notices the empty  Chassis
                column  in the Binding table’s row for the logical port.  This
                means that ovn-controller no longer knows the  physical  loca‐
                tion  of  the logical port, so each instance updates its Open‐
                Flow table to reflect that.

              12.
                Eventually, when the VIF (or  its  entire  VM)  is  no  longer
                needed  by  anyone, an administrator deletes the VIF using the
                CMS user interface or API.  The CMS updates its own configura‐
                tion.

              13.
                The  CMS  plugin removes the VIF from the OVN Northbound data‐
                base, by deleting its row in the Logical_Port table.

              14.
                ovn-northd receives the OVN  Northbound  update  and  in  turn
                 updates the OVN Southbound database accordingly, by
                 removing or updating the rows from the OVN Southbound
                 database Logical_Flow table and Binding table that were
                 related to the now-destroyed VIF.

              15.
                On every hypervisor, ovn-controller receives the  Logical_Flow
                table  updates  that  ovn-northd  made  in  the previous step.
                ovn-controller updates OpenFlow tables to reflect the  update,
                 although there may not be much to do, since the VIF had
                 already become unreachable when it was removed from the
                 Binding table in a previous step.
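
        The hypervisor integration in step 5 is how OVN learns which
        chassis hosts a VIF, and it hinges on setting
        external-ids:iface-id on the VIF’s interface.  The sketch below is
        purely illustrative (the real work is done by the integration
        described in IntegrationGuide.md); the device name and vif-id
        value are hypothetical.

               import subprocess

               PORT = "tap0"              # hypothetical local VIF device
               VIF_ID = "example-vif-id"  # hypothetical; use the CMS's
                                          # vif-id for this VIF

               # Attach the VIF to the integration bridge and record which
               # logical port it instantiates.  ovn-controller on this
               # chassis reacts by setting up OpenFlow flows and filling
               # in the Binding table's chassis column (step 6).
               subprocess.check_call([
                   "ovs-vsctl", "add-port", "br-int", PORT,
                   "--", "set", "Interface", PORT,
                   "external-ids:iface-id=" + VIF_ID,
               ])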

   Life Cycle of a Container Interface Inside a VM
        OVN provides virtual network abstractions by converting information
        written in the OVN_NB database to OpenFlow flows in each hypervisor.
        Secure virtual networking for multiple tenants can only be provided
        if ovn-controller is the only entity that can modify flows in Open
        vSwitch.  When the Open vSwitch integration bridge resides in the
        hypervisor, it is a fair assumption that tenant workloads running
        inside VMs cannot make any changes to Open vSwitch flows.

        If the infrastructure provider trusts the applications inside the
        containers not to break out and modify the Open vSwitch flows, then
        containers can be run directly on hypervisors.  This is also the
        case when containers are run inside VMs and the Open vSwitch
        integration bridge with flows added by ovn-controller resides in
        the same VM.  For both of the above cases, the workflow is the same
        as explained with an example in the previous section ("Life Cycle
        of a VIF").

       This  section talks about the life cycle of a container interface (CIF)
       when containers are created in the VMs and the Open vSwitch integration
       bridge  resides  inside  the  hypervisor.  In this case, even if a con‐
       tainer application breaks out, other tenants are not  affected  because
       the  containers  running  inside the VMs cannot modify the flows in the
       Open vSwitch integration bridge.

        When multiple containers are created inside a VM, there are
        multiple CIFs associated with them.  The network traffic associated
        with these CIFs needs to reach the Open vSwitch integration bridge
        running in the hypervisor for OVN to support virtual network
        abstractions.  OVN should also be able to distinguish network
        traffic coming from different CIFs.  There are two ways to
        distinguish the network traffic of CIFs.

       One  way  is  to provide one VIF for every CIF (1:1 model).  This means
       that there could be a lot of network devices in the  hypervisor.   This
       would slow down OVS because of all the additional CPU cycles needed for
        the management of all the VIFs.  It would also mean that the entity
        creating the containers in a VM would need to be able to create the
        corresponding VIFs in the hypervisor.

        The second way is to provide a single VIF for all the CIFs (1:many
        model).  OVN could then distinguish network traffic coming from
        different CIFs via a tag written in every packet.  OVN uses this
        model, with VLAN as the tagging mechanism.
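
        To make the tagging concrete, the sketch below shows, in plain
        Python and purely as an illustration, the 802.1Q mechanism that the
        1:many model relies on: traffic from a container leaves the shared
        VIF with a VLAN tag, and the flows that OVN installs match on that
        tag and strip it.  The tag values themselves come from the
        Logical_Port table, as described in the steps below.

               import struct

               TPID = 0x8100  # 802.1Q tag protocol identifier

               def add_cif_tag(frame, vlan_id):
                   """Insert an 802.1Q header carrying a CIF's VLAN tag."""
                   tci = vlan_id & 0x0fff        # PCP/DEI left at zero
                   return (frame[:12]            # destination + source MAC
                           + struct.pack("!HH", TPID, tci)
                           + frame[12:])         # original type + payload

               def strip_cif_tag(frame):
                   """Return (vlan_id, untagged_frame) for a tagged frame."""
                   tpid, tci = struct.unpack("!HH", frame[12:16])
                   assert tpid == TPID
                   return tci & 0x0fff, frame[:12] + frame[16:]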

              1.
                 A CIF’s life cycle begins when a container is spawned
                 inside a VM by either the same CMS that created the VM, a
                 tenant that owns that VM, or even a container
                 orchestration system that is different from the CMS that
                 initially created the VM.  Whoever the entity is, it will
                 need to know the vif-id associated with the network
                 interface of the VM through which the container
                 interface’s network traffic is expected to go.  The entity
                 that creates the container interface will also need to
                 choose an unused VLAN inside that VM.

              2.
                 The container spawning entity (either directly or through
                 the CMS that manages the underlying infrastructure) updates
                 the OVN Northbound database to include the new CIF, by
                 adding a row to the Logical_Port table.  In the new row,
                 name is any unique identifier, parent_name is the vif-id
                 of the VM through which the CIF’s network traffic is
                 expected to go, and tag is the VLAN tag that identifies
                 the network traffic of that CIF.  (A sketch of this step
                 appears after this list.)

              3.
                ovn-northd receives the OVN Northbound  database  update.   In
                turn, it makes the corresponding updates to the OVN Southbound
                 database, by adding rows to the OVN Southbound database’s
                 Logical_Flow table to reflect the new port and also by
                 creating a new row in the Binding table and populating all
                 its columns except the column that identifies the chassis.

              4.
                 On every hypervisor, ovn-controller subscribes to changes
                 in the Binding table.  When a new row created by
                 ovn-northd includes a value in the parent_port column of
                 the Binding table, the ovn-controller on the hypervisor
                 whose OVN integration bridge has that same vif-id in
                 external-ids:iface-id updates the local hypervisor’s
                 OpenFlow tables so that packets to and from the VIF with
                 the particular VLAN tag are properly handled.  Afterward
                 it updates the chassis column of the Binding table to
                 reflect the physical location.

              5.
                 One can only start the application inside the container
                 after the underlying network is ready.  To support this,
                 ovn-northd notices the updated chassis column in the
                 Binding table and updates the up column in the OVN
                 Northbound database’s Logical_Port table to indicate that
                 the CIF is now up.  The entity responsible for starting
                 the container application queries this value and starts
                 the application.

              6.
                 Eventually, the entity that created and started the
                 container stops it.  The entity, through the CMS (or
                 directly), deletes its row in the Logical_Port table.

              7.
                ovn-northd  receives  the  OVN  Northbound  update and in turn
                 updates the OVN Southbound database accordingly, by
                 removing or updating the rows from the OVN Southbound
                 database Logical_Flow table that were related to the
                 now-destroyed CIF.  It also deletes the row in the Binding
                 table for that CIF.

              8.
                On  every hypervisor, ovn-controller receives the Logical_Flow
                table updates that  ovn-northd  made  in  the  previous  step.
                ovn-controller updates OpenFlow tables to reflect the update.
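
        As an illustration of step 2 above, a container spawning entity
        might create the CIF’s Logical_Port with ovn-nbctl.  This sketch
        assumes the lport-add form that accepts optional parent and tag
        arguments (check ovn-nbctl(8) for the exact syntax in your
        release); the switch, port, and VIF names are hypothetical.

               import subprocess

               LSWITCH = "ls0"      # hypothetical OVN logical switch
               CIF = "cif0"         # any unique identifier for the CIF
               PARENT_VIF = "vif1"  # the VM's vif-id (Logical_Port name)
               TAG = "42"           # an unused VLAN inside that VM

               # Creates a Logical_Port row whose parent_name is the VM's
               # VIF and whose tag identifies this CIF's traffic (step 2).
               subprocess.check_call([
                   "ovn-nbctl", "lport-add", LSWITCH, CIF, PARENT_VIF, TAG,
               ])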

   Architectural Physical Life Cycle of a Packet
       This section describes how a packet travels from one virtual machine or
       container to another through OVN.   This  description  focuses  on  the
       physical  treatment  of a packet; for a description of the logical life
       cycle of a packet, please refer to the Logical_Flow table in ovn-sb(5).

       This section mentions several data and  metadata  fields,  for  clarity
       summarized here:

              tunnel key
                     When  OVN encapsulates a packet in Geneve or another tun‐
                     nel, it attaches extra data to it to allow the  receiving
                     OVN instance to process it correctly.  This takes differ‐
                     ent forms depending on the particular encapsulation,  but
                     in  each  case we refer to it here as the ``tunnel key.’’
                     See Tunnel Encapsulations, below, for details.

              logical datapath field
                     A field that denotes the logical datapath through which a
                     packet is being processed.  OVN uses the field that Open‐
                     Flow 1.1+ simply (and confusingly) calls ``metadata’’  to
                     store the logical datapath.  (This field is passed across
                     tunnels as part of the tunnel key.)

              logical input port field
                     A field that denotes the  logical  port  from  which  the
                     packet  entered the logical datapath.  OVN stores this in
                     Nicira extension register number 6.

                     Geneve and STT tunnels pass this field  as  part  of  the
                     tunnel  key.   Although  VXLAN  tunnels do not explicitly
                     carry a logical input port, OVN only uses VXLAN to commu‐
                     nicate  with gateways that from OVN’s perspective consist
                     of only a single logical port, so that OVN  can  set  the
                     logical  input  port  field to this one on ingress to the
                     OVN logical pipeline.

              logical output port field
                     A field that denotes the  logical  port  from  which  the
                     packet will leave the logical datapath.  This is initial‐
                     ized to 0 at the beginning of the logical  ingress  pipe‐
                     line.   OVN stores this in Nicira extension register num‐
                     ber 7.

                     Geneve and STT tunnels pass this field  as  part  of  the
                     tunnel  key.   VXLAN  tunnels do not transmit the logical
                     output port field.

              conntrack zone field
                     A field that denotes the connection tracking  zone.   The
                     value  only  has local significance and is not meaningful
                     between chassis.  This is initialized to 0 at the  begin‐
                     ning of the logical ingress pipeline.  OVN stores this in
                     Nicira extension register number 5.

              VLAN ID
                      The VLAN ID is used as an interface between OVN and
                      containers nested inside a VM (see Life Cycle of a
                      Container Interface Inside a VM, above, for more
                      information).

       Initially, a VM or container on the ingress hypervisor sends  a  packet
       on a port attached to the OVN integration bridge.  Then:

              1.
                OpenFlow table 0 performs physical-to-logical translation.  It
                matches the packet’s ingress port.  Its actions  annotate  the
                packet  with logical metadata, by setting the logical datapath
                field to identify the logical  datapath  that  the  packet  is
                traversing  and  the  logical input port field to identify the
                ingress port.  Then it resubmits to table 16 to enter the log‐
                ical ingress pipeline.

                 It’s possible that a single ingress physical port maps to
                 multiple logical ports with a type of localnet.  In that
                 case, the logical datapath and logical input port fields
                 are reset for each such logical port and the packet is
                 resubmitted to table 16 multiple times.

                Packets that originate from a container nested within a VM are
                treated  in  a  slightly  different way.  The originating con‐
                tainer can be distinguished based on the VIF-specific VLAN ID,
                so  the  physical-to-logical  translation  flows  additionally
                match on VLAN ID and the actions strip the VLAN header.   Fol‐
                lowing this step, OVN treats packets from containers just like
                any other packets.

                Table 0 also processes packets that arrive from other chassis.
                It  distinguishes  them  from  other  packets by ingress port,
                which is a tunnel.  As with  packets  just  entering  the  OVN
                pipeline,  the  actions  annotate  these  packets with logical
                datapath and logical ingress port metadata.  In addition,  the
                actions  set the logical output port field, which is available
                because in OVN tunneling occurs after the logical output  port
                is known.  These three pieces of information are obtained from
                the tunnel encapsulation metadata (see  Tunnel  Encapsulations
                for  encoding details).  Then the actions resubmit to table 33
                to enter the logical egress pipeline.

              2.
                OpenFlow tables 16 through  31  execute  the  logical  ingress
                pipeline  from  the  Logical_Flow  table in the OVN Southbound
                database.  These tables are expressed  entirely  in  terms  of
                logical  concepts like logical ports and logical datapaths.  A
                big part of ovn-controller’s job is  to  translate  them  into
                 equivalent OpenFlow (in particular it translates the table
                 numbers: Logical_Flow tables 0 through 15 become OpenFlow
                 tables 16 through 31; a sketch of this mapping appears
                 after this list).  For a given packet, the logical ingress
                 pipeline eventually executes zero or more output actions:

                ·      If  the pipeline executes no output actions at all, the
                       packet is effectively dropped.

                ·      Most commonly, the pipeline executes one output action,
                       which  ovn-controller  implements  by  resubmitting the
                       packet to table 32.

                ·      If the  pipeline  can  execute  more  than  one  output
                       action,  then each one is separately resubmitted to ta‐
                       ble 32.  This can be used to send  multiple  copies  of
                       the  packet  to multiple ports.  (If the packet was not
                       modified between the output actions, and  some  of  the
                       copies  are destined to the same hypervisor, then using
                       a logical multicast output port  would  save  bandwidth
                       between hypervisors.)

              3.
                OpenFlow  tables  32 through 47 implement the output action in
                the logical ingress pipeline.  Specifically, table 32  handles
                packets to remote hypervisors, table 33 handles packets to the
                local hypervisor, and table 34 discards packets whose  logical
                ingress and egress port are the same.

                Logical  patch  ports are a special case.  Logical patch ports
                do not have a physical  location  and  effectively  reside  on
                every hypervisor.  Thus, flow table 33, for output to ports on
                the local hypervisor, naturally implements output  to  unicast
                logical  patch ports too.  However, applying the same logic to
                a logical patch port that is part of a logical multicast group
                yields  packet  duplication, because each hypervisor that con‐
                tains a logical port in the multicast group will  also  output
                the  packet to the logical patch port.  Thus, multicast groups
                implement output to logical patch ports in table 32.

                Each flow in table 32 matches on a  logical  output  port  for
                unicast or multicast logical ports that include a logical port
                on a remote hypervisor.  Each flow’s actions implement sending
                a  packet  to the port it matches.  For unicast logical output
                ports on remote hypervisors, the actions set the tunnel key to
                the  correct value, then send the packet on the tunnel port to
                the correct hypervisor.  (When the remote hypervisor  receives
                the  packet,  table  0  there  will recognize it as a tunneled
                packet and pass it along to table 33.)  For multicast  logical
                output  ports, the actions send one copy of the packet to each
                remote hypervisor, in the same way  as  for  unicast  destina‐
                tions.   If a multicast group includes a logical port or ports
                on the local hypervisor, then its actions also resubmit to ta‐
                ble 33.  Table 32 also includes a fallback flow that resubmits
                to table 33 if there is no other match.

                Flows in table 33 resemble those in table 32 but  for  logical
                ports  that  reside locally rather than remotely.  For unicast
                logical output ports on the local hypervisor, the actions just
                resubmit to table 34.  For multicast output ports that include
                one or more logical ports on the local  hypervisor,  for  each
                such  logical  port  P,  the actions change the logical output
                port to P, then resubmit to table 34.

                Table 34 matches and drops packets for which the logical input
                and  output ports are the same.  It resubmits other packets to
                table 48.

              4.
                OpenFlow tables 48 through 63 execute the logical egress pipe‐
                line  from  the Logical_Flow table in the OVN Southbound data‐
                base.  The egress pipeline can perform a final stage of  vali‐
                dation  before packet delivery.  Eventually, it may execute an
                output action, which ovn-controller implements by resubmitting
                to  table  64.  A packet for which the pipeline never executes
                output is effectively  dropped  (although  it  may  have  been
                transmitted through a tunnel across a physical network).

                The  egress  pipeline cannot change the logical output port or
                cause further tunneling.

              5.
                OpenFlow table 64  performs  logical-to-physical  translation,
                the  opposite  of  table  0.   It matches the packet’s logical
                egress port.  Its  actions  output  the  packet  to  the  port
                attached  to  the  OVN integration bridge that represents that
                 logical port.  If the logical egress port is a container
                 nested within a VM, then before sending the packet the
                 actions push on a VLAN header with an appropriate VLAN ID.

                If the logical egress port is a logical patch port, then table
                64  outputs  to  an OVS patch port that represents the logical
                patch port.  The packet re-enters the OpenFlow flow table from
                the  OVS  patch  port’s  peer in table 0, which identifies the
                logical datapath and logical input port based on the OVS patch
                port’s OpenFlow port number.
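
        The fixed OpenFlow table layout used in the steps above can be
        summarized as follows.  This is an illustrative sketch only, not
        OVN code; the real translation is performed inside ovn-controller.

               # Special-purpose OpenFlow tables described above.
               PHYS_TO_LOGICAL = 0   # step 1: physical-to-logical
               OUTPUT_REMOTE = 32    # step 3: output to remote hypervisors
               OUTPUT_LOCAL = 33     # step 3: output to local hypervisor
               DROP_LOOPBACK = 34    # step 3: drop if ingress == egress
               LOGICAL_TO_PHYS = 64  # step 5: logical-to-physical

               def openflow_table(pipeline, logical_table):
                   """Map a Logical_Flow table 0-15 to its OpenFlow table."""
                   assert 0 <= logical_table <= 15
                   if pipeline == "ingress":
                       return 16 + logical_table  # tables 16 through 31
                   elif pipeline == "egress":
                       return 48 + logical_table  # tables 48 through 63
                   raise ValueError(pipeline)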

   Life Cycle of a VTEP gateway
        A gateway is a chassis that forwards traffic between the
        OVN-managed part of a logical network and a physical VLAN,
        extending a tunnel-based logical network into a physical network.

        The steps below often refer to details of the OVN and VTEP database
        schemas.  Please see ovn-sb(5), ovn-nb(5), and vtep(5) for the full
        story on these databases.

              1.
                A VTEP gateway’s life cycle begins with the administrator reg‐
                istering the VTEP gateway as a Physical_Switch table entry  in
                 the VTEP database.  The ovn-controller-vtep connected to
                 this VTEP database will recognize the new VTEP gateway and
                 create a new Chassis table entry for it in the
                 OVN_Southbound database.

              2.
                 The administrator can then create a new Logical_Switch
                 table entry, and bind a particular VLAN on a VTEP gateway’s
                 port to any VTEP logical switch.  Once a VTEP logical
                 switch is bound to a VTEP gateway, the ovn-controller-vtep
                 will detect it and add its name to the
                 vtep_logical_switches column of the Chassis table in the
                 OVN_Southbound database.  Note that the tunnel_key column
                 of the VTEP logical switch is not filled at creation.  The
                 ovn-controller-vtep will set the column when the
                 corresponding VTEP logical switch is bound to an OVN
                 logical network.  (A sketch of these first two steps
                 appears after this list.)

              3.
                Now,  the  administrator can use the CMS to add a VTEP logical
                switch to the OVN logical network.  To do that, the  CMS  must
                 first create a new Logical_Port table entry in the
                 OVN_Northbound database.  Then, the type column of this
                 entry must be set to "vtep".  Next, the
                 vtep-logical-switch and vtep-physical-switch keys in the
                 options column must also be specified, since multiple VTEP
                 gateways can attach to the same VTEP logical switch.

              4.
                 The newly created logical port in the OVN_Northbound
                 database and its configuration will be passed down to the
                 OVN_Southbound database as a new Port_Binding table entry.
                 The ovn-controller-vtep will recognize the change and bind
                 the logical port to the corresponding VTEP gateway chassis.
                 Binding the same VTEP logical switch to a different OVN
                 logical network is not allowed, and a warning will be
                 generated in the log.

              5.
                 Besides binding to the VTEP gateway chassis, the
                 ovn-controller-vtep will update the tunnel_key column of
                 the VTEP logical switch to the corresponding
                 Datapath_Binding table entry’s tunnel_key for the bound
                 OVN logical network.

              6.
                 Next, the ovn-controller-vtep will keep reacting to
                 configuration changes in the Port_Binding table in the
                 OVN_Southbound database, and updating the
                 Ucast_Macs_Remote table in the VTEP database.  This allows
                 the VTEP gateway to understand where to forward the
                 unicast traffic coming from the extended external network.

              7.
                Eventually, the VTEP gateway’s life cycle ends when the admin‐
                istrator unregisters the VTEP gateway from the VTEP  database.
                The  ovn-controller-vtep  will  recognize the event and remove
                all related configurations (Chassis table entry and port bind‐
                ings) in the OVN_Southbound database.

              8.
                When  the  ovn-controller-vtep is terminated, all related con‐
                figurations in the OVN_Southbound database and the VTEP  data‐
                base  will be cleaned, including Chassis table entries for all
                registered VTEP gateways and  their  port  bindings,  and  all
                Ucast_Macs_Remote  table entries and the Logical_Switch tunnel
                keys.
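
        As an illustration of steps 1 and 2 above, an administrator might
        register a gateway and bind a VLAN on one of its ports to a VTEP
        logical switch with vtep-ctl(8), as sketched below.  The physical
        switch, port, VLAN, and logical switch names are hypothetical, and
        the exact commands are an assumption to be checked against
        vtep-ctl(8).

               import subprocess

               PS = "gw0"       # hypothetical Physical_Switch name
               PORT = "eth1"    # hypothetical physical port on the gateway
               VLAN = "0"       # VLAN on that port to extend
               LS = "vtep-ls0"  # hypothetical VTEP logical switch

               def vtep(*args):
                   subprocess.check_call(["vtep-ctl"] + list(args))

               # Step 1: register the gateway as a Physical_Switch entry.
               vtep("add-ps", PS)
               vtep("add-port", PS, PORT)

               # Step 2: create the VTEP logical switch and bind the
               # port/VLAN pair to it; ovn-controller-vtep then records it
               # in the Chassis table's vtep_logical_switches column.
               vtep("add-ls", LS)
               vtep("bind-ls", PS, PORT, VLAN, LS)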

DESIGN DECISIONS
   Tunnel Encapsulations
       OVN annotates logical network packets that it sends from one hypervisor
       to  another  with  the  following  three  pieces of metadata, which are
       encoded in an encapsulation-specific fashion:

              ·      24-bit logical datapath identifier, from  the  tunnel_key
                     column in the OVN Southbound Datapath_Binding table.

              ·      15-bit logical ingress port identifier.  ID 0 is reserved
                     for internal use within OVN.  IDs 1 through 32767, inclu‐
                      sive, may be assigned to logical ports (see the
                      tunnel_key column in the OVN Southbound Port_Binding
                      table).

              ·      16-bit logical egress port  identifier.   IDs  0  through
                     32767 have the same meaning as for logical ingress ports.
                     IDs 32768 through 65535, inclusive, may  be  assigned  to
                     logical  multicast  groups  (see the tunnel_key column in
                     the OVN Southbound Multicast_Group table).

       For hypervisor-to-hypervisor traffic, OVN supports only Geneve and  STT
       encapsulations, for the following reasons:

              ·      Only STT and Geneve support the large amounts of metadata
                     (over 32 bits per packet) that  OVN  uses  (as  described
                     above).

               ·      STT and Geneve use randomized UDP or TCP source
                      ports, allowing efficient distribution among multiple
                      paths in environments that use ECMP in their underlay.

              ·      NICs  are  available to offload STT and Geneve encapsula‐
                     tion and decapsulation.

       Due to its flexibility, the preferred encapsulation between hypervisors
       is  Geneve.   For Geneve encapsulation, OVN transmits the logical data‐
       path identifier in the Geneve VNI.  OVN transmits the  logical  ingress
       and  logical  egress  ports  in  a TLV with class 0x0102, type 0, and a
       32-bit value encoded as follows, from MSB to LSB:

               ·      1 bit: rsv (0)

              ·      15 bits: ingress port

              ·      16 bits: egress port
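
        A minimal sketch of this Geneve encoding, purely as an illustration
        (see ovn-sb(5) and the OVN source for the authoritative
        definition):

               GENEVE_CLASS = 0x0102  # TLV class used by OVN
               GENEVE_TYPE = 0

               def encode_geneve(datapath, ingress_port, egress_port):
                   """Return (VNI, 32-bit TLV value) for the OVN metadata."""
                   assert datapath < 1 << 24
                   assert ingress_port < 1 << 15  # TLV bit 31 is rsv (0)
                   assert egress_port < 1 << 16
                   vni = datapath
                   tlv_value = (ingress_port << 16) | egress_port
                   return vni, tlv_value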


       Environments whose NICs lack Geneve offload may prefer  STT  encapsula‐
       tion  for  performance reasons.  For STT encapsulation, OVN encodes all
       three pieces of logical metadata in the STT 64-bit tunnel  ID  as  fol‐
       lows, from MSB to LSB:

              ·      9 bits: reserved (0)

              ·      15 bits: ingress port

              ·      16 bits: egress port

              ·      24 bits: datapath
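
        The corresponding sketch for the STT 64-bit tunnel ID, again purely
        as an illustration:

               def encode_stt_tunnel_id(datapath, ingress_port, egress_port):
                   """Pack the OVN metadata into the STT 64-bit tunnel ID."""
                   assert datapath < 1 << 24
                   assert ingress_port < 1 << 15
                   assert egress_port < 1 << 16
                   # From MSB to LSB: 9 reserved bits (0), 15-bit ingress
                   # port, 16-bit egress port, 24-bit logical datapath.
                   return ((ingress_port << 40)
                           | (egress_port << 24)
                           | datapath)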


       For connecting to gateways, in addition to Geneve and STT, OVN supports
       VXLAN, because only  VXLAN  support  is  common  on  top-of-rack  (ToR)
       switches.   Currently,  gateways  have  a  feature set that matches the
       capabilities as defined by the VTEP schema, so fewer bits  of  metadata
       are  necessary.  In the future, gateways that do not support encapsula‐
       tions with large amounts of metadata may continue  to  have  a  reduced
       feature set.



Open vSwitch 2.5.1             OVN Architecture            ovn-architecture(7)