ovn-architecture(7)           Open vSwitch Manual          ovn-architecture(7)



NAME
       ovn-architecture - Open Virtual Network architecture

DESCRIPTION
       OVN,  the  Open Virtual Network, is a system to support virtual network
       abstraction.  OVN complements the existing capabilities of OVS  to  add
       native support for virtual network abstractions, such as virtual L2 and
       L3 overlays and security groups.  Services such as DHCP are also desir‐
       able  features.   Just like OVS, OVN’s design goal is to have a produc‐
       tion-quality implementation that can operate at significant scale.

       An OVN deployment consists of several components:

              ·      A Cloud Management System (CMS), which is OVN’s  ultimate
                     client  (via its users and administrators).  OVN integra‐
                     tion  requires  installing  a  CMS-specific  plugin   and
                      related  software  (see  below).   OVN initially targets
                      OpenStack as its CMS.

                     We generally speak of ``the’’ CMS, but  one  can  imagine
                     scenarios  in which multiple CMSes manage different parts
                     of an OVN deployment.

              ·      An OVN Database physical or virtual node (or, eventually,
                     cluster) installed in a central location.

              ·      One or more (usually many) hypervisors.  Hypervisors must
                     run Open vSwitch and implement the interface described in
                     IntegrationGuide.rst  in the OVS source tree.  Any hyper‐
                     visor platform supported by Open vSwitch is acceptable.

              ·      Zero or more gateways.  A gateway extends a  tunnel-based
                     logical  network  into a physical network by bidirection‐
                     ally forwarding packets between tunnels  and  a  physical
                     Ethernet  port.   This allows non-virtualized machines to
                     participate in logical networks.   A  gateway  may  be  a
                     physical  host, a virtual machine, or an ASIC-based hard‐
                     ware switch that supports the vtep(5)  schema.   (Support
                      for the latter will come later in the OVN implementation.)

                      Hypervisors and gateways are together  called  transport
                      nodes or chassis.

       The diagram below shows how the major components  of  OVN  and  related
       software interact.  Starting at the top of the diagram, we have:

              ·      The Cloud Management System, as defined above.

              ·      The  OVN/CMS  Plugin  is  the  component  of the CMS that
                     interfaces to OVN.  In OpenStack, this is a Neutron plug‐
                     in.   The plugin’s main purpose is to translate the CMS’s
                     notion of logical network configuration,  stored  in  the
                     CMS’s  configuration  database  in a CMS-specific format,
                     into an intermediate representation understood by OVN.

                     This component is  necessarily  CMS-specific,  so  a  new
                     plugin  needs  to be developed for each CMS that is inte‐
                     grated with OVN.  All of the components below this one in
                     the diagram are CMS-independent.

              ·      The  OVN  Northbound  Database  receives the intermediate
                     representation of logical  network  configuration  passed
                     down by the OVN/CMS Plugin.  The database schema is meant
                     to be ``impedance matched’’ with the concepts used  in  a
                     CMS,  so  that  it  directly  supports notions of logical
                     switches, routers, ACLs, and so on.   See  ovn-nb(5)  for
                     details.

                     The  OVN  Northbound  Database  has only two clients: the
                     OVN/CMS Plugin above it and ovn-northd below it.

              ·      ovn-northd(8) connects to  the  OVN  Northbound  Database
                     above  it  and  the OVN Southbound Database below it.  It
                     translates the logical network configuration in terms  of
                     conventional  network concepts, taken from the OVN North‐
                     bound Database, into logical datapath flows  in  the  OVN
                     Southbound Database below it.

              ·      The  OVN Southbound Database is the center of the system.
                      Its clients are ovn-northd(8) above  it  and  ovn-con‐
                      troller(8) on every transport node below it.

                     The OVN Southbound Database contains three kinds of data:
                     Physical Network (PN) tables that specify  how  to  reach
                     hypervisor  and  other nodes, Logical Network (LN) tables
                     that describe the logical network in terms  of  ``logical
                     datapath  flows,’’  and  Binding tables that link logical
                     network components’ locations to  the  physical  network.
                     The  hypervisors populate the PN and Port_Binding tables,
                     whereas ovn-northd(8) populates the LN tables.

                     OVN Southbound Database performance must scale  with  the
                     number of transport nodes.  This will likely require some
                     work on  ovsdb-server(1)  as  we  encounter  bottlenecks.
                     Clustering for availability may be needed.

       The remaining components are replicated onto each hypervisor:

              ·      ovn-controller(8)  is  OVN’s agent on each hypervisor and
                     software gateway.  Northbound, it  connects  to  the  OVN
                     Southbound  Database to learn about OVN configuration and
                      status and to populate the PN table and the Chassis col‐
                      umn in the Binding table with  the  hypervisor’s  status.
                     Southbound, it connects to ovs-vswitchd(8) as an OpenFlow
                     controller,  for control over network traffic, and to the
                     local ovsdb-server(1) to allow it to monitor and  control
                     Open vSwitch configuration.

              ·      ovs-vswitchd(8) and ovsdb-server(1) are conventional com‐
                     ponents of Open vSwitch.

                                         CMS
                                          |
                                          |
                              +-----------|-----------+
                              |           |           |
                              |     OVN/CMS Plugin    |
                              |           |           |
                              |           |           |
                              |   OVN Northbound DB   |
                              |           |           |
                              |           |           |
                              |       ovn-northd      |
                              |           |           |
                              +-----------|-----------+
                                          |
                                          |
                                +-------------------+
                                | OVN Southbound DB |
                                +-------------------+
                                          |
                                          |
                       +------------------+------------------+
                       |                  |                  |
         HV 1          |                  |    HV n          |
       +---------------|---------------+  .  +---------------|---------------+
       |               |               |  .  |               |               |
       |        ovn-controller         |  .  |        ovn-controller         |
       |         |          |          |  .  |         |          |          |
       |         |          |          |     |         |          |          |
       |  ovs-vswitchd   ovsdb-server  |     |  ovs-vswitchd   ovsdb-server  |
       |                               |     |                               |
       +-------------------------------+     +-------------------------------+

   Information Flow in OVN
       Configuration data in OVN flows from north to south.  The CMS,  through
       its  OVN/CMS  plugin,  passes  the  logical  network  configuration  to
       ovn-northd via the northbound database.  In turn,  ovn-northd  compiles
       the  configuration  into a lower-level form and passes it to all of the
       chassis via the southbound database.

       Status information in OVN flows from south  to  north.   OVN  currently
       provides  only  a  few  forms of status information.  First, ovn-northd
       populates the up column in the northbound Logical_Switch_Port table: if
       a logical port’s chassis column in the southbound Port_Binding table is
       nonempty, it sets up to true, otherwise to false.  This allows the  CMS
       to detect when a VM’s networking has come up.

       Second, OVN provides feedback to the CMS on the realization of its con‐
       figuration, that is, whether the configuration provided by the CMS  has
       taken  effect.   This  feature  requires  the  CMS  to participate in a
       sequence number protocol, which works the following way:

              1.
                When the CMS updates the configuration in the northbound data‐
                base, as part of the same transaction, it increments the value
                of the nb_cfg column in the NB_Global table.   (This  is  only
                necessary  if the CMS wants to know when the configuration has
                been realized.)

              2.
                When ovn-northd updates the southbound  database  based  on  a
                given  snapshot  of  the northbound database, it copies nb_cfg
                from  northbound  NB_Global  into  the   southbound   database
                SB_Global  table,  as part of the same transaction.  (Thus, an
                observer monitoring both  databases  can  determine  when  the
                southbound database is caught up with the northbound.)

              3.
                After  ovn-northd  receives  confirmation  from the southbound
                database server that its changes have  committed,  it  updates
                sb_cfg in the northbound NB_Global table to the nb_cfg version
                that was pushed down.  (Thus, the CMS or another observer  can
                determine  when the southbound database is caught up without a
                connection to the southbound database.)

              4.
                The  ovn-controller  process  on  each  chassis  receives  the
                updated  southbound  database,  with the updated nb_cfg.  This
                process in turn updates the physical flows  installed  in  the
                chassis’s  Open vSwitch instances.  When it receives confirma‐
                tion from Open vSwitch  that  the  physical  flows  have  been
                updated,  it  updates  nb_cfg in its own Chassis record in the
                southbound database.

              5.
                ovn-northd monitors the nb_cfg column in all  of  the  Chassis
                records  in  the  southbound  database.  It keeps track of the
                minimum value among all the records and  copies  it  into  the
                hv_cfg  column  in the northbound NB_Global table.  (Thus, the
                CMS or another observer can determine when all of the hypervi‐
                sors have caught up to the northbound configuration.)
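
        As an illustration only, the CMS side of this protocol can be driven
        with the generic database commands of ovn-nbctl(8).  The switch name
        sw0, port name sw0-port1, and the value 42 below are hypothetical:

               # Bump nb_cfg in the same transaction as a configuration change.
               ovn-nbctl lsp-add sw0 sw0-port1 -- set NB_Global . nb_cfg=42

               # Wait until ovn-northd reports that the southbound database
               # has caught up (step 3), then until every chassis has (step 5).
               ovn-nbctl wait-until NB_Global . sb_cfg=42
               ovn-nbctl wait-until NB_Global . hv_cfg=42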

   Chassis Setup
       Each  chassis  in  an  OVN  deployment  must be configured with an Open
       vSwitch bridge dedicated for OVN’s use, called the integration  bridge.
       System  startup  scripts  may  create  this  bridge  prior  to starting
       ovn-controller if desired.  If this bridge does not exist when ovn-con‐
       troller  starts, it will be created automatically with the default con‐
       figuration suggested  below.   The  ports  on  the  integration  bridge
       include:

              ·      On  any  chassis,  tunnel ports that OVN uses to maintain
                     logical  network  connectivity.    ovn-controller   adds,
                     updates, and removes these tunnel ports.

              ·      On a hypervisor, any VIFs that are to be attached to log‐
                     ical networks.  The hypervisor itself, or the integration
                     between  Open  vSwitch  and  the hypervisor (described in
                     IntegrationGuide.rst) takes care of this.  (This  is  not
                     part  of OVN or new to OVN; this is pre-existing integra‐
                     tion work that has already been done on hypervisors  that
                     support OVS.)

              ·      On  a gateway, the physical port used for logical network
                     connectivity.  System startup scripts add  this  port  to
                     the bridge prior to starting ovn-controller.  This can be
                     a patch port to another bridge,  instead  of  a  physical
                     port, in more sophisticated setups.

       Other  ports should not be attached to the integration bridge.  In par‐
       ticular, physical ports attached to the underlay network (as opposed to
       gateway  ports,  which are physical ports attached to logical networks)
       must not be attached to  the  integration  bridge.   Underlay  physical
       ports  should  instead  be  attached  to a separate Open vSwitch bridge
       (they need not be attached to any bridge at all, in fact).
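
        For example, a physical NIC reserved for underlay (tunnel) traffic can
        be attached to its own bridge; the bridge and interface names below
        are hypothetical:

               ovs-vsctl add-br br-phys -- add-port br-phys eth1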

       The integration bridge should be configured as  described  below.   The
       effect    of    each    of    these    settings    is   documented   in
       ovs-vswitchd.conf.db(5):

              fail-mode=secure
                     Avoids switching packets between  isolated  logical  net‐
                     works  before  ovn-controller  starts up.  See Controller
                     Failure Settings in ovs-vsctl(8) for more information.

              other-config:disable-in-band=true
                     Suppresses in-band  control  flows  for  the  integration
                     bridge.   It  would  be unusual for such flows to show up
                     anyway, because OVN uses a local controller (over a  Unix
                     domain socket) instead of a remote controller.  It’s pos‐
                     sible, however, for some other bridge in the same  system
                     to  have  an  in-band remote controller, and in that case
                     this suppresses the  flows  that  in-band  control  would
                     ordinarily set up.  See In-Band Control in DESIGN.rst for
                     more information.

       The customary name for the integration bridge is  br-int,  but  another
       name may be used.
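
        For example, a startup script that chooses to create the integration
        bridge itself, rather than letting ovn-controller do it, might run a
        sketch of commands like the following:

               ovs-vsctl -- add-br br-int \
                         -- set Bridge br-int fail-mode=secure \
                                other-config:disable-in-band=true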

   Logical Networks
        Logical networks implement the same  concepts  as  physical  networks,
        but they are insulated from the physical network by tunnels  or  other
        encapsulations.  This allows logical networks to have separate IP  and
       other address spaces that overlap, without conflicting, with those used
       for  physical  networks.   Logical  network  topologies can be arranged
       without regard for the topologies of the  physical  networks  on  which
       they run.

       Logical network concepts in OVN include:

              ·      Logical   switches,   the  logical  version  of  Ethernet
                     switches.

              ·      Logical routers, the logical version of IP routers.  Log‐
                     ical switches and routers can be connected into sophisti‐
                     cated topologies.

               ·      Logical  datapaths,  the logical version of OpenFlow
                      switches.  Logical switches and routers are both im‐
                      plemented as logical datapaths.
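
        As a brief, illustrative sketch (all names and addresses below are
        hypothetical; see ovn-nbctl(8) for details), a logical switch attached
        to a logical router can be built like this:

               # One logical switch and one logical router.
               ovn-nbctl ls-add sw0
               ovn-nbctl lr-add lr0

               # A router port, plus the switch port that patches sw0 to it.
               ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
               ovn-nbctl lsp-add sw0 sw0-lrp0
               ovn-nbctl lsp-set-type sw0-lrp0 router
               ovn-nbctl lsp-set-options sw0-lrp0 router-port=lrp0
               ovn-nbctl lsp-set-addresses sw0-lrp0 00:00:00:00:ff:01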

   Life Cycle of a VIF
       Tables and their schemas presented in isolation are difficult to under‐
       stand.  Here’s an example.

        A VIF on a hypervisor is a virtual network interface attached either to
        a VM or to a container running directly on that hypervisor (this is dif‐
        ferent from the interface of a container running inside a VM).

       The  steps  in  this  example refer often to details of the OVN and OVN
       Northbound database  schemas.   Please  see  ovn-sb(5)  and  ovn-nb(5),
       respectively, for the full story on these databases.

              1.
                A  VIF’s  life cycle begins when a CMS administrator creates a
                new VIF using the CMS user interface or API and adds it  to  a
                switch  (one implemented by OVN as a logical switch).  The CMS
                 updates its own configuration.  This includes associating  a
                 unique, persistent identifier vif-id and an Ethernet  address
                 mac with the VIF.

              2.
                The CMS plugin updates the OVN Northbound database to  include
                the new VIF, by adding a row to the Logical_Switch_Port table.
                In the new row, name is vif-id, mac is mac, switch  points  to
                the OVN logical switch’s Logical_Switch record, and other col‐
                umns are initialized appropriately.

              3.
                ovn-northd receives the OVN Northbound  database  update.   In
                turn, it makes the corresponding updates to the OVN Southbound
                 database, by adding rows to the OVN Southbound database  Logi‐
                cal_Flow  table  to  reflect  the new port, e.g. add a flow to
                recognize that packets destined to the new port’s MAC  address
                should  be  delivered to it, and update the flow that delivers
                broadcast and multicast packets to include the new  port.   It
                also  creates  a record in the Binding table and populates all
                its columns except the column that identifies the chassis.

              4.
                On every hypervisor, ovn-controller receives the  Logical_Flow
                table  updates  that ovn-northd made in the previous step.  As
                 long as the VM that owns the  VIF  is  powered  off,  ovn-con‐
                troller  cannot  do  much;  it cannot, for example, arrange to
                send packets to or receive packets from the VIF,  because  the
                VIF does not actually exist anywhere.

              5.
                Eventually, a user powers on the VM that owns the VIF.  On the
                hypervisor where the VM is powered on, the integration between
                 the   hypervisor  and  Open  vSwitch  (described  in  Integra‐
                tionGuide.rst) adds the VIF to the OVN integration bridge  and
                stores  vif-id  in  external_ids:iface-id to indicate that the
                interface is an instantiation of the new VIF.  (None  of  this
                code is new in OVN; this is pre-existing integration work that
                has already been done on hypervisors that support OVS.)

              6.
                On the hypervisor where the VM is powered  on,  ovn-controller
                notices   external_ids:iface-id   in  the  new  Interface.  In
                response, in the OVN Southbound DB, it updates the Binding ta‐
                ble’s  chassis  column for the row that links the logical port
                from external_ids:  iface-id  to  the  hypervisor.  Afterward,
                ovn-controller  updates the local hypervisor’s OpenFlow tables
                so that packets to and from the VIF are properly handled.

              7.
                Some CMS systems, including OpenStack, fully start a  VM  only
                when  its  networking  is  ready.  To support this, ovn-northd
                notices the chassis column updated for the row in Binding  ta‐
                ble  and  pushes  this upward by updating the up column in the
                OVN Northbound database’s Logical_Switch_Port table  to  indi‐
                cate  that  the  VIF is now up.  The CMS, if it uses this fea‐
                ture, can then react by allowing the VM’s  execution  to  pro‐
                ceed.

              8.
                On  every  hypervisor  but  the  one  where  the  VIF resides,
                ovn-controller notices the completely  populated  row  in  the
                Binding  table.   This  provides  ovn-controller  the physical
                location of the logical port, so  each  instance  updates  the
                OpenFlow tables of its switch (based on logical datapath flows
                in the OVN DB Logical_Flow table) so that packets to and  from
                the VIF can be properly handled via tunnels.

              9.
                Eventually,  a  user  powers off the VM that owns the VIF.  On
                the hypervisor where the  VM  was  powered  off,  the  VIF  is
                deleted from the OVN integration bridge.

              10.
                On the hypervisor where the VM was powered off, ovn-controller
                notices that the VIF was deleted.  In response, it removes the
                Chassis  column  content  in the Binding table for the logical
                port.

              11.
                On every hypervisor, ovn-controller notices the empty  Chassis
                column  in the Binding table’s row for the logical port.  This
                means that ovn-controller no longer knows the  physical  loca‐
                tion  of  the logical port, so each instance updates its Open‐
                Flow table to reflect that.

              12.
                Eventually, when the VIF (or  its  entire  VM)  is  no  longer
                needed  by  anyone, an administrator deletes the VIF using the
                CMS user interface or API.  The CMS updates its own configura‐
                tion.

              13.
                The  CMS  plugin removes the VIF from the OVN Northbound data‐
                base, by deleting its row in the Logical_Switch_Port table.

              14.
                ovn-northd receives the OVN  Northbound  update  and  in  turn
                updates  the  OVN Southbound database accordingly, by removing
                 or updating the rows from the OVN  Southbound  database  Logi‐
                cal_Flow table and Binding table that were related to the now-
                destroyed VIF.

              15.
                On every hypervisor, ovn-controller receives the  Logical_Flow
                table  updates  that  ovn-northd  made  in  the previous step.
                ovn-controller updates OpenFlow tables to reflect the  update,
                although  there  may  not  be  much  to  do, since the VIF had
                 already become unreachable when it was removed from the  Bind‐
                ing table in a previous step.
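
        The CMS plugin and the hypervisor integration perform steps 2 and 5 by
        writing to the databases directly, but the same effect can be sketched
        from the command line; the switch sw0, VIF identifier vif1, MAC
        address, and tap interface name below are hypothetical:

               # Step 2: the CMS plugin adds the logical port.
               ovn-nbctl lsp-add sw0 vif1
               ovn-nbctl lsp-set-addresses vif1 00:00:00:00:00:01

               # Step 5: on the hypervisor where the VM powers on, the
               # integration adds the VIF to the integration bridge and
               # marks it with the vif-id.
               ovs-vsctl add-port br-int tap1 -- \
                   set Interface tap1 external_ids:iface-id=vif1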

   Life Cycle of a Container Interface Inside a VM
        OVN provides virtual network abstractions by  converting  information
        written in the OVN_NB database to OpenFlow flows on each   hypervisor.
        Secure virtual networking for multiple tenants can only be provided if
        ovn-controller is the only entity that can modify flows in Open vSwitch.
       When  the Open vSwitch integration bridge resides in the hypervisor, it
       is a fair assumption to make that tenant workloads running  inside  VMs
       cannot make any changes to Open vSwitch flows.

        If the infrastructure provider trusts the applications inside the con‐
        tainers not to break out and modify the Open vSwitch flows, then  con‐
        tainers can be run directly on hypervisors.  This is also the case when
        containers are run inside VMs and the Open vSwitch integration  bridge
        with flows added by ovn-controller resides in the same VM.  In both of
        these cases, the workflow is the same as explained with an example  in
        the previous section ("Life Cycle of a VIF").

       This  section talks about the life cycle of a container interface (CIF)
       when containers are created in the VMs and the Open vSwitch integration
       bridge  resides  inside  the  hypervisor.  In this case, even if a con‐
       tainer application breaks out, other tenants are not  affected  because
       the  containers  running  inside the VMs cannot modify the flows in the
       Open vSwitch integration bridge.

       When multiple containers are created inside a VM,  there  are  multiple
       CIFs  associated  with them.  The network traffic associated with these
        CIFs needs to reach the Open vSwitch integration bridge running in the
       hypervisor for OVN to support virtual network abstractions.  OVN should
       also be able to distinguish network traffic coming from different CIFs.
       There are two ways to distinguish network traffic of CIFs.

       One  way  is  to provide one VIF for every CIF (1:1 model).  This means
       that there could be a lot of network devices in the  hypervisor.   This
       would slow down OVS because of all the additional CPU cycles needed for
       the management of all the VIFs.  It would also  mean  that  the  entity
       creating  the containers in a VM should also be able to create the cor‐
       responding VIFs in the hypervisor.

       The second way is to provide a single VIF  for  all  the  CIFs  (1:many
       model).  OVN could then distinguish network traffic coming from differ‐
        ent CIFs via a tag written in every packet.  OVN uses this model, with
        VLAN as the tagging mechanism.

              1.
                 A CIF’s life cycle begins when a container is spawned inside a
                 VM by either the same CMS that created the VM, a tenant  that
                 owns that VM, or even a container orchestration system that is
                 different from the CMS that initially created the VM.  Whoever
                 the entity is, it will need to know the vif-id that is associ‐
                 ated with the network interface of the VM through  which  the
                 container interface’s network traffic is expected to go.   The
                 entity that creates the container interface will also need  to
                 choose an unused VLAN inside that VM.

              2.
                The  container spawning entity (either directly or through the
                CMS that manages the underlying  infrastructure)  updates  the
                OVN  Northbound  database  to include the new CIF, by adding a
                row to the Logical_Switch_Port table.  In the new row, name is
                 any unique identifier, parent_name is the vif-id  of  the  VM
                 through which the CIF’s network traffic is expected  to  go,
                 and tag is the VLAN tag that identifies the  network  traffic
                 of that CIF.

              3.
                ovn-northd receives the OVN Northbound  database  update.   In
                turn, it makes the corresponding updates to the OVN Southbound
                 database, by adding rows to the OVN Southbound database’s Log‐
                ical_Flow table to reflect the new port and also by creating a
                new row in the Binding table and populating  all  its  columns
                except the column that identifies the chassis.

              4.
                On  every hypervisor, ovn-controller subscribes to the changes
                in the Binding table.  When a new row is created by ovn-northd
                 that includes a value in the parent_port column of the Binding
                 table, the ovn-controller in the hypervisor whose OVN integra‐
                 tion bridge has that same vif-id in external_ids:iface-id  up‐
                 dates the local hypervisor’s OpenFlow tables so that  packets
                 to and from the VIF with the particular VLAN tag are  properly
                 handled.  Afterward it updates the chassis column of the Bind‐
                 ing table to reflect the physical location.

              5.
                One  can only start the application inside the container after
                the underlying network is ready.  To support this,  ovn-northd
                 notices the updated chassis column in the Binding  table  and
                 updates the up column in the OVN Northbound database’s  Logi‐
                 cal_Switch_Port table to indicate that the CIF is now up.  The
                 entity responsible for starting the  container  application
                 queries this value and starts the application.

              6.
                 Eventually, the entity that created and started the container
                 stops it.  The entity, through the CMS (or directly),  deletes
                 its row in the Logical_Switch_Port table.

              7.
                ovn-northd  receives  the  OVN  Northbound  update and in turn
                updates the OVN Southbound database accordingly,  by  removing
                 or  updating  the  rows from the OVN Southbound database Logi‐
                cal_Flow table that were related to the now-destroyed CIF.  It
                also deletes the row in the Binding table for that CIF.

              8.
                On  every hypervisor, ovn-controller receives the Logical_Flow
                table updates that  ovn-northd  made  in  the  previous  step.
                ovn-controller updates OpenFlow tables to reflect the update.
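
        As an illustrative sketch of step 2 (the names and VLAN tag below are
        hypothetical), the container spawning entity could add a CIF whose
        traffic arrives through the VM’s VIF vif1 tagged with VLAN 42:

               ovn-nbctl lsp-add sw0 cif1 vif1 42
               ovn-nbctl lsp-set-addresses cif1 00:00:00:00:00:02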

   Architectural Physical Life Cycle of a Packet
       This section describes how a packet travels from one virtual machine or
       container to another through OVN.   This  description  focuses  on  the
       physical  treatment  of a packet; for a description of the logical life
       cycle of a packet, please refer to the Logical_Flow table in ovn-sb(5).

       This section mentions several data and  metadata  fields,  for  clarity
       summarized here:

              tunnel key
                     When  OVN encapsulates a packet in Geneve or another tun‐
                     nel, it attaches extra data to it to allow the  receiving
                     OVN instance to process it correctly.  This takes differ‐
                     ent forms depending on the particular encapsulation,  but
                     in  each  case we refer to it here as the ``tunnel key.’’
                     See Tunnel Encapsulations, below, for details.

              logical datapath field
                     A field that denotes the logical datapath through which a
                     packet is being processed.  OVN uses the field that Open‐
                     Flow 1.1+ simply (and confusingly) calls ``metadata’’  to
                     store the logical datapath.  (This field is passed across
                     tunnels as part of the tunnel key.)

              logical input port field
                     A field that denotes the  logical  port  from  which  the
                     packet  entered the logical datapath.  OVN stores this in
                     Open vSwitch extension register number 14.

                     Geneve and STT tunnels pass this field  as  part  of  the
                     tunnel  key.   Although  VXLAN  tunnels do not explicitly
                     carry a logical input port, OVN only uses VXLAN to commu‐
                     nicate  with gateways that from OVN’s perspective consist
                     of only a single logical port, so that OVN  can  set  the
                     logical  input  port  field to this one on ingress to the
                     OVN logical pipeline.

              logical output port field
                     A field that denotes the  logical  port  from  which  the
                     packet will leave the logical datapath.  This is initial‐
                     ized to 0 at the beginning of the logical  ingress  pipe‐
                     line.  OVN stores this in Open vSwitch extension register
                     number 15.

                     Geneve and STT tunnels pass this field  as  part  of  the
                      tunnel key.  VXLAN tunnels do not transmit  the  logical
                      output port field.  Since VXLAN tunnels do not  carry  a
                      logical output port field in the tunnel key, a packet re‐
                      ceived from a VXLAN tunnel by an OVN hypervisor is resub‐
                      mitted to table 16 to determine the output port(s);  when
                      the packet reaches table 32, it is resubmitted  to  table
                      33 for local delivery based on the  MLF_RCV_FROM_VXLAN
                      flag, which is set when the packet arrives from a  VXLAN
                      tunnel.

              conntrack zone field for logical ports
                     A  field  that  denotes  the connection tracking zone for
                     logical ports.  The value only has local significance and
                     is  not  meaningful between chassis.  This is initialized
                     to 0 at the beginning of the  logical  ingress  pipeline.
                     OVN stores this in Open vSwitch extension register number
                     13.

              conntrack zone fields for Gateway router
                     Fields that denote  the  connection  tracking  zones  for
                     Gateway  routers.   These values only have local signifi‐
                     cance (only on chassis that have Gateway routers  instan‐
                      tiated)  and  are  not  meaningful  between chassis.  OVN
                     stores the zone information for DNATting in Open  vSwitch
                     extension  register  number  11  and zone information for
                     SNATing in Open vSwitch extension register number 12.

              logical flow flags
                      The logical flags are intended to keep  context  between
                      tables in order to decide which  rules  in  subse‐
                     quent tables are matched.  These values only  have  local
                     significance and are not meaningful between chassis.  OVN
                     stores the logical flags in Open vSwitch extension regis‐
                     ter number 10.

              VLAN ID
                     The  VLAN ID is used as an interface between OVN and con‐
                      tainers nested inside a VM (see Life Cycle of a Container
                      Interface Inside a VM, above, for more information).
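
        The values of these fields can be observed in the OpenFlow flows that
        ovn-controller installs, for example by dumping a chassis’s table 0
        (physical-to-logical) flows:

               ovs-ofctl dump-flows br-int table=0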

       Initially,  a  VM or container on the ingress hypervisor sends a packet
       on a port attached to the OVN integration bridge.  Then:

              1.
                OpenFlow table 0 performs physical-to-logical translation.  It
                matches  the  packet’s ingress port.  Its actions annotate the
                packet with logical metadata, by setting the logical  datapath
                field  to  identify  the  logical  datapath that the packet is
                traversing and the logical input port field  to  identify  the
                ingress port.  Then it resubmits to table 16 to enter the log‐
                ical ingress pipeline.

                Packets that originate from a container nested within a VM are
                treated  in  a  slightly  different way.  The originating con‐
                tainer can be distinguished based on the VIF-specific VLAN ID,
                so  the  physical-to-logical  translation  flows  additionally
                match on VLAN ID and the actions strip the VLAN header.   Fol‐
                lowing this step, OVN treats packets from containers just like
                any other packets.

                Table 0 also processes packets that arrive from other chassis.
                It  distinguishes  them  from  other  packets by ingress port,
                which is a tunnel.  As with  packets  just  entering  the  OVN
                pipeline,  the  actions  annotate  these  packets with logical
                datapath and logical ingress port metadata.  In addition,  the
                actions  set the logical output port field, which is available
                because in OVN tunneling occurs after the logical output  port
                is known.  These three pieces of information are obtained from
                the tunnel encapsulation metadata (see  Tunnel  Encapsulations
                for  encoding details).  Then the actions resubmit to table 33
                to enter the logical egress pipeline.

              2.
                OpenFlow tables 16 through  31  execute  the  logical  ingress
                pipeline  from  the  Logical_Flow  table in the OVN Southbound
                database.  These tables are expressed  entirely  in  terms  of
                logical  concepts like logical ports and logical datapaths.  A
                big part of ovn-controller’s job is  to  translate  them  into
                equivalent  OpenFlow  (in  particular  it translates the table
                numbers: Logical_Flow tables  0  through  15  become  OpenFlow
                tables 16 through 31).

                Most  OVN actions have fairly obvious implementations in Open‐
                Flow (with OVS  extensions),  e.g.  next;  is  implemented  as
                resubmit,  field  =  constant;  as set_field.  A few are worth
                describing in more detail:

                output:
                       Implemented by resubmitting the packet to table 32.  If
                       the pipeline executes more than one output action, then
                       each one is separately resubmitted to table  32.   This
                       can  be  used  to send multiple copies of the packet to
                       multiple  ports.   (If  the  packet  was  not  modified
                       between  the output actions, and some of the copies are
                       destined to the same hypervisor, then using  a  logical
                       multicast  output  port  would  save  bandwidth between
                       hypervisors.)

                get_arp(P, A);
                get_nd(P, A);
                     Implemented by storing arguments  into  OpenFlow  fields,
                     then resubmitting to table 66, which ovn-controller popu‐
                     lates with flows generated from the MAC_Binding table  in
                     the  OVN Southbound database.  If there is a match in ta‐
                     ble 66, then its actions store the bound MAC in the  Eth‐
                     ernet destination address field.

                     (The  OpenFlow  actions  save  and  restore  the OpenFlow
                     fields used for the arguments, so that the OVN actions do
                     not have to be aware of this temporary use.)

                put_arp(P, A, E);
                put_nd(P, A, E);
                     Implemented   by  storing  the  arguments  into  OpenFlow
                     fields, then outputting a packet to ovn-controller, which
                     updates the MAC_Binding table.

                     (The  OpenFlow  actions  save  and  restore  the OpenFlow
                     fields used for the arguments, so that the OVN actions do
                     not have to be aware of this temporary use.)

              3.
                OpenFlow  tables  32 through 47 implement the output action in
                the logical ingress pipeline.  Specifically, table 32  handles
                packets to remote hypervisors, table 33 handles packets to the
                local hypervisor, and table 34 checks  whether  packets  whose
                logical  ingress  and  egress port are the same should be dis‐
                carded.

                Logical patch ports are a special case.  Logical  patch  ports
                do  not  have  a  physical  location and effectively reside on
                every hypervisor.  Thus, flow table 33, for output to ports on
                the  local  hypervisor, naturally implements output to unicast
                logical patch ports too.  However, applying the same logic  to
                a logical patch port that is part of a logical multicast group
                yields packet duplication, because each hypervisor  that  con‐
                tains  a  logical port in the multicast group will also output
                the packet to the logical patch port.  Thus, multicast  groups
                implement output to logical patch ports in table 32.

                Each  flow  in  table  32 matches on a logical output port for
                unicast or multicast logical ports that include a logical port
                on a remote hypervisor.  Each flow’s actions implement sending
                a packet to the port it matches.  For unicast  logical  output
                ports on remote hypervisors, the actions set the tunnel key to
                the correct value, then send the packet on the tunnel port  to
                the  correct hypervisor.  (When the remote hypervisor receives
                the packet, table 0 there will  recognize  it  as  a  tunneled
                packet  and pass it along to table 33.)  For multicast logical
                output ports, the actions send one copy of the packet to  each
                remote  hypervisor,  in  the  same way as for unicast destina‐
                tions.  If a multicast group includes a logical port or  ports
                on the local hypervisor, then its actions also resubmit to ta‐
                ble 33.  Table 32 also includes a fallback flow that resubmits
                to  table  33  if there is no other match.  Table 32 also con‐
                 tains a higher-priority rule that matches packets received
                 from VXLAN tunnels, based on the MLF_RCV_FROM_VXLAN flag, and
                 resubmits them to table 33 for local delivery.  Packets  re‐
                 ceived from VXLAN tunnels reach this point because the  VXLAN
                 tunnel key lacks a logical output port field, so these packets
                 had to be resubmitted to table 16 to determine the output
                 port.

                Flows in table 33 resemble those in table 32 but  for  logical
                ports  that  reside locally rather than remotely.  For unicast
                logical output ports on the local hypervisor, the actions just
                resubmit to table 34.  For multicast output ports that include
                one or more logical ports on the local  hypervisor,  for  each
                such  logical  port  P,  the actions change the logical output
                port to P, then resubmit to table 34.

                 A special case arises when a localnet port exists on the data‐
                 path: remote ports are then reached by  switching  through  the
                 localnet port.  In this case, instead of adding a flow in table
                 32 to reach the remote port, a flow is added in table 33  that
                 changes the logical output port to the localnet port and resub‐
                 mits to table 33, as if the packet were unicast to  a  logical
                 port on the local hypervisor.

                Table 34 matches and drops packets for which the logical input
                and  output ports are the same and the MLF_ALLOW_LOOPBACK flag
                is not set.  It resubmits other packets to table 48.

              4.
                OpenFlow tables 48 through 63 execute the logical egress pipe‐
                line  from  the Logical_Flow table in the OVN Southbound data‐
                base.  The egress pipeline can perform a final stage of  vali‐
                dation  before packet delivery.  Eventually, it may execute an
                output action, which ovn-controller implements by resubmitting
                to  table  64.  A packet for which the pipeline never executes
                output is effectively  dropped  (although  it  may  have  been
                transmitted through a tunnel across a physical network).

                The  egress  pipeline cannot change the logical output port or
                cause further tunneling.

              5.
                Table 64 bypasses OpenFlow loopback when MLF_ALLOW_LOOPBACK is
                set.   Logical  loopback was handled in table 34, but OpenFlow
                by default also prevents  loopback  to  the  OpenFlow  ingress
                port.  Thus, when MLF_ALLOW_LOOPBACK is set, OpenFlow table 64
                saves the OpenFlow ingress port, sets it to zero, resubmits to
                table  65  for  logical-to-physical  transformation,  and then
                 restores the  OpenFlow  ingress  port,  effectively  disabling
                 OpenFlow loopback prevention.  When MLF_ALLOW_LOOPBACK is un‐
                 set, the table 64 flow simply resubmits to table 65.

              6.
                OpenFlow table 65  performs  logical-to-physical  translation,
                the  opposite  of  table  0.   It matches the packet’s logical
                egress port.  Its  actions  output  the  packet  to  the  port
                attached  to  the  OVN integration bridge that represents that
                logical port.  If the  logical  egress  port  is  a  container
                 nested within a VM, then before sending the packet the actions
                push on a VLAN header with an appropriate VLAN ID.

                If the logical egress port is a logical patch port, then table
                65  outputs  to  an OVS patch port that represents the logical
                patch port.  The packet re-enters the OpenFlow flow table from
                the  OVS  patch  port’s  peer in table 0, which identifies the
                logical datapath and logical input port based on the OVS patch
                port’s OpenFlow port number.
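
        The treatment described above can be examined on a chassis with
        ordinary Open vSwitch tools.  As a sketch (the OpenFlow port number
        and MAC addresses below are hypothetical), ofproto/trace shows how the
        installed flows handle a particular packet, table by table:

               ovs-appctl ofproto/trace br-int \
                   in_port=5,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02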

   Life Cycle of a VTEP gateway
       A  gateway  is  a chassis that forwards traffic between the OVN-managed
       part of a logical network and a physical  VLAN,   extending  a  tunnel-
       based logical network into a physical network.

       The  steps  below  refer  often to details of the OVN and VTEP database
       schemas.  Please see ovn-sb(5), ovn-nb(5)  and  vtep(5),  respectively,
       for the full story on these databases.

              1.
                A VTEP gateway’s life cycle begins with the administrator reg‐
                istering the VTEP gateway as a Physical_Switch table entry  in
                the  VTEP database.  The ovn-controller-vtep connected to this
                 VTEP database will recognize the new VTEP gateway and  create
                a  new  Chassis table entry for it in the OVN_Southbound data‐
                base.

              2.
                 The administrator can then create a new  Logical_Switch  table
                 entry, and bind a particular VLAN on a VTEP gateway’s port  to
                 any VTEP logical switch.  Once a VTEP logical switch is  bound
                 to  a VTEP gateway, the ovn-controller-vtep will detect it and
                 add its name to the vtep_logical_switches column of the  Chas‐
                 sis table in the OVN_Southbound database.  Note that the  tun‐
                 nel_key column of the VTEP logical switch is not filled in  at
                 creation.  The ovn-controller-vtep will set the column when
                 the corresponding VTEP logical switch is bound to an OVN logi‐
                 cal network.

              3.
                Now,  the  administrator can use the CMS to add a VTEP logical
                switch to the OVN logical network.  To do that, the  CMS  must
                first  create  a  new  Logical_Switch_Port  table entry in the
                OVN_Northbound database.  Then, the type column of this  entry
                must  be  set  to  "vtep".   Next, the vtep-logical-switch and
                vtep-physical-switch keys in the options column must  also  be
                specified, since multiple VTEP gateways can attach to the same
                VTEP logical switch.

              4.
                 The newly created logical port in the OVN_Northbound  database
                 and its configuration will be passed down to  the  OVN_South‐
                 bound  database  as  a  new  Port_Binding  table  entry.   The
                 ovn-controller-vtep  will  recognize  the  change and bind the
                 logical port to the corresponding VTEP gateway chassis.  Bind‐
                 ing the same VTEP logical switch to different OVN logical net‐
                 works is not allowed, and a warning will be generated  in  the
                 log.

              5.
                 Besides binding to the VTEP gateway chassis,  the  ovn-con‐
                 troller-vtep will update the tunnel_key  column  of  the  VTEP
                logical  switch  to  the  corresponding Datapath_Binding table
                entry’s tunnel_key for the bound OVN logical network.

              6.
                 Next, the ovn-controller-vtep will keep reacting  to  configu‐
                 ration changes in the Port_Binding table in the OVN_Southbound
                 database, updating the Ucast_Macs_Remote table  in  the  VTEP
                 database.  This allows the VTEP gateway to understand where to
                 forward the unicast traffic coming from the extended  external
                 network.

              7.
                Eventually, the VTEP gateway’s life cycle ends when the admin‐
                istrator unregisters the VTEP gateway from the VTEP  database.
                The  ovn-controller-vtep  will  recognize the event and remove
                all related configurations (Chassis table entry and port bind‐
                ings) in the OVN_Southbound database.

              8.
                When  the  ovn-controller-vtep is terminated, all related con‐
                figurations in the OVN_Southbound database and the VTEP  data‐
                 base will be cleaned up, including Chassis table entries for all
                registered VTEP gateways and  their  port  bindings,  and  all
                Ucast_Macs_Remote  table entries and the Logical_Switch tunnel
                keys.
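
        As an illustrative sketch of steps 2 and 3 (all names and the VLAN
        below are hypothetical, and the physical switch br-vtep with its port
        p0 is assumed to have been registered already, as in step 1), the VTEP
        and CMS sides might look like this:

               # Step 2: bind VLAN 100 on gateway port p0 to VTEP logical
               # switch ls0 in the VTEP database.
               vtep-ctl add-ls ls0
               vtep-ctl bind-ls br-vtep p0 100 ls0

               # Step 3: attach the VTEP logical switch to the OVN logical
               # network through a logical port of type "vtep".
               ovn-nbctl lsp-add sw0 sw0-vtep
               ovn-nbctl lsp-set-type sw0-vtep vtep
               ovn-nbctl lsp-set-options sw0-vtep \
                   vtep-physical-switch=br-vtep vtep-logical-switch=ls0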

DESIGN DECISIONS
   Tunnel Encapsulations
       OVN annotates logical network packets that it sends from one hypervisor
       to  another  with  the  following  three  pieces of metadata, which are
       encoded in an encapsulation-specific fashion:

              ·      24-bit logical datapath identifier, from  the  tunnel_key
                     column in the OVN Southbound Datapath_Binding table.

              ·      15-bit logical ingress port identifier.  ID 0 is reserved
                     for internal use within OVN.  IDs 1 through 32767, inclu‐
                      sive,  may  be  assigned  to  logical ports (see the tun‐
                     nel_key column in the OVN Southbound Port_Binding table).

              ·      16-bit logical egress port  identifier.   IDs  0  through
                     32767 have the same meaning as for logical ingress ports.
                     IDs 32768 through 65535, inclusive, may  be  assigned  to
                     logical  multicast  groups  (see the tunnel_key column in
                     the OVN Southbound Multicast_Group table).

       For hypervisor-to-hypervisor traffic, OVN supports only Geneve and  STT
       encapsulations, for the following reasons:

              ·      Only STT and Geneve support the large amounts of metadata
                     (over 32 bits per packet) that  OVN  uses  (as  described
                     above).

              ·      STT  and  Geneve  use  randomized UDP or TCP source ports
                      that allow efficient distribution  among  multiple  paths
                     in environments that use ECMP in their underlay.

              ·      NICs  are  available to offload STT and Geneve encapsula‐
                     tion and decapsulation.

       Due to its flexibility, the preferred encapsulation between hypervisors
       is  Geneve.   For Geneve encapsulation, OVN transmits the logical data‐
       path identifier in the Geneve VNI.  OVN transmits the  logical  ingress
       and  logical  egress ports in a TLV with class 0x0102, type 0x80, and a
       32-bit value encoded as follows, from MSB to LSB:

               ·      1 bit: rsv (0)

              ·      15 bits: ingress port

              ·      16 bits: egress port


       Environments whose NICs lack Geneve offload may prefer  STT  encapsula‐
       tion  for  performance reasons.  For STT encapsulation, OVN encodes all
       three pieces of logical metadata in the STT 64-bit tunnel  ID  as  fol‐
       lows, from MSB to LSB:

              ·      9 bits: reserved (0)

              ·      15 bits: ingress port

              ·      16 bits: egress port

              ·      24 bits: datapath


       For connecting to gateways, in addition to Geneve and STT, OVN supports
       VXLAN, because only  VXLAN  support  is  common  on  top-of-rack  (ToR)
       switches.   Currently,  gateways  have  a  feature set that matches the
       capabilities as defined by the VTEP schema, so fewer bits  of  metadata
       are  necessary.  In the future, gateways that do not support encapsula‐
       tions with large amounts of metadata may continue  to  have  a  reduced
       feature set.
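
        The encapsulations that a chassis offers are configured locally, since
        ovn-controller reads them from the local Open vSwitch database, as
        sketched below; the IP address is hypothetical and ovn-controller(8)
        documents the full set of external-ids:

               ovs-vsctl set Open_vSwitch . \
                   external-ids:ovn-encap-type=geneve \
                   external-ids:ovn-encap-ip=192.168.0.10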



Open vSwitch 2.6.90            OVN Architecture            ovn-architecture(7)