[ovs-dev] [PATCH 1/1] sFlow: adds sFlow export
Ben Pfaff
blp at nicira.com
Mon Nov 9 13:52:33 PST 2009
Neil McKee <neil.mckee at inmon.com> writes:
> On Nov 9, 2009, at 12:46 PM, Ben Pfaff wrote:
>> Neil McKee <neil.mckee at inmon.com> writes:
>>> On Nov 9, 2009, at 11:02 AM, Ben Pfaff wrote:
>>>
>>>> Neil McKee <neil.mckee at inmon.com> writes:
>>>> In general, I think that we can do most of what
>>>> sflow_take_sample() does in userspace. Userspace should have the
>>>> set of flow actions (since it added them in the first place). I
>>>> guess the only trick, if there is one, is that sFlow packets
>>>> might show up in userspace for a flow after it's been removed.
>>>> However, that's not very likely (since flows usually get removed
>>>> only after they've gone idle) so perhaps we could ignore it on a
>>>> first pass.
>>>
>>> I'll defer to you on that choice, though it does seem like it might
>>> cause an awkward race someday. sFlow samples should be processed at
>>> a low priority (e.g. compared to spanning-tree updates) so they could
>>> languish in a queue for some time. I think the standard allows you
>>> take as long as a second. If that ever happened then the forwarding
>>> details might have changed or disappeared.
>>
>> I think we can avoid that problem. It's only an initial, very
>> simple implementation that would have it. Even if we cannot in
>> every case, for whatever reason, I am sure that we could detect
>> it, and then we would, I guess, just increment the "drops"
>> counter in struct flow_sample (would that be reasonable
>> behavior?).
>
> The purpose of the "drops" counter is to indicate that you couldn't keep
> up. So if you try to queue a sample for processing but the queue is
> full (because it hasn't been serviced for a while), then you would
> increment it. This could be done at the user-space end by detecting
> any time that you went to read from the queue and discovered it was
> full.
We keep track of "dropped message" counters in the datapath, so
we could probably get a precise number. (We do it on a
per-datapath basis now, but it would make perfect sense to do it
on a per-message queue basis.)
> It seems like a lot of trouble to go to to look up the
> forwarding decision again in user-space (and check to see that
> it hasn't changed in the interim) when you already have it
> right there, at the very moment that the packet is forwarded.
> What reference would you pass up, anyway? The hash key? I
> guess I'm not understanding what's wrong with doing it in the
> datapath. Is there another spin-lock problem here, or is it
> more a general rule-of-thumb that you always try to minimize
> changes there?
I don't think that there's anything wrong with doing it in the
datapath per se. It is a perfectly good and straightforward way
to implement it.
The reason that I want to do almost everything in userspace is
that I think that we are working to make our code acceptable for
the upstream Linux networking stack. One of the criteria for
putting something into the kernel is that it can't be acceptably
done in userspace. Often, when developer A tries to add a kernel
feature, developer B will point out that it can be done just as
well in userspace. And then the onus is on developer A to show
that B is wrong for some reason, for performance, resource
isolation, security, etc. In general, the simpler the mechanism
and the less code you add, the easier it is to show that you
can't do it from userspace. So I don't want to add anything we
don't absolutely have to the kernel datapath. Perhaps it will,
for some reason, turn out that some part of it can't be done from
userspace acceptably. Then it'll make sense to add it to the
datapath.
A lesser reason is that we have multiple datapath implementations
now and will have more in the future. The more we do in the
datapath, the more work it becomes to maintain each of these
implementations.
I was thinking along the lines of using a hash to identify the
flow information along with some way of deciding when we can
throw away old flow information. I don't have a specific design.
> It's an interesting property of sFlow that you don't have to be
> particularly close to the intended mean. If the collector
> is implemented correctly it will ignore the advertised number and
> compute the "effective sampling rate" by looking at the number of
> samples actually received and comparing it to the change in the
> value of the sample_pool. That way you are more or less immune
> to packet loss in transit, and that's why sFlow can get away with
> using UDP as the transport. That's also why it's important to
> increment the sample_pool right at the "coal-face": it allows for
> samples to be dropped/lost at any point after that, even on the
> queue up to user-space. Is that going to be a problem for the
> per-CPU thing? How expensive is an atomic increment on a
> 32-bit number? Do we need each CPU to increment it's own
> sample_pool and then always add them together to synthesize
> the total at the point where a packet-sample is sent?
Based on everything I've seen and heard, atomic operations are
very expensive on SMP, so that per-CPU increments and summing the
per-CPU counters when necessary is almost certainly cheaper. I
don't have specific performance numbers.
I don't expect you to implement this. I'm happy to do it. The
kernel has nice primitives for it.
>>>>> +/* sFlow library callback to allocate memory. */
>>>>> +static void *
>>>>> +sflow_agent_alloc_cb(void *magic, SFLAgent *agent, size_t bytes)
>>>>> +{
>>>>> + return calloc(1, bytes);
>>>>> +}
>>>>> +
>>>>
>>>> I guess that the sFlow library gracefully handles allocation
>>>> failure? If not, there is also an xcalloc() function that will
>>>> abort on allocation failure.
>>>>
>>>
>>> The sFlow library only ever allocates memory using a callback,
>>> so we can change it to use xcalloc in:
>>> ofproto.c:sflow_agent_alloc_cb()
>>
>> Well, I guess the question is which is a better choice. Probably
>> it doesn't matter much.
>
> Better to use xcalloc, because I doubt if the sFlow library handles
> a failure gracefully.
OK.
More information about the dev
mailing list