fluxmon-status-update-2015

Svedrin

2015-10-17 19:49

I've been working on Fluxmon for quite a while now, so I thought it was time to give a little status update. I've done quite a lot of research in the meantime, upon which I can now base the conclusion that I don't know jack shit about what I'm doing.

The hardest issue is figuring out what it is that I actually want to do.

The first commit in my repo dates back to August 9th, 2013. Since then I collected a bunch of data and did all kinds of statistics stuff to that data. I figured out that RRDtool's Holt-Winters stuff is not really up to the task without extensive tuning, which is exactly what I want to avoid. I tried applying different kinds of distributions to it, but mostly failed in getting anything out of the data that looks relevant. I also engaged with the Freifunk guys in Fulda and learned that it's pretty easy to generate irrelevant data by accidentally amplifying traffic by the number of nodes, which doesn't help anyone.

Then I deployed meshping in a friend's WiFi network and found that his desktop box has a minimum ping of 1.95ms, an average of 781.10ms and a maximum of 21685.33ms. Thing is: I have no clue about how the ping is distributed, so I can't tell whether or not we need to do something about it (other than "this feels kinda slow", which obviously sucks).

Now recently, the internet tought me that a normal distribution most likely doesn't apply (which explains why I haven't gotten anything sensible out of my data using it). I need a histogram of my data to see its real structure, and I need to graph those histograms over time intervals to see the development over time.

Implementing a histogram is totally easy — in theory.

In practice, it requires storing every single data point. So suppose you're sending one ping every 30 seconds to 16 hosts for two days for a sensible measurement. Then that's already 46080 data points which you have to store twice (once sorted by time, once sorted by value). This approach also means memory costs scale in O(n), which (again) obviously sucks.

If you don't feel like doing that (and I didn't), you'll need to bucket the data somehow and store counters for the buckets. This reduces complexity to O(1), but only at the cost of introducing error, which may or may not be a problem. But also it requires you to know the minimum and maximum values you're ever going to encounter, in other words the dynamic range of your signal needs to be known. I'd like Fluxmon to be able to figure that out on its own, so having to know it beforehand doesn't exactly help. Some people suggest using logarithmic buckets. While this makes sense, it doesn't really make this stuff any simpler, and it's far from easy in the first place.

This is way harder than I'd have ever imagined.