Discrete Derivative over time series data using PostgreSQL
I use FluxMon to query statistics from the inverter hooked up to our solar collectors. One of the values the inverter spits out is today's energy production in Wh. This value is reset every day and counts up to a certain value. Fair enough, but I'd be way more interested in the derivative of that value, which would give an indication of how much power is being produced per hour or so.
So first of all, we get the value for the etoday and the etotal variables like we did before:
select cm.check_id, cm.measured_at, avg(cm.value) filter (where sv.name = 'etoday') as etoday, avg(cm.value) filter (where sv.name = 'etotal') as etotal from monitoring_checkmeasurement cm inner join monitoring_sensorvariable sv on variable_id=sv.id where cm.check_id=641 group by cm.check_id, cm.measured_at order by cm.measured_at ;
Result:
check_id | measured_at | etoday | etotal ----------+------------------------+--------------+-------------- 641 | 2015-08-15 18:55:00+02 | 34619.17 | 16529491.15 641 | 2015-08-15 19:00:00+02 | 34803.77 | 16529674.78 ... 641 | 2015-08-15 21:10:00+02 | 35528 | 16530398 641 | 2015-08-15 21:15:00+02 | 35528 | 16530398 641 | 2015-08-16 06:25:00+02 | 0 | 16530398 641 | 2015-08-16 06:30:00+02 | 0 | 16530398 ... 641 | 2015-08-16 07:25:00+02 | 0 | 16530398 641 | 2015-08-16 07:30:00+02 | 0 | 16530398 641 | 2015-08-16 07:35:00+02 | 0.99 | 16530398.99 641 | 2015-08-16 07:40:00+02 | 3.98 | 16530401.98
You can clearly see the value increasing up to a certain point, then being abruptly reset (and there's a big hole in the measured_at timestamps too), and the process starting over again.
It looks like to get the derivative of this field, we need to apply the lag
function, which is a window function and therefore requires a window to be defined. I have no idea how to do that, but luckily, windows can also be applied to other aggregate functions like avg
, so I should be able to rewrite the above statement to do the same thing, but use a window instead.
After quite a lot of fiddling, I ended up with this:
select energeh.check_id, energeh.measured_at, energeh.etoday, lag(energeh.etoday, 1) over (partition by date_trunc( 'day', measured_at ) order by measured_at) as etoday_lag, energeh.etotal, lag(energeh.etotal, 1) over (order by measured_at) as etotal_lag from ( select cm.check_id, cm.measured_at, avg(cm.value) filter (where sv.name = 'etoday') as etoday, avg(cm.value) filter (where sv.name = 'etotal') as etotal from monitoring_checkmeasurement cm inner join monitoring_sensorvariable sv on variable_id=sv.id where cm.check_id=641 group by cm.check_id, cm.measured_at order by cm.measured_at ) as energeh;
I'm not quite sure the subselect is actually necessary, but it's the best I could come up with so far. Anyway, now I get this result:
check_id | measured_at | etoday | etoday_lag | etotal | etotal_lag ----------+------------------------+--------------+--------------+--------------+-------------- 641 | 2015-08-15 18:55:00+02 | 34619.17 | | 16529491.15 | 641 | 2015-08-15 19:00:00+02 | 34803.77 | 34619.17 | 16529674.78 | 16529491.15 ... 641 | 2015-08-15 21:10:00+02 | 35528 | 35528 | 16530398 | 16530398 641 | 2015-08-15 21:15:00+02 | 35528 | 35528 | 16530398 | 16530398 641 | 2015-08-16 06:25:00+02 | 0 | | 16530398 | 16530398 641 | 2015-08-16 06:30:00+02 | 0 | 0 | 16530398 | 16530398 ... 641 | 2015-08-16 07:25:00+02 | 0 | 0 | 16530398 | 16530398 641 | 2015-08-16 07:30:00+02 | 0 | 0 | 16530398 | 16530398 641 | 2015-08-16 07:35:00+02 | 0.99 | 0 | 16530398.99 | 16530398 641 | 2015-08-16 07:40:00+02 | 3.98 | 0.99 | 16530401.98 | 16530398.99
Note that I need two separate windows for etoday and etotal. This is because the etoday value is reset every day, but the etotal field is not. Hence, I must prevent the last value of etoday from being transferred into the new day, whereas the etoday value must be transferred. That's why the etoday_lag column is empty in the 2015-08-16 06:25:00+02 row above.
So now, getting the derivative is a pretty easy thing to do, just substract the values and divide by the interval (300 seconds in this case):
select energeh.check_id, energeh.measured_at, energeh.etoday, lag(energeh.etoday, 1) over (partition by date_trunc( 'day', measured_at ) order by measured_at) as etoday_lag, (energeh.etoday - lag(energeh.etoday, 1) over (partition by date_trunc( 'day', measured_at ) order by measured_at)) / 300 as etoday_diff, energeh.etotal, lag(energeh.etotal, 1) over (order by measured_at) as etotal_lag, (energeh.etotal - lag(energeh.etotal, 1) over (order by measured_at)) / 300 as etotal_diff from ( select cm.check_id, cm.measured_at, avg(cm.value) filter (where sv.name = 'etoday') as etoday, avg(cm.value) filter (where sv.name = 'etotal') as etotal from monitoring_checkmeasurement cm inner join monitoring_sensorvariable sv on variable_id=sv.id where cm.check_id=641 group by cm.check_id, cm.measured_at order by cm.measured_at ) as energeh;
Result:
check_id | measured_at | etoday | etoday_lag | etoday_diff | etotal | etotal_lag | etotal_diff ----------+------------------------+--------------+--------------+----------------------+--------------+--------------+---------------------- 641 | 2015-08-15 18:55:00+02 | 34619.17 | | | 16529491.15 | | 641 | 2015-08-15 19:00:00+02 | 34803.773333 | 34619.17 | 0.61534444333333 | 16529674.78 | 16529491.15 | 0.612099999996523 ... 641 | 2015-08-15 21:10:00+02 | 35528 | 35528 | 0 | 16530398 | 16530398 | 0 641 | 2015-08-15 21:15:00+02 | 35528 | 35528 | 0 | 16530398 | 16530398 | 0 641 | 2015-08-16 06:25:00+02 | 0 | | | 16530398 | 16530398 | 0 641 | 2015-08-16 06:30:00+02 | 0 | 0 | 0 | 16530398 | 16530398 | 0 ... 641 | 2015-08-16 07:25:00+02 | 0 | 0 | 0 | 16530398 | 16530398 | 0 641 | 2015-08-16 07:30:00+02 | 0 | 0 | 0 | 16530398 | 16530398 | 0 641 | 2015-08-16 07:35:00+02 | 0.99 | 0 | 0.0033 | 16530398.99 | 16530398 | 0.00330000000074506 641 | 2015-08-16 07:40:00+02 | 3.98 | 0.99 | 0.00996666666666667 | 16530401.98 | 16530398.99 | 0.00996666666741172 ... 641 | 2015-08-16 08:20:00+02 | 37.94 | 28.98 | 0.0298666666666667 | 16530435.94 | 16530426.98 | 0.0298666666634381 641 | 2015-08-16 08:25:00+02 | 47.933333333 | 37.94 | 0.03331111111 | 16530445.933 | 16530435.94 | 0.0333100000023842 641 | 2015-08-16 08:30:00+02 | 52.966666667 | 47.933333333 | 0.01677777778 | 16530450.967 | 16530445.933 | 0.0167799999999503 641 | 2015-08-16 08:35:00+02 | 56.973333333 | 52.966666667 | 0.0133555555533333 | 16530454.973 | 16530450.967 | 0.0133533333304028
Unfortunately, it's been a rainy day, so the energy production today totally sucks ass — but the query seems to work. I'd just like to improve the way the interval is being handled, using a hardcoded / 300
seems a bit hacky.