Potential issue with aggregation result (std deviation and variance)
2 votes, 1 answer, 23 views
I am running a query against this Kaggle dataset: https://www.kaggle.com/datasets/census/population-time-series-data
Here is the code:
import calendar
import datetime
import griddb_python as griddb

# `store` is assumed to be an already-connected GridDB store
year_in_mili = 31536000000  # 365 days in milliseconds
ts = store.get_container("population")
query = ts.query("select * from population where value > 280000")
rs = query.fetch()
data = rs.next()
# Start of the window: timestamp of the first matching row
timestamp = calendar.timegm(data.timetuple())
gsTS = (griddb.TimestampUtils.get_time_millis(timestamp))
time = datetime.datetime.fromtimestamp(gsTS/1000.0)
# End of the window: seven years after the start
added = gsTS + (year_in_mili * 7)
addedTime = datetime.datetime.fromtimestamp(added/1000.0)
variance = ts.aggregate_time_series(time, addedTime, griddb.Aggregation.VARIANCE, "value")
print("VARIANCE: ", variance.get(griddb.Type.DOUBLE))
stdDev = ts.aggregate_time_series(time, addedTime, griddb.Aggregation.STANDARD_DEVIATION, "value")
print("STANDARD DEVIATION: ", stdDev.get(griddb.Type.LONG))
All of the aggregation results are correct except for the standard deviation and variance:
TOTAL: 48714984
AVERAGE: 289970
VARIANCE: -84078718183.5204
STANDARD DEVIATION: -9223372036854775808
COUNT 168
WEIGHTED AVERAGE: 289970
Obviously neither value should be negative. I also ran the numbers in Excel, which gave these results:
var: 31045317.27
std dev: 5571.832488
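As a quick sanity check on the figures above, here is a minimal sketch in plain Python (no GridDB involved). It only confirms that the reported average matches TOTAL / COUNT, that the two Excel figures are consistent with each other, and that the printed standard deviation is exactly the smallest signed 64-bit integer. The variable names are made up for illustration. It does not explain the negative variance.

```python
import math

# Figures copied from the output and the Excel results above.
total, count, average = 48714984, 168, 289970
excel_var, excel_std = 31045317.27, 5571.832488
reported_std = -9223372036854775808

# The reported average agrees with total / count (truncated to an integer).
avg_ok = total // count == average

# Excel's figures agree with each other: std dev is the square root of variance.
std_ok = math.isclose(math.sqrt(excel_var), excel_std, rel_tol=1e-6)

# The printed standard deviation is exactly the minimum signed 64-bit integer.
is_int64_min = reported_std == -2**63

print(avg_ok, std_ok, is_int64_min)
```

The last check may be a hint that the standard deviation, which the code reads with griddb.Type.LONG while the variance is read with griddb.Type.DOUBLE, is being fetched as the wrong type. That is only an assumption and has not been verified against GridDB here.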
Asked by Nooruddin Lakhani (149 rep) on Mar 11, 2024, 10:52 AM
Last activity: Mar 11, 2024, 12:30 PM