
Potential issue with aggregation result (std deviation and variance)

2 votes
1 answer
23 views
I am running a query against the following dataset: https://www.kaggle.com/datasets/census/population-time-series-data . Here is the code:

    year_in_mili = 31536000000
    ts = store.get_container("population")
    query = ts.query("select * from population where value > 280000")
    rs = query.fetch()
    data = rs.next()
    timestamp = calendar.timegm(data.timetuple())
    gsTS = (griddb.TimestampUtils.get_time_millis(timestamp))
    time = datetime.datetime.fromtimestamp(gsTS/1000.0)
    added = gsTS + (year_in_mili * 7)
    addedTime = datetime.datetime.fromtimestamp(added/1000.0)
    variance = ts.aggregate_time_series(time, addedTime, griddb.Aggregation.VARIANCE, "value")
    print("VARIANCE: ", variance.get(griddb.Type.DOUBLE))
    stdDev = ts.aggregate_time_series(time, addedTime, griddb.Aggregation.STANDARD_DEVIATION, "value")
    print("STANDARD DEVIATION: ", stdDev.get(griddb.Type.LONG))

All of the results are correct except for the standard deviation and the variance:

    TOTAL: 48714984
    AVERAGE: 289970
    VARIANCE: -84078718183.5204
    STANDARD DEVIATION: -9223372036854775808
    COUNT 168
    WEIGHTED AVERAGE: 289970

Obviously neither value should be negative. I also ran the numbers in Excel, and these are the results:

    var: 31045317.27
    std dev: 5571.832488
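For anyone who wants to reproduce the comparison outside Excel, here is a minimal sketch using only Python's standard library. The helper name `reference_stats` is hypothetical and not part of GridDB; it assumes the "value" column has already been read into a plain list. The two checks at the bottom use only figures quoted above: the Excel numbers are self-consistent (the standard deviation is the square root of the variance), and the standard deviation printed by the query is exactly -2**63, the smallest signed 64-bit integer. The code above reads that result with griddb.Type.LONG while the variance is read with griddb.Type.DOUBLE, so that difference may be worth checking; this is an assumption, not a confirmed cause.

```python
import math
import statistics


def reference_stats(values):
    """Hypothetical helper: reference variance and standard deviation.

    Returns both the population and the sample figures, because it is not
    stated which definition the spreadsheet or the database uses.
    """
    return {
        "population_variance": statistics.pvariance(values),
        "population_std_dev": statistics.pstdev(values),
        "sample_variance": statistics.variance(values),
        "sample_std_dev": statistics.stdev(values),
    }


# Figures quoted in the question.
excel_variance = 31045317.27
excel_std_dev = 5571.832488
griddb_std_dev = -9223372036854775808

# The standard deviation should be the square root of the variance.
excel_is_consistent = math.isclose(
    math.sqrt(excel_variance), excel_std_dev, rel_tol=1e-6
)

# The value printed by the query is the minimum signed 64-bit integer.
is_int64_min = griddb_std_dev == -2**63

print("Excel figures consistent:", excel_is_consistent)
print("Std dev equals int64 minimum:", is_int64_min)
```

To use it, pass the list of values from the filtered rows to `reference_stats` and compare each entry against the aggregation results.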
Asked by Nooruddin Lakhani (149 rep)
Mar 11, 2024, 10:52 AM
Last activity: Mar 11, 2024, 12:30 PM