Distinct based on window function (sort curiosity or wrong problem approach)
0 votes, 1 answer, 84 views
Many values are stored in the values table. The data is hierarchical: for example, there could be a probe on which some measurements have been performed, and for each measurement many points have been archived, each with strain and stress. So in the values table all the strain/stress values belong to different objects.
The relation to the probe is stored in another table. In this example the hierarchical order is ignored.
I would like to calculate something at the probe level, based on the measurements. So I want to bundle the stress/strain values of one point into a JSON value, so it is clear they belong together.
Now suppose there is a second measurement with strain and stress. I want to calculate values based on all values, but grouped by strain. For example, one value should be calculated from the stress of both measurements where strain = 2 (the additional level in the hierarchy that comes from multiple measurements is ignored too). On top of that, I want to be able to group by multiple values.
The calculation runs in multiple rounds: each round calculates values based on the values of previous rounds.
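To make the intended data shape concrete, here is a minimal PostgreSQL sketch of the two steps described above: bundling strain and stress of one point into a JSON value, then aggregating stress over both measurements grouped by strain. The table `measured` and its columns are hypothetical and much simpler than the tables in the example; `avg` stands in for whatever calculation is actually needed.

```sql
-- Hypothetical, simplified input: one row per measured quantity.
CREATE TEMP TABLE measured (
    measurement_id int,      -- which measurement on the probe
    point_id       int,      -- which archived point within the measurement
    name           text,     -- 'strain' or 'stress'
    val            numeric
);

INSERT INTO measured VALUES
    (1, 1, 'strain', 1), (1, 1, 'stress', 10),
    (1, 2, 'strain', 2), (1, 2, 'stress', 20),
    (2, 1, 'strain', 1), (2, 1, 'stress', 12),
    (2, 2, 'strain', 2), (2, 2, 'stress', 24);

-- Step 1: bundle the values of one point into one jsonb object,
--         e.g. {"strain": 2, "stress": 20}.
WITH complex_value AS (
    SELECT measurement_id,
           point_id,
           jsonb_object_agg(name, val) AS data
    FROM measured
    GROUP BY measurement_id, point_id
)
-- Step 2: aggregate stress over both measurements, grouped by strain.
SELECT (data->>'strain')::numeric      AS strain,
       avg((data->>'stress')::numeric) AS avg_stress
FROM complex_value
GROUP BY (data->>'strain')::numeric
ORDER BY strain;
```

With this sample data the result has one row per strain value, each averaging the stress of the matching points from measurement 1 and measurement 2. Grouping by multiple values would mean adding further `data->>'...'` expressions to the `GROUP BY` list.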
I made a little example:
- complex is a table containing the model info for grouping the data. complex_in contains the inputs for the aggregated data type (-> json). complex_groupcond contains the information about which subvalues (in the json) the calculation should be grouped by. values contains the values in json format.
- The second block just inserts the initial values and the grouping models.
- The third block simulates the first round, building the first complex data type.
- The fourth block contains the basics of my query, where the values are joined with the inputs for the complex data type and grouped by them. The important parts are the window function that extracts the group-by condition from the json, and the DISTINCT ON over the same partition.
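The pattern named in the last item, a window function combined with DISTINCT ON over the same partition, can be sketched as follows. This is not the query from the example: the table `complex_value` and the columns `complex_id` and `data` are assumptions made for illustration, and `avg`/`count` are placeholder calculations.

```sql
-- Hypothetical table holding the already-built complex (json) values.
CREATE TEMP TABLE complex_value (
    complex_id int,
    data       jsonb
);

INSERT INTO complex_value VALUES
    (1, '{"strain": 1, "stress": 10}'),
    (2, '{"strain": 2, "stress": 20}'),
    (3, '{"strain": 1, "stress": 12}'),
    (4, '{"strain": 2, "stress": 24}');

-- The grouping condition (strain) is extracted from the json, the window
-- functions compute per-partition results on every row, and DISTINCT ON
-- over the same key keeps one row per partition.
SELECT DISTINCT ON (strain)
       strain,
       avg(stress) OVER w AS avg_stress,
       count(*)    OVER w AS n_points
FROM (
    SELECT (data->>'strain')::numeric AS strain,
           (data->>'stress')::numeric AS stress
    FROM complex_value
) AS pts
WINDOW w AS (PARTITION BY strain)
ORDER BY strain;
```

For plain aggregates like these, a `GROUP BY strain` query returns the same rows. Running both variants under `EXPLAIN (ANALYZE)` is one way to compare how many sort steps each of them needs on real data.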
So I have some questions (there is more data now, so the performance needs to improve):
Sort Key: "values".name, "values".value_id, complex_in.complex_id
Presorted Key: "values".name, "values".value_id
1. The system uses the index complex_in_pkey for the join of values and complex_in. Why does the system not recognize that everything should already be sorted, with complex_in.complex_id as the next sort column (because complex_id is part of the index complex_in_pkey)?
2. Would an approach be better where I join the (prejoined) values and complex_in with a pre-grouped version of complex_cond, so that the rows of the prejoin cannot be multiplied by this join? That would, however, make it more difficult to extract the grouping condition from the complex value (json).
3. Or would it be better to keep all the values separate and track the hierarchical order the values come from, so that the calculation could order by this hierarchical knowledge? (I have not finished thinking about the grouping condition in that case.)
If you are a German speaker, perhaps this description will be clearer.
To make the questions clearer:
1. Why does the system not recognize that the data is already sorted by "values".name, "values".value_id, complex_in.complex_id? (Or is that not true, and if so, why?)
2. Is a join with complex_groupcond on a unique key better, combined with a new special function for extracting the data?
3. Do you have a better idea to approach a problem like that?
Asked by lrahlff (1 rep)
Mar 6, 2024, 09:01 AM
Last activity: Mar 7, 2024, 04:39 AM