Postgres: 15.1
My goal is simple: I want to compute a single aggregate hash over x rows of column c in table z.
Example input:

```sql
SELECT hash_agg(c) AS checksum FROM z;
```

Example output:

```
checksum
51jj5l1jl55
```
Unfortunately, this seems to be such a niche use case that it is not built in.
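To be more precise: so far I have the following query, which aggregates all my values and hashes them with md5:

```sql
SELECT md5(string_agg(c, '')) AS checksum FROM z;
```

This works for a handful of rows, but I expect it to fail on large inputs, because string_agg() has to build the entire concatenated string in memory before md5() runs (and a single text value cannot exceed 1 GB). It also seems a poor fit for hashing large amounts of data in general.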
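Besides memory, `md5(string_agg(c, ''))` has two properties worth keeping in mind for a checksum: with an empty separator, different sets of rows can concatenate to the same string, and without `ORDER BY` the aggregation order is not guaranteed. The Python sketch below only models the query (it is not Postgres code); `md5_length_prefixed` is a hypothetical helper showing one way to make the encoding unambiguous by prefixing each value with its byte length.

```python
import hashlib


def md5_of_concat(values, sep=""):
    """Model of md5(string_agg(c, sep)): join everything, then hash once."""
    return hashlib.md5(sep.join(values).encode("utf-8")).hexdigest()


def md5_length_prefixed(values):
    """Hypothetical alternative: feed '<len>:<bytes>' per value so that
    different row sets can no longer produce the same byte stream."""
    h = hashlib.md5()
    for v in values:
        b = v.encode("utf-8")
        h.update(str(len(b)).encode("ascii") + b":" + b)
    return h.hexdigest()


rows_a = ["ab", "c"]
rows_b = ["a", "bc"]

# Both row sets concatenate to "abc", so the plain query cannot tell them apart.
print(md5_of_concat(rows_a) == md5_of_concat(rows_b))  # True
# The length-prefixed encoding distinguishes them.
print(md5_length_prefixed(rows_a) == md5_length_prefixed(rows_b))  # False
# Row order changes the result, so an unordered aggregate is not a stable checksum.
print(md5_of_concat(["x", "y"]) == md5_of_concat(["y", "x"]))  # False
```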
EDIT:
I managed to build an md5 aggregate function that does not need to build an extremely large array or string first.
(Please correct me if array_agg() can handle a million rows without memory becoming a concern.)
As @Erwin Brandstetter points out, md5 might not be the best choice in terms of speed, and I would happily use anything more efficient as long as the implementation of that algorithm is open source. It also seems like a waste of processing power to rehash the existing state together with each new value (`md5(state || input)`) for every row.
If there is a way to "update" an existing hash with new values so that the result is identical to hashing the whole dataset from the beginning, I would be grateful to hear about it.
(And if that is not possible in Postgres, please tell me so I don't have to look any further.)
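For context on what "updating" a hash means: MD5 (like SHA-256) processes its input block by block while keeping only a small fixed-size internal state, so a streaming implementation can be fed data piece by piece and still return exactly the same digest as hashing everything at once. Python's `hashlib` exposes this through `update()`. The sketch below is Python, used only to illustrate the concept; it makes no claim about what Postgres exposes at the SQL level. It contrasts the one-shot hash, the streaming hash, and the chained `md5(state || input)` pattern.

```python
import hashlib

rows = [b"alpha", b"beta", b"gamma"]

# 1) One-shot: hash the full concatenation (what md5(string_agg(...)) computes).
one_shot = hashlib.md5(b"".join(rows)).hexdigest()

# 2) Streaming: one hash object fed row by row. Only MD5's small internal
#    state is carried between rows, never the accumulated data.
h = hashlib.md5()
for r in rows:
    h.update(r)
streaming = h.hexdigest()

# 3) Chained re-hashing, the pattern used by the aggregate in the question:
#    state = md5(state || input), starting from an empty state.
state = b""
for r in rows:
    state = hashlib.md5(state + r).digest()
chained = state.hex()

print(one_shot == streaming)  # True: streaming equals hashing everything at once
print(one_shot == chained)    # False: chaining is a different (still deterministic) function
```

Chaining is still a usable checksum as long as both sides compute it the same way; it just is not equal to the digest of the whole dataset.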
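```sql
-- Drop the aggregate first, since it depends on both support functions.
DROP AGGREGATE IF EXISTS md5_agg(anyelement);
DROP FUNCTION IF EXISTS md5_agg_state_func;
DROP FUNCTION IF EXISTS md5_agg_final_func;

-- Transition function: hash the previous digest together with the next value.
-- input::text works for any type; convert_to() yields its UTF-8 bytes
-- (a direct input::bytea cast fails for non-string types such as integer).
CREATE OR REPLACE FUNCTION md5_agg_state_func(state bytea, input anyelement)
RETURNS bytea AS $$
BEGIN
    IF input IS NOT NULL THEN
        -- md5() returns hex text; decode() stores the raw 16-byte digest.
        RETURN decode(md5(state || convert_to(input::text, 'UTF8')), 'hex');
    ELSE
        RETURN state;  -- NULL inputs leave the state unchanged
    END IF;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Final function: present the raw digest as a hex string.
CREATE OR REPLACE FUNCTION md5_agg_final_func(state bytea)
RETURNS text AS $$
BEGIN
    RETURN encode(state, 'hex');
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE AGGREGATE md5_agg(anyelement)
(
    sfunc = md5_agg_state_func,
    stype = bytea,
    finalfunc = md5_agg_final_func,
    initcond = ''
);
```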
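The aggregate's result depends on the order in which rows arrive, and Postgres feeds rows to an aggregate in an unspecified order unless the call itself includes `ORDER BY`. The Python function below is a model of the same chaining structure (skip NULLs, rehash state plus value, hex output at the end); it is an illustration, not guaranteed to reproduce the exact bytes Postgres would return. It also shows that values are hashed through their text form, so `1` and `'1'` contribute identically.

```python
import hashlib


def md5_agg(values):
    """Model of the md5_agg pattern: NULLs (None) are skipped and every other
    value, converted to text, is appended to the running digest and rehashed."""
    state = b""  # mirrors initcond = ''
    for v in values:
        if v is None:
            continue  # NULL inputs leave the state unchanged
        state = hashlib.md5(state + str(v).encode("utf-8")).digest()
    return state.hex()  # the final step renders the state as text


print(md5_agg(["a", "b"]) == md5_agg(["b", "a"]))     # False: order matters
print(md5_agg(["a", None, "b"]) == md5_agg(["a", "b"]))  # True: NULLs are ignored
print(md5_agg([]))  # prints an empty line: no rows leave the initial state ''
```

Because of the order dependence, a stable checksum in SQL would call the aggregate with an ordering, for example `SELECT md5_agg(c ORDER BY c) AS checksum FROM z;`.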
Asked by Yuki
(41 rep)
Jun 28, 2023, 10:10 PM
Last activity: Jun 29, 2023, 10:06 AM