
Create Postgres hash aggregate function

4 votes
1 answer
3881 views
Postgres: 15.1

My goal is simple: I want to create an aggregated hash value for x rows from column c in table z.

Example Input:
SELECT hash_agg(c) AS checksum FROM z;
Example Output:
checksum
51jj5l1jl55
Unfortunately, this seems to be such a niche use case that it is not implemented by default. To be more precise: I have the following query, which aggregates all my values and hashes them with md5:
SELECT md5(string_agg(c, '')) AS checksum FROM z;
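One thing worth pinning down with this approach is row order: without an ORDER BY inside the aggregate call, string_agg() concatenates rows in whatever order the executor delivers them, so the checksum is not guaranteed to be stable. A minimal sketch, assuming a hypothetical demo table z_demo with an id column that is used only to fix the order (neither name is from the original table z):

```sql
-- Hypothetical demo table standing in for z; id exists only to give
-- the rows a stable order. Both names are assumptions.
CREATE TEMP TABLE z_demo (id int PRIMARY KEY, c text);
INSERT INTO z_demo (id, c) VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');

-- ORDER BY inside the aggregate call fixes the concatenation order,
-- so the checksum no longer depends on the physical row order.
SELECT md5(string_agg(c, '' ORDER BY id)) AS checksum FROM z_demo;
```

With an empty separator, different row sets can concatenate to the same string (for example 'ab','c' and 'a','bc'), so a separator that cannot occur in the data may be the safer choice.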
This may work with around 5 rows, but I expect it to fail on large inputs, since the whole concatenated string has to be built first. It may also be inadequate for hashing a lot of data.

EDIT: I did manage to build an md5 aggregate function that does not need to build an extremely big array or string beforehand. (Please correct me if array_agg() can handle a million rows without being a memory concern.)

As @Erwin Brandstetter points out, md5 might not be the best solution in terms of speed, and I would love to use anything more efficient if the implementation of that algorithm is open source. It also seems to be a waste of processing power to rehash the existing state with `md5(state || input::bytea)`. If there is a way to 'update' the existing hash with new values, so that the result is the same as hashing the whole dataset from the beginning, I would be grateful. (And if it is not possible in Postgres, please tell me so I don't have to look any further.)
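-- Drop the aggregate first, then both helper functions it depends on.
DROP AGGREGATE IF EXISTS md5_agg(anyelement);
DROP FUNCTION IF EXISTS md5_agg_state_func;
DROP FUNCTION IF EXISTS md5_agg_final_func;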

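CREATE OR REPLACE FUNCTION md5_agg_state_func(state bytea, input anyelement)
  RETURNS bytea AS $$
BEGIN
  -- Skip NULL inputs so they do not change the running digest.
  IF input IS NOT NULL THEN
    -- md5() returns hex text; PL/pgSQL coerces it to bytea on RETURN.
    RETURN md5(state || input::bytea);
  ELSE
    RETURN state;
  END IF;
END;
$$ LANGUAGE plpgsql IMMUTABLE;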

CREATE OR REPLACE FUNCTION md5_agg_final_func(state bytea)
  RETURNS text AS $$
BEGIN
  RETURN encode(state, 'escape');
END;
$$ LANGUAGE plpgsql IMMUTABLE;


CREATE OR REPLACE AGGREGATE md5_agg(anyelement)
(
  sfunc = md5_agg_state_func,
  stype = bytea,
  finalfunc = md5_agg_final_func,
  initcond = ''
);
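A small sketch of what this aggregate computes compared with the md5(string_agg(...)) query: because every step hashes the previous hex digest plus the next value, the result is a chain of digests, not the digest of the concatenated data. The literals 'alpha', 'beta' and 'gamma' are made-up sample values, and the hand-written chain is only a text-level approximation of the state function that should match it for plain ASCII input starting from the empty initcond:

```sql
SELECT
  -- One-shot digest of the concatenated values,
  -- as md5(string_agg(c, '')) would compute it.
  md5('alpha' || 'beta' || 'gamma') AS one_shot,
  -- Chained digest: hash(previous hex digest || next value),
  -- mirroring md5(state || input) with an empty initial state.
  md5(md5(md5('' || 'alpha') || 'beta') || 'gamma') AS chained;
```

The two columns differ, so md5_agg() gives a usable checksum but cannot be compared against a value produced by hashing the whole dataset in one go. The built-in md5() only accepts a complete input, so the chain is a workaround and not a true incremental update of one hash.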
Asked by Yuki (41 rep)
Jun 28, 2023, 10:10 PM
Last activity: Jun 29, 2023, 10:06 AM