Using Postgres to return a selection based on frequency distribution

0 votes

1 answer

256 views

I am not even sure if this is a possibility so my apologies if I'm barking up the wrong tree; I am ignorant of Postgres's limitations as I am self-taught. This is hypothetical, but if I had a table like so: **dice_rolls** | roll_1 | roll_2 | sum | | -------- | ------- |------| | 1 | 1 | 2 | | 1 | 2 | 3 | | 1 | 3 | 4 | | 1 | 4 | 5 | | 1 | 5 | 6 | | 1 | 6 | 7 | | 2 | 1 | 3 | | 2 | 2 | 4 | | 2 | 3 | 5 | | 2 | 4 | 6 | | 2 | 5 | 7 | | 2 | 6 | 8 | | 3 | 1 | 4 | | 3 | 2 | 5 | | 3 | 3 | 6 | | 3 | 4 | 7 | | 3 | 5 | 8 | | 3 | 6 | 9 | | 4 | 1 | 5 | | 4 | 2 | 6 | | 4 | 3 | 7 | | 4 | 4 | 8 | | 4 | 5 | 9 | | 4 | 6 | 10 | | 5 | 1 | 6 | | 5 | 2 | 7 | | 5 | 3 | 8 | | 5 | 4 | 9 | | 5 | 5 | 10 | | 5 | 6 | 11 | | 6 | 1 | 7 | | 6 | 2 | 8 | | 6 | 3 | 9 | | 6 | 4 | 10 | | 6 | 5 | 11 | | 6 | 6 | 12 | we can see that the value '7' for sum appears most frequently. If we plotted the frequency distribution of the sum, it would look something like this:

x
            x  x  x
         x  x  x  x  x  
      x  x  x  x  x  x  x 
   x  x  x  x  x  x  x  x  x
x  x  x  x  x  x  x  x  x  x  x

2  3  4  5  6  7  8  9  10 11 12

Therefore, we can assume that a truly random selection of this dataset will follow this distribution pattern. What I'm interested in is psuedo-random DISTINCT selection of N rows that attempts to normalize the distribution based on the sum column. E.g.,

SELECT * FROM dice_rolls ORDER BY RANDOM() LIMIT 5;

is what I have now. This query is most likely to contain rows where the sum is 5, 6, 7, 8, 9. How would I go about using a custom ORDER BY function that would be significantly more likely (or even *strictly*) to, return rows where the sum has an even chance of being 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12?

Asked by PoorBob (101 rep)

Jan 20, 2021, 07:51 PM
Last activity: May 29, 2025, 09:04 PM

Using Postgres to return a selection based on frequency distribution

Related Questions