Sample Header Ad - 728x90

Store millions of rows of denomalized data or some SQL magic?

8 votes
4 answers
1027 views
My DBA experience doesn't go much further than simple storage + retrieval of CMS style data - so this may be a silly question, I don't know! I have a problem whereby I need to lookup or calculate holiday prices for a certain group size and a certain number of days within a certain time period. E.g.: ***How much is a hotel room for 2 people for 4 nights anytime in January?*** I have pricing and availability data for, say, 5000 hotels stored like so: Hotel ID | Date | Spaces | Price PP ----------------------------------- 123 | Jan1 | 5 | 100 123 | Jan2 | 7 | 100 123 | Jan3 | 5 | 100 123 | Jan4 | 3 | 100 123 | Jan5 | 5 | 100 123 | Jan6 | 7 | 110 456 | Jan1 | 5 | 120 456 | Jan2 | 1 | 120 456 | Jan3 | 4 | 130 456 | Jan4 | 3 | 110 456 | Jan5 | 5 | 100 456 | Jan6 | 7 | 90 With this table, I can do a query like so: SELECT hotel_id, sum(price_pp) FROM hotel_data WHERE date >= Jan1 and date = 2 GROUP BY hotel_id HAVING count(*) = 4; results hotel_id | sum ---------------- 123 | 400 The HAVING clause here makes sure that there is an entry for every single day between my desired dates that has the spaces available. ie. Hotel 456 had 1 space available on Jan2, the HAVING clause would return 3, so we don't get a result for hotel 456. So far so good. However, is there a way to find out all the 4 night periods in January where there is space available? We could repeat the query 27 times - incrementing the dates each time, which does seem a little bit awkward. Or another way around could be to store all possible combinations in a lookup table like so: Hotel ID | total price pp | num_people | num_nights | start_date ---------------------------------------------------------------- 123 | 400 | 2 | 4 | Jan1 123 | 400 | 2 | 4 | Jan2 123 | 400 | 2 | 4 | Jan3 123 | 400 | 3 | 4 | Jan1 123 | 400 | 3 | 4 | Jan2 123 | 400 | 3 | 4 | Jan3 And so on. We'd have to limit max number of nights, and the max number of people we would search for - e.g. max nights = 28, max people = 10 (limited to the number of spaces available for that set period starting on that date). For one hotel, this could give us 28*10*365=102000 outcomes per year. 5000 hotels = 500m outcomes! But we'd have a very simple query to find the cheapest 4 night stay in Jan for 2 people: SELECT hotel_id, start_date, price from hotel_lookup where num_people=2 and num_nights=4 and start_date >= Jan1 and start_date <= Jan27 order by price limit 1; Is there a way to perform this query on the initial table without having to generate the 500m row lookup table!? e.g. generate the 27 possible outcomes in a temporary table or some other such inner query magic? At the moment all data is held in a Postgres DB - if needs be for this purpose we can move the data out to something else more suitable? Not sure if this type of query fits the map/reduce patterns for NoSQL style DBs ...
Asked by Guy Bowden (223 rep)
Jun 17, 2014, 12:37 PM
Last activity: May 3, 2020, 12:33 AM