Ranking and use of cube by in an SQL query?
1
vote
0
answers
134
views
I am a the MSc student who does not understand this following SQL dilemma and want to be certain from an Academic standpoint.
The entire “Foodmart” database can be downloaded from: http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/foodmart_mysql.sql.zip
In short, is my SQL syntax valid in interpreting the question correctly? I put the questions in reverse order from 4.3a 4.3b 4.2 and 4.1 so they decrease with difficulty.
Question 4.3 Produce a report with information about the
A) profit and ratioP of each product family, store country, year and gender; the ratioP is between profit and the store sales in the gender category (Male or Female)
B) the profit of each subset of the above combination
My syntax is:
-- 4.3a
select PC.product_family , ST.store_country , TBD.the_year as year ,
C.gender , (sum(store_sales) - sum(store_cost)) as total_profit ,
((sum(store_sales) - sum(store_cost)) / sum(store_sales)) as perc_profit
from sales_fact S
left join product P ON S.product_id = P.product_id
left join product_class PC ON P.product_class_id = PC.product_class_id
left join store ST ON S.store_id = st.store_id
left join time_by_day TBD ON S.time_id = TBD.time_id
left join customer C ON S.customer_id = C.customer_id
group by PC.product_family, ST.store_country, TBD.the_year,
C.gender ;
-- 4.3b
select PC.product_family , ST.store_country , ST.store_city ,
TBD.the_year as year , TBD.the_month as month , TBD.the_day as day ,
C.gender , (sum(store_sales) - sum(store_cost)) as total_profit
from sales_fact S
left join product P ON S.product_id = P.product_id
left join product_class PC ON P.product_class_id = PC.product_class_id
left join store ST ON S.store_id = st.store_id
left join time_by_day TBD ON S.time_id = TBD.time_id
left join customer C ON S.customer_id = C.customer_id
group by PC.product_family, ST.store_country, ST.store_city,
TBD.the_year, TBD.the_month, TBD.the_day, C.gender ;
select PC.product_family , ST.store_city , TBD.the_year as year ,
sum(store_sales) as total_sales , (sum(store_sales) - sum(store_cost)) as total_profit ,
((sum(store_sales) - sum(store_cost)) / sum(store_sales)) as perc_profit
from sales_fact S
left join product P ON S.product_id = P.product_id
left join product_class PC ON P.product_class_id = PC.product_class_id
left join store ST ON S.store_id = ST.store_id
left join time_by_day TBD ON S.time_id = TBD.time_id
group by PC.product_family, ST.store_city, TBD.the_year ;
Could there be any merit, necessity, or efficiency in using cube by or roll-up functions in this query?
A university Professor suggests:
You do not need cube by. The queries you mention have nothing to do with cubes or rollup
They are simply group by and many combinations of the intermediate results.
"They are simply group by and many combinations of the intermediate results."
"With group by having and aggregate functions like max sum etc. can achieve what is asked. You just need to create a strategy (plan) on how to compute it and then do the select project join queries that do that."
“With group by having and aggregate functions like max sum etc. you can achieve what is asked. You just need to create a strategy (plan) on how to compute it and then do the select project join queries that do that.”
“It is not something done it 2 min … It needs some thinking (but that does not make it hard).”
I have other questions below but am grateful for your reply or anyone else you deem suitable!
Here are the other 2 questions on the same database with my syntax:
4.2 For every product family and customer country and gender return the customer with the highest store sales of the product family. If there are two or more customers with the same store sales, pick up any.
-- 4.2
select distinct
C.lname,
C.fname,
PC.product_family,
C.country,
C.gender,
MAX(S.store_sales)
from
sales_fact S
left join product P on S.product_id = P.product_id
c left join product_class PC on P.product_class_id = PC.product_class_id
left join customer C on S.customer_id = C.customer_id
group by
PC.product_family, C.country, C.gender
;
The university Professor suggests only grouping by year?:
you group by (product family, customer country, gender, customer)
in the select clause you put
product family, customer country, gender, customer, sum(sakes)
Now you know the sales of each customer.
Then you use that table, to identify the max sales and select the tuples that have such a value.
Possible error?
Is there also for example the requirement of the use of a ranking or anything else for that matter?
4.1 Return a table where you list product family, store city and year, with the corresponding total sales
and profit, and the percentage of profit over the total sales of the year.
-- 4.1
select
PC.product_family
, ST.store_city
, TBD.the_year as year
, sum(store_sales) as total_sales
, (sum(store_sales) - sum(store_cost)) as total_profit
, ((sum(store_sales) - sum(store_cost)) / sum(store_sales)) as perc_profit
from
sales_fact S
left join product P on S.product_id = P.product_id
left join product_class PC on P.product_class_id = PC.product_class_id
left join store ST on S.store_id = ST.store_id
left join time_by_day TBD on S.time_id = TBD.time_id
group by
PC.product_family, ST.store_city, TBD.the_year
;
you simply group by year and then you do
in the select clause
sum(sales), sum(profit), sum(sales)/sum(profits)
The from clause should have the join of all the tables containing the needed information
That is all.
I do not only group by year as I see also PC.product_family, ST.store_city in my syntax. Possible error?
Could there also, for example, in the Select be the error of the inclusion of attributes not present in the group by?
Many thanks people for this help,
Tim
Asked by Timothy
(11 rep)
Mar 26, 2021, 03:13 PM
Last activity: Mar 26, 2021, 04:51 PM
Last activity: Mar 26, 2021, 04:51 PM