What is the most efficient way to query in Hive (data lake based on HDFS)?
Filtering the tables before joining them:
select *
from
(select code from table_1 where type="a") a
inner join
(select code from table_2 where type="a") b
on a.code=b.code
Or this way, with the filters in the where condition?
select *
from table_1 inner join table_2 on table_1.codigo=table_2.codigo
where table_1.type="a" and
table_2.type="a"
Perhaps the most obvious and quickest answer is the first way. But I think that in an HDFS-based environment the query engine is optimized so that it applies the "where" first and then the "join". In other words, I suspect Hive performs this optimization internally, so both forms may end up running the same way.
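One way to check this rather than guess is to compare the plans Hive produces for the two forms with explain. The following is only a sketch: it reuses the table and column names from the two queries above, and it assumes predicate pushdown (the hive.optimize.ppd setting) is enabled on the cluster, which may not match your configuration. If the filter on type shows up before the join in both plans, the two forms should cost about the same.

```sql
-- Predicate pushdown setting; assumed to be on here (check your cluster's value).
set hive.optimize.ppd=true;

-- Plan for the version that filters in subqueries before the join.
explain
select *
from (select code from table_1 where type = "a") a
inner join (select code from table_2 where type = "a") b
  on a.code = b.code;

-- Plan for the version that filters in the where condition after the join.
explain
select *
from table_1
inner join table_2 on table_1.codigo = table_2.codigo
where table_1.type = "a"
  and table_2.type = "a";
```

Note that the two queries are not strictly identical as written: the first returns only the code columns, while the second returns every column of both tables, so the plans may also differ in which columns are read.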
Asked by cfsl
(1 rep)
Jan 31, 2024, 09:46 PM
Last activity: Jan 31, 2024, 10:20 PM