Is it necessary to optimize joins in HDFS?


What is the most efficient way to query in Hive (a data lake based on HDFS)? Filtering each table before joining them:

     
    select *
    from (select code from table_1 where type = "a") a
    inner join (select code from table_2 where type = "a") b
      on a.code = b.code

Or this way, filtering in the WHERE clause after the join?

    select *
    from table_1
    inner join table_2 on table_1.code = table_2.code
    where table_1.type = "a"
      and table_2.type = "a"

Perhaps the most obvious answer is that the first way is faster. But I think that with HDFS the environment is optimized so that the "where" filter is applied first and the "join" afterwards; in other words, the platform already optimizes the query internally.
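One way to check this assumption rather than guess is to ask Hive for the plan of each query. The sketch below assumes Hive's `EXPLAIN` statement and the `hive.optimize.ppd` (predicate pushdown) setting, which is normally enabled by default; the exact plan output depends on the Hive version and execution engine, so treat this as a starting point and not a definitive test.

```sql
-- Print the current value of the predicate pushdown setting (assumed default: true).
SET hive.optimize.ppd;

-- Plan for the version that filters inside the subqueries.
EXPLAIN
SELECT *
FROM (SELECT code FROM table_1 WHERE type = "a") a
INNER JOIN (SELECT code FROM table_2 WHERE type = "a") b
  ON a.code = b.code;

-- Plan for the version that filters in the WHERE clause.
EXPLAIN
SELECT *
FROM table_1
INNER JOIN table_2 ON table_1.code = table_2.code
WHERE table_1.type = "a"
  AND table_2.type = "a";
```

If both plans show the filter on `type` applied at each table scan, before the join operator, the two forms should cost about the same and the choice comes down to readability.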
                        

Asked by cfsl (1 rep)

Jan 31, 2024, 09:46 PM
Last activity: Jan 31, 2024, 10:20 PM
