Is it necessary to optimize joins in HDFS?

0 votes
0 answers
49 views
What is the most efficient way to write this query in Hive (a data lake based on HDFS)?

Option 1: filter each table before joining them:

select * from (select code from table_1 where type="a") a inner join (select code from table_2 where type="a") b on a.code=b.code

Option 2: join first and filter in the where condition:

select * from table_1 inner join table_2 on table_1.codigo=table_2.codigo where table_1.type="a" and table_2.type="a"

The first way seems like the most obvious and quickest answer. But I think that with HDFS the environment is optimized in such a way that it applies the "where" first and then the "join"; that is, HDFS performs this optimization internally.
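To make the comparison concrete, here is a minimal sketch that runs both forms of the query and checks that they return the same rows. It uses Python's built-in sqlite3 module purely as a stand-in for Hive, so it only illustrates that the two inner-join forms are logically equivalent; it says nothing about how Hive would actually execute them. The sample rows are hypothetical, the join column is assumed to be `code` in both tables, and the select lists name the columns explicitly so the two results are directly comparable. In Hive itself, prefixing each query with EXPLAIN would be one way to see whether the resulting plans differ.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (code INTEGER, type TEXT);
    CREATE TABLE table_2 (code INTEGER, type TEXT);
    -- Hypothetical sample rows, not taken from the question.
    INSERT INTO table_1 VALUES (1, 'a'), (2, 'a'), (3, 'b');
    INSERT INTO table_2 VALUES (1, 'a'), (2, 'b'), (3, 'a');
""")

# Option 1: filter each table in a subquery, then join.
filter_first = """
    SELECT a.code, b.code
    FROM (SELECT code FROM table_1 WHERE type = 'a') a
    INNER JOIN (SELECT code FROM table_2 WHERE type = 'a') b
      ON a.code = b.code
"""

# Option 2: join the full tables, then filter in the WHERE clause.
filter_in_where = """
    SELECT table_1.code, table_2.code
    FROM table_1
    INNER JOIN table_2 ON table_1.code = table_2.code
    WHERE table_1.type = 'a' AND table_2.type = 'a'
"""

# Sort the results so the comparison does not depend on row order.
rows_filter_first = sorted(conn.execute(filter_first).fetchall())
rows_filter_in_where = sorted(conn.execute(filter_in_where).fetchall())

print(rows_filter_first == rows_filter_in_where)
```

Because this is an inner join, moving the filters between the subqueries and the where clause does not change the result set; the open question is only which form the engine executes more cheaply.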
Asked by cfsl (1 rep)
Jan 31, 2024, 09:46 PM
Last activity: Jan 31, 2024, 10:20 PM