We have a Hadoop cluster (managed by the Ambari platform) with the Hive Metastore installed on two machines.
Sometimes while jobs are running (we run the queries from Presto), we see job failures caused by the Metastore heap size being exhausted.
From the Metastore logs we can see the following:
```
2021-12-13 01:39:23,145 INFO [org.apache.hadoop.hive.common.JvmPauseMonitor$Monitor@1595ec02]: common.JvmPauseMonitor (JvmPauseMonitor.java:run(193)) - Detected pause in JVM or host machine (eg GC): pause of approximately 3263ms
No GCs detected
```
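Note that the `JvmPauseMonitor` line above says "No GCs detected", so the pause may or may not be heap related. One way to confirm is to enable GC logging on the Metastore JVM. The snippet below is an illustrative fragment (the flags are standard JDK 8 HotSpot options; the log path and the exact place you set `HADOOP_OPTS`, e.g. the `hive-env` template in Ambari, are assumptions for your environment):

```shell
# Illustrative: add GC logging to the Metastore JVM options so that
# long pauses can be correlated with GC activity in the GC log.
export HADOOP_OPTS="$HADOOP_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hive/metastore-gc.log"
```

If the GC log shows long full-GC pauses around the time of the job failures, the heap really is the bottleneck; if not, look at the host itself (swapping, CPU starvation).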
So we increased the Metastore heap size from 2 GB to 4 GB.
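For context, in an Ambari-managed cluster this setting usually ends up in `hive-env.sh`. A minimal sketch of what the change amounts to (the exact variable names come from the common Ambari `hive-env` template; verify against your own template before relying on it):

```shell
# hive-env.sh fragment (illustrative) -- heap for the Metastore service only.
if [ "$SERVICE" = "metastore" ]; then
  export HADOOP_HEAPSIZE=4096   # MB; raised from 2048
fi
```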
**But the question is: how do we know the right size for the Metastore heap, and according to what?**
For example, if we compare it to the HDFS NameNode heap size, we can say the NameNode heap should be sized according to the number of files (and blocks) the NameNode manages.
But what is the equivalent rule for the Metastore heap size?
How do we calculate the right Metastore heap size?
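By analogy with the NameNode (heap proportional to files/blocks), the Metastore's in-heap working set is driven mostly by the number of tables and partitions it serves, plus the number of concurrent client connections. A rough back-of-the-envelope sketch follows; the per-partition cost and base footprint are assumptions for illustration, not an official Hive formula, so calibrate them against your own heap usage:

```shell
# Rough rule-of-thumb sizing sketch. KB_PER_PARTITION and BASE_MB are
# assumed values, not Hive constants -- tune them for your workload.
# The partition count can be read from the metastore's backing database,
# e.g. on PostgreSQL: SELECT COUNT(*) FROM "PARTITIONS";
PARTITIONS=500000      # total partitions across all tables (example value)
KB_PER_PARTITION=5     # assumed heap cost per partition object served
BASE_MB=1024           # assumed base footprint of the Metastore service
HEAP_MB=$(( BASE_MB + PARTITIONS * KB_PER_PARTITION / 1024 ))
echo "suggested -Xmx: ${HEAP_MB} MB"   # prints: suggested -Xmx: 3465 MB
```

In practice, heavily partitioned tables and wide `get_partitions` calls from engines like Presto inflate the real cost per partition, which is why monitoring GC behavior after a change matters more than any static formula.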
References:
- https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_hive_tuning.html
- https://docs.informatica.com/data-engineering/data-engineering-integration/10-1/_user-guide_big-data-management_10-1_ditamap/connections/hive_connection_properties.html
- https://stackoverflow.com/questions/56363736/hive-too-many-connection-to-postgresql-db
- https://docs.microsoft.com/en-us/azure/databricks/kb/metastore/hive-metastore-troubleshooting
- https://www.linkedin.com/pulse/hive-metastore-hcatalog-hcat-haotian-zhang/
Asked by yael (13936 rep), Dec 23, 2021, 12:53 PM