Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
0 votes · 1 answer · 32 views
Cannot get Impala to start services
I'm collaborating on a deployment that aims to provide Hadoop, Hive and Impala for learning and teaching purposes. We use Ubuntu 22.04 as the base system on a VM.
While everything related to HDFS, Hadoop and Hive works fine (including PostgreSQL for the Hive Metastore), the Impala installation has been a very hard challenge. I applied the commands recommended in [Apache's documentation](https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala) to build Impala, and then ran `make install`.
After building, the `catalogd` service doesn't start when I try to run it; it logs the following:
```
cat /tmp/catalogd.pc.hadoop.log.ERROR.20250311-114741.4782
Log file created at: 2025/03/11 11:47:41
Running on machine: pc
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0311 11:47:41.450573 4782 logging.cc:256] stderr will be logged to this file.
E0311 11:48:10.130164 4782 catalog.cc:101] NoSuchMethodError: org.apache.hadoop.hive.metastore.IMetaStoreClient.getThriftClient()Lorg/apache/hadoop/hive/metastore/api/ThriftHiveMetastore$Client;
. Impalad exiting.
Picked up JAVA_TOOL_OPTIONS: -Dsun.java.command=catalogd
```
`statestored` and `admissiond` are working.
Using Impala commit `34b17db7b473d6729ac6c9cf139fcf410f18d941` and pre-built Hive 4.0.1.
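A `NoSuchMethodError` at catalogd startup like this usually means the Hive client jars on catalogd's Java classpath don't match the Hive version Impala was compiled against. A minimal sketch for checking that, assuming a source build; the paths and variable defaults below are assumptions, not taken from the question:

```
# Compare the Hive version pinned by the Impala build with the jars
# the frontend/catalogd JVM actually loads (paths are assumptions
# for a typical source checkout).
IMPALA_HOME=${IMPALA_HOME:-$HOME/Impala}   # hypothetical checkout location

# Hive version the build expects (exported by bin/impala-config.sh)
grep "IMPALA_HIVE_VERSION" "$IMPALA_HOME/bin/impala-config.sh"

# Hive metastore client jars present in the build tree
find "$IMPALA_HOME" -name 'hive-standalone-metastore-*.jar' 2>/dev/null

# If these disagree (e.g. the build pins one Hive line but Hive 4.0.1
# jars come first on the classpath), getThriftClient() can be missing
# at runtime and catalogd aborts exactly as in the log above.
```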
manu_romero_411 (13 rep) · Mar 11, 2025, 10:55 AM · Last activity: Mar 11, 2025, 12:07 PM
1 vote · 0 answers · 1105 views
How to set the value of the Hive Metastore heap size
We have a Hadoop cluster (based on the Ambari platform) with the Hive Metastore installed on two machines.
Sometimes while jobs are running (we run the queries from Presto) we see job failures due to heap size, specifically the Metastore heap size.
In our case it is caused by a Metastore heap issue; in the Metastore logs we can see the following:
```
2021-12-13 01:39:23,145 INFO [org.apache.hadoop.hive.common.JvmPauseMonitor$Monitor@1595ec02]: common.JvmPauseMonitor (JvmPauseMonitor.java:run(193)) - Detected pause in JVM or host machine (eg GC): pause of approximately 3263ms
No GCs detected
```
So we increased the Metastore heap size from 2G to 4G.
**But the question is: how do we know what the right size for the Metastore heap is, and based on what?**
For example, if we compare it to the HDFS NameNode heap size, we can say that the NameNode heap should be sized according to the number of files the NameNode manages.
But what is the equivalent concept for the Metastore heap size? How do we calculate it?
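There is no exact published formula; the usual sizing drivers are concurrent client connections and partition counts rather than file counts. A minimal sketch of where the heap is conventionally set and how to observe whether it is actually under pressure; the 4096 MB figure is only an example value, not a recommendation:

```
# In hive-env.sh (on Ambari: Hive > Configs > Advanced hive-env), the
# Metastore heap is conventionally set like this:
if [ "$SERVICE" = "metastore" ]; then
  export HADOOP_HEAPSIZE=4096   # MB; example value, not a recommendation
fi

# Watch GC pressure on the live Metastore to judge whether the size fits
# the workload (old-gen occupancy and GC time, sampled every 5 seconds):
MS_PID=$(pgrep -f HiveMetaStore | head -n1)
jstat -gcutil "$MS_PID" 5000
```

If the old generation stays near 100% and GC time keeps climbing under the Presto query load, the heap is too small for the number of concurrent connections and partitions being fetched.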
References:
- https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_hive_tuning.html
- https://docs.informatica.com/data-engineering/data-engineering-integration/10-1/_user-guide_big-data-management_10-1_ditamap/connections/hive_connection_properties.html
- https://stackoverflow.com/questions/56363736/hive-too-many-connection-to-postgresql-db
- https://docs.microsoft.com/en-us/azure/databricks/kb/metastore/hive-metastore-troubleshooting
- https://www.linkedin.com/pulse/hive-metastore-hcatalog-hcat-haotian-zhang/
yael (13936 rep) · Dec 23, 2021, 12:53 PM
0 votes · 1 answer · 2308 views
Hive: what is the meaning of nofile and nproc?
Regarding Hive in a Hadoop cluster, what is the meaning of the following?

```
hive - nofile 30000
hive - nproc 18000
```

The file:
```
more /etc/security/limits.d/hive.conf
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
hive - nofile 30000
hive - nproc 18000
```
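In `limits.conf` syntax, `nofile` is the maximum number of open file descriptors and `nproc` the maximum number of processes/threads, applied here to the `hive` user; the `-` sets both the soft and hard limit. A minimal sketch for verifying what actually applies, assuming the `hive` user exists on the host:

```
# Effective limits for a fresh hive login shell (-s forces a shell in
# case the hive account uses nologin):
su - hive -s /bin/bash -c 'ulimit -n'   # expect 30000 (nofile)
su - hive -s /bin/bash -c 'ulimit -u'   # expect 18000 (nproc)

# Or inspect a running Hive daemon directly (PID discovery illustrative):
HS2_PID=$(pgrep -f hiveserver2 | head -n1)
grep -E 'Max open files|Max processes' "/proc/$HS2_PID/limits"
```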
yael (13936 rep) · May 29, 2019, 06:54 PM · Last activity: May 29, 2019, 07:25 PM
3 votes · 1 answer · 10880 views
Change column datatypes in Hive database
Can I change the datatype of a column in a Hive database? Below is the complete information.
I have a database named "test". It has a table "name". Below is the query I used when creating the table:

```
create table name(custID String, nameValuePairs array<...>) row format delimited fields terminated by '/' collection items terminated by '|' map keys terminated by '=' lines terminated by '\n';
```

Now I would like to change the datatype of the column `nameValuePairs`. Currently the column `nameValuePairs` has datatype `array<...>`, and I would like to change it to a different `array<...>`.
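Hive can change a column's type with `ALTER TABLE ... CHANGE`, which rewrites only the table metadata, not the data files. A minimal sketch; the inner struct fields are hypothetical placeholders, since the original post's angle-bracket contents were lost, so substitute the real element type:

```
# The struct fields inside array<...> here are hypothetical placeholders.
hive -e "
USE test;
ALTER TABLE name
  CHANGE COLUMN nameValuePairs nameValuePairs array<struct<k:string,v:string>>;
DESCRIBE name;   -- confirm the new column type
"
```

Because only metadata changes, the existing rows must already be readable under the new type, and some Hive versions refuse incompatible changes unless `hive.metastore.disallow.incompatible.col.type.changes` is relaxed.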
Nitesh B. (593 rep) · Sep 9, 2016, 08:43 AM · Last activity: Apr 16, 2019, 03:31 PM
1 vote · 1 answer · 1867 views
How to change Hive Server Timezone
I am using [Apache Hive](https://hive.apache.org/) on RedHat. The Hive version is `1.2.1`. The current timezone in Hive is `EST` and I want to convert it to `GMT`.
PS: I want to change the Hive Server timezone; I am not asking about a particular Hive function that converts an `EST` time to a `GMT` time.
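Hive 1.2 largely takes its timezone from the JVM default, which in turn follows the OS unless overridden. A minimal sketch of the two common approaches, assuming the standard `hive-env.sh` convention; HiveServer2 and the metastore need a restart afterwards:

```
# 1) Host-wide: point the OS default at GMT (RHEL 7+ syntax shown;
#    older RedHat releases use the /etc/localtime symlink instead):
#      timedatectl set-timezone GMT

# 2) Hive JVMs only: override the JVM default timezone in hive-env.sh:
export HADOOP_OPTS="$HADOOP_OPTS -Duser.timezone=GMT"
```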
Raman (113 rep) · Nov 23, 2016, 09:42 AM · Last activity: Mar 9, 2019, 02:49 PM
1 vote · 1 answer · 3826 views
How to verify that a hive DB already exists?
We create a Hive database using an expect script together with some other commands.
If we run the expect script again on a machine where the hive database has already been created, we get this:
> ERROR hive already exists

How can we check whether the *hive database already exists* before creating it again? With that check we can skip the *expect script*:
```
# su - postgres
Last login: Sun Aug 13 11:12:03 UTC 2017 on pts/0
-bash-4.2$ psql
psql (9.2.13)
Type "help" for help.
postgres=# CREATE DATABASE hive;
ERROR: database "hive" already exists
```
My expect script (from my bash script):

```
set timeout -1
#exec the scripts
spawn timeout 60 ssh root@IP
expect "#"
spawn su - postgres
expect "$"
send "psql\n"
expect "=#"
send "CREATE DATABASE hive;\n"
.
.
.
.
```
Another example:

```
postgres=# CREATE DATABASE [IF NOT EXISTS] hive;
ERROR: syntax error at or near "["
LINE 1: CREATE DATABASE [IF NOT EXISTS] hive;
                        ^
```
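PostgreSQL has no `CREATE DATABASE IF NOT EXISTS`, which is why the bracketed syntax fails. The usual workaround is to query `pg_database` first and create the database only when the row is missing, so the expect script can be skipped entirely. A minimal sketch, assuming passwordless local `psql` access as in the transcript above:

```
# Create the hive database only if it does not already exist.
if su - postgres -c "psql -tAc \"SELECT 1 FROM pg_database WHERE datname='hive'\"" | grep -q 1; then
  echo "database hive already exists; skipping creation"
else
  su - postgres -c "psql -c 'CREATE DATABASE hive;'"
fi
```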
yael (13936 rep) · Aug 13, 2017, 01:56 PM · Last activity: Aug 21, 2017, 04:06 PM
3 votes · 3 answers · 675 views
Why does using a pipe '|' terminate the second process, and is there a way to prevent it?
Just to give some context, I'm trying to run this command:

```
echo "set hive.execution.engine=tez;" | hive
```

hive terminates as soon as "set hive.execution.engine..." has been entered into the hive interactive shell. I want it to stay in the hive interactive shell, but instead the shell terminates and I'm back in the normal Linux shell.
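`hive` exits because its standard input reaches end-of-file as soon as `echo` finishes; an interactive shell only persists while stdin stays open. A minimal sketch of two workarounds; note that with stdin coming from a pipe, prompt behaviour may differ from a real terminal:

```
# Keep stdin open after the command by appending the terminal via cat;
# type further statements afterwards, Ctrl-D to quit:
{ echo 'set hive.execution.engine=tez;'; cat; } | hive

# Often cleaner: pass the setting at startup and get a normal shell:
hive --hiveconf hive.execution.engine=tez
```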
Fil (33 rep) · Sep 10, 2015, 10:22 AM · Last activity: Jun 25, 2017, 12:21 PM
1 vote · 0 answers · 26 views
Shell script execution with and without bash on the command line
I have a shell script that reads data from a file and stores it in an array.
When I execute the script with `bash scriptname.sh` I don't get any errors, but when I execute it with `sh scriptname.sh` I get errors such as "array declaration not found" and syntax errors like "parentheses expected".
Why do these differences appear between the two approaches?
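Arrays are a bash extension, not part of POSIX `sh`; on many systems `sh` is a different interpreter (dash on Debian/Ubuntu) or bash running in POSIX mode, so bash-only syntax fails to parse. A minimal sketch reproducing the difference; `demo.sh` is a hypothetical file name:

```
# demo.sh
arr=(one two three)   # array literal: valid in bash, not in POSIX sh
echo "${arr[1]}"

# bash demo.sh   -> prints "two"
# sh demo.sh     -> dash reports: Syntax error: "(" unexpected
#
# Fix: declare the interpreter explicitly on the script's first line
# (#!/bin/bash) and run the script directly.
```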
vaibhav kumar (51 rep) · Nov 2, 2016, 10:18 AM · Last activity: Nov 2, 2016, 10:22 AM
1 vote · 1 answer · 647 views
Escaping bash variables before storing them in an Apache Hive database
I'm running a script file `sqoop_import_ATM.sh` and would like to store the logs in an SQL database.
The first thing I did was direct the logs into my own variable:

```
OUTPUT="$(/home/me/sqoop_insights_extract/./sqoop_import_ATM.sh 2>&1)"
```

This captures both `stdout` and `stderr`.
Then I store this `OUTPUT` in the SQL database (this part doesn't work):

```
hive -e "INSERT INTO SCHEDULED_MIGRATION_LOGS VALUES ('$OUTPUT')"
```

Here is an example of the log that `OUTPUT` might hold:
```
rm: `transaction_detail_atm': No such file or directory Warning: /usr/iop/4.1.0.0/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 16/07/26 10:28:33 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6_IBM_20 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/iop/4.1.0.0/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J
```
To remove the harmful single quotes I tried using

```
$OUTPUT | sed "s/'/\ /g"
```

This seemed to work.
Now there's a problem with the part `[jar:file:/usr/iop/4.1.0.0/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]`: the shell starts checking my directory for this file or something.
Apologies if I'm missing something obvious; I'm very new to bash.
Any idea how to simply post this log to my SQL database?
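The robust way around the quoting trouble is not to interpolate the log into HiveQL at all: write it to a file and load the file, so quotes, brackets and backslashes never pass through the shell or the SQL parser. A minimal sketch, assuming the table from the question; the temp-file path is illustrative, and note that `LOAD DATA` stores one row per log line rather than one row for the whole log:

```
# Write the captured output to a local file and load that file.
LOGFILE=$(mktemp /tmp/migration_log.XXXXXX)
printf '%s\n' "$OUTPUT" > "$LOGFILE"
hive -e "LOAD DATA LOCAL INPATH '$LOGFILE' INTO TABLE SCHEDULED_MIGRATION_LOGS"

# If the value must be inlined instead, escape backslashes first, then
# single quotes (still fragile for embedded newlines):
ESCAPED=$(printf '%s' "$OUTPUT" | sed 's/\\/\\\\/g' | sed "s/'/\\\\'/g")
hive -e "INSERT INTO SCHEDULED_MIGRATION_LOGS VALUES ('$ESCAPED')"
```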
**EDIT - Entire LOG**
rm: `transaction_detail_atm': No such file or directory Warning: /usr/iop/4.1.0.0/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 16/07/26 10:28:33 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6_IBM_20 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/iop/4.1.0.0/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/iop/4.1.0.0/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 16/07/26 10:28:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 16/07/26 10:28:33 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 16/07/26 10:28:33 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 16/07/26 10:28:33 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time. 16/07/26 10:28:33 INFO manager.SqlManager: Using default fetchSize of 1000 16/07/26 10:28:33 INFO tool.CodeGenTool: Beginning code generation 16/07/26 10:28:34 INFO manager.SqlManager: Executing SQL statement: select QueryResult.java sqoop_import_ALL.sh sqoop_import_ATM.sh sqoop_import_CC.sh sqoop_import_TD.sh sqoop_import_VD.sh testscript.sh transaction_detail.java transaction_detail_visa_debit.java from transaction_detail_atm where posted_dte = current_date-1 and (1 = 0) 16/07/26 10:28:35 INFO manager.SqlManager: Executing SQL statement: select QueryResult.java sqoop_import_ALL.sh sqoop_import_ATM.sh sqoop_import_CC.sh sqoop_import_TD.sh sqoop_import_VD.sh testscript.sh transaction_detail.java transaction_detail_visa_debit.java from transaction_detail_atm where posted_dte = current_date-1 and (1 = 0) 16/07/26 10:28:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/iop/4.1.0.0/hadoop Note: /tmp/sqoop-u28158/compile/5fe23466bd0c4860e3529366cde274b4/QueryResult.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 16/07/26 10:28:37 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-u28158/compile/5fe23466bd0c4860e3529366cde274b4/QueryResult.jar 16/07/26 10:28:37 INFO mapreduce.ImportJobBase: Beginning query import. 
16/07/26 10:28:38 INFO impl.TimelineClientImpl: Timeline service address: http://rhdtmomg1.mid.aib.pri:8188/ws/v1/timeline/ 16/07/26 10:28:38 INFO client.RMProxy: Connecting to ResourceManager at rhdtmomg1.mid.aib.pri/10.30.39.1:8050 16/07/26 10:28:43 INFO db.DBInputFormat: Using read commited transaction isolation 16/07/26 10:28:44 INFO mapreduce.JobSubmitter: number of splits:1 16/07/26 10:28:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464347352640_0138 16/07/26 10:28:44 INFO impl.YarnClientImpl: Submitted application application_1464347352640_0138 16/07/26 10:28:44 INFO mapreduce.Job: The url to track the job: http://rhdtmomg1.mid.aib.pri:8088/proxy/application_1464347352640_0138/ 16/07/26 10:28:44 INFO mapreduce.Job: Running job: job_1464347352640_0138 16/07/26 10:28:49 INFO mapreduce.Job: Job job_1464347352640_0138 running in uber mode : false 16/07/26 10:28:49 INFO mapreduce.Job: map 0% reduce 0% 16/07/26 10:31:21 INFO mapreduce.Job: map 100% reduce 0% 16/07/26 10:31:22 INFO mapreduce.Job: Job job_1464347352640_0138 completed successfully 16/07/26 10:31:22 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=147903 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=2514798055 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=149743 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=149743 Total vcore-seconds taken by all map tasks=149743 Total megabyte-seconds taken by all map tasks=306673664 Map-Reduce Framework Map input records=4073603 Map output records=4073603 Input split bytes=87 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=2264 CPU time spent (ms)=127770 Physical memory (bytes) snapshot=368541696 Virtual memory (bytes) snapshot=3798663168 Total committed heap usage (bytes)=331350016 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=2514798055 16/07/26 10:31:22 INFO mapreduce.ImportJobBase: Transferred 2.3421 GB in 164.7606 seconds (14.5563 MB/sec) 16/07/26 10:31:22 INFO mapreduce.ImportJobBase: Retrieved 4073603 records. 
16/07/26 10:31:22 INFO manager.SqlManager: Executing SQL statement: select QueryResult.java sqoop_import_ALL.sh sqoop_import_ATM.sh sqoop_import_CC.sh sqoop_import_TD.sh sqoop_import_VD.sh testscript.sh transaction_detail.java transaction_detail_visa_debit.java from transaction_detail_atm where posted_dte = current_date-1 and (1 = 0) 16/07/26 10:31:23 INFO manager.SqlManager: Executing SQL statement: select QueryResult.java sqoop_import_ALL.sh sqoop_import_ATM.sh sqoop_import_CC.sh sqoop_import_TD.sh sqoop_import_VD.sh testscript.sh transaction_detail.java transaction_detail_visa_debit.java from transaction_detail_atm where posted_dte = current_date-1 and (1 = 0) 16/07/26 10:31:23 WARN hive.TableDefWriter: Column DISPENSED_AMT had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column POSTED_AMT had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column POSTED_REF_CRNCY_AMT had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column UNCLEARED_1_DAY had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column UNCLEARED_3_DAY had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column UNCLEARED_4_DAY had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column UNCLEARED_5_DAY had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column UNCLEARED_6_DAY had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column UNCLEARED_TEMP had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column POSTED_DTE had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column TRANS_DTE had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column POSTED_COMMISSION_AMT had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column COMMISSION_REF_CRNCY had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column SEQUENCE_NO had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column TRACER_NO had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column FORCE_POST_RECVRY had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column FORCE_POST_RSN had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column CURR_LEDG_BAL had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column ATM_POSTED_DTE had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column ACQUIRER_SETTLEMENT_DTE had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column ISSUER_SETTLEMENT_DTE had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column REQUESTED_AMT had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column ATM_DISPENSED_AMT had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column SETTLEMENT_AMT had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column CARD_EXCH_RTE had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column 
SETTLEMENT_EXCH_RTE had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column COMMISSION_AMT had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column ERR_INFO_CDE had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column REVERSAL_RSN_CDE had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column CARD_MBR_NO had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column SURCHARGE_DR_AMT had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column SURCHARGE_ORIG_AMT had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column ACQUIRER_SURCHARGE_AMT had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column ACQUIRER_INSTIT_ID had to be cast to a less precise type in Hive 16/07/26 10:31:23 WARN hive.TableDefWriter: Column LOAD_DTE had to be cast to a less precise type in Hive 16/07/26 10:31:23 INFO hive.HiveImport: Loading uploaded data into Hive Logging initialized using configuration in jar:file:/usr/iop/4.1.0.0/hive/lib/hive-common-1.2.1-IBM-12.jar!/hive-log4j.properties OK Time taken: 1.034 seconds OK Time taken: 0.15 seconds Loading data to table insights.transaction_detail_atm_archive chgrp: changing ownership of 'hdfs://rhdtmomg1.mid.aib.pri:8020/apps/hive/warehouse/insights.db/transaction_detail_atm_archive/part-m-00000_copy_5': User does not belong to hadoop Table insights.transaction_detail_atm_archive stats: [numFiles=6, numRows=0, totalSize=15088788330, rawDataSize=0] OK Time taken: 0.587 seconds
Greg Peckory (111 rep) · Jul 26, 2016, 10:19 AM · Last activity: Jul 26, 2016, 10:52 PM