
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
0 answers
13 views
Greenplum locked by more than 17 open connections (open transactions) at the same time
I have 50 tables in a Greenplum database and need to load data into them with the COPY command. I opened the connections sequentially, and the 21st connection locked up the whole database. Before that I issued the COPY command through libpq for one table per thread (no commit, so 20 transactions stay open):

for (int i = 1; i <= 50; i++) {
    PGLoader& loader = loaders[i];
    loader.connectToDB("myhost", 5432, "myusr", "mypwd", "mydb");
    loader.beginTransaction();
    const std::string cmd = std::string("copy big_src.simple_tbl_") + std::to_string(i)
        + " from '/home/adb/document" + std::to_string(i) + ".csv' delimiter ',' csv header;";
    //loader.performCommand(cmd);
}
for (int i = 1; i <= 50; i++) {
    PGLoader& loader = loaders[i];
    loader.commitTransaction();
}

The COPY command itself does not matter here (it is commented out above). Do you know whether there is some restriction on working with 20 or more open transactions (connections) at the same time?
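Not part of the question, but a diagnostic sketch that may help: while the 21st session is stuck, run something like the following from a separate connection on the coordinator to see which session holds the lock being waited on. It assumes Greenplum 6 / PostgreSQL 9.4-style catalogs (pg_stat_activity.pid and query); older releases expose procpid and current_query instead.

```
-- Diagnostic sketch: pair each waiting lock with a granted lock on the same
-- object and show who holds it.
SELECT w.pid                  AS waiting_pid,
       w.locktype,
       w.relation::regclass   AS locked_relation,
       h.pid                  AS holding_pid,
       a.query                AS holding_query   -- current_query on older releases
FROM pg_locks w
JOIN pg_locks h
  ON  h.locktype = w.locktype
  AND h.relation      IS NOT DISTINCT FROM w.relation
  AND h.transactionid IS NOT DISTINCT FROM w.transactionid
  AND h.granted
  AND h.pid <> w.pid
JOIN pg_stat_activity a ON a.pid = h.pid          -- a.procpid on older releases
WHERE NOT w.granted;
```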
Эльфия Валиева (13 rep)
Jul 21, 2025, 08:45 AM • Last activity: Jul 21, 2025, 10:20 AM
0 votes
2 answers
208 views
Using IMPORT FOREIGN SCHEMA with Greenplum database
I have set up a PostgreSQL 12 database and am trying to connect to a Greenplum database in order to create proxy tables. I am able to connect to the Greenplum db, but I get an error when I try to use the IMPORT FOREIGN SCHEMA command.

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER "remote_server" INTO schema_test_1;

returns:

ERROR: Greenplum Database does not support REPEATABLE READ transactions. (variable.c:570)
CONTEXT: remote SQL command: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
SQL state: XX000

I read that REPEATABLE READ is not supported in Greenplum and that SERIALIZABLE should be used instead. Is there a way to adjust the IMPORT FOREIGN SCHEMA command so that REPEATABLE READ is replaced with SERIALIZABLE? I am using pgAdmin 4.

Update: I found that I can get commands to work if I write them as complete transactions and include the following before any other commands:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Is there a way to set this as the default for all transactions going through the foreign server?
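One hedged workaround, based on postgres_fdw's documented behavior of starting the remote transaction as SERIALIZABLE when the local transaction is serializable (and REPEATABLE READ otherwise). The database name below is a placeholder.

```
-- Run the import inside a locally SERIALIZABLE transaction so postgres_fdw
-- opens the remote transaction as SERIALIZABLE instead of REPEATABLE READ.
BEGIN ISOLATION LEVEL SERIALIZABLE;
IMPORT FOREIGN SCHEMA remote_schema FROM SERVER "remote_server" INTO schema_test_1;
COMMIT;

-- Or make serializable the default for the local database so every implicit
-- transaction against the foreign tables qualifies ('mydb' is a placeholder):
ALTER DATABASE mydb SET default_transaction_isolation = 'serializable';
```

Setting default_transaction_isolation affects every transaction in that database, so the per-transaction BEGIN ... SERIALIZABLE form is the less invasive option.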
Jason (1 rep)
Nov 11, 2019, 03:29 PM • Last activity: Jun 25, 2025, 04:04 PM
0 votes
1 answers
303 views
What is the fastest way to copy 800 GB of data from Greenplum to MS SQL Server?
Our company has a Greenplum database with a total volume of almost 800 GB. What is the fastest way to transfer all of the data from Greenplum into an empty database on SQL Server?
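One commonly used route, sketched with placeholder object names and hosts: unload from Greenplum in parallel through a writable external table served by gpfdist, then bulk-load the flat files on the SQL Server side with bcp or BULK INSERT.

```
-- Greenplum side: unload a table in parallel through gpfdist to CSV
-- (table, host and path are placeholders, not from the question).
CREATE WRITABLE EXTERNAL TABLE ext_unload_orders (LIKE public.orders)
    LOCATION ('gpfdist://etl-host:8081/orders.csv')
    FORMAT 'CSV' (DELIMITER ',');

INSERT INTO ext_unload_orders SELECT * FROM public.orders;

-- SQL Server side: bulk-load the generated file, e.g.
--   bcp TargetDb.dbo.orders in orders.csv -c -t, -S sqlserver-host -T
-- or:
--   BULK INSERT dbo.orders FROM 'C:\load\orders.csv' WITH (FIELDTERMINATOR = ',');
```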
Gorkov Aleksey (13 rep)
Feb 22, 2022, 08:25 PM • Last activity: May 12, 2025, 08:03 AM
1 votes
1 answers
41 views
Greenplum/Postgres - Why not backward index scan?
This query on Greenplum 6 (Postgres 9.4) runs too slowly because it performs a sequential scan. I am wondering why it does not perform a backward index scan to fetch the records, since there is an index on the same column.

explain analyse
select *
from tab1
where create_time > now() - interval '1d'
order by create_time desc
limit 20;

https://explain.depesz.com/s/UTA7#html

The index is a composite index on (create_time, col2); could that be the reason for the sequential scan? I have already updated the table's statistics, and bloat is only ≈5%.
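A diagnostic sketch that is sometimes useful here (session-level settings only, not from the question): penalize the sequential scan to see whether the planner can use the index at all, and compare the ORCA plan against the legacy Postgres planner.

```
-- See whether any index plan exists when seqscan is made unattractive.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM tab1
WHERE create_time > now() - interval '1d'
ORDER BY create_time DESC
LIMIT 20;
RESET enable_seqscan;

-- If GPORCA is planning the query, compare against the legacy planner:
SET optimizer = off;   -- Greenplum GUC selecting the Postgres planner
-- re-run the EXPLAIN ANALYZE above and compare the two plans
RESET optimizer;
```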
goodfella (595 rep)
Apr 3, 2025, 09:03 AM • Last activity: Apr 4, 2025, 04:42 AM
2 votes
0 answers
23 views
Greenplum/PostgreSQL: Queries run individually, but won't run in a plpgsql function
We are running Greenplum 6.28.1, which is built on PostgreSQL 9.4.26. We have a master node and 6 server nodes, all running Oracle Linux release 8.10. We have 5 queries that progressively build temp tables that are used to generate a final result set. Those queries, in the order they are run, are:
drop table if exists hourly_time_windows;
create temp table hourly_time_windows as (
	WITH
        time_windows as (
            select 
                (generate_series(0, 23, 1)::text || ' Hours'::text)::interval as window_start_interval,
                (generate_series(1, 24, 1)::text || ' Hours'::text)::interval as window_end_interval
            ),
        dates_and_users as (
			select distinct
					activity_date,
					career_account
			from network_logs_processed.hourly_internet_activity_building log
			where activity_date >= '2024-02-01'::date and activity_date  0
			THEN 1
			ELSE 0 
			END) as device_moved_buildings,
		sum(megabytes_transferred) as megabytes_transferred
	from 
		ip.hourly_activity_with_building_lag data
	left outer join 
		(  
            select schedule.*, banner.career_account from utility_features.student_class_schedule_detail schedule
            left outer join banner_oltp.banner_lookup banner 
                on schedule.person_uid = banner.person_uid::numeric
		) class
		on 
			data.activity_date = class.calendar_date
			and data.career_account = class.career_account
			and data.building = class.building 
			and (
					(data.session_start_in_hour between class.begin_time and class.end_time)
					OR
					(data.session_end_in_hour between class.begin_time and class.end_time)
					OR 
					(data.session_end_in_hour > class.end_time and data.session_start_in_hour  0 and
	            campus in ('PWL','CEC') and
	            schedule_type not in ('IND','RES','EX') and
	            substring(course_identification from '[a-zA-Z]+#"[0-9]#"%' for '#')::int <= 4 and   --undergrad courses only
	            sub_academic_period in ('1','F8','FHS','M1','M2','M3','M12','M23','S8','SHS') and
	            registration_status like 'R%'
            ),
        housed_students as (
    		select distinct academic_period, person_uid
		    from utility_features.resident_student_room_details
            ),
        full_student_list as (
    		select academic_period, person_uid::numeric from registered_students
    		UNION
    		select academic_period, person_uid::numeric from housed_students
            )
    select 
    	full_student_list.academic_period, 
    	full_student_list.person_uid, 
    	banner.career_account 
    from 
    	full_student_list
    left outer join banner_oltp.banner_lookup banner
    	on full_student_list.person_uid = banner.person_uid::numeric
) distributed by (career_account);
drop table if exists aggregated_hourly_data;
create temp table aggregated_hourly_data as (
	select
		hourly_time_windows.career_account,
		banner.puid,
		banner.person_uid,
		ac.academic_period,
		hourly_time_windows.activity_date,
		hourly_time_windows.window_start,
		hourly_time_windows.window_end,
		sum(not_in_residential) as not_in_residential,
		sum(in_class) as in_class,
		sum(device_moved_buildings) as device_moved_buildings,
		sum(megabytes_transferred) as megabytes_transferred
	from 
		hourly_time_windows 
	left outer join hourly_activity_with_movement_flag
	on 	
		hourly_time_windows.career_account = hourly_activity_with_movement_flag.career_account
		and hourly_time_windows.window_start = hourly_activity_with_movement_flag.window_start
	left outer join banner_oltp.banner_lookup banner 
	on 
		hourly_time_windows.career_account = banner.career_account
	left outer join utility.academic_calendar ac 
	on 
		hourly_time_windows.window_start between ac.last_term_end_date + INTERVAL '1 day' and ac.end_date
	inner join included_students
		on hourly_time_windows.career_account = included_students.career_account
			and ac.academic_period = included_students.academic_period
	group by 1,2,3,4,5,6,7
) distributed by (career_account);
Here is my problem: If I run each query directly and count the rows in the temp table after it is created, all five queries complete in a minute or less. (That's 60 seconds to run all 5, not 60 seconds for each of them.)

I created a plpgsql function to run those five queries. The first four completed in about the same time it takes when I run them directly, and the final row counts for all four tables were exactly the same. But I've let the function run for 30+ minutes and the fifth query still never completes.

I also tried creating a plpgsql function to run just the first four queries and then running the fifth query directly. Again, the function completes very quickly, and the temp tables it creates have the same row counts as when I run the queries directly, but the fifth query still does not complete.

I know PostgreSQL optimizes things differently when run in a function rather than individually, but I really thought that running just the first four in a function and the fifth directly would give different results. I am kind of at my wit's end here. Has anyone run into anything like this before?
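A pattern that often explains this kind of behavior (not from the question; the function and table names below are toy placeholders): inside a plpgsql function the planner may have no statistics for temp tables created moments earlier in the same function, so the plan for the final join can be far worse than the interactive one. Running ANALYZE on each temp table right after it is filled, inside the function, usually brings the two cases back in line. A minimal, self-contained sketch of the pattern:

```
CREATE OR REPLACE FUNCTION temp_table_pattern_demo() RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    DROP TABLE IF EXISTS demo_tmp;
    CREATE TEMP TABLE demo_tmp AS
        SELECT g AS id, g % 100 AS grp FROM generate_series(1, 100000) g;

    -- Without this, the statement below is planned with default estimates for
    -- a table the planner has never seen, which can produce a very poor
    -- join/aggregate plan inside a function.
    ANALYZE demo_tmp;

    SELECT count(*) INTO n FROM demo_tmp WHERE grp = 42;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```

If that is not enough, running the fifth statement through EXECUTE (dynamic SQL) forces it to be planned after the ANALYZE calls rather than reusing a cached plan.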
lpscott (33 rep)
Mar 25, 2025, 02:59 PM
1 votes
2 answers
1222 views
Resolve parametrized schema in PostgreSQL
I'm looking for a way to use a parametrized schema in the DECLARE section of a PostgreSQL function. **Why am I looking for this?**

1) The functions will reference %ROWTYPE from multiple schemas, for example:

Var1 s1.emp%ROWTYPE;        ===> the columns of the emp table in schema s1, as a variable
var2 s2.stg_emp_d%ROWTYPE;  ===> the columns of the stg_emp_d table in schema s2, as a variable
var3 s3.emp_comp%ROWTYPE;   ===> the columns of the emp_comp table in schema s3, as a variable

2) The schema names s1, s2, s3 are not the same across environments, e.g.:
`
IN DEV ENV s1, s2, s3
IN TEST ENV s1_t, s2_t, s3_t
IN PROD ENV s1_p, s2_p, s3_p
`
3) Because the schema names are accepted as a function parameter, the same function can be deployed unchanged across environments; it picks up the appropriate schema name at run time. And because we use %ROWTYPE of the respective tables, changes to the table structure do not impact these functions; the changes are inherited via %ROWTYPE.

4) These functions need to access individual table column names via %ROWTYPE within the BEGIN section, like:
`
if var1.emp_type = 'C' then ... do some thing ;
elsif  var1.emp_type = 'T' then ... do some thing ;

end if ;
`
Here is an example:
CREATE OR REPLACE FUNCTION get_list(in_code text[], p_schema text)
  RETURNS text  AS
$func$

DECLARE

  var1 user.emp%ROWTYPE;

BEGIN

SELECT q.id, q.title, q.code
FROM   questions q, emp e
WHERE  q.code  ALL ($1)
and    q.emp_id = e.emp_id;

END ;  

$func$ LANGUAGE sql;
The above function gets created. When I change `var1 user.emp%ROWTYPE;` to either

* var1 p_schema.emp%ROWTYPE; or
* var1 $$ || p_schema || $$.emp%ROWTYPE;

the function is not created and instead throws an error:
ERROR : relation emp not found
**Are there any limitations on using parameterized items within the DECLARE section?** I have used these kinds of parameters in queries within the BEGIN and END section, and they did not throw any errors.
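%ROWTYPE declarations are resolved when the function body is compiled, not at run time, so they cannot take a parameter. A hedged sketch of one common workaround: leave the declarations unqualified and pin the schemas per environment through search_path. This assumes the table names are unique across the schemas on the path, and the signature below drops the p_schema parameter (both assumptions, not from the question).

```
CREATE OR REPLACE FUNCTION get_list(in_code text[])
  RETURNS text AS
$func$
DECLARE
  var1 emp%ROWTYPE;   -- resolved through search_path when the function is compiled
BEGIN
  -- ... body unchanged ...
  RETURN NULL;
END;
$func$ LANGUAGE plpgsql;

-- Pin the schema list per environment instead of passing it as a parameter:
-- DEV:  ALTER FUNCTION get_list(text[]) SET search_path = s1, s2, s3;
-- TEST: ALTER FUNCTION get_list(text[]) SET search_path = s1_t, s2_t, s3_t;
-- PROD: ALTER FUNCTION get_list(text[]) SET search_path = s1_p, s2_p, s3_p;
```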
user2647763 - RIMD (11 rep)
Sep 12, 2020, 02:09 AM • Last activity: Dec 19, 2024, 04:40 AM
1 votes
1 answers
101 views
What would a B-tree index on a columnar table look like?
Greenplum supports B-tree indexes on append-optimized columnar tables, which also allows UPDATE operations. Even though indexing such tables is not recommended practice (probably because they are intended to be append-only and to do fast sequential scans), for UPDATE operations an index on the distribution column reduces execution time drastically. A B-tree index on a row-store table traditionally holds a pointer into the heap (block plus offset), so how is this implemented on a columnar table? If the table has N columns, does each index entry contain N-1 pointers, one to each column's blocks?
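For what it's worth, Greenplum's own catalogs suggest the index does not store per-column pointers: an index entry points at a single logical row identifier, and a per-table block directory translates that into the right varblock in each column's segment file. An exploratory sketch, with catalog and column names assumed from Greenplum 6 (verify against your version) and a placeholder table name:

```
-- Inspect the auxiliary relations Greenplum keeps for an append-optimized table.
SELECT c.relname,
       a.blkdirrelid::regclass  AS block_directory,   -- maps row numbers to varblocks
       a.visimaprelid::regclass AS visibility_map     -- tracks rows deleted/updated
FROM pg_appendonly a
JOIN pg_class c ON c.oid = a.relid
WHERE c.relname = 'my_ao_table';   -- placeholder table name
```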
goodfella (595 rep)
Jul 8, 2024, 09:12 AM • Last activity: Jul 9, 2024, 01:15 PM
1 votes
1 answers
38 views
Merging wifi session data if time between them is less than 15 minutes
I am trying to process network logs and join sessions together if the time between them is less than 15 minutes. The relevant fields are start time, end time, mac address, and wifi access point. I am working in Greenplum 6.22/Postgresql 9.4.26:
pdap=# SELECT version();
| version |
| :------ |
| PostgreSQL 9.4.26 (Greenplum Database 6.22.2) |

[db fiddle](https://dbfiddle.uk/vOP-wLu8)

Logically, what I want to do is "If the start time from the next row is less than 15 minutes after the end time from this row, merge the two rows into one row with the earlier start time and the later end time." Here is an example table with some data:
CREATE TABLE network_test
( start_ts TIMESTAMPTZ,
  end_ts TIMESTAMPTZ,
  mac_addr MACADDR,
  access_point VARCHAR
);
INSERT INTO network_test
VALUES
('2023-08-14 13:21:10.289'::timestamptz, '2023-08-14 13:31:20.855'::timestamptz, '00:00:00:00:00:01'::macaddr, 'access_point_01'),
('2023-08-14 13:58:10.638'::timestamptz, '2023-08-14 13:58:22.668'::timestamptz, '00:00:00:00:00:01'::macaddr, 'access_point_01'),
('2023-08-14 13:58:22.727'::timestamptz, '2023-08-14 13:58:38.966'::timestamptz, '00:00:00:00:00:01'::macaddr, 'access_point_01'),
('2023-08-14 13:28:28.190'::timestamptz, '2023-08-14 13:28:28.190'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_02'),
('2023-08-14 13:28:44.167'::timestamptz, '2023-08-14 13:28:44.288'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_02'),
('2023-08-14 13:45:40.281'::timestamptz, '2023-08-14 13:46:02.726'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:02.964'::timestamptz, '2023-08-14 13:46:10.783'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:11.026'::timestamptz, '2023-08-14 13:46:18.803'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:19.037'::timestamptz, '2023-08-14 13:46:26.798'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:27.036'::timestamptz, '2023-08-14 13:46:34.815'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:35.057'::timestamptz, '2023-08-14 13:46:46.980'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:47.213'::timestamptz, '2023-08-14 13:46:54.946'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:55.189'::timestamptz, '2023-08-14 13:47:17.040'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:47:17.297'::timestamptz, '2023-08-14 13:47:25.106'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:55:25.381'::timestamptz, '2023-08-14 13:58:33.059'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03');
SELECT *
FROM network_test
ORDER BY mac_addr, access_point, start_ts
| start_ts | end_ts | mac_addr | access_point |
| :--------|:-------|:---------|:-------------|
| 2023-08-14 13:21:10.289+00 | 2023-08-14 13:31:20.855+00 | 00:00:00:00:00:01 | access_point_01 |
| 2023-08-14 13:58:10.638+00 | 2023-08-14 13:58:22.668+00 | 00:00:00:00:00:01 | access_point_01 |
| 2023-08-14 13:58:22.727+00 | 2023-08-14 13:58:38.966+00 | 00:00:00:00:00:01 | access_point_01 |
| 2023-08-14 13:28:28.19+00 | 2023-08-14 13:28:28.19+00 | 00:00:00:00:00:02 | access_point_02 |
| 2023-08-14 13:28:44.167+00 | 2023-08-14 13:28:44.288+00 | 00:00:00:00:00:02 | access_point_02 |
| 2023-08-14 13:45:40.281+00 | 2023-08-14 13:46:02.726+00 | 00:00:00:00:00:02 | access_point_03 |
| 2023-08-14 13:46:02.964+00 | 2023-08-14 13:46:10.783+00 | 00:00:00:00:00:02 | access_point_03 |
| 2023-08-14 13:46:11.026+00 | 2023-08-14 13:46:18.803+00 | 00:00:00:00:00:02 | access_point_03 |
| 2023-08-14 13:46:19.037+00 | 2023-08-14 13:46:26.798+00 | 00:00:00:00:00:02 | access_point_03 |
| 2023-08-14 13:46:27.036+00 | 2023-08-14 13:46:34.815+00 | 00:00:00:00:00:02 | access_point_03 |
| 2023-08-14 13:46:35.057+00 | 2023-08-14 13:46:46.98+00 | 00:00:00:00:00:02 | access_point_03 |
| 2023-08-14 13:46:47.213+00 | 2023-08-14 13:46:54.946+00 | 00:00:00:00:00:02 | access_point_03 |
| 2023-08-14 13:46:55.189+00 | 2023-08-14 13:47:17.04+00 | 00:00:00:00:00:02 | access_point_03 |
| 2023-08-14 13:47:17.297+00 | 2023-08-14 13:47:25.106+00 | 00:00:00:00:00:02 | access_point_03 |
| 2023-08-14 13:55:25.381+00 | 2023-08-14 13:58:33.059+00 | 00:00:00:00:00:02 | access_point_03 |

Here is what I would like the result to be:

| start_ts | end_ts | mac_addr | access_point |
| :--------|:-------|:---------|:-------------|
| 2023-08-14 13:21:10.289+00 | 2023-08-14 13:31:20.855+00 | 00:00:00:00:00:01 | access_point_01 |
| 2023-08-14 13:58:10.638+00 | 2023-08-14 13:58:38.966+00 | 00:00:00:00:00:01 | access_point_01 |
| 2023-08-14 13:28:28.19+00 | 2023-08-14 13:28:44.288+00 | 00:00:00:00:00:02 | access_point_02 |
| 2023-08-14 13:45:40.281+00 | 2023-08-14 13:58:33.059+00 | 00:00:00:00:00:02 | access_point_03 |

The first session stays as it is. The 2nd and 3rd sessions are merged into one because they have the same mac address and access point, and there is less than 15 minutes between them. The same happens for the 4th and 5th sessions, as well as the 6th through the 15th. I can come close using window functions:
SELECT DISTINCT
       MIN(start_ts) OVER (PARTITION BY mac_addr, access_point, ROUND(EXTRACT(EPOCH FROM start_ts)/900)) AS start_ts,
       MAX(end_ts) OVER (PARTITION BY mac_addr, access_point, ROUND(EXTRACT(EPOCH FROM end_ts)/900)) AS end_ts,
       mac_addr,
       access_point
FROM network_test
ORDER BY mac_addr, access_point, start_ts
| start_ts | end_ts | mac_addr | access_point |
| :--------|:-------|:---------|:-------------|
| 2023-08-14 13:21:10.289+00 | 2023-08-14 13:31:20.855+00 | 00:00:00:00:00:01 | access_point_01 |
| 2023-08-14 13:58:10.638+00 | 2023-08-14 13:58:38.966+00 | 00:00:00:00:00:01 | access_point_01 |
| 2023-08-14 13:28:28.19+00 | 2023-08-14 13:28:44.288+00 | 00:00:00:00:00:02 | access_point_02 |
| 2023-08-14 13:45:40.281+00 | 2023-08-14 13:47:25.106+00 | 00:00:00:00:00:02 | access_point_03 |
| 2023-08-14 13:55:25.381+00 | 2023-08-14 13:58:33.059+00 | 00:00:00:00:00:02 | access_point_03 |

But note that the last two data points end up in separate 15-minute buckets even though they're only 8 minutes apart. Does anyone know if there is a way to do this in SQL, or am I going to have to write a PL/pgSQL function to go through the data row by row and do the comparison?
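A gaps-and-islands sketch against the network_test table above; it uses only standard window functions, so it should work on Greenplum 6 / PostgreSQL 9.4.

```
-- Flag a new island whenever the gap to the previous session (same mac_addr +
-- access_point) is 15 minutes or more, number the islands with a running sum,
-- then collapse each island to its min(start_ts) / max(end_ts).
WITH flagged AS (
    SELECT *,
           CASE
               WHEN start_ts - LAG(end_ts) OVER (PARTITION BY mac_addr, access_point
                                                 ORDER BY start_ts) < interval '15 minutes'
               THEN 0 ELSE 1
           END AS new_session
    FROM network_test
), numbered AS (
    SELECT *,
           SUM(new_session) OVER (PARTITION BY mac_addr, access_point
                                  ORDER BY start_ts) AS session_id
    FROM flagged
)
SELECT MIN(start_ts) AS start_ts,
       MAX(end_ts)   AS end_ts,
       mac_addr,
       access_point
FROM numbered
GROUP BY mac_addr, access_point, session_id
ORDER BY mac_addr, access_point, start_ts;
```

Because the island number is a running sum rather than a fixed-width bucket, the 8-minute gap at the end lands in the same island, which matches the desired output above.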
lpscott (33 rep)
Sep 8, 2023, 02:15 PM • Last activity: Sep 8, 2023, 03:28 PM
0 votes
0 answers
43 views
Incorrect results when using group by date_part
We have a table with a date field and ids. A simple query like this returns the correct result:

SELECT count(DISTINCT id_field) AS distinct_count
FROM schema1.table1
WHERE date_part('month', date_field) = 11;

 distinct_count
----------------------
        5645202

However, when trying to count distinct ids grouped by month we get incorrect results:

SELECT date_part('month', date_field), count(DISTINCT id_field) AS distinct_count
FROM schema1.table1
GROUP BY 1
ORDER BY 1 desc;

 date_part | distinct_count
-----------+----------------------
        12 | 5637167
        11 | 5645426
        10 | 5705702
         9 | 5633101
         8 | 5619553
         7 | 5636407
         6 | 5598244
         5 | 5658568
         4 | 5591066
         3 | 5595882
         2 | 5646399
         1 | 5584825
(12 rows)

Month 11 has 5,645,426 IDs counted (incorrect) vs 5,645,202 (correct). Could this be an issue with the DB engine itself?
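Not from the question, but a cross-check that can help isolate whether the grouped count(DISTINCT ...) plan is at fault rather than the data:

```
-- Compute month 11 both ways in one statement; if the two counts disagree,
-- the distributed DISTINCT aggregation itself is suspect.
SELECT
    (SELECT count(DISTINCT id_field)
     FROM schema1.table1
     WHERE date_part('month', date_field) = 11)              AS direct_count,
    (SELECT count(*)
     FROM (SELECT DISTINCT id_field
           FROM schema1.table1
           WHERE date_part('month', date_field) = 11) d)     AS two_step_count;
```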
Alex Polkhovsky (101 rep)
Oct 28, 2022, 03:13 PM
1 votes
1 answers
563 views
Greenplum: add a column to a table and populate it with values in incremental order
I have a table st.student like the following, which has one column.

| STUDENT_ID |
|:-----------|
| 100001 |
| 100002 |
| 100003 |
| 100004 |
| 100005 |
| 100006 |

I need to add a column to this table whose values are incremental, like the following. Note that NEW_STUDENT_ID can start from any value; once it starts, it is continuous. What's the query for this?

| STUDENT_ID | NEW_STUDENT_ID |
|:-----------|:---------------|
| 100001 | 349009 |
| 100002 | 349010 |
| 100003 | 349011 |
| 100004 | 349012 |
| 100005 | 349013 |
| 100006 | 349014 |
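A sketch under the assumption that the ordering should follow STUDENT_ID and that the example's starting value of 349009 is the one you want:

```
-- Add the column, then number the rows and offset the result so the first
-- row gets 349009.
ALTER TABLE st.student ADD COLUMN new_student_id bigint;

UPDATE st.student s
SET new_student_id = n.rn + 349008          -- rn = 1 maps to 349009
FROM (
    SELECT student_id,
           ROW_NUMBER() OVER (ORDER BY student_id) AS rn
    FROM st.student
) n
WHERE s.student_id = n.student_id;
```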
Shankar (13 rep)
May 9, 2022, 01:34 PM • Last activity: May 9, 2022, 02:08 PM
1 votes
2 answers
1047 views
Replicating PostgreSQL data into Citus/Greenplum?
I need to integrate data from 3 different PostgreSQL databases (OLTP application backends) in a data warehouse. For the data warehouse itself I am considering Citus or Greenplum. There is a requirement that the data from the applications be synced with the data warehouse as close to real time as possible (anything above a 3-5 minute delay is unacceptable; real-time replication would be best). In this regard I have the following questions:

1. Will Postgres logical replication work with Citus? Citus is a Postgres extension, so can you treat a Citus cluster as an ordinary Postgres database? If yes, then logical replication should theoretically work, but how does it deal with distributed tables?
2. Greenplum is a Postgres fork, so will Postgres logical replication work with it at all? I have also read that Greenplum is not optimized for OLTP workloads; does that mean it will break when I try to ingest OLTP data into it?
3. If logical replication does not work with Citus/Greenplum, then how do I stream data from Postgres? Do I need to stream logical-level WAL into Kafka and then write custom logic for translating it into SQL statements on the target database? Are there any tools for that?

Bonus question: does anyone have experience with both Citus and Greenplum, especially with their SQL limitations? I know that Citus does not fully support correlated subqueries and recursive CTEs; does Greenplum have any similar limitations? I would appreciate any help with these questions. I tried googling, but there is little or no information on the subject; could you please give at least some direction?
Denis Arharov (101 rep)
Feb 4, 2021, 12:44 PM • Last activity: Jan 13, 2022, 10:15 AM
2 votes
1 answers
2594 views
Transpose rows to columns in PostgreSQL
The following query gives me a list of 1222 distinct columns:
select distinct column_name from information_schema.columns where table_name like 'fea_var%';
I want to create one base table which has all the 1222 rows from this query as columns. fea_var% tables are just empty tables with columns. So, the output should be an empty table with those 1222 columns.
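A sketch of one way to do it, assuming Greenplum 6 / PostgreSQL 9.4 or later (which have string_agg and DO blocks); the target table name base_features and the choice of text for every column are assumptions, not from the question.

```
-- Build the CREATE TABLE statement from the catalog and execute it.
-- Duplicate column names across the fea_var% tables are collapsed by DISTINCT.
DO $$
DECLARE
    ddl text;
BEGIN
    SELECT 'CREATE TABLE base_features ('
           || string_agg(DISTINCT quote_ident(column_name) || ' text', ', ')
           || ')'
    INTO ddl
    FROM information_schema.columns
    WHERE table_name LIKE 'fea_var%';

    EXECUTE ddl;
END $$;
```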
Reshmi Nair (31 rep)
Dec 15, 2021, 07:31 PM • Last activity: Dec 16, 2021, 11:35 AM
0 votes
0 answers
288 views
MS SQL Linked Server Insert Error
As part of a data migration, I have created a linked server for a Greenplum database in MS SQL 2019. I am able to query through the linked server, and SELECT and INSERT commands work fine for most of the tables. But when querying one column whose data looks like "1111-1111-1111.zip;zzzz-zzzz-zzzz-1111.zip;", with data type character varying (varchar) and max length less than 8000 characters in GPDB, I get the error below.

> OLE DB provider "MSDASQL" for linked server "LinkedServer" returned message "Requested conversion is not supported". Cannot get the current row value of column "[MSDASQL].column" from OLE DB provider "MSDASQL" for linked server.

Greenplum uses UTF8 encoding and MS SQL uses the Latin1_General_CI_AS collation. A manual INSERT in MS SQL with the same data works fine in SSMS. I tried the INSERT statement using CAST and it still fails. A SELECT query with COLLATE (Latin1_General_CI_AS) also fails with the same error.
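A hedged workaround sketch: push the conversion to the Greenplum side with OPENQUERY so the ODBC driver hands MSDASQL a plain bounded varchar. Table and column names are placeholders, not from the question.

```
-- T-SQL: the inner statement runs on Greenplum, so the CAST happens there.
INSERT INTO dbo.target_table (file_list)
SELECT file_list
FROM OPENQUERY(LinkedServer,
    'SELECT CAST(file_list AS varchar(4000)) AS file_list FROM my_schema.my_table');
```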
Rajes (1 rep)
Nov 16, 2021, 07:18 AM • Last activity: Nov 16, 2021, 11:16 AM
0 votes
1 answers
4837 views
Greenplum ERROR: Canceling query because of high VMEM usage
We have a Greenplum cluster that we set up recently, and we get this error on a single query run:

current group id is 140611, group memory usage 40720 MB, group shared memory quota is 31320 MB, slot memory quota is 0 MB, global freechunks memory is 1044 MB, global safe memory threshold is 1048 MB (runaway_cleaner.c:197)
SQL state: XX0

I'll be happy to post the settings we currently have in place that may help with troubleshooting. Here are some basic ones I can share:

- The cluster has 128 GB RAM on each node host
- SWAP on each node is 32 GB

 groupid  |   groupname   | concurrency | cpu_rate_limit | memory_limit | memory_shared_quota | memory_spill_ratio | memory_auditor | cpuset
----------+---------------+-------------+----------------+--------------+---------------------+--------------------+----------------+--------
 24964400 | my_user_group |          10 |             60 |           60 |                  80 |                  0 | vmtracker      | -1
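A hedged sketch of the usual knobs for this error under resource groups in Greenplum 6; the group name comes from the settings above, and the new values are only examples to adjust against your workload.

```
-- Inspect the current limits per resource group.
SELECT * FROM gp_toolkit.gp_resgroup_config;

-- Give operators a spill ratio so large hash/sort steps spill to disk instead
-- of growing until the VMEM protector cancels the query, and/or lower the
-- concurrency so each slot gets a larger memory share.
ALTER RESOURCE GROUP my_user_group SET MEMORY_SPILL_RATIO 20;
ALTER RESOURCE GROUP my_user_group SET CONCURRENCY 5;
```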
Matias (141 rep)
Jul 27, 2020, 10:33 AM • Last activity: Oct 21, 2021, 03:13 AM
0 votes
1 answers
1523 views
Get partition column for a table in Greenplum
How can I find what the partitioning scheme is for a table? Is it in information_schema or in the pg_class related tables?
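A sketch using Greenplum's dedicated partition catalog views (present in 4.x through 6.x) rather than information_schema; the table name is a placeholder.

```
-- Which column(s) the table is partitioned by, per partition level:
SELECT partitionlevel, columnname, position_in_partition_key
FROM pg_partition_columns
WHERE tablename = 'my_table';

-- The individual partitions and their boundaries:
SELECT partitiontablename, partitiontype, partitionrangestart, partitionrangeend
FROM pg_partitions
WHERE tablename = 'my_table';
```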
PhilHibbs (539 rep)
Jan 30, 2017, 03:24 PM • Last activity: Oct 21, 2020, 11:06 AM
0 votes
1 answers
118 views
How to delete all rows in a Greenplum row-oriented table
In Greenplum, what is the best practice to delete all rows from a table: DELETE FROM or TRUNCATE TABLE? I don't have any child tables or foreign key constraints.
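For what it's worth, a minimal comparison sketch (placeholder table name): with no foreign keys or child tables, TRUNCATE is the usual choice because it releases the space immediately, whereas DELETE leaves dead rows behind until a VACUUM.

```
TRUNCATE TABLE my_schema.my_table;

-- The DELETE route needs a follow-up to reclaim the space:
DELETE FROM my_schema.my_table;
VACUUM my_schema.my_table;
```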
Siva (1 rep)
Dec 13, 2019, 05:30 PM • Last activity: Dec 13, 2019, 06:12 PM
0 votes
1 answers
103 views
Fetching rows from a table with millions of rows (optimization)
I have millions of rows in a table in a Greenplum database, out of which I have to fetch around 45k rows and store them in a Python list. It's taking more than 2 hours to fetch the data. How can I optimize the time taken to fetch the data?

resultList = []
for item in list:
    result = SELECT column_1, ... column_n from TABLE WHERE column = item
    resultList.append(result)
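A sketch of how to avoid the one-query-per-item round trips; table, column, and key values are placeholders.

```
-- Send the whole key list once so the 45k rows come back in a single result set.
SELECT column_1, column_2, column_n
FROM my_table
WHERE key_column = ANY (ARRAY['item1', 'item2', 'item3']);

-- For very long lists, loading the keys into a temp table and joining
-- usually distributes better in Greenplum:
CREATE TEMP TABLE wanted_keys (key_column text);
-- COPY wanted_keys FROM STDIN;  (or a multi-row INSERT from the client)
SELECT t.column_1, t.column_2, t.column_n
FROM my_table t
JOIN wanted_keys w ON w.key_column = t.key_column;
```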
Krishna Kumar S (1 rep)
Jun 12, 2019, 07:57 AM • Last activity: Jun 12, 2019, 11:30 AM
2 votes
1 answers
338 views
How to narrow down records in SQL based on conditions
| Customer | Rank | Joining_date | salary |
|:---------|:-----|:-------------|:-------|
| A | 2 | 2017-10-12 | 500 |
| A | 1 | 2017-10-10 | 800 |
| A | 1 | 2017-10-20 | 400 |
| B | 2 | 2017-05-20 | 200 |
| B | 2 | 2017-05-15 | 100 |
| c | 3 | 2017-06-10 | 600 |
| c | 4 | 2017-06-05 | 600 |

Logic:

- For a given customer, if the Rank is 1, retain all records with rank 1 and drop the rest.
- If the customer has records with different ranks, select the record with the latest rank (order by rank desc) and drop the others.
- If the customer has records with the same rank (other than 1), select the record with the lowest salary (order by salary asc).

Expected result:

| Customer | Rank | Joining_date | salary |
|:---------|:-----|:-------------|:-------|
| A | 1 | 2017-10-10 | 800 |
| A | 1 | 2017-10-20 | 400 |
| B | 2 | 2017-05-15 | 100 |
| c | 4 | 2017-06-05 | 600 |
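A hedged sketch of one way to express that logic with window functions; the table name customer_ranks is a placeholder, and the tie-breaking follows the rules as stated above.

```
SELECT customer, rank, joining_date, salary
FROM (
    SELECT t.*,
           MIN(rank)    OVER (PARTITION BY customer) AS min_rank,
           ROW_NUMBER() OVER (PARTITION BY customer
                              ORDER BY rank DESC, salary ASC) AS rn
    FROM customer_ranks t
) x
WHERE (min_rank = 1  AND rank = 1)   -- a rank-1 row exists: keep every rank-1 row
   OR (min_rank <> 1 AND rn = 1);    -- otherwise: one row, highest rank, lowest salary
```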
user8545255 (123 rep)
Aug 15, 2018, 11:35 PM • Last activity: Aug 17, 2018, 09:21 AM
0 votes
1 answers
291 views
Count days between dates and find largest gap
I need to find all members in our database from a calendar month that do not have a gap of 10+ days between sending data / or in the month total i.e. everyone in April 2018. The table is set up like so (although much larger): +-----------+------------+ | member_id | data_date | +-----------+------------+ | 1 | 2018-04-10 | | 5 | 2018-04-16 | | 1 | 2018-04-11 | | 2 | 2018-04-12 | | 3 | 2018-04-13 | | 4 | 2018-04-12 | | 5 | 2018-04-15 | | 3 | 2018-04-19 | | 2 | 2018-04-17 | | 1 | 2018-04-18 | | 5 | 2018-04-10 | | 2 | 2018-04-18 | | 1 | 2018-04-08 | | 2 | 2018-04-03 | | 3 | 2018-04-02 | | 4 | 2018-04-14 | | 5 | 2018-04-15 | | 3 | 2018-04-16 | | 2 | 2018-04-19 | | 1 | 2018-04-14 | +-----------+------------+ (member_id,data_date) is defined UNIQUE. Each data_date represents one day that data was sent. There are duplicate data_dates for each member_id. I am running PostgreSQL 8.2.15. It is Greenplum. There are up to 30 data_dates for each member_id in the month, I am having trouble figuring out how to find the largest gap without data being sent in the entire month for each member. Here is an example of some test data: create temp table tempdata ( member_id integer NOT NULL, data_date date ); INSERT INTO tempdata(member_id, data_date) VALUES (1, '2017-04-01') , (1, '2017-04-02') , (1, '2017-04-03') , (1, '2017-04-04') , (1, '2017-04-05') , (1, '2017-04-06') , (1, '2017-04-07') , (1, '2017-04-08') , (1, '2017-04-09') , (1, '2017-04-10') , (1, '2017-04-11') , (1, '2017-04-12') , (1, '2017-04-13') , (1, '2017-04-14') , (1, '2017-04-15') , (1, '2017-04-16') , (1, '2017-04-17') , (1, '2017-04-18') , (1, '2017-04-19') , (1, '2017-04-20') , (1, '2017-04-21') , (1, '2017-04-22') , (1, '2017-04-23') , (1, '2017-04-24') , (1, '2017-04-25') , (1, '2017-04-26') , (1, '2017-04-27') , (1, '2017-04-28') , (1, '2017-04-29') , (1, '2017-04-30') , (2, '2017-04-09') , (2, '2017-04-10') , (2, '2017-04-11') , (2, '2017-04-12') , (3, '2017-04-01') , (3, '2017-04-02') , (3, '2017-04-03') , (3, '2017-04-04') , (3, '2017-04-05') , (3, '2017-04-06') , (3, '2017-04-07') , (3, '2017-04-08') , (3, '2017-04-09') , (3, '2017-04-10') , (3, '2017-04-11') , (3, '2017-04-12') , (3, '2017-04-13') , (3, '2017-04-14') , (3, '2017-04-15') , (3, '2017-04-16') , (3, '2017-04-17') , (3, '2017-04-18') , (3, '2017-04-19') , (3, '2017-04-20') , (3, '2017-04-21') , (3, '2017-04-22') , (3, '2017-04-23') , (3, '2017-04-24') , (3, '2017-04-25') , (3, '2017-04-26') , (3, '2017-04-27') , (3, '2017-04-28') , (3, '2017-04-29') , (3, '2017-04-30') , (4, '2017-04-01') , (4, '2017-04-02') , (4, '2017-04-03') , (4, '2017-04-04') , (4, '2017-04-05') , (4, '2017-04-06') , (4, '2017-04-07') , (4, '2017-04-08') , (4, '2017-04-09') , (4, '2017-04-10') , (4, '2017-04-11') , (4, '2017-04-12') , (4, '2017-04-13') , (4, '2017-04-14') , (4, '2017-04-15') , (4, '2017-04-16') , (4, '2017-04-17') , (4, '2017-04-18') , (4, '2017-04-19') , (4, '2017-04-20') , (4, '2017-04-21') , (4, '2017-04-22') , (5, '2017-04-01') , (5, '2017-04-02') , (5, '2017-04-03') , (5, '2017-04-04') , (5, '2017-04-05') , (5, '2017-04-06') , (5, '2017-04-07') , (5, '2017-04-08') , (5, '2017-04-09') , (5, '2017-04-10') , (5, '2017-04-11') , (5, '2017-04-12') , (5, '2017-04-13') , (5, '2017-04-14') , (5, '2017-04-15') , (5, '2017-04-16') , (5, '2017-04-17') , (5, '2017-04-18') , (5, '2017-04-22') , (5, '2017-04-23') , (5, '2017-04-24') , (5, '2017-04-25') , (5, '2017-04-26') , (5, '2017-04-27') , (5, '2017-04-29') , (5, '2017-04-30') , (6, '2017-04-01') , (6, '2017-04-02') , (6, '2017-04-03') 
, (6, '2017-04-04') , (6, '2017-04-05') , (6, '2017-04-06') , (6, '2017-04-07') , (6, '2017-04-08') , (6, '2017-04-09') , (6, '2017-04-10') , (7, '2017-04-01') , (7, '2017-04-04') , (7, '2017-04-05') , (7, '2017-04-06') , (7, '2017-04-07') , (7, '2017-04-08') , (7, '2017-04-09') , (7, '2017-04-11') , (7, '2017-04-12') , (7, '2017-04-13') , (7, '2017-04-14') , (7, '2017-04-15') , (7, '2017-04-16') , (7, '2017-04-17') , (7, '2017-04-18') , (7, '2017-04-19') , (7, '2017-04-21') , (7, '2017-04-22') , (7, '2017-04-26') , (7, '2017-04-27') , (7, '2017-04-28') , (7, '2017-04-30') , (8, '2017-04-02') , (8, '2017-04-03') , (8, '2017-04-04') , (8, '2017-04-05') ;
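A sketch against the tempdata sample: compute each member's largest gap (treating the start and end of the month as gap boundaries as well, which is an assumption about the requirement) and keep members whose largest gap is under 10 days. It relies on Greenplum's window functions, which are available even on the 8.2 code base.

```
SELECT member_id,
       MAX(data_date - prev_date) AS largest_gap_days
FROM (
    SELECT member_id,
           COALESCE(LAG(data_date) OVER (PARTITION BY member_id ORDER BY data_date),
                    DATE '2017-03-31') AS prev_date,   -- gap from start of month to first send
           data_date
    FROM tempdata
    UNION ALL
    SELECT member_id,
           MAX(data_date),          -- last send ...
           DATE '2017-05-01'        -- ... to the end of the month
    FROM tempdata
    GROUP BY member_id
) g
GROUP BY member_id
HAVING MAX(data_date - prev_date) < 10;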
DataGwynn (13 rep)
Jun 25, 2018, 09:40 PM • Last activity: Jun 27, 2018, 08:45 PM
1 votes
1 answers
427 views
Can a plpgsql stored procedure monitor for a query and kill it?
I want to write a function that monitors for a particular query becoming a blocker (i.e. it is both blocked by a query, and is blocking another query) and terminates it. Here is my current code, which I have amended to use left join for testing purposes so that it does not require the query to be blocked:

CREATE OR REPLACE FUNCTION monitor_sql_and_terminate_blocker(IN p_query text, OUT result text)
  RETURNS text AS
$BODY$
DECLARE
  monitor_sql text;
  rec record;
BEGIN
  monitor_sql = '
    with blocker as (
        select distinct waiting.pid pid, other.pid blocker
        from pg_stat_activity
        join pg_catalog.pg_locks waiting
          on waiting.pid = pg_stat_activity.procpid and not waiting.granted
        join pg_catalog.pg_locks other
          on ( ( other."database" = waiting."database" and other.relation = waiting.relation )
               or other.transactionid = waiting.transactionid )
             and other.pid waiting.pid
        where current_query not like ''%%''
    ),
    blockers as (
        select pid, array_to_string(array_agg(blocker),'','') blocker_list
        from blocker group by pid
    ),
    blocking as (
        select blocker, array_to_string(array_agg(pid),'','') blocking_list
        from blocker group by blocker
    )
    select procpid
    from pg_stat_activity
    left join blockers on blockers.pid = pg_stat_activity.procpid
    left join blocking on blocking.blocker = pg_stat_activity.procpid
    where current_query = ''' || p_query || ''' ';
  LOOP
    FOR rec IN EXECUTE monitor_sql LOOP
      RAISE NOTICE 'Terminating procpid %', rec.procpid;
      PERFORM pg_terminate_backend(rec.procpid);
    END LOOP;
    PERFORM pg_sleep(1);
  END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;

It is invoked like this:

select monitor_sql_and_terminate_blocker('select * from very_large_table')

However, it just loops infinitely and never does anything. If I run the query manually, it finds the process and returns the procpid which I can then terminate manually. This is because of transaction isolation: the function only sees the queries that were running when it started. If I run the monitor function while the query is running, it kills it successfully and then keeps trying to kill it again and again.

What can I do to work around it? My current solution is to move the loop out into a shell script that runs psql to invoke a version of this code that runs once and then exits.
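One possibility worth checking (hedged, since it depends on the server version): pg_stat_activity is one of the statistics views whose contents are cached for the remainder of the current transaction, and a plpgsql function runs entirely inside one transaction, so the loop keeps re-reading the snapshot taken at the first iteration. pg_stat_clear_snapshot() discards that cache. A minimal demonstration in a psql session:

```
BEGIN;
SELECT count(*) FROM pg_stat_activity;   -- cached for the rest of this transaction
SELECT pg_stat_clear_snapshot();         -- discard the cached statistics snapshot
SELECT count(*) FROM pg_stat_activity;   -- re-reads current activity
COMMIT;
```

Inside the function, a `PERFORM pg_stat_clear_snapshot();` at the top of the outer LOOP should have the same effect, which may remove the need for the external shell-script loop.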
PhilHibbs (539 rep)
May 31, 2017, 12:02 PM • Last activity: Apr 9, 2018, 05:03 AM