Sample Header Ad - 728x90

Greenplum/PostgreSQL: Querries run individually, but won't run in plpgsql function

2 votes
0 answers
23 views
We are running Greenplum 6.28.1, which is built on PostgreSQL 9.4.26. We have a master node and 6 server nodes, all running Oracle Linux release 8.10. We have 5 queries that progressively build temp tables that are use to generate a final result set. Those queries, in the order they are run, are:
drop table if exists hourly_time_windows;
create temp table hourly_time_windows as (
	WITH
        time_windows as (
            select 
                (generate_series(0, 23, 1)::text || ' Hours'::text)::interval as window_start_interval,
                (generate_series(1, 24, 1)::text || ' Hours'::text)::interval as window_end_interval
            ),
        dates_and_users as (
			select distinct
					activity_date,
					career_account
			from network_logs_processed.hourly_internet_activity_building log
			where activity_date >= '2024-02-01'::date and activity_date  0
			THEN 1
			ELSE 0 
			END) as device_moved_buildings,
		sum(megabytes_transferred) as megabytes_transferred
	from 
		ip.hourly_activity_with_building_lag data
	left outer join 
		(  
            select schedule.*, banner.career_account from utility_features.student_class_schedule_detail schedule
            left outer join banner_oltp.banner_lookup banner 
                on schedule.person_uid = banner.person_uid::numeric
		) class
		on 
			data.activity_date = class.calendar_date
			and data.career_account = class.career_account
			and data.building = class.building 
			and (
					(data.session_start_in_hour between class.begin_time and class.end_time)
					OR
					(data.session_end_in_hour between class.begin_time and class.end_time)
					OR 
					(data.session_end_in_hour > class.end_time and data.session_start_in_hour  0 and
	            campus in ('PWL','CEC') and
	            schedule_type not in ('IND','RES','EX') and
	            substring(course_identification from '[a-zA-Z]+#"[0-9]#"%' for '#')::int <= 4 and   --undergrad courses only
	            sub_academic_period in ('1','F8','FHS','M1','M2','M3','M12','M23','S8','SHS') and
	            registration_status like 'R%'
            ),
        housed_students as (
    		select distinct academic_period, person_uid
		    from utility_features.resident_student_room_details
            ),
        full_student_list as (
    		select academic_period, person_uid::numeric from registered_students
    		UNION
    		select academic_period, person_uid::numeric from housed_students
            )
    select 
    	full_student_list.academic_period, 
    	full_student_list.person_uid, 
    	banner.career_account 
    from 
    	full_student_list
    left outer join banner_oltp.banner_lookup banner
    	on full_student_list.person_uid = banner.person_uid::numeric
) distributed by (career_account);
drop table if exists aggregated_hourly_data;
create temp table aggregated_hourly_data as (
	select
		hourly_time_windows.career_account,
		banner.puid,
		banner.person_uid,
		ac.academic_period,
		hourly_time_windows.activity_date,
		hourly_time_windows.window_start,
		hourly_time_windows.window_end,
		sum(not_in_residential) as not_in_residential,
		sum(in_class) as in_class,
		sum(device_moved_buildings) as device_moved_buildings,
		sum(megabytes_transferred) as megabytes_transferred
	from 
		hourly_time_windows 
	left outer join hourly_activity_with_movement_flag
	on 	
		hourly_time_windows.career_account = hourly_activity_with_movement_flag.career_account
		and hourly_time_windows.window_start = hourly_activity_with_movement_flag.window_start
	left outer join banner_oltp.banner_lookup banner 
	on 
		hourly_time_windows.career_account = banner.career_account
	left outer join utility.academic_calendar ac 
	on 
		hourly_time_windows.window_start between ac.last_term_end_date + INTERVAL '1 day' and ac.end_date
	inner join included_students
		on hourly_time_windows.career_account = included_students.career_account
			and ac.academic_period = included_students.academic_period
	group by 1,2,3,4,5,6,7
) distributed by (career_account);
Here is my problem: If I run each query directly and count the rows in the temp table after it is created, all five queries complete in a minute or less. (That's 60 seconds to run all 5, not 60 seconds for each of them.) I created a plpgsql function to run those five queries. The first four completed in about the same time it takes when I run them directly, and the final row counts for all four tables were exactly the same. But I've let the function run for 30+ minutes and the fifth query still never completes. I also tried creating a plpgsql function to run just the first four queries and then running the fifth query directly. Again, the function completes very quickly, and the temp tables it creates have the same row counts as when I run the queries directly, but the fifth query still does not complete. I know PostgreSQL optimizes things differently when run in a function rather than individually, but I really thought that running just the first four in a function and the fifth directly would give different results. I am kind of at my wit's end here. Has anyone run into anything like this before?
Asked by lpscott (33 rep)
Mar 25, 2025, 02:59 PM