Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
1 vote · 1 answer · 1399 views
Redshift - is it possible to run a query that selects from an external table only if the table exists?
What I am trying to do is something like this:

```sql
select
  case when (/* here should come a sql statement that checks whether the table exists */)
    then (select count(*) from my_external_table)
    else 0
  end
```

I know that theoretically I could use the SVV_ALL views, but they sometimes cause the query to be slow.

As an example of what I am looking for, this Postgres query would be great for me, but Redshift does not support the `to_regclass` function:

```sql
select
  case when exists (select * from (select to_regclass('my_table') as rg) t where rg is not null)
    then (select count(*) from my_table)
    else 0
  end
```
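If it helps, a narrower existence probe than the SVV_ALL views is SVV_EXTERNAL_TABLES, which lists only external tables. A minimal sketch, assuming the table lives in an external schema (the schema name here is a placeholder):

```sql
-- Hedged sketch: existence check via SVV_EXTERNAL_TABLES, typically cheaper
-- to scan than the SVV_ALL views. 'my_schema' is a placeholder name.
select count(*) > 0 as table_exists
from svv_external_tables
where schemaname = 'my_schema'
  and tablename  = 'my_external_table';
```

Note that even with such a check, a single statement that also references `my_external_table` directly may still fail when the table is absent, since Redshift resolves table names at compile time; the check is more useful as a separate step in a script or stored procedure.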
Eyal leshem
(145 rep)
Apr 27, 2022, 08:57 AM
• Last activity: Aug 5, 2025, 10:04 PM
1 vote · 1 answer · 205 views
Redshift (Serverless) SVV_TABLE_INFO shows huge storage sizes for tiny tables
My Redshift Serverless instance shows massive storage usage for tiny tables that so far have had only a couple of DDL statements and inserts, and are overall tiny. When I run

```sql
select "schema", "table", "tbl_rows", "size"
from svv_table_info
where "schema" = 'x' and "table" = 'y';
```

I get a `tbl_rows` of 36 with a `size` of 2580, which according to the Redshift docs is counted in 1 MB blocks, so 2.6 GB of storage used for 36 rows. The SVV_TABLE_INFO columns `empty`, `unsorted`, and `vacuum_sort_benefit` are all 0.

The fun part: when I run `select * from x.y` I can copy the entire result set to my clipboard and it comes to ~23 KB total.

The AWS Redshift Serverless web GUI similarly reports a whopping 1.1 TB of storage used in the cluster (the same as running `sum("size")` against the SVV table, by the way). There is at most 100 GB used in total.

Can someone help me figure out how/where those huge storage numbers come from?
## EDIT - Full SVV Dump for one table
```json
{
  "database": "xxx",
  "schema": "x",
  "table_id": 2234810,
  "table": "y",
  "encoded": "Y",
  "diststyle": "KEY(id)",
  "sortkey1": "received_at",
  "max_varchar": 65535,
  "sortkey1_enc": "none",
  "sortkey_num": 1,
  "size": 2580,
  "pct_used": 0.0040,
  "empty": 0,
  "unsorted": 0.00,
  "stats_off": 0.00,
  "tbl_rows": 36,
  "skew_sortkey1": 1.00,
  "skew_rows": 100.00,
  "estimated_visible_rows": 36,
  "risk_event": null,
  "vacuum_sort_benefit": 0.00,
  "create_time": "2024-08-05T07:48:07.454Z"
}
```
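A hedged diagnostic, using the `table_id` (2234810) from the dump above: STV_BLOCKLIST exposes the 1 MB blocks allocated per column and slice, which shows whether the reported size is per-slice, per-column preallocation rather than actual data.

```sql
-- Hedged sketch: count allocated 1 MB blocks per column for the table_id
-- shown above. A tiny table distributed across many slices can reserve far
-- more blocks than its row count suggests.
select col, count(*) as mb_allocated
from stv_blocklist
where tbl = 2234810
group by col
order by col;
```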
Killerpixler
(131 rep)
Sep 3, 2024, 03:02 PM
• Last activity: Jul 25, 2025, 03:48 AM
0 votes · 2 answers · 718 views
Explicit locks in RedShift
I'm new to AWS Redshift and I want to know how I can implement my use case.

I have a Redshift table which gets a lot of inserts and selects. In some cases I need to delete some data from that table, and I don't want my delete to impact other writes and reads.

Can someone suggest the best locking mode available in Redshift for this use case? I don't want to end up with deadlocks or locks hanging in my application.

I tried without locks and I'm observing the following violation:

```
Serializable isolation violation on table - 156300, transactions forming the cycle are: 6416378, 6416913 (pid:3770);
```

I also went looking for Redshift explicit locks but couldn't find any official documentation for them. I know that Redshift is based on PostgreSQL, so can I use that documentation for implementing?
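For reference, Redshift does document an explicit `LOCK` command. A minimal sketch of wrapping the delete in one (`my_table` and the predicate are placeholders):

```sql
-- Hedged sketch: LOCK takes an ACCESS EXCLUSIVE table lock, so concurrent
-- reads and writes on the table wait until this transaction commits. That
-- avoids serialization-cycle aborts at the cost of blocking.
begin;
lock table my_table;
delete from my_table where created_at < '2019-01-01';
commit;
```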
user3865748
(21 rep)
Dec 9, 2019, 12:14 PM
• Last activity: Jul 1, 2025, 05:03 AM
2 votes · 1 answer · 99 views
Ignoring trailing spaces when grouping by varchar type in Redshift
I created a table in Redshift and entered data as below.

```sql
create table public.temp (
  id int,
  name varchar(20)
);
insert into public.temp (id, name) values (1, 'bob'), (2, 'bob ');
```

When counting based on the name column, it returns ('bob', 2).

```sql
select name, count(1) from public.temp group by name;
-- Result: ('bob', 2)
```

When grouping by a varchar type, Redshift seems to ignore trailing spaces.

We then ran another test:

```sql
select t, count(1)
from (
  select 'a' as t
  union all
  select 'a ' as t
) as a
group by t;
```

Surprisingly, it returns the result including the trailing spaces:

('a', 1), ('a ', 1)
What is the difference between the two query execution methods and why do I get these results?
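Redshift's documentation describes trailing blanks in VARCHAR values as insignificant for comparisons, which covers the GROUP BY behavior. One hedged workaround sketch: append a sentinel so the spaces become interior characters.

```sql
-- Hedged workaround sketch: concatenating a sentinel makes the trailing
-- spaces interior, so 'bob' and 'bob ' group separately.
select name || '|' as name_marked, count(1)
from public.temp
group by name || '|';
```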
임승현Seunghyun Lim
(21 rep)
Jun 23, 2025, 02:08 AM
• Last activity: Jun 23, 2025, 08:05 AM
1 vote · 1 answer · 219 views
Why does Redshift ignore length when grouping by VARCHAR of only spaces?
Imagine I have a column `name` that is `n` rows, where the nth entry is equal to `REPEAT(' ', n)`. For `SELECT DISTINCT name FROM table` I would expect to receive `n` rows because each name is unique by definition. For dialects like BigQuery and Snowflake, I do receive this. However, for Redshift I only receive one row. Why is this?
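A minimal repro sketch of the collapse, assuming the column holds only runs of spaces as described:

```sql
-- Hedged repro sketch: three all-space strings of different lengths collapse
-- to one group, consistent with Redshift treating trailing blanks in VARCHAR
-- comparisons as insignificant.
select distinct name
from (
  select repeat(' ', n) as name
  from (select 1 as n union all select 2 union all select 3) nums
) t;
```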
Tanner Clary
(11 rep)
Jan 29, 2024, 07:01 PM
• Last activity: Jun 22, 2025, 03:03 PM
3 votes · 1 answer · 466 views
Is Redshift translating "dbo" schema to "public" schema?
We have a strange issue which has appeared in the last couple of days.

We have two Amazon Redshift clusters. One cluster has our production workload, and the other has our non-production workload. There are 4 copies of the same database: 1 production copy, and 3 copies in non-production.

In the last few days, data extracts began failing in our non-production environment. Upon further investigation, we see that the error is:

> ERROR: schema "dbo" does not exist

Upon closer inspection, sure enough, the query is using `dbo` as the schema and there is no `dbo` schema in the database. All of the tables exist in the `public` schema. This would make sense, except that this worked for months prior to this event, and is actually still working in production.

The original (simplified) query looks like this:

```sql
select top 1 * from [dbo].[my_table];
```

In this case, `dbo` does not exist in any environment; `my_table` exists in the `public` schema. However, the above query still works fine on our production cluster, while it now fails with the invalid schema error on our non-production cluster.

I also tried changing the query to:

```sql
select top 1 * from "dbo"."my_table";
```

I get the same results: works fine in production, does not work in non-production. In fact, if I try `select top 1 * from "public"."my_table";` I get the same results (as `dbo`) in production, and the query works as expected in non-production. And if I try `select top 1 * from "asdf"."my_table"` it fails universally.

I also checked the Redshift user's search path, in case that had something to do with it. In each environment the search path is the default: `$user,public`.

I confirmed on all the databases that the `dbo` schema does NOT exist, and that all the tables are in `public`. I also checked the cluster version and update history: the non-production cluster is running version `1.0.44126`, and the production cluster is running version `1.0.43931`. The extracts did start failing after this update. I cannot find any reference to version `1.0.44126` online and have no idea what has changed here. (C'mon AWS, update your documentation!)

I can't find any reference to Redshift translating `dbo` to `public`, or to any other related feature that might do that.

I am able to reproduce the issue by issuing the following two statements:

```sql
create table public.test_table ( test integer );
select * from dbo.test_table;
```

I can also do: `drop table dbo.test_table;`

It works fine on my `1.0.43931` cluster, and fails on the `1.0.44126` cluster. Clearly there is a mechanism here that treats these two schema names the same.

**Why does the query `select top 1 * from "dbo"."my_table";` work at all when the actual table is `"public"."my_table"`?**

**Why did it stop working all of a sudden in version 1.0.44126 (if that is the case)?**
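A hedged diagnostic that sidesteps any name translation by asking the catalog directly where the table lives:

```sql
-- Hedged sketch: list every schema that actually contains test_table,
-- straight from the catalog tables.
select n.nspname as schema_name, c.relname as table_name
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relname = 'test_table';
```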
Appleoddity
(334 rep)
Nov 28, 2022, 04:12 PM
• Last activity: Jun 16, 2025, 11:01 AM
3 votes · 0 answers · 1020 views
Use or include external schema in search_path on Redshift
# Narrative
I have a SQL script that creates a bunch of tables in a temporary schema in Redshift. I don't want to repeat the schema name a bunch of times, so I would like to do something like the following at the top of the script:

```sql
use long_external_schema_name;
```

My understanding is that in Redshift (inheriting from Postgres), you would do:

```sql
set search_path to '$user', public, long_external_schema_name;
```

However, I get the following error, because it is an **external schema**:

```
ERROR: External schema "long_external_schema_name" cannot be set in search_path
```

# Question
Is there any equivalent way that I could stay DRY and write the external schema name only once while I create a bunch of tables in it?

# More Context
Note, I know I have lots of options in bash (arguments, sed, vim, etc.) to replace the schema name in the script, but I'm trying to do something more native to Redshift / psql.
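Since psql-native tricks are in scope, one hedged sketch is a client-side psql variable; `:ext` is expanded by psql before the statement is sent, so the server never sees the variable (all object names below are placeholders):

```sql
-- Hedged sketch (psql only): a \set variable keeps the script DRY without
-- touching search_path.
\set ext long_external_schema_name
create external table :ext.events_one (id int)
  stored as parquet location 's3://my-bucket/events_one/';
create external table :ext.events_two (id int)
  stored as parquet location 's3://my-bucket/events_two/';
```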
combinatorist
(233 rep)
Aug 20, 2019, 08:03 PM
• Last activity: Jun 5, 2025, 06:34 PM
0 votes · 1 answer · 240 views
Get the Replication/Lag time of Redshift
I'm currently running Redshift with pushes of all data from our production PostgreSQL databases every 10 minutes or so. Periodically, the ETL process from PostgreSQL to Redshift gets delayed or backed up. Is there any way to monitor the lag time between Redshift and PostgreSQL?
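One common pattern, sketched here on the assumption that you control both ends of the ETL: write a heartbeat row on the PostgreSQL side with each push, then measure its staleness in Redshift (the table and column names are hypothetical).

```sql
-- Hedged heartbeat sketch. On the PostgreSQL source, each ETL cycle runs:
--   insert into etl_heartbeat (pushed_at) values (now());
-- and the row rides along with the regular push. On Redshift, lag is then:
select datediff(minute, max(pushed_at), getdate()) as lag_minutes
from etl_heartbeat;
```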
user2694306
(121 rep)
Jun 8, 2016, 03:16 AM
• Last activity: May 24, 2025, 05:05 PM
0 votes · 1 answer · 918 views
What's the equivalent of URLencode for Redshift?
Using Redshift, I need to encode a string before I pass it as a GET string in a URL. Is there an equivalent of `URLencode()` for Redshift SQL?

I am passing data from Redshift to an email marketing program that will then use the data to form a custom link. I am already doing a number of things to prep the data for the email marketing program as part of my SELECT statement, and it would be cleanest if I could prep one field with `URLencode()` as part of the other data prep.
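Redshift has no built-in URL encoder, but it does support scalar Python UDFs, so one hedged option is to wrap the standard library's quote() (the function name `f_urlencode` is made up here):

```sql
-- Hedged sketch: Redshift Python UDFs run Python 2.7, so urllib.quote is
-- available. safe='' percent-encodes everything except unreserved characters.
create or replace function f_urlencode(s varchar(max))
returns varchar(max)
stable
as $$
    import urllib
    return urllib.quote(s, safe='')
$$ language plpythonu;

-- usage: select f_urlencode(my_field) from my_table;
```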
Polyvinyl
(1 rep)
Nov 18, 2021, 04:25 PM
• Last activity: May 20, 2025, 02:03 PM
1 vote · 1 answer · 332 views
How to force Redshift/Postgres to aggregate nth_value?
I'm trying to retrieve the 2nd, 3rd, 4th order/transaction from an orders table using a query that looks like this:

```sql
select
  customer_email,
  nth_value(created_at, 1) over (partition by customer_email),
  nth_value(created_at, 2) over (partition by customer_email),
  nth_value(created_at, 3) over (partition by customer_email),
  nth_value(created_at, 4) over (partition by customer_email)
from
  fact_orders
group by
  customer_email,
  created_at
limit 100
```

My goal is to get the 1st, 2nd, 3rd, and 4th order for each customer. I'm forced to do a group by on customer_email and created_at, which results in a row for each order and date by a customer.

The nth_value yields the correct results, but I would prefer a result set that looks like this, with only one record per customer:

customer_email, first_order, second_order, third_order, fourth_order

How can I achieve this?
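One hedged way to collapse to a single row per customer: rank each order with `row_number()` and pivot the first four timestamps with conditional aggregation (the output column names are illustrative).

```sql
-- Hedged sketch: rank orders per customer, then pivot with MAX(CASE ...).
with ranked as (
  select customer_email,
         created_at,
         row_number() over (partition by customer_email
                            order by created_at) as rn
  from fact_orders
)
select customer_email,
       max(case when rn = 1 then created_at end) as first_order,
       max(case when rn = 2 then created_at end) as second_order,
       max(case when rn = 3 then created_at end) as third_order,
       max(case when rn = 4 then created_at end) as fourth_order
from ranked
group by customer_email;
```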
john
(153 rep)
Jul 10, 2018, 04:43 AM
• Last activity: Apr 21, 2025, 11:03 AM
1 vote · 1 answer · 387 views
User has more privileges in Redshift than shown by relevant query
Working in Redshift:

```
dev=# SELECT table_catalog, table_schema, table_name, privilege_type
      FROM information_schema.table_privileges
      WHERE grantee = 'myser';
 table_catalog | table_schema | table_name | privilege_type
---------------+--------------+------------+----------------
(0 rows)
```

However, the user is able to run SELECT on several randomly tested `pg_catalog` schema tables, e.g. `SELECT * FROM pg_class_info;`, `SELECT * FROM pg_class;`, etc.

The only `GRANT` command I ran after creating `myser` was:

```sql
GRANT SELECT ON stl_connection_log TO myuser;
```

after setting

```sql
SET search_path TO pg_catalog;
```

What am I missing?
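A hedged way to check effective access directly, since `information_schema.table_privileges` only lists explicit grants (system catalogs are generally readable by default):

```sql
-- Hedged sketch: probe the effective permission rather than the grants list.
select has_table_privilege('myser', 'pg_catalog.pg_class', 'select');
```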
pkaramol
(213 rep)
Feb 24, 2021, 03:32 PM
• Last activity: Apr 14, 2025, 05:06 PM
2 votes · 1 answer · 4875 views
Unexpected Redshift support for the ISNULL() function. Is this documented?
**As far as I know, the `ISNULL()` function is only compatible with T-SQL.**

Yet I am working on a Redshift cluster, using DataGrip, that accepts the `ISNULL()` function and returns results without error. I'm even getting the contextual help text on it in the IDE.

I have found zero Redshift documentation and zero PostgreSQL documentation that even refers to the `ISNULL()` function, let alone confirms support of it. **Is there documentation out there for support of `ISNULL()`? Is this function unofficially supported, since it works, but I can't find any documentation?**

Redshift environment info:

```
DBMS: Redshift (ver. 8.0.2)
Driver: Redshift JDBC Driver (ver. 2.0.0.3, JDBC4.2)
Effective version: Amazon Redshift (ver. 1.0.35480)
```
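For portability, the documented Redshift spellings of the same null-replacement idea are NVL and COALESCE; a minimal sketch with placeholder names:

```sql
-- Hedged sketch: documented alternatives to ISNULL() in Redshift.
select nvl(my_col, 0)      as with_nvl,       -- Redshift-documented
       coalesce(my_col, 0) as with_coalesce   -- SQL standard
from my_table;
```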
dev_etter
(762 rep)
Feb 11, 2022, 05:48 PM
• Last activity: Mar 2, 2025, 08:32 AM
0 votes · 1 answer · 596 views
Strange error in Redshift stored procedure
I have a stored procedure in Redshift that loads data from a bunch of S3 JSON files in a loop. Every time a file is loaded, its name is stored in a log table on the Redshift side so I can link it to records in my staging tables; I also use this to avoid loading the same file twice. Any error during loading is also recorded.

Anyway, everything worked fine for some time, but during the last run I got the following error (this is the exact error message):

> catalog is missing 1 attribute(s) for relid 5477788

Checking the AWS documentation for Redshift, and looking into the tables **STL_ERROR**, **STL_LOAD_ERRORS**, **SVL_STORED_PROC_MESSAGES**, **SVL_STORED_PROC_CALL**, and **pg_class**, did not clarify anything.

Any googling points only to PostgreSQL threads mentioning possible database corruption. I know that Redshift is based on PostgreSQL, but how can I check, specifically for Redshift, that my database or table(s) are not corrupted?

Has anyone encountered a similar error in Redshift before?
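One hedged first step is to resolve the relid from the message to an object name, so the error can at least be pinned to a specific table:

```sql
-- Hedged sketch: map the relid from the error message to a table name.
select c.oid, n.nspname as schema_name, c.relname as table_name
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.oid = 5477788;
```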
Oleksandr
(121 rep)
Mar 17, 2023, 10:46 AM
• Last activity: Feb 22, 2025, 07:03 AM
2 votes · 1 answer · 3437 views
Redshift- How to use previous row's calculations in current row
I am using Redshift to determine projected inventory for the next few weeks. I have receipts (r), beginning inventory (bi), and a forecast (f), and want the ending inventory:

ei = ei (previous row) + r - f

```
wk   bi    r    f    ei
1    100   20   80   40
2               50    0
3          30   15   15
4          40   14   41
5               10   31
```

```sql
create table ra_analytics.Test_1Test (wk int, bi int, r int, f int, ei int);
insert into ra_analytics.Test_1Test values (1, 100, 20, 80, 0);
insert into ra_analytics.Test_1Test values (2, 0, 0, 50, 0);
insert into ra_analytics.Test_1Test values (3, 0, 30, 15, 0);
insert into ra_analytics.Test_1Test values (4, 0, 40, 14, 0);
insert into ra_analytics.Test_1Test values (5, 0, 0, 10, 0);

select * from ra_analytics.test_1test order by wk;

-- Update the first week of inventory
update ra_analytics.test_1test
set ei = bi + r - f
where wk = 1;

-- For the next weeks' inventory, using the LAG function
select wk, bi, r, f, lag(ei, 1) over (order by wk) + r - f as endinv
from ra_analytics.test_1test
order by wk;
```
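LAG only reaches one already-stored row back, so it cannot chase a value computed on the fly. Since ei unrolls to the running total of (bi + r - f), a hedged sketch with a running-window SUM computes every week in one pass (assuming inventory is allowed to go negative):

```sql
-- Hedged sketch: ei as a cumulative sum instead of row-by-row updates.
select wk, bi, r, f,
       sum(bi + r - f) over (order by wk
                             rows between unbounded preceding
                                      and current row) as ei
from ra_analytics.test_1test
order by wk;
```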
igotsar
(21 rep)
May 11, 2018, 02:36 PM
• Last activity: Feb 7, 2025, 12:01 AM
5 votes · 2 answers · 5061 views
Redshift: Return rows if value exists in cell array
How do I return the rows that include a particular value in a column's cell array? Imagine we have a table like so:

```
id  name    phone         values
1   Taylor  xxx.xxx.xxxx  [4,6,5]
2   John    yyy.yyy.yyyy  [1,5,2]
3   Peter   zzz.zzz.zzzz  [6,2,6]
```

I need to create an SQL query that will return the rows where the value '6' exists in the `values` array. So the expected output would be:

```
id  name    phone         values
1   Taylor  xxx.xxx.xxxx  [4,6,5]
3   Peter   zzz.zzz.zzzz  [6,2,6]
```

We are working in Redshift. So, if it is easier, the `json_extract_array_element_text('json_string', pos)` function could be used. Please note that the lengths of the values arrays can differ from each other.
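A hedged sketch, assuming the column is stored as a JSON array string and capping the probe at three positions (extend the OR chain, or join against a numbers table, for longer arrays; the table name is a placeholder):

```sql
-- Hedged sketch: probe each array position for '6'; out-of-range positions
-- simply fail the comparison, so shorter arrays are handled.
select id, name, phone, "values"
from my_table
where json_extract_array_element_text("values", 0) = '6'
   or json_extract_array_element_text("values", 1) = '6'
   or json_extract_array_element_text("values", 2) = '6';
```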
Myg0t
(51 rep)
Jan 3, 2017, 04:18 PM
• Last activity: Jan 25, 2025, 08:08 AM
2 votes · 1 answer · 381 views
What code underlies the PG_GET_LATE_BINDING_VIEW_COLS sys information function?
We want to transition to late-binding views, but late-binding view metadata is not presented by some clients (workbench/J, Toad, pgcli), which makes it hard to discover column names in these views. AWS provides a function, PG_GET_LATE_BINDING_VIEW_COLS, to get the metadata, but we can't seem to load this into a Redshift table. Are there PG system tables storing the late-binding view columns, or what underlies this function?
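For reference, the AWS-documented invocation requires an explicit column definition list, and from there the result set can be inserted into a regular table:

```sql
-- Documented call shape for the function, per the AWS Redshift docs.
select *
from pg_get_late_binding_view_cols()
     cols(view_schema name, view_name name, col_name name,
          col_type varchar, col_num int);
```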
respondeo
(21 rep)
Jul 26, 2018, 04:42 PM
• Last activity: Dec 26, 2024, 02:05 AM
0 votes · 2 answers · 1650 views
How to query Redshift SUPER for non-empty objects
**Context:**
I have a column called `event_properties`, which is a Redshift SUPER datatype; a lot of the values are the empty object `{}`:

*(screenshot: rows of `event_properties` values, all showing `{}`)*

**Objective:**
Keep only the rows whose `event_properties` is not an empty object.

**What I've tried:**
- Casting as `::text` results in null values.
- The `len` (length) function is not supported for the SUPER type.

**What I'm trying to accomplish:**
(the queries below don't work, but should demonstrate the goal)

```sql
select event_properties from native_app_events where event_properties::text != '{}';
select event_properties from native_app_events where len(event_properties) > 5;
```
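A hedged sketch using JSON_SERIALIZE, which renders a SUPER value as varchar so it can be compared as text:

```sql
-- Hedged sketch: serialize the SUPER value to text, then compare.
select event_properties
from native_app_events
where json_serialize(event_properties) != '{}';
```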
Andrew Wei
(205 rep)
Jan 30, 2023, 08:45 PM
• Last activity: May 10, 2024, 08:53 PM
7 votes · 1 answer · 571 views
Deduplicate SELECT statements in relational division
I have a query that does a lot of duplicate work:

```sql
SELECT visitor_id, '1'::text AS filter
FROM events
WHERE id IN (SELECT event_id FROM params
             WHERE key = 'utm_campaign' AND value = 'campaign_one')
  AND id IN (SELECT event_id FROM params
             WHERE key = 'utm_source' AND value = 'facebook')
GROUP BY visitor_id

UNION ALL

SELECT visitor_id, '2'::text AS filter
FROM events
WHERE id IN (SELECT event_id FROM params
             WHERE key = 'utm_campaign' AND value = 'campaign_two')
  AND id IN (SELECT event_id FROM params
             WHERE key = 'utm_source' AND value = 'facebook')
GROUP BY visitor_id
```

As you can see, it performs different filtering on the params table 4 times. I'm using Redshift and although it scans this table very quickly, I have quite a few of these statements `UNION`ed together. Is there a way to rewrite the SQL using `CASE`/`IF` statements?

The example uses `key = 'utm_source' AND value = 'facebook'` in both, but this is not necessarily true for all selects.
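One hedged rewrite: scan params once, tag each event with the filters it satisfies via conditional aggregation, then assemble the branches from the tags (the filter keys mirror the example above):

```sql
-- Hedged sketch: a single pass over params replaces the four IN-subqueries.
with tagged as (
  select event_id,
         max(case when key = 'utm_campaign' and value = 'campaign_one' then 1 else 0 end) as c1,
         max(case when key = 'utm_campaign' and value = 'campaign_two' then 1 else 0 end) as c2,
         max(case when key = 'utm_source'   and value = 'facebook'     then 1 else 0 end) as fb
  from params
  group by event_id
)
select e.visitor_id, '1'::text as filter
from events e
join tagged t on t.event_id = e.id
where t.c1 = 1 and t.fb = 1
group by e.visitor_id
union all
select e.visitor_id, '2'::text as filter
from events e
join tagged t on t.event_id = e.id
where t.c2 = 1 and t.fb = 1
group by e.visitor_id;
```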
Sam
(647 rep)
Aug 17, 2015, 11:30 PM
• Last activity: May 10, 2024, 12:44 AM
0 votes · 1 answer · 711 views
Redshift: drop table with + and spaces in name
I have a table that a user created, which appears to be named from a Python variable that never interpolated.

```
proddb01=# \dt+ *transtable*
              List of relations
 schema |      name       | type  | owner | description
--------+-----------------+-------+-------+-------------
 public |  + transtable + | table | app   |
(1 row)
```

I am unable to drop it; I can only describe it with the `\dt+` and wildcard method. Can anyone point me in the right direction?
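A hedged sketch: double-quoting the identifier preserves the literal plus signs and spaces, but the quoted string must match the stored name exactly, so check `pg_class` first to see the real padding:

```sql
-- Reveal the exact stored name, whitespace included.
select '[' || relname || ']' as bracketed_name
from pg_class
where relname like '%transtable%';

-- Hedged sketch: drop via a quoted identifier matching that exact string
-- (the spacing shown here is a guess based on the \dt+ output above).
drop table public." + transtable + ";
```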
jaysunn
(11 rep)
Jul 6, 2017, 01:02 PM
• Last activity: Mar 13, 2024, 05:56 PM
0 votes · 0 answers · 114 views
Best practice for applying complex business logic during transformation in ETL from Aurora db to Redshift data warehouse
I'll be pushing data from an Amazon Aurora db to a Redshift data warehouse. The painful part is that the transformation portion of the ETL workflow relies *heavily* on business logic that lives within an application's codebase in order to derive a useful format for the data warehouse. We don't want to remove the business logic from the application layer.
My thinking is a couple of different options.
Option 1:
Using AWS Glue as the ETL framework
1. Extract data
2. Apply business logic on that data by calling an API (via AWS Glue Network Connection) that applies the business logic during transformation
3. Load into Redshift
Option 2:
1. Hourly background job in application to extract records that need to be pushed to Redshift
2. Apply business logic on that data in the background job, transform, and stage it in a table on Aurora
3. Use AWS Glue to load from Aurora to Redshift
All these tools are new to me, so it's difficult to know which to use. Glue, DMS, Data Pipeline? Any advice?
mstrom
(143 rep)
Feb 4, 2024, 01:14 PM
• Last activity: Feb 4, 2024, 01:37 PM