
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
137 views
Postgis distance filter does not work as expected
I have an issue with a simple PostGIS query. Here is the query:

SELECT id, location, ST_Distance_Sphere(location, ST_MakePoint(21.012228699999998, 52.2296756)) AS distance
FROM pins
WHERE ST_Distance_Sphere(location, ST_MakePoint(21.012228699999998, 52.2296756)) > 6008000.1836031
ORDER BY distance ASC, id ASC
LIMIT 11;

And here is the result:

 id  |                      location                      |     distance
-----+----------------------------------------------------+------------------
  58 | 0101000020E6100000E253B4CB61C1E13FB7ED0CC45974EF3F | 6008000.1836031
 162 | 0101000020E6100000DE133C027D95E03F3C15FD84A6BAEF3F | 6008157.46639754
 186 | 0101000020E6100000D339D99403BFE33F541C1E242AD5EE3F | 6008257.4084038
  55 | 0101000020E6100000B8042A79A500E63F775E82BA1908EE3F | 6008878.66960698
  98 | 0101000020E6100000CD48F884CD1BED3FAD51BA15F017EC3F | 6008923.81556693
  56 | 0101000020E6100000EECE76EEFE68E63F1681C811377DED3F | 6010325.52640522
 195 | 0101000020E6100000EB87B04B4DC0E33F0AEAE22BECADED3F | 6012118.48976013
 100 | 0101000020E6100000603039EEF27FD83FE70E98797AAEEF3F | 6012318.05377066
 108 | 0101000020E610000045BA52BB3EA0E23F4B96A28081D5ED3F | 6012631.63019378
 117 | 0101000020E61000003ABE6BC7FAD9D43F4D1CF4D130EEEF3F | 6013175.7742802
  91 | 0101000020E610000093B62A082965ED3F911CF0DA8FBFEA3F | 6013178.70288832

For some reason the row with id = 58 is returned, and its distance is 6008000.1836031, even though I used the WHERE filter ST_Distance_Sphere(location, ST_MakePoint(21.012228699999998, 52.2296756)) > 6008000.1836031. I guess PostGIS calculates the distance with greater precision? How can I avoid this situation?
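A minimal keyset-pagination sketch for the query above, not part of the original question: the displayed distance is likely a rounded float8, so comparing against the printed literal can let the boundary row back in. Paging on the (distance, id) pair of the last seen row, copied at full output precision, sidesteps the rounded comparison; extra_float_digits is a standard setting, and the literal values below are taken from the question's result.

SET extra_float_digits = 3;  -- make float8 output round-trip exactly on pre-12 servers

SELECT id, location,
       ST_Distance_Sphere(location, ST_MakePoint(21.012228699999998, 52.2296756)) AS distance
FROM pins
WHERE (ST_Distance_Sphere(location, ST_MakePoint(21.012228699999998, 52.2296756)), id)
      > (6008000.1836031::float8, 58)  -- last seen (distance, id), at full precision
ORDER BY distance ASC, id ASC
LIMIT 11;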
user606521 (1415 rep)
Aug 25, 2016, 08:49 AM • Last activity: Aug 4, 2025, 11:05 AM
0 votes
2 answers
30 views
How to correctly dumpall a PostgreSQL cluster when in the newer version one extension is separated into 2 extensions
I have PostgreSQL version 10 with the PostGIS extension. In this old version, PostGIS also includes raster support, but in newer versions of PostGIS the raster support lives in a separate extension called postgis_raster. Thus, we won't have this line in our dumpall file:
CREATE EXTENSION postgis_raster;
And when I restore it, the server tells me it does not recognize the raster type! My file is very big if I do not zip it before storing it on disk, and once it is zipped I can't edit it to add this extension to the dump file manually. I was thinking of doing a pg_dumpall with the --globals-only flag and then dumping each database one by one using pg_dump. However, I was afraid that I might miss a detail of my cluster. Is there a way to ask pg_dumpall to treat postgis_raster as a separate extension and add it to the dump file? How can I safely dump and restore my cluster?
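A hedged sketch, not a pg_dumpall feature: one common workaround is to pre-create the split extensions in each target database before loading the dump, so the raster type already exists when the old objects are restored. The IF NOT EXISTS form keeps the dump's own CREATE EXTENSION line harmless.

CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS postgis_raster;  -- raster support was split out of core postgis in PostGIS 3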
milad (101 rep)
Jul 29, 2025, 03:09 PM • Last activity: Jul 31, 2025, 07:47 AM
0 votes
2 answers
6414 views
Psycopg2 Errors on SQL statement when trying to copy data from CSV file into PostgreSQL database
I am not a developer or PostgreSQL DB admin, so these may be basic questions. Logistics: Windows 10 server / pgAdmin 4 / Postgres 10 / Python 2.7.13. I'm using a Python script to ingest external data, create a CSV file and copy that into Postgres 10. I keep getting the following error: **Psycopg2.ProgrammingError: syntax error at or near "VALUES"** I have a two-part question. 1) I cannot see the syntax error in the following SQL statement:

def insert_csv_data(sqlstmt):
    with get_conn('pg') as db:
        cur = db.cursor()
        sqlcopy = "COPY irwin (fire_id,name,type,acres,date_time,state,county,admin_unit,land_cat,commander,perc_cntnd,cont_date,gacc,lat,long,geom,updated,imo) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,ST_SetSRID(ST_MakePoint(%s, %s),4326)%s,%s) FROM STIN DELIMITER ',' CSV HEADER"
        with open(csv_file, 'r') as f:
            #next(f)  # Skipping the first line header row
            cur.copy_expert(sqlcopy, f, size=8000)
        db.commit()
        cur.close()

And 2) once that is resolved, I'm expecting to get an error about the geometry column in Postgres. If someone would also peek at the code snippets and let me know if anything jumps out I would SO APPRECIATE IT! This snippet pulls the external data in order, but I don't think I've coded this correctly to pull the lat/long into the geom field.

# Lat 15 - double
if not attributes['InitialLatitude'] is None:
    lat = str(attributes['InitialLatitude']).replace('\n', '')
else:
    lat = '0'

# Long 16 - double
if not attributes['InitialLongitude'] is None:
    long = str(attributes['InitialLongitude']).replace('\n', '')
else:
    long = '0'

# geom is not defined - script is dumping the geometry into the IMO field
geom = str(attributes['InitialLatitude']) + ' ' + str(attributes['InitialLongitude'])

I added a Geom header to the CSV data. Please help - thanks!
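A hedged sketch of the shape COPY will actually accept; this is a possible rewrite, not the asker's code. COPY has no VALUES clause and cannot apply expressions such as ST_MakePoint, so one option is to produce the CSV without the combined geom column, load the plain columns, and fill in the geometry with a second statement. Table and column names follow the question; the casts are only needed if lat/long arrive as text.

COPY irwin (fire_id, name, type, acres, date_time, state, county, admin_unit, land_cat,
            commander, perc_cntnd, cont_date, gacc, lat, long, updated, imo)
FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER ',');

UPDATE irwin
SET geom = ST_SetSRID(ST_MakePoint(long::double precision, lat::double precision), 4326)
WHERE geom IS NULL;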
Cödingers Cat (1 rep)
Jun 10, 2019, 06:59 PM • Last activity: Jul 25, 2025, 02:04 PM
1 vote
1 answer
149 views
Copy postgis layer from S3 to Heroku
I have a dump of a PostGIS layer (layer.dump), which I am trying to add to my Heroku database (mydatabase). The PostGIS layer is stored on S3 (https://s3.amazonaws.com/layer.dump). I would like to add the layer to the Heroku database and previously used heroku pgbackups:restore DATABASE 'https://s3.amazonaws.com/layer.dump'. However, the new heroku pg:backups restore 'https://s3.amazonaws.com/layer.dump' DATABASE deletes all data from the target database before restoring the backup (https://devcenter.heroku.com/articles/heroku-postgres-backups). Is there still a way to restore only a single table and leave the remaining tables in the database untouched?
Anne (143 rep)
Sep 2, 2015, 06:44 PM • Last activity: Jul 18, 2025, 07:06 PM
1 vote
1 answer
50 views
Does ST_AsMVTGeom/ST_AsMVT truncate data, or is my query wrong?
Here's the individual activity, which is GeoJSON displayed on the map rather than the output of any ST_AsMVTGeom/ST_AsMVT work: [image: individual activity via GeoJSON] Here's the result of the ST_AsMVTGeom/ST_AsMVT query: [image: activities via ST_AsMVT] If you compare the purple lines in the second image to the individual activity display in the first image, you can see how the stretch of 15th Ave S is missing in the ST_AsMVT version. I'm _decently_ certain that the underlying data for this activity is a point at the fork in the road on 15th Ave S and then another point down on E 62nd St, with no coordinates existing in between, because this person said they drew this route manually. This is the query I'm running to produce the tiles (where #{z}, #{x}, #{y}, and #{id} are passed-in variables):
WITH bounds AS (
  SELECT ST_TileEnvelope(#{z}, #{x}, #{y}) AS geom
),
tile AS
(
  SELECT ST_AsMVTGeom(ST_Transform(activities.geog::geometry, 3857), bounds.geom) AS geom, id
  FROM activities, bounds
  WHERE activities.geog::geometry && ST_Transform(bounds.geom, 4326)
  AND user_id = #{id}
)
SELECT ST_AsMVT(tile, 'activities', 4096, 'geom', 'id') AS mvt
FROM tile;
- PostgreSQL 14.10
- PostGIS 3.4.0

Is there something about my query that's causing the data to be dropped and not displayed, or is this the expected result out of ST_AsMVTGeom/ST_AsMVT?
James Chevalier (111 rep)
Mar 12, 2024, 05:20 PM • Last activity: Jun 10, 2025, 11:59 PM
3 votes
1 answer
226 views
Access points of geometric path data in PostgreSQL 8.1
Summary: I would like to perform basic operations on the points of geometric path data contained in a PostgreSQL 8.1 server which I do not have administrative access to. In particular I would like to be able to iterate over these points using SQL and to access paths as arrays (or something functionally equivalent).

---

Background: I'm a novice database user who is working with an unsupported legacy application which is using PostgreSQL 8.1 as a back-end database. I would like to be able to manually run some sanity checks on the data in the database in pure SQL (or PL/pgSQL), i.e. I'd like to be able to avoid using an external programming language, which is what I'm currently doing. Several tables in the database include columns containing geometric data. Here is a description of one such table:

psql -U "${appuser}" -d "${appdatabase}" -c '\d+ path_table'

         Table "public.path_table"
 Column |  Type   | Modifiers | Description
--------+---------+-----------+-------------
 id     | integer | not null  |
 path   | path    | not null  |
Indexes:
    "path_table_pkey" PRIMARY KEY, btree (id)
Check constraints:
    "path_table_id_check" CHECK (id > 0)
Has OIDs: no

I consulted the section on [Geometric Functions and Operators](http://www.postgresql.org/docs/8.1/static/functions-geometry.html) from the [PostgreSQL 8.1 Manual](http://www.postgresql.org/docs/8.1/static/), but to no avail. Since this might be relevant, I'll mention that I believe that PostGIS 1.3.6 is installed, i.e.:

$ psql -U "${appuser}" -d "${appdatabase}" -c 'SELECT PostGIS_full_version();'
                                  postgis_full_version
----------------------------------------------------------------------------------------
 POSTGIS="1.3.6" GEOS="3.1.1-CAPI-1.6.0" PROJ="Rel. 4.7.1, 23 September 2009" USE_STATS
(1 row)

My understanding is that I can get a list of all installed functions by using the psql \df meta-command, i.e.

$ psql -U "${appuser}" -d "${appdatabase}" -c '\df'

My understanding is also that all of the available PostGIS commands should be prefixed by the string 'ST_' (denoting spatial type), so that the following should produce a list of all the available PostGIS commands:

$ psql -U "${appuser}" -d "${appdatabase}" -Alt -c '\df' | cut -d'|' -f2 | grep -i '^st_'

Inspecting this list didn't produce any likely candidates besides the ST_Dump function, which doesn't appear to apply to a path type:

SELECT st_dump(path_column) FROM table;
ERROR:  function st_dump(path) does not exist
HINT:  No function matches the given name and argument types. You may need to add explicit type casts.

Apparently the st_dump argument needs to be of type geometry, so I tried following the hint and casting the path to a geometry type:

SELECT st_dump(path_column::geometry) FROM table;
ERROR:  cannot cast type path to geometry

I also consulted the [Geometry Accessors](http://postgis.net/docs/manual-1.3/ch06.html#id438545) section of the [PostGIS 1.3.6 Manual](http://postgis.net/docs/manual-1.3), but apparently I don't know enough to be able to cull a solution from the documentation.
igal (345 rep)
Nov 6, 2015, 03:05 PM • Last activity: Jun 9, 2025, 12:07 AM
0 votes
1 answer
540 views
KNN search two arrays with 2000 elements
**What I did:**
1. Installed PostgreSQL 9.3.1
2. Installed PostGIS 2.1.2
3. I tried to find the distance by selecting column input_variable_1 with value 50. I got the following result:

select input_variable_1 50 as distance from test_ata limit 10;

 distance
 48
 2
 17
 29
 44
 37
 45
 17
 49
 2
(10 rows)

**What I want:** *input*: a column which stores array values, say 2000 elements in a single record. I came across the cube extension and PostGIS; there I can get the distance between two points by using tree and distance concepts for lower dimensions, up to 100.

1. How do I find the distance between two arrays having at least 2000 elements in PostgreSQL?
2. Which tree is best for 2000-dimension indexing in PostgreSQL?
3. Finally, I want to know how to use KNN search in PostgreSQL.
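A hedged illustration of question 1 only: Euclidean distance between a stored float8[] column and a query vector can be computed in plain SQL, with no cube extension involved (so its 100-dimension limit does not apply). The table and column here (vectors, vec) are hypothetical stand-ins, and the two-element query array is a placeholder for the real 2000-element vector; note this scans every row, so it cannot use a KNN index.

SELECT v.id,
       sqrt(sum((v.vec[i] - q.vec[i])^2)) AS distance
FROM vectors v,
     (SELECT ARRAY[1.0, 2.0]::float8[] AS vec) q,   -- placeholder query vector
     generate_subscripts(q.vec, 1) AS i
GROUP BY v.id
ORDER BY distance
LIMIT 10;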
harishankar (9 rep)
Mar 23, 2017, 02:15 PM • Last activity: May 23, 2025, 09:08 PM
1 vote
1 answer
273 views
PostgreSQL - PostGIS ST_SetValues - int array cast to double and back
I am using ST_SetValues with an array defined as ARRAY[[9, 9], [9, 9]]::double precision[][]. If I store a 32-bit integer value in this array, can I retrieve this value exactly after casting from double to integer? Or is there some limited precision for the integer part of the double? I want to store "packed" RGBA in a single 32-bit integer there and "unpack" the single color channels after output in my app. As per the documentation, double should be 64-bit, so in my opinion this should be possible, but maybe I am missing something.
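A quick sanity check that can be run directly (not taken from the question): a float8 mantissa is 53 bits, so every 32-bit integer value survives the round trip through double precision exactly.

SELECT (2147483647::double precision)::integer    AS max_int,   -- 2147483647
       ((-2147483648)::double precision)::integer AS min_int;   -- -2147483648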
Martin Perry (231 rep)
Nov 6, 2016, 12:53 PM • Last activity: May 20, 2025, 09:01 AM
0 votes
1 answer
48 views
How to migrate a PostgreSQL database with the postgis schema?
I've been tasked with migrating a bunch of Postgres databases from version 10.5 to 16. Most contain the PostGIS extension (ver. 2.4.4), and I've successfully migrated a few via the postgis_restore.pl script. However, I am having difficulties when a database has the postgis schema present. I don't understand why some of these "gis" databases have the postgis schema and some do not. I don't know the history of these databases; perhaps they moved from an even older version of Postgres/PostGIS, hence the presence of the postgis schema. I'm getting the classic error during migration:

> ERROR: type "postgis.geometry" does not exist

So far the only solution has been for me to create the table structures and tinker with the the_geom column before migrating with postgis_restore.pl. This is OK when there are only a few objects that need to be created, but it becomes burdensome when I have to create dozens of objects due to this issue. Is there an easier way to handle this? I thought about pg_dump-ing the postgis schema with all the tables, operators, and functions, but should I be doing that?
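A hedged sketch of one possible workaround, not the only way: if the old dumps qualify the type as postgis.geometry, creating a schema of that name on the target and installing the extension into it before running postgis_restore.pl lets those qualified names resolve. The database name below is illustrative.

CREATE SCHEMA IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS postgis SCHEMA postgis;
ALTER DATABASE target_db SET search_path = "$user", public, postgis;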
Eric Swiggum (694 rep)
May 8, 2025, 05:07 PM • Last activity: May 8, 2025, 06:20 PM
0 votes
1 answer
327 views
Does daily pg_dump mess up postgres cache?
I migrated my geospatial Postgres 12.5 database to another cloud provider. I use PostGIS and I have around 35 GB of data and 8 GB of memory. Performance is much worse than with my previous provider, and the new provider claims this is because the pg cache has to be "warmed up" every day after the automatic pg_dump backup operations occurring at night. Geospatial queries that would normally take 50 ms sometimes take 5-10 s on the first request, and some that would run in 800 ms take minutes. Is there something else looming, or is the technical support right? If so, should I disable daily backups? Or can I somehow use a utility function to restore the cache? (pg_prewarm?)
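A minimal pg_prewarm sketch (the relation names are illustrative): the extension ships with PostgreSQL and can reload the hot tables and their indexes into cache after the nightly backup has pushed them out.

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('public.places');
SELECT pg_prewarm('public.places_geom_idx');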
Pak (101 rep)
Feb 25, 2021, 10:04 AM • Last activity: Apr 27, 2025, 10:04 PM
1 vote
1 answer
374 views
Howto do a hard upgrade from postgis-1.5 to postgis-2.0
I need to upgrade from PostGIS 1.5 to PostGIS 2.0, but I can't back up my data and restore it. Can someone give me a link where this is explained clearly enough?
USER123
May 5, 2014, 04:19 PM • Last activity: Apr 24, 2025, 07:07 AM
4 votes
1 answer
804 views
Unable to send WHERE parameters to subquery
I have the following problem (in PostgreSQL in combination with PostGIS): I have the following nice query that clusters multiple records. Although it uses a PostGIS function, my problem is not PostGIS related, so bear with me ;)

SELECT count(*), ST_Centroid(ST_Collect("geom")) AS "geom"
FROM (
  SELECT kmeans(ARRAY[ST_X("geom"), ST_Y("geom")], 5) OVER (), "geom"
  FROM "table"
  WHERE "geom" && ST_GeomFromText("Bounding box") -- It's this line
) AS ksub
GROUP BY kmeans
ORDER BY kmeans;

Where "Bounding box" is a polygon. This gives me a result. So far so good. I use GeoServer, which retrieves the data from the database based on a bounding box. It does so by requesting all the records using the following query:

SELECT "count", "geom" FROM "table" WHERE "geom" && ST_GeomFromText("Bounding Box");

Where "Bounding Box" is again a polygon. I'm not able to change the structure of this query. The reason I need GeoServer is that it will convert the results to a different format (GeoJSON). Is it possible to write a view that sends the WHERE parameters (or the value "Bounding box") to the subquery? It's not possible to leave the WHERE statement outside of the subquery because the subquery has to calculate on the bounding box. If you put the WHERE statement outside of the subquery, the subquery will calculate the clusters for all the data and then the WHERE statement will limit it to the bounding box. This is not what I want. Or perhaps somebody knows how to rewrite the query so the WHERE statement from GeoServer will succeed?
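A hedged sketch of one workaround (a plain view would apply GeoServer's filter only after the clustering, as the question notes): wrap the clustering query in a set-returning function that takes the bounding box as a parameter, so the filter is applied inside the subquery. Whether GeoServer can be pointed at it, for instance via a parameterized SQL view, is a separate question; the function name is illustrative and the kmeans window function is the one the query above already uses.

CREATE OR REPLACE FUNCTION clustered_points(bbox_wkt text)
RETURNS TABLE (count bigint, geom geometry) AS $$
  SELECT count(*), ST_Centroid(ST_Collect("geom"))
  FROM (
    SELECT kmeans(ARRAY[ST_X("geom"), ST_Y("geom")], 5) OVER () AS kmeans, "geom"
    FROM "table"
    WHERE "geom" && ST_GeomFromText(bbox_wkt)
  ) AS ksub
  GROUP BY kmeans
  ORDER BY kmeans;
$$ LANGUAGE sql STABLE;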
kwarts (49 rep)
Mar 24, 2014, 12:22 PM • Last activity: Apr 22, 2025, 01:06 PM
0 votes
1 answer
542 views
How to make ST_DWithin to use meters for distance in Postgis?
I have already asked a similar question here: https://dba.stackexchange.com/questions/182839/find-the-nearest-geo-points-across-two-tables-with-postgis-or-without but now I am stuck on the correct geo-reference to calculate the distance in ST_DWithin. What I have done with my tables is create a geom column in this way:

ALTER TABLE my_table ADD COLUMN geom geometry(Point, 4326);
update my_table set geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
CREATE INDEX my_table_geom_idx ON my_table USING gist(geom);

With this setup, ST_DWithin takes its distance parameter in "degrees" instead of "meters". How can I set up the geom to work with meters? Is this the right setup?

ALTER TABLE my_table ADD COLUMN geom geometry(Point, 2163);
update my_table set geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 2163);
CREATE INDEX my_table_geom_idx ON my_table USING gist(geom);
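A minimal sketch of the geography route, offered here only as an illustration: keep the geometry(Point, 4326) column but cast to geography in the filter, so ST_DWithin takes its distance in meters; an expression index keeps the cast filter indexable. The sample point and the 5000 m radius are illustrative.

CREATE INDEX my_table_geog_idx ON my_table USING gist ((geom::geography));

SELECT *
FROM my_table
WHERE ST_DWithin(geom::geography,
                 ST_SetSRID(ST_MakePoint(-0.1276, 51.5072), 4326)::geography,
                 5000);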
Randomize (1203 rep)
Aug 25, 2017, 07:54 AM • Last activity: Apr 14, 2025, 04:02 PM
0 votes
2 answers
858 views
Creating a specific index tree structure from keys that have the tree-structure baked in? Indexing H3 Hexes
**Preface: I am NOT a DBA, please be gentle on my terminology and explanations.**

I have data that already has a structure baked into it and need to create an index in my DB that represents the hierarchical relationships in this data. These are actually indexes from Uber H3 (https://h3geo.org/docs/core-library/h3indexing). Each index is a 64-bit value, and a hierarchical relationship already exists within this index, something I wish to take advantage of. This data describes individual hexagons in the H3 hierarchical geospatial indexing system. Each hexagon has a unique index that describes itself and all 'parent' hexagons up to the base cells. Since each hexagon is subdivided into 7 more, and each of those into 7 more, this 64-bit index already has a tree structure baked into it.

## The Index:

The H3 index (described here) describes a hierarchical relationship between hexagons, starting with the 122 base cells that can then be recursively subdivided into 7 children. This data pretty much has a tree structure baked into it. The index (8d299d10145543f for example) describes the base-cell index and the index of each hexagon down to itself. There are 15 resolutions (I'm only using 13); each resolution is the subdivision of the resolution above. This exists as bits 20 -> 64, with each resolution having 3 bits with values between 0 -> 6 in the 64-bit index, and bits 13 -> 19 representing the 122 base cells.

**Index Example:**

* Index Hex: 8d299d10145543f
* Index Binary: 0000100011010010100110011101000100000001010001010101010000111111
* Index Hierarchy: 0010100 -> 110 -> 011 -> 101 -> 000 -> 100 -> 000 -> 001 -> 010 -> 001 -> 010 -> 101 -> 010 -> 000

As you can see, the 64-bit index already has a tree structure in it. I wish to utilize this in the DB index.

## I will use this how?

I have a need to store lots (billions) of cells at a specific resolution (13), with the primary key being the 64-bit H3 index. This will be read-heavy; write speed isn't a concern. I imagine that I'll have partitions for each of the 122 base cells, though that is a best guess on my part...

**Writing**

Writing would just be adding a few hundred thousand rows at a time. I'm not too concerned with write speed; for my read-heavy use case I could even do a once-a-day bulk write.

**Reading**

Reading is where things get a bit trickier. I will have all the keys I need beforehand, say 100k of them, and just need to retrieve those 100k rows. This gets somewhat messy (imagine a 100k-value IN() query...). Thankfully H3 provides a way for me to 'compress' this mass of individual indexes into their ideal parents (so I get an array of indexes at various resolutions that, when unpacked at my specific resolution (13), gives me the same 100k indexes back). This would leave me ~10k indexes at varying resolution levels to retrieve. I would then want to retrieve all rows that have these specific parents (regardless of res), which is already represented by the 3-bit res levels as shown in the examples above. For instance, retrieving every row that falls under the 0010100 -> 110 -> 011 -> 101 -> 000 -> 100 node.

My objective is to get this data for these indexes back as fast as possible, ideally several times faster than it would take me to re-generate it from scratch (~600 ms to generate ~150k cells and their data).

As for the DBMS, it will probably be PostgreSQL. I'm familiar with MySQL & MS SQL Server, but the latter has prohibitive licensing costs and the former seems like a poor fit vs Postgres. Please formulate answers with that in mind, **though if you have opinions on a more appropriate technology, I'm all ears.**

I do understand that the memory requirements for this will get quite steep... However, I'm willing to chew hundreds of gigs of RAM to get the read speeds I need. Right now I compute all the data on demand, and my computational needs are way too high; my aim is to reduce that by storing the computed data in a DB and retrieving it instead.

**My Questions in order of importance:**

1. Are my performance expectations reasonable?
2. How can I create an index that can efficiently retrieve data following the structure that already exists in the H3 index?
3. How do I target specific tree-levels to retrieve all rows that fall under that node in a query?
4. How do I handle queries that need to reference tens of thousands of keys?
Douglas Gaskell (131 rep)
Feb 6, 2021, 11:02 PM • Last activity: Apr 4, 2025, 05:03 AM
2 votes
1 answer
662 views
Installing PostGIS SFCGAL 1.4 extension in the latest official PostGIS docker image (Debian bullseye based)
I do need to use the PostGIS SFCGAL extension in its latest version (i.e. 1.4.1 at the time of writing). This extension is based on (and needs) the libsfcgal system library. Because the official PostGIS docker image available at https://registry.hub.docker.com/r/postgis/postgis/ (postgis/postgis:15-master) is based on Debian 11 (bullseye), the libsfcgal version is 1.3.9-2: https://packages.debian.org/bullseye/libsfcgal1 and the PostGIS extension reports accordingly:
-- Debian 11 (latest official PG15/PostGIS3.3 docker hub image)
SELECT postgis_full_version();

POSTGIS="3.4.0dev 3.3.0rc2-335-gae95dd219"
  [EXTENSION]
    PGSQL="150"
    GEOS="3.12.0dev-CAPI-1.18.0"
    SFCGAL="1.3.10"
    PROJ="9.2.0"
    LIBXML="2.9.10"
    LIBJSON="0.15"
    LIBPROTOBUF="1.3.3"
    WAGYU="0.5.0 (Internal)"
    TOPOLOGY
Well, actually 1.3.10, but the libsfcgal1 lib is currently not installed in the PostGIS Docker image (don't ask me why, nor where SFCGAL actually comes from... it's like that). Whereas, e.g. on Ubuntu 22.04.1:
# Ubuntu 22.04
POSTGIS="3.3.2 4975da8"
  [EXTENSION] 
    PGSQL="150"
    GEOS="3.10.2-CAPI-1.16.0"
    SFCGAL="SFCGAL 1.4.1, CGAL 5.3.1, BOOST 1.74.0"
    PROJ="8.2.1"
    LIBXML="2.9.13"
    LIBJSON="0.15"
    LIBPROTOBUF="1.3.3"
    WAGYU="0.5.0 (Internal)"
    TOPOLOGY
(LIBXML is also in an older state in the PostGIS Docker image than on Ubuntu. These two libs are the only one been older in the Docker image; all the other are more recent than on Ubuntu) On Debian (because PostGIS docker images are based on Debian), the version 1.4.1 is actually available in sid, aka testing: https://packages.debian.org/sid/libsfcgal1 I've tried to installed libsfcgal1 using the testing apt repository, but then I face some troubles, becase apparently, installing this new libsfcgal1 version also reset locales to some empty values, therefore, PostgreSQL cannot initialize a new database, neither connect properly to it:
# psql -U postgres

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_US.utf8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
psql (15.1 (Debian 15.1-1.pgdg110+1))
Type "help" for help.
This is blocking when spinning up a container based on a Dockerfile such as:
FROM postgis/postgis:15-master

RUN echo 'deb http://deb.debian.org/debian  testing main' >> /etc/apt/sources.list \
  && apt-get -y update \
  && DEBIAN_FRONTEND=noninteractive apt-get -y install \
  --no-install-recommends libsfcgal1
Then, when starting a vanilla Compose file based on this Docker image:
$ docker compose up 

[+] Running 3/0
 ⠿ Network project_lan_access  Created                                                                                                                                                                        0.0s
 ⠿ Volume "project_pgdata"     Created                                                                                                                                                                        0.0s
 ⠿ Container project-db-1      Created                                                                                                                                                                        0.0s
Attaching to project-db-1
project-db-1  | The files belonging to this database system will be owned by user "postgres".
project-db-1  | This user must also own the server process.
project-db-1  | 
project-db-1  | initdb: error: invalid locale settings; check LANG and LC_* environment variables
project-db-1 exited with code 1
project-db-1  | The files belonging to this database system will be owned by user "postgres".
project-db-1  | This user must also own the server process.
project-db-1  | 
project-db-1  | initdb: error: invalid locale settings; check LANG and LC_* environment variables
project-db-1 exited with code 1
project-db-1  | initdb: error: invalid locale settings; check LANG and LC_* environment variables
project-db-1  | The files belonging to this database system will be owned by user "postgres".
project-db-1  | This user must also own the server process.
project-db-1  | 
project-db-1 exited with code 1
^CGracefully stopping... (press Ctrl+C again to force)
[+] Running 0/1
 ⠙ Container project-db-1  Stopping                                                                                                                                                                           0.2s
canceled
(which I have to manually stop because it's looping forever...)

#### EDIT

By trying this Dockerfile:
FROM postgis/postgis:15-master

ENV LANGUAGE=en_US.UTF-8 # > /etc/apt/sources.list \
  && apt-get -y update \
  && DEBIAN_FRONTEND=noninteractive apt-get -y install \
  --no-install-recommends libsfcgal1
I have two new warnings (setlocale: LC_ALL: cannot change locale (en_US.UTF-8)) prior to the same error loop as before when spinning up the container:
$ docker compose up
[+] Running 2/2
 ⠿ Network project_lan_access  Created                                                                                                                                                                        0.0s
 ⠿ Container project-db-1      Created                                                                                                                                                                        0.1s
Attaching to project-db-1
project-db-1  | bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
project-db-1  | bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
project-db-1  | The files belonging to this database system will be owned by user "postgres".
project-db-1  | This user must also own the server process.
project-db-1  | 
project-db-1  | initdb: error: invalid locale settings; check LANG and LC_* environment variables
project-db-1 exited with code 1
project-db-1  | bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
project-db-1  | The files belonging to this database system will be owned by user "postgres".
project-db-1  | This user must also own the server process.
project-db-1  | 
project-db-1  | initdb: error: invalid locale settings; check LANG and LC_* environment variables
project-db-1 exited with code 1
project-db-1  | initdb: error: invalid locale settings; check LANG and LC_* environment variables
project-db-1  | The files belonging to this database system will be owned by user "postgres".
project-db-1  | This user must also own the server process.
project-db-1  | 
project-db-1 exited with code 1
^CGracefully stopping... (press Ctrl+C again to force)
[+] Running 1/0
 ⠿ Container project-db-1  Stopped                                                                                                                                                                            0.0s
canceled
Therefore, my question is: how to properly install the SFCGAL 1.4.1 extension (as it is available on Ubuntu 22.04 for example) in the latest PostGIS docker image?
s.k (424 rep)
Dec 23, 2022, 11:31 AM • Last activity: Mar 12, 2025, 12:06 PM
1 vote
1 answer
751 views
postgresql statement taking ages to cancel in pgAdmin III
I was running a query in pgAdmin III and it was taking a long time. After 9 minutes I decided to cancel it using the stop button. The UI stated that it was cancelling and stayed in that state for slightly over an hour before it reported as cancelled. For all that time the CPU was around 98-99%. The statement that was running was a call to a function. It was only part way through the function when it got stuck and all the time was taken on the following statement:

SELECT ST_UNION(shape) FROM descendant_areas INTO best_fit_areas;

where descendant_areas is a temporary table with no indexes containing nine records. Shape is of type geography.

Versions:
PostgreSQL = PostgreSQL 9.5.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
PostGIS = POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.9.2, 08 September 2015" GDAL="GDAL 2.0.2, released 2016/01/26" LIBXML="2.9.1" LIBJSON="0.12" TOPOLOGY RASTER

My question is: Is it normal for cancelling a query to take that long or is it indicative that there are config settings set up badly?
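A hedged sketch for checking on a stuck statement from another session (the pid is illustrative): pg_cancel_backend sends the same kind of cancel request as pgAdmin's stop button, while pg_terminate_backend is the harder stop if the cancel is ignored.

SELECT pid, state, now() - query_start AS runtime, left(query, 60) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;

SELECT pg_cancel_backend(12345);
-- SELECT pg_terminate_backend(12345);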
Andy Nichols (111 rep)
Jul 1, 2016, 11:21 AM • Last activity: Feb 17, 2025, 03:06 AM
0 votes
1 answer
408 views
How to move a PostgresSQL/ PostGIS to another drive
Firstly, I'd like to say I know little about databases (GIS background) and I might have bluffed it all so far. But we wanted to set up a database for use with QGIS. My understanding was that PostgreSQL makes the server and the PostGIS add-on makes the databases on that server. I'm not operating with a network or dedicated servers; I have a C drive and a Q drive. We installed PostgreSQL (servers?) and PostGIS (database?) on my Q drive, where I created and manipulated all my data. But now we want to move it all to a portable drive (E) so that others can access the database by plugging in the drive. Is this possible? And if so, how does someone who does not understand databases or SQL do it? Cheers, Matthew
Matt D (1 rep)
Jul 1, 2019, 11:23 PM • Last activity: Feb 10, 2025, 05:03 PM
0 votes
1 answer
4529 views
Are there limitations on using pg_dump on a materialized view such that the data won't be included in the results?
I have a database, call it **maps**, which contains 2 schemas, **a** & **b**. In each schema I have a number of tables: **a.t1**, **a.t2**, **b.t1**, **b.t2** (plus others). The column sets on each of these tables are different, but there are a number of columns in common. I have defined a materialised view, **a.mv**, which brings in the common columns of the 4 tables listed, including a geometry column (which represents a geographic outline). I want to back up the current contents of the view so that I can restore it on another server. To do this, I use this pg_dump command:

pg_dump -h hostname -U username -t a.mv -f mv.sql maps

What I get as a result is the SQL to define the table, but no data. The view definitely has data in it, because I can select from it in pgAdmin (it was created multiple days ago and the underlying tables haven't been changed since). I can dump the underlying tables, including data, with (e.g.)

pg_dump -h hostname -U username -t a.t1 -f t1.sql maps

but not the view. From the limited matches I've found by googling this, what I'm trying should work, but I'm wondering if the presence of a geometry column in the dump is causing the issue (this might be a complete red herring). FWIW, the total data in the view is fairly substantial - probably around 1 GB. However, I've dumped all the underlying tables in schema **a** successfully (including the 2 tables referenced by the view, and others), which was larger (1.5 GB). Any ideas what the issue could be here? On the face of it, what I'm trying to do should work, but it just doesn't, with no errors indicated.
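A hedged sketch, assuming the goal is simply to ship the view's current rows: pg_dump emits a REFRESH MATERIALIZED VIEW command for a materialized view rather than dumping its rows, so one option is to snapshot the rows into a plain table and dump that instead (the snapshot table name is illustrative).

CREATE TABLE a.mv_snapshot AS SELECT * FROM a.mv;
-- then: pg_dump -h hostname -U username -t a.mv_snapshot -f mv_snapshot.sql maps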
khafka (79 rep)
Feb 6, 2020, 11:36 AM • Last activity: Jan 8, 2025, 01:03 PM
0 votes
1 answer
69 views
Postgres slow query on large tables when doing a search radius search and joining on large tables with indices
I've been experiencing extremely slow queries despite having indices (hopefully the correct indices) on my tables. What I'm doing is searching for providers within a search radius as well as their associated insurances. Each table has 10 to 100s of millions of rows. Query (took over 8 minutes!!). After running it again
explain (analyze, buffers) SELECT
    p.id,
    p.npi,
    p.pqq_composite_score,
    jsonb_agg(pl.location_details)
    FROM provider.provider_location pl
    JOIN provider p on pl.provider_id = p.id
    JOIN provider_location_insurance pli ON pl.provider_id = pli.provider_id AND pl.location_id = pli.location_id
WHERE pli.insurance_id = 'c7d1b6aa-cb9a-44b9-8069-ce5a4476fdfa'
  AND pli.carrier_brand = 'some_brand_name'
  AND pl.confidence >= 3
  AND ST_DWithin(pl.geom, st_setsrid(ST_MakePoint(-87.6291, 41.8781), 4326), 1609.354 * 50)
group by p.id
limit 50
That query took about 8.5 minutes:
Limit  (cost=9626986.00..9626986.02 rows=1 width=67) (actual time=504568.135..504568.486 rows=50 loops=1)
"  Buffers: shared hit=10843973 read=3858153 dirtied=1838660 written=315789, temp read=32052 written=75066"
  I/O Timings: read=434866.551 write=3001.677
  ->  GroupAggregate  (cost=9626986.00..9626986.02 rows=1 width=67) (actual time=504568.134..504568.480 rows=50 loops=1)
        Group Key: p.id
"        Buffers: shared hit=10843973 read=3858153 dirtied=1838660 written=315789, temp read=32052 written=75066"
        I/O Timings: read=434866.551 write=3001.677
        ->  Sort  (cost=9626986.00..9626986.01 rows=1 width=588) (actual time=504568.088..504568.104 rows=63 loops=1)
              Sort Key: p.id
              Sort Method: external merge  Disk: 600504kB
"              Buffers: shared hit=10843973 read=3858153 dirtied=1838660 written=315789, temp read=32052 written=75066"
              I/O Timings: read=434866.551 write=3001.677
              ->  Nested Loop  (cost=20026.46..9626985.99 rows=1 width=588) (actual time=2945.990..501666.304 rows=895360 loops=1)
                    Join Filter: (pl.provider_id = p.id)
                    Buffers: shared hit=10843970 read=3858153 dirtied=1838660 written=315789
                    I/O Timings: read=434866.551 write=3001.677
                    ->  Nested Loop  (cost=20026.03..9626980.39 rows=1 width=585) (actual time=2945.398..404746.262 rows=895360 loops=1)
                          Buffers: shared hit=7739177 read=3333012 dirtied=1838660 written=273897
                          I/O Timings: read=347264.324 write=2578.856
"                          ->  Bitmap Heap Scan on ""provider_location_insurance_some_brand_name"" pli  (cost=20025.45..3845940.02 rows=1833404 width=32) (actual time=2941.360..115985.754 rows=1846415 loops=1)"
                                Recheck Cond: (insurance_id = 'c7d1b6aa-cb9a-44b9-8069-ce5a4476fdfa'::uuid)
                                Filter: (carrier_brand = 'some_brand_name'::text)
                                Heap Blocks: exact=1838660
                                Buffers: shared read=1840114 dirtied=1838660 written=154012
                                I/O Timings: read=86001.337 write=1467.246
"                                ->  Bitmap Index Scan on ""provider_location_insurance_Aetna_insurance_id_idx""  (cost=0.00..19567.10 rows=1833404 width=0) (actual time=2320.226..2320.226 rows=1846415 loops=1)"
                                      Index Cond: (insurance_id = 'c7d1b6aa-cb9a-44b9-8069-ce5a4476fdfa'::uuid)
                                      Buffers: shared read=1454
                                      I/O Timings: read=1858.023
                          ->  Memoize  (cost=0.58..31.29 rows=1 width=585) (actual time=0.155..0.155 rows=0 loops=1846415)
"                                Cache Key: pli.provider_id, pli.location_id"
                                Cache Mode: logical
                                Hits: 0  Misses: 1846415  Evictions: 1248029  Overflows: 0  Memory Usage: 256001kB
                                Buffers: shared hit=7739177 read=1492898 written=119885
                                I/O Timings: read=261262.987 write=1111.610
                                ->  Index Scan using provider_location_provider_id_location_id_key on provider_location pl  (cost=0.57..31.28 rows=1 width=585) (actual time=0.153..0.153 rows=0 loops=1846415)
                                      Index Cond: ((provider_id = pli.provider_id) AND (location_id = pli.location_id))
"                                      Filter: ((confidence >= 3) AND st_dwithin(geom, '0101000020E6100000787AA52C43E855C00E4FAF9465F04440'::geometry, '80467.7'::double precision))"
                                      Rows Removed by Filter: 1
                                      Buffers: shared hit=7739177 read=1492898 written=119885
                                      I/O Timings: read=261262.987 write=1111.610
                    ->  Index Scan using provider_id_idx on provider p  (cost=0.43..5.59 rows=1 width=35) (actual time=0.106..0.106 rows=1 loops=895360)
                          Index Cond: (id = pli.provider_id)
                          Buffers: shared hit=3104793 read=525141 written=41892
                          I/O Timings: read=87602.227 write=422.821
Planning:
  Buffers: shared hit=1049 read=43 dirtied=4
  I/O Timings: read=16.139
Planning Time: 21.215 ms
Execution Time: 504654.267 ms
Provider Location table definition which contains ~75 million rows:
Table "provider.provider_location"
         Column         |           Type           | Collation | Nullable |                                     Default
------------------------+--------------------------+-----------+----------+----------------------------------------------------------------------------------
 id                     | uuid                     |           | not null | gen_random_uuid()
 provider_id            | uuid                     |           | not null |
 location_id            | uuid                     |           | not null |
 created_at             | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
 updated_at             | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
 confidence             | integer                  |           |          |
 latitude               | double precision         |           |          |
 longitude              | double precision         |           |          |
 location_details       | jsonb                    |           |          |
 geom                   | geometry                 |           |          | generated always as (st_setsrid(st_makepoint(longitude, latitude), 4326)) stored
Indexes:
    "provider_location_pkey" PRIMARY KEY, btree (id)
    "provider_location_location_id_idx" hash (location_id)
    "provider_location_provider_id_idx" hash (provider_id)
    "provider_location_provider_id_location_id_key" UNIQUE CONSTRAINT, btree (provider_id, location_id
)
    "idx_provider_confidence_geom_idx" gist (geom, confidence)
    "idx_provider_location_geom" gist (geom) CLUSTER
    "provider_provider_location_confidence" btree (confidence)
And the Provider Location insurance table contains ~950 million rows. That table is partitioned by carrier_brand (a total of 135 partitions)
Partitioned table "provider.provider_location_insurance"
    Column     |           Type           | Collation | Nullable |      Default
---------------+--------------------------+-----------+----------+-------------------
 provider_id   | uuid                     |           | not null |
 location_id   | uuid                     |           | not null |
 insurance_id  | uuid                     |           | not null |
 carrier_brand | text                     |           | not null |
 created_at    | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
 updated_at    | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
Partition key: LIST (carrier_brand)
Indexes:
    "provider_location_insurance_pk" PRIMARY KEY, btree (provider_id, location_id, insurance_id, carrier_brand)
    "idx_provider_location_insurance_carrier" btree (carrier_brand)
    "idx_provider_location_insurance_insurance_id" btree (insurance_id)
    "idx_provider_location_insurance_location_id" btree (location_id)
    "idx_provider_location_insurance_provider_id" btree (provider_id)
My postgres instance has the following configs:
PostgreSQL version: 14
vCPU: 4
Memory 32GB

shared_buffers="25684MB"
work_mem="250MB"
max_parallel_workers_per_gather=2
effective_cache_size="26300784kB"
What am I doing wrong here? Do I need to create a separate "search" table that would avoid all of the joins on these big tables and then have the correct indices on each column? From the plan it looks like the indices are being used. I've also CLUSTERed the geometry column to see if that would make a difference, but I'm starting to run out of ideas.
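A hedged sketch, not a guaranteed fix: in the plan above, most of the time goes into fetching ~1.8M heap rows from the insurance partition and then probing provider_location once per row. A covering index on the partitioned table would let the insurance_id probe return the join keys without those heap fetches; the index name is illustrative and whether the planner actually prefers it would need to be re-checked with EXPLAIN.

CREATE INDEX idx_pli_insurance_cover
    ON provider.provider_location_insurance (insurance_id)
    INCLUDE (provider_id, location_id);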
blin (3 rep)
Dec 6, 2024, 09:23 PM • Last activity: Dec 7, 2024, 03:14 AM
0 votes
1 answer
502 views
Why does PostGIS create an ugly table in my database called "spatial_ref_sys"?
I need to use PostGIS, and do use it, but it bothers me that it created this ugly "spatial_ref_sys" table in my database, in the main (public) schema, where I keep all my own tables. It looks ugly. It has also added a trillion "functions", which is not a big problem for me since I don't have any functions of my own, or use them, but the same principle applies there as well. Why, given that it's an extension/"PG library" of sorts, doesn't it at least use its own schema, so as to separate itself from my own stuff? I assume that there is a very good reason, but I can't figure it out.
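A hedged sketch (schema and database names are illustrative): PostGIS can be installed into a dedicated schema when the extension is first created (it is not relocatable afterwards), which keeps spatial_ref_sys and the function catalogue out of public; the search_path setting keeps the functions callable without qualification.

CREATE SCHEMA postgis;
CREATE EXTENSION postgis SCHEMA postgis;
ALTER DATABASE mydb SET search_path = "$user", public, postgis;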
Ozro Madison (1 rep)
Feb 22, 2020, 10:56 PM • Last activity: Oct 28, 2024, 03:07 AM