I'm running a large query that, for various reasons, I've broken into a series of smaller queries. The example below is just to show how the query is broken up by id; the actual query is much more complex:
UPDATE target_table SET col1_target = col1_source FROM source_table WHERE id >= 0 AND id < 10;
UPDATE target_table SET col1_target = col1_source FROM source_table WHERE id >= 10 AND id < 20;

library(RPostgreSQL)

query.func <- function(num){
  num2 <- num + 10  # each block spans 10 ids
  con <- dbConnect(PostgreSQL(), dbname = "mydb")  # placeholder connection details
  q1 <- paste('UPDATE target_table SET col1_target = col1_source FROM source_table WHERE id >= ', num, ' AND id < ', num2, sep = "")
  dbSendQuery(con, q1)
  dbDisconnect(con)
}
lapply(num, query.func)
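Here num is the vector of lower block boundaries passed to each query; the exact values aren't important, but it's assumed to be something like:

num <- seq(0, 90, by = 10)  # illustrative boundaries: 0, 10, 20, ...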
Rather than having this long sequence of queries run one after another, I wanted to take advantage of my server's multiple CPUs and run them in parallel. Using the R library parallel, I've created a cluster and sent multiple queries to it simultaneously:
library(parallel)

no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
clusterExport(cl, "query.func")
clusterExport(cl, "num")
clusterEvalQ(cl, library(RPostgreSQL))  # load the driver on each worker
parLapply(cl, num, query.func)
stopCluster(cl)
where query.func is defined as above.
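As a sanity check (not part of the original script), the cluster can be confirmed to consist of separate R processes, each of which will open its own database connection inside query.func:

# Should print one distinct PID per worker; run before stopCluster(cl)
clusterEvalQ(cl, Sys.getpid())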
When I run this R script at the terminal I receive no errors, and if I check pg_stat_activity I see a list of active queries, each incrementally operating on a separate block of data. However, when I use top to check resource usage on the system, I still see only one postgres process, and only one CPU active at a time.
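For reference, this is roughly how I check pg_stat_activity from a separate session (column names per PostgreSQL 9.3; connection details are placeholders):

library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), dbname = "mydb")  # placeholder connection details
# One row per backend; each worker connection should show up here
dbGetQuery(con, "SELECT pid, state, query FROM pg_stat_activity WHERE state = 'active'")
dbDisconnect(con)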
Are these queries really running simultaneously on separate CPUs? My understanding was that, prior to 9.6, a single postgres query could not be split across multiple cores, but each connection could utilize a separate core (related question). Does the process I've outlined above open multiple connections and send a query to the database via each connection?
PostgreSQL 9.3 / Ubuntu 14.04 / R 3.3.2
Asked by Matt
(291 rep)
Jan 9, 2017, 06:20 PM
Last activity: Dec 8, 2021, 08:31 AM