
Parallel PostgreSQL Queries with R

1 vote
2 answers
1387 views
I'm running a large query that, for various reasons, I've broken into a series of smaller queries. The example below just shows how the query is broken up by `id`; the actual query is much more complex:

```sql
UPDATE target_table SET col1_target = col1_source FROM source_table WHERE id >= 0 AND id < 10;
UPDATE target_table SET col1_target = col1_source FROM source_table WHERE id >= 10 AND id < 20;
UPDATE target_table SET col1_target = col1_source FROM source_table WHERE id >= 20 AND id < 30;
```

In R, I wrap each chunk in a function and run the queries in sequence:

```r
library(RPostgreSQL)

query.func <- function(num) {
  num2 <- num + 10
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, dbname = "mydb")  # placeholder connection details
  q1 <- paste('UPDATE target_table SET col1_target = col1_source ',
              'FROM source_table WHERE id >= ', num, ' AND id < ', num2,
              sep = "")
  dbSendQuery(con, q1)
  dbDisconnect(con)
}

num <- seq(0, 90, by = 10)  # lower bound of each id chunk
lapply(num, query.func)
```

Rather than having this long sequence of queries run one after another, I wanted to take advantage of my server's multiple CPUs and run them in parallel. Using the R library `parallel`, I've created a cluster and sent multiple queries to it simultaneously:

```r
library(parallel)

no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
clusterExport(cl, "query.func")
clusterExport(cl, "num")
clusterEvalQ(cl, library(RPostgreSQL))
parLapply(cl, num, query.func)
stopCluster(cl)
```

where `query.func` is defined as above. When I run this R script at the terminal I receive no errors, and if I check `pg_stat_activity` I see a list of active queries, each incrementally operating on a separate block of data. However, when I use `top` to check the resource management on my system, I still only see one postgres process, and when I look at the CPU usage, I only see one CPU active at a time.

Are these queries really running simultaneously on separate CPUs? My understanding was that, prior to 9.6, a single postgres query could not be split across multiple cores, but each connection could utilize a separate core (related question). Does the process I've outlined above open multiple connections and send a query to the database via each connection?

PostgreSQL 9.3 / Ubuntu 14.04 / R 3.3.2
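One way to verify whether the workers hold separate connections is to query `pg_stat_activity` from a second session while `parLapply` is running; each connection should appear as its own backend with its own PID. A minimal sketch (the `mydb` database name and connection details are placeholders):

```r
library(RPostgreSQL)

# Run from a second R session while the parallel job is active.
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "mydb")  # placeholder connection details

# On 9.3, pg_stat_activity exposes one row per backend, with its OS pid.
backends <- dbGetQuery(con, "
    SELECT pid, state, query
    FROM pg_stat_activity
    WHERE query LIKE 'UPDATE target_table%';
")
print(backends)  # one row per worker connection, each a separate backend

dbDisconnect(con)
```

If several distinct `pid` values show up here, that many separate postgres backend processes are doing work. In `top`, pressing `1` switches to a per-core view, and backends that are waiting on I/O or on row locks can make it look as if only one CPU is busy at a time.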
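A second check, sketched under the same placeholder connection details: have each worker return `pg_backend_pid()`, so `parLapply` itself reports which backend handled each chunk. Distinct PIDs would confirm one connection, and hence one backend process, per worker. The `query.func.pid` name is hypothetical:

```r
# Variant of query.func that also returns the backend PID serving it.
query.func.pid <- function(num) {
  num2 <- num + 10
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, dbname = "mydb")  # placeholder connection details
  pid <- dbGetQuery(con, "SELECT pg_backend_pid() AS pid;")$pid
  q1 <- paste("UPDATE target_table SET col1_target = col1_source ",
              "FROM source_table WHERE id >= ", num, " AND id < ", num2,
              sep = "")
  dbSendQuery(con, q1)
  dbDisconnect(con)
  pid  # returned to the master process by parLapply
}

# unlist(parLapply(cl, num, query.func.pid)) would give one PID per chunk.
```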
Asked by Matt (291 rep)
Jan 9, 2017, 06:20 PM
Last activity: Dec 8, 2021, 08:31 AM