
Low-latency bulk row transfer on a PostgreSQL server

25 Feb 2019

Some applications require bulk reads and writes of large collections of rows in a table on a PostgreSQL server. If the collection of rows is patternless (the keys don’t fall into any contiguous range), it would seem that a separate round-trip query is required for each row. Fortunately, this is not the case: large operations of this kind can be done with a small, fixed number of round-trips.

Suppose we have the following simple table:

CREATE TABLE data (
    key    VARCHAR(128) PRIMARY KEY,
    value  BYTEA
);

We can do efficient bulk operations on it using a combination of temporary tables and PostgreSQL’s COPY operation. On connection to the database, we create two temporary tables:

CREATE TEMPORARY TABLE __key_buffer (
    key    VARCHAR(128) PRIMARY KEY
);

CREATE TEMPORARY TABLE __write_buffer (
    key    VARCHAR(128) PRIMARY KEY,
    value  BYTEA
);

These tables are private to the connection and will be dropped on disconnect.
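
For illustration, here’s roughly how the per-connection setup might look from a client. This is a minimal sketch assuming Python with psycopg2; the connection string is a placeholder, not part of the scheme itself:

import psycopg2

# Hypothetical connection parameters; substitute your own.
conn = psycopg2.connect("dbname=test")

with conn.cursor() as cur:
    # Temporary tables are private to this connection and vanish
    # when it closes.
    cur.execute("""
        CREATE TEMPORARY TABLE __key_buffer (
            key    VARCHAR(128) PRIMARY KEY
        )""")
    cur.execute("""
        CREATE TEMPORARY TABLE __write_buffer (
            key    VARCHAR(128) PRIMARY KEY,
            value  BYTEA
        )""")
conn.commit()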

For testing and analysis, you may get better results by disabling query planner optimizations:

SET enable_bitmapscan = false;
SET enable_hashagg    = false;
SET enable_hashjoin   = false;
SET enable_material   = false;
SET enable_mergejoin  = false;
SET enable_sort       = false;

This will cause the query planner to stick to naive plans, which should nevertheless be asymptotically efficient.

You may also need to run VACUUM ANALYZE at least once after putting some data into the table, just so the query planner knows that it’s bigger than the temporary tables. Otherwise, the planner ends up scanning the big table and doing B-Tree lookups on the small one, rather than vice versa.
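
One wrinkle if you script this: VACUUM cannot run inside a transaction block, and psycopg2 opens transactions implicitly, so you’d need autocommit around that one statement. A sketch, continuing from the setup above:

# VACUUM refuses to run inside a transaction block, so switch the
# connection to autocommit while it runs.
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("VACUUM ANALYZE data")
conn.autocommit = False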

Bulk read

To perform a bulk read, we fill the key buffer with a list of keys we want to fetch data for, and then perform an inner join on the data table we want to read from:

TRUNCATE __key_buffer;
COPY __key_buffer (key) FROM STDIN WITH (FORMAT BINARY)
-- ... upload PGCOPY file ...
-- EOF
SELECT data.key, data.value FROM data
  INNER JOIN __key_buffer ON __key_buffer.key = data.key;

This requires three network round-trips: one each for the TRUNCATE, the COPY and the final SELECT. The SELECT is as efficient as you’d expect. Analysing it with the planner optimizations disabled shows:

                            QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=0.28..138.38 rows=270 width=4)
   ->  Seq Scan on __key_buffer  (cost=0.00..12.70 rows=270 width=274)
   ->  Index Scan using data_pkey on data  (cost=0.28..0.46 rows=1 width=4)
         Index Cond: ((key)::text = (__key_buffer.key)::text)

This is analogous to the most obvious efficient implementation when doing the same operation on in-memory data structures: for each item in the list of things you want to read, do a logarithmic-time B-Tree lookup.
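
To make the round-trip structure explicit, here is a client-side sketch of the whole bulk read, again assuming psycopg2 and the conn from the setup sketch. The PGCOPY binary format is simple to generate by hand: an 11-byte signature, two 32-bit header fields, then a 16-bit field count per tuple followed by 32-bit-length-prefixed field values, and a 16-bit -1 trailer. The helper names here are mine, not library functions:

import io
import struct

PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"

def pgcopy_encode(rows):
    """Encode an iterable of tuples of bytes objects as a PGCOPY
    binary stream suitable for COPY ... WITH (FORMAT BINARY)."""
    buf = io.BytesIO()
    buf.write(PGCOPY_SIGNATURE)
    buf.write(struct.pack("!ii", 0, 0))   # flags, header extension length
    for row in rows:
        buf.write(struct.pack("!h", len(row)))        # field count
        for field in row:
            buf.write(struct.pack("!i", len(field)))  # field length (-1 = NULL)
            buf.write(field)
    buf.write(struct.pack("!h", -1))      # end-of-data trailer
    buf.seek(0)
    return buf

def bulk_read(conn, keys):
    """Fetch (key, value) rows for an arbitrary set of keys in
    three round-trips: TRUNCATE, COPY, SELECT."""
    with conn.cursor() as cur:
        cur.execute("TRUNCATE __key_buffer")
        cur.copy_expert(
            "COPY __key_buffer (key) FROM STDIN WITH (FORMAT BINARY)",
            pgcopy_encode((k.encode("utf-8"),) for k in keys))
        cur.execute("SELECT data.key, data.value FROM data"
                    " INNER JOIN __key_buffer ON __key_buffer.key = data.key")
        return cur.fetchall()

For VARCHAR and BYTEA columns the binary field payload is just the raw bytes, which is why the encoder can stay this short; other column types have their own binary send formats.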

Bulk delete

To do a bulk deletion, fill the key buffer with a list of keys to delete, and then delete them in a single command:

TRUNCATE __key_buffer;
COPY __key_buffer (key) FROM STDIN WITH (FORMAT BINARY)
-- ... upload PGCOPY file ...
-- EOF
DELETE FROM data USING __key_buffer WHERE data.key = __key_buffer.key;

The naive plan is again what you’d hope for:

                                   QUERY PLAN
----------------------------------------------------------------------------------
 Delete on data  (cost=0.28..138.38 rows=270 width=12)
   ->  Nested Loop  (cost=0.28..138.38 rows=270 width=12)
         ->  Seq Scan on __key_buffer  (cost=0.00..12.70 rows=270 width=280)
         ->  Index Scan using data_pkey on data  (cost=0.28..0.46 rows=1 width=9)
               Index Cond: ((key)::text = (__key_buffer.key)::text)
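
The client-side driver has the same shape as the read, reusing the pgcopy_encode sketch from above:

def bulk_delete(conn, keys):
    """Delete an arbitrary set of keys in three round-trips."""
    with conn.cursor() as cur:
        cur.execute("TRUNCATE __key_buffer")
        cur.copy_expert(
            "COPY __key_buffer (key) FROM STDIN WITH (FORMAT BINARY)",
            pgcopy_encode((k.encode("utf-8"),) for k in keys))
        cur.execute("DELETE FROM data USING __key_buffer"
                    " WHERE data.key = __key_buffer.key")
    conn.commit()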

Bulk write

Bulk inserts can be handled by a direct COPY to the table in question, provided you know that none of the key/value pairs you want to insert duplicate keys that already exist.
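
In client terms that is a single COPY straight into the target table; a sketch, with values assumed to already be bytes:

def bulk_insert(conn, pairs):
    """Insert (key, value) pairs known not to duplicate existing keys."""
    with conn.cursor() as cur:
        cur.copy_expert(
            "COPY data (key, value) FROM STDIN WITH (FORMAT BINARY)",
            pgcopy_encode((k.encode("utf-8"), v) for k, v in pairs))
    conn.commit()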

A bulk “put” operation (where we want to overwrite existing data) is slightly more complicated if you’re using a version of PostgreSQL prior to 9.5, which lacks INSERT ... ON CONFLICT:

TRUNCATE __write_buffer;
COPY __write_buffer (key, value) FROM STDIN WITH (FORMAT BINARY)
-- ... upload PGCOPY file ...
-- EOF

BEGIN;
UPDATE data SET value=__write_buffer.value FROM __write_buffer
  WHERE data.key=__write_buffer.key;
DELETE FROM __write_buffer USING data WHERE __write_buffer.key=data.key;
INSERT INTO data (key, value) SELECT key, value FROM __write_buffer;
COMMIT;
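
From the client this is still a fixed number of round-trips regardless of how many rows are written. A sketch, once more reusing the hypothetical pgcopy_encode helper:

def bulk_put(conn, pairs):
    """Insert-or-overwrite (key, value) pairs, pre-9.5 style."""
    with conn.cursor() as cur:
        # psycopg2 opens a transaction implicitly; commit() below
        # plays the role of the explicit BEGIN/COMMIT.
        cur.execute("TRUNCATE __write_buffer")
        cur.copy_expert(
            "COPY __write_buffer (key, value) FROM STDIN WITH (FORMAT BINARY)",
            pgcopy_encode((k.encode("utf-8"), v) for k, v in pairs))
        cur.execute("UPDATE data SET value = __write_buffer.value"
                    " FROM __write_buffer WHERE data.key = __write_buffer.key")
        cur.execute("DELETE FROM __write_buffer USING data"
                    " WHERE __write_buffer.key = data.key")
        cur.execute("INSERT INTO data (key, value)"
                    " SELECT key, value FROM __write_buffer")
    conn.commit()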

For PostgreSQL 9.5 and later, you can omit the UPDATE and DELETE, and just add an ON CONFLICT clause to the final INSERT operation:

INSERT INTO data (key, value) SELECT key, value FROM __write_buffer
  ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;
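
With that, the client-side put collapses to a sketch like this:

def bulk_put_upsert(conn, pairs):
    """Insert-or-overwrite using ON CONFLICT (PostgreSQL 9.5+)."""
    with conn.cursor() as cur:
        cur.execute("TRUNCATE __write_buffer")
        cur.copy_expert(
            "COPY __write_buffer (key, value) FROM STDIN WITH (FORMAT BINARY)",
            pgcopy_encode((k.encode("utf-8"), v) for k, v in pairs))
        cur.execute("INSERT INTO data (key, value)"
                    " SELECT key, value FROM __write_buffer"
                    " ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value")
    conn.commit()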

On the other hand, if you know that every key you want to modify already exists in the destination table, you can just omit the DELETE and INSERT operations.

The UPDATE operation performs one B-Tree lookup on the data table for every row in the write buffer, as expected:

                                   QUERY PLAN
----------------------------------------------------------------------------------
 Update on data  (cost=0.28..129.00 rows=240 width=47)
   ->  Nested Loop  (cost=0.28..129.00 rows=240 width=47)
         ->  Seq Scan on __write_buffer  (cost=0.00..12.40 rows=240 width=312)
         ->  Index Scan using data_pkey on data  (cost=0.28..0.48 rows=1 width=9)
               Index Cond: ((key)::text = (__write_buffer.key)::text)

The DELETE operation is similar:

                                   QUERY PLAN
----------------------------------------------------------------------------------
 Delete on __write_buffer  (cost=0.28..129.00 rows=240 width=12)
   ->  Nested Loop  (cost=0.28..129.00 rows=240 width=12)
         ->  Seq Scan on __write_buffer  (cost=0.00..12.40 rows=240 width=280)
         ->  Index Scan using data_pkey on data  (cost=0.28..0.48 rows=1 width=9)
               Index Cond: ((key)::text = (__write_buffer.key)::text)

The query plan for the final INSERT is trivial:

                               QUERY PLAN
-------------------------------------------------------------------------
 Insert on data  (cost=0.00..12.40 rows=240 width=306)
   ->  Seq Scan on __write_buffer  (cost=0.00..12.40 rows=240 width=306)