For the purpose of the test I created the following table:
CREATE TABLE measurement
(
  measurement_id     bigint NOT NULL,
  valid_ts           timestamp with time zone NOT NULL,
  measurement_value  numeric(19,4) NOT NULL,
  CONSTRAINT pk_mv_raw PRIMARY KEY (measurement_id, valid_ts)
)
WITH (OIDS=FALSE);
I decided to test the insertion of 1000 records into the table. The data for the records was generated before running any of the test methods (a sketch of that setup follows the list below). Four test methods were created to reflect the usual approaches:
- VSI (Very Stupid Inserts) - executing queries made of concatenated Strings one by one
- SPI (Stupid Prepared Inserts) - similar to VSI but using prepared statements
- BPI (Batched Prepared Inserts) - prepared inserts, executed in batches of various lengths
- CPI (Copy Inserts) - inserts based on COPY FROM, executed in batches of various lengths
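The test data lives in three plain arrays shared by all four methods. A minimal sketch of that setup, with illustrative values I made up here (the actual generator is in the source code linked at the end):

// Sketch of the test-data arrays assumed by the snippets below; the concrete
// values are illustrative. Needs java.sql.Timestamp, java.math.BigDecimal
// and java.util.Random.
int testSize = 1000;
int batchSize = 50;    // varied between runs; 50 matches the original application
long[] measurementIds = new long[testSize];
Timestamp[] timestamps = new Timestamp[testSize];
BigDecimal[] values = new BigDecimal[testSize];
Random random = new Random(42);    // fixed seed keeps runs comparable
for (int i = 0; i < testSize; i++) {
    measurementIds[i] = i;
    timestamps[i] = new Timestamp(System.currentTimeMillis() + i * 1000L);
    values[i] = BigDecimal.valueOf(random.nextInt(1000000), 4);    // scale 4, matching numeric(19,4)
}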
VSI
Statement insert = conn.createStatement();
for (int i = 0; i < testSize; i++) {
    // build the whole statement by string concatenation, one round trip per row
    String insertSQL = "insert into measurement values ("
            + measurementIds[i] + ",'" + timestamps[i] + "'," + values[i] + ")";
    insert.execute(insertSQL);
}
SPI
PreparedStatement insert = conn.prepareStatement("insert into measurement values (?,?,?)");
for (int i = 0; i < testSize; i++) {
    insert.setLong(1, measurementIds[i]);
    insert.setTimestamp(2, timestamps[i]);
    insert.setBigDecimal(3, values[i]);
    insert.execute();    // still one round trip per row
}
BPI
PreparedStatement insert = conn.prepareStatement("insert into measurement values (?,?,?)");
for (int i = 0; i < testSize; i++) {
    insert.setLong(1, measurementIds[i]);
    insert.setTimestamp(2, timestamps[i]);
    insert.setBigDecimal(3, values[i]);
    insert.addBatch();
    if ((i + 1) % batchSize == 0) {    // flush every full batch
        insert.executeBatch();
    }
}
insert.executeBatch();    // flush the remaining partial batch
CPI
StringBuilder sb = new StringBuilder();
CopyManager cpManager = ((PGConnection) conn).getCopyAPI();
PushbackReader reader = new PushbackReader(new StringReader(""), 10000);
for (int i = 0; i < testSize; i++) {
    // build one CSV line per record
    sb.append(measurementIds[i]).append(",'")
      .append(timestamps[i]).append("',")
      .append(values[i]).append("\n");
    if ((i + 1) % batchSize == 0) {    // flush every full batch
        reader.unread(sb.toString().toCharArray());
        cpManager.copyIn("COPY measurement FROM STDIN WITH CSV", reader);
        sb.delete(0, sb.length());
    }
}
if (sb.length() > 0) {    // flush the remaining partial batch
    reader.unread(sb.toString().toCharArray());
    cpManager.copyIn("COPY measurement FROM STDIN WITH CSV", reader);
}
I hoped for some improvement from using COPY FROM instead of batched inserts, but did not expect a big gain. The results were a pleasant surprise. For a batch size of 50 (as defined in the original application I wanted to improve), COPY FROM gave a 40% improvement. I expect further gains when the data comes in as a stream, which would skip the StringBuilder-with-PushbackReader exercise; a sketch of that variant follows.
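A minimal sketch of that streaming variant, assuming the same connection and arrays as above and using the driver's lower-level CopyIn handle (org.postgresql.copy.CopyIn) instead of the PushbackReader:

// Sketch only: stream each row into COPY as it is produced, with no
// intermediate StringBuilder buffering.
CopyManager cpManager = ((PGConnection) conn).getCopyAPI();
CopyIn copyIn = cpManager.copyIn("COPY measurement FROM STDIN WITH CSV");
try {
    for (int i = 0; i < testSize; i++) {
        // the rows are plain ASCII, so the default charset is safe here
        byte[] row = (measurementIds[i] + ",'" + timestamps[i] + "',"
                + values[i] + "\n").getBytes();
        copyIn.writeToCopy(row, 0, row.length);
    }
    copyIn.endCopy();      // completes the COPY operation on the server
} catch (SQLException e) {
    copyIn.cancelCopy();   // abort the COPY so the connection stays usable
    throw e;
}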
See the graphs for yourself - the number following the method abbreviation is the batch size.
[Graph: Average time in milliseconds]
[Graph: All the 200 runs individually]
source code for the "benchmark"
Very useful analysis, thanks a lot!
Have a look at https://github.com/eclecticlogic/pedal-dialect. It's a framework that allows the copy command to work directly with JPA entities.
Very interesting, thanks!
Thank you for this analysis, it's a lifesaver!
Creating the table as TEMPORARY or even UNLOGGED might lead to a significant speedup. It can also be used very efficiently for further computation/data manipulation with user-defined indexing.
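To illustrate that suggestion, a minimal sketch of a staging-table variant of the CPI approach; the staging-table name is made up, and conn and reader are the same as in the CPI snippet:

// Sketch only: COPY into an UNLOGGED staging table (UNLOGGED tables skip
// write-ahead logging), then move the rows into the real table in one statement.
// "measurement_staging" is a hypothetical name for this example.
try (Statement st = conn.createStatement()) {
    st.execute("CREATE UNLOGGED TABLE measurement_staging (LIKE measurement)");
}
CopyManager cpManager = ((PGConnection) conn).getCopyAPI();
cpManager.copyIn("COPY measurement_staging FROM STDIN WITH CSV", reader);
try (Statement st = conn.createStatement()) {
    st.execute("INSERT INTO measurement SELECT * FROM measurement_staging");
    st.execute("DROP TABLE measurement_staging");
}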
Great article, thanks!