In the last post of my “Loading data fast” series I showed how DML error logging can be used to prevent batch loads from failing when one or more rows can’t be inserted. The question now is how much performance impact errors have on this mechanism. What I want to show is that a developer should not just blindly use DML error logging instead of thinking about whether the data he intends to insert is valid in the first place. So let’s have a look:
First I create the same simple table with a primary key on it that I used before:
CREATE TABLE TESTLOADTABLE (id NUMBER, text VARCHAR2(255));
table TESTLOADTABLE created.

CREATE UNIQUE INDEX TESTLOADTABLE_PK ON TESTLOADTABLE(id);
unique index TESTLOADTABLE_PK created.

ALTER TABLE TESTLOADTABLE ADD PRIMARY KEY (id) USING INDEX TESTLOADTABLE_PK;
table TESTLOADTABLE altered.
Then I create the error logging table:
BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG('TESTLOADTABLE','ERR_TESTLOADTABLE');
END;
anonymous block completed
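For reference: the generated ERR_TESTLOADTABLE consists of five control columns, ORA_ERR_NUMBER$ (the Oracle error number), ORA_ERR_MESG$ (the error message), ORA_ERR_ROWID$ (the rowid, for updates and deletes), ORA_ERR_OPTYP$ (the operation type) and ORA_ERR_TAG$ (an optional tag you can set in the LOG ERRORS clause), followed by a VARCHAR2(4000) copy of every column of the base table. So each rejected row is preserved together with the error that caused its rejection.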
So far so good. Now let’s execute the following tests:
1) Load 10k rows with no errors
2) Load 10k rows with 100 (1%) rows failing
3) Load 10k rows with 1000 (10%) rows failing
4) Load 10k rows with 2500 (25%) rows failing
5) Load 10k rows with 5000 (50%) rows failing
6) Load 10k rows with 7500 (75%) rows failing
And just out of curiosity, let’s see how long a regular insert with no error logging takes:
7) Load 10k rows with no errors and no error logging
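A note on the test setup: all snippets below assume an already opened java.sql.Connection named conn with auto-commit switched off, along the lines of the following sketch (url, user and password are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// open the connection and take over transaction control ourselves,
// so that executeBatch() and commit() can be timed separately
Connection conn = DriverManager.getConnection(url, user, password);
conn.setAutoCommit(false);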
Test 1 – no errors:
long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");
for (int i = 0; i < 10000; i++) {
    stmt.setInt(1, i);
    stmt.setString(2, "test" + i);
    stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();
System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 861
Executing overall took: 1020
Preparation took: 159
So overall it took 1020 milliseconds, of which 861 milliseconds were spent executing the insert itself.
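As a quick sanity check, not part of the measured run, you can verify that all rows arrived and that the error logging table stayed empty; a minimal sketch:

PreparedStatement check = conn.prepareStatement("SELECT COUNT(*) FROM testloadtable");
ResultSet rs = check.executeQuery();
rs.next();
System.out.println("Rows loaded: " + rs.getInt(1)); // expect 10000, and 0 rows in ERR_TESTLOADTABLE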
Test 2 – 100 errors (1%):
PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
// pre-insert every 100th id (100 rows), so 1% of the following batch violates the primary key
for (int i = 0; i < 10000; i++) {
    if (i % 100 == 0) {
        stmt0.setInt(1, i);
        stmt0.setString(2, "test" + i);
        stmt0.addBatch();
    }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");
for (int i = 0; i < 10000; i++) {
    stmt.setInt(1, i);
    stmt.setString(2, "test" + i);
    stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();
System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1017
Executing overall took: 1069
Preparation took: 52
This time it took quite a bit longer to execute the batch: 1017 milliseconds. The reason is obvious: Oracle not only has to insert 10k rows but also has to reject 100 of them and insert them into the error logging table.
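If you want to see what exactly was logged, you can read the error table directly; each of the 100 rejected rows should show up with error number 1, i.e. ORA-00001 (unique constraint violated). A minimal sketch:

PreparedStatement errs = conn.prepareStatement("SELECT ora_err_number$, ora_err_mesg$, id FROM err_testloadtable WHERE ROWNUM <= 3");
ResultSet rs = errs.executeQuery();
while (rs.next()) {
    // prints e.g. "1 / ORA-00001: unique constraint (...) violated / id=0"
    System.out.println(rs.getInt(1) + " / " + rs.getString(2) + " / id=" + rs.getString(3));
}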
Test 3 – 1000 errors (10%):
PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
// pre-insert every 10th id (1000 rows), so 10% of the following batch violates the primary key
for (int i = 0; i < 10000; i++) {
    if (i % 10 == 0) {
        stmt0.setInt(1, i);
        stmt0.setString(2, "test" + i);
        stmt0.addBatch();
    }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");
for (int i = 0; i < 10000; i++) {
    stmt.setInt(1, i);
    stmt.setString(2, "test" + i);
    stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();
System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1420
Executing overall took: 1470
Preparation took: 50
As you can see, the more rows fail, the longer the insert takes. Let’s look at the remaining tests and see how bad it gets:
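One detail the listings do not show: between the individual tests both tables have to be emptied again, otherwise every run would also collide with the ids left over from the previous one. Presumably something along these lines (my assumption, not part of the original listings):

// reset both tables between test runs
conn.createStatement().execute("TRUNCATE TABLE testloadtable");
conn.createStatement().execute("TRUNCATE TABLE err_testloadtable");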
Test 4 – 2500 errors (25%):
PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
// pre-insert every 4th id (2500 rows), so 25% of the following batch violates the primary key
for (int i = 0; i < 10000; i++) {
    if (i % 4 == 0) {
        stmt0.setInt(1, i);
        stmt0.setString(2, "test" + i);
        stmt0.addBatch();
    }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");
for (int i = 0; i < 10000; i++) {
    stmt.setInt(1, i);
    stmt.setString(2, "test" + i);
    stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();
System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1877
Executing overall took: 1961
Preparation took: 84
Test 5 – 5000 errors (50%):
PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
// pre-insert every 2nd id (5000 rows), so 50% of the following batch violates the primary key
for (int i = 0; i < 10000; i++) {
    if (i % 2 == 0) {
        stmt0.setInt(1, i);
        stmt0.setString(2, "test" + i);
        stmt0.addBatch();
    }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");
for (int i = 0; i < 10000; i++) {
    stmt.setInt(1, i);
    stmt.setString(2, "test" + i);
    stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();
System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 2680
Executing overall took: 2765
Preparation took: 85
Test 6 – 7500 errors (75%):
PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
// pre-insert ids 0 through 7499 (7500 rows), so 75% of the following batch violates the primary key
for (int i = 0; i < 10000; i++) {
    if (i < 7500) {
        stmt0.setInt(1, i);
        stmt0.setString(2, "test" + i);
        stmt0.addBatch();
    }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");
for (int i = 0; i < 10000; i++) {
    stmt.setInt(1, i);
    stmt.setString(2, "test" + i);
    stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();
System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 3349
Executing overall took: 3412
Preparation took: 63
So as you can see: the more errors occur, the longer the batch takes to execute. That is only logical, of course, as every error raises an exception internally, which in turn leads to one more row being inserted into the error logging table. The main takeaway is that your insert does not fail, so you can query the error logging table afterwards and act appropriately (a sketch follows the comparison below). In comparison:
- Insert no errors: 861ms
- Insert 1% errors: 1017ms
- Insert 10% errors: 1420ms
- Insert 25% errors: 1877ms
- Insert 50% errors: 2680ms
- Insert 75% errors: 3349ms
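And because the statement itself never fails, acting on the rejected rows is straightforward: read them back from the error logging table and then fix, skip, or report them. A minimal sketch:

// 1 = ORA-00001, the unique constraint violation from our primary key
PreparedStatement failed = conn.prepareStatement("SELECT id, text FROM err_testloadtable WHERE ora_err_number$ = 1");
ResultSet rs = failed.executeQuery();
while (rs.next()) {
    // decide per row: fix the id, skip the duplicate, or report it
    System.out.println("Rejected duplicate id: " + rs.getString(1));
}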
Now let’s do one more test and see how fast a regular insert without DML error logging is. You would think it performs the same, but during my experiments I found an interesting fact:
Test 7 – no errors, no error logging:
long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* regular bulk insert */ INTO testloadtable (id, text) VALUES (?,?)");
for (int i = 0; i < 10000; i++) {
    stmt.setInt(1, i);
    stmt.setString(2, "test" + i);
    stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();
System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 212
Executing overall took: 372
Preparation took: 160
So a regular insert takes only 212ms while a DML error logging insert takes 861ms. That’s 4 times longer!
The reason for this is the unpublished bug 11865420 (My Oracle Support Doc ID 11865420.8). Once you download the patch and apply it to your system, the insert with DML error logging is just as fast as the one without:
Regular batch insert:
long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* regular bulk insert */ INTO testloadtable (id, text) VALUES (?,?)");
for (int i = 0; i < 10000; i++) {
    stmt.setInt(1, i);
    stmt.setString(2, "test" + i);
    stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();
System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 248
Executing overall took: 399
Preparation took: 151
DML error logging insert:
long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");
for (int i = 0; i < 10000; i++) {
    stmt.setInt(1, i);
    stmt.setString(2, "test" + i);
    stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();
System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 227
Executing overall took: 384
Preparation took: 157
Conclusion: This is the last article of the “Loading data fast” series! In this series I’ve shown not only how bad it is to commit after each row (remember autocommit in JDBC!) but also how much more speed you can get out of your program by doing batch inserts. I’ve also shown how to deal with potential errors during batch inserts and how much performance impact those errors cause. I did the latter to make clear that developers should not just throw DML error logging inserts at every problem, because with enough errors they lose the performance benefit that batch inserts provide in the first place. So now, go over your code and batch up! 😉
This approach could hurt on large data sets; you want to use Oracle-specific tools (SQL*Loader, Data Pump) to load large amounts of data at a faster rate. The JDBC approach can never be faster than the ODBC way.
.Raju
Obviously, on very large data sets you will have to start thinking about your batching process. You would probably not want to load everything in one go and risk rerunning potentially hours of work, nor would you want your users waiting for the data to show up.
This has nothing to do with ODBC; ODBC and JDBC are both drivers. You probably meant OCI, but even that is not true, as you can run the JDBC driver over the OCI interface if you want.