How to get the month name in the IntelliJ copyright velocity template

Every time I try to create a copyright profile in IntelliJ (or PyCharm), I find myself googling around the web trying to find out how to get the month name, e.g. August, into the template. Unfortunately, the IntelliJ documentation is not very helpful here, saying only:

IntelliJ documentation about date formatting

Java class finding tool: ClassFinder

I got into more coding again lately, and I think that is a good thing. I’m a believer in the German saying “Wer rastet, der rostet”, or in English: a rolling stone gathers no moss! Java has been my main programming language for quite a while now, but believe it or not, I’m still coding some PL/SQL as well now and then.

Anyway, when you run into ClassNotFound exceptions in Java, it can be quite handy to have a little tool that scans .jar and .zip files for the class you’re missing. Yes, there are tons of those little programs out there on the web, but as you might have guessed from the title, I’ve decided to write yet another class finding tool. Damn, I really should have called it YACFT! Instead I went with ClassFinder.

Why did I write yet another class finding tool then? Well, although there are some good ones out there, none of them fulfilled all of my requirements, at least none of those that I found. At first I was a big fan of Xigole’s classfinder tool: a really slick, fast and easy-to-use command-line class finder, which is great on Unix systems that you ssh into. However, I soon got fed up with it whenever I did have a proper graphical interface available, whether on Windows, X11 or simply my MacBook Pro. So I started searching for other tools and soon found that they were either GUI-only, or GUI-only and on Windows only (why? I don’t get it!), or GUI-based on ugly AWT. All I wanted was a slick tool like Xigole’s that was clever enough to spawn a GUI when X11 was available but still provide a nice CLI when it wasn’t. Well, as I said, I couldn’t find one, so I wrote my own, and here it is, ClassFinder:

ClassFinder Window-mode

It’s really easy to use:

  1. Give it either a file or a folder to search – files currently supported are .jar, .war, .ear, .zip, .rar, .class, .java
  2. Give it the class name that you’re looking for
  3. Choose whether the class name match should be case-sensitive
  4. Choose whether the folder should be searched recursively
  5. Hit Search!
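
Under the hood, a tool like this boils down to walking the archive entries and matching names. Here is a minimal sketch of the core idea; this is not ClassFinder’s actual implementation, and the class and method names are mine:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class SimpleClassScanner {

    // Returns all .class entries in a .jar/.zip archive whose path contains
    // the searched class name, optionally matching case-insensitively.
    public static List<String> findClass(String archivePath, String className,
                                         boolean caseSensitive) throws IOException {
        List<String> hits = new ArrayList<>();
        String needle = caseSensitive ? className : className.toLowerCase();
        try (ZipFile zip = new ZipFile(archivePath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (!name.endsWith(".class")) continue;
                String candidate = caseSensitive ? name : name.toLowerCase();
                if (candidate.contains(needle)) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }
}
```

A real tool would additionally recurse into folders, handle nested archives (.war/.ear files containing .jars) and convert a dotted name like com.example.Foo into a path before matching.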


Loading data fast – DML error logging performance

In the last post of my “Loading data fast” series I showed how DML error logging can be used to prevent batch loads from failing when one or more rows can’t be inserted. Now the question is how much performance impact errors have on the mechanism. What I want to show is that a developer should not just blindly use DML error logging instead of first thinking about whether the data to be inserted is actually valid. So let’s have a look:

First I create the same simple table with a primary key on it that I used before:

CREATE TABLE TESTLOADTABLE (id NUMBER, text VARCHAR2(255));
table TESTLOADTABLE created.

CREATE UNIQUE INDEX TESTLOADTABLE_PK ON TESTLOADTABLE(id);
unique index TESTLOADTABLE_PK created.

ALTER TABLE TESTLOADTABLE ADD PRIMARY KEY (id) USING INDEX TESTLOADTABLE_PK;
table TESTLOADTABLE altered.

Then I create the error logging table:

BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG('TESTLOADTABLE','ERR_TESTLOADTABLE');
END;
anonymous block completed

So far, so good. Now let’s execute the following tests:

1) Load 10k rows with no errors
2) Load 10k rows with 100 (1%) rows failing
3) Load 10k rows with 1000 (10%) rows failing
4) Load 10k rows with 2500 (25%) rows failing
5) Load 10k rows with 5000 (50%) rows failing
6) Load 10k rows with 7500 (75%) rows failing
And just out of curiosity, let’s see how long a regular insert with no error logging takes:
7) Load 10k rows with no errors and no error logging

Test 1 – no errors:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 861
Executing overall took: 1020
Preparation took: 159

So overall it took 1020 milliseconds, of which 861 milliseconds were spent executing the insert.

Test 2 – 100 errors (1%):

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%100 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1017
Executing overall took: 1069
Preparation took: 52

This time it took quite a bit longer to execute the batch: 1017 milliseconds. The reason for this is obvious: Oracle now not only has to insert 10k rows but also has to reject rows and insert them into another table.

Test 3 – 1000 (10%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%10 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1420
Executing overall took: 1470
Preparation took: 50

And as you can see, the more rows fail, the longer the insert takes. So let’s have a look at the rest of the tests and see how bad it gets:

Test 4 – 2500 (25%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%4 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1877
Executing overall took: 1961
Preparation took: 84

Test 5 – 5000 (50%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%2 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 2680
Executing overall took: 2765
Preparation took: 85

Test 6 – 7500 (75%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i<7500)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 3349
Executing overall took: 3412
Preparation took: 63

So as you can see, the more errors you get, the longer your batch takes to execute. This is only logical, of course: the more errors you get, the more exceptions are thrown, which in turn leads to more rows being inserted into the error table. The main takeaway is that your insert does not fail, and you can smoothly query the error logging table afterwards and act appropriately. Just for comparison:

  • Insert no errors: 861ms
  • Insert 1% errors: 1017ms
  • Insert 10% errors: 1420ms
  • Insert 25% errors: 1877ms
  • Insert 50% errors: 2680ms
  • Insert 75% errors: 3349ms

Now let’s do one more test and see how fast a regular insert without DML error logging is. You would think it’s the same, but during my experiments I found an interesting fact:

Test 7 – no errors, no error logging:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* regular bulk insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 212
Executing overall took: 372
Preparation took: 160

So a regular insert takes only 212ms while a DML error logging insert takes 861ms. That’s 4 times longer!
The reason for this is the unpublished bug 11865420 (My Oracle Support Doc Id: 11865420.8). Once you download the patch and update the system accordingly, the insert with DML error logging is just as fast as the one without:

Regular batch insert:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* regular bulk insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 248
Executing overall took: 399
Preparation took: 151

DML error logging insert:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 227
Executing overall took: 384
Preparation took: 157

Conclusion: This is the last article of the series “Loading data fast”! In this series I’ve shown not only how bad it is to commit after each row (remember autocommit in JDBC!) but also how much more speed you can get out of your program by doing batch inserts. I’ve also shown how to deal with potential errors during batch inserts, and how much performance impact those errors produce. The latter I did to make it clear that developers should not just throw DML error logging inserts at every problem, as they would lose the performance benefit that batch inserts provide. So now, go over your code and batch up! 😉

Loading data fast – Batch inserts and errors (DML error logging)

In the last post of my series “Loading data fast” I showed how batch inserts can make a huge difference in insert performance. Now one question remains: what happens when an error occurs, like a unique key violation? The answer is: the insert statement will fail with an error and stop. If you don’t catch the exception, it will be raised and the commit will never be issued, which leads to a loss of all previously inserted data as well. However, if you catch the exception and make sure that you issue a commit afterwards, you will at least have the previously successfully inserted data in the table, if that is suitable for the business logic; basically meaning that you know where you stopped. Let’s have a look:

In Java you will have to run the PreparedStatement.executeBatch() routine in a try block and execute the Connection.commit() routine in the finally block, like this:


PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}

try
{
  stmt.executeBatch();
}
catch (BatchUpdateException e)
{
  System.out.println("Error during my batch insert example");
  e.printStackTrace();
}
finally
{
  conn.commit();
}

In PL/SQL you will have to run your FORALL statement in a new block, like this:


DECLARE
  TYPE tab is TABLE OF testloadtable%ROWTYPE;
  myvals tab;
BEGIN
  SELECT rownum, 'x'
    BULK COLLECT INTO myvals
      FROM dual
        CONNECT BY LEVEL <= 10000;
  BEGIN
    FORALL i IN myvals.FIRST..myvals.LAST
      INSERT INTO testloadtable (id, text)
        VALUES (myvals(i).id, myvals(i).text);
  EXCEPTION
    WHEN DUP_VAL_ON_INDEX THEN
      DBMS_OUTPUT.PUT_LINE('Unique key violation!');
  END;
  COMMIT;
END;

By doing this, you will at least keep all previously inserted data. However, you still have to fix the error and run the rest of the batch again, and the latter can be rather difficult, especially in batch load scenarios. If you catch the exception, you have to find the right position in your batch again in order to continue; in some cases that could mean rebuilding the entire batch. If you don’t catch the exception, because your batch is one logical unit that either succeeds or is rolled back, then you have to start from scratch. So the conclusion is: neither of the two options is optimal!
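
As an aside on finding the right position again: if you catch the exception, JDBC at least gives you a starting point via BatchUpdateException.getUpdateCounts(). The exact behavior is driver-dependent (some drivers stop at the first failure and return a shortened array, others continue and mark failed rows with Statement.EXECUTE_FAILED), so the following helper is only a sketch, and the class and method names are mine:

```java
import java.sql.Statement;

public class BatchFailureLocator {

    // Returns the index of the first statement in the batch that failed,
    // or -1 if the update counts record no failure. Handles both driver
    // behaviors: rows marked EXECUTE_FAILED, or a shortened counts array
    // when the driver stopped at the first error.
    public static int firstFailure(int[] updateCounts, int batchSize) {
        for (int i = 0; i < updateCounts.length; i++) {
            if (updateCounts[i] == Statement.EXECUTE_FAILED) {
                return i;
            }
        }
        // Driver stopped early: the first unreported statement is the failed one.
        return updateCounts.length < batchSize ? updateCounts.length : -1;
    }
}
```

In the catch block you would then call firstFailure(e.getUpdateCounts(), batchCount) on the BatchUpdateException to decide where to resume.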
Fortunately, Oracle provides a feature called “DML error logging”! This feature allows you to simply log all errors into an error table and continue the execution of your DML statement with no error being raised. It works with all DML statements: INSERT, UPDATE, MERGE and DELETE. I will, however, focus only on insert statements in this post.

First let’s prove what I just said. All I do is to create a primary key on the table and insert a row before I execute my batch.


CREATE TABLE TESTLOADTABLE (id NUMBER, text VARCHAR2(255));
table TESTLOADTABLE created.

CREATE UNIQUE INDEX TESTLOADTABLE_PK ON TESTLOADTABLE(id);
unique index TESTLOADTABLE_PK created.

ALTER TABLE TESTLOADTABLE ADD PRIMARY KEY (id) USING INDEX TESTLOADTABLE_PK;
table TESTLOADTABLE altered.

Now that I have my primary key created, I insert one row with the id 3000 before I execute my batch. The batch contains 10k rows with an id range of 0 – 9999. Once it reaches the row with the id 3000 (the 3001st row, as I start with id 0), I’ll receive an exception that the row already exists. Note that all other rows would be fine and valid except this single one with the id 3000. Nevertheless, as I do not make sure that a commit happens, I will lose all of them:

conn.prepareStatement("INSERT INTO testloadtable (id, text) VALUES (3000,'x')").execute();
conn.commit();

stmt = conn.prepareStatement("INSERT /* addBatch insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
stmt.executeBatch();
conn.commit();

As expected, I get following exception back:

Exception in thread "main" java.sql.BatchUpdateException: ORA-00001: unique constraint (TEST.SYS_C0010986) violated

And when I do a count on the table, I have only my previously inserted row in there:


SELECT COUNT(*) FROM testloadtable
COUNT(*)
--------
1

Now let’s see what I get when I do catch the exception and make sure to issue a commit:

conn.prepareStatement("INSERT INTO testloadtable (id, text) VALUES (3000,'x')").execute();
conn.commit();

PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}

try
{
  stmt.executeBatch();
}
catch (BatchUpdateException e)
{
  System.out.println("Error during my batch insert example");
  e.printStackTrace();
}
finally
{
  conn.commit();
}

The output shows that I caught the exception successfully:


Error during my batch insert example
java.sql.BatchUpdateException: ORA-00001: unique constraint (TEST.SYS_C0010986) violated

And a count on the table now shows that I have 3001 rows in it; my previously inserted row with id 3000 and the first 3000 rows from the batch with id 0 – 2999:

SELECT COUNT(*) FROM testloadtable
COUNT(*)
--------
3001

Now, as said before, the great thing about DML error logging is that it offers you the ability to load your entire batch without receiving an error at all. Instead, the error(s) will be logged into a separate error table, which can then be queried and appropriate actions taken afterwards. All you need to do is modify your DML statement to include the LOG ERRORS clause. You will also have to create an error logging table via the DBMS_ERRLOG package first. Optionally, you can:

  • Include a tag that gets added to the error log to help identify the statement that caused errors
  • Include the REJECT LIMIT subclause, which lets you define an upper limit on the number of errors that can be encountered before the statement fails. If you have a batch of 10k rows where 7k rows are failing, it probably doesn’t make much sense to continue, as there seems to be a bigger problem; this clause lets you still raise the error once a certain threshold is reached. Presumably to make sure that nobody accidentally adds the error logging clause and thereby suppresses all errors, the default value is 0. This means that if you omit the REJECT LIMIT clause, an error will be logged into the error logging table but the statement will still be terminated. If you want to log all errors but raise none, you have to specify UNLIMITED.

The error table itself is a copy of the target table with a few extra columns, telling you things like the error number, the error message and so forth. The important part is that it contains all the columns of your target table, and those columns hold the values from the failing row of the insert. This means that you do not lose any information from your batch and do not need to re-execute the entire batch!
Let’s see how it works! First I create an error logging table called ERR_TESTLOADTABLE:


BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG('TESTLOADTABLE','ERR_TESTLOADTABLE');
END;
anonymous block completed
DESC err_testloadtable
Name            Null Type
--------------- ---- --------------
ORA_ERR_NUMBER$      NUMBER
ORA_ERR_MESG$        VARCHAR2(2000)
ORA_ERR_ROWID$       UROWID()
ORA_ERR_OPTYP$       VARCHAR2(2)
ORA_ERR_TAG$         VARCHAR2(2000)
ID                   VARCHAR2(4000)
TEXT                 VARCHAR2(4000)

All I then have to do, is to modify my original code so that the insert statement includes a “LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED” clause at the end of it:

conn.prepareStatement("INSERT INTO testloadtable (id, text) VALUES (3000,'x')").execute();
conn.commit();

PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
stmt.executeBatch();
conn.commit();

Although I inserted the row with the id 3000 first, no error was reported during the execution. However, if I have a look at my ERR_TESTLOADTABLE, I see the following:


SELECT * FROM ERR_TESTLOADTABLE
ORA_ERR_NUMBER$ ORA_ERR_MESG$                                             ORA_ERR_ROWID$  ORA_ERR_OPTYP$ ORA_ERR_TAG$ ID    TEXT
--------------- --------------------------------------------------------- --------------- -------------- ------------ ----- --------
1               ORA-00001: unique constraint (TEST.SYS_C0010987) violated                 I                           3000  test3000

The error logging table shows me:

  1. The error number “1” which means “ORA-00001”
  2. The error message
  3. RowId: In this case null as it was an insert and the row didn’t exist before
  4. The operation type “I” indicating Insert
  5. An empty error tag, as I have not defined one
  6. ID column value from the failing row
  7. Text column value from the failing row

Conclusion: DML error logging allows you to use batch DML statements that won’t fail on errors. This gives you maximum flexibility when you perform batch operations!
In my next post I will take a look at the performance impact of DML error logging.

Loading data fast – regular insert vs. bulk insert

In my last post I talked about how persisting data can become the bottleneck on large, high-scale systems nowadays. I also mentioned that more and more people tend to think that databases are simply slow, seeing them as just big I/O systems, and that lots of applications still insert data the way they did years ago rather than using bulk inserts.

In this post I will show you how bulk inserts can actually boost your inserts and therefore your systems. I will use a simple example showing the difference between:

  • Single row insert with commit
  • Single row insert with only one final commit
  • Bulk insert with final commit

Let’s assume you have a Java program that needs to load some data from a file into a single table, where each line in the file represents a row in the database. I won’t go into how to read from the file and assemble the data; that is out of scope for this post and not relevant for showing the benefit of bulk inserts over regular ones. First, let’s build a simple two-column table with an “id” column as NUMBER and a “text” column as VARCHAR2:

CREATE TABLE TESTLOADTABLE (id NUMBER, text VARCHAR2(255));

table TESTLOADTABLE created.

For each test I truncate the table first just to make sure that I always load the same amount of data into the same empty table. I add a comment in the statements, so that I can separate them out later on in the trace file.

The first example loads 10,000 rows into the table. It simply inserts an incrementing counter and a string, committing after every single row.

conn.prepareStatement("TRUNCATE TABLE testloadtable").execute();
conn.prepareStatement("ALTER SESSION SET SQL_TRACE=TRUE").execute();

PreparedStatement stmt = conn.prepareStatement("INSERT /* conventional insert with commit */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.execute();
  conn.commit();
}

conn.prepareStatement("ALTER SESSION SET SQL_TRACE=FALSE").execute();

Looking at the trace file, it took the program 2.21 seconds to load these 10,000 rows. You can also see that the statement was actually executed 10,000 times; the commits unfortunately don’t show up in the formatted trace file, but they would be listed in the raw trace file.

 SQL ID: 337xy5qc84nsq Plan Hash: 0

INSERT /* conventional insert with commit */ INTO testloadtable (id, text)
 VALUES
 (:1 ,:2 )

call     count       cpu    elapsed       disk      query    current        rows
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 Parse        1      0.00       0.00          0          0          0           0
 Execute  10000      2.06       2.21          2         90      20527       10000
 Fetch        0      0.00       0.00          0          0          0           0
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 total    10001      2.06       2.21          2         90      20527       10000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 111
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
 ---------- ---------- ----------  ---------------------------------------------------
 0          0          0  LOAD TABLE CONVENTIONAL  (cr=5 pr=2 pw=0 time=1637 us)

The next test puts the commit outside of the loop. So I still add the data row by row to the table but the commit itself happens only once after all rows were loaded:

conn.prepareStatement("TRUNCATE TABLE testloadtable").execute();
conn.prepareStatement("ALTER SESSION SET SQL_TRACE=TRUE").execute();

PreparedStatement stmt = conn.prepareStatement("INSERT /* conventional insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.execute();
}
conn.commit();

conn.prepareStatement("ALTER SESSION SET SQL_TRACE=FALSE").execute();

As the results show, the statement was still executed 10,000 times. However, this time it took only 1.19 seconds to insert all the data. So just by moving the commit to the end, after all inserts were done, I almost halved the elapsed time (2.21s down to 1.19s)! Although commits seem lightweight, the database still has some tasks to accomplish to make sure that your transaction is saved and visible. And of course, the first version produced 10,000 transactions instead of just 1.

 SQL ID: drsv4dw4037zj Plan Hash: 0

INSERT /* conventional insert */ INTO testloadtable (id, text)
 VALUES
 (:1 ,:2 )

call     count       cpu    elapsed       disk      query    current        rows
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 Parse        1      0.00       0.00          0          0          0           0
 Execute  10000      1.09       1.19          2        114      10562       10000
 Fetch        0      0.00       0.00          0          0          0           0
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 total    10001      1.09      1.19          2        114      10562       10000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 111
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
 ---------- ---------- ----------  ---------------------------------------------------
 0          0          0  LOAD TABLE CONVENTIONAL  (cr=5 pr=2 pw=0 time=1507 us)

The Oracle JDBC driver supports bulk inserts: you add all your data to a “batch” and then execute the entire batch. This gives you the advantage that only one INSERT statement execution inserts all your data into the table at once. So instead of 10,000 round-trips, there is only 1, sending all the data over in one big chunk:

conn.prepareStatement("TRUNCATE TABLE testloadtable").execute();
conn.prepareStatement("ALTER SESSION SET SQL_TRACE=TRUE").execute();

PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
stmt.executeBatch();
conn.commit();

conn.prepareStatement("ALTER SESSION SET SQL_TRACE=FALSE").execute();

Now these results are pretty amazing! There was only one execution of the insert statement, and it loaded the entire batch in only 0.06 seconds!

 SQL ID: gfkg1d43va20y Plan Hash: 0

INSERT /* addBatch insert */ INTO testloadtable (id, text)
 VALUES
 (:1 ,:2 )

call     count       cpu    elapsed       disk      query    current        rows
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 Parse        1      0.00       0.00          0          0          0           0
 Execute      1      0.05       0.06          0        129        394       10000
 Fetch        0      0.00       0.00          0          0          0           0
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 total        2      0.05       0.06          0        129        394       10000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 111
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
 ---------- ---------- ----------  ---------------------------------------------------
 0          0          0  LOAD TABLE CONVENTIONAL  (cr=139 pr=0 pw=0 time=51971 us)

As you can see, bulk inserts can give your system a huge performance boost! In this example we are no longer talking about an improvement of a few percent, but about factors: 2.21 seconds of execution time down to 0.06 seconds. Imagine what this could gain you when you have even more data!
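
One practical refinement once the row counts grow further: instead of letting a single batch grow unbounded, flush it in fixed-size chunks, which caps client-side memory while keeping the round-trip savings. The sketch below separates that chunking logic from JDBC; the class name and the chunk size of 1,000 are my own choices, not from the post:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Collects items and hands them to a flush action in fixed-size chunks,
// mirroring addBatch()/executeBatch() with a bounded client-side batch.
public class ChunkedBatcher<T> {
    private final int chunkSize;
    private final Consumer<List<T>> flushAction;
    private final List<T> pending = new ArrayList<>();

    public ChunkedBatcher(int chunkSize, Consumer<List<T>> flushAction) {
        this.chunkSize = chunkSize;
        this.flushAction = flushAction;
    }

    public void add(T item) {
        pending.add(item);
        if (pending.size() == chunkSize) {
            flush();             // one flush (i.e. one round-trip) per chunk
        }
    }

    // Flushes any remaining items; call once after the last add().
    public void flush() {
        if (!pending.isEmpty()) {
            flushAction.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

In the JDBC case the flush action would bind each row with setInt/setString plus addBatch(), then call executeBatch(), with a single commit after the final flush.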

This post shows that persisting data into a database is not inherently slow; it matters a lot how you actually insert that data!