Bitmap index usecases

In my previous post I talked about bitmap indexes, where they should be used and where they shouldn’t be used or what you have to keep in mind when using them. Today I show you a example where they are very helpful. Remember I said that bitmap indexes where designed for read intensive environments and that they are useful on columns with low cardinality. Lets assume you have a table for all your employees in your company. The company has three different locations, lets say 1=New York; 2=London; 3=Paris:

SQL> CREATE TABLE employees
(
 empId      NUMBER,
 deptId     NUMBER,
 name       VARCHAR2(50),
 gender     VARCHAR2(6)
);

Table created.

Now I insert lets say 100,000 employees distributed over the three locations. The distribution on the locations is based on a random number out of DBMS_RANDOM. Same for the gender:

SQL> BEGIN
 2    FOR n IN 1..100000 LOOP
 3      INSERT INTO employees VALUES (n, ROUND(DBMS_RANDOM.VALUE(0.5,3.5),0), 'Employee ' || n, DECODE(ROUND(DBMS_RANDOM.VALUE(1,2),0),1,'MALE','FEMALE'));
 4    END LOOP;
 5    COMMIT;
 6  END;
 7  /

PL/SQL procedure successfully completed.

A quick look at the distributions:

SQL> SELECT gender, COUNT(*) FROM employees GROUP BY gender;

GENDER   COUNT(*)
------ ----------
FEMALE      49985
MALE        50015

SQL> SELECT deptId, COUNT(*) from employees GROUP BY deptId;

 DEPTID   COUNT(*)
---------- ----------
 1      33532
 2      33456
 3      33012

So as you can see the distribution is good and the cardinality is “bad”. As the next step I create a normal index on it and look how it behaves.

SQL> CREATE INDEX EMPLOYEES_GENDER_I001 ON employees (gender);

Index created.

SQL> CREATE INDEX EMPLOYEES_DEPT_I001 ON employees (deptId);

Index created.

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER, 'EMPLOYEES', CASCADE=>TRUE);

PL/SQL procedure successfully completed.

Now lets have a closer look at some selects and their explain plans. First I just count the number of female employees and the number of employees in the New York (deptId=1) office. As I have indexes on both columns Oracle should retrieve the data directly from them – no need to go to the table:

SQL> set autotrace traceonly explain;
SQL> SELECT COUNT(*) FROM employees WHERE gender = 'FEMALE';

Execution Plan
----------------------------------------------------------
Plan hash value: 938208788

-----------------------------------------------------------------------------------------------
| Id  | Operation             | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                       |     1 |     7 |    88  (18)| 00:00:01 |
|   1 |  SORT AGGREGATE       |                       |     1 |     7 |            |          |
|*  2 |   INDEX FAST FULL SCAN| EMPLOYEES_GENDER_I001 | 50108 |   342K|    88  (18)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 2 - filter("GENDER"='FEMALE')

SQL> SELECT COUNT(*) FROM employees WHERE deptId = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3728864332

-----------------------------------------------------------------------------------------
| Id  | Operation         | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                     |     1 |     3 |    74   (9)| 00:00:01 |
|   1 |  SORT AGGREGATE   |                     |     1 |     3 |            |          |
|*  2 |   INDEX RANGE SCAN| EMPLOYEES_DEPT_I001 | 34186 |   100K|    74   (9)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 2 - access("DEPTID"=1)

I got an INDEX FAST FULL SCAN and an INDEX RANGE SCAN – fair enough. Costs are 88 and 74. Now what if I have a more complex statement? What if I want to have all male employees who are either in the New York or the London office:

SQL> SELECT COUNT(*) FROM employees WHERE gender = 'MALE' AND deptId IN (1,2);

Execution Plan
----------------------------------------------------------
Plan hash value: 1712853197

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     9 |   177  (16)| 00:00:01 |
|   1 |  SORT AGGREGATE    |           |     1 |     9 |            |          |
|*  2 |   TABLE ACCESS FULL| EMPLOYEES | 34047 |   299K|   177  (16)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 2 - filter("GENDER"='MALE' AND ("DEPTID"=1 OR "DEPTID"=2))

This leads already to a TABLE ACCESS FULL with costs of 177. Now the same story with bitmap indexes on the columns:

SQL> DROP INDEX EMPLOYEES_GENDER_I001;

Index dropped.

SQL> DROP INDEX EMPLOYEES_DEPT_I001;

Index dropped.

SQL> CREATE BITMAP INDEX EMPLOYEES_GENDER_I001 ON employees (gender);

Index created.

SQL> CREATE BITMAP INDEX EMPLOYEES_DEPT_I001 ON employees (deptId);

Index created.

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER, 'EMPLOYEES', CASCADE=>TRUE);

PL/SQL procedure successfully completed.

First lets do the counts again:

SQL> SELECT COUNT(*) FROM employees WHERE gender = 'FEMALE';

Execution Plan
----------------------------------------------------------
Plan hash value: 3004312514

-----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                       |     1 |     7 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE             |                       |     1 |     7 |            |          |
|   2 |   BITMAP CONVERSION COUNT   |                       | 51145 |   349K|     3   (0)| 00:00:01 |
|*  3 |    BITMAP INDEX SINGLE VALUE| EMPLOYEES_GENDER_I001 |       |       |            |          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 3 - access("GENDER"='FEMALE')

SQL> SELECT COUNT(*) FROM employees WHERE deptId = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2661849472

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE             |                     |     1 |     3 |            |          |
|   2 |   BITMAP CONVERSION COUNT   |                     | 34039 |    99K|     3   (0)| 00:00:01 |
|*  3 |    BITMAP INDEX SINGLE VALUE| EMPLOYEES_DEPT_I001 |       |       |            |          |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 3 - access("DEPTID"=1)

As you can see the optimizer chose another plan now for both tables. Instead of an INDEX FAST FULL SCAN and an INDEX RANGE SCAN I got now both times a BITMAP INDEX SINGLE VALUE and a BITMAP CONVERSION COUNT. Costs dropped from 88 and 74 to 3! Now the complexer statement:

SQL> SELECT COUNT(*) FROM employees WHERE gender = 'MALE' AND deptId IN (1,2);

Execution Plan
----------------------------------------------------------
Plan hash value: 597175463

-------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                       |     1 |     9 |     9   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |                       |     1 |     9 |            |          |
|   2 |   BITMAP CONVERSION COUNT     |                       | 33998 |   298K|     9   (0)| 00:00:01 |
|   3 |    BITMAP AND                 |                       |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE | EMPLOYEES_GENDER_I001 |       |       |            |          |
|   5 |     BITMAP OR                 |                       |       |       |            |          |
|*  6 |      BITMAP INDEX SINGLE VALUE| EMPLOYEES_DEPT_I001   |       |       |            |          |
|*  7 |      BITMAP INDEX SINGLE VALUE| EMPLOYEES_DEPT_I001   |       |       |            |          |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 4 - access("GENDER"='MALE')
 6 - access("DEPTID"=1)
 7 - access("DEPTID"=2)

I got again another plan and here you can see the power of bitmap indexes now. Oracle takes the department ids and does now a BINARY OR on the bitmap. Out of the result it does a BINARY AND with the gender bitmap and look at the result: The costs dropped from 177 on a FULL TABLE SCAN to 9 on some bitmap operations! So as you can see on this short demonstration bitmap indexes are quite powerful!

BITMAP index

In the last few months I got more and more involved into the Data Warehouse side. This is actually not too bad – learning some new stuff brings some change into the daily routine. So on this post I tell something about BITMAP indexes as I just had this subject in work. Normally I don’t write about indexes as Richard Foote is way better in that but however:

Bitmap indexes got introduced into Oracle with 7.3 but are currently only available in the Enterprise and Personal Edition. So why did I mention the Data Warehouse above? Well, because BITMAP indexes are designed for them and ad hoc query environments. Usually you use them on columns with low cardinality. However they are NOT designed for OLTP environments where a lot of data manipulation happens in many concurrent sessions. Reason for that is because you will face some locking issues but more on that later. So how does a BITMAP index look like? The index stores a bitmap for a single value on all the rows. Lets take the emp table for example (I took here the example from Tom Kyte‘s book Expert Oracle Database Architecture site 448). I create a bitmap index on the job column. You can have a lot of employees but the jobs will keep within a limit so you will have a low cardinality there. Your index would look like this:

Value/Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14
ANALYST 0 0 0 0 0 0 0 1 0 1 0 0 1 0
CLERK 1 0 0 0 0 0 0 0 0 0 1 1 0 1
MANAGER 0 0 0 1 0 1 1 0 0 0 0 0 0 0
PRESIDENT 0 0 0 0 0 0 0 0 1 0 0 0 0 0
SALESMAN 0 1 1 0 1 0 0 0 0 0 0 0 0 0

So instead of a normal B*Tree index you don’t get a new index entry with every value pointing to the rowid but you have a bitmap for each single value and Oracle puts either a 0 or an 1 in there for a specific row. One thing to mention here: BITMAP indexes also store NULL values as 0. So why is there an issue with locking on concurrent sessions? The reason for that is that one single bitmap entry points to many rows. In the above example thing about the CLERK entry. One company will have many clerks but you have only one index entry with the underlying bitmap. Oracle cannot lock individual bits in a bitmap index entry, it has to lock the specific entry with the entire bitmap. Any other session which also tries to update the same index entry at the same time will be locked until the first session does a commit or rollback! So lets have a look into this:

First I create a new table. The table has just one column “departementID”. Lets stay at the example with the employees. We got many employees which belong to 3 different departments. Normally you would have columns like name and so forth but for this I concentrate on the departmentId:

SQL> CREATE TABLE BITMAPTEST (departmentId NUMBER) TABLESPACE DATA1;

Table created.

Now I add a bitmap index on the departmentId. All I have to do is to put the word BITMAP between CREATE and INDEX:

SQL> CREATE BITMAP INDEX BITMAPTEST_BI001 ON BITMAPTEST (departmentId);

Index created.

So, now I have two sessions:

Session 1 has the SID 1539
Session 2 has the SID 1470

Now session 1 inserts a row with the first departmentId:

SESS1 – 1539:

SQL> INSERT INTO BITMAPTEST VALUES (1);

1 row created.

No commit so far. Now a second session inserts a new row with the same department – another employee who started at the same department:

SESS2 – 1470:

SQL> INSERT INTO BITMAPTEST VALUES (1);

The insert is locked with a “enq: TX – row lock contention until the first session commits or does a rollback:

SQL> SELECT sid FROM v$session_wait WHERE event = 'enq: TX - row lock contention';

SID
----------
1470

After I do a commit the lock is released:

SESS1 – 1539:

 SQL> COMMIT;

Commit complete.

The second session can now create the row:

SESS2 – 1470:

1 row created.

As I said before the lock is only on the bitmap of the specific index entry. So if session 1 (session 2 did not commit or rollback yet) tries now to insert another row on another department everything is fine:

SESS1 – 1539:

SQL> INSERT INTO BITMAPTEST VALUES (2);

1 row created.

What does that mean on an OLTP environment? As an OLTP environment is very write/modification intensive system concurrent sessions will lock each other out when they try do modify a bitmap index entry. Here a small test:

I take again the bitmaptest table with the departmentId. Now I insert 1,000 rows with 3 different department IDs like this:

DECLARE
departmentID  NUMBER:=1;
maxLoop       NUMBER := 1000;
BEGIN
FOR n1 IN 1..maxLoop LOOP
IF departmentID = 4 THEN
departmentID := 1;
END IF;
INSERT INTO bitmapTest VALUES (departmentID);
departmentID := departmentID+1;
END LOOP;
COMMIT;
END;
/

Note: The commit happens after the loop. So all other sessions will have to wait on this one to finish before they can start.

I run this PL/SQL now in 100 concurrent sessions over a shell script:

processes=100;
i=0;

while [ $i -le $processes ];
do
runInserts.sh &
i=`expr $i + 1`;
done;

Now lets see what happens. To monitor the locks I just do a simple SQL on v$session_wait. I should see there lots of waits with “enq: TX – row lock contention”. So I run the test and what I get is:

SQL> SELECT count(*), event, state, wait_class
FROM v$session_wait
GROUP BY event, state, wait_class
ORDER BY 1 DESC;

COUNT(*) EVENT                                                            STATE               WAIT_CLASS
---------- ---------------------------------------------------------------- ------------------- -----------------
136 SQL*Net message from client                                      WAITING             Idle
90 enq: TX - row lock contention                                    WAITING             Application
12 rdbms ipc message                                                WAITING             Idle
1 buffer busy waits                                                WAITED SHORT TIME   Concurrency
1 smon timer                                                       WAITING             Idle
1 wait for unread message on broadcast channel                     WAITING             Idle
1 Streams AQ: qmn coordinator idle wait                            WAITING             Idle
1 SQL*Net message to client                                        WAITED SHORT TIME   Network
1 pmon timer                                                       WAITING             Idle
1 Streams AQ: qmn slave idle wait                                  WAITING             Idle
1 Streams AQ: waiting for messages in the queue                    WAITING             Idle

11 rows selected.

As you can see I have 90 sessions with row lock contention. When I reexecute this statement several times the amount of sessions reduces just slowly as just one session is running after each other:

SQL> r
1  SELECT count(*), event, state, wait_class
2   FROM v$session_wait
3    GROUP BY event, state, wait_class
4*    ORDER BY 1 DESC

COUNT(*) EVENT                                                            STATE               WAIT_CLASS
---------- ---------------------------------------------------------------- ------------------- -----------------
136 SQL*Net message from client                                      WAITING             Idle
86 enq: TX - row lock contention                                    WAITING             Application
12 rdbms ipc message                                                WAITING             Idle
1 buffer busy waits                                                WAITED SHORT TIME   Concurrency
1 smon timer                                                       WAITING             Idle
1 wait for unread message on broadcast channel                     WAITING             Idle
1 Streams AQ: qmn coordinator idle wait                            WAITING             Idle
1 SQL*Net message to client                                        WAITED SHORT TIME   Network
1 pmon timer                                                       WAITING             Idle
1 Streams AQ: qmn slave idle wait                                  WAITING             Idle
1 Streams AQ: waiting for messages in the queue                    WAITING             Idle

11 rows selected.

SQL> r
1  SELECT count(*), event, state, wait_class
2   FROM v$session_wait
3    GROUP BY event, state, wait_class
4*    ORDER BY 1 DESC

COUNT(*) EVENT                                                            STATE               WAIT_CLASS
---------- ---------------------------------------------------------------- ------------------- -----------------
136 SQL*Net message from client                                      WAITING             Idle
83 enq: TX - row lock contention                                    WAITING             Application
12 rdbms ipc message                                                WAITING             Idle
1 buffer busy waits                                                WAITED SHORT TIME   Concurrency
1 smon timer                                                       WAITING             Idle
1 wait for unread message on broadcast channel                     WAITING             Idle
1 Streams AQ: qmn coordinator idle wait                            WAITING             Idle
1 SQL*Net message to client                                        WAITED SHORT TIME   Network
1 pmon timer                                                       WAITING             Idle
1 Streams AQ: qmn slave idle wait                                  WAITING             Idle
1 Streams AQ: waiting for messages in the queue                    WAITING             Idle

11 rows selected.

So when I pick out one of those sessions I can have a look on what I am waiting on just be doing a select on v$session and all_objects:

SQL> SELECT count(*), s.row_wait_obj#
FROM v$session s, v$session_wait w
WHERE s.sid=w.sid
AND w.event = 'enq: TX - row lock contention'
GROUP BY s.row_wait_obj#;
COUNT(*) ROW_WAIT_OBJ#
---------- -------------
96       7154476

96 sessions waiting on the object with the ID 7154476 which is

SQL> SELECT object_name, object_type FROM all_objects WHERE object_id = 7154476;

OBJECT_NAME                    OBJECT_TYPE
------------------------------ -------------------
BITMAPTEST_I001                INDEX

as you can see my BITMAP index! So what does this mean? As I wrote already before: BITMAP index are designed for data warehouse or read intensive environments to speed up queries where you have a low cardinality on it. As Tom Kyte already wrote in his book: Low cardinality always depends on the data. In a table with 10,000 rows a low cardinality might be 10 while in a 10 million table 10,000 might be a low cardinality! It always depends, there is no cut-and-dried answer for this. You simply have to try it out. If you want to use a BITMAP index in an OLTP environment be aware of the locking! There might be valid cases to use them inside there as for example on the “gender” column of a “employees” table to speed up reports. It always depends if the table you load is heavily modified or not and if you load the table concurrent or not!

Sources:

  • Expert Oracle Database Architecture by Thomas Kyte (ISBN: 1-59059-530-0) – Chapter 11 – Site 448 Bitmap Indexes