In my previous post I talked about bitmap indexes, where they should be used and where they shouldn’t be used or what you have to keep in mind when using them. Today I show you a example where they are very helpful. Remember I said that bitmap indexes where designed for read intensive environments and that they are useful on columns with low cardinality. Lets assume you have a table for all your employees in your company. The company has three different locations, lets say 1=New York; 2=London; 3=Paris:
SQL> CREATE TABLE employees ( empId NUMBER, deptId NUMBER, name VARCHAR2(50), gender VARCHAR2(6) ); Table created.
Now I insert lets say 100,000 employees distributed over the three locations. The distribution on the locations is based on a random number out of DBMS_RANDOM. Same for the gender:
SQL> BEGIN 2 FOR n IN 1..100000 LOOP 3 INSERT INTO employees VALUES (n, ROUND(DBMS_RANDOM.VALUE(0.5,3.5),0), 'Employee ' || n, DECODE(ROUND(DBMS_RANDOM.VALUE(1,2),0),1,'MALE','FEMALE')); 4 END LOOP; 5 COMMIT; 6 END; 7 / PL/SQL procedure successfully completed.
A quick look at the distributions:
SQL> SELECT gender, COUNT(*) FROM employees GROUP BY gender; GENDER COUNT(*) ------ ---------- FEMALE 49985 MALE 50015 SQL> SELECT deptId, COUNT(*) from employees GROUP BY deptId; DEPTID COUNT(*) ---------- ---------- 1 33532 2 33456 3 33012
So as you can see the distribution is good and the cardinality is “bad”. As the next step I create a normal index on it and look how it behaves.
SQL> CREATE INDEX EMPLOYEES_GENDER_I001 ON employees (gender); Index created. SQL> CREATE INDEX EMPLOYEES_DEPT_I001 ON employees (deptId); Index created. SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER, 'EMPLOYEES', CASCADE=>TRUE); PL/SQL procedure successfully completed.
Now lets have a closer look at some selects and their explain plans. First I just count the number of female employees and the number of employees in the New York (deptId=1) office. As I have indexes on both columns Oracle should retrieve the data directly from them – no need to go to the table:
SQL> set autotrace traceonly explain; SQL> SELECT COUNT(*) FROM employees WHERE gender = 'FEMALE'; Execution Plan ---------------------------------------------------------- Plan hash value: 938208788 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 7 | 88 (18)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 7 | | | |* 2 | INDEX FAST FULL SCAN| EMPLOYEES_GENDER_I001 | 50108 | 342K| 88 (18)| 00:00:01 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("GENDER"='FEMALE') SQL> SELECT COUNT(*) FROM employees WHERE deptId = 1; Execution Plan ---------------------------------------------------------- Plan hash value: 3728864332 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 74 (9)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | INDEX RANGE SCAN| EMPLOYEES_DEPT_I001 | 34186 | 100K| 74 (9)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("DEPTID"=1)
I got an INDEX FAST FULL SCAN and an INDEX RANGE SCAN – fair enough. Costs are 88 and 74. Now what if I have a more complex statement? What if I want to have all male employees who are either in the New York or the London office:
SQL> SELECT COUNT(*) FROM employees WHERE gender = 'MALE' AND deptId IN (1,2); Execution Plan ---------------------------------------------------------- Plan hash value: 1712853197 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 9 | 177 (16)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 9 | | | |* 2 | TABLE ACCESS FULL| EMPLOYEES | 34047 | 299K| 177 (16)| 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("GENDER"='MALE' AND ("DEPTID"=1 OR "DEPTID"=2))
This leads already to a TABLE ACCESS FULL with costs of 177. Now the same story with bitmap indexes on the columns:
SQL> DROP INDEX EMPLOYEES_GENDER_I001; Index dropped. SQL> DROP INDEX EMPLOYEES_DEPT_I001; Index dropped. SQL> CREATE BITMAP INDEX EMPLOYEES_GENDER_I001 ON employees (gender); Index created. SQL> CREATE BITMAP INDEX EMPLOYEES_DEPT_I001 ON employees (deptId); Index created. SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER, 'EMPLOYEES', CASCADE=>TRUE); PL/SQL procedure successfully completed.
First lets do the counts again:
SQL> SELECT COUNT(*) FROM employees WHERE gender = 'FEMALE'; Execution Plan ---------------------------------------------------------- Plan hash value: 3004312514 ----------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 7 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 7 | | | | 2 | BITMAP CONVERSION COUNT | | 51145 | 349K| 3 (0)| 00:00:01 | |* 3 | BITMAP INDEX SINGLE VALUE| EMPLOYEES_GENDER_I001 | | | | | ----------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("GENDER"='FEMALE') SQL> SELECT COUNT(*) FROM employees WHERE deptId = 1; Execution Plan ---------------------------------------------------------- Plan hash value: 2661849472 --------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | | 2 | BITMAP CONVERSION COUNT | | 34039 | 99K| 3 (0)| 00:00:01 | |* 3 | BITMAP INDEX SINGLE VALUE| EMPLOYEES_DEPT_I001 | | | | | --------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("DEPTID"=1)
As you can see the optimizer chose another plan now for both tables. Instead of an INDEX FAST FULL SCAN and an INDEX RANGE SCAN I got now both times a BITMAP INDEX SINGLE VALUE and a BITMAP CONVERSION COUNT. Costs dropped from 88 and 74 to 3! Now the complexer statement:
SQL> SELECT COUNT(*) FROM employees WHERE gender = 'MALE' AND deptId IN (1,2); Execution Plan ---------------------------------------------------------- Plan hash value: 597175463 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 9 | 9 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 9 | | | | 2 | BITMAP CONVERSION COUNT | | 33998 | 298K| 9 (0)| 00:00:01 | | 3 | BITMAP AND | | | | | | |* 4 | BITMAP INDEX SINGLE VALUE | EMPLOYEES_GENDER_I001 | | | | | | 5 | BITMAP OR | | | | | | |* 6 | BITMAP INDEX SINGLE VALUE| EMPLOYEES_DEPT_I001 | | | | | |* 7 | BITMAP INDEX SINGLE VALUE| EMPLOYEES_DEPT_I001 | | | | | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("GENDER"='MALE') 6 - access("DEPTID"=1) 7 - access("DEPTID"=2)
I got again another plan and here you can see the power of bitmap indexes now. Oracle takes the department ids and does now a BINARY OR on the bitmap. Out of the result it does a BINARY AND with the gender bitmap and look at the result: The costs dropped from 177 on a FULL TABLE SCAN to 9 on some bitmap operations! So as you can see on this short demonstration bitmap indexes are quite powerful!