Vishal's blog: Pivotal performance improvement in OBIEE

This article shows a method to improve the performance of the pivot view of OBIEE.

Assumption: We are working with the Oracle 11G rdbms system

Drawback: We will have to create one opaque view per pivot table. It is a bad practice to change repository for something related to presentation but the performance gain is phenomenal

Pivoting data has always been one of the typical requirements in business intelligence. It is a method of displaying data in a more intuitive way.

For example, consider the following data

Name Dept no Color of the shirt

Tom 40 Red

Dick 40 Blue

Harry 50 Blue

Mary 50 Red

Julia 50 Red

If we wanted to find out the number of employees per department per color of shirt then, we could place the data in either of the following 2 ways

40 50

Red 1 2

Blue 1 1

Count Dept no Color of the shirt

1 40 Red

1 40 Blue

2 50 Red

1 50 Blue

It's easy to see the 1^st representation of the aggregation is easier to read. The 1^st representation is called the pivot view

To generate the pivot view, OBIEE first collects the entire set of data and then the presentation server organizes it in the desired format to display it

This method suffers on 2 accounts

a. The number of logical i/o from the DB is huge because OBIEE retrieves the entire result set. Huge number of i/o can be proved by a number of techniques such as the trace of the query and runstats.sql (given by Tom Kyte). Explain plan with statistics can also give us the info about consistent gets (set autotrace on explain statistics). The plan with statics is at the bottom of the article

b. Because all the rows are fetched from the database, the number of round trips from the db server to the client are huge

a. All these rows will have to be sent to the presentation server. This will involve an added delay

We will use OBIEE's session log to find the time taken by the query to execute on the DB and the time taken for the generation of the pivot view.

Modus operandi

1. We will create a 2 answers that show the pivoted data using 2 techniques

2. We will measure performance of each step in each technique

To do this test, I am using the following query to generate 100,000 records

SELECT DBMS_RANDOM.STRING ('u', 10) AS employee_name,

CEIL (DBMS_RANDOM.VALUE (1, 10)) * 10 AS dept_number,

DECODE (FLOOR (DBMS_RANDOM.VALUE (1, 5)),

1, 'Blue',

2, 'Green',

3, 'Purple',

4, 'Red'

) AS shirt_color,

1 employee_count

FROM DUAL

CONNECT BY ROWNUM <= 100000

Now, let's start digging

Oracle 11G has a new function called pivot and I intend to leverage it for my benefit. My physical and BMM models are weird but those are not the focus of this article

I have created my physical and BMM is such a way that my queries fetch data from a single physical source. The intention is to not add any extra load on the database just because OBIEE needs one fact and one dimension table to execute a query

My Physical Layer:

1. The dummy table is an opaque view with the following query

select 1 from dual

2. Employees is an actual table generated with the query written above

3. Pivoted_table is again an opaque view with the following query

select *

from

(select dept_number,shirt_color,employee_count

from employees

)

pivot

count(employee_count)

for shirt_color in ('Blue' as Blue,'Green' as Green,'Purple' as Purple,'Red' as Red))

The above query is doing most of the trick. In this query, I am doing a count of the employees and then grouping then on the shirt_color field. Anything in the 'in' clause here, becomes the column of my pivoted table. So the output of this query will have 5 columns, namely: dept_number, Blue, Green, Purple and Red.

Since we know the column names, generated by the pivot query, so we can create the same for our opaque view as shown below

My BMM Layer:

I have kept the same model in BMM as well but I have created Right outer joins here. Creating right outer join ensures that all data from Pivoted_table and Employees is selected irrespective of the records in the dummy table.

So my BMM looks like the following

Both the joins here are right outer joins

My Presentation Layer:

My presentation layer does not have any manipulation. I have dragged my BMM catalog to the presentation layer

My web catalog:

I have created 2 answers, one each on Pivoted_table and Employees tables

The answer on Pivoted_table with table view looks like the following

Now my other answer is on Employees table. I have selected the pivot view in the compound layout as shown below

Let's compare the performance now

Below are the screenshots of the 2 session logs. The 1^st one is of the answer built on Pivoted_Table and the 2^nd is on Employees table.

As we can see the the 1^st only returned the 9 desired records. This is a huge benefit because we are reducing the traffic all across the channel. Right from the Db server to the repository to the presentation server to the engine generating the table view. The 2^nd log shows that 100000 records were returned

Now let's dig a little deeper. Let's bifurcate the time into 2 parts

1. Time taken to execute the query

2. Time taken to generate the chart

I am pasting only the relevant portions of the session log here

Log of the answer on Pivoted_table:

+++Administrator:2f0000:2f0004:----2012/01/15 02:37:12

-------------------- Sending query to database named Pivot Pool (id: <<8464>>):

select *

from

(select dept_number,shirt_color,employee_count

from employees

)

pivot

(count(employee_count)

for shirt_color in ('Blue' as Blue,'Green' as Green,'Purple' as Purple,'Red' as Red))

+++Administrator:2f0000:2f0004:----2012/01/15 02:37:12

-------------------- Execution Node: <<8464>>, Close Row Count = 9, Row Width = 32 bytes

+++Administrator:2f0000:2f0004:----2012/01/15 02:37:12

-------------------- Execution Node: <<8448>> Sort, Close Row Count = 9, Row Width = 32 bytes

+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13

-------------------- Query Status: Successful Completion

+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13

-------------------- Rows 9, bytes 288 retrieved from database query id: <<8464>>

+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13

-------------------- Physical query response time 0 (seconds), id <<8464>>

+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13

-------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 0, DB-connect time 0 (seconds)

+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13

-------------------- Rows returned to Client 9

+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13

-------------------- Logical Query Summary Stats: Elapsed time 1, Response time 0, Compilation time 0 (seconds)

Log of the answer on Employees:

+++Administrator:2f0000:2f0003:----2012/01/15 02:32:57

-------------------- Sending query to database named Pivot Pool (id: <<8409>>):

select T3334.SHIRT_COLOR as c1,

T3334.DEPT_NUMBER as c2,

T3334.EMPLOYEE_NAME as c3

from

EMPLOYEES T3334

+++Administrator:2f0000:2f0003:----2012/01/15 02:33:09

+++Administrator:2f0000:2f0003:----2012/01/15 02:34:16

-------------------- Execution Node: <<8397>> DbGateway Exchange, Close Row Count = 100000, Row Width = 8048 bytes