Saturday, 14 January 2012

Pivotal performance improvement in OBIEE


This article shows a method to improve the performance of the pivot view of OBIEE. 

Assumption: We are working with the Oracle 11G rdbms system
Drawback: We will have to create one opaque view per pivot table. It is a bad practice to change repository for something related to presentation but the performance gain is phenomenal

Pivoting data has always been one of the typical requirements in business intelligence. It is a method of displaying data in a more intuitive way. 

For example, consider the following data
Name               Dept no                        Color of the shirt
Tom                  40                                 Red
Dick                 40                                 Blue
Harry               50                                 Blue
Mary                50                                 Red
Julia                 50                                  Red

If we wanted to find out the number of employees per department per color of shirt then, we could place the data in either of the following 2 ways
1.       


     40                                    50


Red                                                1                                        2
Blue                                               1                                        1

2.       
Count               Dept no                        Color of the shirt
1                      40                                 Red
1                      40                                  Blue
2                      50                                 Red
1                      50                                 Blue

It's easy to see the 1st representation of the aggregation is easier to read. The 1st representation is called the pivot view

To generate the pivot view, OBIEE first collects the entire set of data and then the presentation server organizes it in the desired format to display it
This method suffers on 2 accounts
      1.       
a.   The number of logical i/o from the DB is huge because OBIEE retrieves the entire result set. Huge number of i/o can be proved by a number of techniques such as the trace of the query and runstats.sql (given by Tom Kyte). Explain plan with statistics can also give us the info about consistent gets (set autotrace on explain statistics). The plan with statics is at the bottom of the article
b.   Because all the rows are fetched from the database, the number of round trips from the db server to the client are huge
      2.       
a.    All these rows will have to be sent to the presentation server. This will involve an added delay

We will use OBIEE's session log to find the time taken by the query to execute on the DB and the time taken for the generation of the pivot view. 

Modus operandi
      1.      We will create a 2 answers that show the pivoted data using 2 techniques
      2.      We will measure performance of each step in each technique

To do this test, I am using the following query to generate 100,000 records

   SELECT     DBMS_RANDOM.STRING ('u', 10) AS employee_name,
           CEIL (DBMS_RANDOM.VALUE (1, 10)) * 10 AS dept_number,
           DECODE (FLOOR (DBMS_RANDOM.VALUE (1, 5)),
                   1, 'Blue',
                   2, 'Green',
                   3, 'Purple',
                   4, 'Red'
                  ) AS shirt_color,
           1 employee_count
      FROM DUAL
CONNECT BY ROWNUM <= 100000


Now, let's start digging

Oracle 11G has a new function called pivot and I intend to leverage it for my benefit. My physical and BMM models are weird but those are not the focus of this article

I have created my physical and BMM is such a way that my queries fetch data from a single physical source.  The intention is to not add any extra load on the database just because OBIEE needs one fact and one dimension table to execute a query

My Physical Layer:





       1.      The dummy table is an opaque view with the following query
 select 1 from dual
       2.      Employees is an actual table generated with the query written above
       3.      Pivoted_table is again an opaque view with the following query
select *
from
  (select dept_number,shirt_color,employee_count
   from employees
   )
   pivot
    count(employee_count)
      for shirt_color in ('Blue' as Blue,'Green' as Green,'Purple' as Purple,'Red' as Red))

The above query is doing most of the trick. In this query, I am doing a count of the employees and then grouping then on the shirt_color field. Anything in the 'in' clause here, becomes the column of my pivoted table. So the output of this query will have 5 columns, namely: dept_number, Blue, Green, Purple and Red.



 





Since we know the column names, generated by the pivot query, so we can create the same for our opaque view as shown below










My BMM Layer:

I have kept the same model in BMM as well but I have created Right outer joins here. Creating right outer join ensures that all data from Pivoted_table and Employees is selected irrespective of the records in the dummy table.

So my BMM looks like the following






Both the joins here are right outer joins















My Presentation Layer:

My presentation layer does not have any manipulation. I have dragged my BMM catalog to the presentation layer

My web catalog:

I have created 2 answers, one each on Pivoted_table and Employees tables

The answer on Pivoted_table with table view looks like the following

















Now my other answer is on Employees table. I have selected the pivot view in the compound layout as shown below















 














Let's compare the performance now

Below are the screenshots of the 2 session logs. The 1st one is of the answer built on Pivoted_Table and the 2nd is on Employees table.

As we can see the the 1st only returned the 9 desired records. This is a huge benefit because we are reducing the traffic all across the channel. Right from the Db server to the repository to the presentation server to the engine generating the table view. The 2nd log shows that 100000 records were returned








Now let's dig a little deeper. Let's bifurcate the time into 2 parts
           1.      Time taken to execute the query
           2.      Time taken to generate the chart

I am pasting only the relevant portions of the session log here

Log of the answer on Pivoted_table: 
+++Administrator:2f0000:2f0004:----2012/01/15 02:37:12
-------------------- Sending query to database named Pivot Pool (id: <<8464>>):
select *
from
  (select dept_number,shirt_color,employee_count
   from employees
   )
   pivot
    (count(employee_count)
      for shirt_color in ('Blue' as Blue,'Green' as Green,'Purple' as Purple,'Red' as Red))
+++Administrator:2f0000:2f0004:----2012/01/15 02:37:12
-------------------- Execution Node: <<8464>>, Close Row Count = 9, Row Width = 32 bytes
+++Administrator:2f0000:2f0004:----2012/01/15 02:37:12
-------------------- Execution Node: <<8448>> Sort, Close Row Count = 9, Row Width = 32 bytes
+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13
-------------------- Query Status: Successful Completion
+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13
-------------------- Rows 9, bytes 288 retrieved from database query id: <<8464>>
+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13
-------------------- Physical query response time 0 (seconds), id <<8464>>
+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13
-------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 0, DB-connect time 0 (seconds)
+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13
-------------------- Rows returned to Client 9
+++Administrator:2f0000:2f0004:----2012/01/15 02:37:13
-------------------- Logical Query Summary Stats: Elapsed time 1, Response time 0, Compilation time 0 (seconds)

Log of the answer on Employees:
+++Administrator:2f0000:2f0003:----2012/01/15 02:32:57
-------------------- Sending query to database named Pivot Pool (id: <<8409>>):
select T3334.SHIRT_COLOR as c1,
     T3334.DEPT_NUMBER as c2,
     T3334.EMPLOYEE_NAME as c3
from
     EMPLOYEES T3334
+++Administrator:2f0000:2f0003:----2012/01/15 02:33:09
+++Administrator:2f0000:2f0003:----2012/01/15 02:34:16
-------------------- Execution Node: <<8397>> DbGateway Exchange, Close Row Count = 100000, Row Width = 8048 bytes
+++Administrator:2f0000:2f0003:----2012/01/15 02:34:17
-------------------- Query Status: Successful Completion
+++Administrator:2f0000:2f0003:----2012/01/15 02:34:17
-------------------- Rows 100000, bytes 804800000 retrieved from database query id: <<8409>>
+++Administrator:2f0000:2f0003:----2012/01/15 02:34:17
-------------------- Query Status: Successful Completion
+++Administrator:2f0000:2f0003:----2012/01/15 02:34:17
-------------------- Rows 100000, bytes 804800000 retrieved from database query id: <<8409>>
+++Administrator:2f0000:2f0003:----2012/01/15 02:34:17
-------------------- Physical query response time 11 (seconds), id <<8409>>
+++Administrator:2f0000:2f0003:----2012/01/15 02:34:17
-------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 11, DB-connect time 0 (seconds)
+++Administrator:2f0000:2f0003:----2012/01/15 02:34:17
-------------------- Rows returned to Client 100000
+++Administrator:2f0000:2f0003:----2012/01/15 02:34:17
-------------------- Logical Query Summary Stats: Elapsed time 81, Response time 14, Compilation time 0 (seconds)


Let's compare the important timing metrics of the 2 answers
 





 
The definitions of these timings can be found at:


Let me also share the plan and the statistics



2 comments:

Leigh Wilson said...

Hi Vishal, nice post. I wouldn't have considered this as an option to be honest but it's good to see alternative ideas in play. Nice work :-)

Another idea that I use is to force the aggregation onto the physical layer by creating custom columns in Answers, forcing the aggregation via case statements etc so the outcome (in a table view) is what you're looking for as the end result. Then to make it look and feel like a normal pivot view I then render it in a pivot view. This results in the database and OBI layer doing the hard work, returning a very compact result set for the client (Internet Explorer) to render. One misconception that many people have is that pivot views are pivoted on the server when they aren't. If the underlying result set is 1000 rows, all of the rows are sent to the client Internet Explorer where it is then rendered as a pivot view. This results in lag and sometimes very long delays in pivot views appearing.

To see this pivoting/client lag, run a large pivot report with the processes/memory monitoring on... the memory and cpu will go through the roof. This is a serious problem for some clients when they are offsite, small bandwidth etc.

Thoughts?

Vishal Pathak said...

Hello Leigh,

Thanks for the good words

I believe, your's is a very interesting approach. The best part of your approach is that one can improve the performance of pivot view even using 10G rdbms system.

But again, if one is working on 11G DB then I believe, this approach is better as using case statements will again spoil the plan.

I haven't started working on obiee 11g yet, but

I expect OBIEE to handle pivoting in a more intelligent way in the newer version