IBM Netezza analytics to analyze query history table usage

Using Netezza’s in-database analytics package FPGROWTH, database administrators can identify the most commonly used combination of tables and the performance of the queries that reference those sets of tables.

First, let’s see the most commonly used combination of tables.  Today, FPGROWTH requires that you specify a unique ID within which associations are discovered.  With query history data, the unique identifier is a composite key made up of NPSID, NPSINSTANCEID & OPID.  We have to create a single key that we can feed into the procedure, which can be done as shown below:

1. Create a sequence that we can use to generate a unique key
create sequence seq_qhist_id as bigint;

2. Next, create a table that associates each unique NPSID, NPSINSTANCEID & OPID with a newly generated sequence value.

create table tbl_qhist_id_assoc
as
select        *,
                  next value for seq_qhist_id qhist_id
from         ( select         distinct npsid,
                                                npsinstanceid,
                                                opid
                  from           poc_qhist_db..”$hist_table_access_1″
                ) x;

3. Create a view that takes the data in $hist_table_access_1 and associates each record with the appropriate ID generated in step 2

create view vw_table_access_stats
as
select      t.*,
               a.qhist_id
from        poc_qhist_db..”$hist_table_access_1″ t
inner join tbl_qhist_id_assoc a
                using ( npsid, npsinstanceid, opid )
;

4. Prep work is done…. now we can run FPGROWTH against the view and find which tables are determined to be used most frequently within the same queries.bg

QHIST_ANALYSIS(ADMIN)=> call nza..fpgrowth(‘intable=vw_table_access_stats,tid=qhist_id,item=TABLEID,supportType=absolute,support=1000,pfx=qhist_analysis_0315′);
NOTICE:
RUNNING FPGrowth algorithm:
DATASET : “VW_TABLE_ACCESS_STATS”
Transaction column : “QHIST_ID”
Item column : “TABLEID”
Group by : <none>
Minimum support : 1000  transactions
Min frequent itemset size : 0
Max frequent itemset size : 1000000000
Level of conditional dbs : 1
Result tables prefix : “QHIST_ANALYSIS_0315_FP_”

FPGROWTH
———-
5843
(1 row)

We can see that this produced a number of “sets” with the command:

\dt qhist_analysis_0315

Our FPGROWTH command was instructed to produce a series of tables with the prefix qhist_analysis_0315; so \dt will display all tables with that string at the start of the object’s name.

QHIST_ANALYSIS(ADMIN)=> \dt qhist_analysis_0315
List of relations
Name             | Type  | Owner
——————————+——-+——-
QHIST_ANALYSIS_0315_FP_SET1  | TABLE | ADMIN
QHIST_ANALYSIS_0315_FP_SET10 | TABLE | ADMIN
QHIST_ANALYSIS_0315_FP_SET11 | TABLE | ADMIN
QHIST_ANALYSIS_0315_FP_SET12 | TABLE | ADMIN
QHIST_ANALYSIS_0315_FP_SET2  | TABLE | ADMIN
QHIST_ANALYSIS_0315_FP_SET3  | TABLE | ADMIN
QHIST_ANALYSIS_0315_FP_SET4  | TABLE | ADMIN
QHIST_ANALYSIS_0315_FP_SET5  | TABLE | ADMIN
QHIST_ANALYSIS_0315_FP_SET6  | TABLE | ADMIN
QHIST_ANALYSIS_0315_FP_SET7  | TABLE | ADMIN
QHIST_ANALYSIS_0315_FP_SET8  | TABLE | ADMIN
QHIST_ANALYSIS_0315_FP_SET9  | TABLE | ADMIN

Let’s start with a chunk we can digest easily; sets of 5 objects.

QHIST_ANALYSIS(ADMIN)=> select * from QHIST_ANALYSIS_0315_FP_SET5 order by sup desc limit 10;
ITEM1  |  ITEM2  |  ITEM3  |  ITEM4  |  ITEM5  |  SUP   | GRP
———+———+———+———+———+——–+—–
1928132 | 1927319 | 1927115 | 1927921 | 1927709 | 103640 |   0
1927115 | 1928606 | 1927921 | 1927319 | 1927709 | 103635 |   0
1928132 | 1928606 | 1927115 | 1927921 | 1927709 | 103635 |   0
1928132 | 1928606 | 1927319 | 1927115 | 1927921 | 103635 |   0
1928132 | 1928606 | 1927319 | 1927115 | 1927709 | 103635 |   0
1928132 | 1928606 | 1927319 | 1927921 | 1927709 | 103635 |   0
5023 |    5006 |    5014 |    1260 |    5093 |  66411 |   0
5618 |    5014 |    1260 |    5093 |    5006 |  66371 |   0
5023 |    5618 |    5006 |    5014 |    1260 |  66331 |   0
5023 |    5618 |    5006 |    1260 |    5093 |  66331 |   0

We see some pretty dominant patterns of table usage in the first 6 rows of the result set.  What we care about, however, is a large set of objects involved in queries that tend to be slower.

SELECT
setid,
avg(EXTRACT(epoch from finishtime-submittime))
FROM
(
SELECT
*
FROM
(
SELECT
b.setid,
npsid,
npsinstanceid,
opid
FROM
poc_qhist_db..”$hist_table_access_1″ a,
(
SELECT
‘set-’||item1 ||’.’||item2 ||’.’||item3||’.’||item4||’.’||item5 setid,
*
FROM
QHIST_ANALYSIS_0315_FP_SET5
) b
WHERE
a.tableid IN (b.item1, b.item2, b.item3, b.item4, b.item5)
AND tableid > 200000
GROUP BY
setid,
npsid,
npsinstanceid,
opid
HAVING
COUNT(1) = 5
) a
JOIN poc_qhist_db..”$hist_query_prolog_1″ p
USING(npsid, npsinstanceid, opid)
JOIN poc_qhist_db..”$hist_query_epilog_1″ e
USING(npsid, npsinstanceid, opid)
) a
GROUP BY
1
ORDER BY
2 DESC

This produces a result like:

SETID                    |   AVG
———————————————+———-
set-1928132.1927319.1927115.1927921.1927709 | 7.195417
set-1927115.1928606.1927921.1927319.1927709 | 7.190187
set-1928132.1928606.1927319.1927921.1927709 | 7.190187
set-1928132.1928606.1927319.1927115.1927709 | 7.190187
set-1928132.1928606.1927115.1927921.1927709 | 7.190187
set-1928132.1928606.1927319.1927115.1927921 | 7.190187
set-2215266.1928606.2213789.2061762.2050344 | 1.659014
set-2215266.2050576.2213789.2061762.2050344 | 1.659014
set-2215266.2050576.1928606.2213789.2061762 | 1.659014
set-2213789.2050576.1928606.2050344.2061762 | 1.659014
set-2215266.2050576.1928606.2213789.2050344 | 1.659014
set-2215266.2050576.1928606.2061762.2050344 | 1.659014
set-972025.969171.969255.970648.968881      | 0.656696

Now, what is happening with this data is that there is one query that references 6 tables — but with each possible combination of 5 showed up several times.  This is why you see the same averages for 5 different rows.

 

About these ads
This entry was posted in administration, analytics, Performance. Bookmark the permalink.

2 Responses to IBM Netezza analytics to analyze query history table usage

  1. Pingback: Netezza Query History Table — Blog Articles, Funny, Templates, Travel

  2. dbanetezza says:

    Not that I’m aware of; this solution is really a combination of two very different features within Netezza: advanced query history which tracks access all the way down to the column-level for historical reporting and the advanced analytic suite which allows us to use fpgrowth to identify frequent pattern sets within that query data.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s