Assessing query performance … by table?

While looking for performance optimization opportunities, I decided to look at things from a slightly different perspective: by table.  I’m not entirely sure how helpful this will be – but since I wrote the query I figured I’d put it out here.

First, this requires the advanced query history mechanism be enabled and configured to capture table access information.  You can learn more about creating / altering history configurations in the administrator’s guide or in the Netezza Technical Support KB.  Once this has been in place for some time, you can analyze the data it contains.
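
If you have never set one up, the rough shape of a history configuration is something like the sketch below; the configuration name, database, user and password are placeholders, and the full option list is in the administrator's guide. The important part for this post is that TABLE appears in the COLLECT list, so that table access information is captured.

CREATE HISTORY CONFIGURATION basic_hist
    HISTTYPE         QUERY
    DATABASE         nz_qryhistory
    USER             histuser
    PASSWORD         'password'
    COLLECT          QUERY, PLAN, TABLE
    LOADINTERVAL     10
    LOADMINTHRESHOLD 0
    VERSION          1;

SET HISTORY CONFIGURATION basic_hist;

A new or changed configuration only takes effect after the system is restarted. The query below assumes the history database is called nz_qryhistory and uses the version-1 history tables; adjust the names to match your own configuration.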

Here is the query I used to get, for each table, an approximate row count plus the average, min, max and standard deviation of query duration and the total number of queries involving that table. (The CASE expression reassembles a 64-bit row count from the two 32-bit counters, reltuples and relrefs, kept in the _t_class catalog table.)

select     ta.tablename,
           CASE WHEN reltuples < 0 THEN ((2^32)::bigint * relrefs)  +  ((2^32)::bigint + reltuples )
                ELSE                    ((2^32)::bigint * relrefs)  +  (                 reltuples ) 
           END rowcount,
           avg(runtime_seconds),
           min(runtime_seconds),
           max(runtime_seconds),
           stddev(runtime_seconds),
           count(1) query_count
from nz_qryhistory.."$hist_table_access_1" ta
inner join _t_class c on c.relfilenode = ta.tableid
inner join 
         (
            select     npsid, npsinstanceid,s.opid,s.planid,s.sessionid,ismainplan,
                       extract(epoch from f.endtime-s.starttime) runtime_seconds 
            from       nz_qryhistory.."$hist_plan_prolog_1" s 
            inner join nz_qryhistory.."$hist_plan_epilog_1" f using ( npsid, npsinstanceid, planid )
            where s.starttime::date >= current_date - 14
          ) a using (npsid, npsinstanceid,opid,sessionid )
where        tableid > 200000
and            ismainplan = 't'
and         (usage & ( 1 << 0 ) ) = 1
group by     1,2
order by    3 desc

Note the date filter in the inner subquery (starttime >= current_date - 14): this analyzes query history for only the last 14 days. You can adjust it to cover as much history as you are interested in.
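
For example, to look back 30 days instead of 14, change only that date filter:

where s.starttime::date >= current_date - 30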

The results will look like this (durations are in seconds):

   TABLENAME    | ROWCOUNT |   AVG    | MIN | MAX |      STDDEV      | QUERY_COUNT
----------------+----------+----------+-----+-----+------------------+-------------
 GEO            |    36607 | 1.500000 |   1 |   2 | 0.70710678118655 |           2
 COMMUNE        |    36607 | 1.500000 |   1 |   2 | 0.70710678118655 |           2
 JDB_OC         |     2124 | 1.400000 |   1 |   2 | 0.54772255750517 |           5
 COMPTEUR       |     2408 | 1.400000 |   1 |   2 | 0.54772255750517 |           5
 PDL            |     2408 | 1.400000 |   1 |   2 | 0.54772255750517 |           5
 PMAX           |    74648 | 1.400000 |   1 |   2 | 0.54772255750517 |           5
 CONTRAT        |     2408 | 0.777778 |   0 |   2 |  0.8333336666666 |           9
 CONTRAT_MASTER | 34290838 | 0.000000 |   0 |   0 |                0 |           1
 RELEVE         | 10749312 | 0.000000 |   0 |   0 |                0 |           2

Here we can see that the GEO table holds roughly 36k rows and was referenced by two queries in the period, and that queries referencing it took 1.5 seconds on average to complete.

Now clearly this level of information isn't enough to base decisions on by itself, but it does seem like an interesting way to look at query history data. Larger tables with more complex queries will always take longer to complete; this view, though, points you to the tables that are referenced most often and show up in the slowest queries, allowing you to focus your optimization time on those first.
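
If you would rather lead with the most commonly referenced tables instead of the slowest ones, only the final ORDER BY needs to change; column 7 is query_count and column 3 is the average runtime:

order by    7 desc, 3 desc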


2 Responses to Assessing query performance … by table?

  1. bunditj says:

    Hello,

    I am a newbie on the NZ appliance. Is there any point of concern about enabling enable_collect_history and table-level history collection?

    Thank you,
    BunditJ

    • dbanetezza says:

      Bundit,

      Absolutely not. The query history collection utility can be configured to run as often or as infrequently as you like. As users/jobs submit queries, a very small amount of data is written to /nz/data/hist/staging. Every n minutes (the loadinterval setting in your history configuration), the system checks to see whether at least x MB of data has been staged (where x is the loadminthreshold). I typically set it up to run every 10 minutes with no loadminthreshold; this way, I know that all of the query data up until 10 minutes ago has been loaded.
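
      Assuming the configuration is named basic_hist (a placeholder), that setup would look roughly like this; changes to the active configuration take effect after the system is restarted:

      ALTER HISTORY CONFIGURATION basic_hist
          LOADINTERVAL     10
          LOADMINTHRESHOLD 0;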

      On very busy systems, the amount of data staged could add up, but even then you will not see GBs of data being staged; it will still be in the MB range. And loading hundreds of MB of data every 10 or 15 minutes is not anything to worry about.

      That said, you should use nzhistcleanupdb to remove old query history, unless of course you want to keep everything.
