Using nz_zonemap to visualize Netezza’s zone map effectiveness

Netezza has a lot of tools in /nz/support/contrib/bin that make life for the NZ DBA much, MUCH easier.  One such tool is nz_zonemap.
Zone maps are how the system keeps track of what records exist in a particular extent (a 3 MB unit of storage).  Each extent has the minimum and maximum value recorded for each of the columns for which zone map information exists.  As new data gets written to the table and new extents are allocated, more zone map information gets captured.

When users are querying the data, zone maps are used to immediately eliminate all of the extents that we KNOW don’t have the data we are interested in.  The more selective the zone maps are (narrow min/max ranges), the more effective they become at eliminating disk I/O.  Generally speaking, larger tables should be organized on the one or two most commonly occurring restrictions.
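
As a simple illustration (the table and column names here are hypothetical), a tight restriction on a zone-mapped column lets the system skip every extent whose recorded min/max range falls entirely outside the predicate:

-- Hypothetical table SALES_FACT with zone maps on SALE_DATE.
-- An extent whose zone map records min 2012-01-01 / max 2012-03-31
-- is skipped outright for this query: the zone map proves no row in
-- that extent can satisfy the restriction, so it is never read.
select count(*)
  from sales_fact
 where sale_date between '2013-06-01' and '2013-06-30';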

To get a list of the columns that are currently zone mapped, run the script with only the database and table name as parameters.

[nz@netezza~]$ /nz/support/contrib/bin/nz_zonemap foo pos_txn_dtl

   Database: FOO
Object Name: POS_TXN_DTL
Object Type: TABLE
Object ID  : 1183666

The zonemappable columns are:

 Column # | Column Name | Data Type
----------+-------------+-----------
        1 | TXID        | BIGINT
        2 | PRODUCTID   | BIGINT
        3 | CUSTOMERID  | INTEGER
(3 rows)

The output here indicates that zone maps exist for the txid, productid and customerid columns.  We could then look at the minimum/maximum range on each extent for a given data slice.  By default, data slice 1 is used.  You could override this default by specifying a particular data slice with the option ‘-dsid NN’.  Using the customerid column in this example, the output looks like:

[nz@netezza~]$ /nz/support/contrib/bin/nz_zonemap foo pos_txn_dtl customerid

   Database: FOO
Object Name: POS_TXN_DTL
Object Type: TABLE
Object ID  : 1183666
 Data Slice: 1
   Column 1: CUSTOMERID  (INTEGER)

 Extent # | CUSTOMERID (Min) | CUSTOMERID (Max) | ORDER'ed
----------+------------------+------------------+----------
        1 | 3                | 199995           |
        2 | 3                | 199995           |
        3 | 3                | 199995           |
        4 | 3                | 199995           |
        5 | 3                | 199995           |
        6 | 3                | 199995           |
        7 | 3                | 199995           |
        8 | 3                | 199995           |
        9 | 3                | 199995           |
       10 | 3                | 199995           |
       11 | 3                | 199995           |
       12 | 3                | 199995           |
       13 | 3                | 199995           |
       14 | 3                | 199995           |
(14 rows)
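
(As an aside, to inspect a different data slice you would add the -dsid option mentioned above; the call would look something like this, with the slice number chosen purely for illustration.)

[nz@netezza~]$ /nz/support/contrib/bin/nz_zonemap foo pos_txn_dtl customerid -dsid 4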

This tells me that on data slice 1, any query restricting on customerid for some value or range of values will essentially have to perform a table scan, since the min/max range on each extent is so wide.  We can narrow this down one of two ways: convert it to a clustered base table, or re-order the table with a CTAS statement.  Before we do either, I’ll get the elapsed time for a query against this table; it should be quick.

FOO(ADMIN)=> \time
Query time printout on
FOO(ADMIN)=> select count(1) from pos_txn_dtl where customerid = 100000;
 COUNT
-------
   800
(1 row)

Elapsed time: 0m0.468s

So it is taking us about half a second to query this table (it is very small).  If this type of query ran thousands of times per day, then even a 1/10th of a second saving could add up.  Next, let’s organize the table on customerid and then groom it.

FOO(ADMIN)=> alter table pos_txn_dtl organize on (customerid);
ALTER TABLE
Elapsed time: 0m0.357s
FOO(ADMIN)=> groom table pos_txn_dtl;
NOTICE:  Groom processed 14792 pages; purged 0 records; scan size shrunk by 528 pages; table size shrunk by 48 extents.
GROOM ORGANIZE READY
Elapsed time: 0m9.764s

So our table is now organized on this column.  Doing so netted us another advantage: the table size shrunk!  The columnar compression techniques Netezza uses became more effective when this table’s sort order on disk changed.
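
For reference, the other approach mentioned earlier, re-ordering with a CTAS, would look roughly like the sketch below.  The new table name is made up and the distribution key is an assumption (the original DDL isn’t shown), so treat this as a template rather than a drop-in command.

-- Sketch only: rebuild the table in customerid order, then swap it in.
create table pos_txn_dtl_ord as
select *
  from pos_txn_dtl
 order by customerid
distribute on (txid);   -- distribution key assumed; match the original table's
-- ...then validate the copy, drop the original, and rename pos_txn_dtl_ord back.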

Taking a look at the output of nz_zonemap now should show something very different.

[nz@netezza~]$ /nz/support/contrib/bin/nz_zonemap foo pos_txn_dtl customerid

   Database: FOO
Object Name: POS_TXN_DTL
Object Type: TABLE
Object ID  : 1183666
 Data Slice: 1
   Column 1: CUSTOMERID  (INTEGER)

 Extent # | CUSTOMERID (Min) | CUSTOMERID (Max) | ORDER'ed
----------+------------------+------------------+----------
        1 | 3                | 15531            |
        2 | 15542            | 30818            | TRUE
        3 | 30818            | 47035            | TRUE
        4 | 47035            | 62320            | TRUE
        5 | 62320            | 77965            | TRUE
        6 | 77965            | 93140            | TRUE
        7 | 93140            | 108599           | TRUE
        8 | 108599           | 123914           | TRUE
        9 | 123914           | 138584           | TRUE
       10 | 138584           | 155353           | TRUE
       11 | 155353           | 170317           | TRUE
       12 | 170317           | 185599           | TRUE
       13 | 185599           | 199995           | TRUE
(13 rows)

Now the min/max ranges are very narrow and the ORDER'ed indicator shows TRUE, meaning the minimum value on each extent is greater than or equal to the maximum value of the previous extent.  For example, a query restricting on customerid = 100000 now only needs to touch extent 7 (range 93140 to 108599) on this data slice.  Queries against this table restricting on customerid should now scan as little data as possible to find the relevant records.

FOO(ADMIN)=> \time
Query time printout on
FOO(ADMIN)=> select count(1) from pos_txn_dtl where customerid = 100000;
 COUNT
-------
   800
(1 row)

Elapsed time: 0m0.054s

Our query went from .46 seconds to .05 seconds — or 9x faster.  This same approach can be used with much larger tables; the key here is that you are able to visualize the distribution of values across zone map entries using the nz_zonemap tool.

Now, where to start your research into which tables to optimize is another story.  You could start with http://wp.me/p1U3hY-A to identify average query response time by table.


3 Responses to Using nz_zonemap to visualize Netezza’s zone map effectiveness

  1. marees says:

    Will running “Groom” automatically update zonemaps for fields specified in “Organize”, or do we need to gather stats again using “Generate Statistics”?

    I understand from your post above that Groom will automatically update zonemaps of the Organize fields. Is my understanding correct?

    Just to clarify, I have a 22 billion records table for which incremental groom takes only 3 minutes but stats always takes 10 minutes. So I would like to avoid running “generate statistics” as much as possible.

  2. dbanetezza says:

    Groom creates new blocks of data when physically re-ordering (or organizing) the rows in your table. Whenever a new block of data is written, the same zone map rules apply and as data is written to the block the minimum and maximum values are stored.

    If the fields found in the ‘ORGANIZE ON’ clause are not of a type supported by zone maps (think varying character), then zone map support is automatically extended to those columns. This means that zone maps apply when grooming, but more importantly when adding or updating existing rows (just like an integer, date, timestamp, smallint, bigint, etc).

    Statistics on the other hand describe the entire table and aren’t focused on an extent or page like zone maps are. Wider tables require multiple passes on the table to complete (I think the default is 20 columns at a time?). For tables having 22B rows, we don’t calculate cardinality at the time of generating statistics (we use sample scans at runtime instead). It’d be interesting to know how wide this table is and whether or not we have to scan it several times. One way around this is to compute statistics for only those columns that appear in queries as join columns, group by columns, etc. You could run this frequently and perhaps once a week do a full statistics on that table.
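
    A column-restricted run would look roughly like this (a sketch; the table and column names are placeholders standing in for your 22B-row table and its hot join/group-by columns):

      -- placeholders: substitute your own table and columns
      generate statistics on my_big_fact_table(customer_id, txn_date);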

    I hope this helps.

    • marees says:

      Thanks, I will check (with specific columns). Also, our DBAs rolled back to a prior version of NPS 7 due to bugs in the latest 7.x.xx when running GenStats. That should also improve performance, I think.
