How to properly REORG a TABLE in Db2 LUW — different scenarios
How do you know that your table needs a reorg? Are there many empty pages? Maybe you did many deletes lately or changed column types? Or maybe you just want to reorder the rows?
These questions might seem unimportant if you already decided that a reorg is needed. Well… it depends on your performance expectations. In some cases a simple full reorg regardless of the problem you want to solve might be a ‘good enough’ option. But this article is called „How to properly REORG a TABLE…“, which means we’ll be looking for precisely tailored solutions. In the second part of the article, I am going to analyse specific cases, but as a warm up, let’s start with some theory.
What types of REORGs are available?
Classic reorg
Basically, there are two types of table reorgs in Db2 LUW: classic (offline) and inplace (online).
The classic reorg looks like this:
1. The first thing is to SORT the records. This is performed in two cases:
- if you specified an index in the reorg command (“REORG TABLE <tableName> INDEX <indexName>”) → then the data is sorted according to this index;
- the other reason for sorting the records is that the table has a clustering index specified.
2. Secondly, the BUILD phase comes. This is the main phase of the reorg process. Db2 creates a reorganised copy of the original table. The copy is created in the same table space by default, but you can specify a table space for the copy, as follows: “reorg table <tabName> use <tablespaceName>”.
3. Thirdly, the REPLACE is done. During this phase, the original table is replaced by a reorganised copy. If the copy resides in the same table space, then we only need to have a pointer created — the reorganised copy becomes the table and the original table is deleted. But if you used another table space for a copy, then the copy is moved and takes place of the original table. This approach takes a little more time, but doesn’t require additional free space in the table space you store your table (it uses a temp table space).
4. Last phase is called RECREATE ALL INDEXES. As the name suggests, all the indexes created on the original table are recreated on the reorganised one.
BUILD and RECREATE INDEXES phases can take long time, while SORT and REPLACE phases are short.
The classic reorg is simple, fast and effective. But there is one big drawback. It makes the table unmodifiable during the reorg. Reading is possible during the BUILD phase. At least that!
Inplace reorg
Therefore IBM introduced another type of reorgs, namely inplace. They’re sometimes called online, to contrast them with classic offline reorgs, but it’s not fully true. That’s why I use quotes around the word „online“.
Inplace reorgs allow full access to the table that is being reorganised, also in the write mode. The reason is that the table is reorganised piece by piece. There is no copy of the table, everything is done in-place. Data is moved within the original object.
The inplace reorg has also 4 phases, but they’re different from the ones of the classic reorg:
- SELECT — in this phase the database manager selects a group of consecutive extents for a reorganisation.
- Vacate the selected range — database manager moves all records of the selected range to the free pages section. Each row leaves behind a pointer (record pointer, RP) that contains the RID of the new row’s location. The row is placed on a free page in the table as a REORG table overflow (RO) record that contains the data. After the utility finishes moving a set of rows, it waits until all applications that are accessing data in the table are finished. These ‘old scanners’ use old RIDs when table data is accessed. Any table access that starts during this waiting period (‘new scanner’) uses new RIDs to access the data. When all of the old scanners are complete, the REORG utility cleans up the moved rows, deleting RP records and converting RO records into regular records.
- Fill the range — after all rows in a specific range are vacated, they are written back in a reorganised format, they are sorted according to any indexes that were used, and obeying any PCTFREE restrictions that were defined. When all of the pages in the range are rewritten, the next n sequential pages in the table are selected, and the process is repeated.
- Truncate the table — by default, when all pages in the table are reorganised, the table is truncated to reclaim space. If the NOTRUNCATE option is specified, the reorganized table is not truncated.
Although inplace reorg is more flexible, because you can pause it if needed and it allows write access during almost the entire operation, it is significantly slower than the classic reorg and also not so effective.
Partial reorgs
Except for the full reorgs (either classic or inplace) there are also specialised partial reorgs. For example, if you just need to reclaim unused extents, you can do it (this option concerns only specific tables). If you just need to clean up overflows, you can do it as well. The syntax for these options is the following:
- REORG TABLE <tabName> RECLAIM EXTENTS,
- REORG TABLE <tabName> INPLACE CLEANUP OVERFLOWS.
To sum up, you have 4 reorg options:
- classic REORG,
- inplace full REORG,
- inplace cleanup overflows,
- and reclaim extents (only for MDC and ITC tables).
But let’s move from theory to practice.
Case study
Here is a sample REORGCHK output, table statistics section. There are 8 non-partitioned tables. Their names suggest what could be wrong with each of them. At first, let’s look at the last table: TAB_OVERFLOW_ROWS. The F2 and F3 indicators are not so bad (more about F2, F3 and all other REORGCHK stuff is described here). The main problem here might be slower read and high I/O due to overflow rows. Namely 6% of rows are spread across more than one page. It means doubled read for those records. In such case, instead of the full reorg, we can just cleanup overflows (cleanup overflows is an inplace reorg). So, I let’s run the following command:
> db2 “reorg table TAB_OVERFLOW_ROWS inplace cleanup overflows”
As a result of this short operation, the overflow problem is solved. Please note that although indicator F3 improved, indicator F2 worsened. If you want to not only cleanup overflows, but also get rid of unused space, making the table more compact, you can do the full reorg:
> db2 “reorg table TAB_OVERFLOW_ROWS inplace”
As a result, not only we got rid of the overflows (OV or F1), but also the number of pages used (FP) is smaller (some space has been released) and all indicators improved.
Speaking of releasing unused space… Let’s take a look at table TAB_FREE_EXTENTS. It has no overflows, but more than one fifth of the pages are completely unused (indicator F3). It means they contain only psuedo-deleted records. It seems that is such a case the perfect solution would be the RECLAIM EXTENTS reorg. Let’s try this:
> db2 “reorg table TAB_FREE_EXTENTS reclaim extents”
Hmm… I got an error... Let’s find out in the IBM documenation what does it mean:
This error tells me that I can RECLAIM EXTENTS only on MDC and ITC tables. My table is a usual simple table, so the full reorg is the only option here. OK, let’s bring out the guns and reorg all tables using the full reorg. At first I am going to use the classic reorg, then the inplace one, so that we can compare the results. I run the following command against each of my 8 tables:
> db2 “reorg table <tableName>”
In my case the execution took seconds, but in the real world, doing classic reorgs on many tables can last days! That’s why the cleanup overflows command is sometimes really useful.
Wow! The result is almost perfect! As the indicators show, the tables are very well organised now. Let’s compare it with the REORGCHK output after the inplace full reorg of all tables:
The inplace reorg is almost as good as the classic one. For tables marked in green, the results are the same, but for tables marked in red there is a slight difference in favour of classic reorg when it comes to F2 indicator, which is “Effective Space Utilisation of Data Pages”. You can see that these red tables occupy a little more pages after the inplace reorg comparing to the classic one. The reason is that the inplace reorg is not so good in data clustering. You may say that the difference is small. Yes, but I had no transactions on my tables during this inplace reorg. In case I would run inplace reorgs during high traffic time, clustering could be much worse.
You may notice, there is no difference in table size (TSIZE column). Not even a byte. The reason is that the TSIZE column shows only an estimate size of a table. It is imprecise and may be wrong by few percents.
Index statistics
Table statistics is not the only part of REORGCHK output you should focus on, when you are designing your REORG TABLE statement. The thing is that table records are stored physically in data pages in some order. And, of course, you may not care and leave the order whatever it is, but it might strike back during the table reads. So, in order to read fast, you should keep the records ordered according to the index you use frequently. For this section I have prepared table “TAB_INDEX_PROBLEM”. It has two simple indexes on two different columns. Its F1, F2 and F3 indicators are fine. Let’s look at the REORGCHK output, but this time at the index statistics section:
It shows all the indexes grouped in tables they relate to (in my case only one table has any indexes). The only value we’re going to focus on here is F4, which indicates the percentage of the table that is stored in the same order as the given index. As you can see, the F4 values for the indexes are consecutively: 0 and 14, which means the table is terribly ordered regardless of which index you choose. When I perform a standard reorg on the table (“db2 reorg table TAB_INDEX_PROBLEM”), the index statistics don’t change. But, I can also do a reorg with an index option. Namely, I can reorganise a table and reorder its rows, so that they’re clustered according to its index. Let’s try:
> db2 “reorg table TAB_INDEX_PROBLEM index IDX_COL_1”
As a result of the reorg, the table is perfectly ordered according to IDX_COL_1 — its F4 is 100 (by the way, F4 of the second index also changed a little). Great, now let’s do the same with IDX_COL_2!
> db2 “reorg table TAB_INDEX_PROBLEM index IDX_COL_2”
After reorganising the table according to the second index, we got an analogical improvement: the F4 of IDX_COL_2 is perfect. However, the F4 of IDX_COL_1 is a catastrophe now! The two conclusions here:
- You cannot order books in the library both according to titles and authors. You have to choose one index: either titles or authors. In our case: either index on COL_1 or index in COL_2.
- Don’t reorg blindly, when you see an asterisk in the REORG column (and F4 is not the only case you should be careful).
And that’s it! I hope I managed to explain briefly what options to reorganise a table do you have and how to choose the best one. If you want to learn more about the REORGCHK tool, here is my article about it.