Explorations in SQL: Fixing Duplicates for Index



I had an issue with some indexes that would not create. The database complained that the table had duplicates in it that violated the index.  For me, the table was PS_AUC_HDR, but these steps should work for any table.  Maybe some of the techniques will be helpful too.  Also, if you know a better way to take care of the problem, please comment.

Identifying the offending rows

The first step is to determine which rows are the offending rows.  So, you need to know the keys for the table.  You can either open App Designer and look, or you can query the PSKEYDEFN table:

SELECT * FROM PSKEYDEFN
WHERE RECNAME = 'AUC_HDR'
AND INDEXID = '_';

For the PS_AUC_HDR, the keys are:

  • BUSINESS_UNIT
  • AUC_ID
  • AUC_ROUND
  • AUC_VERSION

So, you can use this query to view the rows that offend the key:

SELECT BUSINESS_UNIT, AUC_ID, AUC_ROUND, AUC_VERSION, COUNT(*)
FROM PS_AUC_HDR
GROUP BY BUSINESS_UNIT, AUC_ID, AUC_ROUND, AUC_VERSION
HAVING COUNT(*) > 1
ORDER BY BUSINESS_UNIT, AUC_ID DESC, AUC_ROUND, AUC_VERSION;

If it was just one row, I could probably manually delete that one duplicate row, but I had 44 rows each with a count of 2.  A quick count of all the rows in the table resulted in 88 rows.  So, somehow my data got doubled up.

TRUNCATE TABLE PS_AUC_HDR;

Now, we need to fix it.

Backing up

Taking a database backup would be a good idea, but here is a quick way to backup the data in just this table:

CREATE TABLE PS_AUC_HDR_BACKUP AS
SELECT * FROM PS_AUC_HDR;

Simple Fix

If this were a simple table, we could apply this simple fix.  This assumes that all of the data is the same (not just the keys) and that the table has no long character fields.  (It won’t work for PS_AUC_HDR because of the DESCRLONG field).

First, we create a copy of the data using distinct to eliminate the duplicates.

 CREATE TABLE PS_AUC_HDR_TEMP
 SELECT DISTINCT *
 FROM PS_AUC_HDR
 

Next, we get rid of the data in the original table.

TRUNCATE TABLE PS_AUC_HDR;

Finally, we copy the good data back to the original table.

INSERT INTO PS_AUC_HDR
SELECT * FROM PS_AUC_HDR_TEMP;

Then, we can clean up.

DROP TABLE PS_AUC_HDR_TEMP;

The Hard Fix

Nothing is ever as easy as you think it will be.  We have to handle that DESCRLONG field special.

First, we need to determine what our largest value is.  Here’s a quick query:

SELECT MAX(LENGTH(DESCRLONG))
FROM PS_AUC_HDR

You can view the data that is too long with this query:

SELECT *
FROM PS_AUC_HDR
WHERE LENGTH(DESCRLONG) > 255
ORDER BY BUSINESS_UNIT, AUC_ID DESC, AUC_ROUND, AUC_VERSION;

We will need to break this field into multiple fields of less than 256 characters each.  For me, my longest value was less than 400, so I chose to break it into two fields of 200 characters each.  I created a table like this:

CREATE TABLE PS_AUC_HDR_DESCR_TMP
(BUSINESS_UNIT VARCHAR2(5),
 AUC_ID VARCHAR2(10),
 AUC_ROUND NUMBER,
 AUC_VERSION NUMBER,
 DESCRLONG1 VARCHAR2(200),
 DESCRLONG2 VARCHAR2(200),
 DESCRLEN NUMBER);

Now, we need to copy the DESCRLONG values to this new table with the keys.  This will give us the data in a place with no Long Character fields.

INSERT INTO PS_AUC_HDR_DESCR_TMP
SELECT BUSINESS_UNIT, AUC_ID, AUC_ROUND, AUC_VERSION,
SUBSTR(DESCRLONG, 1, 200) DESCRLONG1,
SUBSTR(DESCRLONG, 201, 200) DESCRLONG2,
LENGTH(DESCRLONG) DESCRLEN
FROM PS_AUC_HDR;

Now, we need to create our copy of the distinct rows.  We need to exclude the DESCRLONG field so we will have to explicitly list each field by name.  That is a lot of typing, but you can save some of the typing with this query:


SELECT ', ' || FIELDNAME FROM PSRECFIELDALL WHERE RECNAME = 'AUC_HDR';

You can copy that list (delete the first comma) to create something like this.  I am deleting most of the fields to save space.  Notice that I moved the DESCRLONG field down to the end and added the ‘ ‘ to make it blank.

CREATE TABLE PS_AUC_HDR_TEMP AS
SELECT DISTINCT AUC_ALLOW_GRP
, AUC_ALLOW_GRP_PBRK
, AUC_ALLOW_IDENTITY
, AUC_ALL_LINES_FLG
, AUC_AWD_COL_DUEDT
.....
, ' ' DESCRLONG
FROM PS_AUC_HDR

We can wipe the duplicate data:

TRUNCATE TABLE PS_AUC_HDR;

Then, you can copy the data back.  Note that you have to use the fields again because they are probably in a different order.

INSERT INTO PS_AUC_HDR(AUC_ALLOW_GRP
, AUC_ALLOW_GRP_PBRK
, AUC_ALLOW_IDENTITY
, AUC_ALL_LINES_FLG
, AUC_AWD_COL_DUEDT
....
, DESCRLONG)
SELECT AUC_ALLOW_GRP
, AUC_ALLOW_GRP_PBRK
, AUC_ALLOW_IDENTITY
, AUC_ALL_LINES_FLG
, AUC_AWD_COL_DUEDT
....
, DESCRLONG
FROM PS_AUC_HDR_TEMP

Now, we need to bring the DESCRLONG value back into the table.  Here is a query that should do just that:

UPDATE PS_AUC_HDR
SET DESCRLONG = (SELECT DISTINCT(DESCRLONG1 || DESCRLONG2)
                 FROM PS_AUC_HDR_DESCR_TMP A
                 WHERE A.BUSINESS_UNIT = PS_AUC_HDR.BUSINESS_UNIT
                 AND A.AUC_ID = PS_AUC_HDR.AUC_ID
                 AND A.AUC_ROUND = PS_AUC_HDR.AUC_ROUND
                 AND A.AUC_VERSION = PS_AUC_HDR.AUC_VERSION);

Now, the problem should be fixed, and our index should build with no problem.


Share