SQL Datafix

This is a Rosetta Code post.

Story

Consider the following relational database: FooDatabase

dbo.FooDatabase

In the context of this database a tenant can only be one of App One, App Two or User. Each User has their own unique GUID and the applications have the following GUIDs:

Application Name GUID
App One 3ba2e2e8-f093-41d8-8daf-64d5385604ac
App Two efec5b17-47fc-4d79-ae74-6e8b023023d5

13 333 records in dbo.item_line have been persisted incorrectly. The column dbo.item_line.tenant_id has been populated with the GUID for App One for the period 01 Jan 2018 to 31 Dec 2018 when it should have been populated with the GUID for the User tenant. This date is stored in dbo.item_line.created_date_uct.

Thankfully dbo.item_line.created_by_user_id is a mandatory field and will contain valid user GUIDs. Note that not all users will exist in dbo.tenant.

The cleanup must be idempotent, this means it can be run multiple times without changing the result beyond the initial story/task.

Task

Cleanup the data using a SQL script in a safe manner.

  • Identify all records in dbo.item_line that need to be updated, print the count.
  • Create records in dbo.tenant that dont exist when linking to dbo.item_line.created_by_user_id, print this count too.
  • Update dbo.item_line.tenant_id to be the newly created dbo.tenant.id when it did not exist, else update to be dbo.item_line.created_by_user_id. Print the total count of updates. This should match the first count printed.

Using the supplied data the prints and counts should be:

1
2
3
total dbo.item_line to be updated : 13333
total new dbo.tenant records : 12983
actual updates to dbo.item_line : 13333

Setup

Spin up a MS SQL Instance, you can then connect to it from Microsoft SQL Server Management Studio on localhost,1500

1
docker run --name=rosetta-df -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=Password123" -p 1500:1433 -d mcr.microsoft.com/mssql/server:2017-latest
1
2
3
4
5
001 CREATE DATABASE.sql
002 CREATE TABLES.sql
003 SEED dbo.tenant.sql
004 SEED dbo.line_item.sql
005 SEED dbo.tenant.sql

This should get the data into a state that matches the counts above.

Solutions

WIP

set operations

A set is an exactly defined collection of objects, this can be accomplished by selecting into a #temptable.

cursor

Use a STATIC cursor to iterate over the data.

WITH common_table_expression

Specifies a temporary named result set, known as a common table expression (CTE).

References