Change data capture utilizes the SQL Server Agent to log insertions, updates, and deletions occurring in a table. So, it makes these data changes accessible to be easily consumed using a relational format. - learn.microsoft.com
This is useful when data from different databases is sucked up into a data lake, this way we can easily keep the data lake current with changes we care about.
Simple POC
You can use sample databases from Microsoft like AdventureWorks or Northwind or you can just invent your own. For simplicity and reasons of being a nerd I did the latter.
A. Create a container for the SQL Server with settings ready for CDC
Local Docker SQL Container
Spin up a DockerSQL Container. This will start the container with Product ID (PID) as Developer Edition.
-e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=Password!" -e MSSQL_AGENT_ENABLED=true are environmental variables, MSSQL_AGENT_ENABLED is important as this agent runs the CDC jobs
-p 5901:1433 is the port, so 5001 is your machine which maps to 1433 in the container
Server Name: localhost,5901 Login: sa Password: Password!
Create the sample database and table, I adapted these from w3schools. The table needs a primary key for CDC to track on.
1 2 3 4 5 6 7 8 9 10
CREATE DATABASE databasename;
USE databasename; CREATETABLE person ( id intIDENTITY(1,1) PRIMARY KEY, last_name varchar(255), first_name varchar(255), address varchar(255), city varchar(255) );
You should now see the database in the object explorer:
B. Confirgure the SQL database to track CDC changes
Setup CDC
The ownership of the table should be sa, if you restored a sample database you will need to run ALTER AUTHORIZATION before verifying the owner with the SQL below
1 2 3 4
SELECT d.name, d.owner_sid, sl.name FROM sys.databases AS d JOIN sys.sql_logins AS sl ON d.owner_sid = sl.sid;
Enable tracking changes, you can also disable with the commented out procedure below it.
capture_instance is the new table name. NULL will set the default based on the params example cdc.dbo_person_CT. You could set this to be foo_person_changes_CT (the _CT is always suffixed by default)
supports_net_changes is for when the column changes frequently, so only capture the net change (ie the last one)
captured_column_list these are column you want to track, you have to include the primary key
This will take a few seconds to complete as its creating jobs. The capture job captures the data as it changes from the log file. The cleanup job purges the captured data, it will only keep it for 3 days by default. You can change this with @retention= which needs to be in minutes.
If you want to disable the table you can use this SQL