PSQL Performance

I needed a way to performance-tune a PostgreSQL database, focusing on a simple table and SELECT statement. I did this locally using Docker, pgAdmin and a blank table.

This is similar to MySQL Performance and SQL Execution Plans.

Local setup

  1. Use the postgres docker image to run a local instance
docker run -d -p 5432:5432 -e POSTGRES_USER=myuser -e POSTGRES_PASSWORD=mypassword -e POSTGRES_DB=mydatabase -e POSTGRES_SCHEMA=myschema --name my-postgres postgres:17.4
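If you want to confirm the container is up before opening pgAdmin, you can optionally connect with psql from inside the container; the container, user and database names here are just the ones from the docker run command above.

docker exec -it my-postgres psql -U myuser -d mydatabase -c "SELECT version();"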
  2. Connect to the database using pgAdmin (I used v9.1).

  3. Create a test table. Here I copied Milan's example and created it as pgtest.mydatabase.public.user_notes.

CREATE TABLE user_notes (
    id UUID NOT NULL,
    user_id UUID NOT NULL,
    note CHARACTER VARYING(500),
    date TIMESTAMP NOT NULL,
    CONSTRAINT pk_user_notes PRIMARY KEY (id)
);
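If you would rather confirm the table from SQL than click through the pgAdmin tree, an optional check against the standard information_schema views looks like this:

SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name = 'user_notes'
ORDER BY ordinal_position;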
  4. Based on Generate Random Data, create 1,000,000 records. On my machine this took 22 seconds to complete (a faster set-based alternative is sketched after the block).
-- PostgreSQL uses DECLARE within a DO $$ ... $$ block (an anonymous code block) to declare variables.
DO $$
DECLARE
    count INT := 1; -- := is used for assignment during declaration.
    upper_bound INT := 1000000;
    half_way INT := upper_bound / 2;
    random_user_id UUID;
    random_note VARCHAR(500);
    random_date TIMESTAMP;
    random_id UUID;

BEGIN
    RAISE NOTICE 'Inserting test data with % records', upper_bound;

    WHILE count <= upper_bound LOOP
        random_user_id := gen_random_uuid();
        random_note := substr(md5(random()::text), 1, 100); -- Generate a random string for the note, up to 100 characters
        -- random_date := (clock_timestamp() - (random() * 365 * 24 * interval '1 hour')); -- Generate a random date time within the last year
        random_date := clock_timestamp();
        random_id := gen_random_uuid();

        INSERT INTO user_notes (id, user_id, note, date)
        VALUES (random_id, random_user_id, random_note, random_date);

        IF count % 50000 = 0 THEN
            RAISE NOTICE '%', count;
        END IF;

        IF count = half_way THEN
            RAISE NOTICE '** Halfway bro! **';
            RAISE NOTICE '%', random_id;
            RAISE NOTICE '%', random_date;
        END IF;

        count := count + 1;
    END LOOP;

    RAISE NOTICE 'DONE!';
END $$;
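The loop above inserts one row at a time, which is mostly why it takes around 22 seconds. As a hedged alternative (not what I used for the timings in this post), a single set-based INSERT over generate_series usually loads the same volume noticeably faster because the per-row overhead disappears:

-- Set-based seeding sketch: one INSERT statement for all 1,000,000 rows
INSERT INTO user_notes (id, user_id, note, date)
SELECT gen_random_uuid(),                    -- id
       gen_random_uuid(),                    -- user_id
       substr(md5(random()::text), 1, 100),  -- random note text
       clock_timestamp()                     -- date
FROM generate_series(1, 1000000);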
  5. Note the values logged at the halfway point; these are useful later for the data access patterns. It's almost like I knew how the data was going to be accessed ahead of time 💩
NOTICE:  450000
NOTICE:  500000
NOTICE:  ** Halfway bro! **
NOTICE:  782103c1-a971-4b5d-b7da-5e29af12ce60
NOTICE:  2025-03-07 05:36:56.091957
NOTICE:  550000
NOTICE:  600000
  6. Confirm the data exists

Count and expect 1,000,000

SELECT count(*) AS cnt FROM user_notes

Count
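Counting every row is itself a full pass over the table. If a rough number is good enough, the planner's estimate in the system catalog is quicker to read; this is an approximation, not the exact count, and it is only as fresh as the last ANALYZE:

ANALYZE user_notes;  -- refresh planner statistics first
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'user_notes';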

Identify data access patterns

  1. Here I'm just using the same SQL command Milan used for his cursor example, with the predicate values taken from my data seeding above (an equivalent row-constructor form is sketched after the query).
SELECT id, date, note, user_id
FROM user_notes
WHERE date > '2025-03-07 05:36:56.091957'
OR (date = '2025-03-07 05:36:56.091957' AND id >= '782103c1-a971-4b5d-b7da-5e29af12ce60')
ORDER BY date DESC, id DESC
LIMIT 20;
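For what it's worth, that WHERE clause is the classic keyset (cursor) pagination predicate. It can also be written with a row constructor, which means exactly the same thing and reads closer to the intent of "everything at or after this (date, id) position":

SELECT id, date, note, user_id
FROM user_notes
WHERE (date, id) >= ('2025-03-07 05:36:56.091957'::timestamp, '782103c1-a971-4b5d-b7da-5e29af12ce60'::uuid)
ORDER BY date DESC, id DESC
LIMIT 20;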

Analyze

There are a few ways to analyze the query.

Visually

  1. Under Explain Settings, check Costs. The cost is not a time measurement; it's just a number that represents the effort to run the query, where lower is better. If you have a complex query, look at the cost per step and focus on the highest-cost steps first. (A text-only way to see the same estimates is sketched after this list.)

Explain Settings Costs

  2. Select Explain Analyze, then click Graphical. The thick lines are indicative of inefficiency; this is a simple example, but for more complex commands they could help you pinpoint where to start looking first. If you de-select Costs from step 1 and run again, you will notice the lines are skinny.

So I guess SQL is telling us that if we are too fat we will be slow? That's not very nice! 🤬

Analyze Explain Graphical

  3. Selecting Analysis, you can see the cost per step. A table scan on user_notes is bad; it means the whole table is being scanned. Generally a seek is more efficient.

Analyze Explain Analysis
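If you prefer text output to the graphical view, plain EXPLAIN (without ANALYZE) prints the same estimated costs without actually executing the query:

EXPLAIN SELECT id, date, note, user_id
FROM user_notes
WHERE date > '2025-03-07 05:36:56.091957'
OR (date = '2025-03-07 05:36:56.091957' AND id >= '782103c1-a971-4b5d-b7da-5e29af12ce60')
ORDER BY date DESC, id DESC
LIMIT 20;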

Explain Analyze

  1. Just add EXPLAIN ANALYZE to the beginning of the SQL command and execute the script. The SQL command should look like this:
EXPLAIN ANALYZE SELECT id, date, note, user_id
FROM user_notes
WHERE date > '2025-03-07 05:36:56.091957'
OR (date = '2025-03-07 05:36:56.091957' AND id >= '782103c1-a971-4b5d-b7da-5e29af12ce60')
ORDER BY date DESC, id DESC
LIMIT 20;
  2. Looking at the results, the execution time is 149.880 ms. Same as above, you can see a table scan (Seq Scan) on user_notes, which is bad.

Analyze Explain
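One optional extra while you are here: the BUFFERS option adds shared-buffer hit/read counts to the output, which helps separate cached reads from actual disk I/O:

EXPLAIN (ANALYZE, BUFFERS) SELECT id, date, note, user_id
FROM user_notes
WHERE date > '2025-03-07 05:36:56.091957'
OR (date = '2025-03-07 05:36:56.091957' AND id >= '782103c1-a971-4b5d-b7da-5e29af12ce60')
ORDER BY date DESC, id DESC
LIMIT 20;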

Adding an index

Based on the contrived example above and seeing what Milan did, it's clear we need an index on date and id with the same sort order. An index is essentially a sorted copy of the indexed columns that points back to the table rows. It's bad to index every column because there is a performance hit: every insert, update or delete also has to maintain each index, and the index becomes more fragmented over time.

CREATE INDEX idx_user_notes_date_id ON user_notes (date DESC, id DESC);

--DROP INDEX idx_user_notes_date_id; -- you can drop the index if it's dodgy
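A couple of optional sanity checks after creating it: how big the index is on disk, and (after running some queries) whether it is actually being scanned. Both use standard PostgreSQL catalog functions and statistics views:

-- Size of the new index on disk
SELECT pg_size_pretty(pg_relation_size('idx_user_notes_date_id')) AS index_size;

-- How many times the index has been scanned since stats were last reset
SELECT idx_scan
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_user_notes_date_id';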

Analyze (with index)

Life is better with an index :)

Visually (with index)

  1. Remember to check Costs under Explain Settings, then run the SQL command with Explain Analyze and click Graphical. You will now see idx_user_notes_date_id is being used. There is still a thick line for the limit; indexes are not a silver bullet :)

Analyze Explain Graphical With Index

  2. Selecting Analysis, you can now see a scan using the index, which is better.

Analyze Explain Analysis With Index

Explain Analyze (with index)

  1. Same as above, just add EXPLAIN ANALYZE to the beginning of the SQL command and execute the script. This time the execution time is 0.042 ms.

Analyze Explain With Index

  2. If you compare the speed with and without the index, it's night and day!
  0.042 ms -> with index
149.880 ms -> without index

References