I needed a way to performance tune a PostgreSQL database, focusing on a simple table and SELECT statement. I did this locally using Docker, pgAdmin and a blank table.
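For reference, the blank table looked something like this. This is a minimal sketch; the exact column types are my assumption, inferred from the variables in the seeding script and the SELECT further down.

-- Assumed table definition for user_notes (id, date, note, user_id)
-- gen_random_uuid(), used in the seeding script below, is built in from PostgreSQL 13; on older versions enable pgcrypto
CREATE TABLE user_notes (
    id UUID PRIMARY KEY,
    date TIMESTAMP NOT NULL,
    note VARCHAR(500),
    user_id UUID NOT NULL
);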
-- PostgreSQL uses DECLARE within a DO $$ ... $$ block (an anonymous code block) to declare variables.
DO $$
DECLARE
    count INT := 1; -- := is used for assignment during declaration.
    upper_bound INT := 1000000;
    half_way INT := upper_bound / 2;
    random_user_id UUID;
    random_note VARCHAR(500);
    random_date TIMESTAMP;
    random_id UUID;
BEGIN
    RAISE NOTICE 'Inserting test data with % records', upper_bound;

    WHILE count <= upper_bound LOOP
        random_user_id := gen_random_uuid();
        random_note := substr(md5(random()::text), 1, 100); -- Generate a random string for the note, up to 100 characters
        -- random_date := (clock_timestamp() - (random() * 365 * 24 * interval '1 hour')); -- Generate a random date time within the last year
        random_date := clock_timestamp();
        random_id := gen_random_uuid();

        -- Insert the generated row and advance the counter (assumed completion of the seeding loop)
        INSERT INTO user_notes (id, date, note, user_id)
        VALUES (random_id, random_date, random_note, random_user_id);

        count := count + 1;
    END LOOP;
END $$;
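Once the seeding finishes, a quick sanity check that the row count matches upper_bound:

-- Should return 1000000 if the loop above completed
SELECT count(*) FROM user_notes;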
Here I'm just using the same SQL command Milan was using for his cursor example, with the predicate values taken from my data seeding above.
SELECT id, date, note, user_id
FROM user_notes
WHERE date > '2025-03-07 05:36:56.091957'
   OR (date = '2025-03-07 05:36:56.091957' AND id >= '782103c1-a971-4b5d-b7da-5e29af12ce60')
ORDER BY date DESC, id DESC
LIMIT 20;
Analyze
There are a few ways to analyze the query
Visually
Under Explain Settings, check Costs. The cost is not a time measurement; it's just a number that represents the relative effort to run the query, where lower is better. If you have a complex query, look at the cost per step and then focus on the highest-cost steps first.
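The number pgAdmin shows comes from the same cost=startup..total pair that plain EXPLAIN prints for every plan node, so you can see it without the GUI too. A minimal example just to see the format (the actual numbers will vary with your data):

-- EXPLAIN without ANALYZE only estimates the plan; the query is not executed
EXPLAIN SELECT * FROM user_notes LIMIT 1;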
Select Explain Analyze, then click Graphical. The thick lines are indicative of inefficiency; this is a simple example, but for more complex commands they can help you pinpoint where to start looking first. If you de-select Costs from step 1 and run again, you will notice the lines are skinny.
So I guess SQL is telling us if we are too fat we will be slow? That's not very nice! 🤬
Selecting Analysis, you can see the cost per step. A sequential scan on user_notes is bad; it means the whole table is being read. Generally an index scan is more efficient.
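As a side note, PostgreSQL also keeps running counters of how often a table is read by sequential scans versus index scans, which is a quick way to confirm what the plan is telling you:

-- seq_scan and idx_scan are cumulative counters in the pg_stat_user_tables view
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE relname = 'user_notes';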
Explain Analyze
Just add EXPLAIN ANALYZE to the beginning of the SQL command and execute the script. The SQL command should look like the following:
EXPLAIN ANALYZE
SELECT id, date, note, user_id
FROM user_notes
WHERE date > '2025-03-07 05:36:56.091957'
   OR (date = '2025-03-07 05:36:56.091957' AND id >= '782103c1-a971-4b5d-b7da-5e29af12ce60')
ORDER BY date DESC, id DESC
LIMIT 20;
Looking at the results, the execution time is 149.880 ms. And, same as above, you can see a sequential scan on user_notes, which is bad.
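If you want a bit more detail, adding the BUFFERS option to EXPLAIN shows how many shared-buffer pages each plan node hit or read, which gives a feel for how much of that time is raw I/O:

-- ANALYZE actually executes the query; BUFFERS adds page hit/read counts per node
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, date, note, user_id
FROM user_notes
WHERE date > '2025-03-07 05:36:56.091957'
   OR (date = '2025-03-07 05:36:56.091957' AND id >= '782103c1-a971-4b5d-b7da-5e29af12ce60')
ORDER BY date DESC, id DESC
LIMIT 20;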
Adding an index
Based on the contrived example above, and seeing what Milan did, it's clear we need an index on date and id with the same sort order. An index is just a sorted structure that points back to the data. It's bad to index every column because there is a performance hit: every time you insert or delete, each index has to be maintained and becomes more fragmented.
CREATE INDEX idx_user_notes_date_id ON user_notes (date DESC, id DESC);

-- DROP INDEX idx_user_notes_date_id; -- you can drop the index if it's dodgy
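Once the index exists, you can also confirm the planner is actually using it rather than just assuming it is; the idx_scan counter in pg_stat_user_indexes goes up every time the index is scanned:

-- Re-run the SELECT above, then check that idx_scan has increased
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'user_notes';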
Analyze (with index)
Life is better with an index :)
Visually (with index)
Remember to check Costs under Explain Settings, then run the SQL command with Explain Analyze and click Graphical. You will now see idx_user_notes_date_id is being used. There is still a thick line for the limit; indexes are not a silver bullet :)
Selecting Analysis, you can now see a scan using the index, which is better.
Explain Analyze (with index)
Same as above, just add EXPLAIN ANALYZE to the beginning of the SQL command and execute the script. This time the execution time is 0.042 ms.
So if you compare the speed with and without the index, it's night and day!
0.042 ms   -> with index
149.880 ms -> without index