What do Postgres Tables Actually look like?

If you were to peek inside Postgres and see how the tables are stored, you’d be surprised they look nothing like the outuput you get when you do a SELECT query. In this post I look inside Postgres to understand how tables are stored in the filesystem. The goal of this post is nothing more than to understand a small part of the whole Postgres ecosystem to demystify things a bit. It is a step by step process to find out where tables actually live in the filesystem and their structure. ...

February 11, 2026 · 8 min

Replicas vs Standby

When working with distributed systems you can encounter terminology such as “replicas” or “standby” intstances. When I first started learning about distrubuted systems, I assumed “replicas” and “standby” were the same thing, but called differently when talking about scaling or high availability. While they can be the same thing, that is not always the case, so I wanted to clarify the concepts in this post. Replicas Replicas are one or more exact copies of the primary instance of your database. Their main purpose is to balance the load. ...

October 5, 2025 · 4 min

Building a GraphQL API with AWS AppSync and PostgreSQL Aurora

Introduction This is a guide to myself if I need to deploy an Appsync App in the future. Appsync is a Serverless AWS offering for implementing GraphQL APIs. It integrates nicely with Aurora and DynamoDB. Other integrations are done via Lambdas. Architecture Overview Compontents AWS AppSync: The managed GraphQL service that processes API requests Aurora PostgreSQL: Serverless relational database for data storage AWS Secrets Manager: Securely stores database credentials AWS IAM: Manages permissions between services AWS CloudWatch: Handles logging and monitoring The flow of a request is: ...

August 5, 2025 · 5 min

Connection Pooling for Postgres using PG Bouncer

I recently had to run load testing for an API that fetches data from Postgres. I was monitoring Postgres when running the first test, and I noticed that hundreds of connections were being opened. I was also seeing runtime errors in Postgres such as “no more connections allowed” or “out of shared memory”. Clearly the database was not prepared for the load. So I started to look for ways to improve connection management to the database, and I came across pgBouncer, a connection pooling service for Postgres. ...

July 24, 2025 · 11 min

Making Friends With Postgres: Replication

Replication refers to having copies of your data across different instances. It gives you redundancy when one of your instances goes down so you can still serve data to users from the remaining instances. Another benefit of a replicated system is being able to load balance your requests across different instances. This can be useful when the load of serving all the requests from a single instance would be too much. ...

June 9, 2025 · 9 min

Building a Reverse ETL Pipeline: Upserting Delta Lake Data into Postgres with Structured Streaming

In this post, I share how to build a Reverse ETL pipeline to upsert data from Delta (Databricks) into Postgres to provide sub-second response times to our tables. The goal is to make warehouse data available to downstream systems that require millisecond response times. These systems could be front-end applications that need to consume this data, or online machine learning models that require additional data as input to generate predictions in real time. ...

May 8, 2025 · 9 min

We Need to Talk about Partitions

The other day my team came across a peculiar situation with our Postgres database. We had this table partitioned by a primary key, which we’ll call order_id. We realised one of the queries was taking a very long time to execute, and like good SQL debuggers we looked at the query plan. Let’s pretend this was the slow query: EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345; Upon inspection, we observed that the query plan was scanning at every single partition! And in our case, there were a lot of them. ...

March 16, 2025 · 3 min

Load Testing PostgeSQL using Locust

I recently needed to run load tests on a PostgreSQL database. Postgres is no question a reliable and scalable database ready for production use cases, but there might be times when you’ll need to confirm it can handle your specific use case. This might be necessary if you’re using a PostgreSQL database hosted by a third party or if you want to check if your current instance size and specs can manage your existing load. ...

January 10, 2025 · 3 min

Understanding Database Indexes

Indexes are like going to the airport and looking at the signs to see where your gate is located. The signs tell you how to get to your gate quickly. The alternative to not having signs is walking the whole airport randomly until you come across your gate. So by having the signs you can reach your gate a lot faster. Makes sense right? This is the idea behind indexes. ...

December 8, 2024 · 6 min