Postgres

Connection Pooling for Postgres using PG Bouncer

I recently had to run load testing for an API that fetches data from Postgres. I was monitoring Postgres when running the first test, and I noticed that hundreds of connections were being opened. I was also seeing runtime errors in Postgres such as “no more connections allowed” or “out of shared memory”. Clearly the database was not prepared for the load. So I started to look for ways to improve connection management to the database, and I came across pgBouncer, a connection pooling service for Postgres. ...

Making Friends With Postgres: Replication

Replication refers to having copies of your data across different instances. It gives you redundancy when one of your instances goes down so you can still serve data to users from the remaining instances. Another benefit of a replicated system is being able to load balance your requests across different instances. This can be useful when the load of serving all the requests from a single instance would be too much. ...

Building a Reverse ETL Pipeline: Upserting Delta Lake Data into Postgres with Structured Streaming

In this post, I share how to build a Reverse ETL pipeline to upsert data from Delta (Databricks) into Postgres to provide sub-second response times to our tables. The goal is to make warehouse data available to downstream systems that require millisecond response times. These systems could be front-end applications that need to consume this data, or online machine learning models that require additional data as input to generate predictions in real time. ...

We Need to Talk about Partitions

The other day my team came across a peculiar situation with our Postgres database. We had this table partitioned by a primary key, which we’ll call order_id. We realised one of the queries was taking a very long time to execute, and like good SQL debuggers we looked at the query plan. Let’s pretend this was the slow query: EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345; Upon inspection, we observed that the query plan was scanning at every single partition! And in our case, there were a lot of them. ...

Load Testing PostgeSQL using Locust

I recently needed to run load tests on a PostgreSQL database. Postgres is no question a reliable and scalable database ready for production use cases, but there might be times when you’ll need to confirm it can handle your specific use case. This might be necessary if you’re using a PostgreSQL database hosted by a third party or if you want to check if your current instance size and specs can manage your existing load. ...

Understanding Database Indexes

Indexes are like going to the airport and looking at the signs to see where your gate is located. The signs tell you how to get to your gate quickly. The alternative to not having signs is walking the whole airport randomly until you come across your gate. So by having the signs you can reach your gate a lot faster. Makes sense right? This is the idea behind indexes. ...