Foreign Data Wrappers, Who?

October 31, 2023

Andrew Jung

Foreign Data Wrappers, Who?

Problem

In a microservices architecture, usually you have more than one database, usually following the database-per-service pattern

What happens if Service A has data, that Service B wants to use? What if it's just database tables and you don't want to futz around with creating an endpoint to get the data, then querying for all the rows, then doing something? This is a sort of tight coupling that we'd normally want to avoid, but Service B in this case, is used for aggregating data across our services, and Service A has the goods.

A foreign data wrapper can help us create a new table in Service B, using data in Service A without having to keep it up to date on our own. This data is read only, so we don't have to worry about messing up data from somewhere else.

What is a Foreign Data Wrapper

In this context, I'm talking specifically about Postgres and being able to create a foreign view in a service that is a subsection of data from another service. We can create a foreign table in our service that will query table as if it were local, to another service's. This is not a built-in part of Postgres, but an extension you can install on your server. You can learn how to set it up here: https://www.postgresql.org/docs/current/postgres-fdw.html

Scenario

Let's say you have a table of "cat breeds" from our Cat Identification Service. Our Cat Reporting Service wants to know how many cat breeds we have, and also wants to pull other data from other services, like Cat Owners from the Cat Owners Service. Vets and Bills from the Cat Vet Service. We want to pull all this data in this reporting service, but we don't always want to keep calling out to these services about their data. There's a couple ways to accomplish this, but this is about the foreign tables and views!

We want to build a "dataset" for each of these services, so we don't have to replicate the query in the service and we only want specific columns.

We can create our foreign data, with a regular query with JOINs and filters, to get our data down, and pick specific columns, that will represent our table.

Then in our final reporting service, we can have a representation of multiple services:

We can then build our own query using these three new foreign tables, alongside any tables we have in our Cat Reporting Service, to expose a unique data view.

Downsides

There's a lot to think about when wanting to views as tables:

Why isn't this table accessible without the table? How many tables are too many? What about migrations on a live system? Is it secure? What happens to my indexes? Constraints? Where did this data come from?

There's also human errors, if people don't follow a great naming convention for your tables, you might end up reproducing similar tables with similar data. You might confuse a source with another source. And the database grows linearly with your sources.

Extra/Warning

I'm not actually sure if this is the proper way to use a foreign data wrapper and view. But this is what I learned at Perpetua. I would create a new view using a Django SQL migration using existing server connections to another service, with filtered down data or even distilling a complex query. I'd then run a migration on the intended service to pull the data in, using Alembic and SQLAlchemy to create a foreign table and its representation. It would make my data readily available from Service A into Service B, ready to use in other queries for good 'ol data.