
Grzegorz Dubiel

15-04-2026

Behind the Screens: Designing and Building a Node.js Ad Campaign System

I am the type of person who keeps away from screens because they distract me. I know that sounds odd given my profession, but my approach to digital well-being is a story for another article. Recently, I noticed digital screens popping up around the city center where I live. In general, I don't like ads, too many screens, or too many blinking colors, but one question stuck with me: how the heck do you build a system of distributed screens that shows exactly what you want? What happens when a screen loses its connection? How do campaigns keep running on those screens? I also realized that the technology I like and specialize in could be a good fit for such a system, so I decided to take on the challenge of designing it and building a POC. On top of that, I had recently started to focus more on solving system design problems. So let's dive in.

Gathering functional and non-functional requirements

Before drawing any diagrams, we need to gather the requirements and state them clearly. The best way to do that is to sort them into functional and non-functional requirements.

Functional requirements

  • The user shall be able to create a campaign by defining metadata, a schedule, and assets
  • The user shall be able to see all of their campaigns
  • The system shall create the ad from the metadata and assets sent by the user
  • Ads shall be displayed according to schedule on the selected devices (screens)
  • The system shall be able to cancel the selected campaign

Non-functional requirements

  • The system shall handle up to 500 concurrent requests for campaign creation
  • The system shall push campaigns to up to 20,000 screens concurrently
  • Campaign creation shall be idempotent
  • A campaign shall start within 1 second of the scheduled time
  • Screens shall send acknowledgments to the backend immediately after installing the ad on the device
  • Screens shall continue to work offline while still starting campaigns according to schedule

Defining the core components of the system

As you can see, the task is not trivial. To meet all of these requirements, we will have to define three main layers of the application:

  1. Admin UI dashboard - handles campaign creation by adding metadata and assets, and lists the created campaigns and their status.
  2. Backend - will include components such as the dashboard API, a template worker for creating and storing templates, a database for storing metadata, a publisher for publishing campaigns to devices, and an ack consumer that will receive messages from devices to keep track of the state of campaign publishing. The workers will be connected through a persistent job queue.
  3. Screens frontend - will be a simple vanilla JS app that will receive the campaign manifest, install the campaign in the cache, run campaigns, and notify the backend about the publishing status.

NOTE: For this design, we will not talk about authentication and authorization. The example repo does not implement them either. In this article, I'd like to focus solely on the core idea of the screen-based campaign system.

Data flow and system design diagram

The best way to understand how the components of the system work is to draw a diagram. After analyzing it, we will have a clear high-level overview of the system.

Add campaign architecture diagram

The flow works as a feedback loop triggered from the Admin UI: request -> API -> compute template and create manifest -> publish to the screens -> send ack to the backend, which closes the loop. Backend services communicate via BullMQ. The backend also has file storage (a plain file system for this POC, but it could be object storage or whatever else fits your needs). Consistency and atomicity are ensured by the outbox pattern when registering events for creating and cancelling campaigns and for creating the template. The POST request for creating a new campaign requires an idempotency key, which prevents multiple jobs for the same campaign. Without idempotency, duplicates become hard to manage, especially at scale: creating a template and storing campaign assets are not trivial tasks, and duplicate campaigns must never enter the system. From the start, we use a read replica for read operations as a deliberate scalability choice, because even at moderate traffic the background processes are write-heavy and can quickly degrade read performance, resulting in a poor user experience.

Data Model

The data model in this case is simple. For this POC project, we will create four core domain entities: device, campaign, campaign asset, and delivery event. In addition, we will use an outbox table to support reliable internal event processing. In the future, the data model can be extended with users, teams, etc.

Here is the minimal data model for the application, including the outbox table:

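The exact DDL lives in the repo; the sketch below shows the shape of the five tables, with illustrative column names.

```sql
-- Minimal schema sketch; column names are illustrative, the repo is the
-- source of truth.
CREATE TABLE campaigns (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  idempotency_key TEXT UNIQUE NOT NULL,       -- deduplicates create requests
  name            TEXT NOT NULL,
  status          TEXT NOT NULL DEFAULT 'draft',
  schedule        JSONB NOT NULL,             -- start/end times
  metadata        JSONB,
  created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE devices (
  id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name      TEXT NOT NULL,
  last_seen TIMESTAMPTZ,
  metadata  JSONB
);

CREATE TABLE campaign_assets (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  campaign_id UUID NOT NULL REFERENCES campaigns(id),
  kind        TEXT NOT NULL,                  -- image, video, ...
  uri         TEXT NOT NULL                   -- path in file/object storage
);

-- Associates campaigns with devices and tracks publishing state.
CREATE TABLE delivery_events (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  campaign_id UUID NOT NULL REFERENCES campaigns(id),
  device_id   UUID NOT NULL REFERENCES devices(id),
  status      TEXT NOT NULL,                  -- pending, acked, revoked, ...
  updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Supports the outbox pattern for reliable internal event processing.
CREATE TABLE outbox (
  id           BIGSERIAL PRIMARY KEY,
  event_type   TEXT NOT NULL,                 -- campaign.created, campaign.cancelled, ...
  payload      JSONB NOT NULL,
  claimed_by   TEXT,                          -- worker id holding the claim
  claimed_at   TIMESTAMPTZ,
  published_at TIMESTAMPTZ
);
```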

As you can see, the entire data model at this point is simple and covers storing assets and associating campaigns with devices through the delivery_events table, as well as supporting the outbox pattern. There are no filters in the Admin UI yet, so the indexes in the presented design are not sophisticated.

Project structure

The project is organized as a monorepo consisting of the Admin UI, the backend (with each significant backend service as a module: the API, the publisher, the outbox service, and the template worker), and the screen app. At this stage, all backend services share a single source of truth: an SQL database.

In order to discuss the choices, trade-offs, and patterns, we need to zoom in a little on the services.

Admin UI

The Admin UI is responsible for creating, reading, and cancelling campaigns. To build this service, I used React + Vite + TanStack Query.

Here is the client-side API integration for those cases:

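A sketch of those fetchers follows; the endpoint paths and payload fields are assumptions for illustration.

```typescript
// Sketch of the client-side fetchers; paths and payload shapes are illustrative.
export interface CreateCampaignInput {
  name: string;
  schedule: { startAt: string; endAt: string };
  assets: File[];
}

export async function createCampaign(
  input: CreateCampaignInput,
  idempotencyKey: string,
) {
  const form = new FormData();
  form.append('name', input.name);
  form.append('schedule', JSON.stringify(input.schedule));
  input.assets.forEach((file) => form.append('assets', file));

  const res = await fetch('/api/campaigns', {
    method: 'POST',
    // The client supplies the idempotency key, so retries and double
    // clicks resolve to the same campaign on the server.
    headers: { 'Idempotency-Key': idempotencyKey },
    body: form,
  });
  if (!res.ok) throw new Error(`Create failed: ${res.status}`);
  return res.json();
}

export async function listCampaigns(page: number, pageSize = 20) {
  const res = await fetch(`/api/campaigns?page=${page}&pageSize=${pageSize}`);
  if (!res.ok) throw new Error(`List failed: ${res.status}`);
  return res.json();
}

export async function cancelCampaign(id: string) {
  const res = await fetch(`/api/campaigns/${id}/cancel`, { method: 'POST' });
  if (!res.ok) throw new Error(`Cancel failed: ${res.status}`);
  return res.json();
}
```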

These API fetchers are wrapped with TanStack Query hooks, which power the campaign form and the paginated campaign table.
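
A condensed sketch of that wiring (hook names and the './api' module are illustrative):

```tsx
// Sketch of the hooks; component markup is trimmed to the essentials.
import { useRef } from 'react';
import { useMutation, useQuery, useQueryClient } from '@tanstack/react-query';
import { createCampaign, listCampaigns, type CreateCampaignInput } from './api';

export function useCampaigns(page: number) {
  return useQuery({
    queryKey: ['campaigns', page],
    queryFn: () => listCampaigns(page),
  });
}

export function useCreateCampaign() {
  const queryClient = useQueryClient();
  // One key per form session: retries of the same submission reuse it,
  // so the backend stores the campaign only once.
  const idempotencyKey = useRef(crypto.randomUUID());
  return useMutation({
    mutationFn: (input: CreateCampaignInput) =>
      createCampaign(input, idempotencyKey.current),
    onSuccess: () => {
      idempotencyKey.current = crypto.randomUUID(); // next submission gets a fresh key
      queryClient.invalidateQueries({ queryKey: ['campaigns'] });
    },
  });
}
```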

One of the most important details for the reliability and consistency of the entire system is that the Admin UI generates the idempotency key, ensuring that the same campaign creation request is stored only once. This makes the request safe to retry after network failures, double clicks, or anything else that causes the client to send a duplicate. The service on the API backend side later uses this key when inserting the campaign into the database.

Why React, and why not React with a meta-framework?

I chose React + Vite from the start because it is an established and well-maintained stack, which is always good when the team needs to grow. I am not keen on picking a meta-framework here: this project has no read-heavy pages of the kind SSR optimizes, and an Admin UI dashboard has no SEO needs.

Campaign API

For the backend services, I chose NestJS and PostgreSQL, with TypeORM used as the ORM layer.

The Campaign API is responsible for serving endpoints for the Admin UI and reflects the same tasks: campaign creation, campaign cancellation, and displaying the list of campaigns.

Let's zoom straight into its service:

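A minimal sketch of the create path, assuming illustrative Campaign and OutboxEvent entities:

```typescript
// Sketch of the create path; entity, DTO, and table names are illustrative.
import { Injectable } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { DataSource, Repository } from 'typeorm';
import { Campaign } from './campaign.entity';
import { OutboxEvent } from './outbox-event.entity';
import { CreateCampaignDto } from './dto/create-campaign.dto';

@Injectable()
export class CampaignService {
  constructor(
    @InjectRepository(Campaign) private campaigns: Repository<Campaign>,
    private dataSource: DataSource,
  ) {}

  async create(dto: CreateCampaignDto, idempotencyKey: string) {
    // A campaign with this key already exists: return it instead of
    // inserting a duplicate. A unique constraint on idempotency_key
    // backstops the rare case of two identical concurrent requests.
    const existing = await this.campaigns.findOneBy({ idempotencyKey });
    if (existing) return existing;

    // The campaign row and its outbox event are written in one transaction,
    // so downstream workers never see a campaign without its event.
    return this.dataSource.transaction(async (manager) => {
      const campaign = await manager.save(Campaign, {
        ...dto,
        idempotencyKey,
        status: 'scheduled',
      });
      await manager.insert(OutboxEvent, {
        eventType: 'campaign.created',
        payload: { campaignId: campaign.id },
      });
      return campaign;
    });
  }
}
```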

In the create method, we can see the previously mentioned idempotency key in action: it is used to check whether the campaign already exists. If a campaign with the same idempotency key exists, no new campaign is inserted into the database.

Why NestJS for the backend?

It is opinionated, has a wide array of packages, is well maintained, encourages modular architecture, comes with established patterns, and offers great integration with queues and brokers, which helps keep things well organized.

Why a relational database with TypeORM?

TypeORM has out-of-the-box support for primary-replica replication and excellent integration with NestJS. I chose PostgreSQL because of its convenient jsonb support, which is useful for storing metadata on important entities such as campaigns and devices, and the overall setup integrates well with the rest of the system. Using SQL with strong ACID guarantees is important for consistency and works well with patterns like the outbox, which addresses the dual-write problem in workers.

Outbox Poller

To ensure atomicity and consistency, and to address the dual-write problem, I decided to implement the outbox pattern for campaign creation, template processing, and campaign cancellation. Internal event distribution is handled by BullMQ. For instance, the Outbox Poller wraps publishing to the queue, ensuring that no inconsistent jobs are published there:

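A sketch of one polling tick, using the table names from the schema sketch above:

```typescript
// Sketch of the poller; queue and table names are illustrative.
import { Queue } from 'bullmq';
import { DataSource } from 'typeorm';

export class OutboxPoller {
  constructor(
    private dataSource: DataSource,
    private queue: Queue,
    private workerId: string,
  ) {}

  async tick(batchSize = 100): Promise<void> {
    // Claim a batch atomically; SKIP LOCKED keeps pollers from fighting
    // over the same rows, and claimed_by pins each event to one worker.
    const claimed: Array<{ id: string; event_type: string; payload: unknown }> =
      await this.dataSource.query(
        `UPDATE outbox SET claimed_by = $1, claimed_at = now()
         WHERE id IN (
           SELECT id FROM outbox
           WHERE published_at IS NULL AND claimed_by IS NULL
           ORDER BY id
           LIMIT $2
           FOR UPDATE SKIP LOCKED
         )
         RETURNING id, event_type, payload`,
        [this.workerId, batchSize],
      );

    for (const event of claimed) {
      // jobId makes the enqueue idempotent: re-adding a job the queue
      // already knows about is a no-op.
      await this.queue.add(event.event_type, event.payload, {
        jobId: `outbox-${event.id}`,
      });
      await this.dataSource.query(
        `UPDATE outbox SET published_at = now() WHERE id = $1`,
        [event.id],
      );
    }
  }

  // Release claims that were taken but never published (e.g. by a crashed
  // worker) so another poller can retry them.
  async cleanupStaleClaims(olderThanMs = 60_000): Promise<void> {
    await this.dataSource.query(
      `UPDATE outbox SET claimed_by = NULL, claimed_at = NULL
       WHERE published_at IS NULL
         AND claimed_at < now() - ($1 || ' milliseconds')::interval`,
      [olderThanMs],
    );
  }
}
```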

As you can see, I aimed for strong reliability here by batching jobs and using locks. The flow follows a claim/lock pattern: we first claim events from the outbox and assign them to a specific worker, ensuring that only that worker handles those jobs, then publish to the queue and mark the outbox event as published. When a claim becomes stale, cleanupStaleClaims releases the event so another worker can pick it up. This reduces the risk that an event is lost when an error occurs mid-publish.

Why BullMQ for service communication?

In this case, I chose BullMQ because I do not need complex routing or many different message types. The non-functional requirements also point to moderate traffic for campaign creation (500 concurrent requests), a scale this solution should handle well.

Why polling vs. CDC

I chose polling rather than CDC because reaction speed is not critical here. Many campaigns are scheduled well ahead of their release time, so if polling every 5 seconds turns out to be too frequent in terms of database load or cost, we have room to reduce the frequency. For the required throughput, polling is the better fit: it works well without the extra infrastructure overhead and complexity that CDC brings.

Template Worker

The template worker has one task to do. It gathers the assets and metadata of the campaign and assembles a template that will later be fetched by the screen device for display:

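A sketch of the worker, with a trivial stand-in for the real generateTemplate:

```typescript
// Sketch of the template worker; the real generateTemplate lives in the repo.
import { Worker, Job } from 'bullmq';
import { promises as fs } from 'node:fs';
import * as path from 'node:path';

interface TemplateJob {
  campaignId: string;
  metadata: Record<string, unknown>;
  assetPaths: string[];
}

// Illustrative stand-in: renders the campaign into a static HTML document.
function generateTemplate(job: TemplateJob): string {
  const images = job.assetPaths
    .map((p) => `<img src="/assets/${path.basename(p)}" />`)
    .join('\n');
  return `<!doctype html><html><body>${images}</body></html>`;
}

export const templateWorker = new Worker<TemplateJob>(
  'template',
  async (job: Job<TemplateJob>) => {
    const html = generateTemplate(job.data);
    // File system for the POC; swap for object storage (e.g. S3) in production.
    const target = path.join('storage', 'templates', `${job.data.campaignId}.html`);
    await fs.mkdir(path.dirname(target), { recursive: true });
    await fs.writeFile(target, html, 'utf8');
    return { templatePath: target };
  },
  { connection: { host: 'localhost', port: 6379 } },
);
```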

This class is simply a BullMQ worker. I decided to use the file system as storage because this is only a POC, but in production you would want to use object storage such as S3.

Area for optimization

The generateTemplate function is a good candidate to keep an eye on here. In load tests, I did not notice any performance issues, but if the code inside is extended with more computations, it could block the event loop. If that happens, this function should be offloaded to worker_threads.
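
If it comes to that, the offload could look roughly like this; template.worker.js is a hypothetical file that calls generateTemplate(workerData) and posts the result back:

```typescript
// Hypothetical offload: run generateTemplate inside a worker thread so the
// main event loop stays free while the template is computed.
import { Worker } from 'node:worker_threads';

export function generateTemplateInThread(jobData: object): Promise<string> {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./template.worker.js', { workerData: jobData });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```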

Publisher Worker

Next is a worker that is responsible for pushing the campaign manifest to the screen clients or publishing revoke messages to the devices. At a high level, the behavior is simply pub-sub. The publisher uses the MQTT protocol to communicate with the devices. MQTT (Message Queuing Telemetry Transport) implements the pub-sub pattern by design. Its main component is the broker, which acts as a central server that receives messages and distributes them to clients by organizing data into topics.

Ok, let's see what we have in the processor code:

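A sketch of the processor; the tracker and the device streamer are sketched right below, and method, topic, and type names are illustrative:

```typescript
// Sketch of the publisher's two paths; collaborators are injected.
import { MqttClient } from 'mqtt';

interface PublishTracker {
  tryStart(campaignId: string, op: 'publish' | 'cancel'): Promise<boolean>;
  markDone(campaignId: string, op: 'publish' | 'cancel'): Promise<void>;
}

interface DeviceStreamer {
  stream(): AsyncGenerator<Array<{ id: string }>>;
}

export class PublisherProcessor {
  constructor(
    private mqtt: MqttClient,
    private tracker: PublishTracker,
    private devices: DeviceStreamer,
  ) {}

  async handlePublish(campaignId: string, manifest: object): Promise<void> {
    // Claim the operation first: a duplicate job sees the claim and exits,
    // which is what prevents double publishing under races.
    if (!(await this.tracker.tryStart(campaignId, 'publish'))) return;

    for await (const batch of this.devices.stream()) {
      for (const device of batch) {
        this.mqtt.publish(
          `devices/${device.id}/install`,
          JSON.stringify(manifest),
          { qos: 1 },
        );
      }
    }
    await this.tracker.markDone(campaignId, 'publish');
  }

  async handleCancel(campaignId: string): Promise<void> {
    if (!(await this.tracker.tryStart(campaignId, 'cancel'))) return;

    for await (const batch of this.devices.stream()) {
      for (const device of batch) {
        this.mqtt.publish(
          `devices/${device.id}/revoke`,
          JSON.stringify({ campaignId }),
          { qos: 1 },
        );
      }
    }
    await this.tracker.markDone(campaignId, 'cancel');
  }
}
```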

The methods for checking and setting data in the tracker are called inside the handlePublish and handleCancel methods of the publisher processor service.

The service handles two publishing paths: publishing the install manifest (handlePublish) and cancelling the campaign (handleCancel). The startResult and finalResult checks help prevent race conditions.

To help guarantee idempotency in publishing, we have a small service that leverages Redis:

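A sketch of that guard, assuming ioredis; key names and the TTL are illustrative:

```typescript
// Redis-backed idempotency guard for publish/cancel operations.
import Redis from 'ioredis';

export class PublishTrackerService {
  constructor(private redis: Redis) {}

  // SET ... NX succeeds only when the key does not exist yet, so only the
  // first worker to claim a (campaign, operation) pair proceeds.
  async tryStart(campaignId: string, op: string): Promise<boolean> {
    const result = await this.redis.set(
      `publish:${op}:${campaignId}`, 'started', 'EX', 3600, 'NX',
    );
    return result === 'OK';
  }

  async markDone(campaignId: string, op: string): Promise<void> {
    await this.redis.set(`publish:${op}:${campaignId}`, 'done', 'EX', 3600);
  }
}
```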

The requirements tell us that there will be quite a large number of devices to handle, so we need a small service for streaming devices from the database and publishing to each streamed batch:

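A sketch of the streaming service, assuming a Device entity with a sortable id:

```typescript
// Cursor-paginated device streaming; entity fields are illustrative.
import { MoreThan, Repository } from 'typeorm';
import { Device } from './device.entity';

export class DeviceStreamService {
  constructor(private devices: Repository<Device>) {}

  // Yields devices in id-ordered pages, so each batch is a bounded,
  // repeatable unit of work regardless of the total device count.
  async *stream(batchSize = 500): AsyncGenerator<Device[]> {
    let cursor: string | undefined;
    while (true) {
      const batch = await this.devices.find({
        where: cursor ? { id: MoreThan(cursor) } : {},
        order: { id: 'ASC' },
        take: batchSize,
      });
      if (batch.length === 0) return;
      yield batch;
      cursor = batch[batch.length - 1].id;
    }
  }
}
```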

This class simply exposes a generator method. The generator yields devices in configurable cursor-paginated batches, which offers robust control over data processing and can be adjusted as needed.

I also had to cover the case where devices do not receive the manifest or cancel message because they are offline. I had at least two options to choose from:

  1. Force the MQTT broker to handle retries by setting up broker-side queuing.
  2. Create a separate worker service for retrying manifest delivery for those offline devices.

I chose the second option as the safer one. This avoids using queues inside the MQTT broker and gives full control over redelivery, including how undelivered jobs are paginated. It also reduces the risk of resource exhaustion on the broker side, because storing undelivered messages could become a bottleneck, especially if we assume a scale of around 20k devices.

Here is the code:

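What follows is a sketch rather than the repo's exact worker; for illustration it re-publishes a compact pointer that the device resolves back to the full manifest:

```typescript
// Sketch of the redelivery pass; table and topic names are illustrative.
import { MqttClient } from 'mqtt';
import { DataSource } from 'typeorm';

export class RedeliveryWorker {
  constructor(private dataSource: DataSource, private mqtt: MqttClient) {}

  // Periodically re-publish campaigns whose delivery was never acked,
  // paginating so a large backlog cannot exhaust memory.
  async retryUndelivered(batchSize = 500): Promise<void> {
    let cursor: string | null = null;
    while (true) {
      const pending: Array<{ id: string; device_id: string; campaign_id: string }> =
        await this.dataSource.query(
          `SELECT id, device_id, campaign_id FROM delivery_events
           WHERE status = 'pending' AND ($1::uuid IS NULL OR id > $1::uuid)
           ORDER BY id
           LIMIT $2`,
          [cursor, batchSize],
        );
      if (pending.length === 0) return;

      for (const event of pending) {
        // A compact pointer is enough here: the device fetches the manifest
        // and template itself once it comes back online.
        this.mqtt.publish(
          `devices/${event.device_id}/install`,
          JSON.stringify({ campaignId: event.campaign_id }),
          { qos: 1 },
        );
      }
      cursor = pending[pending.length - 1].id;
    }
  }
}
```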

One more thing is the MQTT broker configuration, where the mqtt.connect() call uses tuned options:

  • keepalive: 30 — sends PINGREQ every 30s to detect dead connections
  • reconnectPeriod: 1000 — reconnects after 1s on disconnect
  • connectTimeout: 10_000 — 10s timeout for initial connection
  • clean: true — starts with a clean session (no stale state)
  • reschedulePings: true — resets the keepalive timer on activity (avoids unnecessary pings during high throughput)
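
Put together, the call looks like this (the broker URL is an illustrative assumption):

```typescript
// The options from the list above, passed to mqtt.connect().
import { connect } from 'mqtt';

const client = connect('ws://broker.local:8083/mqtt', {
  keepalive: 30,
  reconnectPeriod: 1000,
  connectTimeout: 10_000,
  clean: true,
  reschedulePings: true,
});
```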

In the 20k-device scenario, we only needed to tune the configuration a little to handle the higher throughput. If that limit rises, we can introduce a small pool of connections and distribute message traffic round-robin across them, though it is worth remembering that this costs more resources, since more WebSocket connections are needed.

Why MQTT instead of just WebSockets

For this part of the code, I decided to rely on MQTT because it solves many problems out of the box, such as topic-based message distribution. It already behaves like a real pub-sub system. If I had decided to use only WebSockets, I would have ended up implementing an abstraction on top of them. The infrastructure cost is also lower compared to building and maintaining my own pub-sub service from scratch.

Screen Device

The screen app is a small frontend application written without any frontend framework. I decided to do it that way because the template is assembled on the server, stored, and then fetched by the screen, so there is no need to build it every time at runtime on the device. This keeps the frontend extremely lightweight, which is important for these devices.

The main module of the screen looks like this:

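A sketch of that orchestration; the imported storage and scheduler modules mirror the bullets below, and their names are illustrative:

```typescript
// Sketch of the screen bootstrap.
import mqtt from 'mqtt';
import { initStorage, saveManifest, deleteManifest } from './storage'; // IndexedDB wrapper
import { initScheduler } from './scheduler';

const DEVICE_ID = new URLSearchParams(location.search).get('deviceId') ?? 'dev-unknown';

async function main(): Promise<void> {
  const storage = await initStorage();

  // The scheduler replays whatever is already cached, so the screen keeps
  // starting campaigns on time even when the device is offline.
  const scheduler = initScheduler(storage);

  const client = mqtt.connect('ws://broker.local:8083/mqtt');
  client.subscribe(`devices/${DEVICE_ID}/install`);
  client.subscribe(`devices/${DEVICE_ID}/revoke`);

  client.on('message', async (topic, payload) => {
    if (topic.endsWith('/install')) {
      const manifest = JSON.parse(payload.toString());
      await saveManifest(storage, manifest); // cache for offline starts
      scheduler.schedule(manifest);          // prefetches the template ahead of start
      client.publish(
        `acks/${DEVICE_ID}`,
        JSON.stringify({ op: 'install', campaignId: manifest.campaignId }),
      );
    } else if (topic.endsWith('/revoke')) {
      const { campaignId } = JSON.parse(payload.toString());
      await deleteManifest(storage, campaignId);
      scheduler.cancel(campaignId);
      client.publish(
        `acks/${DEVICE_ID}`,
        JSON.stringify({ op: 'cancel', campaignId }),
      );
    }
  });
}

void main();
```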

This code orchestrates a few things that are crucial for the screen to install, schedule, and cancel the campaign:

  • storage initializes, saves, gets, and deletes the manifest. It leverages IndexedDB, which is sufficient as storage in this case.
  • MQTT client connection allows the device to subscribe to the server-side pub-sub and communicate with the server by sending ACKs for publish/cancel operations.
  • scheduler initializes the scheduler, which handles display timing on the device and fulfills one of the functional requirements.
  • fetch template fetches the template built on the server and stored in the asset storage. Note that the scheduler prefetches the template ahead of time (5 minutes before the scheduled start) to make sure it is available when the moment comes; the non-functional requirements allow a startup delay of at most 1 second.

Why cache the manifest and schedule campaigns locally instead of pushing them on schedule via MQTT

When we have ~20k devices subscribed to the pub-sub, relying on the server for live publishing at start time becomes risky. Many things can go wrong during publishing, such as network delays, an overwhelmed broker, or devices being offline at the scheduled moment. Storing the manifest client-side on the device costs almost nothing. There is also room to improve reliability further, for example by requiring campaigns to be created a minimum lead time before their scheduled start.

ACK Consumer

Now it is time for the last service in our design, which closes the publish -> ACK loop. This consumer is responsible for receiving and processing ACKs. The screen devices publish messages to it depending on which operation should be confirmed.

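A sketch of the consumer's buffering side; topic and queue names are illustrative:

```typescript
// Sketch of the ACK consumer: buffer incoming ACKs, flush them to BullMQ
// in batches instead of writing each one to the database individually.
import { connect } from 'mqtt';
import { Queue } from 'bullmq';

const ackQueue = new Queue('acks', { connection: { host: 'localhost', port: 6379 } });
const client = connect('ws://broker.local:8083/mqtt');

const buffer: object[] = [];
const FLUSH_SIZE = 500;
const FLUSH_INTERVAL_MS = 1000;

client.subscribe('acks/+'); // one topic per device

client.on('message', (_topic, payload) => {
  buffer.push(JSON.parse(payload.toString()));
  if (buffer.length >= FLUSH_SIZE) void flush();
});

// Flush on a timer too, so a quiet period still drains the buffer.
setInterval(() => void flush(), FLUSH_INTERVAL_MS);

async function flush(): Promise<void> {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length);
  // One job per batch keeps the database write rate bounded during
  // ACK bursts from ~20k devices.
  await ackQueue.add('ack-batch', { events: batch });
}
```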

This service has to be prepared to handle a significant burst of ACKs, as the requirements mention ~20k devices. Choosing the naive approach of inserting every ACK event straight into the database would not scale well. In this case, I decided to combine buffering with batching in the queue. For the queue, I decided to use the existing infrastructure (BullMQ). I considered Kafka, but in this case BullMQ is enough and gives us everything we need, including data buffering, error handling, and configurable concurrency.

The batch processor logic looks like this:

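A sketch of that processor; the pool sizing and chunking details are illustrative:

```typescript
// Sketch of the batch processor for ACK jobs.
import { Job, Worker } from 'bullmq';
import { DataSource } from 'typeorm';

interface AckEvent {
  campaignId: string;
  deviceId: string;
  op: 'install' | 'cancel';
}

export function createAckBatchWorker(dataSource: DataSource): Worker {
  return new Worker(
    'acks',
    async (job: Job<{ events: AckEvent[] }>) => {
      const events = job.data.events;
      if (events.length === 0) return;

      // Derive parallelism from the connection pool so concurrent chunks
      // can never starve the rest of the app of database connections.
      const poolSize = (dataSource.options as { poolSize?: number }).poolSize ?? 10;
      const parallelism = Math.max(1, Math.floor(poolSize / 2));
      const chunkSize = Math.ceil(events.length / parallelism);

      const chunks: AckEvent[][] = [];
      for (let i = 0; i < events.length; i += chunkSize) {
        chunks.push(events.slice(i, i + chunkSize));
      }

      // One multi-row insert per chunk instead of one insert per ACK.
      await Promise.all(
        chunks.map((chunk) =>
          dataSource
            .createQueryBuilder()
            .insert()
            .into('delivery_events')
            .values(
              chunk.map((e) => ({
                campaign_id: e.campaignId,
                device_id: e.deviceId,
                status: e.op === 'install' ? 'acked' : 'revoked',
              })),
            )
            .execute(),
        ),
      );

      // Finally, mark the affected devices as recently seen in one statement.
      const deviceIds = [...new Set(events.map((e) => e.deviceId))];
      await dataSource.query(
        `UPDATE devices SET last_seen = now() WHERE id = ANY($1)`,
        [deviceIds],
      );
    },
    { connection: { host: 'localhost', port: 6379 } },
  );
}
```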

The events are first mapped, and then the maximum parallelism is computed dynamically based on the maximum number of available pool connections in order to strike a balance between speed and reliability. After that, the inserts are executed, and finally the affected devices are marked as seen.

Summary

Designing systems, breaking down each component, comparing trade-offs, and making decisions is deeply satisfying. I know this code is not production-ready yet (in fact, it is still far from that state), but even so, I had a lot of fun building it, and I still get a dopamine hit when I see load tests passing. This POC is a good starting point for creating a real system. Since AI became smarter, I've heard people complain about losing the satisfaction of coding. For me, the satisfaction is the same as it was before AI. The most satisfying part is still building complex systems, assembling patterns, and exploring different options and solutions. In code, this often happens at the micro scale, but when we focus on the bigger picture, the satisfaction comes from the macro scale.

Check out my repo with this project on my GitHub.

PS: I also mentioned the outbox pattern in this article. If you want to learn more about the pattern itself, it is worth a dedicated read!

greg@aboutjs.dev

©Grzegorz Dubiel | 2026