
Grzegorz Dubiel

27-08-2025

Turning Entire Blogs into Short Summaries: Map-Reduce for LLMs

Our need for scale is constantly growing: humanity processes enormous amounts of data, and engineers have to cope with hard limits every day. This reality often forces us to create workarounds to make big things happen. One of the best examples is using LLMs to process large documents. On one side we have still-limited technology, such as LLMs with restricted context windows; on the other, huge knowledge bases represented by charts, documents, audio, video, and codebases. In a short time, many useful applications have been built on top of this technology, such as NotebookLM for research work, or Claude Code and Cursor for agentic coding. Each of these tools needs to process information that exceeds an LLM's context window, which is only possible thanks to a handful of patterns. Let's take a close look at one of them, called Map-Reduce, and to make it easier to understand, build a small app for summarizing blogs.

Map-Reduce Pattern

This method was proposed in 2004 by two Google engineers, Jeffrey Dean and Sanjay Ghemawat, as a way to process large amounts of distributed data. In simple terms, it lets a program efficiently process a massive dataset in chunks and then reduce it to a final outcome, keeping the whole process fast and resource-efficient.

As the name suggests, the pattern is a combination of two functions:

  • Map, which concurrently calls a non-void function on each chunk of data (for example, counting expected keywords in the chunk and returning an intermediate key-value pair representing each keyword and its number of occurrences).

  • Reduce, which takes the output of the map functions and aggregates it into the final result (for example, returning a table with the total number of occurrences for each keyword in the dataset).

Because each map task can run independently on different workers, this process scales easily, and the degree of parallelism can be controlled to balance speed and resource usage.
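The keyword-counting example can be sketched in a few lines of TypeScript. This is a single-process illustration, with `Promise.all` standing in for distributed workers; all names are ours, not part of any framework:

```typescript
type Counts = Record<string, number>;

// Map: emit intermediate keyword counts for one chunk of text.
async function mapChunk(chunk: string, keywords: string[]): Promise<Counts> {
  const counts: Counts = {};
  const haystack = chunk.toLowerCase();
  for (const keyword of keywords) {
    const occurrences = haystack.split(keyword.toLowerCase()).length - 1;
    if (occurrences > 0) counts[keyword] = occurrences;
  }
  return counts;
}

// Reduce: aggregate the intermediate results into a final table.
function reduceCounts(partials: Counts[]): Counts {
  const totals: Counts = {};
  for (const partial of partials) {
    for (const [keyword, count] of Object.entries(partial)) {
      totals[keyword] = (totals[keyword] ?? 0) + count;
    }
  }
  return totals;
}

// Map-Reduce: run all map tasks in parallel, then reduce their output.
async function mapReduce(chunks: string[], keywords: string[]): Promise<Counts> {
  const partials = await Promise.all(
    chunks.map((chunk) => mapChunk(chunk, keywords)),
  );
  return reduceCounts(partials);
}
```

Calling `mapReduce(chunks, keywords)` returns the total number of occurrences of each keyword across all chunks, regardless of how the text was chunked.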

In the context of using LLMs, this pattern is particularly useful because we always deal with a limited context window. In real-world scenarios, the content or data we want to process often exceeds this window. We cannot process an entire blog with just one or two calls to the LLM. Even if we could, the task would be extremely slow and potentially inaccurate, as LLMs generally reason better on smaller chunks of data.

Project Overview

The first thing I’d like to mention is that in this article we will implement this pattern with minimal help from AI frameworks such as LangGraph. The core logic will be implemented by ourselves; we will only use utility functions from LangChain to split text and format data.

Our logic will be divided into the following steps:

  1. Pre-processing phase: collecting selected blog articles using a vibe-coded web scraper.

  2. Mapping phase: looping over each article, splitting it, and summarizing each chunk with an LLM in parallel using the map prompt.

  3. Reducing phase: if the summaries produced by the map phase exceed the token limit, they are collapsed: the list is split into sub-lists, and each sub-list is summarized again with the reduce prompt into a consolidated summary. This collapsing runs recursively until the summaries fit within the limit or the recursion limit is reached. Once the summaries fit, they are reduced with the reduce prompt into the final summary.
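Before building the real thing, the three phases can be condensed into a model-free sketch. The `llmSummarize` stub stands in for a real model call (here it just truncates), and the names and limits are arbitrary illustrative values, not the code we will write below:

```typescript
// Stub standing in for an LLM call; a real one would return a summary.
async function llmSummarize(text: string): Promise<string> {
  return text.slice(0, 40);
}

const LIMIT = 120; // stand-in for a context-window budget
const CHUNK = 80; // stand-in for the splitter's chunk size

function splitText(text: string): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += CHUNK) {
    chunks.push(text.slice(i, i + CHUNK));
  }
  return chunks;
}

// Map phase: summarize every chunk of every article in parallel.
async function mapPhase(articles: string[]): Promise<string[]> {
  const chunks = articles.flatMap((article) => splitText(article));
  return Promise.all(chunks.map(llmSummarize));
}

// Reduce phase: collapse pairwise and recurse until the summaries fit,
// then reduce them into the final summary.
async function reducePhase(
  summaries: string[],
  maxIterations = 5,
  iteration = 0,
): Promise<string> {
  if (summaries.join("\n").length > LIMIT && iteration < maxIterations) {
    const collapsed: string[] = [];
    for (let i = 0; i < summaries.length; i += 2) {
      collapsed.push(await llmSummarize(summaries.slice(i, i + 2).join("\n")));
    }
    return reducePhase(collapsed, maxIterations, iteration + 1);
  }
  return llmSummarize(summaries.join("\n"));
}

async function summarize(articles: string[]): Promise<string> {
  return reducePhase(await mapPhase(articles));
}
```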

Pre-Processing Phase

As I mentioned, to get articles for summarization we will use a scraper. We won’t dive deep into the scraper’s code. We will call the scraper in the main function to fetch all the posts from my blog’s website:

typescript

import { summarizeDocuments } from "./summarizer/main";
import { runDirectScraper, runSitemapBasedScraper } from "./scraper/main";

async function main() {
  const scrappingResults = await runSitemapBasedScraper([
    "https://www.aboutjs.dev",
  ]);

  const filteredScrappedResults = scrappingResults.filter((result) => {
    if (result.error) {
      console.error(`${result.url}: ${result.error}`);
    }
    return result.success;
  });
}

void main();

Next, we need to declare a main function for the summarizer — the place where the core logic will run:

typescript

// summarizer/types.ts

export type Document = {
  title: string;
  content: string;
  link: string;
  date: string;
  source: string;
  selector: string;
  index: number;
};

// summarizer/main.ts
import type { Document as LocalDocument } from "./types";

export async function summarizeDocuments(
  documents: LocalDocument[],
  maxIterations = 5,
) {}

The function has two parameters: documents, which serve as the input data (articles in our case), and an optional maxIterations parameter that defines how many recursive operations can be performed when the content does not fit into the limit.

Map Phase

The first thing we do is map over all blog posts and prepare them for the text splitter by formatting them into the Document type:

typescript

import type { Document as LocalDocument } from "./types";
import { Document } from "@langchain/core/documents";

export async function summarizeDocuments(
  documents: LocalDocument[],
  maxIterations = 5,
) {
  const formattedDocs = documents.map(
    (doc) =>
      new Document({
        pageContent: doc.content,
        metadata: {
          title: doc.title,
          link: doc.link,
          date: doc.date,
          source: doc.source,
          selector: doc.selector,
          index: doc.index,
        },
      }),
  );
}

Now, we prepare a function that will split our documents into chunks and summarize each chunk with the LLM.

To craft it, we first need to prepare a few things:

  1. LLM model runnable

    We will use gpt-5-mini. For our summarizer, this model is lightweight and fast, which is more than good enough for the task. To handle the model instance, we will use the OpenAI integration from LangChain.

    typescript

    /* REST OF THE CODE */
    
    import { ChatOpenAI } from "@langchain/openai";
    
    const model = new ChatOpenAI({
      model: "gpt-5-mini",
      apiKey: process.env.OPENAI_API_KEY,
    });
    
    /* REST OF THE CODE */
  2. prompt

    In our case the crucial task of the map phase is to summarize each blog post. We need a prompt that will be sent along with the blog post text, or its chunk, to the LLM for summarization:

    typescript

    export const mapTemplate = (content: string) => `
    You are an expert content analyzer. Your task is to extract and summarize the key information from the following document.
    
    Please analyze the content and provide:
    1. Main topics and themes
    2. Key insights and takeaways
    3. Important facts, statistics, or examples
    4. Core concepts or ideas presented
    
    Format your summary as bullet points.
    
    The summary should be brief and to the point.
    
    Document Content: ${content}
    
    Provide a concise but comprehensive summary that captures the essential information from this document. Focus on the most valuable and actionable content.
    `;

    The prompt is returned by a template function that takes the blog post content, or its chunk, as an argument. This allows us to dynamically create a prompt with the embedded content for summarization.

  3. text splitter

    We need to prepare a tokenTextSplitter, which is capable of splitting a document into smaller sub-documents:

    typescript

    // summarizer/const.ts
    export const CHUNK_SIZE = 1000;
    
    // summarizer/main.ts
    import { TokenTextSplitter } from "@langchain/textsplitters";
    import { CHUNK_SIZE } from "./const";
    
    const textSplitter = new TokenTextSplitter({
      chunkSize: CHUNK_SIZE,
      chunkOverlap: 0,
    });
    
    /* REST OF THE CODE */

    We use an extremely small chunk size to illustrate the recursive process of analyzing texts.

Now we are able to put together the function for running the mappers:

typescript

import type { Document as LocalDocument } from "./types";
import { TokenTextSplitter } from "@langchain/textsplitters";
import { ChatOpenAI } from "@langchain/openai";
import { mapTemplate } from "./prompts";
import { CHUNK_SIZE } from "./const";
import { Document } from "@langchain/core/documents";

const model = new ChatOpenAI({
  model: "gpt-5-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

const textSplitter = new TokenTextSplitter({
  chunkSize: CHUNK_SIZE,
  chunkOverlap: 0,
});

async function runMappers(formattedDocs: Document[]): Promise<string[]> {
  console.log("Summarization started...");
  const splitDocs = await textSplitter.splitDocuments(formattedDocs);

  const results = await model.batch(
    splitDocs.map((doc) => [
      {
        role: "user",
        content: mapTemplate(doc.pageContent),
      },
    ]),
  );

  return results.map((result) => result.content as string);
}

export async function summarizeDocuments(
  documents: LocalDocument[],
  maxIterations = 5,
) {
  const formattedDocs = documents.map(
    (doc) =>
      new Document({
        pageContent: doc.content,
        metadata: {
          title: doc.title,
          link: doc.link,
          date: doc.date,
          source: doc.source,
          selector: doc.selector,
          index: doc.index,
        },
      }),
  );

  let summaries = await runMappers(formattedDocs);
}

Articles will be split into smaller chunks (sub-documents), and each chunk will then be sent along with the map prompt to the LLM for summarization. The batch method allows us to efficiently send multiple tasks to the LLM in parallel. We can control the maximum number of concurrent requests sent per batch in the LLM model runnable configuration. The results from all the mappers will be returned for processing in the next phase.
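As a rough mental model of what such a concurrency cap does, here is a stand-alone worker-pool helper that runs at most `limit` tasks at once. It is an illustration of the idea, not LangChain's actual implementation:

```typescript
// Run `worker` over `items`, keeping at most `limit` calls in flight.
async function batchWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each lane repeatedly pulls the next unprocessed item until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  const laneCount = Math.min(Math.max(limit, 1), items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

LangChain runnables accept a `maxConcurrency` setting in their call config, which serves the same purpose for the `batch` call above.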

Reducing Phase

In the next step, we will take all summaries returned from the map phase and consolidate them into a single final summary. In this step, we also need to handle cases where the summaries from the mappers exceed the LLM’s context window or any other limits. To do this, we will collapse the list of summaries by splitting it into sublists and then summarizing each one. As a result, we will get one summary per sublist.

The first function we create for the reduce phase will determine whether we even need to perform collapsing:

typescript

/* REST OF CODE */

async function lengthFunction(summaries: string[]) {
  const tokenCounts = await Promise.all(
    summaries.map(async (summary) => {
      return model.getNumTokens(summary);
    }),
  );
  return tokenCounts.reduce((sum, count) => sum + count, 0);
}

async function checkShouldCollapse(summaries: string[]) {
  const tokenCount = await lengthFunction(summaries);
  // 2000 tokens is a hardcoded limit for the combined summaries
  return tokenCount > 2000;
}

/* REST OF CODE */

The lengthFunction takes all the summaries and sums up their token count. The checking function then compares this total number of tokens with a hardcoded limit, ensuring that the summaries do not exceed that limit.

Collapsing Summaries

Next, we will build a function for recursively collapsing the summaries. Collapsing is done by calling the LLM and requesting a consolidated summary of a list of summaries.

Let’s create the reduce prompt for this task:

typescript

/* REST OF THE CODE */
export const reduceTemplate = (summaries: string) => `

The following is a set of summaries:
${summaries}
Take these and create a single, consolidated summary of the whole context gathered from them.

Keep it concise and focused on the main points, avoiding unnecessary details. The goal is to distill the essence of the summaries into a single, coherent summary.
`;

Next, we need to prepare two additional functions for splitting the summary list into sublists.

The first will be another token splitter, responsible for handling the edge case where a single summary from the list exceeds the maximum sublist size.

typescript

/* REST OF THE CODE */

import {
  TokenTextSplitter,
  RecursiveCharacterTextSplitter,
} from "@langchain/textsplitters";

/* REST OF THE CODE */
const recursiveTextSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: CHUNK_SIZE,
  lengthFunction: (text) => {
    return model.getNumTokens(text);
  },
  chunkOverlap: 0,
});

/* REST OF THE CODE */

The RecursiveCharacterTextSplitter is sufficient in this case, since we are dealing with a single oversized summary. This splitter recursively breaks text down, from paragraphs to sentences to words, trying to keep each fragment as long as possible.

Now we can define a function for splitting the list:

typescript

/* REST OF THE CODE */

export async function splitSummariesByTokenLimit(
  summaries: string[],
  tokenLimit: number,
): Promise<string[][]> {
  const listOfSummariesSublists: string[][] = [];
  let sublist: string[] = [];
  for (const summary of summaries) {
    const chunks = await recursiveTextSplitter.splitText(summary);

    for (const chunk of chunks) {
      const candidateList = [...sublist, chunk];
      const candidateTokens = await lengthFunction(candidateList);
      if (candidateTokens > tokenLimit) {
        if (sublist.length > 0) {
          listOfSummariesSublists.push(sublist);
          sublist = [];
        }
      }
      sublist.push(chunk);
    }
  }
  if (sublist.length > 0) {
    listOfSummariesSublists.push(sublist);
  }
  return listOfSummariesSublists;
}

/* REST OF THE CODE */

The function looks quite complicated because the task is not trivial. It iterates over each summary, automatically splitting a summary into smaller chunks if it exceeds the sublist limit. Each potential chunk is then added to a candidate list. The candidate list’s token count is calculated using the lengthFunction defined earlier. If the length exceeds the limit, the current sublist is pushed to the final results array, and a new sublist is started. Otherwise, the chunk is simply added to the current sublist.
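To see the greedy packing in isolation, here is the same algorithm with pre-split chunks and a simple character count standing in for the token counter (an illustrative reduction, not the production function):

```typescript
// Pack chunks into sublists whose combined length stays under `limit`.
function packByLimit(chunks: string[], limit: number): string[][] {
  const sublists: string[][] = [];
  let current: string[] = [];
  for (const chunk of chunks) {
    const candidateLength = [...current, chunk].join("").length;
    if (candidateLength > limit && current.length > 0) {
      // The candidate would overflow: close the current sublist.
      sublists.push(current);
      current = [];
    }
    current.push(chunk);
  }
  if (current.length > 0) sublists.push(current);
  return sublists;
}
```

For example, `packByLimit(["aaaa", "bbbb", "cc"], 8)` yields `[["aaaa", "bbbb"], ["cc"]]`: the third chunk would push the first sublist over the limit, so it starts a new one.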

We need one more function to collapse the summaries:

typescript

/* REST OF THE CODE */
import { mapTemplate, reduceTemplate } from "./prompts";

/* REST OF THE CODE */

async function reduceSummariesBatch(listOfSummaries: string[][]) {
  const result = await model.batch(
    listOfSummaries.map((summaries) => [
      {
        role: "user",
        content: reduceTemplate(summaries.join("\n\n")),
      },
    ]),
  );
  return result.map((res) => res.content as string);
}

/* REST OF THE CODE */

The function runs parallel calls to the LLM to summarize each sublist of summaries. As a result, the list of summaries becomes shorter.

Alright, we can now create the main function for recursively collapsing the summaries:

typescript

/* REST OF THE CODE */

async function collapseSummaries(
  summaries: string[],
  recursionLimit = 5,
  iteration = 0,
) {
  console.log("Collapsing summaries...");
  if (summaries.length === 0) {
    return [];
  }
  const splitDocLists = await splitSummariesByTokenLimit(summaries, CHUNK_SIZE);

  const results = await reduceSummariesBatch(splitDocLists);

  const shouldCollapse = await checkShouldCollapse(results);
  if (shouldCollapse && iteration < recursionLimit) {
    console.log("Token count exceeds limit, collapsing summaries further...");
    return collapseSummaries(results, recursionLimit, iteration + 1);
  }
  return results;
}
/* REST OF THE CODE */

In this function, we can easily and clearly organize each step of collapsing, as our complex logic is neatly packed into descriptive functions. First, we split the list of summaries into sublists. Then, we reduce each sublist in a batch using the LLM. Next, we check if the reduced list now fits within the context window. If it does not, we perform another collapse; otherwise, or when the recursion limit is reached, we simply return the reduced list.

Now we can call the collapseSummaries function conditionally inside the main summarizer function:

typescript

/* REST OF THE CODE */

export async function summarizeDocuments(
  documents: LocalDocument[],
  maxIterations = 5,
) {
  const formattedDocs = documents.map(
    (doc) =>
      new Document({
        pageContent: doc.content,
        metadata: {
          title: doc.title,
          link: doc.link,
          date: doc.date,
          source: doc.source,
          selector: doc.selector,
          index: doc.index,
        },
      }),
  );

  let summaries = await runMappers(formattedDocs);

  const shouldCollapse = await checkShouldCollapse(summaries);
  if (shouldCollapse) {
    summaries = await collapseSummaries(summaries, maxIterations);
  }
}
/* REST OF THE CODE */

Final Reduce

After collapsing and ensuring that the list of mapped summaries does not exceed the limit, we can reduce those summaries into a single consolidated final summary.

To do this, we use the reduceTemplate prompt that we used earlier when collapsing sublists of summaries.

Let’s define a function for reducing a single list of summaries:

typescript

/* REST OF THE CODE */
async function reduceSummaries(summaries: string[]) {
  const result = await model.invoke([
    {
      role: "user",
      content: reduceTemplate(summaries.join("\n\n")),
    },
  ]);
  return result.content as string;
}
/* REST OF THE CODE */

We did something similar earlier when collapsing each sublist of summaries in batches. Now, we have only one list at a time to reduce.

The last thing we should do is call this function in the main summarizer function:

typescript

/* REST OF THE CODE */

export async function summarizeDocuments(
  documents: LocalDocument[],
  maxIterations = 5,
) {
  const formattedDocs = documents.map(
    (doc) =>
      new Document({
        pageContent: doc.content,
        metadata: {
          title: doc.title,
          link: doc.link,
          date: doc.date,
          source: doc.source,
          selector: doc.selector,
          index: doc.index,
        },
      }),
  );

  let summaries = await runMappers(formattedDocs);

  const shouldCollapse = await checkShouldCollapse(summaries);
  if (shouldCollapse) {
    summaries = await collapseSummaries(summaries, maxIterations);
  }
  const finalSummary = await reduceSummaries(summaries);
  console.log("finalSummary", finalSummary);
}

Conclusions

After implementing our application, we can clearly see how the Map-Reduce pattern applies to the world of generative AI. Using this pattern, we can overcome LLM context window limits. In software development, simple, battle-tested solutions invented many years ago can work perfectly alongside brand-new, groundbreaking technologies like AI. This shows that before adopting any new technology, we should ask what patterns we can use to overcome its limitations. Don’t be afraid to use battle-tested patterns and algorithms, and combine them with code that leverages state-of-the-art technologies.

If you want to learn more about the context window, check out my article on managing the context window of GPT-4o-mini.

Also, check out the repository of our blog summarizer on GitHub.


©Grzegorz Dubiel | 2025