GitHub Year in Review: 2024 — What Open Source Changed in the Engineering Stack

At the start of 2024, AI assistants answered questions. They did not act. Engineers building AI-augmented systems still scraped their own web data with Selenium, wrote custom database connectors for each LLM integration, and maintained separate embedding pipelines decoupled from their primary datastores. By October, browser-use had shipped a library that handed any LLM a real Chromium browser to operate. OpenHands had reached 74,000 GitHub stars after researchers demonstrated it could autonomously fix GitHub issues end-to-end. Google had open-sourced an MCP server that connected Claude, Gemini, and other MCP-compatible clients to BigQuery, Spanner, and PostgreSQL without a line of custom connector code. Three convergent waves defined the year: the operator layer arrived, the knowledge retrieval layer got a graph spine, and the database-to-AI interface standardized around a protocol. Nine repositories show exactly where each shift happened.

The Year at a Glance

Theme	Repository	Domain	Eliminated Manual Task	Peak Stars
Agents as Operators	firecrawl/firecrawl	System Design	Custom per-site scraping pipelines for AI input	123,403
Agents as Operators	browser-use/browser-use	System Design	Per-site Playwright automation scripts	95,226
Agents as Operators	OpenHands/OpenHands	Developer Productivity	Manual write-test-debug cycle for every code change	74,651
RAG with Graph	microsoft/graphrag	System Design	Flat vector search for multi-hop document questions	33,182
RAG with Graph	HKUDS/LightRAG	System Design	Maintaining separate vector DB and graph DB pipelines	35,620
RAG with Graph	getzep/graphiti	System Design	Ad-hoc agent memory using truncated message lists	26,430
Databases Go AI-Native	googleapis/mcp-toolbox	Databases	Custom connector per AI assistant per database	15,323
Databases Go AI-Native	Canner/WrenAI	Databases	Brittle NL2SQL prompt engineering without schema semantics	15,310
Databases Go AI-Native	timescale/pgai	Databases	External embedding pipeline with manual synchronization	5,802

Situation

At the start of 2024, three technical constraints kept AI systems answering questions rather than taking action. First, connecting an LLM to real-world data (a website, a database, a codebase) required writing and maintaining a custom connector for each pairing; no standard interface existed. Second, RAG systems built on vector similarity search had a documented failure mode with multi-hop questions: vector search returns isolated chunks, not relationships between entities across documents. Third, LLM agents had no persistent memory of facts that changed over time: truncating session history meant the agent forgot earlier facts, and flat storage meant it could not resolve contradictions. The year’s open-source releases addressed each constraint, and the star counts suggest the adoption was not merely theoretical.

The Problem at Year Start

Domain	Manual task	Engineering cost	Status at year end
System design	Writing per-site Playwright scripts for web data extraction	1–3 days per site; breaks on UI changes	Eliminated for LLM-ready output by firecrawl
System design	Building per-LLM per-database connector code	1–2 weeks per integration; repeated for every new model	Standardized via MCP; mcp-toolbox covers 11+ databases
System design — RAG	Multi-hop questions over document corpora	Poor accuracy from vector search; hours of prompt engineering	Graph-augmented retrieval addressable via graphrag and LightRAG
Platform engineering	Deploying AI agents to production Kubernetes	4–8 hours per new agent workload; bespoke manifests per service	Partially reduced; agent frameworks matured across the year
Databases	Maintaining external embedding pipeline synchronized with source data	Ongoing ops; stale embeddings accumulate during outages	Automated by pgai vectorizer inside PostgreSQL
Databases	NL2SQL without hallucinating column or table names	Per-query schema-dump prompting; business definitions not captured	Semantic layer approach standardized by WrenAI

The question 2024 answered: can open-source AI tooling at the infrastructure layer remove the connector-writing, pipeline-building, and prompt-engineering overhead that consumes engineering cycles each time a new AI use case begins?

2024: AI Tooling Moved from Answering to Acting

flowchart TD
    A[2024 — AI stopped answering and started acting] --> B[Theme 1 — Agents as Operators]
    A --> C[Theme 2 — RAG with Graph Structure]
    A --> D[Theme 3 — Databases Go AI-Native]
    B --> E[firecrawl — web data for AI]
    B --> F[browser-use — AI controls browser]
    B --> G[OpenHands — AI edits and runs code]
    C --> H[graphrag — entity graph from documents]
    C --> I[LightRAG — hybrid graph and vector retrieval]
    C --> J[graphiti — temporal agent memory]
    D --> K[mcp-toolbox — MCP server for databases]
    D --> L[WrenAI — semantic layer for NL2SQL]
    D --> M[pgai — embeddings inside PostgreSQL]

Theme 1: AI Agents Learned to Operate the Computer

Building an AI system that acted on the web in early 2024 meant writing brittle Playwright scripts per site, or accepting that your agent was constrained to text generation. Three repositories removed that constraint by shipping the operator layer as a reusable dependency — the plumbing that connects an LLM to real systems.

firecrawl/firecrawl — replacing per-site scraping pipelines with a single web API

Before — the manual workflow: JavaScript-heavy pages required Selenium or Playwright; proxy rotation, rate limiting, and content cleaning were per-project work that did not transfer across sites.

# Before: JS-rendered pages require Playwright; output needs manual cleaning
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    html = page.content()
    # Manual extraction, markdown conversion, proxy rotation — all bespoke per site

After — with firecrawl:

# After: firecrawl Python SDK — one call returns LLM-ready markdown
from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key="fc-...")
result = app.scrape_url("https://example.com", formats=["markdown"])
# result.markdown: complete content, JS-rendered, proxy-handled, clean

The productivity delta: According to the project README, firecrawl “handles rotating proxies, orchestration, rate limits, JS-blocked content, and more — zero configuration.” The README reports P95 latency of 3.4 seconds across millions of pages. The engineer no longer maintains a per-site extraction layer or manages proxy infrastructure.
How it works: Firecrawl wraps a headless browser pool with proxy rotation and content normalization. Output formats include markdown, structured JSON, screenshots, and links — all sized for LLM token budgets. The README states it “covers 96% of the web, including JS-heavy pages.”
Where it breaks: The hosted service has rate limits proportional to the plan. Self-hosting moves the proxy pool management back to the team — the operational complexity Firecrawl abstracts. For high-volume, budget-constrained scraping, the self-hosted version requires provisioning and operating the proxy infrastructure the README describes as “handled.”
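
The content-normalization step mentioned under "How it works" can be illustrated with a minimal sketch. This is not Firecrawl's implementation; it uses only Python's standard-library html.parser, and the class name MarkdownExtractor, the helper html_to_markdown, and the choice of tags to keep or skip are assumptions made for illustration.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Hypothetical helper: keeps heading and paragraph text, drops script/style."""

    SKIP = {"script", "style", "nav", "footer"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._skip_depth = 0
        self._prefix = ""
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            self._flush()
            self._prefix = self.HEADINGS[tag]
        elif tag == "p":
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.HEADINGS or tag == "p":
            self._flush()

    def handle_data(self, data):
        # Text inside skipped elements (scripts, styles) never reaches the output.
        if self._skip_depth == 0 and data.strip():
            self._buffer.append(data.strip())

    def _flush(self):
        if self._buffer:
            self.lines.append(self._prefix + " ".join(self._buffer))
        self._buffer = []
        self._prefix = ""


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n\n".join(parser.lines)
```

Even this toy version shows why the cleaning step is per-project work when done by hand: every site needs its own decisions about which elements carry content and which are page chrome.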

browser-use/browser-use — replacing per-site Playwright scripts with an LLM-controlled browser

Before — the manual workflow: Web task automation required a script that knew the target site’s DOM — specific selectors, form field names, navigation sequences. Each script was brittle to UI changes and non-transferable to new sites.

# Before: Playwright script tied to one site's DOM structure
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com/form")
        await page.fill('input[name="email"]', "user@example.com")
        await page.click('button[type="submit"]')
        # Breaks if the site redesigns the form; does not generalize
        await browser.close()

asyncio.run(main())

After — with browser-use: the LLM reads the page visually and adapts to layout changes without script updates.

# After: browser-use agent navigates any site from a task description
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    agent = Agent(
        task="Fill out the contact form with name 'Test User' and email 'test@example.com'",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    return await agent.run()

result = asyncio.run(main())

The productivity delta: The project README states browser-use “makes websites accessible for AI agents” by providing browser control without per-site script maintenance. The README notes the library works with any LLM via LangChain, and a cloud service is available for teams that want hosted browser sessions.
How it works: The library passes visual DOM state to the LLM, which generates action sequences (click, fill, scroll, navigate) based on the task description. No site-specific selectors are needed.
Where it breaks: Agents navigating visually are slower and more expensive per task than scripted automation. For deterministic, high-frequency workflows (thousands of daily runs), a maintained Playwright script remains cheaper. Browser-use’s value is highest for irregular tasks or sites that change layout frequently.
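
The observe-decide-act loop described under "How it works" can be sketched without a browser or a model. Everything here is a stand-in: FakePage replaces the real page, stub_policy replaces the LLM call, and the action dictionaries are a hypothetical format, not browser-use's internal one.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Stand-in for a browser page: form fields plus a submitted flag."""
    fields: dict = field(default_factory=lambda: {"name": "", "email": ""})
    submitted: bool = False

    def observe(self) -> dict:
        # What the model would be shown: current field values and form status.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: dict) -> None:
        if action["type"] == "fill":
            self.fields[action["field"]] = action["value"]
        elif action["type"] == "submit":
            self.submitted = True


def stub_policy(state: dict, goal: dict):
    """Hypothetical stand-in for the LLM: picks the next action from the state."""
    for name, value in goal.items():
        if state["fields"].get(name) != value:
            return {"type": "fill", "field": name, "value": value}
    if not state["submitted"]:
        return {"type": "submit"}
    return None  # goal reached, nothing left to do


def run_agent(page: FakePage, goal: dict, max_steps: int = 10) -> list:
    history = []
    for _ in range(max_steps):
        action = stub_policy(page.observe(), goal)
        if action is None:
            break
        page.apply(action)
        history.append(action)
    return history
```

The loop makes one policy call per action, which is where the per-task latency and cost come from: a scripted flow executes the same three steps with no model calls at all.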

OpenHands/OpenHands — replacing the manual write-test-debug cycle with an autonomous coding agent

Before — the manual workflow: A developer reads a failing test, edits the function, re-runs the test suite, interprets the output, and repeats — context switching between editor, terminal, and ticket.
```
# Before: manual write-test-debug loop
vim src/parser.py
python -m pytest tests/test_parser.py -v
# Read failure output, return to editor, repeat until green
```

After — with OpenHands CLI:

# After: OpenHands handles the read-edit-test loop autonomously
openhands run --task "Fix the failing test in tests/test_parser.py; \
  the parse_config function is not handling null values in the options dict"
# OpenHands reads files, edits code, runs tests, interprets output, iterates

The productivity delta: The project README reports a 77.6% SWE-Bench score on a benchmark measuring autonomous resolution of real GitHub issues, and links to the benchmark spreadsheet. This is a documented benchmark result rather than a production guarantee: on SWE-Bench, the agent resolves most issues without a human in the loop.
How it works: OpenHands provides a sandboxed runtime where an AI agent reads files, edits code, runs test suites, and interprets terminal output. The README describes both a CLI for single tasks and an SDK for running agents at scale.
Where it breaks: An agent’s solution may be functionally correct but deviate from team coding conventions such as naming, patterns, and error-handling idioms. Human review before merge is still required. The SDK described in the README is designed to be composable, allowing teams to constrain the file scope available to the agent per task.
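
Constraining the file scope available to an agent can be sketched as a path check applied before any edit is accepted. This is not the OpenHands SDK; the function name is_within_scope and the directory names are hypothetical, and the check relies only on pathlib.

```python
from pathlib import Path


def is_within_scope(candidate: str, allowed_roots: list[str], repo_root: str) -> bool:
    """Return True if candidate resolves inside one of the allowed directories."""
    root = Path(repo_root).resolve()
    # Resolving first collapses ".." segments, so traversal tricks are caught.
    target = (root / candidate).resolve()
    for allowed in allowed_roots:
        allowed_path = (root / allowed).resolve()
        if target == allowed_path or allowed_path in target.parents:
            return True
    return False
```

A guard like this would sit between the agent's proposed edit and the filesystem, rejecting any write whose resolved path falls outside the directories named for the task.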

Theme 2: RAG Grew a Graph Spine

By early 2024, vector similarity search as the sole retrieval mechanism had a documented failure mode: questions requiring multi-hop reasoning (“how does A relate to B through C?”) returned isolated chunks rather than connected answers. Three repositories that shipped in 2024 addressed this by adding a graph layer to the retrieval process, each targeting a different part of the problem: indexing, retrieval, and persistent agent memory.

microsoft/graphrag — entity graph extraction for multi-hop document retrieval

Before — the manual workflow: Standard RAG embeds document chunks and retrieves the top-k most similar chunks. Multi-hop questions fail because the answer requires traversing entity relationships that do not co-occur in any single chunk.

# Before: flat vector RAG — isolated chunks, no relational context
# Question: "What themes connect John's research and Mary's implementation work?"
# Vector search returns John's chunks OR Mary's chunks — not their intersection
# The relationship between them lives in neither chunk individually

After — with graphrag:

# After: graphrag indexes documents into an entity-relationship graph
pip install graphrag
python -m graphrag index --root ./my-documents
# Extracts entities, relationships, and community summaries via LLM calls
python -m graphrag query --root ./my-documents \
  --method global \
  --query "What themes connect all the research papers?"
# Graph traversal finds cross-document connections unavailable to vector search

The productivity delta: According to the README and the linked Microsoft Research blog post (arXiv 2404.16130), GraphRAG “unlocks LLM discovery on narrative and private data” by maintaining graph-structured knowledge that supports global query mode — summarizing across the entire corpus — which flat vector search cannot do.
How it works: GraphRAG runs an LLM-powered indexing pipeline that extracts named entities and relationships from each document, then organizes them into community clusters. At query time, graph traversal finds cross-document connections. The README notes two query modes: local (specific entity focus) and global (corpus-wide summarization).
Where it breaks: The README includes a direct warning: “GraphRAG indexing can be an expensive operation — please read all of the documentation and start small.” The LLM-powered extraction step runs at index time, and its cost scales with corpus size. GraphRAG is not suitable for large-scale indexing unless cost controls are in place first.
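
Why graph traversal answers questions that chunk retrieval misses can be shown with a toy entity graph. This is a sketch, not GraphRAG's pipeline: the triples are invented for illustration (including the entities "Paper A", "Service X", and "Sparse Indexing"), and a plain breadth-first search stands in for the library's community-based querying.

```python
from collections import deque

# Hypothetical (entity, relation, entity) triples, as an extraction step might emit.
TRIPLES = [
    ("John", "authored", "Paper A"),
    ("Paper A", "proposes", "Sparse Indexing"),
    ("Mary", "implemented", "Service X"),
    ("Service X", "uses", "Sparse Indexing"),
]


def build_graph(triples):
    graph = {}
    for head, relation, tail in triples:
        # Store both directions so traversal can walk an edge either way.
        graph.setdefault(head, []).append((relation, tail))
        graph.setdefault(tail, []).append((f"inverse of {relation}", head))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest entity chain from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _relation, neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

No single triple mentions both John and Mary, which mirrors the flat-retrieval failure: the connection exists only as a path through a shared entity.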

HKUDS/LightRAG — hybrid graph and vector retrieval from a single unified index

Before — the manual workflow: Teams running both semantic similarity and relationship traversal maintained two separate systems — a vector store and a graph database — each with its own ingestion pipeline, update cadence, and query interface.

# Before: two separate systems for two retrieval modes
# System 1: embed chunks → vector store → similarity search
# System 2: extract entities → graph DB → traversal queries
# Two pipelines to maintain; two sets of stale data to manage

After — with LightRAG: a single index supports vector similarity, graph traversal, and hybrid modes.

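# After: LightRAG, one index with four retrieval modes
import asyncio
from lightrag import LightRAG, QueryParam

async def main():
    # LLM/embedding arguments and storage initialization omitted for brevity; see the README
    rag = LightRAG(working_dir="./rag_cache")

    # ainsert takes document text (a string or list of strings), not a path;
    # report.txt is a placeholder file name
    with open("path/to/documents/report.txt", encoding="utf-8") as f:
        await rag.ainsert(f.read())

    # Hybrid mode uses both vector similarity and graph traversal
    return await rag.aquery(
        "How does the new architecture affect the legacy system?",
        param=QueryParam(mode="hybrid")
    )

result = asyncio.run(main())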

The productivity delta: According to the project README and arXiv paper (2410.05779), LightRAG supports four retrieval modes — naive, local, global, and hybrid — from a single unified index. The engineer no longer maintains separate systems for queries that require different retrieval strategies.
How it works: LightRAG extracts a knowledge graph during ingestion, stores both graph edges and vector embeddings in a unified index, and answers each query using the retrieval mode the caller selects. The paper was accepted at EMNLP 2025.
Where it breaks: The quality of the knowledge graph depends on the LLM used during indexing. Low-quality or poorly prompted models produce noisy graph extractions that degrade retrieval for graph-dependent query modes. Embedding and graph extraction both require model calls, so compute costs scale with corpus size.
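
The idea of combining vector similarity with graph traversal over one index can be sketched with toy data. This is not LightRAG's algorithm: the chunks, two-dimensional embeddings, entity names, and the functions vector_search, graph_expand, and hybrid_search are all assumptions chosen to keep the example small.

```python
import math

# Hypothetical toy index: each chunk has an embedding and a set of entities.
CHUNKS = {
    "c1": {"vec": [1.0, 0.0], "entities": {"new architecture"}},
    "c2": {"vec": [0.0, 1.0], "entities": {"legacy system"}},
    "c3": {"vec": [0.6, 0.8], "entities": {"migration plan"}},
}
# Hypothetical entity graph: undirected edges between entities.
EDGES = {
    "new architecture": {"migration plan"},
    "migration plan": {"new architecture", "legacy system"},
    "legacy system": {"migration plan"},
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def vector_search(query_vec, k=1):
    ranked = sorted(CHUNKS, key=lambda c: cosine(query_vec, CHUNKS[c]["vec"]), reverse=True)
    return ranked[:k]


def graph_expand(chunk_ids, hops=1):
    # Start from the entities in the seed chunks, then follow edges outward.
    entities = set().union(*(CHUNKS[c]["entities"] for c in chunk_ids))
    for _ in range(hops):
        entities |= set().union(*(EDGES.get(e, set()) for e in entities))
    return sorted(c for c in CHUNKS if CHUNKS[c]["entities"] & entities)


def hybrid_search(query_vec, k=1, hops=1):
    seeds = vector_search(query_vec, k)
    return sorted(set(seeds) | set(graph_expand(seeds, hops)))
```

With a query vector close to the "new architecture" chunk, similarity alone returns only that chunk; two hops of graph expansion also reach the "legacy system" chunk through the shared "migration plan" entity.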

getzep/graphiti — temporal knowledge graph for agent memory that handles facts that change over time

Before — the manual workflow: AI agents maintained context via a truncated message history. Facts from earlier sessions were lost when the history was trimmed. Contradictions between old and new facts accumulated with no mechanism to resolve which was current.

# Before: agent memory = message list, truncated at context limit
messages = []  # newest 20 messages; earlier facts are gone
# Session 1: "Project Alpha is in planning"
# Session 15: "Project Alpha shipped"
# Agent has no way to know which fact is currently true

After — with graphiti: each interaction adds to a temporal knowledge graph that tracks which facts are currently valid.

# After: graphiti maintains a temporal graph from agent episodes
import asyncio
from datetime import datetime, timezone
from graphiti_core import Graphiti

async def main():
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    await graphiti.add_episode(
        name="session_42",
        episode_body="Project Alpha shipped to production on January 15.",
        source_description="agent session transcript",
        reference_time=datetime.now(timezone.utc),
    )
    # Returns facts that are currently true; temporal contradictions resolved
    return await graphiti.search("What is the current status of Project Alpha?")

facts = asyncio.run(main())

The productivity delta: According to the README, Graphiti’s context graphs “track how facts change over time, maintain provenance to source data, and support both prescribed and learned ontology — making them purpose-built for agents operating on evolving, real-world data.” The agent no longer loses information at session boundaries or accumulates unresolved contradictions.
How it works: Graphiti extracts entities and relationships from each episode (agent interaction), stores them in a Neo4j graph, and marks temporal validity on each edge so queries return the currently-true state. The repo also includes an MCP server that lets Claude, Cursor, and other MCP-compatible clients use Graphiti as their memory backend.
Where it breaks: Graphiti requires a running Neo4j instance (or a compatible managed graph database). Teams without an existing graph database add a new infrastructure dependency. The temporal resolution quality depends on LLM entity extraction during the add_episode step.
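
Marking temporal validity on each fact, as described under "How it works", can be sketched with an in-memory store. This is not Graphiti's data model: the Fact fields, the TemporalStore class, and its methods are hypothetical, and the sketch assumes facts arrive in chronological order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    valid_from: datetime
    invalid_from: Optional[datetime] = None  # None means still current


class TemporalStore:
    """Hypothetical in-memory store; a real system would persist edges in a graph DB."""

    def __init__(self):
        self.facts = []

    def add(self, subject, attribute, value, observed_at):
        # Close out the current fact for this subject/attribute instead of deleting it.
        for fact in self.facts:
            if (fact.subject, fact.attribute) == (subject, attribute) and fact.invalid_from is None:
                fact.invalid_from = observed_at
        self.facts.append(Fact(subject, attribute, value, observed_at))

    def current(self, subject, attribute):
        for fact in self.facts:
            if (fact.subject, fact.attribute) == (subject, attribute) and fact.invalid_from is None:
                return fact.value
        return None

    def as_of(self, subject, attribute, when):
        for fact in self.facts:
            if (fact.subject, fact.attribute) != (subject, attribute):
                continue
            if fact.valid_from <= when and (fact.invalid_from is None or when < fact.invalid_from):
                return fact.value
        return None
```

Because superseded facts are closed rather than removed, the store can answer both "what is true now" and "what was true on a given date", which a truncated message list cannot do.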

Theme 3: Databases Gained a Native AI Interface

At the start of 2024, connecting a database to an LLM required writing a custom connector: one integration for Claude, another for Gemini, another for each new model. Three repositories removed that per-pairing work in 2024, each targeting a different layer of the database-to-AI interface.

googleapis/mcp-toolbox — one MCP server connecting any AI agent to any database

Before — the manual workflow: Each AI assistant required its own database integration. Adding a new model meant writing and maintaining a new connector in that model’s tool-calling format.

# Before: same database logic registered separately for each LLM
# For Claude: tool defined in Anthropic tool-use format
# For Gemini: same logic, different SDK, different schema format
# For new model: write it again
import os
import psycopg2

DATABASE_URL = os.environ["DATABASE_URL"]

def search_products(name: str) -> list:
    conn = psycopg2.connect(DATABASE_URL)
    try:
        with conn.cursor() as cursor:
            cursor.execute("SELECT * FROM products WHERE name ILIKE %s", (f"%{name}%",))
            return cursor.fetchall()
    finally:
        conn.close()

After — with mcp-toolbox: define tools once in YAML; any MCP-compatible client connects.

# After: toolbox_config.yaml — write once, connect from any MCP client
sources:
  products-db:
    kind: postgres
    host: ${DB_HOST}
    database: products
tools:
  search-products:
    kind: postgres-sql
    source: products-db
    description: "Search products by name"
    parameters:
      - name: query
        type: string
        description: "Product name search term"
    statement: SELECT id, name, price FROM products WHERE name ILIKE '%' || $1 || '%'

toolbox serve --tools-file toolbox_config.yaml
# Claude Code, Gemini CLI, and other MCP clients — all connect; no per-client code

The productivity delta: According to the README, mcp-toolbox “serves a dual purpose: a ready-to-use MCP server that instantly connects AI clients to databases, and a robust framework to build specialized AI tools for production agents.” The tool definition is written once and serves all connected clients.
How it works: The server implements the Model Context Protocol and exposes database-backed tools via a standardized interface. Supported databases per the README topics and description include BigQuery, Spanner, PostgreSQL, MySQL, Redis, Firestore, MongoDB, Elasticsearch, Oracle, ClickHouse, CockroachDB, and TiDB.
Where it breaks: The README notes that custom tools require careful parameterization to prevent SQL injection — the framework does not automatically sanitize inputs. Every tool definition needs a security review before it is exposed to a production agent.
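
The difference between a parameterized statement and one built by string interpolation can be demonstrated with Python's standard-library sqlite3 module standing in for PostgreSQL. The table contents and function names are hypothetical, and sqlite's `?` placeholder plays the role that `$1` plays in the tool definition.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products (name, price) VALUES (?, ?)",
        [("red mug", 8.0), ("blue mug", 9.0), ("secret prototype", 999.0)],
    )
    return conn


def search_unsafe(conn, term):
    # Anti-pattern: user input is spliced into the SQL text.
    sql = f"SELECT name FROM products WHERE name LIKE '%{term}%'"
    return [row[0] for row in conn.execute(sql)]


def search_safe(conn, term):
    # The driver binds the value; it is never parsed as SQL.
    sql = "SELECT name FROM products WHERE name LIKE '%' || ? || '%'"
    return [row[0] for row in conn.execute(sql, (term,))]
```

A search term such as `zzz%' OR name LIKE '` turns the interpolated query into one that matches every row, while the bound version treats the same string as a literal pattern and matches nothing.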

Canner/WrenAI — semantic context layer that teaches AI agents what business data means

Before — the manual workflow: NL2SQL prompts included raw schema dumps — table names, column names — and relied on the LLM to infer business meaning. Queries crossing multiple tables or depending on business-specific definitions (revenue = net amount after refunds) produced plausible but wrong SQL.

-- Before: LLM infers semantics from raw schema; gets the shape right, the logic wrong
-- Context given: "orders(id, customer_id, amount, refund_amount, created_at)"
-- Question: "Who are our top customers by revenue this quarter?"
-- LLM output: SELECT customer_id, SUM(amount) FROM orders GROUP BY 1 ORDER BY 2 DESC
-- Wrong: uses gross amount; no customer name join; no quarter filter

After — with WrenAI: the semantic model defines what data means; agents query through the context layer.

# After: WrenAI semantic context layer
pip install wrenai
# Semantic model defines: revenue = amount - refund_amount; customer name from customers table
wren ask "Who are our top 10 customers by net revenue this quarter?"
# WrenAI resolves semantics, generates correct SQL, returns verified results

The productivity delta: According to the README, WrenAI is “the open context layer for AI agents over business data — your agent doesn’t know what your data means. We fix that.” The semantic layer prevents the class of wrong-but-plausible SQL that schema-only prompting produces.
How it works: WrenAI maintains a semantic layer (MDL — Modeling Definition Language) that maps business concepts to the underlying schema. AI agents query through this layer rather than against raw tables, and the engine translates natural language into semantically-grounded SQL.
Where it breaks: The semantic model requires manual maintenance when the underlying schema changes. If a column is renamed or a business definition shifts, the MDL needs to be updated separately — it does not automatically sync from schema migrations.
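
The role of a semantic layer can be sketched as a lookup from business terms to SQL expressions. This is not WrenAI's MDL format: SEMANTIC_MODEL, its keys, and compile_query are hypothetical, and the table and column names follow the orders example above plus an assumed customers(id, name) table.

```python
import sqlite3

# Hypothetical semantic model: business terms mapped to SQL expressions.
SEMANTIC_MODEL = {
    "metrics": {"net_revenue": "SUM(o.amount - o.refund_amount)"},
    "dimensions": {"customer": "c.name"},
    "from": "orders o JOIN customers c ON c.id = o.customer_id",
}


def compile_query(metric: str, dimension: str, limit: int) -> str:
    """Build SQL from named business concepts instead of guessed columns."""
    m = SEMANTIC_MODEL["metrics"][metric]  # KeyError if the metric is undefined
    d = SEMANTIC_MODEL["dimensions"][dimension]
    return (
        f"SELECT {d} AS {dimension}, {m} AS {metric} "
        f"FROM {SEMANTIC_MODEL['from']} "
        f"GROUP BY {d} ORDER BY {metric} DESC LIMIT {int(limit)}"
    )
```

The definition of revenue lives in one place, so a question about "net revenue" cannot silently fall back to the gross amount column. The same structure also shows the maintenance cost: renaming refund_amount in the database breaks the model until the mapping is updated.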

timescale/pgai — automatic vector embeddings and semantic search inside PostgreSQL

Before — the manual workflow: AI applications maintained an external embedding pipeline — call the embedding API on new or updated rows, push embeddings to a separate vector store, handle synchronization failures, manage stale embeddings when source data changed.

# Before: external embedding pipeline decoupled from source data
def sync_embeddings():
    rows = db.execute(
        "SELECT id, text FROM docs WHERE updated_at > %s", (last_sync,)
    )
    for row in rows:
        embedding = openai.embeddings.create(
            input=row.text, model="text-embedding-3-small"
        )
        vector_store.upsert(row.id, embedding.data[0].embedding)
    # Runs on a cron; stale embeddings accumulate during API outages

After — with pgai: the vectorizer is defined inside PostgreSQL, and worker processes re-embed rows automatically when the data changes.

# After: pgai vectorizer — embeddings stay synchronized inside the database
import pgai

vectorizer = pgai.create_vectorizer(
    "docs",
    destination="docs_embeddings",
    embedding=pgai.openai_embedding("text-embedding-3-small", 1536),
    chunking=pgai.character_text_splitter(chunk_size=800),
)
# pgai workers re-embed automatically when docs data changes
# Query with standard SQL + pgvector; no separate vector store to operate

The productivity delta: According to the README, pgai “automatically creates and synchronizes vector embeddings from PostgreSQL data and S3 documents” with “embeddings [that] update automatically as data changes.” The external sync cron and its stale-embedding handling are eliminated.
How it works: pgai installs as a Python package with database components. Stateless vectorizer workers watch for data changes via the configuration, process a queue, and write embeddings back to PostgreSQL. The README notes the architecture “decouples data modifications from the embedding process so failures in the embedding service do not affect core data operations.” Works with any PostgreSQL — RDS, Supabase, Timescale Cloud (all cited in the README).
Where it breaks: pgai requires deploying and operating vectorizer worker processes alongside the database. For managed PostgreSQL deployments, the worker is an additional compute process with its own health monitoring. The decoupling means a worker outage stops embedding updates without affecting read/write on the underlying data — correct behavior, but the queue lag needs independent observability.

Year-over-Year Signal

Domain	Manual task at year start	Status at year end	What drove the change
System design — web	Per-site Playwright automation for web tasks	Replaced for irregular tasks by browser-use; scripted automation still cost-effective for deterministic high-frequency flows	browser-use shipped Oct 2024; LLM vision quality crossed a usability threshold
System design — AI connectors	Custom per-LLM per-database connector code	Partially standardized via MCP; mcp-toolbox unifies 11+ databases under one server definition	Model Context Protocol gained cross-vendor adoption in 2024
System design — RAG	Flat vector search as the default retrieval mechanism	Graph-augmented retrieval available via graphrag and LightRAG; production adoption still early for most teams	graphrag shipped Mar 2024, LightRAG Oct 2024; peer-reviewed research backed both
Databases	External embedding pipeline with manual sync	Automated for PostgreSQL stacks by pgai vectorizer	pgai shipped May 2024 with synchronization as a first-class design goal
Databases — NL2SQL	Schema-dump prompting for text-to-SQL	Semantic layer approach available via WrenAI; eliminates the class of wrong-but-plausible SQL from schema inference	WrenAI’s MDL provides business-concept grounding that raw schema prompting cannot
Infrastructure	Redis as the community default distributed cache	Valkey (25,887 stars) forked and became an LF project; migration from Redis ongoing across the ecosystem	Redis changed its license to SSPL and RSALv2 in March 2024

In Practice

Theme 1 — Agents as Operators: firecrawl’s P95 latency figure (3.4s), proxy handling description, and 96% web coverage are stated in the README. OpenHands’ 77.6% SWE-Bench score appears in the README badge with a link to the benchmark spreadsheet. Browser-use’s LLM-driven navigation model is described in the quickstart. I have not run OpenHands on a production codebase; the SWE-Bench score measures autonomous issue resolution on a curated benchmark, not arbitrary production work — it is an adoption signal, not a deployment guarantee.
Theme 2 — RAG with Graph: GraphRAG’s entity extraction and query modes are described in the README and arXiv 2404.16130. LightRAG’s four retrieval modes are in the README and arXiv 2410.05779 (EMNLP 2025 accepted). Graphiti’s temporal graph, provenance tracking, and MCP server are described in the README. I have not verified graph extraction quality at production corpus sizes; the warning about indexing cost in graphrag’s README reflects a real, documented constraint.
Theme 3 — Databases Go AI-Native: mcp-toolbox’s supported database list (11+) is in the GitHub topics and README. pgai’s vectorizer architecture is described in the README including the architecture diagram and the decoupling design rationale. WrenAI’s semantic layer approach is described in the README tagline and documentation links. I have not run any of these three in production; pgai requires self-managed vectorizer workers that add operational overhead not visible in the quickstart.

Productivity Scorecard

Tool	Theme	Domain	Eliminated Task	Documented Impact	Maturity
firecrawl/firecrawl	Agents as Operators	System Design	Per-site scraping pipeline	“Handles rotating proxies, rate limits, JS-blocked content — zero configuration” (README)	GA
browser-use/browser-use	Agents as Operators	System Design	Per-site Playwright automation	“Makes websites accessible for AI agents” (README); hosted cloud available	GA
OpenHands/OpenHands	Agents as Operators	Developer Productivity	Write-test-debug loop	77.6% SWE-Bench score (README badge; spreadsheet linked)	GA
microsoft/graphrag	RAG with Graph	System Design	Multi-hop RAG via flat vector search	“Unlocks LLM discovery on narrative private data” (MS Research blog, linked in README)	GA
HKUDS/LightRAG	RAG with Graph	System Design	Separate vector and graph indexes	4 unified retrieval modes; EMNLP 2025 paper (arXiv 2410.05779)	GA
getzep/graphiti	RAG with Graph	System Design	Truncated message-list agent memory	“Tracks how facts change over time, maintains provenance” (README)	GA
googleapis/mcp-toolbox	Databases Go AI-Native	Databases	Per-LLM per-database connector code	“Instantly connect AI clients to 11+ databases” (README); Apache 2.0	GA
Canner/WrenAI	Databases Go AI-Native	Databases	Schema-dump NL2SQL prompting	“Agent doesn’t know what data means. We fix that.” (README); Apache 2.0	GA
timescale/pgai	Databases Go AI-Native	Databases	External embedding sync pipeline	“Automatically creates and synchronizes vector embeddings as data changes” (README)	GA

Where It Breaks

Failure mode	Trigger	Fix
graphrag indexing cost exceeds budget	LLM extraction runs against a large corpus without cost controls	Per the README: “start small.” Set per-run token budgets; test on a 50-document subset before indexing the full corpus
browser-use agent slower than scripted automation	High-frequency, deterministic web workflow running thousands of times per day	Use Playwright for predictable, high-volume flows; reserve browser-use for irregular or layout-change-prone tasks
firecrawl self-hosted proxy pool requires maintenance	Team self-hosts to avoid API rate limits and per-page costs	Evaluate hosted-service pricing vs. proxy infrastructure ops; the hosted tier removes the maintenance burden the README describes as “handled”
WrenAI semantic layer drifts after schema migration	Column renamed or table structure changed outside WrenAI’s MDL	Treat schema changes as requiring a semantic layer update; add MDL review to the migration checklist
pgai vectorizer worker outage causes embedding queue lag	Embedding API outage or worker process crash	Per README design: data writes are unaffected. Monitor vectorizer queue depth independently; alert when lag exceeds acceptable staleness for the use case
OpenHands agent generates correct but unconventional code	Agent produces code that passes tests but violates team conventions	Require human PR review before merge; use the SDK to constrain file scope available to the agent
LightRAG graph quality degrades on noisy input	Low-quality LLM used for indexing, or poorly structured input documents	Use the highest-quality available model for indexing (separate from the query model); re-index if retrieval quality drops
mcp-toolbox write-capable tool exposed to production agent	Custom tool allows INSERT or UPDATE without row-level restrictions	Restrict all production mcp-toolbox tools to read-only SQL; implement an explicit approval workflow before any write-capable tool is connected to a live agent
OpenHands coding agent + mcp-toolbox write access — agent runs DDL against production database	Agent generates schema-altering SQL via a write-capable mcp-toolbox tool	Scope mcp-toolbox to read-only connections; run OpenHands in sandbox environments isolated from production database write paths
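
The "start small" advice for graph indexing in the first row of the table can be turned into a simple projection: measure token usage on a sample, then scale it to the full corpus before committing. The function names and every number in the usage note are hypothetical, and the estimate assumes cost grows linearly with document count, which may not hold for corpora with uneven document sizes.

```python
def estimate_index_cost(sample_docs, sample_tokens, total_docs, usd_per_million_tokens):
    """Scale token usage measured on a sample run up to the full corpus."""
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    tokens_per_doc = sample_tokens / sample_docs
    projected_tokens = tokens_per_doc * total_docs
    projected_usd = projected_tokens / 1_000_000 * usd_per_million_tokens
    return projected_tokens, projected_usd


def within_budget(projected_usd, budget_usd):
    return projected_usd <= budget_usd
```

For example, a 50-document sample that consumed 2,000,000 tokens projects to 400,000,000 tokens for a 10,000-document corpus; at an assumed 5 USD per million tokens that is 2,000 USD, which would fail a 500 USD budget check before any full indexing run starts.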

What to Carry into 2025

Problem: The operator layer arrived in 2024 — agents can now act on websites, codebases, and databases — but agent memory and long-term context management remain fragile. Graphiti and graphrag solve parts of the problem, but production-grade multi-session agent memory with reliable temporal reasoning is not yet a solved category. The gap going into 2025 is persistent agent state at production scale.
Solution: Three tools to evaluate now, one per domain, each GA with documented production readiness: browser-use for web-operating agents where site-specific scripting is the bottleneck (system design), pgai for teams maintaining an external embedding cron that drifts from source data (databases), and mcp-toolbox for teams that have written the same database connector more than twice across different AI integrations (databases and platform).
Proof: After 60 days on pgai, the embedding sync cron job should be gone. The vectorizer queue lag metric (observable in the tables pgai creates in PostgreSQL) replaces the custom pipeline monitor. If the cron still runs in parallel, the migration is incomplete and the team is operating two sources of truth for embeddings.
Action: Run pip install pgai, then run pgai install against a development PostgreSQL instance and create one vectorizer over the table you currently embed externally. Run both pipelines in parallel for two weeks and compare embedding freshness and error rates. The first place they diverge will show what the external pipeline was doing wrong, and whether pgai’s architecture handles it correctly for your workload.
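
Comparing embedding freshness between the two pipelines can be sketched as a per-row staleness report. The function name staleness_report and the dictionary inputs are assumptions; in practice the timestamps would come from the source table's update column and from whatever each pipeline records when it writes an embedding.

```python
from datetime import datetime, timedelta


def staleness_report(source_updated, embedded_at, now):
    """For each row id, how long its embedding has lagged behind the source row."""
    report = {}
    for row_id, updated in source_updated.items():
        embedded = embedded_at.get(row_id)
        if embedded is None or embedded < updated:
            # Missing or older than the row: stale since the last source update.
            report[row_id] = now - updated
        else:
            report[row_id] = timedelta(0)
    return report
```

Running the same report against both pipelines' embedding timestamps, with the same source timestamps, gives a like-for-like view of where each one falls behind.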