Building a Multi-Agent Containerization System at Bunnyshell

At Bunnyshell, we’re building the environment layer for modern software delivery. One of the hardest problems our users face is converting arbitrary codebases into production-ready environments, especially when dealing with monoliths, microservices, ML workloads, and non-standard frameworks.

To solve this, we built MACS: a multi-agent system that automates containerization and deployment from any Git repo. With MACS, developers can go from raw source code to a live, validated environment in minutes, without writing Docker or Compose files manually.

In this post, we’ll share how we architected MACS internally, the design patterns we borrowed, and why a multi-agent approach was essential for solving this problem at scale.

Problem: From Codebase to Cloud, Automatically

Containerizing an application isn’t just about writing a Dockerfile. It involves:

  • Analyzing unfamiliar codebases
  • Detecting languages, frameworks, services, and DBs
  • Researching Docker best practices (and edge cases)
  • Building and testing artifacts
  • Debugging failed builds
  • Composing services and deploying environments

This process typically takes hours or days for experienced DevOps teams. We wanted to compress it to minutes, with no human intervention.

The Multi-Agent Approach

Similar to Anthropic’s research assistant and other cognitive architectures, we split the problem into multiple specialized agents, each responsible for a narrow set of capabilities. Agents operate independently, communicate asynchronously, and converge on a working deployment through iterative refinement.

Our agent topology:

  • Orchestrator: Breaks goals into atomic tasks, tracks plan state
  • Delegator: Manages task distribution and parallelism
  • Analyzer: Performs static & semantic code analysis
  • Researcher: Queries web resources for heuristics and Docker patterns
  • Executor: Builds, tests, and validates artifacts
  • Memory Store: Stores past runs, diffs, artifacts, logs

This modular architecture enables robustness, parallel discovery, and reflexive self-correction when things go wrong.
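
To make the division of labor concrete, here is a minimal sketch of how the roles and the messages they exchange could be modeled. The class and field names are illustrative, not our internal API.

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class Task:
    """A unit of work passed between agents (illustrative schema)."""
    id: str
    kind: str                      # e.g. "analyze", "research", "build"
    payload: dict                  # task-specific input: repo path, query, ...
    parent_id: str | None = None   # link back to the Orchestrator's goal tree

@dataclass
class Result:
    task_id: str
    ok: bool
    artifacts: dict = field(default_factory=dict)  # Dockerfiles, logs, notes
    error: str | None = None

class Agent(Protocol):
    """Every agent exposes the same narrow interface."""
    def handle(self, task: Task) -> Result: ...

class Analyzer:
    def handle(self, task: Task) -> Result:
        # Static and semantic analysis of the checked-out repo would go here.
        detected = {"python": ["Flask"]}  # placeholder detection result
        return Result(task_id=task.id, ok=True, artifacts={"stack": detected})
```

Keeping every agent behind a single handle(Task) -> Result call is what lets a delegating component fan work out in parallel and swap implementations without touching the planner.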

Pipeline Flow

Each repo flows through a pipeline of loosely coupled agent interactions (a simplified sketch of the core generate-test-refine loop follows this list):

  1. Initialization
    A Git URL is submitted via UI, CLI or API
    The system builds a contextual index: file tree, README, CI/CD hints, existing Dockerfiles
  2. Planning
    The Orchestrator builds a goal tree: identify components, generate artifacts, validate outputs
    Delegator breaks tasks into subtrees and assigns to Analyzer/Researcher in parallel
  3. Discovery
    Analyzer inspects the codebase: detects languages such as Python, Node.js, and Go, plus frameworks like Flask, FastAPI, and Express
    Researcher consults external heuristics (e.g., “best Dockerfile for Django + Celery + Redis”)
  4. Synthesis
    Executor generates Dockerfile and Compose services
    Everything is run in ephemeral Docker sandboxes
    Logs and test results are collected
  5. Refinement
    Failures trigger self-prompting and diff-based retries
    Agents update their plan and try again
  6. Transformation
    Once validated, Compose files are converted into bunnyshell.yml
    Environment is deployed on our infrastructure
    A live URL is returned
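
Steps 3 through 5 are the heart of the system: generate, test in an ephemeral sandbox, and retry on failure. Below is a heavily simplified, self-contained sketch of that loop; the generate and refine callables stand in for our LLM-backed Executor, and the docker CLI invocation mirrors what happens inside the sandbox.

```python
import subprocess
import tempfile
import uuid
from pathlib import Path

MAX_RETRIES = 3

def sandbox_build(repo_path: str, dockerfile: str) -> tuple[bool, str]:
    """Build a candidate Dockerfile into a throwaway image and capture the logs."""
    tag = f"macs-candidate-{uuid.uuid4().hex[:8]}"
    df = Path(tempfile.mkdtemp()) / "Dockerfile"
    df.write_text(dockerfile)
    proc = subprocess.run(
        ["docker", "build", "-f", str(df), "-t", tag, repo_path],
        capture_output=True, text=True,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr

def containerize(repo_path: str, generate, refine) -> str:
    """generate() proposes a Dockerfile; refine() revises it from the failing logs."""
    dockerfile = generate(repo_path)
    for _ in range(MAX_RETRIES):
        ok, logs = sandbox_build(repo_path, dockerfile)
        if ok:
            return dockerfile
        dockerfile = refine(dockerfile, logs)  # diff-based retry happens here
    raise RuntimeError("no working Dockerfile after retries")
```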

Memory & Execution Traces

Unlike simpler systems, we separate planning memory from execution memory:

  • Planning Memory (Orchestrator): Tracks reasoning paths, subgoals, dependencies
  • Execution Memory (Executor): Stores validated artifacts, performance metrics, diffs, logs

Only Executor memory is persisted across runs; this lets us optimize for reuse and convergence without bloating the planning context.
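
A rough sketch of that split, assuming a plain JSON-on-disk store for the persisted execution memory (the storage backend and field names are illustrative):

```python
import json
from pathlib import Path

class PlanningMemory:
    """In-process only: reasoning paths, subgoals, dependencies. Dropped after each run."""
    def __init__(self) -> None:
        self.subgoals: list[dict] = []

    def record(self, subgoal: dict) -> None:
        self.subgoals.append(subgoal)

class ExecutionMemory:
    """Persisted across runs: validated artifacts, metrics, diffs, logs."""
    def __init__(self, path: str = "execution_memory.json") -> None:
        self.path = Path(path)
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else []

    def remember(self, repo: str, artifact: str, metrics: dict) -> None:
        self.entries.append({"repo": repo, "artifact": artifact, "metrics": metrics})
        self.path.write_text(json.dumps(self.entries, indent=2))

    def similar_to(self, repo: str) -> list[dict]:
        # Reuse: surface artifacts from earlier runs of the same repo to the Executor.
        return [e for e in self.entries if e["repo"] == repo]
```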

Implementation Details

  • Models:
    - Orchestrator: GPT-4.1 (high-context)
    - Sub-agents: 3B–7B domain-tuned models
  • Runtime:
    - Each agent runs in an ephemeral Docker container with CPU/RAM/network caps
  • Observability:
    - Full token-level tracing of prompts, responses, API calls, and build logs
    - Used for debugging, auditing, and improving agent behavior over time
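
For the runtime caps, here is a minimal sketch of confining a step to an ephemeral container. The docker run flags are standard CLI options; the image, limits, and command are illustrative.

```python
import subprocess

def run_sandboxed(image: str, command: list[str]) -> subprocess.CompletedProcess:
    """Run a command in a throwaway container with CPU, memory, and network caps."""
    return subprocess.run(
        [
            "docker", "run", "--rm",   # ephemeral: container is removed on exit
            "--cpus", "1.0",           # CPU cap
            "--memory", "512m",        # RAM cap
            "--network", "none",       # no network unless the agent needs it
            image, *command,
        ],
        capture_output=True, text=True, timeout=600,
    )

result = run_sandboxed("python:3.12-slim", ["python", "-c", "print('hello from the sandbox')"])
print(result.stdout)
```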

Why Multi-Agent?

We could have built MACS as a single LLM chain, but this quickly broke down in practice. Here’s why we went multi-agent:

  • Parallelism: Analyzer and Researcher run concurrently to speed up discovery (see the sketch after this list)
  • Modular reasoning: Each agent focuses on a narrow domain of expertise
  • Error isolation: Build failures don’t halt the planner—they trigger retries
  • Reflexivity: Agents can revise their plans based on test results and diffs
  • Reusability: Learned solutions are reused across similar projects
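
As a concrete illustration of the parallelism point, here is a stripped-down asyncio sketch; the agent calls are stubbed, whereas in MACS they are LLM-backed services.

```python
import asyncio

async def analyze(repo_path: str) -> dict:
    await asyncio.sleep(0.1)  # stand-in for the Analyzer agent
    return {"language": "python", "framework": "flask"}

async def research(query: str) -> list[str]:
    await asyncio.sleep(0.1)  # stand-in for the Researcher agent
    return ["use a slim base image", "run gunicorn, not the dev server"]

async def discover(repo_path: str) -> list:
    # Discovery agents run concurrently instead of one after the other.
    return await asyncio.gather(
        analyze(repo_path),
        research("best Dockerfile for Flask"),
    )

stack, hints = asyncio.run(discover("./my-repo"))
```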

What We’ve Learned

  1. Multi-agent debugging is hard: you need good observability, logs, and introspection tools.
  2. Robustness beats optimality: our system favors “works for 95%” over exotic edge-case perfection.
  3. Emergent behavior happens: some of the most efficient retry paths were not explicitly coded.
  4. Boundaries matter: defining clean interfaces (e.g., JSON messages) between agents pays off massively.
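
On the last point: the inter-agent contract stays honest when it is an explicit, validated schema rather than free-form text. A sketch of such a message, using only the standard library (field names are illustrative):

```python
import json

# An illustrative inter-agent message; every hop is a plain JSON document.
message = {
    "task_id": "build-api-service-01",
    "from_agent": "delegator",
    "to_agent": "executor",
    "kind": "build",
    "payload": {
        "dockerfile": "FROM python:3.12-slim\nCOPY . /app\n",
        "context": "./services/api",
    },
}

REQUIRED = {"task_id", "from_agent", "to_agent", "kind", "payload"}

def validate(raw: str) -> dict:
    """Reject malformed messages at the boundary instead of deep inside an agent."""
    msg = json.loads(raw)
    missing = REQUIRED - msg.keys()
    if missing:
        raise ValueError(f"message missing fields: {sorted(missing)}")
    return msg

validate(json.dumps(message))
```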

What’s Next

We’re expanding MACS with:

  • Better multi-language support (Polyglot repo inference)
  • Orchestrator collaboration (multi-planner mode)
  • Plugin SDKs for self-hosted agents and agent fine-tuning

Our north star: a fully autonomous DevOps layer, where developers focus only on code while the system handles the rest.

Want to try it?

Just paste your repo, and Hopx by Bunnyshell instantly turns it into production-ready containers.

Try it now