At Bunnyshell, we’re building the environment layer for modern software delivery. One of the hardest problems our users face is converting arbitrary codebases into production-ready environments, especially when dealing with monoliths, microservices, ML workloads, and non-standard frameworks.
To solve this, we built MACS: a multi-agent system that automates containerization and deployment from any Git repo. With MACS, developers can go from raw source code to a live, validated environment in minutes, without writing Docker or Compose files manually.
In this post, we’ll share how we architected MACS internally, the design patterns we borrowed, and why a multi-agent approach was essential for solving this problem at scale.
Problem: From Codebase to Cloud, Automatically
Containerizing an application isn’t just about writing a Dockerfile. It involves:
- Analyzing unfamiliar codebases
- Detecting languages, frameworks, services, and DBs
- Researching Docker best practices (and edge cases)
- Building and testing artifacts
- Debugging failed builds
- Composing services and deploying environments
This process typically takes hours or days for experienced DevOps teams. We wanted to compress it to minutes, with no human intervention.
The Multi-Agent Approach
Taking a cue from Anthropic’s research assistant and other cognitive architectures, we split the problem across multiple specialized agents, each responsible for a narrow set of capabilities. Agents operate independently, communicate asynchronously, and converge on a working deployment through iterative refinement.
Our agent topology:
| Agent | Responsibility |
|---|---|
| Orchestrator | Breaks goals into atomic tasks, tracks plan state |
| Delegator | Manages task distribution and parallelism |
| Analyzer | Performs static & semantic code analysis |
| Researcher | Queries web resources for heuristics and Docker patterns |
| Executor | Builds, tests, and validates artifacts |
| Memory Store | Stores past runs, diffs, artifacts, logs |
This modular architecture enables robustness, parallel discovery, and reflexive self-correction when things go wrong.
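To make the contract between agents concrete, here is a minimal sketch of how a topology like this can be wired together. The `Task`/`Result` shapes and the `Delegator` routing are illustrative, not MACS’s actual internals.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Task:
    """A unit of work passed between agents (illustrative schema)."""
    id: str
    goal: str                      # e.g. "detect frameworks in /services/api"
    parent_id: str | None = None   # link back to the Orchestrator's goal tree
    payload: dict = field(default_factory=dict)


@dataclass
class Result:
    task_id: str
    ok: bool
    artifacts: dict = field(default_factory=dict)  # Dockerfiles, logs, diffs, ...


class Agent(Protocol):
    """Common contract each specialized agent implements."""
    name: str

    def handle(self, task: Task) -> Result: ...


class Delegator:
    """Routes tasks to whichever agent owns that kind of work."""

    def __init__(self, agents: dict[str, Agent]):
        self.agents = agents  # e.g. {"analyze": Analyzer(), "research": Researcher()}

    def dispatch(self, kind: str, task: Task) -> Result:
        return self.agents[kind].handle(task)
```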
Pipeline Flow
Each repo flows through a pipeline of loosely coupled agent interactions; a simplified sketch of the full loop follows the list:
- Initialization
  - A Git URL is submitted via UI, CLI, or API
  - The system builds a contextual index: file tree, README, CI/CD hints, existing Dockerfiles
- Planning
  - The Orchestrator builds a goal tree: identify components, generate artifacts, validate outputs
  - The Delegator breaks tasks into subtrees and assigns them to the Analyzer and Researcher in parallel
- Discovery
  - The Analyzer inspects the codebase: it detects Python, Node.js, Go, etc., plus frameworks like Flask, FastAPI, and Express
  - The Researcher consults external heuristics (e.g., “best Dockerfile for Django + Celery + Redis”)
- Synthesis
  - The Executor generates Dockerfile and Compose services
  - Everything is run in ephemeral Docker sandboxes
  - Logs and test results are collected
- Refinement
  - Failures trigger self-prompting and diff-based retries
  - Agents update their plan and try again
- Transformation
  - Once validated, Compose files are converted into bunnyshell.yml
  - The environment is deployed on our infrastructure
  - A live URL is returned
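The end-to-end loop can be pictured roughly as the sketch below. Every function and parameter name here is ours, chosen for illustration; the real pipeline is asynchronous and event-driven rather than a single call chain.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Report:
    """Outcome of one sandbox validation pass."""
    ok: bool
    logs: str = ""
    diff: str = ""


def provision(
    repo_url: str,
    build_index: Callable,   # Initialization: clone, index file tree / README / CI hints
    plan_goals: Callable,    # Planning: Orchestrator builds the goal tree
    inspect: Callable,       # Discovery: Analyzer detects languages, frameworks, services
    lookup: Callable,        # Discovery: Researcher fetches Dockerfile heuristics
    generate: Callable,      # Synthesis: Executor emits Dockerfile + Compose services
    validate: Callable,      # Synthesis: build and test in an ephemeral Docker sandbox
    refine: Callable,        # Refinement: Orchestrator revises the plan from logs/diffs
    deploy: Callable,        # Transformation: Compose -> bunnyshell.yml -> live URL
    max_retries: int = 3,
) -> str:
    context = build_index(repo_url)
    plan = plan_goals(context)
    findings = inspect(context)
    heuristics = lookup(findings)

    for _ in range(max_retries):
        artifacts = generate(plan, findings, heuristics)
        report: Report = validate(artifacts)
        if report.ok:
            return deploy(artifacts)                   # returns the live environment URL
        plan = refine(plan, report.logs, report.diff)  # diff-based retry

    raise RuntimeError(f"could not converge on a working build for {repo_url}")
```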
Memory & Execution Traces
Unlike simpler systems, we separate planning memory from execution memory:
- Planning Memory (Orchestrator): Tracks reasoning paths, subgoals, dependencies
- Execution Memory (Executor): Stores validated artifacts, performance metrics, diffs, logs
Only Executor memory is persisted across runs; this lets us optimize for reuse and convergence without bloating the planning context.
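A minimal sketch of that split, assuming plain dataclasses as the memory records (field names are illustrative):

```python
from dataclasses import dataclass, field


@dataclass
class PlanningMemory:
    """Held by the Orchestrator for the current run only; never persisted."""
    subgoals: list[str] = field(default_factory=list)
    dependencies: dict[str, list[str]] = field(default_factory=dict)
    reasoning_trace: list[str] = field(default_factory=list)


@dataclass
class ExecutionMemory:
    """Held by the Executor and persisted across runs for reuse."""
    repo_fingerprint: str = ""  # lets similar projects find prior solutions
    validated_artifacts: dict[str, str] = field(default_factory=dict)  # Dockerfiles, Compose
    diffs: list[str] = field(default_factory=list)
    build_logs: list[str] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)  # e.g. build time, image size
```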
Implementation Details
- Models:
  - Orchestrator: GPT-4.1 (high-context)
  - Sub-agents: 3B–7B domain-tuned models
- Runtime:
  - Each agent runs in an ephemeral Docker container with CPU/RAM/network caps
- Observability:
  - Full token-level tracing of prompts, responses, API calls, and build logs
  - Used for debugging, auditing, and improving agent behavior over time
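For the sandboxing, a minimal sketch using the Docker SDK for Python looks like the following; the image name, command, and limits are placeholders rather than our production values:

```python
import docker

client = docker.from_env()

# Run one agent step in a throwaway container with CPU, RAM, and network caps.
output = client.containers.run(
    image="macs-executor:latest",      # placeholder image name
    command=["python", "run_step.py"],
    nano_cpus=2_000_000_000,           # cap at ~2 CPUs
    mem_limit="2g",                    # cap RAM at 2 GiB
    network_mode="none",               # no network unless the step explicitly needs it
    remove=True,                       # container is deleted as soon as it exits
    stdout=True,
    stderr=True,
)
print(output.decode())
```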
Why Multi-Agent?
We could have built MACS as a single LLM chain, but this quickly broke down in practice. Here’s why we went multi-agent:
- Parallelism: the Analyzer and Researcher run concurrently to speed up discovery (sketched after this list)
- Modular reasoning: Each agent focuses on a narrow domain of expertise
- Error isolation: Build failures don’t halt the planner—they trigger retries
- Reflexivity: Agents can revise their plans based on test results and diffs
- Reusability: Learned solutions are reused across similar projects
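As a minimal illustration of the first two points, discovery can be fanned out with asyncio; the agent entry points below are stand-ins, not MACS’s actual code:

```python
import asyncio


async def analyze(context: dict) -> dict:
    # Stand-in for the Analyzer's static + semantic code analysis
    await asyncio.sleep(0.1)
    return {"language": "python", "framework": "flask"}


async def research(context: dict) -> dict:
    # Stand-in for the Researcher's external heuristics lookup
    await asyncio.sleep(0.1)
    return {"pattern": "multi-stage Dockerfile for Flask"}


async def discover(context: dict) -> list:
    # Run both agents concurrently; a failure in one is returned as an
    # exception object instead of halting the other (error isolation)
    return await asyncio.gather(
        analyze(context), research(context), return_exceptions=True
    )


findings, heuristics = asyncio.run(discover({"repo": "example"}))
print(findings, heuristics)
```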
What We’ve Learned
- Multi-agent debugging is hard: you need good observability, logs, and introspection tools.
- Robustness beats optimality: our system favors “works for 95% of repos” over exotic edge-case perfection.
- Emergent behavior happens: some of the most efficient retry paths were not explicitly coded.
- Boundaries matter: defining clean interfaces (e.g., JSON messages) between agents pays off massively.
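As an example of what a clean interface looks like in practice, a task message between the Delegator and the Analyzer can be as small as the following; the schema is illustrative, not our exact wire format:

```python
import json

# Illustrative task message from the Delegator to the Analyzer
message = {
    "type": "task",
    "id": "analyze-42",
    "agent": "analyzer",
    "goal": "detect languages and frameworks",
    "input": {"repo_path": "/workspace/repo"},
    "reply_to": "delegator",
}

# Agents only ever exchange serialized JSON, never shared objects or state
print(json.dumps(message, indent=2))
```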
What’s Next
We’re expanding MACS with:
- Better multi-language support (Polyglot repo inference)
- Orchestrator collaboration (multi-planner mode)
- Plugin SDKs for self-hosted agents and agent fine-tuning
Our north star: a fully autonomous DevOps layer, where developers focus only on code and the system handles the rest.

Want to try it?
Just paste your repo, and Hopx by Bunnyshell instantly turns it into production-ready containers.