Building ReachyArchi: A Voice-Driven Robotic AWS Solutions Architect
How we combined a Reachy humanoid robot with Amazon Bedrock Nova Sonic to create an AI-powered Solutions Architect for AWS Summits
The Problem
At AWS Summits, 10,000+ attendees compete for time with only 12 “Ask an Architect” booths. Wait times of 30+ minutes are common, and many attendees leave without getting architecture guidance.
What if we could scale the Solutions Architect experience using AI and robotics?
The Solution
ReachyArchi is an AI-powered robotic Solutions Architect that combines:
- A Reachy humanoid robot from Pollen Robotics
- Amazon Bedrock’s Nova Sonic model for real-time voice conversations
- The Strands Agents SDK BidiAgent framework for bidirectional streaming
- Real-time architecture diagram generation
The result: instant, personalized AWS architecture consultations with an engaging robotic presence.
Architecture Overview
Why Single BidiAgent?
After evaluating multiple agentic patterns, we chose Single BidiAgent:
| Pattern | Verdict | Rationale |
|---|---|---|
| Single BidiAgent | Chosen | Best voice performance, no handoff latency |
| Graph | Rejected | Overkill for mostly linear conversation flow |
| Swarm | Rejected | No parallel independent agents needed |
| Hierarchy | Rejected | Single agent handles all phases efficiently |
The bidirectional streaming model is essential for voice interactions - it allows continuous audio input/output while the agent reasons and calls tools concurrently.
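To make that concurrency concrete, here is a toy asyncio sketch. It is not the Strands implementation: three coroutines stand in for the microphone uplink, the model, and the speaker downlink, and they run concurrently over queues. Every name here (`uplink`, `fake_model`, `downlink`, `run_session`) is hypothetical.

```python
import asyncio
from typing import List


async def uplink(mic_chunks: List[str], to_model: asyncio.Queue) -> None:
    """Stream microphone chunks to the model without waiting for replies."""
    for chunk in mic_chunks:
        await to_model.put(chunk)
        await asyncio.sleep(0)  # yield so the other coroutines can run
    await to_model.put(None)  # sentinel: end of input


async def fake_model(to_model: asyncio.Queue, from_model: asyncio.Queue) -> None:
    """Stand-in for the model: answers each chunk as soon as it arrives."""
    while (chunk := await to_model.get()) is not None:
        await from_model.put(f"reply-to-{chunk}")
    await from_model.put(None)  # sentinel: end of output


async def downlink(from_model: asyncio.Queue, played: List[str]) -> None:
    """Play model output as it streams back."""
    while (chunk := await from_model.get()) is not None:
        played.append(chunk)


async def run_session(mic_chunks: List[str]) -> List[str]:
    to_model: asyncio.Queue = asyncio.Queue()
    from_model: asyncio.Queue = asyncio.Queue()
    played: List[str] = []
    # Input, reasoning, and output all make progress at the same time.
    await asyncio.gather(
        uplink(mic_chunks, to_model),
        fake_model(to_model, from_model),
        downlink(from_model, played),
    )
    return played
```

Because no stage waits for another to finish, output can start while input is still arriving, which is the property a voice agent needs.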
The Integration Challenge
Tools run on Amazon Bedrock AgentCore Runtime, but the Reachy SDK runs locally on the robot. The solution: WebSocket Command Events.
┌─────────────────────┐ WebSocket ┌─────────────────────┐
│ AgentCore (AWS) │◄──────────────────►│ Reachy Mini │
│ - BidiAgent │ robot_command │ - SDK Control │
│ - Robot Tools │ {action, params} │ - Audio I/O │
│ - Arch Tools │ │ - Motor Execution │
└─────────────────────┘ └─────────────────────┘
Tools send JSON command events; the client translates them to SDK calls.
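The client side of that translation might look like the sketch below. It assumes the `{type, action, params}` event shape shown in the diagram; the handler registry and `play_animation` are hypothetical placeholders, and a real client would call the Reachy SDK inside the handler.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], str]


def play_animation(params: Dict[str, Any]) -> str:
    # Placeholder for a local SDK call; this sketch only reports what would run.
    return f"animation:{params['name']}"


# Hypothetical registry: maps an "action" name to a local handler.
HANDLERS: Dict[str, Handler] = {"animation": play_animation}


def handle_message(raw: str) -> str:
    """Translate one JSON command event into a local handler call."""
    event = json.loads(raw)
    if event.get("type") != "robot_command":
        return "ignored"  # audio and other event types are handled elsewhere
    action = event.get("action")
    handler = HANDLERS.get(action)
    if handler is None:
        return f"unknown action: {action}"
    return handler(event.get("params", {}))
```

Keeping the mapping in a registry means a new robot capability only needs a new handler on the client and a new tool in the cloud, with no change to the transport.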
Implementation Deep Dive
Robot Tools: Cloud-to-Hardware Bridge
Each tool is an async function that sends commands via WebSocket - never importing the Reachy SDK directly:
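from strands import tool

# _send is the project's WebSocket helper (defined elsewhere).
@tool
async def nod_yes() -> str:
    """Nod to show agreement or understanding."""
    # Emit a JSON command event; the local client maps it to an SDK call.
    await _send({
        "type": "robot_command",
        "action": "animation",
        "params": {"name": "nod_yes"}
    })
    return "Nodded yes"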
This pattern cleanly separates cloud reasoning from local hardware control.
System Prompt Engineering
The system prompt ensures ReachyArchi is expressive, not just a voice assistant:
CRITICAL: TOOL CALLING
ALWAYS call robot tools - NEVER just say their names. Every response needs movement!
- WRONG: Saying "wave_hello" or "I'm nodding"
- CORRECT: Actually invoke wave_hello() as function calls
MOVEMENT RULES - EVERY RESPONSE!
Call at least one robot tool per response to feel alive.
Barge-In Handling
Voice UX requires handling interruptions gracefully:
model = BidiNovaSonicModel(
    provider_config={
        "turn_detection": {
            "endpointingSensitivity": "HIGH"  # Fast barge-in for booth demos
        }
    }
)
When a user speaks mid-response, the client clears the audio buffer immediately and processes new input.
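A minimal sketch of that client-side behavior, assuming the client receives events as dictionaries. The event type names `"audio"` and `"interruption"` are assumptions for illustration, not the actual event names used by the SDK.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional


class PlaybackBuffer:
    """Queue of audio chunks waiting to be played on the robot's speaker."""

    def __init__(self) -> None:
        self._chunks: Deque[bytes] = deque()

    def enqueue(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        # Returns None when there is nothing left to play.
        return self._chunks.popleft() if self._chunks else None

    def clear(self) -> int:
        """Drop all pending audio; returns how many chunks were discarded."""
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped


def on_event(buffer: PlaybackBuffer, event: Dict[str, Any]) -> None:
    if event.get("type") == "interruption":  # assumed event name
        buffer.clear()  # stop speaking immediately
    elif event.get("type") == "audio":  # assumed event name
        buffer.enqueue(event["data"])
```

Clearing the queue rather than pausing it matters: any audio left in the buffer belongs to a response the user has already talked over.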
6-Phase Conversation Flow
ReachyArchi follows a state machine for structured interactions:
IDLE → GREETING → INCEPTION → DESIGN → ITERATION → DELIVERY → FAREWELL
↑ │
└───────────┘ (needs_more_info)
- GREETING: Wave hello, introduce self in French
- INCEPTION: Ask 1-2 targeted questions (tilt head with look_curious())
- DESIGN: Generate PNG + JSON diagrams in parallel
- ITERATION: Refine based on feedback
- DELIVERY: Generate QR code for companion app
- FAREWELL: Wave goodbye, reset session
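The flow above can be sketched as an explicit transition table. Two edges are assumptions, since the diagram does not pin them down: that needs_more_info sends DESIGN back to INCEPTION, and that FAREWELL returns to IDLE after the session reset. `Phase`, `ALLOWED`, and `advance` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Set


class Phase(Enum):
    IDLE = "IDLE"
    GREETING = "GREETING"
    INCEPTION = "INCEPTION"
    DESIGN = "DESIGN"
    ITERATION = "ITERATION"
    DELIVERY = "DELIVERY"
    FAREWELL = "FAREWELL"


ALLOWED: Dict[Phase, Set[Phase]] = {
    Phase.IDLE: {Phase.GREETING},
    Phase.GREETING: {Phase.INCEPTION},
    Phase.INCEPTION: {Phase.DESIGN},
    # Assumption: needs_more_info loops DESIGN back to INCEPTION.
    Phase.DESIGN: {Phase.ITERATION, Phase.INCEPTION},
    Phase.ITERATION: {Phase.DELIVERY},
    Phase.DELIVERY: {Phase.FAREWELL},
    # Assumption: resetting the session returns to IDLE.
    Phase.FAREWELL: {Phase.IDLE},
}


def advance(current: Phase, target: Phase) -> Phase:
    """Move to the target phase, rejecting transitions the flow does not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Making illegal transitions raise keeps the agent from, for example, generating a QR code before any diagram exists.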
Demo in Action
Watch ReachyArchi in action: Demo Video on YouTube
A 90-second interaction showcases the full experience:
[User] "Bonjour Reachy!"
[Reachy] *waves and responds in French*
[User] "I want to build a mobile app with a REST API and a database"
[Reachy] "What type of workload? Serverless or containers?"
[User] "Serverless, high traffic"
[Reachy] "I recommend [API Gateway](https://aws.amazon.com/api-gateway/), [Lambda](https://aws.amazon.com/lambda/), and [DynamoDB](https://aws.amazon.com/dynamodb/). Generate the diagram?"
[User] "Oui!"
[Reachy] *generates architecture diagram - React frontend updates live*
[User] "Reachy, tu connais Werner Vogels?"
[Reachy] *dances* "Everything fails, all the time!"
The companion app updates in real-time via AWS AppSync as diagrams are generated.
Key Takeaways
- WebSocket command events: Cleanly separate cloud AI from local hardware. Tools send JSON, clients execute - no SDK imports in cloud code.
- Explicit tool invocation prompts: LLMs may “describe” tool calls instead of executing them. Be explicit: “ALWAYS call tools, NEVER just say their names.”
- HIGH barge-in sensitivity: Essential for natural booth conversations. Users will interrupt - handle it gracefully.
Try It Yourself
The project is open source! Check out the code and try it yourself:
- GitHub: github.com/agiusalexandre/reachyarchi
- Demo Video: Watch on YouTube
Tech stack:
- Strands Agents SDK for BidiAgent framework
- Amazon Bedrock with Nova Sonic model
- Reachy SDK for robot control
- React Flow for diagram visualization
Check out the Strands Agents documentation to build your own voice-driven agent.
Do It Yourself
Key takeaways
- WebSocket commands, not SDK imports: Cloud tools should emit JSON command events, not directly control hardware. This separation allows local clients to translate commands to SDK calls while keeping cloud code hardware-agnostic.
- Explicit tool prompts prevent LLM “descriptions”: Models will often say “I’m waving” instead of calling wave_hello(). Add clear instructions: “ALWAYS call tools, NEVER just describe them.”
- Barge-in sensitivity is critical for voice UX: Set endpointingSensitivity to HIGH for natural conversations. Users will interrupt, and handling it gracefully makes or breaks the experience.
Try it now
- Build a BidiAgent voice app: Start with the Strands Agents quickstart to create your first bidirectional streaming agent with Amazon Bedrock Nova Sonic.
- Clone and run ReachyArchi: Fork the ReachyArchi GitHub repo and follow the setup guide to run it locally with a simulated robot (no hardware required).
- Explore Bedrock voice models: Test Nova Sonic’s voice capabilities using the Bedrock Converse API examples, and experiment with turn detection settings and tool calling patterns.
Have questions? Connect with me on LinkedIn or check out more posts on agiusalexandre.com.
