Agriculture is facing an extreme labor shortage: the number of people interested in and capable of operating heavy machinery declines every year. At the same time, demand for food keeps rising as the world’s population grows. Clearly, this industry would benefit from autonomy. So why can you ride in a self-driving car in multiple US cities today, while the food you eat was probably harvested by manually operated machines?
At Bonsai Robotics, we’re building the world’s most advanced AI for outdoor autonomy, starting with agriculture.
Why On-Road Autonomy Doesn’t Transfer Off-Road
Consider something that sounds simple, like harvesting almonds: at least five different types of heavy machinery are used (and none of them are tractors!). These machines kick up dense dust and constantly make contact with solid objects like dangling branches and high brush—conditions where traditional autonomy stacks stop short out of an abundance of caution.
Worse, traditional autonomy stacks are generally designed to control one class of vehicle extremely well, which has limited automation in agriculture to a narrow set of high-use form factors like tractors.
To add more complexity, off-road autonomy isn’t as simple as driving from point A to point B. To harvest almonds, growers use tree shakers with huge hydraulically driven gripper arms to rigidly grasp each tree and shake the nuts free. The shaker arm needs to grasp the tree precisely or risk stripping off the bark, lowering the number of nuts the tree will produce next season. And the arm needs to be safely guided around expensive irrigation infrastructure but confidently pushed through tall brush and low-hanging branches that block its path. No traditional autonomy stack can complete this task end-to-end.
What Off-Road Autonomy Actually Requires
We believe that a great off-road autonomy stack must include three components that are missing from the models used in traditional autonomy stacks:
- First-class support for multiple form factors
- Rich semantic, spatial, and temporal understanding under extreme conditions
- Whole-vehicle control of both driving and manipulation
How We’re Solving This
To solve all of these problems, Bonsai Intelligence is designed from the ground up to generalize to multiple vehicle form factors, handle challenging conditions like heavy dust with ease, and complete manipulation tasks while driving a vehicle. By collecting a large dataset of over 30 million samples and training a state-of-the-art spatiotemporal transformer model, Bonsai Intelligence will be able to reliably aggregate information from multiple camera views over time. This enables us to remember whether the space behind a temporary dust cloud that suddenly blocks our view is safe to drive through.
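As a rough illustration of the idea (not our production architecture), the sketch below fuses per-camera feature tokens from several timesteps with a standard transformer encoder, so a view obscured by dust now can attend to a clear view from a moment earlier. All module names, dimensions, and layer counts here are hypothetical.

```python
import torch
import torch.nn as nn

class SpatioTemporalFuser(nn.Module):
    """Toy sketch: fuse per-camera feature tokens across views and time."""

    def __init__(self, feat_dim=256, n_heads=8, n_layers=4, n_cams=6, n_steps=4):
        super().__init__()
        # Learned embeddings tell the transformer which camera and which
        # timestep each token came from.
        self.cam_embed = nn.Embedding(n_cams, feat_dim)
        self.time_embed = nn.Embedding(n_steps, feat_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, feats):
        # feats: (batch, time, cams, tokens, feat_dim) from a per-image backbone
        b, t, c, n, d = feats.shape
        cam_ids = torch.arange(c).view(1, 1, c, 1).expand(b, t, c, n)
        time_ids = torch.arange(t).view(1, t, 1, 1).expand(b, t, c, n)
        x = feats + self.cam_embed(cam_ids) + self.time_embed(time_ids)
        # Flatten (time, cams, tokens) into one sequence so attention can
        # relate a dust-obscured view now to a clear view a moment ago.
        x = x.reshape(b, t * c * n, d)
        return self.encoder(x)

fused = SpatioTemporalFuser()(torch.randn(1, 4, 6, 16, 256))
print(fused.shape)  # torch.Size([1, 384, 256])
```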
Unlike on-road autonomy stacks that produce simple occupied-versus-free voxel grids, our next-generation system predicts multi-class semantic occupancy, which lets us reason about which classes of solid objects we can make contact with and which we cannot. By fusing information across space and time, we can reason about whether dangling branches in front of the vehicle have a solid tree trunk behind them, or whether they are safe to push through.
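To make the contact-versus-no-contact distinction concrete, here is a minimal, hypothetical sketch of how a multi-class semantic occupancy grid might be queried to decide whether touching a cell is acceptable; the class taxonomy and threshold are illustrative, not our actual ones.

```python
import numpy as np

# Hypothetical class taxonomy for each voxel in the occupancy grid.
CLASSES = ["free", "brush", "branch", "trunk", "irrigation", "vehicle"]
CONTACT_FORBIDDEN = {"trunk", "irrigation", "vehicle"}  # never touch these

def contact_allowed(class_probs, threshold=0.2):
    """class_probs: (X, Y, Z, n_classes) softmax output of the occupancy head.

    A cell is treated as no-go if any forbidden class exceeds the threshold;
    otherwise the vehicle may make contact (e.g. push through dangling branches).
    """
    forbidden_idx = [CLASSES.index(c) for c in CONTACT_FORBIDDEN]
    forbidden_mass = class_probs[..., forbidden_idx].max(axis=-1)
    return forbidden_mass < threshold  # (X, Y, Z) boolean mask

# Toy query: random grid, then check a cell directly ahead of the vehicle.
probs = np.random.dirichlet(np.ones(len(CLASSES)), size=(40, 40, 8))
ok = contact_allowed(probs)
print("cell ahead safe to contact:", bool(ok[20, 0, 2]))
```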
While on-road autonomy stacks can assume that any grade on a paved road is traversable, whether a vehicle can climb a hill off-road depends on the vehicle itself. Our next-generation system also predicts elevation maps, enabling it to understand that while a steep grade may look traversable semantically, it may be too steep for a particular vehicle to climb. And we can do all of this from simple camera images, enabling automation of every machine used in agriculture, including tree shakers (which would literally shake even the most ruggedized lidar sensor apart).
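The grade check itself is straightforward once an elevation map is available. The sketch below (with purely illustrative climb limits and vehicle names) estimates local slope from a predicted elevation grid and compares it against a per-vehicle limit, so the same hill can be traversable for one machine and off-limits for another.

```python
import numpy as np

def max_grade_deg(elevation, cell_size_m):
    """elevation: (H, W) predicted height map in meters; returns per-cell slope in degrees."""
    dzdy, dzdx = np.gradient(elevation, cell_size_m)
    slope = np.sqrt(dzdx**2 + dzdy**2)          # rise over run
    return np.degrees(np.arctan(slope))

# Illustrative per-vehicle climb limits (degrees); real limits depend on the machine.
VEHICLE_MAX_GRADE = {"tree_shaker": 12.0, "amiga": 20.0}

def traversable(elevation, cell_size_m, vehicle):
    return max_grade_deg(elevation, cell_size_m) <= VEHICLE_MAX_GRADE[vehicle]

# A hill that looks like drivable ground semantically may still exceed one
# vehicle's limit while a lighter platform can climb it: ~1.8 m rise over ~5.8 m.
hill = np.outer(np.linspace(0.0, 1.8, 30), np.ones(30))
mask_shaker = traversable(hill, cell_size_m=0.2, vehicle="tree_shaker")
mask_amiga = traversable(hill, cell_size_m=0.2, vehicle="amiga")
print(mask_shaker.mean(), mask_amiga.mean())  # ~0.0 vs ~1.0 of cells traversable
```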
An orchard in Australia uses OMC Shockwave Xs enabled with Bonsai Intelligence to autonomously shake trees and harvest almonds
From Perception to Action: Vision-Language-Action Control
Rather than stopping at perception outputs, we have developed a vision-language-action (VLA) model that predicts actuator controls using flow matching with real-time action chunking. By leveraging the rich semantic, temporal, and spatial features from the perception encoder as input, we will be able to control the world’s largest and most powerful robot gripper arms with confidence and ease. We believe that by leveraging our large-scale dataset across multiple vehicle form factors and deployments, we will be able to train a model that generalizes to control any type of heavy equipment, including excavators, loaders, and even cranes. Recent advances have shown not just that this kind of generalization is possible, but that cross-embodiment training may actually be required to truly solve robotics.
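As a rough, hypothetical sketch of the inference-time idea (not our model): flow matching generates a chunk of future actions by integrating a learned velocity field from Gaussian noise toward the action distribution, conditioned on perception features. The network below is a placeholder and the Euler step count is illustrative.

```python
import torch
import torch.nn as nn

CHUNK_LEN, ACTION_DIM, OBS_DIM = 16, 8, 256   # illustrative sizes

class VelocityField(nn.Module):
    """Placeholder network: predicts d(action)/dt given noisy actions, time, and obs features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CHUNK_LEN * ACTION_DIM + 1 + OBS_DIM, 512),
            nn.ReLU(),
            nn.Linear(512, CHUNK_LEN * ACTION_DIM),
        )

    def forward(self, actions, t, obs):
        x = torch.cat([actions.flatten(1), t, obs], dim=-1)
        return self.net(x).view(-1, CHUNK_LEN, ACTION_DIM)

@torch.no_grad()
def sample_action_chunk(model, obs, n_steps=10):
    """Euler-integrate the learned velocity field from Gaussian noise to an action chunk."""
    a = torch.randn(obs.shape[0], CHUNK_LEN, ACTION_DIM)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((obs.shape[0], 1), i * dt)
        a = a + dt * model(a, t, obs)   # follow the flow toward the action distribution
    return a   # (batch, CHUNK_LEN, ACTION_DIM) actuator commands to execute

chunk = sample_action_chunk(VelocityField(), obs=torch.randn(1, OBS_DIM))
print(chunk.shape)  # torch.Size([1, 16, 8])
```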
To our knowledge, no other company in the world has:
- a dataset spanning this wide a range of embodiments and environments
- a deployed fleet continuously expanding that dataset
- 45+ deployed vehicles, all feeding data back into our training pipeline to make our models even better
Through our partnership with Anyscale, we train using the same infrastructure as frontier AI labs and groundbreaking robot manipulation companies, but apply it to multi-ton hydraulic machines operating in the real world.
The Bitter Lesson of Outdoor Autonomy
To those in the on-road autonomous vehicle industry, our system may seem impressive—but still only a moderate extension of next-generation on-road AV stacks. With such a huge opportunity in off-road autonomy, you might think that the obvious play for existing on-road AV companies would be to extend into off-road. So, why aren’t the existing AV companies doing this?
The “bitter lesson” in AI refers to the observation that machine learning models trained on huge amounts of data, with high parameter counts and large amounts of compute, tend to outperform expert systems. While the compute and model size applied to outdoor autonomy have increased in recent years, the bitter lesson of outdoor autonomy is that publicly available data has not. Collecting over 10 million samples of on-road data is not easy, but it isn’t prohibitively difficult either—public roads are freely accessible to everyone, at any time and in every season of the year. The same is absolutely not true for farms, mines, construction sites, or any other outdoor environment where autonomy is in demand.
Heavy machinery is operated on private property, and most equipment owners see an AI company mounting data-collection sensors on their machinery as a risk to their productivity, not an asset. This creates a chicken-and-egg problem—any grower would gladly let you mount sensors on their equipment if you could guarantee that you would automate their vehicle and increase their productivity. But your autonomy stack will never work without a large dataset to train the AI models, and you can’t collect a dataset without sensors mounted on growers’ equipment! Furthermore, many growers have been burned in the past by robotics companies whose traditional approach to autonomy failed to deliver the promised productivity benefits. As such, they are often hesitant to collaborate at all.
By combining people with deep expertise in AI and autonomy with people who have literally run farms and operated heavy equipment for a living, we’ve been able to break the cycle and ship the world’s most advanced AI model for outdoor autonomy.
The Sky’s Not the Limit
The human form factor is awesome (and makes imitation learning approaches much easier). But we believe that, ultimately, robotics will follow the same trajectory as heavy machinery: a diverse set of many specialized form factors.
Humanoids will be an awesome luxury product for those lucky enough to afford one and bring it home—but they will be limited to places meant for people, like homes and hospitals. Everything around you was made from materials that were either mined or grown using heavy machinery, far from homes and hospitals—and the future will take this to the next level.
Robots with form factors we cannot imagine will construct orbital structures, bases on the Moon and Mars, and discover new worlds. We hope that they will all be powered by Bonsai Intelligence.
Hardware is hard, so we are taking the very first step toward this exciting future of new form factors with something much smaller. The Amiga is shipping now: an autonomous robot for agriculture unconstrained by the need for a cab or operator comforts like air conditioning. It expands our reach to every crop on Earth, and it also opens the door to broader off-road applications.
Amiga Flex with the WEED-IT precision sprayer makes its debut at World Ag Expo 2026. It detects weeds using chlorophyll reflection, enabling an 80–90% reduction in herbicide use and lowering input costs. Integrated with the electric Amiga Flex and Bonsai Intelligence, the combined solution supports healthier people, crops, and soils.
Join Us
If you’re interested in building the world’s most advanced AI for outdoor autonomy, we’re hiring across AI, autonomy, and hardware. Be a part of creating what comes next.
John Macdonald is the Head of Perception at Bonsai Robotics. Having been convinced of the potential of robotics from a young age, he got his start in competitive robotics in grade school and hasn’t looked back. Previously, he worked on deep learning based obstacle avoidance and 3D reconstruction at Skydio. Before that, he worked on autonomous organic crop navigation & monitoring at Carnegie Mellon University. John holds an M.S. in Robotic Systems Development from Carnegie Mellon University and a B.S. in Computer Science from Cornell University.
