Had A Bad Day

The Day PLTcloud Went Down—And What We Learned

What happens when the automated test stations on production lines from Peoria to Penang suddenly lose their connection to the cloud? On May 27, 2025, we found out.

At Blue Clover, we take system uptime seriously. Our customers rely on us to keep their production lines running smoothly, and we treat that responsibility with the urgency it deserves. But that day, we had a moment of failure—a red bar on our uptime graph that marked the only major outage in our history.

We use Uptime Robot to monitor the status of our cloud backend, known as PLTcloud. This system is the backbone of our automated testing platform. It’s where customers upload firmware and test plans for deployment, and where the resulting test reports from the production line are stored. When the connection to PLTcloud is disrupted, every Production Line Tool (PLT) around the world shows it—LEDs flash red, error messages appear, and operators on the line notice immediately.

On May 27, all of that happened.

The root cause was a database overflow, and we’ve since updated our AWS configuration to ensure this specific failure won’t happen again. But prevention isn’t enough. We’re also building in resilience.

We’re adding a key safeguard called Store and Forward. If PLTcloud goes offline, PLTs keep running tests—storing results like mountain snowpack holds water, ready to flow when the cloud returns. Encrypted test data is held locally on each PLT’s 1TB solid-state drive and syncs automatically once reconnected—reducing data loss and preventing costly production downtime.
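To make the idea concrete, here is a minimal sketch of the store-and-forward pattern in Python. It is not the PLT implementation: the `StoreAndForwardQueue` class, the SQLite-backed buffer, and the `upload` callable are all assumptions chosen for illustration, and the encryption step described above is left out to keep the example short.

```python
import json
import sqlite3
from typing import Callable


class StoreAndForwardQueue:
    """Durable local buffer for test reports (hypothetical, for illustration)."""

    def __init__(self, path: str = ":memory:") -> None:
        # A real station would point this at a file on its local drive.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, report TEXT NOT NULL)"
        )
        self.db.commit()

    def store(self, report: dict) -> None:
        """Persist a report locally, whether or not the cloud is reachable."""
        self.db.execute(
            "INSERT INTO pending (report) VALUES (?)", (json.dumps(report),)
        )
        self.db.commit()

    def pending_count(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]

    def forward(self, upload: Callable[[dict], None]) -> int:
        """Send buffered reports oldest-first; stop at the first failure.

        `upload` is an assumed callable that raises ConnectionError while
        the backend is unreachable. A report is deleted only after its
        upload succeeds, so nothing is lost if the link drops mid-sync.
        """
        sent = 0
        rows = self.db.execute(
            "SELECT id, report FROM pending ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            try:
                upload(json.loads(payload))
            except ConnectionError:
                break  # still offline; keep the remaining reports queued
            self.db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent
```

In this sketch the station calls `store()` after every test and calls `forward()` periodically. During an outage `forward()` returns 0 and the backlog simply grows on local storage; once the connection returns, the backlog drains in the order the tests were run.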

The reality is, perfect uptime is not possible without infinite cost. Our mission is to make automated testing affordable and reliable for the real-world companies building the hardware we all depend on. We work hard to strike the right balance between high availability and cost-effectiveness. Scaling improves both—and with that in mind, our next milestone is a million tests a month.

We believe we are blazing a trail in reducing the cost per test of production-level test systems, and we have the data to prove it—because every test station is connected to PLTcloud.

For any team deploying a cloud-connected service, "what ifs" are not optional—they’re part of responsible planning. While the May 27 outage was frustrating, it would have been much worse if our customers had nowhere to check for updates. Fortunately, our status page was up and running, providing real-time visibility and transparency throughout the incident.

We learned from this event, and we’re stronger for it.

