The Hands-On-the-Yoke Era of AI: From General Aviation to LLM Ops

I’ve spent most of my flying time in General Aviation (GA), where the connection between the pilot and the machine is direct. In a small plane, you feel the thermals through the yoke, you hear the engine’s rhythm change with the mixture, and you are constantly adjusting for the environment.

Hands on the yoke

It is a stark contrast to the world of heavy commercial jets. In those cockpits, it’s often said that the autopilot handles the control surfaces for 99% of the journey. In that environment, the pilot has evolved into a Systems Manager, monitoring the automation rather than hand-flying the aircraft.

In the tech world, we’ve spent the last decade trying to reach that “99% Autopilot” state with DevOps and SRE. We built CI/CD pipelines and infrastructure as code to make our systems “set-and-forget.” But as I’ve moved deeper into LLM Ops, I’ve realized that we’ve moved back into the cockpit of a GA aircraft.

The Return of Non-Determinism

In traditional software, we expect a specific input to always produce the same output. It’s like a mechanical linkage: you move the lever, and the flap moves exactly ten degrees.

Large Language Models (LLMs) aren’t like that. They are probabilistic engines. Managing them in production feels less like managing a server and more like flying a landing pattern in a gusty crosswind. The “atmosphere” (the prompt context, the model temperature, and the retrieval data) is always shifting.
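
To make “probabilistic engine” concrete, here is a small Go sketch of temperature-scaled sampling, the mechanism that lets the exact same input produce different outputs. The token names and logit values are toy data invented for illustration; a real model does this over tens of thousands of tokens at every step.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// sampleToken picks an index from a toy logit vector using
// temperature-scaled softmax sampling — the mechanism that makes
// LLM output non-deterministic whenever temperature > 0.
func sampleToken(logits []float64, temperature float64, rng *rand.Rand) int {
	// Scale logits by temperature: lower T sharpens the distribution
	// toward the top token, higher T flattens it toward uniform.
	weights := make([]float64, len(logits))
	var sum float64
	for i, l := range logits {
		weights[i] = math.Exp(l / temperature)
		sum += weights[i]
	}
	// Draw from the normalized distribution.
	r := rng.Float64() * sum
	for i, w := range weights {
		r -= w
		if r <= 0 {
			return i
		}
	}
	return len(logits) - 1 // float-rounding fallback
}

func main() {
	tokens := []string{"runway", "taxiway", "apron"} // toy vocabulary
	logits := []float64{2.0, 1.0, 0.5}               // toy scores; "runway" is most likely
	rng := rand.New(rand.NewSource(42))

	// Same input, same code, different outputs: a probabilistic engine,
	// not a mechanical linkage.
	for i := 0; i < 5; i++ {
		fmt.Println(tokens[sampleToken(logits, 0.8, rng)])
	}
}
```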

In GA, if you take your hands off the controls in turbulent air, the plane will eventually drift off course. You have to make constant, subtle micro-corrections to stay on the centerline.

This is exactly what we do in LLM Ops through Evaluations (Evals). Because LLMs are non-deterministic, we can’t rely on a simple “Pass/Fail” unit test. Instead, we have to monitor for the following (a minimal eval-harness sketch follows the list):

  1. Model Drift: Is the model becoming less accurate over time?
  2. Hallucinations: Has the model “lost situational awareness” and started generating false data?
  3. Latency: Is the “engine” struggling under the load of complex reasoning?
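
Here is a minimal sketch of such an eval harness in Go, under some loud assumptions: callModel is a hypothetical stand-in for your real inference call, the two EvalCase entries are invented, and the substring grader is deliberately crude (production evals usually use an LLM-as-judge or embedding similarity).

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// EvalCase pairs a prompt with a substring the answer must contain.
// A substring check keeps this sketch self-contained; real graders
// are richer.
type EvalCase struct {
	Prompt   string
	Expected string
}

// callModel is a hypothetical stand-in for your actual inference call
// (an HTTP request to a serving endpoint, an SDK call, etc.).
func callModel(prompt string) string {
	return "KSFO has four runways." // canned response for the sketch
}

func main() {
	cases := []EvalCase{
		{Prompt: "How many runways does KSFO have?", Expected: "four"},
		{Prompt: "What does VFR stand for?", Expected: "visual flight rules"},
	}

	passed := 0
	var totalLatency time.Duration
	for _, c := range cases {
		start := time.Now()
		answer := callModel(c.Prompt)
		totalLatency += time.Since(start)

		if strings.Contains(strings.ToLower(answer), c.Expected) {
			passed++
		}
	}

	// A falling pass rate over time is drift, a failing case is often
	// a hallucination, and a rising mean latency means the "engine"
	// is straining.
	fmt.Printf("pass rate: %d/%d, mean latency: %v\n",
		passed, len(cases), totalLatency/time.Duration(len(cases)))
}
```

Run something like this on a schedule and chart the results over time; the trend line, not any single run, is what tells you the aircraft is drifting off the centerline.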

Prompt Engineering as Trim Tabs

In a small plane, you use “trim” to relieve the pressure on the controls so the plane flies level without constant physical effort. You aren’t redesigning the wing; you are just adjusting the surfaces to match the current conditions.

I see Prompt Engineering as the trim tabs of AI. We aren’t necessarily fine-tuning the base model every day. Instead, we are making micro-adjustments to the system instructions to balance the output. If the model is leaning too far into “creative” territory when we need “factual,” we adjust the trim to bring it back to level flight.
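
As a sketch of what “adjusting the trim” can look like in practice, here is a toy Go example. The TrimSettings struct and trimTowardFactual function are illustrative names I’ve made up, not a real SDK; the point is that both knobs (system prompt and sampling temperature) live in configuration, outside the model weights.

```go
package main

import "fmt"

// TrimSettings holds the knobs we adjust between deployments without
// ever retraining the base model: the system prompt and the sampling
// temperature. Illustrative names, not a real SDK.
type TrimSettings struct {
	SystemPrompt string
	Temperature  float64
}

// trimTowardFactual nudges the settings when evals show the model
// drifting into "creative" territory: firmer instructions, lower
// temperature. Small corrections, not a wing redesign.
func trimTowardFactual(s TrimSettings) TrimSettings {
	s.SystemPrompt += " Answer only from the provided documents. If unsure, say so."
	if s.Temperature > 0.2 {
		s.Temperature = 0.2
	}
	return s
}

func main() {
	current := TrimSettings{
		SystemPrompt: "You are a helpful aviation assistant.",
		Temperature:  0.9,
	}
	adjusted := trimTowardFactual(current)
	fmt.Printf("before: %+v\nafter:  %+v\n", current, adjusted)
}
```

Because these settings live in config rather than in weights, you can re-trim per deployment, per route, or even per request, just as you re-trim the airplane for every phase of flight.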

The Pilot in Command (PIC)

Modern cloud computing—especially Serverless—often feels like it’s taking the “pilot” out of the loop. But as any GA pilot knows, no matter how much technology you have in the panel, the human in the loop is ultimately responsible for the outcome.

In the age of AI, we cannot abdicate our responsibility to the model. We must understand the “physics” of how these models work. Whether I’m debugging a Go-based inference pipeline or performing a pre-flight walkaround, the goal is the same: building a deep understanding of the systems we rely on so that when the “weather” gets rough, we know exactly how to take manual control.

Is your AI on autopilot, or are you still hand-flying the mission?