How to Build an AI App That Actually Works in Production
Most AI demos look great. You hook up an LLM, write a prompt, and it works — in your terminal, with your test data, on a good day.
Then you ship it. Users send weird inputs. The model hallucinates. Latency spikes. Costs balloon. The app that worked perfectly in a demo falls apart under real conditions.
We've built enough AI apps at Fovea to know where things go wrong. Here's what we've learned.
Start With the Problem, Not the Model
The most common mistake: picking a model first, then finding a use case. This is backwards.
Start with the problem. What does the user need? What decision are they trying to make? What data do they already have?
Sometimes the answer isn't even AI. A good SQL query or a simple rules engine might solve the problem faster, cheaper, and more reliably. We've talked clients out of AI projects when simpler solutions made more sense.
But when AI is the right tool, knowing the problem first tells you:
- What kind of model you need (LLM, classification, regression, etc.)
- How accurate it needs to be
- How fast it needs to respond
- How much you can spend per request
Pick Your Architecture
Most AI apps fall into one of these patterns:
Chat / Copilot
User talks to an AI assistant. Could be customer support, internal tools, or a domain-specific helper. The key decisions are:
- Context window management — how much history and context do you feed the model?
- Tool use — does the AI need to call APIs, query databases, or take actions?
- Guardrails — what should the AI refuse to do?
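Context window management often comes down to deciding which history survives a token budget. Here's a minimal sketch in Go; the `Message` type, `trimHistory` function, and the 4-characters-per-token heuristic are all illustrative assumptions (a real app would use the model's actual tokenizer):

```go
package main

import "fmt"

// Message is one turn of chat history.
type Message struct {
	Role, Content string
}

// estimateTokens is a rough heuristic: ~4 characters per token.
// Swap in the model's real tokenizer for production use.
func estimateTokens(s string) int {
	return len(s)/4 + 1
}

// trimHistory keeps the most recent messages that fit in budget tokens,
// walking backwards so the newest turns always survive.
func trimHistory(history []Message, budget int) []Message {
	used := 0
	cut := len(history)
	for i := len(history) - 1; i >= 0; i-- {
		t := estimateTokens(history[i].Content)
		if used+t > budget {
			break
		}
		used += t
		cut = i
	}
	return history[cut:]
}

func main() {
	history := []Message{
		{"user", "Tell me about the refund policy in detail please."},
		{"assistant", "Our refund policy allows returns within 30 days..."},
		{"user", "What about digital goods?"},
	}
	fmt.Println(len(trimHistory(history, 15))) // prints 1: only the newest message fits
}
```

Dropping the oldest turns is the simplest policy; summarizing them into a single system message is a common refinement when older context still matters.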
Prediction API
You send data in, get a prediction back. Sports predictions, fraud detection, lead scoring. The key decisions are:
- Batch vs real-time — do you need answers in milliseconds or is hourly fine?
- Model serving — where does the model run? How do you update it?
- Monitoring — how do you know when the model starts getting worse?
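The monitoring question has a concrete core: compare recent accuracy on labelled traffic against a baseline and alert when the gap exceeds a tolerance. A minimal sketch, with `accuracy`, `degraded`, and the thresholds all as illustrative assumptions:

```go
package main

import "fmt"

// accuracy returns the fraction of predictions matching labels.
func accuracy(preds, labels []int) float64 {
	if len(preds) == 0 {
		return 0
	}
	correct := 0
	for i, p := range preds {
		if p == labels[i] {
			correct++
		}
	}
	return float64(correct) / float64(len(preds))
}

// degraded flags a model whose recent accuracy has dropped more than
// tolerance below its baseline — a signal to retrain or roll back.
func degraded(baseline, recent, tolerance float64) bool {
	return baseline-recent > tolerance
}

func main() {
	recent := accuracy([]int{1, 0, 1, 1}, []int{1, 1, 1, 0}) // 0.5
	fmt.Println(degraded(0.80, recent, 0.05))                // prints true
}
```

The hard part in practice is getting labels for recent traffic at all; without them, proxy signals like prediction distribution shift are the usual fallback.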
AI-Enhanced Dashboard
A traditional app with AI features layered in — summaries, recommendations, anomaly detection. The key decisions are:
- Where to run inference — server-side, edge, or client?
- Caching — can you pre-compute AI outputs or does everything need to be live?
- Fallbacks — what happens when the AI is slow or wrong?
At Fovea, we've built all three. Our sports prediction platforms are prediction APIs. Our consulting work often involves copilots and dashboards.
The Stack That Works for Us
We're opinionated about our stack because we've tried the alternatives:
Go for the backend. It's fast, compiles to a single binary, handles concurrency well, and deploys easily to containers. For AI apps specifically, Go is great for the orchestration layer — the part that manages prompts, calls models, handles retries, and coordinates between services.
Kubernetes for orchestration. AI workloads are bursty. Sometimes you need 10x compute for a batch job, then nothing for hours. Kubernetes handles this well with autoscaling.
Azure for cloud. Azure OpenAI gives you the same models as OpenAI's API but with enterprise features — private endpoints, content filtering, data residency. For companies that care about where their data goes, this matters.
PostgreSQL for data. With pgvector, you get vector search in the same database as your application data. No need for a separate vector database in most cases.
The Parts Nobody Talks About
Evals
You need automated tests for your AI. Not unit tests — evals. These are test cases that check whether the model's output is good enough.
For a copilot, this might be: "Given this question, does the answer contain the right information?" For a prediction API: "Is the model's accuracy above X% on this test set?"
Without evals, you're shipping blind. Every prompt change, every model upgrade, every context window tweak could break something and you won't know until users complain.
Cost Management
LLM costs add up fast. A single GPT-4 call can run $0.03–$0.10 depending on prompt and output length. Multiply that by thousands of users and you're burning money.
Things that help:
- Use the smallest model that works. GPT-4 is great but GPT-3.5 or Claude Haiku might be fine for your use case.
- Cache aggressively. If two users ask the same question, don't call the model twice.
- Batch when possible. Most providers' batch APIs are around 50% cheaper in exchange for delayed results.
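The caching advice can be sketched as a lookup keyed on a hash of model plus prompt, so an identical question never pays for a second call. Everything here (`cache`, `complete`, the injected `callModel` function) is an illustrative shape, not a library API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cache maps a hash of (model, prompt) to a stored completion.
type cache struct {
	store map[string]string
	hits  int
	calls int
}

func key(model, prompt string) string {
	sum := sha256.Sum256([]byte(model + "\x00" + prompt))
	return hex.EncodeToString(sum[:])
}

// complete returns a cached answer when available, otherwise invokes
// callModel (standing in for the real API call) and remembers the result.
func (c *cache) complete(model, prompt string, callModel func(string) string) string {
	k := key(model, prompt)
	if out, ok := c.store[k]; ok {
		c.hits++
		return out
	}
	c.calls++
	out := callModel(prompt)
	c.store[k] = out
	return out
}

func main() {
	c := &cache{store: map[string]string{}}
	fake := func(p string) string { return "answer to: " + p }
	c.complete("gpt-4", "What is pgvector?", fake)
	c.complete("gpt-4", "What is pgvector?", fake) // served from cache, no model call
	fmt.Println(c.calls, c.hits)                   // prints: 1 1
}
```

In production you'd add expiry and share the cache across instances (Redis is a common choice), and be careful with caching anything personalized. Exact-match caching only helps for repeated queries; semantic caching via embeddings catches near-duplicates at the cost of occasional wrong hits.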
Error Handling
LLMs fail in ways traditional software doesn't. They don't throw exceptions — they return confident-sounding wrong answers. Your app needs to handle:
- Hallucinations — the model makes up facts. Cross-reference outputs against your data.
- Refusals — the model won't answer because it thinks the question is unsafe. Have a fallback.
- Latency spikes — model APIs can be slow. Set timeouts and show loading states.
- Rate limits — every API has them. Queue requests and handle backpressure.
Deployment
Ship it like any other app, with a few additions:
- Feature flags — roll out AI features gradually. If something breaks, kill it fast.
- Logging — log every model input and output. You'll need this for debugging, evals, and improving prompts.
- Monitoring — track latency, cost per request, error rates, and output quality metrics.
When to Build vs When to Get Help
Build it yourself if:
- You have engineers who've shipped AI features before
- The use case is straightforward (basic chatbot, simple classification)
- You have time to learn and iterate
Get help if:
- You need it in production fast
- The use case involves complex orchestration, multiple models, or critical business decisions
- You don't want to learn the hard way what breaks in production
We do AI consulting for companies in the second category. We've shipped enough AI apps to know the pitfalls, and we'd rather help you skip them than watch you discover them.
The Bottom Line
Building an AI app that demos well takes a weekend. Building one that works in production takes proper architecture, evals, cost management, and monitoring.
The technology isn't the hard part — the engineering around it is. Treat AI features like any other critical system: test them, monitor them, and plan for failure.
If you're building something and want to talk through your architecture, reach out. We're happy to help.