Hardware-Optimized Computer Vision
Better vision systems come from better system design. Athena combines the right cameras, optics, sensors, and edge hardware with optimised AI models, validating performance in your environment before deployment.
Trusted by teams building production vision systems
Why “Hardware-Optimized”
Two failure modes dominate real-world computer vision, and both sit outside the model.
The first is at the front of the chain: a brilliant model fed by the wrong camera, a poor lens, or bad lighting will never be accurate. You cannot recover detail the sensor never captured, and no amount of model tuning fixes glare, motion blur, or a part that’s only a few pixels wide in the frame.
The second is at the back of the chain: a model that’s accurate in the cloud but too slow, too power-hungry, or too expensive to run on the hardware you can actually deploy at the scene.
Most vendors only work on the middle — the model. We work on the whole chain, because that’s where accuracy, latency, and cost are really won or lost. That means we select, assess, or engineer:
- Cameras & optics — sensor type, resolution, frame rate, shutter, dynamic range, lens and field of view.
- Lighting — often the single highest-ROI lever in the entire system.
- Tracking & depth sensors — depth, LiDAR, radar, thermal, and RF tags for what a camera can’t see.
- Edge compute — from low-power accelerators to GPU modules to on-prem servers. The right chip, not the biggest one.
- Associated hardware — networking, power, enclosures, mounting, and time-sync that make it survive a real environment.
- The model & its optimisation — engineered to run fast on the specific hardware above.
What You Actually Get
Outcome | What it means |
|---|---|
Accuracy that holds in your environment | Because we engineer the camera, lens and lighting first — not just the model — and benchmark on your real footage, not demo clips. |
Real-time on real hardware | Optimised to react in milliseconds on the device you deploy, not only in a data centre. |
The right sensor for the job | Cameras where vision is enough; depth, radar, thermal or RF where it isn’t — fused into one system. |
Predictable cost | Right-sized sensing and compute. No over-spec, no per-inference cloud bill compounding at 20+ cameras. |
You own it | Your hardware, your models, your weights. No proprietary black boxes, no lock-in. |
The Sensing-to-Decision Chain
If you remember one thing from this page: fix the camera and the light before you touch the model. A part that’s only a few pixels wide can’t be classified reliably no matter how good the model is; the fix is a better lens or a closer camera, not a bigger network. A few hundred dollars of the right lighting often wins more accuracy than a month of model tuning.
- See it well. The camera, lens and lighting decide what detail is even in the image. Get this right and everything downstream gets easier.
- Fill the gaps. Cameras struggle with distance, darkness, dust and things hidden behind other things. Depth sensors, thermal, radar and radio tags cover what a camera can’t.
- Run it where it matters. The model runs on a device at the scene, not in a data centre — so it reacts in milliseconds and keeps working when the internet doesn’t.
- Make it fit the chip. The same model can run several times faster on the same device once it’s tuned for that hardware — or fail entirely if it isn’t.
- Connect it to the real world. The result has to reach a screen, an alert, a robot, or a business system — reliably, every time.

Where This Deploys
The model is similar across these; the sensing mix and compute differ by environment. A few representative profiles:
Vertical | Typical sensing mix | Compute | Key constraint |
|---|---|---|---|
Manufacturing & Robotics | Global-shutter cameras + controlled lighting; depth/LiDAR for bin-picking & AGV | Orin NX / AGX | Deterministic latency, PLC integration |
Retail & Commerce | Overhead RGB + depth for counting; on-device anonymisation | Orin Nano / NX | Low-power, always-on, privacy |
Sports & Broadcast | High-frame-rate cameras, multi-camera sync; optional UWB player tags | Orin AGX / GPU server |
Inside the Build: the Sensing & Hardware Layer
From here down, the page goes deep for the technical reader. Hardware-optimised computer vision is a co-design problem — the camera and sensor choices, the lighting, the compute, the model and the integration all interact. This is the part that decides your accuracy ceiling, and the part most vendors ignore. It does not stop at the GPU.

Cameras & Optics
- Shutter: Global shutter for fast motion (conveyors, vehicles, sport) to avoid the skew and blur rolling shutter introduces; rolling shutter is acceptable and cheaper for static or slow scenes.
- Resolution × frame rate × bandwidth: A three-way budget you can’t max simultaneously. What matters is pixels-on-target: pick the resolution and lens so the smallest thing you must detect occupies enough pixels to be detected at all.
- Frame rate: ~30fps for general detection, 60fps+ for fast tracking and sport, higher for fast inspection lines.
- Dynamic range & low light: Wide-dynamic-range/HDR sensors for variable lighting; larger pixels and monochrome sensors for low-light sensitivity.
- Spectrum: Colour, monochrome, near-infrared (with IR illumination) for low light, thermal/IR for heat and privacy-preserving people detection, polarisation for glare and transparent materials, and multispectral for materials and agriculture.
- Interface: MIPI CSI (embedded, straight into a Jetson), GMSL for long cable runs in robotics/automotive, GigE Vision (industrial, ~100m over Ethernet with PoE), USB3 Vision for short high-bandwidth runs, CoaXPress for very high bandwidth, and IP / RTSP / ONVIF for existing camera estates.
- Camera-agnostic vs purpose-spec: We work with your existing IP cameras where they’re adequate, and specify machine-vision cameras only where accuracy genuinely demands it.
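The pixels-on-target budget above can be checked with simple arithmetic before any camera is bought. The sketch below is illustrative only: the function names, the scene dimensions and the flat-scene assumption (the target sits in the plane where the field of view has the stated width) are ours, not part of any camera vendor's tooling.

```python
def pixels_on_target(sensor_px_wide: int, fov_width_m: float,
                     target_width_m: float) -> float:
    """Horizontal pixels covering the target at the working distance."""
    if fov_width_m <= 0:
        raise ValueError("fov_width_m must be positive")
    return sensor_px_wide * target_width_m / fov_width_m


def motion_blur_px(speed_m_s: float, exposure_s: float,
                   sensor_px_wide: int, fov_width_m: float) -> float:
    """Pixels the target smears across during a single exposure."""
    distance_moved_m = speed_m_s * exposure_s
    return pixels_on_target(sensor_px_wide, fov_width_m, distance_moved_m)


# Hypothetical scene: a 5 mm part seen by a wide overview camera,
# then by a higher-resolution camera with a tighter field of view.
wide = pixels_on_target(sensor_px_wide=1920, fov_width_m=2.0, target_width_m=0.005)
tight = pixels_on_target(sensor_px_wide=4096, fov_width_m=0.5, target_width_m=0.005)
print(f"wide view:  {wide:.1f} px on target")   # 4.8 px
print(f"tight view: {tight:.1f} px on target")  # 41.0 px

# Hypothetical conveyor at 1 m/s with a 1/500 s exposure on the wide camera.
blur = motion_blur_px(speed_m_s=1.0, exposure_s=1 / 500,
                      sensor_px_wide=1920, fov_width_m=2.0)
print(f"motion blur: {blur:.2f} px per exposure")  # 1.92 px
```

How many pixels count as "enough" depends on the task and the model, so the threshold is something to measure on real footage rather than assume.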
Lighting: The Highest-ROI Lever
Controlled lighting (diffuse, dome, coaxial, backlight, structured, or strobed), IR illuminators for low/no-light operation, and polarisers to kill glare routinely do more for accuracy than any model change. Controlling ambient light is frequently the difference between a system that works at 9am and one that fails at dusk. We treat lighting as a first-class design decision, not an afterthought.
Choosing the Right Sensing
What you need to see / do | Recommended sensing | Why |
|---|---|---|
Fast-moving objects | Global-shutter camera, 60fps+, correct lens | Avoids motion blur/skew that defeats detection & tracking |
Small or fine detail (defects, text, small parts) | Higher resolution + correct lens + controlled lighting | Detail the model can’t infer if it was never captured |
Variable / harsh lighting | Wide-dynamic-range sensor + controlled or IR lighting | Consistent exposure across the scene and across the day |
Low or no light | Low-light/mono + NIR with IR illumination, or thermal |
Tracking & Depth Sensors (Fusion)
Cameras give rich semantics but struggle with depth, occlusion, and poor visibility. Complementary sensors fill those gaps, fused with the vision pipeline on-device:
- Depth — stereo, structured light, time-of-flight: volume, bin-picking, navigation, fall detection.
- LiDAR — 2D safety scanners and 3D solid-state: precise distance, AGV navigation, volumetric measurement.
- mmWave radar: presence, speed and even vital signs through dust, fog and darkness; privacy-preserving.
- Thermal / IR: temperature anomalies, low/no-light detection, privacy-preserving people sensing.
- RF positioning — UWB, BLE AoA, RFID: track tagged assets and people to sub-30cm where vision can’t see them.
- Motion — IMU, GNSS, wheel odometry: ego-motion for moving platforms (robots, vehicles).
Fusion is late (decision-level) by default at the edge, which keeps sensors modular and avoids the need for tight time-sync. We use tightly coupled fusion only where timing is well-controlled and the accuracy gain justifies it. For the full multi-sensor tracking pipeline, see the Real-Time Tracking solution page.
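As a rough illustration of late fusion, the sketch below attaches RF tag identities to camera detections after each sensor has produced its own result. Everything in it is an assumption made for the example: the `Detection` and `TagFix` types, the shared floor-plane coordinates in metres, the greedy nearest-neighbour matching and the 0.5 m gate. A production tracker would normally add temporal filtering on top.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Detection:
    """Vision output, already projected onto floor coordinates (metres)."""
    label: str
    x: float
    y: float


@dataclass
class TagFix:
    """RF position fix (e.g. from a UWB tag) in the same floor coordinates."""
    tag_id: str
    x: float
    y: float


def late_fuse(detections, tags, gate_m=0.5):
    """Greedy nearest-neighbour association of tag fixes to detections.

    Each tag and each detection is used at most once; pairs further
    apart than gate_m are never matched.
    """
    candidates = []
    for ti, tag in enumerate(tags):
        for di, det in enumerate(detections):
            dist = hypot(tag.x - det.x, tag.y - det.y)
            if dist <= gate_m:
                candidates.append((dist, ti, di))
    candidates.sort()  # closest pairs are claimed first

    used_tags, used_dets, fused = set(), set(), {}
    for _, ti, di in candidates:
        if ti in used_tags or di in used_dets:
            continue
        used_tags.add(ti)
        used_dets.add(di)
        fused[tags[ti].tag_id] = detections[di]
    return fused


detections = [Detection("person", 1.0, 1.0), Detection("person", 4.0, 4.0)]
tags = [TagFix("A", 1.1, 1.0), TagFix("B", 9.0, 9.0)]
fused = late_fuse(detections, tags)
print(fused)  # tag "A" maps to the first detection; tag "B" stays unmatched
```

Because the two sensors only meet at the decision level, either one can be swapped or dropped without retraining the other, which is the modularity argument made above.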
Associated Hardware
- Networking: PoE / PoE+ switches (power and data on one cable), bandwidth budgeting (16×1080p30 ≈ 128 Mbps sustained, at roughly 8 Mbps per encoded stream), GMSL serialisers for long runs, VLAN segmentation, and one-way diodes for air-gapped sites.
- Power: PoE vs DC, power budgets, UPS for resilience, and locked Jetson power modes (15–60W on AGX Orin) chosen at deployment, not at demo.
- Enclosures & environment: IP-rated and sealed enclosures, washdown ratings for food/industrial, heat exchangers for hot environments, vibration tolerance, and mounting geometry that fixes the field of view.
- Sync & timing: hardware triggers, PTP time-sync across multi-camera rigs, and strobe synchronisation.
- Storage: local NVMe for event buffering and clip retention.
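The bandwidth budgeting mentioned under Networking is simple multiplication, sketched below. The 8 Mbps per-stream figure is the value implied by the 16-camera example (128 Mbps / 16) and assumes compressed video; the 70% utilisation ceiling and the function names are hypothetical choices for illustration.

```python
def stream_budget_mbps(cameras: int, per_stream_mbps: float,
                       headroom: float = 0.0) -> float:
    """Sustained bandwidth for all camera streams, with optional headroom."""
    return cameras * per_stream_mbps * (1 + headroom)


def fits_link(total_mbps: float, link_mbps: float,
              max_utilisation: float = 0.7) -> bool:
    """True if the budget stays under an assumed utilisation ceiling."""
    return total_mbps <= link_mbps * max_utilisation


total = stream_budget_mbps(cameras=16, per_stream_mbps=8.0)
print(f"{total:.0f} Mbps sustained")              # 128 Mbps
print(fits_link(total, link_mbps=1000))           # True on a gigabit uplink
print(fits_link(total, link_mbps=100))            # False on Fast Ethernet
```

Actual per-stream bitrate varies with codec, scene motion and encoder settings, so it is worth measuring on the real cameras before sizing switches.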
Edge Compute
The right compute is set by stream count, latency budget, power envelope, environment and cost — not by buying the biggest chip:
Tier | Compute | ~Rated AI / Power | Best for |
|---|---|---|---|
Microcontroller / TinyML | MCU-class NPU | < 1 TOPS · mW–1W | On-sensor anomaly / simple detection, battery devices |
SBC | Raspberry Pi-class CPU | low · ~3–7W | Prototyping, light single-camera CV |
Edge accelerator | VPU / NPU / Edge TPU (e.g. Hailo, Coral) | ~4–26 TOPS · ~2–5W | Low-power embedded vision in a product |
Model & Inference Optimisation
A model that runs at 30fps in a research framework may run 3× faster on the same device once optimised for it. The optimisation pipeline is not optional: it is what makes the hardware choice viable at production throughput. The techniques below are applied in order, and each gain is weighed against its accuracy cost:
Technique | What it does | Typical gain |
|---|---|---|
FP16 quantisation | Mixed-precision; halves memory footprint | 1.5–2.2× |
INT8 quantisation | Halves weight size again; needs a calibration set | 2–3× |
Model pruning | Removes near-zero weights; less compute per inference | 1.3–1.8× |
Layer fusion | Combines consecutive ops into one kernel | 1.2–1.5× |
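To show why INT8 quantisation needs a calibration set, here is a minimal pure-Python sketch of symmetric, per-tensor quantisation using max-abs calibration. This is the simplest possible scheme and is chosen only for illustration; production toolchains typically pick ranges with more elaborate methods, and the function names and sample values are hypothetical.

```python
def calibrate_scale(calibration_values):
    """Map the largest magnitude seen during calibration to the INT8 limit."""
    max_abs = max(abs(v) for v in calibration_values)
    if max_abs == 0:
        raise ValueError("calibration data is all zeros")
    return max_abs / 127.0


def quantise(x: float, scale: float) -> int:
    """Round to the nearest INT8 step and clamp to the symmetric range."""
    q = round(x / scale)
    return max(-127, min(127, q))


def dequantise(q: int, scale: float) -> float:
    """Recover an approximate real value from its INT8 code."""
    return q * scale


# Hypothetical activation samples gathered from representative footage.
scale = calibrate_scale([-2.0, 0.5, 1.27, 2.54])

inside = quantise(1.0, scale)     # within the calibrated range
outside = quantise(10.0, scale)   # never seen in calibration, so it saturates
print(inside, round(dequantise(inside, scale), 3))
print(outside, round(dequantise(outside, scale), 3))
```

Values inside the calibrated range come back with a small rounding error, while values outside it saturate. That is why the calibration footage has to represent the deployment environment.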
Combined effect: a mid-size detector at full precision running ~35fps on an Orin AGX reaches 90–110fps after FP16 + layer fusion + stream pinning, with an accuracy delta under 1% on a held-out validation set. We benchmark each step against your footage and your accuracy threshold before committing the production configuration.
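As a sanity check on the combined figure, the sketch below multiplies the per-technique ranges from the table for FP16 and layer fusion. It assumes the gains compound independently, which is an optimistic simplification, and it does not model stream pinning because the table gives no figure for it. The function name is ours.

```python
def combined_fps_range(base_fps: float, gains):
    """Multiply (low, high) speed-up ranges onto a baseline frame rate."""
    low = high = base_fps
    for gain_low, gain_high in gains:
        low *= gain_low
        high *= gain_high
    return low, high


# Ranges taken from the table above: FP16, then layer fusion.
low, high = combined_fps_range(35, [(1.5, 2.2), (1.2, 1.5)])
print(f"{low:.1f} to {high:.1f} fps")  # 63.0 to 115.5 fps
```

The quoted 90–110fps sits inside that envelope, towards its upper end. Measured numbers on the target device remain the only figures worth committing to.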
Latency Reference
Capability | Cloud (round-trip) | On-prem server | Edge (Orin AGX) |
|---|---|---|---|
Object detection (1080p30) | 80–300 ms | 12–25 ms | 8–18 ms |
Real-time tracking (4 streams) | Not viable (bandwidth) | 30–60 ms | 20–45 ms |
Event / zone trigger | 100–500 ms | 15–30 ms | 10–20 ms |
Vision + Identity at the Edge
A differentiator few competitors offer: identity woven into the vision stack on the same device. Face and liveness, access control, and on-device identity matching can run without biometrics ever leaving the device — or be replaced entirely by RF identity (UWB/BLE/RFID) where biometrics aren’t wanted. This pairs the vision pipeline with the Identity & Access solution. We deploy identity or face recognition only where there is a clear legal basis.
Deployment Topologies
Four topologies are supported: standalone edge node, edge + on-prem aggregation, hybrid (edge inference + cloud analytics, with video staying on-site), and fully air-gapped. The Edge Deployment solution page carries the full architecture, fleet management, thermal and security treatment, so this page doesn’t duplicate it.
Build vs Buy
A capable team can get a model running on a Jetson in a couple of weeks. Camera and lighting selection, sensor fusion, optimisation, thermal design, integration, and keeping accuracy stable in the field are the other twenty-two weeks. The hidden costs are the optics and lighting expertise, the calibration work, and the discipline to run a fleet in real conditions.
- Building in-house makes sense when edge computer vision is a core product differentiator and you intend to fund a permanent embedded CV team.
- Buying / partnering makes sense when vision is an enabling capability for a broader product. We’re also brought in to rescue internal builds that stalled at the optimisation or operations layer.
How an Engagement Works
Engagements start with a paid technical discovery — 2–4 weeks against your real hardware, footage, lighting and integration surface. By the end you have a sensing and hardware specification, a benchmarked accuracy and latency baseline, a thermal assessment, and an honest go/no-go based on measured numbers.
Production engagements run on milestone-based contracts with defined acceptance criteria: accuracy threshold, latency SLA (p99 at sustained load), and thermal steady-state. Ongoing operations — monitoring, updates, retraining, on-call — run as a separate retainer scoped to device count and SLA.
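A p99 latency criterion can be evaluated from logged per-frame latencies, as in the sketch below. The nearest-rank percentile definition, the function names, the sample data and the 20 ms SLA are all assumptions for illustration; an actual contract would fix the percentile method and the load profile explicitly.

```python
import math


def percentile_nearest_rank(samples, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_sla(samples_ms, sla_ms: float, pct: float = 99) -> bool:
    """True if the chosen percentile of measured latency is within the SLA."""
    return percentile_nearest_rank(samples_ms, pct) <= sla_ms


# Hypothetical run: 98 frames at 10 ms plus two 80 ms stalls.
latencies_ms = [10.0] * 98 + [80.0] * 2
mean_ms = sum(latencies_ms) / len(latencies_ms)
p99_ms = percentile_nearest_rank(latencies_ms, 99)

print(f"mean {mean_ms:.1f} ms, p99 {p99_ms:.1f} ms")  # mean 11.4 ms, p99 80.0 ms
print(meets_latency_sla(latencies_ms, sla_ms=20.0))   # False
```

The mean looks comfortable while the tail breaks the SLA, which is the reason to specify acceptance at p99 under sustained load and not as an average.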
What We Won’t Do
- Quote accuracy or latency we haven’t measured on your camera, your lighting and your footage.
- Recommend cameras, sensors or compute you don’t need. Over-spec is waste; we right-size.
- Drop a generic model onto your device and call it done.
- Lock you in. You own the hardware, the models, and the weights.
- Ignore thermal management or lighting and call it ‘out of scope.’
- Deploy face recognition without a clear legal basis.
Ready To See What This Looks Like On Your Hardware?
A discovery call is a one-hour technical conversation against your actual environment — your cameras, your lighting, your camera count, your latency requirement, your integration surface. We don’t pitch. We benchmark.