Visual Search

Find anything. By showing it. On your infrastructure.

Visual search replaces the assumption that everything in your catalogue, your archive, or your footage has been correctly tagged, labelled, and catalogued. It hasn't. Parts arrive without markings. Defects recur without anyone connecting the pattern. Objects appear in footage that nobody logged at the time.

Visual search lets your team query with an image instead of a keyword. Photograph an unknown part — get its catalogue match in under a second. Upload a defect example — retrieve every similar instance across your QC archive. Submit a reference image — find every frame across six hours of camera footage where that object or person appears.

Athena AI builds these systems to run entirely on your infrastructure. The embedding index, the search engine, the query interface — all on-premise. Your parts catalogue, your QC archive, your footage never transit a third-party service. No per-query API billing that compounds as search volume grows.

[Book a Discovery Call →] [See How It Works]

What visual search replaces

The honest version of why keyword and metadata search fails at scale:

Scenario	Keyword / metadata search	Visual search (Athena AI)
Find a part that looks like this damaged component	Requires knowing the part number, which you don't, because it's damaged	Upload a photo. Return ranked matches from your parts catalogue in < 500 ms.

Use case	Query type	Index type	Latency target	Primary vertical
Parts matching	Photo of part → ranked catalogue matches	Catalogue embedding index	< 500 ms	Manufacturing, MRO, logistics
Defect similarity search	Example defect image → all similar QC frames

Reference work

Project	What We Built	Result
Canada First Bricks (LEGO Sorting)	Visual classification pipeline distinguishing 300+ part types on Jetson Orin. The classification stage is a constrained form of visual search: embedding-based similarity against a known part library, resolved at actuation speed.	96.3% classification accuracy across 300+ visually similar part types. < 100 ms decision latency including actuation. Foundation for a catalogue-scale parts search deployment.
[Manufacturing client — NDA]	Defect similarity search across a QC frame archive. Inspector uploads a reference defect image; system retrieves all visually similar frames from 18 months of production footage. On-prem Faiss index over ~4M QC frames.	[Results to be confirmed with client before publishing — placeholder.]
[Industrial MRO client — NDA]	Parts identification from field photographs. Technicians photograph unlabelled or damaged components; system returns top-5 catalogue matches with part numbers and supplier details.	[Results to be confirmed with client before publishing — placeholder.]

Embedding model selection

Model	Best for	Embedding dim	When we substitute
CLIP (ViT-B/32 or ViT-L/14)	General-purpose visual search across diverse catalogues. Strong zero-shot performance on novel object categories.	512 / 768	Default for new catalogue deployments without prior training data. Zero-shot baseline before fine-tuning.
EfficientNet-B4/B7 (fine-tuned)	Domain-specific catalogues where CLIP zero-shot underperforms, e.g. industrial parts, medical devices, proprietary products.	1792 (B4) / 2560 (B7)	When the query population is narrow and visually homogeneous. Fine-tuned on customer catalogue data.
ResNet-50 / ResNet-101 (fine-tuned)	Defect similarity search where texture and surface detail matter more than semantic similarity.	2048	QC archive search for defect types. Fine-tuned on labelled defect examples from your production line.
DINOv2 (ViT-B/14 or ViT-S/14)	High-quality dense features for fine-grained similarity. Strong on industrial parts with subtle visual differences.	768 / 384	Parts matching where inter-class similarity is high, e.g. fasteners, connectors, similar-looking components.
OSNet (appearance re-ID)	Person and vehicle search across camera archives. Trained for cross-angle, cross-camera appearance matching.	512	Video archive search for person or vehicle queries. Non-biometric — appearance model, not face recognition.
Custom fine-tune on customer data	Any domain where off-the-shelf models underperform on your specific query population.	Varies	Always evaluated during discovery. Fine-tuning is scoped when zero-shot baseline accuracy is below threshold.
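
Whichever model is selected, the search step itself reduces to comparing a query embedding against stored catalogue embeddings. The sketch below illustrates that step with cosine similarity in plain Python. It is a minimal illustration, not the production implementation: the function names, the part IDs, and the 3-dimensional vectors are all hypothetical, and a real deployment would use model-generated embeddings and an index library rather than a linear scan.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2_normalise(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def top_k_matches(
    query: Vector, catalogue: Dict[str, Vector], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k catalogue entries most similar to the query, best first."""
    q = l2_normalise(query)
    scored = []
    for part_id, embedding in catalogue.items():
        e = l2_normalise(embedding)
        scored.append((part_id, sum(a * b for a, b in zip(q, e))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; the models above emit hundreds
# to thousands of dimensions.
catalogue = {
    "PART-A": [1.0, 0.0, 0.0],
    "PART-B": [0.0, 1.0, 0.0],
    "PART-C": [0.7, 0.7, 0.0],
}

matches = top_k_matches([0.9, 0.1, 0.0], catalogue, k=2)
for part_id, score in matches:
    print(f"{part_id}: {score:.3f}")
```

Normalising both sides means the score is bounded between -1 and 1, which makes a single similarity threshold meaningful across queries.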

Index architecture and scale

Index type	Index size	Query latency	When we use it
Flat (exact search)	Up to ~50K vectors	< 5 ms	Small–medium catalogues where exact similarity matters and recall must be 100%. Parts catalogues, small QC archives.
Faiss IVF (inverted file index)	50K – 5M vectors	5–50 ms	Large catalogues and mid-size video archives. Trades marginal recall for significant speed. Configurable nprobe for accuracy/speed balance.
Faiss HNSW (hierarchical navigable small world graph)	Any size; memory-resident	1–10 ms at any scale	High-QPS interactive search where latency matters more than memory cost. Product visual search, real-time parts lookup.
Faiss IVF-PQ (product quantisation)	Millions to billions of vectors	10–100 ms	Very large video archives (weeks of multi-camera footage). Reduces memory footprint by 8–16× at modest recall cost.
Scoped sub-index (filtered search)	Any size, filtered by metadata	Adds < 5 ms to base latency	When queries are naturally scoped: 'find this defect in footage from Line 3 in Q4 2024'. Pre-filter reduces search space before ANN.
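
The memory trade-off behind product quantisation in the table above can be checked with simple arithmetic. The sketch below assumes float32 storage for the uncompressed vectors and 8-bit codes per sub-quantiser, and it ignores codebook and inverted-list overhead. The archive size, embedding dimension, and function names are illustrative assumptions, not figures from a specific deployment.

```python
def flat_index_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Memory for uncompressed vectors stored as float32 (4 bytes per value)."""
    return num_vectors * dim * bytes_per_value


def pq_index_bytes(
    num_vectors: int, num_subquantisers: int, bits_per_code: int = 8
) -> int:
    """Memory for product-quantised codes only (codebooks and list overhead ignored)."""
    return num_vectors * num_subquantisers * bits_per_code // 8


def gib(n_bytes: int) -> float:
    """Convert bytes to gibibytes for display."""
    return n_bytes / 2**30


num_vectors = 4_000_000  # assumed archive size, for illustration only
dim = 768                # assumed embedding dimension

raw = flat_index_bytes(num_vectors, dim)
# One 8-bit code per 2 dimensions gives 8x compression; per 4 dimensions, 16x.
pq_8x = pq_index_bytes(num_vectors, num_subquantisers=dim // 2)
pq_16x = pq_index_bytes(num_vectors, num_subquantisers=dim // 4)

print(f"flat float32: {gib(raw):.2f} GiB")
print(f"PQ, {dim // 2} codes/vector: {gib(pq_8x):.2f} GiB ({raw // pq_8x}x smaller)")
print(f"PQ, {dim // 4} codes/vector: {gib(pq_16x):.2f} GiB ({raw // pq_16x}x smaller)")
```

Fewer sub-quantisers mean a smaller index but a coarser approximation of each vector, which is where the recall cost comes from.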

Performance reference

Metric	Target	Notes
Catalogue search latency (p99)	< 500 ms end-to-end	Query embedding extraction + index search + result formatting. CPU inference on embedding model; Faiss HNSW index in RAM.
Video archive search latency (p99)	< 10 s per multi-hour query	Depends on archive size and index type. Pre-built frame-level index; query does not require re-scanning footage.
Top-1 recall (parts matching, fine-tuned)	> 90%	On held-out test set from your catalogue. Zero-shot CLIP baseline benchmarked during discovery; fine-tuned model target set against your accuracy threshold.
Top-5 recall (parts matching, fine-tuned)	> 97%	Ranked list of 5 candidates. Human confirmation on top-5 is sufficient for most parts lookup workflows.
Index build time (100K vectors)	< 30 minutes	Offline. Index is pre-built, not built at query time. Incremental index updates available for new catalogue additions.
Index build time (1M vectors)	< 4 hours	Offline. Parallelised embedding extraction on GPU; Faiss index training on CPU.
Concurrent query throughput	> 50 QPS (catalogue search)	On a single T4 GPU server with Faiss HNSW index in RAM. Scales horizontally with additional workers.
Bandwidth to cloud	0 KB/s	All embedding extraction, index storage, and search on customer infrastructure.
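
For readers who want to reproduce the recall and latency metrics above on their own held-out set, the sketch below shows one way to compute top-k recall and a p99 latency figure. It is a sketch under stated assumptions: the function names, query IDs, part IDs, and latency samples are hypothetical, and the percentile uses the nearest-rank definition, which is one of several conventions.

```python
import math
from typing import Dict, List, Sequence


def recall_at_k(
    ranked_results: Dict[str, List[str]], ground_truth: Dict[str, str], k: int
) -> float:
    """Fraction of queries whose correct part appears in the first k results."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = sum(
        1
        for query_id, correct_part in ground_truth.items()
        if correct_part in ranked_results.get(query_id, [])[:k]
    )
    return hits / len(ground_truth)


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Smallest sample such that at least pct percent of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical held-out queries with their correct catalogue part.
ground_truth = {"q1": "A", "q2": "B", "q3": "C", "q4": "D"}
# Hypothetical ranked results returned by the search system.
ranked_results = {
    "q1": ["A", "X", "Y"],
    "q2": ["X", "B", "Y"],
    "q3": ["X", "Y", "Z"],
    "q4": ["D", "X", "Y"],
}

top1 = recall_at_k(ranked_results, ground_truth, k=1)
top5 = recall_at_k(ranked_results, ground_truth, k=5)

# Hypothetical end-to-end latencies in milliseconds: 1.0, 2.0, ..., 100.0.
latencies_ms = [float(i) for i in range(1, 101)]
p99 = percentile_nearest_rank(latencies_ms, 99)

print(f"top-1 recall: {top1:.2f}, top-5 recall: {top5:.2f}, p99: {p99} ms")
```

Measuring p99 rather than the mean matters for interactive lookup, because the slowest few queries are the ones a technician notices.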

One architecture, seven operational profiles

Vertical	Primary use case	Index type	Query latency target	Key constraint
Manufacturing	Parts matching, defect similarity, QC archive retrieval	Catalogue + QC frame archive	< 500 ms (parts) / < 10 s (archive)	Fine-grained similarity on homogeneous parts; offline index build from existing QC data
Maintenance, Repair & Overhaul	Identify unknown part from photo; find compatible replacements	Catalogue embedding index	< 500 ms	Parts catalogue may be large and heterogeneous; multi-supplier normalisation
Retail & e-commerce	Customer photo → product results; 'shop the look'	Product catalogue index (HNSW)	< 300 ms interactive	High QPS; index must update as catalogue changes; mobile query image quality varies
Security & investigations	Find person or vehicle across camera archive	Appearance embedding index (video frames)	< 10 s per query	Non-biometric re-ID for person search; large archive size; multi-camera coverage
Logistics & supply chain	Identify shipment contents from photo; match against manifest	SKU / cargo catalogue index	< 500 ms	Heterogeneous item types; query images often low-quality photos
Brand protection & compliance	Identify counterfeit or non-compliant product from image	Approved product index	< 500 ms	High precision required; false positives have commercial consequences
Healthcare & medical devices	Identify device or instrument from photo; find similar cases in archive	Device catalogue + case archive	< 2 s	Regulatory sensitivity; on-prem only; no cloud processing

What visual search replaces

Where this deploys

Why Athena AI

The index is yours

Fine-tuned for your domain, not a general catalogue

Two use cases, one architecture

Scales to your archive, not against you

Reference work

Ready to see what this looks like on your catalogue or archive?

The four things that determine whether visual search works

1. The embedding model and whether it understands your domain

2. The quality and coverage of the index

3. The similarity threshold and what happens below it

4. The query image quality

What deployment looks like

Embedding model selection

Index architecture and scale

Performance reference

The full pipeline — end to end

The hard problems we plan for

One architecture, seven operational profiles

Deployment topology

On-prem GPU server (standard)

CPU-only (small indices)

Multi-server (large archives)

Air-gapped

Model promotion

Integration surface

Security and data architecture

Build vs buy

What a 3-month internal build looks like

Where building makes sense

Where buying makes sense

What we won't do

Engagement model