Visual Search Computer Vision Services | Athena AI
Visual Search
Find anything. By showing it. On your infrastructure.
Visual search replaces the assumption that everything in your catalogue, your archive, or your footage has been correctly tagged, labelled, and catalogued. It hasn't been. Parts arrive without markings. Defects recur without anyone connecting the pattern. Objects appear in footage that nobody logged at the time.
Visual search lets your team query with an image instead of a keyword. Photograph an unknown part — get its catalogue match in under a second. Upload a defect example — retrieve every similar instance across your QC archive. Submit a reference image — find every frame across six hours of camera footage where that object or person appears.
Athena AI builds these systems to run entirely on your infrastructure. The embedding index, the search engine, the query interface — all on-premise. Your parts catalogue, your QC archive, your footage never transit a third-party service. No per-query API billing that compounds as search volume grows.
[Book a Discovery Call →] [See How It Works]
What visual search replaces
The honest version of why keyword and metadata search fails at scale:
| Scenario | Keyword / metadata search | Visual search (Athena AI) |
| --- | --- | --- |
| Find a part that looks like this damaged component | Requires knowing the part number, which you don't, because it's damaged | Upload a photo. Return ranked matches from your parts catalogue in < 500 ms. |
What visual search actually does
Every search system you use today works the same way: you describe what you want in words or structured fields, and the system finds records that match those words or fields. That works when everything in your database has been correctly described. It fails when it hasn't — and in most operational environments, most things haven't been.
An industrial parts catalogue has 40,000 components. Perhaps 60% have complete, accurate metadata. The rest have partial records, legacy part numbers, outdated descriptions, or no description at all. A keyword search of that catalogue is, in practice, a search of 24,000 items — the ones someone got around to labelling correctly.
Visual search works differently. Instead of matching words, it matches visual meaning. You submit an image. The system converts it to a compact numerical representation — a vector that encodes what the image looks like. It then finds the vectors in its index that are most similar to your query vector, and returns the corresponding items ranked by visual similarity. No keywords. No tags. No dependency on someone having labelled things correctly at the time they were added.
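The ranking step can be sketched in a few lines. This is a minimal illustration, assuming cosine similarity as the similarity measure and using made-up three-dimensional vectors; the function names and catalogue IDs are hypothetical, and a production system would use a vector index such as Faiss instead of a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, index, k=3):
    """Return the k (item_id, score) pairs most similar to the query vector."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional embeddings; real models emit hundreds of dimensions.
catalogue = {
    "bolt-m6": [0.9, 0.1, 0.0],
    "bolt-m8": [0.8, 0.3, 0.1],
    "gasket-12": [0.0, 0.2, 0.9],
}
results = rank_by_similarity([1.0, 0.1, 0.0], catalogue, k=2)
```

Note that no tag or keyword appears anywhere in the lookup: the only inputs are the query vector and the stored vectors.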
The two modes: catalogue search and archive search
Visual search in an operational environment takes two distinct forms. The underlying technology is the same. The index structure, the latency requirement, and the workflow around it differ significantly.
Catalogue search — find this in my library
You have a known set of items: parts, products, devices, approved components. You build an index from that catalogue. When a query arrives — a photograph of an unknown part, a customer product image, a field technician's phone photo — the system finds the closest matches in the catalogue and returns them ranked by similarity.
The index is built once, updated incrementally as the catalogue changes, and queries against it are fast — typically under 500 milliseconds end-to-end. This is the mode for parts identification, product visual search, counterfeit detection, and MRO lookup.
Archive search — find this in my history
You have a growing archive of images or video frames: QC inspection frames, camera footage, historical photographs. The system embeds every frame in the archive and builds a searchable index over it. When a query arrives — a reference defect image, a photograph of a person or vehicle, an object of interest — the system searches the entire archive and returns all frames where something visually similar appears.
The index is built offline over the existing archive, then updated incrementally as new frames are added. Query latency depends on archive size and index type — typically under 10 seconds for a multi-hour video archive, under 2 seconds for a bounded QC frame archive. This is the mode for defect similarity search, QC retrospective analysis, and security archive investigation.
Visual search is an embedding pipeline feeding a vector index, served by a query API. Each of those three components involves choices that interact with each other and with your specific domain, catalogue size, archive scale, and latency requirement. We make those choices explicit during discovery and document them before any code is written.
Use cases and index architecture by mode
Use case
Query type
Index type
Latency target
Primary vertical
Parts matching
Photo of part → ranked catalogue matches
Catalogue embedding index
< 500 ms
Manufacturing, MRO, logistics
Defect similarity search
Example defect image → all similar QC frames
The section most visual search vendors skip. The one your engineering team asks about when the PoC ends.
Building the initial index is week one. The operational reality is everything after that: keeping the embedding model accurate as your catalogue evolves, maintaining index freshness as the archive grows, detecting when model or data drift is degrading search quality, and integrating search results into the workflows that need them. This tab covers how Athena AI-built visual search systems live in production.
Index maintenance
Catalogue index updates
New catalogue items are ingested incrementally — embedded and inserted into the Faiss HNSW or flat index without requiring a full rebuild. Deleted items are flagged and filtered from results immediately (soft delete via metadata filter) and removed from the index at the next scheduled compaction. Modified items (e.g. updated product images, re-photographed parts) are re-embedded and the old vector replaced. Index update latency: under 60 seconds from item submission to searchable.
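The insert, soft-delete, and compaction lifecycle described here can be illustrated with a toy in-memory structure. This is a sketch only: the class and method names are hypothetical, a plain dictionary stands in for the Faiss index, and the real system's metadata filter and compaction job will differ.

```python
class CatalogueIndex:
    """Toy in-memory stand-in for a vector index with soft deletes."""

    def __init__(self):
        self.vectors = {}     # item_id -> embedding
        self.deleted = set()  # soft-deleted IDs, filtered out at query time

    def upsert(self, item_id, vector):
        """Insert a new item, or replace the vector of a modified one."""
        self.vectors[item_id] = list(vector)
        self.deleted.discard(item_id)

    def soft_delete(self, item_id):
        """Hide the item from results immediately; the vector stays stored."""
        if item_id in self.vectors:
            self.deleted.add(item_id)

    def searchable_ids(self):
        """IDs that a query is allowed to return."""
        return sorted(i for i in self.vectors if i not in self.deleted)

    def compact(self):
        """Physically drop soft-deleted vectors; return how many were removed."""
        removed = len(self.deleted)
        for item_id in self.deleted:
            del self.vectors[item_id]
        self.deleted.clear()
        return removed

index = CatalogueIndex()
index.upsert("part-a", [0.1, 0.2])
index.upsert("part-b", [0.3, 0.4])
index.soft_delete("part-a")
visible_before_compaction = index.searchable_ids()
stored_before_compaction = len(index.vectors)
removed = index.compact()
```

Separating the two steps is what lets a deletion take effect immediately while the more expensive physical removal waits for a scheduled window.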
Archive index updates
New archive frames (video footage, QC inspection images) are ingested on a configurable schedule — hourly, daily, or triggered by a production event (end-of-shift QC batch, camera footage handoff). Frames are embedded in batches for throughput efficiency. IVF indices are re-trained when the index has grown by a configurable threshold (typically 20–30%) beyond the original training size, triggered automatically and run as a background job outside query hours.
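The re-training trigger is a simple growth check. The sketch below assumes growth is measured relative to the number of vectors the IVF index was last trained on; the function name and the 25% default are illustrative choices within the 20–30% range mentioned above.

```python
def needs_retrain(current_size, trained_size, growth_threshold=0.25):
    """True once the index has grown past the threshold relative to its training size."""
    if trained_size <= 0:
        return True  # never trained: train before serving queries
    growth = (current_size - trained_size) / trained_size
    return growth >= growth_threshold

# An index trained on 1M vectors: 20% growth stays below a 25% threshold,
# 30% growth crosses it.
below = needs_retrain(1_200_000, 1_000_000)
above = needs_retrain(1_300_000, 1_000_000)
```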
Index health monitoring
Index staleness (items pending ingestion), index size growth rate, embedding extraction throughput, and Faiss index training status are monitored continuously. Alerts fire when the pending ingestion queue exceeds its threshold, an index training job fails, or an embedding model version mismatch is detected between the ingestion and query paths.
Model versioning and accuracy monitoring
Recall monitoring
A held-out evaluation set — a fixed set of query images with known correct catalogue or archive matches — is run against the production model on a configurable schedule (daily for high-traffic deployments, weekly for lower-volume ones). Top-1 and top-5 recall metrics are tracked over time. A recall drop of more than 2 percentage points triggers an alert. Common causes: new catalogue items have been added that the model struggles with (fine-tuning trigger), query image quality has degraded (capture hardware issue), or index has become stale.
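As an illustration of the metric and the alert rule, the sketch below computes top-k recall over a small evaluation set and flags a drop of more than 2 percentage points. The data structures and function names are assumptions made for the example, not the monitoring system's actual interface.

```python
def recall_at_k(ranked_results, expected, k):
    """Fraction of queries whose expected item appears in the top k results."""
    hits = sum(
        1
        for query_id, ranked in ranked_results.items()
        if expected[query_id] in ranked[:k]
    )
    return hits / len(ranked_results)

def recall_dropped(baseline, current, max_drop_points=2.0):
    """True when recall fell by more than max_drop_points percentage points."""
    return (baseline - current) * 100.0 > max_drop_points

# Hypothetical evaluation run: four queries, each with one known correct match.
ranked = {
    "q1": ["a", "b"],
    "q2": ["c", "d"],
    "q3": ["x", "e"],
    "q4": ["y", "z"],
}
expected = {"q1": "a", "q2": "d", "q3": "e", "q4": "f"}
top1 = recall_at_k(ranked, expected, k=1)
top2 = recall_at_k(ranked, expected, k=2)
```

Tracking the same fixed query set over time is what makes a drop attributable to the model, the index, or the incoming images, not to a change in the test itself.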
Fine-tuning triggers
Scheduled fine-tuning: model re-evaluated on a quarterly basis against the current catalogue. If recall has drifted below threshold, a fine-tuning cycle is triggered. Event-driven fine-tuning: a new product category, a new supplier's parts, or a new defect type enters the catalogue and the baseline recall on that subset is below threshold. Active learning: queries that produce low-confidence results are flagged and routed to a human review queue, where confirmed correct matches become new training examples.
| Scenario | Keyword / metadata search | Visual search (Athena AI) |
| --- | --- | --- |
| Find every QC frame where this defect appeared | Requires the defect to have been tagged at capture time | Query by example image. Retrieve all visually similar frames across the archive. |
| Identify an unknown object on the production line | Impossible without prior classification | Compare against your parts library. Return closest visual matches with confidence scores. |
| Find this person across 6 hours of camera footage | Requires manual review or prior tagging | Submit a reference image. Retrieve all frames containing visually similar appearances, ranked by similarity. |
| Audit all instances where a specific vehicle entered a facility | Requires licence plate data captured at entry | Query by vehicle image. Return all archive frames containing that vehicle across all cameras. |
| Find products visually similar to a customer photo | | Image in. Ranked product results out. No tags required. |
Where this deploys
Manufacturing and quality control. Parts identification, defect similarity search, QC archive retrieval, counterfeit / non-compliant component detection. The primary vertical — query by photographing a part, retrieve its catalogue match and full history.
Maintenance, Repair and Overhaul (MRO). Field technicians photograph an unknown or unlabelled component and retrieve its part number, supplier, and compatible replacements from the parts library. No manual cross-referencing. No dependency on legible markings.
Security and investigations. Retrieve all frames from a camera archive in which a specific object, person, or vehicle appears. Query by reference image. No manual review of footage. No dependency on prior tagging.
Retail and e-commerce. Customer submits a product photo — retrieve the closest matches from your catalogue. 'Shop the look' implemented without manual attribute tagging of your entire product range.
Logistics and supply chain. Identify shipment contents from a photograph and match against the expected manifest. Flag visual discrepancies between received goods and expected specification.
Brand protection and compliance. Compare product images against an approved visual library to identify counterfeits, substitutions, or non-compliant variants.
Healthcare and medical devices. Identify instruments or devices from a photograph against a device catalogue. Retrieve similar cases from a clinical archive. Fully on-premise — no cloud processing.
Why Athena AI
The index is yours
Visual search SaaS vendors charge per query, host your catalogue on their infrastructure, and tie your search capability to their uptime. We build the index on your hardware. The embedding model, the Faiss index, and the query API all run on-premise. No per-query billing. No vendor dependency on the search path.
Fine-tuned for your domain, not a general catalogue
Off-the-shelf visual search works well for diverse consumer product catalogues. It underperforms on industrial parts catalogues where components look visually similar, defect archives where the distinguishing features are subtle, and proprietary product lines where no public training data exists. We fine-tune embedding models on your data and benchmark accuracy on your catalogue before production commitment.
Two use cases, one architecture
Catalogue search (find this part in my library) and archive search (find this object in my footage) share the same underlying embedding and index infrastructure. You don't need two systems. The same on-prem deployment serves both query types — catalogue queries from a web interface, archive queries from an investigative tool or QC dashboard.
Scales to your archive, not against you
A Faiss HNSW index containing one million frame embeddings queries in under 10 milliseconds. A 100-million-frame index queries in under 100 milliseconds. Index size scales with storage, not with query cost. You pre-build the index once; every query against it is fast, on your hardware, with no incremental cost.
Reference work
| Project | What We Built | Result |
| --- | --- | --- |
| Canada First Bricks (LEGO Sorting) | Visual classification pipeline distinguishing 300+ part types on Jetson Orin. The classification stage is a constrained form of visual search: embedding-based similarity against a known part library, resolved at actuation speed. | 96.3% classification accuracy across 300+ visually similar part types. < 100 ms decision latency including actuation. Foundation for a catalogue-scale parts search deployment. |
| [Manufacturing client, NDA] | Defect similarity search across a QC frame archive. Inspector uploads a reference defect image; system retrieves all visually similar frames from 18 months of production footage. On-prem Faiss index over ~4M QC frames. | [Results to be confirmed with client before publishing; placeholder.] |
| [Industrial MRO client, NDA] | Parts identification from field photographs. Technicians photograph unlabelled or damaged components; system returns top-5 catalogue matches with part numbers and supplier details. | [Results to be confirmed with client before publishing; placeholder.] |
Ready to see what this looks like on your catalogue or archive?
A discovery call is a one-hour technical conversation against your actual data — your parts catalogue, your QC archive, your footage, your query volume. We don't pitch. We benchmark. By the end you know whether visual search is the right architecture for your operation, which embedding model is appropriate for your domain, and what accuracy is achievable before any significant investment.
[Book a Discovery Call →]
Want more detail? Continue to [How It Works →] for a non-technical walkthrough, or jump to [Architecture →] for the engineering breakdown.
The four things that determine whether visual search works
Visual search is not a plug-in. Whether it delivers the accuracy your operation requires depends on four decisions made before a single query is run.
1. The embedding model and whether it understands your domain
An embedding model trained on general internet images will struggle with a parts catalogue where 200 components look nearly identical except for a dimensional tolerance difference. The model needs to have learned to distinguish the features that matter in your domain — surface texture for defect search, dimensional proportion for parts matching, colour and pattern for product search. The baseline is a general-purpose model. The ceiling is a model fine-tuned on your data.
2. The quality and coverage of the index
A catalogue search system is only as good as what's in the catalogue. If 30% of your parts have no image in the library, the system cannot return them. If the catalogue images were taken under different lighting conditions than field query images, similarity scores will be lower than expected. Index coverage and image quality are operational decisions that precede the technical ones.
3. The similarity threshold and what happens below it
Every visual search system has a confidence threshold: above it, a result is returned as a match; below it, a result is returned as a candidate for review, or not returned at all. Setting this threshold is a business decision, not a technical one. A threshold set too high produces false rejects — the system says 'no match' when there is one. Too low produces false positives — the system returns wrong matches confidently. We set and document this threshold explicitly during deployment, and expose it as a configurable parameter.
4. The query image quality
A blurry, dark, severely angled photograph of a part will produce lower-confidence results than a clear, well-lit, frontal photograph. This is not a flaw in the system — it is physics. We benchmark accuracy across the range of query image quality your operation actually produces, not just under ideal conditions, and design query capture guidance accordingly. For automated query sources (inline cameras, fixed inspection stations), we control image quality in hardware. For human-submitted queries, we provide quality guidance and real-time capture feedback in the query interface.
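Point 3 above describes three possible outcomes for a result: returned as a match, returned as a candidate for review, or not returned. A minimal sketch of that rule follows, assuming similarity scores in the range 0 to 1; the function name and both default thresholds are placeholders, since the real values are set per deployment.

```python
def triage(similarity, match_threshold=0.85, review_threshold=0.60):
    """Map a similarity score to 'match', 'review', or 'no_match'."""
    if not 0.0 <= review_threshold <= match_threshold <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= review <= match <= 1")
    if similarity >= match_threshold:
        return "match"
    if similarity >= review_threshold:
        return "review"
    return "no_match"

# Raising match_threshold trades false positives for false rejects, and vice versa.
outcomes = [triage(score) for score in (0.92, 0.70, 0.30)]
```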
What deployment looks like
Discovery: catalogue or archive review, query volume estimation, accuracy threshold definition, query image quality assessment. We benchmark a zero-shot model baseline on a sample of your data before any architecture decision is made.
Model selection and fine-tuning: choose the embedding model based on your domain and zero-shot baseline. Fine-tune if the baseline is below threshold. Validate on held-out test set from your catalogue or archive.
Index build: embed your catalogue or archive, build the Faiss index, validate query latency on your hardware at your expected query volume.
Interface and integration: query interface (web, API, or embedded in your existing QC or ERP system), result schema definition, downstream workflow integration.
Production and operations: index maintenance as the catalogue or archive grows, model drift monitoring, accuracy benchmarking on new query samples, retraining cycle on schedule or on drift trigger.
For the engineering detail, continue to [Architecture →]. For index maintenance, drift handling, and integration, see [Operations →].
QC frame archive index
< 2 s (archive)
Manufacturing, quality control
Visual QC archive retrieval
Inspector uploads reference → finds all historical instances
Time-indexed frame embeddings
< 5 s (large archive)
Manufacturing, compliance
Object search in footage
Object image → camera archive frames containing it
Video frame embedding index
< 10 s (multi-hour archive)
Security, industrial, logistics
Person / vehicle search
Reference image → all archive appearances
Appearance embedding index
< 10 s (multi-hour archive)
Security, investigations, facilities
Product visual search
Customer photo → catalogue results
Product embedding index
< 300 ms
Retail, e-commerce, catalogues
Counterfeit / compliance check
Product image → similarity vs approved library
Approved product index
< 500 ms
Manufacturing, brand protection
Embedding model selection
The embedding model is the highest-leverage decision in a visual search system. The right model for a diverse consumer product catalogue is not the right model for a parts catalogue of 40,000 near-identical fasteners. We evaluate a zero-shot baseline during discovery and fine-tune where the baseline accuracy is below your threshold.
| Model | Best for | Embedding dim | When we substitute |
| --- | --- | --- | --- |
| CLIP (ViT-B/32 or ViT-L/14) | General-purpose visual search across diverse catalogues. Strong zero-shot performance on novel object categories. | 512 / 768 | Default for new catalogue deployments without prior training data. Zero-shot baseline before fine-tuning. |
| EfficientNet-B4/B7 (fine-tuned) | Domain-specific catalogues where CLIP zero-shot underperforms, e.g. industrial parts, medical devices, proprietary products. | 1792 (B4) / 2560 (B7) | When the query population is narrow and visually homogeneous. Fine-tuned on customer catalogue data. |
| ResNet-50 / ResNet-101 (fine-tuned) | Defect similarity search where texture and surface detail matter more than semantic similarity. | 2048 | QC archive search for defect types. Fine-tuned on labelled defect examples from your production line. |
| DINOv2 (ViT-B/S) | High-quality dense features for fine-grained similarity. Strong on industrial parts with subtle visual differences. | 768 (ViT-B) / 384 (ViT-S) | Parts matching where inter-class similarity is high, e.g. fasteners, connectors, similar-looking components. |
| OSNet (appearance re-ID) | Person and vehicle search across camera archives. Trained for cross-angle, cross-camera appearance matching. | 512 | Video archive search for person or vehicle queries. Non-biometric: appearance model, not face recognition. |
| Custom fine-tune on customer data | Any domain where off-the-shelf models underperform on your specific query population. | Varies | Always evaluated during discovery. Fine-tuning is scoped when zero-shot baseline accuracy is below threshold. |
Model selection is confirmed during discovery against your actual data. We benchmark top-1 and top-5 recall on a held-out test set before committing to a production model. Fine-tuning is scoped only when the zero-shot baseline is below the accuracy threshold agreed during discovery.
Index architecture and scale
The Faiss index type is determined by catalogue or archive size, query latency requirement, and available RAM on the inference server. We select the index type during architecture review — it is not a default.
| Index type | Index size | Query latency | When we use it |
| --- | --- | --- | --- |
| Flat (exact search) | Up to ~50K vectors | < 5 ms | Small–medium catalogues where exact similarity matters and recall must be 100%. Parts catalogues, small QC archives. |
| Faiss IVF (inverted file index) | 50K–5M vectors | 5–50 ms | Large catalogues and mid-size video archives. Trades marginal recall for significant speed. Configurable nprobe for accuracy/speed balance. |
| Faiss HNSW (hierarchical NSW graph) | Any size; memory-resident | < 10 ms at ~1M vectors; < 100 ms at ~100M vectors | High-QPS interactive search where latency matters more than memory cost. Product visual search, real-time parts lookup. |
| Faiss IVF-PQ (product quantisation) | Millions to billions of vectors | 10–100 ms | Very large video archives (weeks of multi-camera footage). Reduces memory footprint by 8–16× at modest recall cost. |
| Scoped sub-index (filtered search) | Any size, filtered by metadata | Adds < 5 ms to base latency | When queries are naturally scoped: 'find this defect in footage from Line 3 in Q4 2024'. Pre-filter reduces search space before ANN. |
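To make the RAM constraint concrete, the sketch below estimates the memory needed for uncompressed float32 vectors and maps a vector count to a starting index type using the size bands listed above. Both function names are hypothetical, the estimate excludes index overhead such as HNSW graph links, and the suggestion is only a first guess to be confirmed by benchmarking on the target hardware.

```python
def flat_index_bytes(num_vectors, dim, bytes_per_value=4):
    """RAM for uncompressed float32 vectors, excluding any index overhead."""
    return num_vectors * dim * bytes_per_value

def suggest_index_type(num_vectors, interactive=False):
    """First-guess index type from the size bands above; benchmark before committing."""
    if num_vectors <= 50_000:
        return "Flat"
    if interactive:
        return "HNSW"  # latency matters more than memory cost
    if num_vectors <= 5_000_000:
        return "IVF"
    return "IVF-PQ"

# One million 768-dimensional embeddings need about 3.07 GB before compression.
one_million_768d = flat_index_bytes(1_000_000, 768)
```

At an 8× reduction from product quantisation, the same million vectors would occupy roughly 384 MB, which is why IVF-PQ becomes attractive for archives in the hundreds of millions of frames.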
Performance reference
| Metric | Target | Notes |
| --- | --- | --- |
| Catalogue search latency (p99) | < 500 ms end-to-end | Query embedding extraction + index search + result formatting. CPU inference on embedding model; Faiss HNSW index in RAM. |
| Video archive search latency (p99) | < 10 s per multi-hour query | Depends on archive size and index type. Pre-built frame-level index; query does not require re-scanning footage. |
| Top-1 recall (parts matching, fine-tuned) | > 90% | On held-out test set from your catalogue. Zero-shot CLIP baseline benchmarked during discovery; fine-tuned model target set against your accuracy threshold. |
| Top-5 recall (parts matching, fine-tuned) | > 97% | Ranked list of 5 candidates. Human confirmation on top-5 is sufficient for most parts lookup workflows. |
| Index build time (100K vectors) | < 30 minutes | Offline. Index is pre-built, not built at query time. Incremental index updates available for new catalogue additions. |
| Index build time (1M vectors) | < 4 hours | Offline. Parallelised embedding extraction on GPU; Faiss index training on CPU. |
| Concurrent query throughput | > 50 QPS (catalogue search) | On a single T4 GPU server with Faiss HNSW index in RAM. Scales horizontally with additional workers. |
| Bandwidth to cloud | 0 KB/s | All embedding extraction, index storage, and search on customer infrastructure. |
The full pipeline — end to end
Ingestion pipeline (offline / incremental).
New catalogue items or archive frames enter the ingestion pipeline: image pre-processing (resize, normalise, colour correction for consistency), embedding extraction via the selected model (GPU-accelerated, batched for throughput), vector storage (embedding + metadata: item ID, source, timestamp, any structured attributes for filtered search), and Faiss index update (append for flat/HNSW indices; periodic re-train for IVF indices above a threshold). Ingestion pipeline runs offline for initial index build, then incrementally as new items arrive.
Query pipeline (online, latency-sensitive).
A query image arrives via the REST or gRPC API: pre-processing (same normalisation as ingestion — consistency here is critical; a mismatch between query and index pre-processing degrades recall significantly), embedding extraction (same model as ingestion; GPU or CPU depending on latency budget), Faiss nearest-neighbour search (ANN search against the index; optional metadata filter applied before or after search depending on filter selectivity), result formatting (top-k results with similarity scores, metadata, and optionally a thumbnail or archive frame reference), and API response. Total: under 500ms for catalogue search, under 10 seconds for archive search on appropriately sized indices.
Pre-processing consistency.
The single most common implementation error in visual search systems: query images are pre-processed differently from index images. A resize operation that uses bilinear interpolation at ingestion and nearest-neighbour at query time produces different embeddings for the same image. We enforce a shared pre-processing configuration used identically at both ingestion and query time, version-controlled alongside the embedding model.
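One way to enforce a shared configuration is to fingerprint it and store the fingerprint with the index, then refuse queries whose configuration does not match. The sketch below assumes that approach; the class, field names, and default values are illustrative and not taken from the actual deployment.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class PreprocessConfig:
    """Hypothetical pre-processing settings shared by ingestion and query."""
    width: int = 224
    height: int = 224
    interpolation: str = "bilinear"
    mean: tuple = (0.485, 0.456, 0.406)
    std: tuple = (0.229, 0.224, 0.225)

    def fingerprint(self):
        """Stable short hash of every setting, stored alongside the index."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:16]

def assert_same_preprocessing(index_fingerprint, query_config):
    """Fail loudly instead of silently returning degraded results."""
    if query_config.fingerprint() != index_fingerprint:
        raise RuntimeError("query pre-processing does not match the index")

ingest_config = PreprocessConfig()
stored_fingerprint = ingest_config.fingerprint()
assert_same_preprocessing(stored_fingerprint, PreprocessConfig())  # passes
```

A query path configured with `interpolation="nearest"` would produce a different fingerprint and be rejected before any embedding is computed.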
The hard problems we plan for
Fine-grained similarity in homogeneous catalogues.
A parts catalogue containing 200 M6 bolts of different lengths, coatings, and thread pitches is a fine-grained retrieval problem. General-purpose embedding models collapse visually similar items into nearly identical vectors — their top-k results are all M6 bolts but not necessarily the right one. We address this through domain fine-tuning (training the embedding model to distinguish the features that matter in your taxonomy) and through structured attribute filtering (pre-filtering the index by category or dimension before the embedding similarity search).
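Structured attribute filtering can be sketched as a pre-filter followed by similarity ranking over the survivors. The example assumes unit-length vectors, so the dot product equals cosine similarity; the item fields (`id`, `category`, `vector`) and the function name are hypothetical.

```python
def filtered_search(query, items, category, k=3):
    """Pre-filter by category, then rank the survivors by dot-product similarity."""
    candidates = [item for item in items if item["category"] == category]
    scored = [
        (item["id"], sum(q * v for q, v in zip(query, item["vector"])))
        for item in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 2-dimensional unit vectors for three catalogue items.
items = [
    {"id": "M6x20", "category": "bolt", "vector": [1.0, 0.0]},
    {"id": "M6x30", "category": "bolt", "vector": [0.6, 0.8]},
    {"id": "washer-6", "category": "washer", "vector": [1.0, 0.0]},
]
bolt_hits = filtered_search([1.0, 0.0], items, category="bolt")
```

The washer has the same vector as the best bolt, yet it never appears in the results because the filter removes it before similarity is computed.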
Query image quality variance.
Field photographs submitted by technicians vary enormously in lighting, angle, focus, and occlusion. We build a query quality estimator into the interface — real-time feedback to the user on image quality before submission, with guidance on how to improve it. For automated query sources, we control quality in hardware (fixed camera position, controlled lighting). We benchmark accuracy across your actual query quality distribution, not idealised conditions.
Index freshness as the catalogue grows.
A Faiss flat or HNSW index supports efficient incremental insertion. An IVF index requires periodic re-training as the index grows beyond the original training size. We design the index maintenance pipeline during architecture review, with a re-train trigger based on index size growth, and benchmark the re-train time against your operational window (overnight batch, weekend job, or continuous background job depending on ingestion rate).
Embedding model drift.
If the embedding model is updated — due to a fine-tuning cycle, a model architecture change, or a base model upgrade — all existing index embeddings are invalidated. They must be re-generated with the new model before the new model can be deployed to the query path. We version models and indices together, run index re-generation as a background job before model promotion, and gate promotion on a recall benchmark against the held-out test set.
Cross-mode query (catalogue and archive from a single query).
A technician photographs an unknown part and wants to know: (a) what is it (catalogue match) and (b) has it appeared in any QC failure frames (archive match). This requires searching two indices from a single query embedding. We support this through a unified query API that fans out to multiple indices in parallel and merges results — the embedding is extracted once and reused across both searches.
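The fan-out can be sketched with a thread pool: the embedding is computed once by the caller and passed to every index in parallel. The searcher callables below are stubs standing in for real index lookups, and the function name and the per-index grouping of results are assumptions about how the merged response might look.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out_search(embedding, searchers, k=5):
    """Run one embedding against several indices in parallel.

    `searchers` maps an index name to a callable(embedding, k) that returns
    a list of (item_id, score) pairs. Results are grouped by index name.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(searchers))) as pool:
        futures = {
            name: pool.submit(search, embedding, k)
            for name, search in searchers.items()
        }
        return {name: future.result() for name, future in futures.items()}

# Stub searchers standing in for the catalogue and QC archive indices.
def search_catalogue(embedding, k):
    return [("part-17", 0.92)][:k]

def search_qc_archive(embedding, k):
    return [("frame-0042", 0.81), ("frame-0107", 0.78)][:k]

merged = fan_out_search(
    [0.1, 0.2],
    {"catalogue": search_catalogue, "qc_archive": search_qc_archive},
    k=1,
)
```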
One architecture, seven operational profiles
| Vertical | Primary use case | Index type | Query latency target | Key constraint |
| --- | --- | --- | --- | --- |
| Manufacturing | Parts matching, defect similarity, QC archive retrieval | Catalogue + QC frame archive | < 500 ms (parts) / < 10 s (archive) | Fine-grained similarity on homogeneous parts; offline index build from existing QC data |
| Maintenance, Repair & Overhaul | Identify unknown part from photo; find compatible replacements | Catalogue embedding index | < 500 ms | Parts catalogue may be large and heterogeneous; multi-supplier normalisation |
| Retail & e-commerce | Customer photo → product results; 'shop the look' | Product catalogue index (HNSW) | < 300 ms interactive | High QPS; index must update as catalogue changes; mobile query image quality varies |
| Security & investigations | Find person or vehicle across camera archive | Appearance embedding index (video frames) | < 10 s per query | Non-biometric re-ID for person search; large archive size; multi-camera coverage |
| Logistics & supply chain | Identify shipment contents from photo; match against manifest | SKU / cargo catalogue index | < 500 ms | Heterogeneous item types; query images often low-quality photos |
| Brand protection & compliance | Identify counterfeit or non-compliant product from image | Approved product index | < 500 ms | High precision required; false positives have commercial consequences |
| Healthcare & medical devices | Identify device or instrument from photo; find similar cases in archive | Device catalogue + case archive | < 2 s | Regulatory sensitivity; on-prem only; no cloud processing |
Deployment topology
On-prem GPU server (standard)
Single T4 or A10 GPU server handling embedding extraction for both ingestion and query paths, with Faiss index resident in RAM. GPU handles embedding model inference; CPU handles Faiss search (Faiss is CPU-optimised). Appropriate for catalogues up to several million items and query volumes of 50 QPS or more. Zero cloud dependency.
CPU-only (small indices)
For catalogues under ~50K items and query volumes under 10 QPS, a CPU-only server is sufficient — embedding extraction on CPU is slower but viable at low volume, and Faiss flat search on CPU is fast at this index size. Lower hardware cost; appropriate for MRO lookup tools with intermittent usage.
Multi-server (large archives)
For very large archives (tens of millions of frames), the Faiss index may be sharded across multiple servers. Query API fans out to all shards in parallel, merges results, and returns the global top-k. Embedding extraction can be parallelised across GPU workers for faster archive ingestion.
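The merge step after a sharded search is small enough to show in full. This sketch assumes each shard returns its own top-k as (item_id, score) pairs with higher scores meaning closer matches; the function name is hypothetical.

```python
import heapq

def merge_shard_results(shard_results, k):
    """Combine per-shard (item_id, score) lists into the global top-k by score."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[1])

# Two shards, each returning its local best matches.
shard_a = [("frame-a1", 0.90), ("frame-a2", 0.50)]
shard_b = [("frame-b1", 0.80), ("frame-b2", 0.70)]
global_top3 = merge_shard_results([shard_a, shard_b], k=3)
```

The global result is only correct if every shard returns at least k candidates of its own, since any of them could belong in the final top-k.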
Air-gapped
Full pipeline with no outbound network. Appropriate for defence, government, and regulated manufacturing environments. Model updates and index re-generation via signed packages on a controlled inbound channel.
[Request Architecture Review →] [Talk to an Engineer]
Model promotion
A new or fine-tuned model is never promoted directly to production. It runs in shadow — embedding a sample of live queries and comparing its recall against the production model on the same queries — for a configurable burn-in period. Promotion is gated on: recall improvement on the held-out test set, no regression on any existing catalogue sub-category, index re-generation complete and validated. Rollback is one command — the previous model and its index are retained until the new model is confirmed stable.
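The three promotion gates can be expressed as a single check. The sketch below assumes recall is tracked per catalogue sub-category in a dictionary, with an `"overall"` key for the held-out test set aggregate; those names and the function itself are illustrative.

```python
def can_promote(candidate, production, index_ready):
    """Apply the three promotion gates to per-category recall dictionaries.

    `candidate` and `production` map a category name to recall on the
    held-out test set; the key "overall" holds the aggregate recall.
    """
    improved = candidate["overall"] > production["overall"]
    no_regression = all(
        candidate.get(category, 0.0) >= recall
        for category, recall in production.items()
        if category != "overall"
    )
    return improved and no_regression and index_ready

production = {"overall": 0.91, "fasteners": 0.88, "connectors": 0.93}
candidate_ok = {"overall": 0.94, "fasteners": 0.90, "connectors": 0.93}
candidate_regressed = {"overall": 0.94, "fasteners": 0.95, "connectors": 0.90}
```

The second candidate improves overall recall but regresses on connectors, so it would be held back even though its headline number is better.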
Integration surface
REST API: POST a query image, receive a JSON array of top-k results with similarity scores, item IDs, metadata, and optionally thumbnail URLs. OpenAPI spec shipped with every deployment.
gRPC API: lower-latency alternative for high-QPS deployments or embedded integrations. Proto definitions shipped with every deployment.
Batch query API: submit a list of query images, receive results asynchronously. Used for overnight QC retrospective runs or large investigative queries across a video archive.
Webhook / event output: push search results to downstream systems on query completion. Used for ERP integration (parts identification result triggers purchase order lookup), SIEM integration (archive search result triggers incident record), or QC system integration (defect match triggers review workflow).
Embedded UI component: a React or web component query interface that can be embedded in your existing QC dashboard, ERP portal, or internal tool. Configurable result display: thumbnail grid, ranked list, or map view for archive search with timestamp and camera labels.
SDKs: Python and TypeScript. Example clients and integration templates shipped with every deployment.
The integration surface is a first-class deliverable, not documentation written after the index ships.
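For orientation, a REST response of the kind described above might be assembled as follows. The field names (`rank`, `item_id`, `score`, `metadata`) are assumptions made for this sketch; the authoritative schema is the OpenAPI spec shipped with each deployment.

```python
import json

def build_response(hits, top_k=5):
    """Serialise the top-k hits as a JSON array, best match first."""
    ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)[:top_k]
    results = [
        {
            "rank": position,
            "item_id": hit["item_id"],
            "score": round(hit["score"], 4),
            "metadata": hit.get("metadata", {}),
        }
        for position, hit in enumerate(ranked, start=1)
    ]
    return json.dumps(results)

# Hypothetical raw hits from the index, in arbitrary order.
hits = [
    {"item_id": "P-2", "score": 0.71234},
    {"item_id": "P-1", "score": 0.93456, "metadata": {"supplier": "Acme"}},
]
body = build_response(hits, top_k=2)
```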
Security and data architecture
Data residency: all embedding extraction, index storage, and search on customer infrastructure. No query image, no catalogue image, and no archive frame transits our servers.
Index encryption: Faiss index and embedding vectors encrypted at rest (AES-256, customer-managed keys). Encrypted in transit between query client and search API (TLS 1.3).
RBAC: role-based access to the query API, the index management interface, and the fine-tuning pipeline. Read-only search role for end users; index admin role for operations team; model admin role for ML team.
Audit logging: every query logged with timestamp, query image hash (not the image itself), result set, and confidence scores. Every index update and model promotion logged. Logs structured for ingestion into your existing observability stack.
Query image retention: query images are not stored by default — only the query embedding and result set are logged. Configurable image retention for audit purposes, with a separate access-controlled store and a configurable retention period.
Air-gapped operation: full pipeline with no outbound network. Model and index updates via signed packages.
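The audit logging rule above, which records a hash of the query image in place of the image, can be sketched as follows. SHA-256 is an assumed choice of hash, and the function and field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(image_bytes, results, now=None):
    """Build a structured log line that stores the image hash, never the image."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {
            "timestamp": timestamp,
            "query_image_sha256": hashlib.sha256(image_bytes).hexdigest(),
            "results": [
                {"item_id": item_id, "score": score} for item_id, score in results
            ],
        },
        sort_keys=True,
    )

line = audit_record(
    b"fake-image-bytes",
    [("P-1", 0.93)],
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

Because the hash is deterministic, two log entries for the same image can be correlated later without the image ever being retained.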
Build vs buy
What a 3-month internal build looks like
A capable ML team can stand up CLIP embeddings and a Faiss flat index for a small parts catalogue in two weeks. Fine-tuning for domain accuracy, incremental index maintenance, recall monitoring, model versioning with index re-generation, the query interface, and ERP integration are the other ten weeks. Hidden costs: a held-out evaluation dataset your team needs to build, and the operational discipline to maintain a live embedding model and index as both the catalogue and the model evolve.
Where building makes sense
When visual search is a core product feature — an e-commerce platform, an industrial software product, a parts procurement platform — and you intend to invest in a permanent ML team with embedding model expertise. The build amortises.
Where buying makes sense
When visual search is an enabling capability — a quality team needs defect retrospective search, a maintenance team needs parts lookup, a security team needs archive investigation. The build does not amortise, and the ongoing model maintenance cost exceeds the engagement cost of a specialised partner.
What we won't do
Promise recall numbers we have not measured on your data — top-1 recall on a benchmark dataset is not a deployment accuracy guarantee.
Recommend a Faiss index type without benchmarking query latency on your hardware at your expected index size and QPS.
Ship a fine-tuned model without validating it against a held-out test set drawn from your catalogue — not our test set.
Lock you into our index format or model checkpoints — you own the Faiss index, the embedding model weights, and the training data.
Build a person search system using face recognition without confirming legal basis — person archive search in this system uses appearance re-ID (non-biometric), not face recognition. Face recognition on archive footage is a separate capability with separate compliance requirements.
Engagement model
Engagements start with a paid technical discovery: 2–4 weeks against your actual catalogue or archive data. We run a zero-shot embedding baseline on your data, benchmark top-1 and top-5 recall on a sample evaluation set, assess query image quality distribution, and specify the index architecture. By the end you have a reference architecture, a measured accuracy baseline, and a go/no-go decision based on actual numbers.
Production engagements run on milestone-based contracts with defined acceptance criteria: top-1 recall target on the held-out test set, query latency SLA (p99 at expected QPS), index build completion and validation. Ongoing operations (index maintenance, recall monitoring, fine-tuning cycles, on-call) run as a separate retainer scoped to your catalogue size, archive growth rate, and query volume.
[Request Architecture Review →] [Book a Discovery Call →]