
AI-Powered Building Intelligence
How we combined YOLO, SAM, and CLIP to build a computer vision platform that analyses architectural floorplans in seconds, not hours.
Manual review was the bottleneck
Architectural compliance checking is one of the most time-consuming stages in building design. Every floorplan needs to be reviewed against hundreds of regulations covering fire safety, accessibility, structural clearances, and spatial requirements.
For Arcus, this meant teams of reviewers spending hours on each plan, manually measuring corridors, counting fire exits, and cross-referencing room layouts against building codes. Errors were common. Feedback cycles were slow. Projects stalled waiting for sign-off.
They needed a system that could read a floorplan the way an experienced architect does, but at machine speed and with consistent accuracy.
A vision pipeline built for precision
We designed and built a multi-stage computer vision pipeline that combines three specialist AI models, each handling the part of the problem it does best.
YOLO handles fast, accurate object detection across 40+ element types. SAM delivers pixel-precise segmentation for room boundaries and spatial analysis. CLIP provides zero-shot classification, understanding what each space is without needing labelled examples for every room type.
The result is a system that ingests a floorplan, identifies every structural element, understands the spatial layout, and checks compliance against building regulations - all in under eight seconds.
The analysis pipeline
Five stages transform a raw floorplan into structured compliance data. Each stage builds on the previous, creating a progressively richer understanding of the document.
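The five-stage flow can be sketched as a chain of functions that each enrich a shared context. This is only an illustration of the orchestration idea: the stage function names, the dictionary keys, and the stub bodies are all hypothetical and stand in for the real model calls.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]


# Each stage is a stub (hypothetical): it adds its own key to the context
# and leaves everything produced by earlier stages untouched.
def intake(ctx: Dict) -> Dict:
    return {**ctx, "image": f"normalised:{ctx['source']}"}


def detect(ctx: Dict) -> Dict:
    return {**ctx, "detections": []}


def segment(ctx: Dict) -> Dict:
    return {**ctx, "masks": []}


def classify(ctx: Dict) -> Dict:
    return {**ctx, "room_types": {}}


def check_compliance(ctx: Dict) -> Dict:
    return {**ctx, "violations": []}


PIPELINE: List[Stage] = [intake, detect, segment, classify, check_compliance]


def run_pipeline(source: str, stages: List[Stage] = PIPELINE) -> Dict:
    """Run the stages in order, passing the growing context along."""
    ctx: Dict = {"source": source}
    for stage in stages:
        ctx = stage(ctx)
    return ctx
```

Keeping every stage behind the same signature makes it straightforward to swap or re-order a stage without touching the others.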
Document intake
Floorplans, CAD exports, and scanned drawings are normalised into a consistent format. OpenCV handles skew correction, noise reduction, and adaptive thresholding to produce clean binary images ready for detection.
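To give a feel for the adaptive thresholding step, here is a dependency-free sketch of mean adaptive thresholding on a small grayscale grid. It is not the production code: in OpenCV the corresponding call is `cv2.adaptiveThreshold`, and the `window` and `offset` values below are assumptions chosen for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values 0-255


def adaptive_threshold(img: Image, window: int = 3, offset: int = 5) -> Image:
    """Binarise each pixel against the mean of its local neighbourhood.

    A pixel becomes 255 (background) when it is brighter than the local
    mean minus `offset`, otherwise 0 (ink). Both parameters are illustrative.
    """
    h, w = len(img), len(img[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp the window to the image bounds.
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            vals = [img[j][i] for j in ys for i in xs]
            mean = sum(vals) / len(vals)
            out[y][x] = 255 if img[y][x] > mean - offset else 0
    return out
```

Because the cut-off follows the local mean rather than one global value, lines drawn on a bright scan and lines drawn on a dim, unevenly lit scan can both survive binarisation.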
Object detection with YOLO
A fine-tuned YOLOv8 model identifies structural elements - walls, doors, windows, columns, staircases, and fire exits - in real time. The model was trained on thousands of annotated floorplans across residential, commercial, and industrial building types.
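Detectors in the YOLO family usually produce several overlapping candidate boxes per object, which are then pruned with non-maximum suppression (NMS). Detection libraries normally do this internally; the sketch below only illustrates the idea. The `Detection` class and the threshold values are assumptions, not taken from the Arcus system.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Detection:  # hypothetical container for one detector output
    label: str
    score: float
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], iou_thresh: float = 0.5,
        min_score: float = 0.25) -> List[Detection]:
    """Greedy per-class NMS: keep the best box, drop same-class overlaps."""
    candidates = sorted((d for d in dets if d.score >= min_score),
                        key=lambda d: d.score, reverse=True)
    kept: List[Detection] = []
    for det in candidates:
        if all(det.label != k.label or iou(det.box, k.box) < iou_thresh
               for k in kept):
            kept.append(det)
    return kept
```

Suppression is applied per class here, so a door and a window that overlap on the drawing are both retained.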
Precision segmentation with SAM
Meta's Segment Anything Model (SAM) extracts a pixel-precise mask for each detected element. Room boundaries, corridors, and open-plan areas are isolated as separate masks, enabling reliable area calculations and spatial reasoning.
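The area calculation that a segmentation mask enables is simple once the drawing scale is known: count the pixels in the mask and multiply by the real-world area of one pixel. A minimal sketch, assuming the mask is a grid of 0/1 values and that a `mm_per_pixel` scale has already been recovered from the plan (both are assumptions for illustration):

```python
from typing import List

Mask = List[List[int]]  # 1 where the room is, 0 elsewhere


def mask_area_m2(mask: Mask, mm_per_pixel: float) -> float:
    """Convert a binary mask to an area in square metres."""
    pixels = sum(1 for row in mask for value in row if value)
    metres_per_pixel = mm_per_pixel / 1000.0
    return pixels * metres_per_pixel ** 2
```

The result is only as good as the scale estimate, so any error in `mm_per_pixel` is squared in the final area.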
Semantic understanding with CLIP
CLIP's zero-shot classification identifies room types, labels, and annotations without manual tagging. By matching visual regions against natural language descriptions, the system understands context: distinguishing a kitchen from a bathroom, or a fire escape from a standard exit.
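Zero-shot classification of this kind boils down to comparing one image embedding against the embeddings of several text prompts and picking the closest. The sketch below shows that comparison step only. The toy three-dimensional vectors used with it are made up; real embeddings would come from CLIP's image and text encoders, and the `temperature` value is an assumption.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(
    image_emb: List[float],
    prompt_embs: Dict[str, List[float]],
    temperature: float = 100.0,
) -> Tuple[str, Dict[str, float]]:
    """Return the best-matching label and a softmax over all labels."""
    sims = {label: cosine(image_emb, emb) for label, emb in prompt_embs.items()}
    top = max(sims.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {label: math.exp(temperature * (s - top)) for label, s in sims.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs
```

Adding a new room type then means adding one more text prompt to `prompt_embs`, which is why no extra labelled images are required.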
Compliance engine
Detected elements and spatial relationships are evaluated against building regulations. Corridor widths, fire exit distances, accessibility clearances, and room proportions are checked automatically, with violations flagged and localised on the original plan.
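A single rule in such an engine might look like the sketch below: compare a measured corridor width against a minimum and emit a localised violation when it falls short. The class names, fields, and the 1200 mm threshold are placeholders for illustration, not values from any specific building code or from the Arcus rule set.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in plan pixels

# Placeholder threshold; real limits depend on the applicable regulation.
MIN_CORRIDOR_WIDTH_MM = 1200.0


@dataclass
class Corridor:  # hypothetical measured element
    element_id: str
    width_mm: float
    bbox: BBox


@dataclass
class Violation:  # hypothetical rule result, localised via bbox
    rule: str
    element_id: str
    measured_mm: float
    required_mm: float
    bbox: BBox


def check_corridor_widths(
    corridors: List[Corridor],
    min_width_mm: float = MIN_CORRIDOR_WIDTH_MM,
) -> List[Violation]:
    """Flag every corridor narrower than the required minimum."""
    return [
        Violation("corridor_min_width", c.element_id, c.width_mm,
                  min_width_mm, c.bbox)
        for c in corridors
        if c.width_mm < min_width_mm
    ]
```

Carrying the bounding box through to the violation is what lets a reviewer see the problem marked on the original plan.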
From manual to machine-assisted
Before
- Manual measurement of every corridor, doorway, and room
- Hours per floorplan for a single compliance review
- Inconsistent results depending on the reviewer
- Feedback delays slowing down project timelines
- Errors caught late, often during construction
- No structured data output for downstream systems
After
- Automated detection of 40+ element types in seconds
- Sub-8-second analysis time per floorplan
- Consistent, repeatable results on every review
- Instant feedback with violations localised on the plan
- Issues caught at design stage, before construction begins
- Structured JSON output feeding BIM and project management tools
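As a rough idea of what a structured JSON report could contain, the sketch below assembles detections and violations into one document. The schema (field names such as `plan_id`, `elements`, and `summary`) is entirely hypothetical; the real output format is not described here.

```python
import json
from typing import Dict, List


def build_report(plan_id: str, elements: List[Dict],
                 violations: List[Dict]) -> str:
    """Serialise one plan's results as a JSON string (hypothetical schema)."""
    report = {
        "plan_id": plan_id,
        "elements": elements,
        "violations": violations,
        "summary": {
            "element_count": len(elements),
            "violation_count": len(violations),
            "compliant": not violations,
        },
    }
    return json.dumps(report, indent=2, sort_keys=True)
```

A flat, predictable structure like this is easy for downstream tools to validate and ingest.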
The stack behind the intelligence
Each tool was chosen for a specific role in the pipeline. No bloat, no unnecessary abstraction - just the right model for each job.
Detection (YOLOv8): Real-time detection of 40+ architectural element classes. Optimised for floorplan line art with custom anchor ratios and augmentation strategies tuned for technical drawings.
Segmentation (SAM): Pixel-precise boundary extraction for rooms and structural elements. Prompt-based segmentation allows operators to refine results interactively when edge cases arise.
Classification (CLIP): Natural language-driven classification eliminates the need for labelled training data when new room types or annotations appear. Handles multilingual plans without retraining.
Pre-processing (OpenCV): Pre-processing pipeline for document normalisation, adaptive thresholding, contour detection, and geometric measurement. The backbone that ensures consistent input quality across scan types.
Training: Custom training loops with mixed-precision training and distributed data parallel for efficient fine-tuning. Model versioning and A/B evaluation pipelines for continuous improvement.
Serving: Async Python API serving model predictions with sub-second latency. Batch processing endpoints for bulk plan analysis and webhook-based notifications for long-running jobs.
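Bulk analysis over an async API typically needs bounded concurrency so that a large batch does not overwhelm the model workers. A minimal sketch using only `asyncio`, where `analyse_plan` is a hypothetical stand-in for the real inference call and `max_concurrent` is an assumed tuning knob:

```python
import asyncio
from typing import Dict, List


async def analyse_plan(plan_id: str) -> Dict[str, str]:
    """Hypothetical stand-in for running the pipeline on one plan."""
    await asyncio.sleep(0.01)  # simulates model latency
    return {"plan_id": plan_id, "status": "done"}


async def analyse_batch(plan_ids: List[str],
                        max_concurrent: int = 4) -> List[Dict[str, str]]:
    """Analyse many plans, running at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(plan_id: str) -> Dict[str, str]:
        async with semaphore:
            return await analyse_plan(plan_id)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(p) for p in plan_ids))
```

A caller outside an event loop could run this with `asyncio.run(analyse_batch(["plan-1", "plan-2"]))`; inside a web framework the coroutine would be awaited directly.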
Detection accuracy across element types
Average analysis time per floorplan
Architectural element classes recognised
Reduction in manual review time
Proof that AI, when trained and tuned properly, can transform an industry
"We went from spending half a day on a single compliance review to getting results in seconds. The system catches things that even experienced reviewers miss, and the structured output feeds directly into our BIM workflows."
What made this work
Specialist models, not one-size-fits-all
Rather than forcing a single model to handle detection, segmentation, and classification, we used three purpose-built models and orchestrated them into a pipeline. Each model does what it does best.
Domain-specific training data
Off-the-shelf models struggle with technical drawings. We built custom training datasets from real architectural plans, annotated by people who understand building design - not generic crowdsourced labelling.
Human-in-the-loop refinement
SAM's prompt-based interface means operators can correct edge cases interactively. Those corrections feed back into the training pipeline, so the system improves with every plan it processes.
Safer, smarter design decisions
Arcus now processes hundreds of floorplans per week with consistent accuracy. Compliance issues that used to surface during construction are caught at the design stage, saving time, cost, and risk.
The structured output integrates with BIM platforms and project management tools, giving architects and safety teams a shared, data-driven view of every project.
Most importantly, the system keeps learning. Every plan it processes, every correction an operator makes, feeds back into the training pipeline. The models get better with use, not worse.
