Project Willow*
Case Study

Project Willow*

A leading Nordic engineering consultancy faced a significant challenge: decades of valuable geotechnical investigation data locked in paper archives and legacy PDF formats. With over 30,000 historical ground investigation reports spanning more than 80 years, the organization needed an AI-powered solution to extract, georeference, and make this data accessible for modern infrastructure projects. Our team developed a comprehensive AI platform that combines advanced document processing, machine learning, and geographic information systems to transform this historical archive into a searchable, map-integrated database.

94%
Extraction Accuracy
80+
Years of Data
Centralized repository
Universal Database

The Challenge

Geotechnical engineering firms accumulate vast archives of ground investigation documents over decades of operations. These documents contain critical information about subsurface conditions, drilling points, soil compositions, and site assessments that remain valuable for future projects in the same geographic areas. Document Variety: Maps, boreplan documents, scanned pages, and handwritten records from different time periods dating back to the 1940s. Document Condition: Older documents with faded colors, physical damage, and varying scan quality. Coordinate Systems: Inconsistent or missing coordinate data, various UTM zones, and legacy reference systems. Data Accessibility: No searchable database or geographic visualization of historical drilling locations.

Our Approach

We developed a comprehensive AI-powered platform with three core capabilities: 1. Intelligent Document Processing: Our hybrid extraction approach combines multiple AI technologies to maximize accuracy across diverse document types. The system employs Optical Character Recognition (OCR), Azure Form Recognizer, and Vision Language Models (VLLM) to extract metadata, coordinates, drilling point information, and site details from both modern and historical documents. 2. Multi-Approach Georeferencing: The platform implements four distinct georeferencing approaches to handle varying document quality and available location data: Location Identification: Extracts coordinates from building numbers, property records, street names, and UTM coordinates. Coordinate Pairing: Precisely locates boreplans using extracted X/Y coordinates with calculated scale factors. Alternative Reference: Handles documents lacking coordinates using property records, building names, and street addresses. Pattern Matching: Uses Canny Edge Detection and image pattern matching algorithms to align boreplan images with satellite/map imagery.

The Results

This platform transforms a passive document archive into an active decision-support tool. Engineers can now instantly locate historical drilling data for any geographic area, reducing duplicate investigations and informing project planning with decades of institutional knowledge. The ability to visualize all historical drilling points on a single map provides unprecedented insight into ground conditions across the organization's service area, supporting better risk assessment and more efficient resource allocation for new projects.

Technology Stack

AI/ML: OCR, Azure Form Recognizer, Vision Language Models (VLLM), Canny Edge DetectionGIS Integration: ArcGIS API, coordinate transformation, map overlay systemsImage Processing: Pattern matching algorithms, image sharpening, blur detectionData Management: Universal database architecture, document clustering by assignment number

Interested in a similar solution?

Start a Conversation