    Predictive Maintenance AI: Preparing Sensor + Document Data On-Premise
predictive-maintenance · manufacturing · sensor-data · data-preparation · on-premise · segment:enterprise


    How to prepare predictive maintenance training data by combining sensor time-series, maintenance logs, and failure reports — on-premise in air-gapped manufacturing environments.

Ertas Team

    Predictive maintenance promises to replace scheduled maintenance with condition-based maintenance — intervening only when equipment actually shows signs of degradation, not on arbitrary calendar intervals. The AI models that enable this need training data that combines sensor readings (vibration, temperature, pressure, current) with maintenance records (what failed, why, and what was done about it).

    Preparing this training data is harder than it sounds. Sensor data and maintenance logs live in different systems, use different formats, and are owned by different teams. Bringing them together into a unified training dataset — on-premise, in an air-gapped manufacturing environment — requires a deliberate data preparation pipeline.

    The Two Data Streams

    Sensor Time-Series Data

    Manufacturing equipment generates continuous sensor readings:

    • Vibration sensors: Acceleration, velocity, displacement — primary indicators of bearing and rotating machinery health
    • Temperature sensors: Bearing temperatures, motor winding temperatures, process temperatures
    • Pressure sensors: Hydraulic pressure, pneumatic pressure, coolant pressure
    • Electrical sensors: Motor current, voltage, power factor — indicators of electrical and mechanical load
    • Flow sensors: Coolant flow, lubricant flow, process material flow

    This data is typically stored in a historian (OSIsoft PI, Aveva, InfluxDB) at sampling rates from once per second to hundreds of times per second. The volume is substantial — a single machine with 20 sensors sampling at 1 Hz generates 1.7 million data points per day.

    Maintenance Records

    Maintenance logs capture what happened to the equipment:

    • Work orders: Structured records of planned and unplanned maintenance activities
    • Technician notes: Free-text descriptions of symptoms, observations, and actions taken
    • Failure reports: Root cause analyses linking failures to contributing factors
    • Parts replacement records: What was replaced, when, and why
    • Equipment manuals: Manufacturer maintenance procedures and failure mode descriptions

    This data is typically stored in a CMMS (Computerized Maintenance Management System) like SAP PM, Maximo, or Fiix — and the free-text fields are where the real intelligence lives.

    The Data Preparation Challenge

    Aligning Time-Series with Events

    The core challenge: connecting sensor patterns to maintenance outcomes. A vibration spike on March 15 needs to be linked to the bearing replacement on March 17 to create a labeled training example.

    This alignment requires:

    • Timestamp synchronization between historian and CMMS (which often use different time zones or clock sources)
    • Event windowing: Defining the time window before a failure that constitutes the "pre-failure" pattern (hours? days? weeks?)
    • Normal vs. degrading: Labeling which sensor windows represent normal operation vs. progressive degradation
    • Multiple failure modes: The same equipment can fail in different ways, each with different sensor signatures
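The event-windowing idea above can be sketched in a few lines. This is a minimal illustration, not a production aligner: the 48-hour pre-failure horizon and the label names are assumptions chosen for the example, and real pipelines must first normalize historian and CMMS timestamps to one clock (here, both sides are timezone-aware UTC).

```python
from datetime import datetime, timedelta, timezone

# Illustrative horizon: sensor windows ending within 48 hours of a
# maintenance event are labeled "pre-failure". Tune per failure mode.
PRE_FAILURE_HORIZON = timedelta(hours=48)

def label_window(window_end, failure_times, horizon=PRE_FAILURE_HORIZON):
    """Label a sensor window by its distance to the next failure event.

    Both `window_end` and `failure_times` must be timezone-aware so that
    historian and CMMS clocks are compared on the same basis.
    """
    for t in failure_times:
        if window_end <= t <= window_end + horizon:
            return "pre-failure"
    return "normal"

# Bearing replacement logged in the CMMS on March 17.
failures = [datetime(2024, 3, 17, 9, 0, tzinfo=timezone.utc)]

w1 = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)  # 45 h before the event
w2 = datetime(2024, 3, 10, 12, 0, tzinfo=timezone.utc)  # well outside the horizon

print(label_window(w1, failures))  # pre-failure
print(label_window(w2, failures))  # normal
```

Choosing the horizon is itself a domain decision: bearing degradation may show up days ahead, while electrical faults can appear only hours before failure.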

    Extracting Intelligence from Maintenance Logs

    Technician notes contain critical information in unstructured form:

    "Checked motor on Line 3 press. Unusual vibration noted during operation. Replaced upper bearing assembly. Found significant wear on inner race. Possible contamination from coolant leak last month."

    From this note, a trained maintenance professional extracts:

    • Failure mode: Bearing wear
    • Root cause: Coolant contamination
    • Equipment: Line 3 press motor
    • Component: Upper bearing assembly
    • Severity: Significant (required replacement)

    An ML engineer without maintenance experience would miss the causal chain between the coolant leak and the bearing failure. This is why domain expert labeling is essential.
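A first automated pass over notes like the one above is often a keyword lookup before expert review. The sketch below is a deliberately simple, assumption-laden illustration: the pattern tables are invented for this example, and a real pipeline would use a curated failure-mode taxonomy maintained by reliability engineers.

```python
import re

# Illustrative failure-mode patterns; a production taxonomy would be
# far larger and curated by domain experts, not hard-coded.
FAILURE_MODES = {
    "bearing wear": [r"bearing.*wear", r"wear.*(inner|outer) race"],
    "coolant contamination": [r"contamination.*coolant", r"coolant leak"],
}

def extract_failure_modes(note):
    """Return the failure modes whose patterns appear in a free-text note."""
    text = note.lower()
    found = []
    for mode, patterns in FAILURE_MODES.items():
        if any(re.search(p, text) for p in patterns):
            found.append(mode)
    return found

note = ("Checked motor on Line 3 press. Unusual vibration noted during "
        "operation. Replaced upper bearing assembly. Found significant wear "
        "on inner race. Possible contamination from coolant leak last month.")

print(extract_failure_modes(note))  # ['bearing wear', 'coolant contamination']
```

Note what the lookup still misses: it flags both terms but cannot infer that the coolant leak *caused* the bearing wear. That causal link is exactly what the domain expert supplies during labeling.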

    Handling Class Imbalance

Equipment failures are (hopefully) rare events. In a healthy manufacturing operation, 95-99% of sensor readings represent normal operation; the failure patterns that predictive maintenance needs to detect live in the remaining 1-5%.

    Training data preparation must address this:

    • Oversampling failure windows
    • Synthetic data generation for rare failure modes
    • Careful windowing to maximize the use of degradation data (the gradual decline before failure is more useful than the failure moment itself)
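Simple random oversampling of the minority class can be sketched as follows. The 1:1 target ratio is an illustrative assumption — in practice the ratio is tuned per model and per failure mode, and techniques like SMOTE or window-level augmentation may be preferred.

```python
import random

def oversample(windows, labels, minority="pre-failure", seed=0):
    """Duplicate minority-class windows (with replacement) until the
    classes are balanced 1:1. A sketch, not a recommendation."""
    rng = random.Random(seed)
    minority_idx = [i for i, l in enumerate(labels) if l == minority]
    majority_idx = [i for i, l in enumerate(labels) if l != minority]
    extra = [rng.choice(minority_idx)
             for _ in range(len(majority_idx) - len(minority_idx))]
    all_idx = majority_idx + minority_idx + extra
    return [windows[i] for i in all_idx], [labels[i] for i in all_idx]

# 9 normal windows, 1 pre-failure window — a typical imbalance in miniature.
windows = list(range(10))
labels = ["normal"] * 9 + ["pre-failure"]

_, balanced = oversample(windows, labels)
print(balanced.count("normal"), balanced.count("pre-failure"))  # 9 9
```

Duplicated windows must never be split across the train/test boundary, or the evaluation will leak and overstate model performance.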

    The Pipeline

    Step 1: Sensor Data Export and Cleaning

    • Export from historian for relevant time ranges (typically 6-24 months per equipment unit)
    • Resample to consistent intervals if sensors have different sampling rates
    • Handle missing data (sensor dropouts, historian gaps)
    • Remove outliers caused by sensor malfunctions (not equipment issues)
    • Normalize across different sensor types and scales
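On a toy series, Step 1 might look like the pandas sketch below. The column name, the fixed outlier threshold, and the 1-minute target interval are all illustrative assumptions; real thresholds come from sensor specifications, not hard-coded constants.

```python
import numpy as np
import pandas as pd

# Toy 20-second vibration series with one dropout (NaN) and one
# sensor-malfunction spike (60.0) that is not a real equipment event.
idx = pd.date_range("2024-03-15 00:00", periods=10, freq="20s")
raw = pd.DataFrame(
    {"vibration_mm_s": [1.0, 1.1, np.nan, 1.2, 1.3, 1.2, 60.0, 1.4, 1.3, 1.2]},
    index=idx,
)

# Drop readings outside the sensor's plausible range (illustrative threshold).
clean = raw.where(raw <= 10)

# Resample to a consistent 1-minute interval; bridge short gaps only.
resampled = clean.resample("1min").mean().interpolate(limit=2)

# Z-score normalize so sensors with different scales are comparable.
normalized = (resampled - resampled.mean()) / resampled.std()

print(resampled)
```

The `limit=2` on interpolation is the important guardrail: long historian outages should stay missing and be excluded from training windows, not papered over with interpolated values.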

    Step 2: Maintenance Record Processing

    • Export work orders and technician notes from CMMS
    • Parse free-text fields for failure mode, root cause, component, and severity
    • Standardize terminology (same failure described differently by different technicians)
    • Map maintenance events to equipment identifiers that match sensor data
    • Build a timeline of maintenance events per equipment unit
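Terminology standardization is usually a curated synonym table rather than clever code. The mapping below is a minimal sketch with invented variants; in practice the table is built and maintained by reliability engineers as new spellings surface in the notes.

```python
# Illustrative synonym table: many technician spellings, one canonical term.
CANONICAL = {
    "bearing wear": {"bearing wear", "worn bearing", "bearing worn out",
                     "brg wear"},
    "seal leak": {"seal leak", "leaking seal", "seal failure"},
}

def standardize(term):
    """Map a technician's phrasing onto its canonical failure term.
    Unknown terms pass through unchanged for later expert review."""
    t = term.strip().lower()
    for canonical, variants in CANONICAL.items():
        if t in variants:
            return canonical
    return t

print(standardize("Brg wear"))       # bearing wear
print(standardize("motor burnout"))  # motor burnout (unmapped, flagged for review)
```

Letting unmapped terms pass through (rather than discarding them) keeps the review queue honest: every new spelling is a candidate for the taxonomy, not silently lost data.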

    Step 3: Data Fusion

    • Align sensor time-series with maintenance event timelines
    • Create labeled windows: "normal" (no maintenance event following), "pre-failure" (maintenance event within N days), "post-maintenance" (recently serviced)
    • Attach maintenance context to sensor windows (failure mode, root cause)
    • Create feature vectors combining sensor statistics (mean, std, peak, RMS, frequency features) with equipment metadata
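The sensor statistics named above can be computed per labeled window as in this sketch. The feature names are illustrative; frequency-domain features (FFT band energies) would be added the same way.

```python
import numpy as np

def window_features(samples):
    """Summary statistics for one sensor window: mean, standard
    deviation, peak absolute value, and RMS."""
    x = np.asarray(samples, dtype=float)
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "peak": float(np.abs(x).max()),
        "rms": float(np.sqrt(np.mean(x ** 2))),
    }

# A symmetric toy window: zero mean, unit RMS.
feats = window_features([1.0, -1.0, 1.0, -1.0])
print(feats)  # {'mean': 0.0, 'std': 1.0, 'peak': 1.0, 'rms': 1.0}
```

One feature dictionary per window per sensor, concatenated with equipment metadata, yields the feature vectors that feed traditional ML models in Step 5.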

    Step 4: Labeling and Validation

    • Maintenance engineers validate the alignment between sensor patterns and failure events
    • Domain experts review edge cases: Was this really a failure, or scheduled maintenance? Was the sensor reading genuine, or a measurement artifact?
    • Label remaining useful life (RUL) where equipment records support it

    Step 5: Export

    • Structured datasets for time-series classification models
    • Feature matrices for traditional ML models (Random Forest, XGBoost)
    • Sequence data for LSTM/Transformer-based models
    • Documentation of sensor-to-failure mappings for model interpretability
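One common shape for the sequence-model export is an `.npz` of labeled windows plus a JSON sidecar documenting what the sensors and labels mean. File names and the sidecar schema here are illustrative assumptions, not a fixed format.

```python
import json
import numpy as np

# Toy export: 4 labeled windows of 60 time steps across 3 sensors.
windows = np.random.default_rng(0).normal(size=(4, 60, 3))
labels = np.array(["normal", "normal", "pre-failure", "normal"])

np.savez("train_windows.npz", windows=windows, labels=labels)

# Sidecar documenting the sensor-to-failure mapping for interpretability.
with open("train_windows.meta.json", "w") as f:
    json.dump({
        "sensors": ["vibration", "temperature", "current"],
        "label_horizon_hours": 48,
        "label_classes": ["normal", "pre-failure"],
    }, f, indent=2)

loaded = np.load("train_windows.npz")
print(loaded["windows"].shape)  # (4, 60, 3)
```

The sidecar matters more than it looks: six months later, the mapping from array column to physical sensor is exactly the documentation model interpretability depends on.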

    Why This Must Happen On-Premise

    Predictive maintenance data preparation has three hard requirements for on-premise processing:

    1. OT network isolation: Sensor data lives on the operational technology network, which is typically air-gapped from the IT network and the internet
    2. Trade secret protection: Equipment configurations, process parameters, and failure patterns are competitive intelligence
    3. Data volume: Months of high-frequency sensor data from hundreds of machines is too large for practical cloud transfer

    The data preparation tool must work within these constraints — fully offline, on local infrastructure, with no cloud dependencies.

    Ertas Data Suite is designed for exactly this environment: a native desktop application that runs air-gapped, processes both structured sensor data and unstructured maintenance logs, and exports in formats suitable for predictive maintenance models. The interface is accessible to maintenance engineers and reliability professionals who understand the equipment — not just data scientists who understand the algorithms.

    Turn unstructured data into AI-ready datasets — without it leaving the building.

    On-premise data preparation with full audit trail. No data egress. No fragmented toolchains. EU AI Act Article 30 compliance built in.
