ONNX Format Guide

An open neural network exchange format for cross-platform inference

Model Weights

Specification

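ONNX (Open Neural Network Exchange) is an open standard format for representing machine learning models. Originally developed by Microsoft and Facebook in 2017 and now governed by the Linux Foundation, ONNX defines a common set of operators, data types, and a computation graph format so that a model trained in one framework can be deployed in another. The format is serialized with Protocol Buffers (protobuf) and supports a wide range of model architectures, including convolutional neural networks, recurrent networks, Transformers, and traditional machine learning models.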

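An ONNX model is represented as a directed acyclic graph (DAG): nodes represent operations (convolution, matrix multiplication, activation functions, and so on), edges represent the tensors that flow between operations, and the graph has defined inputs and outputs. ONNX operator sets (opsets) are versioned, which lets a model declare which version of the operator definitions it requires. As of opset 21, ONNX defines more than 200 operators spanning neural network operations, mathematical functions, tensor manipulation, and control flow.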
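To make the graph model concrete, here is a minimal pure-Python sketch of nodes connected by named tensors. It is a toy illustration, not the ONNX API: the `Node` class, the `OPS` registry, and `run_graph` are hypothetical names, and plain floats stand in for tensors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    op_type: str          # operator name, e.g. "Mul"
    inputs: List[str]     # names of the tensors this node consumes
    outputs: List[str]    # names of the tensors this node produces


# Hypothetical mini operator registry; scalars stand in for tensors.
OPS: Dict[str, Callable[..., float]] = {
    "Mul": lambda a, b: a * b,
    "Add": lambda a, b: a + b,
    "Relu": lambda a: max(a, 0.0),
}


def run_graph(nodes, feeds, output_names):
    """Execute nodes in dependency order; edges are implied by tensor names."""
    values = dict(feeds)
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(i in values for i in n.inputs)]
        if not ready:
            raise ValueError("graph has a cycle or a missing input")
        for n in ready:
            values[n.outputs[0]] = OPS[n.op_type](*(values[i] for i in n.inputs))
            pending.remove(n)
    return [values[name] for name in output_names]


nodes = [
    Node("Relu", ["z"], ["y"]),        # deliberately listed out of order
    Node("Mul", ["x", "w"], ["xw"]),
    Node("Add", ["xw", "b"], ["z"]),
]
result = run_graph(nodes, {"x": 2.0, "w": -3.0, "b": 1.0}, ["y"])
print(result)
```

The sketch resolves execution order at run time for illustration only; a real ONNX file stores its nodes already topologically sorted, so a runtime can walk them in sequence.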

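The format includes an extensible metadata system for model documentation, training information, and custom properties. ONNX also supports quantized data types (INT8, UINT8) and mixed-precision representations, enabling models to be optimized for deployment on resource-constrained devices. The ONNX Model Zoo provides a collection of pretrained models in ONNX format covering image classification, object detection, NLP, speech recognition, and other common tasks.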
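The arithmetic behind INT8 quantization can be sketched in a few lines. This follows the affine scheme that, to my understanding, ONNX's QuantizeLinear and DequantizeLinear operators use (scale, zero point, saturation). The function names and the sample `scale` are illustrative assumptions, and scalars stand in for tensors.

```python
def quantize_linear(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to a saturated integer: round(x / scale) + zero_point."""
    q = round(x / scale) + zero_point   # Python's round() rounds half to even
    return max(qmin, min(qmax, q))      # saturate to the INT8 range


def dequantize_linear(q, scale, zero_point):
    """Approximate inverse of quantize_linear."""
    return (q - zero_point) * scale


scale, zero_point = 0.05, 0             # assumed example parameters
values = [-1.0, 0.0, 0.26, 10.0]
quantized = [quantize_linear(v, scale, zero_point) for v in values]
restored = [dequantize_linear(q, scale, zero_point) for q in quantized]
print(quantized)
print(restored)
```

Note that 10.0 falls outside the representable range for this scale and saturates at 127, which is why choosing the scale from the real value range of a tensor matters.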

When to Use ONNX

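ONNX is the right choice when you need cross-platform model deployment: training in PyTorch or TensorFlow and deploying on different hardware or in a different runtime environment. ONNX Runtime (ORT) provides optimized inference across CPUs (x86, ARM), GPUs (NVIDIA, AMD, Intel), NPUs, and specialized accelerators. If your deployment targets include Windows applications (via DirectML), mobile devices (via ONNX Runtime Mobile), web browsers (via ONNX Runtime Web/WebAssembly), or edge devices, ONNX offers a single model format that works for all of them.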
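In ONNX Runtime, hardware targets are selected through execution providers. The sketch below shows one possible way to choose providers with a CPU fallback. The `pick_providers` helper and the preference order are assumptions made for illustration, and `model.onnx` in the closing comment is a placeholder path.

```python
from typing import Iterable, List

# Assumed preference order; adjust to the hardware you actually target.
PREFERRED = (
    "CUDAExecutionProvider",     # NVIDIA GPUs
    "DmlExecutionProvider",      # DirectML on Windows
    "CoreMLExecutionProvider",   # Apple devices
    "CPUExecutionProvider",      # always-available fallback
)


def pick_providers(available: Iterable[str], preferred=PREFERRED) -> List[str]:
    """Return the preferred providers that are available, in priority order."""
    available = set(available)
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed; assume CPU only for this sketch.
    available = ["CPUExecutionProvider"]

providers = pick_providers(available)
print(providers)
# With onnxruntime installed, the list can be passed to a session:
# session = ort.InferenceSession("model.onnx", providers=providers)
```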

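Choose ONNX when inference performance is critical and you want to benefit from ONNX Runtime's extensive graph optimizations, which include operator fusion, constant folding, layout optimization, and quantization. ONNX Runtime consistently ranks among the fastest inference engines across a variety of model architectures and hardware platforms. It is particularly strong for Transformer encoder models (BERT, RoBERTa, DeBERTa) used for classification, NER, and embedding tasks.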
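As an illustration of one of these optimizations, the following toy sketch performs constant folding on a list of nodes: any node whose inputs are all known constants is evaluated ahead of time and removed from the graph. This is not ONNX Runtime's implementation; the tuple-based node format and the `fold_constants` function are hypothetical.

```python
import operator

# Hypothetical operator table; integers stand in for tensors.
OPS = {"Add": operator.add, "Mul": operator.mul}


def fold_constants(nodes, constants):
    """Precompute nodes whose inputs are all constants.

    nodes: list of (op_type, input_names, output_name), topologically sorted.
    Returns the nodes that still need to run and the enlarged constant table.
    """
    constants = dict(constants)
    remaining = []
    for op_type, inputs, output in nodes:
        if all(name in constants for name in inputs):
            constants[output] = OPS[op_type](*(constants[n] for n in inputs))
        else:
            remaining.append((op_type, inputs, output))
    return remaining, constants


nodes = [
    ("Mul", ["two", "three"], "six"),
    ("Add", ["six", "one"], "seven"),
    ("Mul", ["x", "seven"], "y"),      # depends on the runtime input "x"
]
remaining, constants = fold_constants(nodes, {"one": 1, "two": 2, "three": 3})
print(remaining)
print(constants["seven"])
```

Only the final multiplication survives, because it is the only node that depends on a value supplied at inference time.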

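ONNX is less suitable for very large generative language models (more than 7B parameters), where specialized inference engines such as vLLM, TensorRT-LLM, or llama.cpp deliver better performance through techniques such as continuous batching, PagedAttention, and speculative decoding. ONNX also has a conversion gap: not every PyTorch operation is supported by the ONNX exporter, and dynamic control flow and some custom operators in particular may require workarounds during export.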

Schema / Structure

protobuf

// ONNX Model structure (simplified from onnx.proto3)
message ModelProto {
  int64 ir_version = 1;           // IR version (e.g., 9; newer ONNX releases use higher values)
  repeated OperatorSetIdProto opset_import = 8;
  string producer_name = 2;       // e.g., "pytorch"
  string producer_version = 3;
  string domain = 4;
  int64 model_version = 5;
  string doc_string = 6;
  GraphProto graph = 7;           // The computation graph
  repeated StringStringEntryProto metadata_props = 14;
}

message GraphProto {
  repeated NodeProto node = 1;      // Operations in the graph, topologically sorted
  string name = 2;
  repeated TensorProto initializer = 5; // Pretrained weights
  repeated ValueInfoProto input = 11;
  repeated ValueInfoProto output = 12;
}

message NodeProto {
  repeated string input = 1;
  repeated string output = 2;
  string op_type = 4;              // e.g., "Conv", "MatMul", "Relu"
  repeated AttributeProto attribute = 5;
}

Simplified ONNX protobuf schema showing the model, graph, and node structures

Example Data

python

import torch
import onnx
import onnxruntime as ort
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Export a HuggingFace model to ONNX
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

dummy_input = tokenizer("This is a test", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy_input["input_ids"], dummy_input["attention_mask"]),
    "sentiment_model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "attention_mask": {0: "batch", 1: "seq"},
                  "logits": {0: "batch"}},
    opset_version=17,
)

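# Validate the exported graph, then run inference with ONNX Runtime
onnx.checker.check_model(onnx.load("sentiment_model.onnx"))
session = ort.InferenceSession("sentiment_model.onnx", providers=["CPUExecutionProvider"])
inputs = tokenizer("Great product, highly recommend!", return_tensors="np")
# Feed only the inputs the graph declares (input_ids, attention_mask)
feed = {i.name: inputs[i.name] for i in session.get_inputs()}
outputs = session.run(None, feed)
print(f"Logits: {outputs[0]}")  # shape (1, 2): [[negative_score, positive_score]]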

Exporting a HuggingFace sentiment model to ONNX and running inference with ONNX Runtime
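The model returns raw logits, so a small post-processing step is usually needed to get a label and a confidence score. The following sketch applies a softmax in plain Python. The label order is taken from the `[[negative_score, positive_score]]` comment in the example above, and the input logits are made-up values rather than real model output.

```python
import math

# Label order assumed from the example's comment: index 0 negative, 1 positive.
LABELS = ("NEGATIVE", "POSITIVE")


def softmax(logits):
    """Convert logits to probabilities, shifting by the max for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits_row):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits_row)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, confidence = classify([-4.0, 4.0])   # made-up logits for illustration
print(label, round(confidence, 4))
```

With the session from the example, the row to pass would be `outputs[0][0]`, converted to a list if preferred.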

Ertas Support

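Ertas Studio supports ONNX as an export target for trained models, enabling deployment across the diverse runtime environments that ONNX Runtime supports. Models trained through the Ertas cloud training pipeline can be exported to ONNX format and optimized for your target deployment platform, whether that is CPU inference on edge devices, GPU inference in the data center, or browser inference via WebAssembly.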

Related Resources

Glossary

Inference
