SafeTensors Format Guide

A safe, fast storage format for model weights, developed by Hugging Face

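SafeTensors is a binary file format developed by Hugging Face for safely storing and loading machine learning model weights. Introduced in 2022, it was designed as a secure alternative to formats based on Python's pickle, such as PyTorch's .bin and .pt files. SafeTensors eliminates the arbitrary code execution vulnerability inherent in pickle deserialization. Unpickling a file can invoke any Python callable the file names, so simply loading a crafted pickle-based model file can run code embedded by its author. This is a serious security risk when loading models from untrusted sources. SafeTensors stores only tensor data and metadata, so there is no mechanism for embedding executable code.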
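The following minimal sketch shows why pickle is risky. It defines a hypothetical payload class whose `__reduce__` method instructs pickle to call `eval` while the data is being loaded. The expression used here is harmless; a real attack could name something like `os.system` instead. PyTorch's .bin and .pt checkpoints are loaded through the same unpickling machinery.

```python
import pickle


class Payload:
    """Hypothetical malicious object (for illustration only).

    __reduce__ tells pickle: "to rebuild me, call this callable with these
    arguments". The callable runs inside pickle.loads().
    """

    def __reduce__(self):
        # Harmless stand-in for an attacker-chosen command
        return (eval, ("sum(range(10))",))


blob = pickle.dumps(Payload())

# eval() executes here, during deserialization, before any caller can inspect the data
result = pickle.loads(blob)
print(result)  # 45
```

A SafeTensors reader, by contrast, only parses a JSON header and copies raw bytes. Nothing in the file is ever interpreted as a callable.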

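The SafeTensors format has a simple structure with three parts:

1. An 8-byte header size (a little-endian uint64).
2. A JSON header containing tensor metadata: name, dtype, shape, and byte offsets relative to the start of the data section.
3. The raw tensor data.

The format supports all common numeric types, including float32, float16, bfloat16, int8, int32, and int64. In the header these are written as codes such as F32, F16, BF16, I8, I32, and I64. Each tensor's bytes are stored contiguously in the file. This enables zero-copy loading via memory mapping: tensor data can be accessed directly from disk without first being copied into a separate memory buffer.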
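The layout above can be made concrete with a minimal pure-Python sketch that writes and reads such a file using only `struct` and `json`. The names `write_safetensors` and `read_safetensors` are hypothetical and are not part of the `safetensors` library. The sketch omits the validation a production reader should perform, such as checking for overlapping offsets and verifying dtypes. Padding the header with spaces to an 8-byte boundary is an assumption modeled on common practice.

```python
import json
import struct


def write_safetensors(path, tensors, metadata=None):
    """Sketch of the on-disk layout (not the official writer).

    tensors maps name -> (dtype_code, shape, raw_bytes), where raw_bytes is
    already little-endian and contiguous.
    """
    header = {}
    if metadata:
        header["__metadata__"] = metadata  # string-to-string map
    offset, chunks = 0, []
    for name, (dtype, shape, raw) in tensors.items():
        # data_offsets are [begin, end), relative to the start of the data section
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        offset += len(raw)
        chunks.append(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    # Pad with spaces so the data section starts on an 8-byte boundary
    header_bytes += b" " * (-len(header_bytes) % 8)
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte little-endian uint64
        f.write(header_bytes)
        for raw in chunks:
            f.write(raw)


def read_safetensors(path):
    """Return (metadata, {name: (dtype, shape, raw_bytes)})."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))  # trailing spaces are valid JSON whitespace
        data = f.read()
    metadata = header.pop("__metadata__", {})
    tensors = {}
    for name, info in header.items():
        begin, end = info["data_offsets"]
        tensors[name] = (info["dtype"], info["shape"], data[begin:end])
    return metadata, tensors


# Usage: round-trip a 2x2 float32 tensor
raw = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
write_safetensors("demo.safetensors", {"w": ("F32", [2, 2], raw)}, {"format": "pt"})
meta, loaded = read_safetensors("demo.safetensors")
dtype, shape, payload = loaded["w"]
print(meta, dtype, shape, struct.unpack("<4f", payload))
```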

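SafeTensors has been adopted as the default model format on the Hugging Face Hub, and existing pickle-based repositories can be converted automatically. The major machine learning frameworks can load SafeTensors files, including PyTorch, TensorFlow, JAX, and NumPy. Large models are also commonly sharded: a single model's weights are split across multiple files, and an index file (model.safetensors.index.json on the Hub) maps each tensor name to its shard. Sharding is essential for very large models that are impractical to store and distribute as a single file.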
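The sketch below shows how a loader might use such an index. The index contents are hypothetical, modeled on the `weight_map` layout of Hugging Face's `model.safetensors.index.json`. The helper names `shard_for` and `group_by_shard` are made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical index: "weight_map" maps each tensor name to the shard file holding it
INDEX_JSON = """
{
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}
"""


def shard_for(index, tensor_name):
    """Return the shard file that stores tensor_name."""
    return index["weight_map"][tensor_name]


def group_by_shard(index):
    """Invert the weight map so each shard file can be opened exactly once."""
    groups = defaultdict(list)
    for name, shard in index["weight_map"].items():
        groups[shard].append(name)
    return dict(groups)


index = json.loads(INDEX_JSON)
print(shard_for(index, "lm_head.weight"))
for shard, names in sorted(group_by_shard(index).items()):
    print(shard, len(names))
```

Grouping by shard lets a loader open each shard file once and read only the tensors it needs, for example with `safetensors.safe_open`.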

When to Use SafeTensors

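SafeTensors should be your default format for storing and distributing model weights during development and for GPU inference. It is the standard format on the Hugging Face Hub and is supported by all major training and inference frameworks. Use SafeTensors in the following cases:

- Saving training checkpoints.
- Sharing models with collaborators.
- Uploading models to a model registry.
- Deploying models on GPU inference servers such as vLLM, TGI, or Triton.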

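For security, choose SafeTensors over pickle-based PyTorch formats (.bin, .pt). A SafeTensors file cannot contain executable code, which removes the risk of supply-chain attacks through malicious model files. When you need full-precision weights for continued training or GPU-accelerated inference, choose SafeTensors over GGUF, which is designed primarily for quantized CPU inference. Thanks to memory mapping and the absence of deserialization overhead, SafeTensors files also load significantly faster than pickle files.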
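Much of the speed advantage comes from lazy access. Because the header records exact byte offsets, a reader can memory-map the file and touch only the bytes of the tensor it wants. The stdlib-only sketch below assumes a well-formed file containing F32 tensors; `read_one_tensor_lazily` is a hypothetical helper. Unlike the real library, it copies the values into Python floats, so it illustrates lazy loading rather than true zero-copy.

```python
import json
import mmap
import struct


def read_one_tensor_lazily(path, name):
    """Decode a single F32 tensor through mmap without reading the whole file."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (header_len,) = struct.unpack_from("<Q", mm, 0)
        header = json.loads(mm[8:8 + header_len])
        info = header[name]
        if info["dtype"] != "F32":
            raise ValueError("this sketch only decodes F32 tensors")
        begin, end = info["data_offsets"]
        data_start = 8 + header_len
        count = (end - begin) // 4  # 4 bytes per float32
        # unpack_from reads directly from the mapped region; the OS only has
        # to page in the parts of the file that are actually accessed
        return struct.unpack_from(f"<{count}f", mm, data_start + begin)


# Build a small two-tensor file in the layout described earlier
a = struct.pack("<3f", 1.0, 2.0, 3.0)
b = struct.pack("<2f", 10.0, 20.0)
header = json.dumps({
    "a": {"dtype": "F32", "shape": [3], "data_offsets": [0, 12]},
    "b": {"dtype": "F32", "shape": [2], "data_offsets": [12, 20]},
}).encode("utf-8")
header += b" " * (-len(header) % 8)  # align the data section to 8 bytes
with open("lazy_demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(header)) + header + a + b)

print(read_one_tensor_lazily("lazy_demo.safetensors", "b"))  # (10.0, 20.0)
```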

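SafeTensors is less suitable in two situations: when the deployment target is local CPU-based inference (use GGUF there), or when you need cross-framework portability and runtime optimizations (consider ONNX). It is also not the best choice for very small models, where format overhead is a significant fraction of the file size, though this rarely matters in practice.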

Schema / Structure

```json
{
  "__metadata__": {
    "format": "pt"
  },
  "model.embed_tokens.weight": {
    "dtype": "F16",
    "shape": [32000, 4096],
    "data_offsets": [0, 262144000]
  },
  "model.layers.0.self_attn.q_proj.weight": {
    "dtype": "F16",
    "shape": [4096, 4096],
    "data_offsets": [262144000, 295698432]
  },
  "model.layers.0.self_attn.k_proj.weight": {
    "dtype": "F16",
    "shape": [4096, 4096],
    "data_offsets": [295698432, 329252864]
  }
}
```

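The JSON header of a SafeTensors file, showing per-tensor metadata (name, dtype, shape, and byte offsets). The optional __metadata__ entry holds free-form string key/value pairs.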
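The `data_offsets` in the header above follow directly from each tensor's shape and dtype. Tensors are packed back to back, and each one occupies the product of its dimensions times the dtype's byte width. This sketch recomputes the example offsets. The `DTYPE_SIZES` table covers only a subset of dtype codes and is assumed here for illustration.

```python
import math

# Byte width per element for a subset of SafeTensors dtype codes (illustrative)
DTYPE_SIZES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2,
               "I64": 8, "I32": 4, "I8": 1, "U8": 1}


def compute_offsets(entries):
    """entries: list of (name, dtype, shape) in storage order.

    Returns {name: [begin, end]} for tensors packed back to back.
    A scalar (shape []) takes one element, since math.prod([]) == 1.
    """
    offsets, cursor = {}, 0
    for name, dtype, shape in entries:
        nbytes = math.prod(shape) * DTYPE_SIZES[dtype]
        offsets[name] = [cursor, cursor + nbytes]
        cursor += nbytes
    return offsets


offsets = compute_offsets([
    ("model.embed_tokens.weight", "F16", [32000, 4096]),
    ("model.layers.0.self_attn.q_proj.weight", "F16", [4096, 4096]),
    ("model.layers.0.self_attn.k_proj.weight", "F16", [4096, 4096]),
])
for name, span in offsets.items():
    print(name, span)
```

For example, the embedding table needs 32000 × 4096 × 2 = 262,144,000 bytes, which is exactly its end offset.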

Example Data

```python
import torch
from safetensors.torch import save_file, load_file
from transformers import AutoModelForCausalLM

# Save model weights to SafeTensors (metadata keys and values must be strings)
tensors = {
    "model.embed_tokens.weight": torch.randn(32000, 4096, dtype=torch.float16),
    "model.layers.0.self_attn.q_proj.weight": torch.randn(4096, 4096, dtype=torch.float16),
    "model.layers.0.self_attn.v_proj.weight": torch.randn(4096, 4096, dtype=torch.float16),
    "lm_head.weight": torch.randn(32000, 4096, dtype=torch.float16),
}
metadata = {"format": "pt", "model_type": "llama"}
save_file(tensors, "model.safetensors", metadata=metadata)

# Load weights: the file is memory-mapped, then tensors are placed on the target device
device = "cuda:0" if torch.cuda.is_available() else "cpu"
loaded = load_file("model.safetensors", device=device)
print(loaded["model.embed_tokens.weight"].shape)  # torch.Size([32000, 4096])

# Load a Hugging Face model from its SafeTensors weights
# (gated repo: requires accepting the license and authenticating)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.float16,
    use_safetensors=True,  # require .safetensors weights; already preferred by default when present
)
```

Saving, loading, and using SafeTensors with PyTorch and Hugging Face Transformers

Ertas Support

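Ertas Studio uses SafeTensors as the internal format for model checkpoints during cloud training. Checkpoints are saved as SafeTensors for security and performance, which guarantees that model artifacts cannot contain embedded malicious code. After training, a SafeTensors model can be converted to GGUF for export to local inference, or kept in SafeTensors format for GPU deployment.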

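The Vault in Ertas Studio stores SafeTensors model files with encryption at rest and access controls. It provides a secure model registry that adds organization-level security controls on top of the guarantees the SafeTensors format already provides.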

