Docker容器健康检查与自动重启配置实战

Docker容器健康检查与自动重启配置实战

为什么需要健康检查?

NAS上的Docker容器可能因为各种原因崩溃:内存泄漏、依赖服务不可用、配置错误等。健康检查机制可以:

  • 自动检测容器健康状�?- 故障时自动重�?- 保障服务持续可用

Docker健康检查详�?

HEALTHCHECK指令

Dockerfile中可以通过HEALTHCHECK定义健康检查:

FROM nginx:latest

# �?0秒检查一次,超时5秒,连续3次失败视为不健康
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost/ || exit 1

健康检查选项

参数说明示例�?
–interval检查间�?30s
–timeout单次检查超�?5s
–start-period启动等待时间10s
–retries连续失败次数3

docker-compose中的健康检�?

标准配置示例

version: '3.8'

services:
  nginx:
    image: nginx:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    restart: unless-stopped

  mysql:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: password
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 30s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  redis:
    image: redis:7
    command: redis-server --appendonly yes
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 30s
      timeout: 3s
      retries: 3
    restart: unless-stopped

常用服务健康检查命�?

服务检查命�?
Nginxcurl -f http://localhost/
MySQLmysqladmin ping -h localhost
PostgreSQLpg_isready -U postgres
Redisredis-cli ping
Elasticsearchcurl -f http://localhost:9200/_cluster/health
Jellyfincurl -f http://localhost:8096/health

restart policy(重启策略)

四种策略对比

策略行为适用场景
no不重�?临时测试
on-failure失败时重�?开发环�?
always始终重启需要持续运�?
unless-stopped手动停止外始终重�?生产环境推荐

推荐配置

services:
  homeassistant:
    image: homeassistant/home-assistant:latest
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8123/"]
      interval: 30s
      timeout: 10s
      retries: 3

  jellyfin:
    image: jellyfin/jellyfin:latest
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8096/health"]
      interval: 60s
      timeout: 10s
      retries: 3

监控与告�?

查看容器健康状�?

# 查看所有容器状�?docker ps -a

# 查看健康检查详�?docker inspect --format='{{.State.Health}}' container_name

# 查看健康检查日�?docker inspect container_name | grep -A 10 "Health"

配合Portainer监控

Portainer提供了可视化健康状态查看:

  • 绿色:健�?- 黄色:正在检�?- 红色:不健康

告警脚本

#!/bin/bash
# health-alert.sh

CONTAINER=$1
STATUS=$(docker inspect --format='{{.State.Health.Status}}' $CONTAINER)

if [ "$STATUS" = "unhealthy" ]; then
    curl -s -X POST "https://notify.example.com/send" \
        -d "message=$CONTAINER is unhealthy!"
fi

常见问题解决

问题1:健康检查误�?

原因:检查间隔太短,容器启动�?解决:增�?--start-period �?

问题2:curl命令不存�?

解决:使用wget或安装curl�?```dockerfile RUN apt-get update && apt-get install -y curl


### 问题3:服务已停止但容器显示健�?
**原因**:健康检查命令有问题
**解决**:仔细测试检查命令,确保能正确检测服务状�?
## 总结

健康检查是保障NAS Docker服务稳定运行的关键。建议:
1. 所有长期运行的服务都配置健康检�?2. 使用 `restart: unless-stopped` 策略
3. 定期检查容器健康状态日�?
**相关文章**�?- [NAS Docker应用推荐](/guide/nas-docker-apps-recommend-2026)
- [Docker最佳实践](/guide/docker-best-practice)
← 返回首页