# Freeleaps PVC Backup Job
This job creates daily snapshots of critical PVCs in the Freeleaps production environment using the Azure Disk CSI snapshot feature.
## Overview
The backup job runs daily at 00:00 PST (Pacific Standard Time, 08:00 UTC) and creates snapshots of the following PVCs:
- `gitea-shared-storage` in namespace `freeleaps-prod`
- `data-freeleaps-prod-gitea-postgresql-ha-postgresql-0` in namespace `freeleaps-prod`
## Components
- **backup_script.py**: Python script that creates snapshots and monitors their status
- **Dockerfile**: Container image definition
- **build.sh**: Script to build the Docker image
- **deploy-argocd.sh**: Script to deploy via ArgoCD
- **helm-pkg/**: Helm Chart for Kubernetes deployment
- **argo-app/**: ArgoCD Application configuration
## Features
- ✅ Creates snapshots with timestamp-based naming (YYYYMMDD format)
- ✅ Uses PST timezone for snapshot naming
- ✅ Monitors snapshot status until ready
- ✅ Comprehensive logging to console
- ✅ Error handling and retry logic
- ✅ RBAC permissions for secure operation
- ✅ Resource limits and security context
- ✅ Concurrency control (prevents overlapping jobs)
- ✅ Helm Chart for flexible configuration
- ✅ ArgoCD integration for GitOps deployment
- ✅ Incremental snapshots for cost efficiency
## Building and Deployment
### Option 1: ArgoCD Deployment (Recommended)
#### 1. Build and Push Docker Image
```bash
# Make build script executable
chmod +x build.sh
# Build the image
./build.sh
# Push to registry
docker push freeleaps-registry.azurecr.io/freeleaps-pvc-backup:latest
```
#### 2. Deploy via ArgoCD
```bash
# Deploy ArgoCD Application
./deploy-argocd.sh
```
#### 3. Monitor in ArgoCD
```bash
# Check ArgoCD application status
kubectl get applications -n freeleaps-devops-system
# Access ArgoCD UI
kubectl port-forward svc/argocd-server -n freeleaps-devops-system 8080:443
```
Then visit `https://localhost:8080` in your browser.
### Option 2: Direct Helm Deployment
#### 1. Build and Push Docker Image
```bash
# Build the image
./build.sh
# Push to registry
docker push freeleaps-registry.azurecr.io/freeleaps-pvc-backup:latest
```
#### 2. Deploy with Helm
```bash
# Deploy using Helm Chart
helm install freeleaps-data-backup ./helm-pkg/freeleaps-data-backup \
  --values helm-pkg/freeleaps-data-backup/values.prod.yaml \
  --namespace freeleaps-prod \
  --create-namespace
```
## Monitoring
### Check CronJob Status
```bash
kubectl get cronjobs -n freeleaps-prod
```
### Check Job History
```bash
kubectl get jobs -n freeleaps-prod
```
### View Job Logs
```bash
# Get the latest job name
kubectl get jobs -n freeleaps-prod --sort-by=.metadata.creationTimestamp
# View logs
kubectl logs -n freeleaps-prod job/freeleaps-data-backup-<timestamp>
```
### Check Snapshots
```bash
kubectl get volumesnapshots -n freeleaps-prod
```
## Configuration
### Schedule
The job runs daily at 00:00 PST. To modify the schedule, edit the `cronjob.schedule` field in `helm-pkg/freeleaps-data-backup/values.prod.yaml`:
```yaml
cronjob:
  schedule: "0 8 * * *"  # UTC 08:00 = PST 00:00
```
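The cron expression is written in UTC, so the local run time depends on the offset in effect. The sketch below shows the arithmetic, assuming the CronJob is evaluated in UTC (as the comment in the values file states) and using fixed offsets for PST and PDT; `local_run_time` is a hypothetical helper, not part of `backup_script.py`.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration (no tz database lookup).
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")


def local_run_time(utc_hour: int, utc_minute: int, tz: timezone) -> str:
    """Return the local wall-clock time (HH:MM) of a daily UTC cron time."""
    # The date is arbitrary; only the time-of-day conversion matters.
    run_utc = datetime(2025, 1, 1, utc_hour, utc_minute, tzinfo=timezone.utc)
    return run_utc.astimezone(tz).strftime("%H:%M")


print(local_run_time(8, 0, PST))  # 00:00
print(local_run_time(8, 0, PDT))  # 01:00
```

Under these assumptions, a fixed `0 8 * * *` schedule lands at 01:00 local time while daylight saving time is in effect.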
### PVCs to Backup
To add or remove PVCs, modify the `backup.pvcs` list in `helm-pkg/freeleaps-data-backup/values.prod.yaml`:
```yaml
backup:
  pvcs:
    - "gitea-shared-storage"
    - "data-freeleaps-prod-gitea-postgresql-ha-postgresql-0"
    # Add more PVCs here
```
### Snapshot Class
The job uses the `csi-azuredisk-vsc` snapshot class with incremental snapshots enabled. This can be modified in `helm-pkg/freeleaps-data-backup/values.prod.yaml`:
```yaml
backup:
  snapshotClass: "csi-azuredisk-vsc"
```
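For reference, a `VolumeSnapshot` object that points at this snapshot class could look like the following. This is a minimal sketch of the manifest shape only; `build_snapshot_manifest` is a hypothetical helper, and the actual script may build or submit the object differently.

```python
import json


def build_snapshot_manifest(
    pvc_name: str,
    snapshot_name: str,
    namespace: str = "freeleaps-prod",
    snapshot_class: str = "csi-azuredisk-vsc",
) -> dict:
    """Build a VolumeSnapshot manifest for one PVC (hypothetical helper)."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": snapshot_name, "namespace": namespace},
        "spec": {
            # The snapshot class selects the CSI driver and its parameters.
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


manifest = build_snapshot_manifest(
    "gitea-shared-storage", "gitea-shared-storage-snapshot-20250805"
)
print(json.dumps(manifest, indent=2))
```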
### Resource Limits
Resource limits can be configured in `helm-pkg/freeleaps-data-backup/values.prod.yaml`:
```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "200m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```
## How It Works
### Snapshot Naming
Snapshots are named using the format: `{PVC_NAME}-snapshot-{YYYYMMDD}`
Examples:
- `gitea-shared-storage-snapshot-20250805`
- `data-freeleaps-prod-gitea-postgresql-ha-postgresql-0-snapshot-20250805`
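The naming rule can be sketched as below. It assumes a fixed UTC-8 offset for PST and a timezone-aware input; `snapshot_name` is a hypothetical helper, and the real script may derive the date differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Fixed UTC-8 offset for PST (assumption; ignores daylight saving time).
PST = timezone(timedelta(hours=-8), "PST")


def snapshot_name(pvc_name: str, now: Optional[datetime] = None) -> str:
    """Return '{PVC_NAME}-snapshot-{YYYYMMDD}' using the PST calendar date.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    return f"{pvc_name}-snapshot-{now.astimezone(PST):%Y%m%d}"


# 08:00 UTC on 2025-08-05 is 00:00 PST on the same date.
run_time = datetime(2025, 8, 5, 8, 0, tzinfo=timezone.utc)
print(snapshot_name("gitea-shared-storage", run_time))
# gitea-shared-storage-snapshot-20250805
```

Because the name contains only the date, a second run on the same PST day would produce the same snapshot name.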
### Processing Flow
1. **PVC Verification**: Each PVC is verified to exist before processing
2. **Snapshot Creation**: Individual snapshots are created for each PVC
3. **Status Monitoring**: Each snapshot is monitored until ready
4. **Independent Processing**: PVCs are processed independently (one failure doesn't affect others)
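The four steps above can be sketched as a loop that isolates each PVC's outcome. The Kubernetes calls are passed in as plain callables so the control flow is visible on its own; every function name, status string, and default timeout here is hypothetical and not taken from `backup_script.py`.

```python
import time
from typing import Callable, Dict, Iterable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def process_pvcs(
    pvcs: Iterable[str],
    pvc_exists: Callable[[str], bool],
    create_snapshot: Callable[[str], str],
    snapshot_ready: Callable[[str], bool],
    **wait_kwargs,
) -> Dict[str, str]:
    """Run verify -> create -> monitor for each PVC, isolating failures."""
    results: Dict[str, str] = {}
    for pvc in pvcs:
        try:
            if not pvc_exists(pvc):  # 1. PVC verification
                results[pvc] = "missing"
                continue
            snap = create_snapshot(pvc)  # 2. snapshot creation
            ok = wait_until_ready(  # 3. status monitoring
                lambda: snapshot_ready(snap), **wait_kwargs
            )
            results[pvc] = "ready" if ok else "timeout"
        except Exception as exc:  # 4. one failure does not stop the rest
            results[pvc] = f"error: {exc}"
    return results
```

A caller would supply functions backed by the Kubernetes API and could set a non-zero exit code if any result is not `"ready"`.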
### Incremental Snapshots
The job uses Azure Disk CSI incremental snapshots, which:
- Save storage costs by only storing changed data blocks
- Complete faster than full snapshots
- Maintain full recovery capability
## Troubleshooting
### Common Issues
1. **Permission Denied**: Ensure RBAC is properly configured
2. **PVC Not Found**: Verify PVC names and namespace
3. **Snapshot Creation Failed**: Check Azure Disk CSI driver status
4. **Job Timeout**: Increase timeout in the values file if needed
### Debug Mode
To run the script locally for testing:
```bash
# Install dependencies
pip install -r requirements.txt
# Run with local kubeconfig
python3 backup_script.py
```
## Security
- The job runs with minimal required permissions
- Non-root user execution
- Dropped capabilities
- Resource limits enforced
- No privileged access
## Maintenance
### Cleanup Old Snapshots
Old snapshots can be cleaned up manually:
```bash
# List all snapshots
kubectl get volumesnapshots -n freeleaps-prod
# Delete specific snapshot
kubectl delete volumesnapshot <snapshot-name> -n freeleaps-prod
# Delete snapshots created before a cutoff (example; replace the timestamp with the date 30 days ago)
# xargs -r skips the delete when no snapshot matches
kubectl get volumesnapshots -n freeleaps-prod -o jsonpath='{.items[?(@.metadata.creationTimestamp<"2024-07-05T00:00:00Z")].metadata.name}' | xargs -r kubectl delete volumesnapshot -n freeleaps-prod
```
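Instead of hard-coding a cutoff, the selection can be computed from the snapshot metadata. The sketch below assumes the input is the `items` list from `kubectl get volumesnapshots -n freeleaps-prod -o json`; `expired_snapshots` is a hypothetical helper and is not part of the backup job.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def expired_snapshots(
    items: List[dict],
    max_age_days: int = 30,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return names of snapshots created more than max_age_days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    names = []
    for item in items:
        meta = item["metadata"]
        # Kubernetes reports creationTimestamp as RFC 3339 in UTC.
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if created < cutoff:
            names.append(meta["name"])
    return names


# Sample data shaped like the kubectl JSON output (values are made up).
sample = [
    {"metadata": {"name": "gitea-shared-storage-snapshot-20250601",
                  "creationTimestamp": "2025-06-01T08:00:10Z"}},
    {"metadata": {"name": "gitea-shared-storage-snapshot-20250801",
                  "creationTimestamp": "2025-08-01T08:00:10Z"}},
]
now = datetime(2025, 8, 5, 8, 0, tzinfo=timezone.utc)
print(expired_snapshots(sample, now=now))
# ['gitea-shared-storage-snapshot-20250601']
```

The returned names can then be passed to `kubectl delete volumesnapshot -n freeleaps-prod` after review.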
### Updating Configuration
To update the backup configuration:
1. Modify the appropriate values file in `helm-pkg/freeleaps-data-backup/`
2. Commit and push changes to the repository
3. ArgoCD will automatically sync the changes
4. Or manually upgrade with Helm: `helm upgrade freeleaps-data-backup ./helm-pkg/freeleaps-data-backup --values helm-pkg/freeleaps-data-backup/values.prod.yaml --namespace freeleaps-prod`
## Backup Data
### What Gets Backed Up
- **gitea-shared-storage**: Gitea repository data, attachments, and configuration
- **data-freeleaps-prod-gitea-postgresql-ha-postgresql-0**: PostgreSQL database data
### Recovery
To restore from a snapshot:
```bash
# Create a PVC from snapshot
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
  namespace: freeleaps-prod
spec:
  dataSource:
    name: <snapshot-name>
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
```