freeleaps-ops/docs/Kubernetes_Bootstrap_Guide.md
2025-09-04 00:58:59 -07:00

Kubernetes Bootstrap Guide

🎯 Overview

This guide explains how to bootstrap a complete Kubernetes cluster from scratch using Azure VMs and the freeleaps-ops repository. The cluster is NOT created automatically: you must bootstrap the entire infrastructure manually.

📋 Prerequisites

1. Azure Infrastructure

  • Azure VMs (already provisioned)
  • Network connectivity between VMs
  • Azure AD tenant configured
  • Resource group: k8s

2. Local Environment

  • freeleaps-ops repository cloned
  • Ansible installed (pip install ansible)
  • Azure CLI installed and configured
  • SSH access to VMs

3. VM Requirements

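  • Master Nodes: 2+ VMs for control plane (etcd needs a majority of members to stay available, so an odd count such as 3 is required to tolerate the loss of one member)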
  • Worker Nodes: 2+ VMs for workloads
  • Network: All VMs in same subnet
  • OS: Ubuntu 20.04+ recommended

🚀 Step-by-Step Bootstrap Process

Step 1: Verify Azure VMs

# Check VM status (powerState and privateIps are only returned with --show-details)
az vm list --resource-group k8s --show-details --query "[].{name:name,powerState:powerState,privateIP:privateIps}" -o table

# Start any VM that is not running
az vm start --resource-group k8s --name <vm-name>
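
To spot stopped VMs without reading the table by eye, the power states can be filtered with a small helper. This is a minimal sketch, not something shipped in freeleaps-ops: `stopped_vms` is a hypothetical name, and it assumes tab-separated `name<TAB>powerState` rows on stdin, which is what `az vm list --show-details ... -o tsv` should produce for a two-column query.

```shell
# Hypothetical helper (not part of freeleaps-ops): print the names of VMs
# whose power state is anything other than "VM running".
# Input (stdin): tab-separated rows of "<name>\t<powerState>".
stopped_vms() {
  awk -F'\t' '$2 != "VM running" { print $1 }'
}

# Example wiring (needs a logged-in Azure CLI), left commented out:
# az vm list --resource-group k8s --show-details \
#   --query "[].[name, powerState]" -o tsv | stopped_vms
```

An empty result means every VM in the resource group reports "VM running"; any name that is printed can be passed to `az vm start`.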

Step 2: Configure Inventory

Edit the Ansible inventory file:

cd freeleaps-ops
vim cluster/ansible/manifests/inventory.ini

Example inventory structure:

[all:vars]
ansible_user=wwwadmin@mathmast.com
ansible_ssh_common_args='-o StrictHostKeyChecking=no'

[kube_control_plane]
prod-usw2-k8s-freeleaps-master-01 ansible_host=10.10.0.4 etcd_member_name=freeleaps-etcd-01 host_name=prod-usw2-k8s-freeleaps-master-01
prod-usw2-k8s-freeleaps-master-02 ansible_host=10.10.0.5 etcd_member_name=freeleaps-etcd-02 host_name=prod-usw2-k8s-freeleaps-master-02

[kube_node]
prod-usw2-k8s-freeleaps-worker-nodes-01 ansible_host=10.10.0.6 host_name=prod-usw2-k8s-freeleaps-worker-nodes-01
prod-usw2-k8s-freeleaps-worker-nodes-02 ansible_host=10.10.0.7 host_name=prod-usw2-k8s-freeleaps-worker-nodes-02

[etcd]
prod-usw2-k8s-freeleaps-master-01
prod-usw2-k8s-freeleaps-master-02

[k8s_cluster:children]
kube_control_plane
kube_node
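
Before running any playbook, it can help to check the group membership in the inventory, in particular the size of the `[etcd]` group. The sketch below is illustrative only: `group_hosts` and `etcd_count_is_odd` are hypothetical helper names, and the parsing assumes a plain INI inventory like the one above (one host per line, host name as the first field).

```shell
# Hypothetical helper: list the host names under one [group] of an INI inventory.
# $1 = inventory path, $2 = group name (for example: etcd)
group_hosts() {
  awk -v grp="[$2]" '
    /^\[/ { in_group = ($0 == grp); next }          # a section header starts or ends the group
    in_group && NF && $1 !~ /^[#;]/ { print $1 }    # first field of a non-comment line is the host
  ' "$1"
}

# Succeeds only when the [etcd] group has an odd number of members.
etcd_count_is_odd() {
  count=$(group_hosts "$1" etcd | wc -l)
  [ $((count % 2)) -eq 1 ]
}

# Example, left commented out:
# group_hosts cluster/ansible/manifests/inventory.ini etcd
# etcd_count_is_odd cluster/ansible/manifests/inventory.ini || echo "even etcd member count"
```

With the two-member `[etcd]` group in the example inventory, `etcd_count_is_odd` returns a non-zero status; an even member count adds no fault tolerance over the next lower odd count.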

Step 3: Test Connectivity

cd cluster/ansible/manifests
ansible -i inventory.ini all -m ping -kK

Step 4: Bootstrap Kubernetes Cluster

# From cluster/ansible/manifests, go up three levels to the repository root
cd ../../../3rd/kubespray
ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./cluster.yml -kK -b

What this does:

  • Installs Docker/containerd on all nodes
  • Downloads Kubernetes binaries (v1.31.4)
  • Generates certificates and keys
  • Bootstraps etcd cluster
  • Starts Kubernetes control plane
  • Joins worker nodes
  • Configures Calico networking
  • Sets up OIDC authentication

Step 5: Get Kubeconfig

# Get kubeconfig from master node (this overwrites any existing ~/.kube/config)
mkdir -p ~/.kube
ssh wwwadmin@mathmast.com@10.10.0.4 "sudo cat /etc/kubernetes/admin.conf" > ~/.kube/config
chmod 600 ~/.kube/config

# Test cluster access
kubectl get nodes
kubectl get pods -n kube-system
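
A node listing is easier to act on when it is reduced to the nodes that are not yet Ready. The following sketch assumes the default column layout of `kubectl get nodes --no-headers` (name first, status second); `not_ready_nodes` is a hypothetical helper name, not a tool from the repository.

```shell
# Hypothetical helper: print nodes whose STATUS column does not start with "Ready".
# Input (stdin): output of `kubectl get nodes --no-headers`.
# Note: "Ready,SchedulingDisabled" (a cordoned node) is treated as ready here.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Example wiring, left commented out:
# kubectl get nodes --no-headers | not_ready_nodes
```

An alternative that blocks until the nodes come up is `kubectl wait --for=condition=Ready nodes --all --timeout=300s`; the timeout value here is only an example.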

Step 6: Deploy Infrastructure

cd ../../cluster/manifests

# Deploy in order
kubectl apply -f freeleaps-controls-system/
kubectl apply -f freeleaps-devops-system/
kubectl apply -f freeleaps-monitoring-system/
kubectl apply -f freeleaps-logging-system/
kubectl apply -f freeleaps-data-platform/
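
Because the directories are applied in a fixed order, a failure partway through is worth stopping on rather than continuing past. The loop below is a sketch of that idea, assuming it is run from `cluster/manifests`; `deploy_in_order` and the `KUBECTL` override variable are hypothetical names introduced for illustration.

```shell
# Apply the manifest directories in the documented order and stop at the first failure.
# KUBECTL is a hypothetical override: set KUBECTL=echo to preview the commands.
deploy_in_order() {
  for dir in \
    freeleaps-controls-system \
    freeleaps-devops-system \
    freeleaps-monitoring-system \
    freeleaps-logging-system \
    freeleaps-data-platform
  do
    echo "==> applying $dir" >&2
    ${KUBECTL:-kubectl} apply -f "$dir/" || {
      echo "apply failed for $dir; later directories were not applied" >&2
      return 1
    }
  done
}

# Example, left commented out:
# KUBECTL=echo deploy_in_order   # dry preview
# deploy_in_order                # real run
```

Stopping early keeps the cluster state easier to reason about: the failed directory can be fixed and the function re-run, since `kubectl apply` leaves already-applied resources unchanged.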

Step 7: Setup Authentication

cd ../../cluster/bin
./freeleaps-cluster-authenticator auth

🤖 Automated Bootstrap Script

Use the provided bootstrap script for automated deployment:

cd freeleaps-ops/docs
./bootstrap-k8s-cluster.sh

Script Features:

  • Prerequisites verification
  • Azure VM status check
  • Connectivity testing
  • Automated cluster bootstrap
  • Infrastructure deployment
  • Authentication setup
  • Status verification

Usage Options:

# Full bootstrap
./bootstrap-k8s-cluster.sh

# Only verify prerequisites
./bootstrap-k8s-cluster.sh --verify

# Only bootstrap cluster (skip infrastructure)
./bootstrap-k8s-cluster.sh --bootstrap

🔧 Manual Bootstrap Commands

If you prefer manual control, here are the detailed commands:

1. Install Prerequisites

# Install Ansible
pip install ansible

# Install Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

2. Configure Azure

# Login to Azure
az login

# Set subscription
az account set --subscription <subscription-id>

3. Bootstrap Cluster

# Navigate to kubespray
cd freeleaps-ops/3rd/kubespray

# Run cluster installation
ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./cluster.yml -kK -b

4. Verify Installation

# Get kubeconfig (this overwrites any existing ~/.kube/config)
mkdir -p ~/.kube
ssh wwwadmin@mathmast.com@<master-ip> "sudo cat /etc/kubernetes/admin.conf" > ~/.kube/config
chmod 600 ~/.kube/config

# Test cluster
kubectl get nodes
kubectl get pods -n kube-system

🔍 Verification Steps

1. Cluster Health

# Check nodes
kubectl get nodes -o wide

# Check system pods
kubectl get pods -n kube-system

# Check cluster info
kubectl cluster-info

2. Network Verification

# Check Calico pods
kubectl get pods -n kube-system | grep calico

# Check network policies
kubectl get networkpolicies --all-namespaces

3. Authentication Test

# Test OIDC authentication
kubectl auth whoami

# Check permissions
kubectl auth can-i --list

🚨 Troubleshooting

Common Issues

1. Ansible Connection Failed

# Check VM status
az vm show --resource-group k8s --name <vm-name> --show-details --query "powerState" -o tsv

# Test SSH manually
ssh wwwadmin@mathmast.com@<vm-ip>

# Check network security groups
az network nsg rule list --resource-group k8s --nsg-name <nsg-name>

2. Cluster Bootstrap Failed

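# Re-run the playbook with verbose output (from freeleaps-ops/3rd/kubespray)
ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./cluster.yml -kK -b -vvv

# Check node conditions and resource pressure
kubectl describe node <node-name>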

# Check system pods
kubectl get pods -n kube-system
kubectl describe pod <pod-name> -n kube-system

3. Infrastructure Deployment Failed

# Check CRDs
kubectl get crd

# Check operator pods
kubectl get pods --all-namespaces | grep operator

# Check events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
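
When many namespaces are involved, it is quicker to list only the pods that look unhealthy. The sketch below assumes the default columns of `kubectl get pods --all-namespaces --no-headers` (namespace, name, ready, status); `unhealthy_pods` is a hypothetical helper name, and its notion of "unhealthy" (not Running, or Running with containers not ready) is a simplification.

```shell
# Hypothetical helper: print pods that are not Running, or are Running but not fully ready.
# Input (stdin): output of `kubectl get pods --all-namespaces --no-headers`.
unhealthy_pods() {
  awk '{
    if ($4 == "Completed") next          # finished Job pods are expected to be 0/N
    split($3, ready, "/")                # READY column, for example "1/2"
    if ($4 != "Running" || ready[1] != ready[2])
      print $1 "/" $2 " (" $4 ", ready " $3 ")"
  }'
}

# Example wiring, left commented out:
# kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

Each printed entry can then be inspected with `kubectl describe pod <pod-name> -n <namespace>`.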

Recovery Procedures

If Bootstrap Fails

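  1. Clean up failed installation
# A reboot alone does not undo a partial install; reset the nodes with Kubespray's reset playbook
cd freeleaps-ops/3rd/kubespray
ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./reset.yml -kK -b
# Optionally reboot a VM afterwards
az vm restart --resource-group k8s --name <vm-name>
  2. Retry bootstrap
# Still in freeleaps-ops/3rd/kubespray
ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./cluster.yml -kK -b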

If Infrastructure Deployment Fails

  1. Check prerequisites
kubectl get nodes
kubectl get pods -n kube-system
  2. Redeploy components
kubectl delete -f <component-directory>/
kubectl apply -f <component-directory>/

📊 Post-Bootstrap Verification

1. Core Components

# ArgoCD
kubectl get pods -n freeleaps-devops-system | grep argocd

# Cert-manager
kubectl get pods -n freeleaps-controls-system | grep cert-manager

# Prometheus/Grafana
kubectl get pods -n freeleaps-monitoring-system | grep prometheus
kubectl get pods -n freeleaps-monitoring-system | grep grafana

# Logging
kubectl get pods -n freeleaps-logging-system | grep loki

2. Access Points

# ArgoCD UI
kubectl port-forward svc/argocd-server -n freeleaps-devops-system 8080:80

# Grafana UI
kubectl port-forward svc/kube-prometheus-stack-grafana -n freeleaps-monitoring-system 3000:80

# Kubernetes Dashboard
kubectl port-forward svc/kubernetes-dashboard-kong-proxy -n freeleaps-infra-system 8443:443

3. Authentication Setup

# Setup user authentication
cd freeleaps-ops/cluster/bin
./freeleaps-cluster-authenticator auth

# Test authentication
kubectl auth whoami
kubectl get nodes

🔒 Security Considerations

1. Network Security

  • Ensure VMs are in private subnets
  • Configure network security groups properly
  • Use VPN or bastion host for access

2. Access Control

  • Use Azure AD OIDC for authentication
  • Implement RBAC for authorization
  • Regular access reviews

3. Monitoring

  • Enable audit logging
  • Monitor cluster health
  • Set up alerts

📚 Next Steps

1. Application Deployment

  • Deploy applications via ArgoCD
  • Configure CI/CD pipelines
  • Set up monitoring and alerting

2. Maintenance

  • Regular security updates
  • Backup etcd data
  • Monitor resource usage

3. Scaling

  • Add more worker nodes
  • Configure auto-scaling
  • Optimize resource allocation

🆘 Support

Emergency Contacts

  • Infrastructure Team: [Contact Information]
  • Azure Support: [Contact Information]
  • Kubernetes Community: [Contact Information]

Useful Commands

# Cluster status
kubectl get nodes
kubectl get pods --all-namespaces

# Logs
kubectl logs -n kube-system <pod-name>

# Events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'

# Resource usage
kubectl top nodes
kubectl top pods --all-namespaces

Last Updated: September 3, 2025
Version: 1.0
Maintainer: Infrastructure Team