# Azure Kubernetes Node Addition Runbook

## Overview

This runbook provides step-by-step instructions for adding new Azure Virtual Machines to an existing Kubernetes cluster installed via Kubespray.

Values in angle brackets, such as `<vm-name>`, are placeholders. Replace them with the values for your environment.

## Prerequisites

- Access to Azure CLI with appropriate permissions
- SSH access to the new VM
- Access to the existing Kubernetes cluster
- Kubespray installation directory

## Pre-Installation Checklist

### 1. Verify New VM Details

```bash
# Get VM details from Azure (--show-details is required for the IP address fields)
az vm show --resource-group <resource-group> --name <vm-name> --show-details --query "{name:name,ip:publicIps,privateIp:privateIps}" -o table
```

### 2. Verify SSH Access

```bash
# Test SSH connection to the new VM
ssh wwwadmin@mathmast.com@<new-vm-ip>
# You will be prompted for password
```

### 3. Verify Network Connectivity

```bash
# From the new VM, test connectivity to existing cluster
ping -c 4 <existing-node-ip>
```

## Step-by-Step Process

### Step 1: Update Ansible Inventory

1. **Navigate to Kubespray directory**

   ```bash
   cd freeleaps-ops/3rd/kubespray
   ```

2. **Edit the inventory file**

   ```bash
   vim ../cluster/ansible/manifests/inventory.ini
   ```

3. **Add the new node to the appropriate group**

   For a worker node:

   ```ini
   [kube_node]
   # Existing nodes...
   prod-usw2-k8s-freeleaps-worker-nodes-06 ansible_host=<new-vm-ip> ansible_user=wwwadmin@mathmast.com host_name=prod-usw2-k8s-freeleaps-worker-nodes-06
   ```

   For a master node:

   ```ini
   [kube_control_plane]
   # Existing nodes...
   prod-usw2-k8s-freeleaps-master-03 ansible_host=<new-vm-ip> ansible_user=wwwadmin@mathmast.com etcd_member_name=freeleaps-etcd-03 host_name=prod-usw2-k8s-freeleaps-master-03
   ```

### Step 2: Verify Inventory Configuration

1. **Check inventory syntax**

   ```bash
   ansible-inventory -i ../cluster/ansible/manifests/inventory.ini --list
   ```

2. **Test connectivity to new node**

   ```bash
   # Pings every host in kube_node; add --limit <node-name> to target only the new node
   ansible -i ../cluster/ansible/manifests/inventory.ini kube_node -m ping -kK
   ```

### Step 3: Run Kubespray Scale Playbook

1. **Execute the scale playbook**

   ```bash
   cd ../cluster/ansible/manifests
   ansible-playbook -i inventory.ini ../../3rd/kubespray/scale.yml -kK -b
   ```

   **Note**:
   - `-k` prompts for SSH password
   - `-K` prompts for sudo password
   - `-b` enables privilege escalation
   - The relative paths depend on the repository layout. Confirm that `inventory.ini` and `scale.yml` resolve from your working directory before running.
   - `scale.yml` adds worker nodes only. Kubespray's documentation states that control plane nodes must be added by running `cluster.yml` instead.

### Step 4: Verify Node Addition

1. **Check node status**

   ```bash
   kubectl get nodes
   ```

2. **Verify node is ready**

   ```bash
   kubectl describe node <node-name>
   ```

3. **Check node labels**

   ```bash
   kubectl get nodes --show-labels
   ```

### Step 5: Post-Installation Verification

1. **Test pod scheduling**

   ```bash
   # Create a test pod to verify scheduling
   kubectl run test-pod --image=nginx --restart=Never
   # The NODE column shows where the pod landed; the scheduler may pick any eligible node
   kubectl get pod test-pod -o wide
   # Remove the test pod afterwards
   kubectl delete pod test-pod
   ```

2. **Check node resources**

   ```bash
   # Requires metrics-server in the cluster
   kubectl top nodes
   ```

3. **Verify node components**

   ```bash
   kubectl get pods -n kube-system -o wide | grep <node-name>
   ```

## Troubleshooting

### Common Issues

#### 1. SSH Connection Failed

```bash
# Verify VM is running (powerState is only returned with --show-details)
az vm show --resource-group <resource-group> --name <vm-name> --show-details --query "powerState"

# Check network security groups
az network nsg rule list --resource-group <resource-group> --nsg-name <nsg-name>
```

#### 2. Ansible Connection Failed

```bash
# Test with verbose output
ansible -i ../cluster/ansible/manifests/inventory.ini kube_node -m ping -kK -vvv
```

#### 3. Node Not Ready

```bash
# Check node conditions
kubectl describe node <node-name>

# Check kubelet logs (run on the node: kubelet is a systemd service, not a pod)
sudo journalctl -u kubelet --no-pager -n 100
```

#### 4. Pod Scheduling Issues

```bash
# Check node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Check node capacity
kubectl describe node <node-name> | grep -A 10 "Capacity"
```

### Recovery Procedures

#### If Scale Playbook Fails

1. **Clean up the failed node**

   ```bash
   kubectl delete node <node-name>
   ```

2. **Restart the VM**

   ```bash
   # Reboot the VM (a restart does not undo changes the playbook already made)
   az vm restart --resource-group <resource-group> --name <vm-name>
   ```

3. **Retry the scale playbook**

   ```bash
   ansible-playbook -i inventory.ini ../../3rd/kubespray/scale.yml -kK -b
   ```

#### If Node is Stuck in NotReady State

1. **Check kubelet service**

   ```bash
   ssh wwwadmin@mathmast.com@<new-vm-ip>
   # On the new VM:
   sudo systemctl status kubelet
   ```

2. **Restart kubelet**

   ```bash
   ssh wwwadmin@mathmast.com@<new-vm-ip>
   # On the new VM:
   sudo systemctl restart kubelet
   ```

## Security Considerations

### 1. Network Security

- Ensure the new VM is in the correct subnet
- Verify network security group rules allow cluster communication
- Check firewall rules if applicable

### 2. Access Control

- Use SSH key-based authentication when possible
- Limit sudo access to necessary commands
- Monitor node access logs

### 3. Compliance

- Ensure the new node meets security requirements
- Verify all required security patches are applied
- Check compliance with organizational policies

## Monitoring and Maintenance

### 1. Node Health Monitoring

```bash
# Check status and resource usage of the new node
kubectl get nodes -o wide
kubectl top nodes
```

### 2. Resource Monitoring

```bash
# Monitor resource usage
kubectl describe node <node-name> | grep -A 5 "Allocated resources"
```

### 3. Log Monitoring

```bash
# Monitor kubelet logs (run on the node: kubelet is a systemd service, not a pod)
sudo journalctl -u kubelet -n 100 -f
```

## Rollback Procedures

### If Node Addition Causes Issues

1. **Cordon the node**

   ```bash
   kubectl cordon <node-name>
   ```

2. **Drain the node**

   ```bash
   kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
   ```

3. **Remove the node**

   ```bash
   kubectl delete node <node-name>
   ```

4. **Update inventory**

   ```bash
   # Remove the node from inventory.ini
   vim ../cluster/ansible/manifests/inventory.ini
   ```

## Documentation

### Required Information

- VM name and IP address
- Resource group and subscription
- Node role (worker/master)
- Date and time of addition
- Person performing the addition

### Post-Addition Checklist

- [ ] Node appears in `kubectl get nodes`
- [ ] Node status is Ready
- [ ] Pods can be scheduled on the node
- [ ] All node components are running
- [ ] Monitoring is configured
- [ ] Documentation is updated

## Emergency Contacts

- **Infrastructure Team**: [Contact Information]
- **Kubernetes Administrators**: [Contact Information]
- **Azure Support**: [Contact Information]

---

**Last Updated**: [Date]
**Version**: 1.0
**Author**: [Name]
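
## Appendix: Scripted Readiness Check

The first two items of the Post-Addition Checklist (the node appears in `kubectl get nodes` and its status is Ready) can also be checked from a script. The following is a minimal sketch, not part of the original procedure: the function name `node_is_ready` is hypothetical, and it assumes the default column layout of `kubectl get nodes --no-headers` (node name first, status second).

```bash
#!/usr/bin/env bash
# Sketch only: node_is_ready is a hypothetical helper, not a kubectl feature.

# node_is_ready NODE
# Reads `kubectl get nodes --no-headers` output on stdin and succeeds only
# if NODE is listed and the first entry in its STATUS column is "Ready".
# A cordoned node reports "Ready,SchedulingDisabled" and still counts as Ready.
node_is_ready() {
  awk -v node="$1" '
    $1 == node { split($2, status, ","); ready = (status[1] == "Ready") }
    END { exit (ready ? 0 : 1) }
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes --no-headers | node_is_ready <node-name> && echo "node is Ready"
```

Because the function only parses text from standard input, it can be tried against saved `kubectl` output before it is pointed at a live cluster. A node that is missing from the list is treated the same as one that is not Ready.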