☸️ Kubernetes Mastery Guide
The Complete Encyclopedia • 180+ Components • Production Ready
180+ Total Components · 74 Pod Signals · 50+ API Resources · 17 Failure Signals · 3 Certifications
🏗️ Cluster Architecture
🎯 Control Plane Components
kube-apiserver
Critical: Front end of the control plane; validates and configures data for API objects
Every kubectl command goes through it; it authenticates and authorizes all requests
etcd
Critical: Consistent, distributed key-value store holding all cluster state
Back up regularly (e.g., every 30 mins); enable TLS for client and peer traffic
kube-scheduler
Control: Assigns newly created pods to nodes based on resource requests and constraints
Scheduling ML workloads on GPU nodes
kube-controller-manager
Control: Runs the built-in controller processes
Node controller marking unhealthy nodes
cloud-controller-manager
Cloud: Interfaces with cloud provider APIs
Provisioning LoadBalancers on AWS/Azure/GCP
🖥️ Worker Node Components
kubelet
Agent: Primary node agent; registers the node with the cluster
Reports node status; runs liveness, readiness, and startup probes
kube-proxy
Network: Network proxy; maintains network rules on each node
Implements Services via iptables or IPVS
Container Runtime
Runtime: Runs containers through the CRI (containerd, CRI-O)
containerd is the most widely used runtime
📡 Addon Components
CoreDNS
DNS: Service discovery within the cluster
Pods discover services via DNS names
Metrics Server
Monitoring: CPU and memory usage metrics for nodes and pods
Powers kubectl top and HPA resource metrics
Ingress Controller
Network: L7 (HTTP/HTTPS) load balancing for Ingress resources
NGINX or HAProxy routing external traffic to Services
📦 Core Workload Resources
| Resource | API Version | Namespaced | Description | Real-World Use Case |
|---|---|---|---|---|
| Pod | v1 | ✅ Yes | Smallest deployable unit | Web server + sidecar logging agent |
| Deployment | apps/v1 | ✅ Yes | Declarative pod updates via ReplicaSets | Zero-downtime rolling updates |
| StatefulSet | apps/v1 | ✅ Yes | Stateful apps with stable identity and storage | Kafka brokers, MySQL clusters |
| DaemonSet | apps/v1 | ✅ Yes | Runs a pod on every (or selected) node | Fluentd log collection |
| Job | batch/v1 | ✅ Yes | Runs pods to completion | Batch data processing |
| CronJob | batch/v1 | ✅ Yes | Scheduled jobs | Nightly backups |
🚀 Deployment Strategies
RollingUpdate
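Production: Gradually replaces pods with zero downtime (Deployment default)
E-commerce during Black Friday
Recreate
Development: Terminates all old pods, then creates new ones (brief downtime)
Dev environment testing
Canary
Safe Rollout: ~10% of traffic to the new version + monitoring (not a built-in Deployment strategy)
New feature rollout to 10% of users
Blue/Green
Migration: Old and new versions run side by side; traffic switches at the load balancer (not built-in)
MySQL version upgrade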
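For RollingUpdate, the pace is set by `maxSurge` and `maxUnavailable` (both default to 25%). The Python sketch below shows how percentages become pod counts: per the Deployment docs, maxSurge rounds up and maxUnavailable rounds down. The helper name is hypothetical, and the zero/zero fallback reflects the controller's behavior as we understand it.

```python
def rolling_update_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (max total pods, min available pods) during a RollingUpdate.

    Hypothetical helper: percent surge rounds up, percent unavailable
    rounds down, as described in the Deployment documentation.
    """

    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            scaled = replicas * int(value[:-1])
            # Integer ceil/floor avoids floating-point surprises.
            return -(-scaled // 100) if round_up else scaled // 100
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    if surge == 0 and unavailable == 0:
        # Assumed fallback: if rounding makes both zero, allow one
        # unavailable pod so the rollout can still make progress.
        unavailable = 1
    return replicas + surge, replicas - unavailable


# 10 replicas with defaults: up to 13 pods exist, at least 8 stay available.
print(rolling_update_bounds(10))
```

Raising maxSurge speeds up the rollout at the cost of temporary extra capacity; raising maxUnavailable speeds it up at the cost of serving capacity.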
🌐 Networking Deep Dive
🎯 Service Types
ClusterIP
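Internal: Stable virtual IP reachable only inside the cluster
Backend API consumed by frontend
NodePort
External: Exposes the Service on a static port (30000-32767 by default) on every node's IP
Development testing
LoadBalancer
Cloud: Cloud provider provisions an external load balancer
Production web services
Headless
Stateful: No cluster IP (clusterIP: None); DNS returns pod IPs directly
StatefulSet discovery (Kafka)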
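CoreDNS gives every Service a predictable name, and a headless Service additionally gives each StatefulSet pod a stable per-pod name. A small Python sketch of the naming pattern, assuming the default `cluster.local` domain (clusters can override it) and a StatefulSet whose `serviceName` points at the headless Service; all Service and namespace names are hypothetical.

```python
CLUSTER_DOMAIN = "cluster.local"  # default cluster domain; yours may differ


def service_dns(service, namespace, domain=CLUSTER_DOMAIN):
    """Name a Service resolves under (ClusterIP returns the virtual IP)."""
    return f"{service}.{namespace}.svc.{domain}"


def statefulset_pod_dns(statefulset, replicas, headless_service, namespace,
                        domain=CLUSTER_DOMAIN):
    """Stable per-pod names a headless Service gives StatefulSet pods.

    Pods are named <statefulset>-<ordinal>; each resolves as
    <pod>.<service>.<namespace>.svc.<domain>.
    """
    return [
        f"{statefulset}-{i}.{headless_service}.{namespace}.svc.{domain}"
        for i in range(replicas)
    ]


print(service_dns("backend-api", "prod"))
for name in statefulset_pod_dns("kafka", 3, "kafka-headless", "data"):
    print(name)
```

Kafka clients can therefore address an individual broker (e.g. `kafka-0...`) instead of a load-balanced virtual IP.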
🔒 Network Policies - Zero-Trust Model
| Layer | Allowed Traffic | Denied Traffic |
|---|---|---|
| Internet → Ingress | 80, 443 | Everything else |
| Ingress → Web Pods | 8080 | Everything else |
| Web Pods → API Pods | 8080 | Everything else |
| API Pods → DB Pods | 5432 | Everything else |
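A zero-trust setup usually starts with a namespace-wide default deny, then adds one allow policy per row of the table. The Python sketch below builds the "API Pods → DB Pods" rule as a `networking.k8s.io/v1` NetworkPolicy manifest. The `tier` labels and `prod` namespace are hypothetical, a bare `podSelector` in `from` only matches pods in the same namespace, and the policies are enforced only if the CNI plugin supports NetworkPolicy.

```python
import json


def default_deny_ingress(namespace):
    """Select every pod in the namespace and allow no ingress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_ingress(name, namespace, to_labels, from_labels, port):
    """Allow TCP traffic on one port from pods matching from_labels."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": to_labels},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"podSelector": {"matchLabels": from_labels}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


policies = [
    default_deny_ingress("prod"),
    allow_ingress("api-to-db", "prod", {"tier": "db"}, {"tier": "api"}, 5432),
]
# A v1 List like this can be piped to `kubectl apply -f -`.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```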
🌐 CNI Plugin Comparison
Calico
Production: NetworkPolicy, BGP, eBPF
Enterprise with strict security
Cilium
Advanced: eBPF, Hubble, service mesh
Microservices with observability
Flannel
Simple: Basic overlay network; does not enforce NetworkPolicy by itself
Quick setup, basic needs
💾 Storage & Persistence
📀 Volume Types
emptyDir
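Ephemeral: Cache or scratch space; deleted when the pod leaves the node
Redis cache
hostPath
Node-bound: Mounts a node path (node-level logs, container runtime socket); security-sensitive
Log collection
PVC
Persistent: Durable storage that outlives the pod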
MySQL data
📊 Access Modes
| Access Mode | Abbr | Description | Example |
|---|---|---|---|
| ReadWriteOnce | RWO | Read-write by a single node (pods on that node can share it) | MySQL database |
| ReadOnlyMany | ROX | Multiple nodes read-only | Static content |
| ReadWriteMany | RWX | Multiple nodes read-write | Shared filesystem |
🔐 Security & RBAC
🎭 RBAC Roles in Production
| Role Type | Scope | Permissions | Real Implementation |
|---|---|---|---|
| Viewer | Namespace/Cluster | List/get pods, services | Auditors, read-only monitoring |
| Developer | Namespace | Create/update deployments | Dev team deploying apps |
| SRE | Cluster | Cluster-wide view, debug | Operations team |
| Admin | Cluster | Full access | Platform team |
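RBAC is purely additive: there are no deny rules, and a request is allowed if any rule bound to the user matches its API group, resource, and verb. The simplified Python model below encodes the Viewer and Developer rows; it ignores resourceNames, subresources, and non-resource URLs, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PolicyRule:
    api_groups: list  # "" is the core group (pods, services)
    resources: list
    verbs: list


def _match(allowed, requested):
    return "*" in allowed or requested in allowed


def is_allowed(rules, verb, api_group, resource):
    """Allowed if ANY bound rule matches group, resource, and verb."""
    return any(
        _match(r.api_groups, api_group)
        and _match(r.resources, resource)
        and _match(r.verbs, verb)
        for r in rules
    )


viewer = [PolicyRule([""], ["pods", "services"], ["get", "list", "watch"])]
developer = viewer + [
    PolicyRule(["apps"], ["deployments"], ["get", "list", "create", "update", "patch"]),
]

print(is_allowed(viewer, "list", "", "pods"))  # True
print(is_allowed(viewer, "delete", "", "pods"))  # False
print(is_allowed(developer, "create", "apps", "deployments"))  # True
```

On a real cluster, `kubectl auth can-i` asks the API server the same question.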
🔐 Secret Types
Opaque
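Arbitrary user-defined key-value data
API keys, passwords
kubernetes.io/tls
TLS certificate and private key (tls.crt, tls.key)
Certificates for Ingress TLS termination
kubernetes.io/dockerconfigjson
Registry credentials (serialized ~/.docker/config.json)
Referenced in imagePullSecrets to pull from a private registry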
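Values under a Secret's `data` field must be base64-encoded (the alternative `stringData` field accepts plain strings). A Python sketch of building an Opaque Secret manifest; the Secret name, namespace, and key are hypothetical.

```python
import base64
import json


def opaque_secret(name, namespace, values):
    """Build an Opaque Secret; values under `data` must be base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


secret = opaque_secret("api-credentials", "prod", {"api-key": "example-only"})
print(json.dumps(secret, indent=2))

# base64 is an encoding, not encryption: anyone who can read the Secret can decode it.
print(base64.b64decode(secret["data"]["api-key"]).decode("utf-8"))
```

Because decoding is trivial, restrict `get` on Secrets through RBAC and avoid committing generated manifests to Git.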
📊 Observability & Health
🩺 Probes
LivenessProbe
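Restart: Is the app alive? The kubelet restarts the container if the probe fails
Web server responding to /healthz
ReadinessProbe
Traffic: Is the app ready for traffic? If not, the pod is removed from Service endpoints (no restart)
App still loading its cache
StartupProbe
Slow Start: Has the app started? Liveness and readiness checks wait until it succeeds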
Java app with 2-min startup
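The Kubernetes docs advise sizing a startup probe so that `failureThreshold × periodSeconds` covers the worst-case startup time. A Python sketch for the 2-minute Java app; the 1.5× safety margin and port 8080 are assumptions, not documented defaults.

```python
import math


def startup_probe(path, port, worst_case_startup_s, period_s=10, headroom=1.5):
    """Build an httpGet startupProbe whose failureThreshold * periodSeconds
    covers the worst-case startup time times a (hypothetical) safety margin."""
    failure_threshold = math.ceil(worst_case_startup_s * headroom / period_s)
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_s,
        "failureThreshold": failure_threshold,
    }


probe = startup_probe("/healthz", 8080, worst_case_startup_s=120)
# Seconds the kubelet waits before killing a container that never starts.
print(probe["failureThreshold"] * probe["periodSeconds"])
```

Once the startup probe succeeds, the liveness probe can stay aggressive without killing a slow-starting container.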
📈 Metrics Pipeline
| Component | Purpose | Popular Tools |
|---|---|---|
| Node Metrics | CPU, memory, disk per node | Node Exporter |
| Container Metrics | Per-container resource usage | cAdvisor |
| Collection | Scrape and store metrics | Prometheus |
| Visualization | Dashboards | Grafana |
⚡ Autoscaling
HPA
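Pods: Scales the replica count based on observed metrics
Scale from 3 to 20 pods at 70% CPU
VPA
Resources: Adjusts CPU/memory requests per pod
Adjust CPU from 250m to 500m
Cluster Autoscaler
Nodes: Adds or removes cluster nodes
Adds nodes when pods are Pending for lack of capacity; removes underused nodes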
📊 HPA Metric Types
CPU Utilization
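Average CPU across pods, as a percentage of requests
Target 70% utilization
Memory Utilization
Average memory across pods, as a percentage of requests
Target 80% utilization
Custom Metrics
Requests per second (exposed through a custom metrics adapter)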
Scale based on business metrics
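Whatever the metric type, the HPA's core rule is `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, skipped when the ratio is within a tolerance (10% by default) and clamped to the min/max replica bounds. A simplified Python sketch using the 3-to-20 range above; the real controller also handles missing metrics, not-ready pods, and a scale-down stabilization window (300 seconds by default).

```python
import math


def hpa_desired_replicas(current_replicas, current_value, target_value,
                         min_replicas=3, max_replicas=20, tolerance=0.1):
    """Core HPA rule: desired = ceil(current * currentMetric / targetMetric).

    Ratios within +/- tolerance leave the count unchanged, and the result
    is clamped to [min_replicas, max_replicas].
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 3 pods averaging 140% CPU against a 70% target: double to 6 pods.
print(hpa_desired_replicas(3, current_value=140, target_value=70))
```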
🎯 Advanced Scheduling
📍 Node Affinity
requiredDuringSchedulingIgnoredDuringExecution
Must match for the pod to be scheduled
GPU workloads must go to GPU nodes
preferredDuringSchedulingIgnoredDuringExecution
Weighted preference; the pod still schedules if no node matches
Prefer SSD storage when available
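Node affinity terms follow simple boolean rules: `nodeSelectorTerms` are ORed, the `matchExpressions` inside one term are ANDed, and preferred terms add their weight to a node's score. A Python model of that evaluation; it ignores `matchFields`, treats an empty term as matching nothing, and the `gpu`/`disktype` labels are hypothetical.

```python
def expression_matches(labels, expr):
    """Evaluate one nodeSelectorRequirement against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    present = key in labels
    if op == "In":
        return present and labels[key] in values
    if op == "NotIn":
        return not present or labels[key] not in values
    if op == "Exists":
        return present
    if op == "DoesNotExist":
        return not present
    if op in ("Gt", "Lt"):
        if not present:
            return False
        try:
            node_value = int(labels[key])
        except ValueError:
            return False
        limit = int(values[0])
        return node_value > limit if op == "Gt" else node_value < limit
    raise ValueError(f"unknown operator {op}")


def term_matches(labels, term):
    exprs = term.get("matchExpressions", [])
    # Expressions are ANDed; an empty term matches no nodes.
    return bool(exprs) and all(expression_matches(labels, e) for e in exprs)


def required_matches(labels, node_selector_terms):
    """nodeSelectorTerms are ORed."""
    return any(term_matches(labels, t) for t in node_selector_terms)


def preferred_score(labels, preferred_terms):
    """Sum the weights of matching preferred terms."""
    return sum(t["weight"] for t in preferred_terms if term_matches(labels, t["preference"]))


required = [{"matchExpressions": [{"key": "gpu", "operator": "In", "values": ["true"]}]}]
preferred = [{"weight": 50, "preference": {"matchExpressions": [
    {"key": "disktype", "operator": "In", "values": ["ssd"]}]}}]

gpu_node = {"gpu": "true", "disktype": "ssd"}
cpu_node = {"disktype": "hdd"}
print(required_matches(gpu_node, required), required_matches(cpu_node, required))  # True False
print(preferred_score(gpu_node, preferred), preferred_score(cpu_node, preferred))  # 50 0
```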
🚫 Taints & Tolerations
| Node Type | Taint | Workloads Allowed |
|---|---|---|
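| Control Plane | node-role.kubernetes.io/control-plane:NoSchedule (formerly node-role.kubernetes.io/master) | System pods with a matching toleration |
| GPU Nodes | gpu=true:NoSchedule | ML workloads with toleration |
| Spot Nodes | spot=true:NoExecute | Fault-tolerant batch jobs (NoExecute also evicts running pods without the toleration) |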
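A toleration matches a taint when the effects agree (an empty toleration effect matches any effect), the keys agree, and either the operator is `Exists` or the values are equal under `Equal`. The Python sketch below reports which hard taints (NoSchedule, NoExecute) would still keep a pod off a node; function names are hypothetical.

```python
HARD_EFFECTS = ("NoSchedule", "NoExecute")


def tolerates(toleration, taint):
    """Does one pod toleration match one node taint?"""
    # An empty effect in the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if not key:
        # An empty key is only valid with Exists, and then matches every taint.
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def blocking_taints(taints, tolerations):
    """Hard taints that no toleration matches.

    PreferNoSchedule is only a soft preference, so it never blocks here.
    """
    return [
        taint for taint in taints
        if taint["effect"] in HARD_EFFECTS
        and not any(tolerates(tol, taint) for tol in tolerations)
    ]


gpu_taint = {"key": "gpu", "value": "true", "effect": "NoSchedule"}
ml_tolerations = [{"key": "gpu", "operator": "Equal", "value": "true", "effect": "NoSchedule"}]
print(blocking_taints([gpu_taint], ml_tolerations))  # [] -> may schedule
print(blocking_taints([gpu_taint], []))  # GPU taint blocks ordinary pods
```

A toleration only permits placement; pair it with node affinity to actually steer ML workloads onto GPU nodes.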
📊 PriorityClass Values
| Priority Level | Value | Use Case |
|---|---|---|
| system-cluster-critical | 2000000000 | CoreDNS, metrics-server |
| high-priority | 1000000 | User-facing APIs |
| low-priority | 100 | CI/CD test jobs |
📏 Policy & Governance
📊 ResourceQuota - Real-World Implementation
| Environment | CPU Request | Memory Request | Pods | PVCs |
|---|---|---|---|---|
| Production | 20 cores | 80 Gi | 50 | 10 |
| Staging | 10 cores | 40 Gi | 25 | 5 |
| Development | 5 cores | 20 Gi | 15 | 2 |
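ResourceQuota admission rejects a new object if current usage plus its request would exceed any hard limit. A simplified Python sketch using the Production row, with the table's columns mapped to the quota keys `requests.cpu`, `requests.memory`, `pods`, and `persistentvolumeclaims`. The usage figures are hypothetical, and the quantity parser handles only common suffixes.

```python
from decimal import Decimal

_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(q):
    """CPU quantity -> millicores ("250m" -> 250, "2" -> 2000)."""
    if q.endswith("m"):
        return int(q[:-1])
    return int(Decimal(q) * 1000)


def parse_memory(q):
    """Memory quantity -> bytes ("80Gi" -> 80 * 2**30)."""
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return int(Decimal(q[:-2]) * factor)
    for suffix, factor in _DECIMAL.items():
        if q.endswith(suffix):
            return int(Decimal(q[:-1]) * factor)
    return int(Decimal(q))


def parse(resource, q):
    if resource.endswith("cpu"):
        return parse_cpu(q)
    if resource.endswith("memory"):
        return parse_memory(q)
    return int(q)  # object counts such as "pods"


def quota_violations(hard, used, request):
    """Resources whose hard limit would be exceeded by admitting `request`."""
    return [
        res for res, limit in hard.items()
        if parse(res, used.get(res, "0")) + parse(res, request.get(res, "0")) > parse(res, limit)
    ]


production = {"requests.cpu": "20", "requests.memory": "80Gi",
              "pods": "50", "persistentvolumeclaims": "10"}
used = {"requests.cpu": "19500m", "requests.memory": "60Gi", "pods": "40"}
print(quota_violations(production, used, {"requests.cpu": "500m", "requests.memory": "1Gi", "pods": "1"}))
print(quota_violations(production, used, {"requests.cpu": "600m", "requests.memory": "1Gi", "pods": "1"}))
```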
📏 LimitRange Strategies
| Environment | Min CPU | Max CPU | Default CPU | Max Limit:Request Ratio |
|---|---|---|---|---|
| Production | 250m | 4 | 500m | 2:1 |
| Staging | 100m | 2 | 250m | 4:1 |
| Development | 50m | 1 | 100m | 10:1 |
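The LimitRange columns map to `min`, `max`, `default`/`defaultRequest`, and `maxLimitRequestRatio`. A simplified Python sketch of the checks for the Production row. It assumes the table's "Default CPU" is used as both the default limit and the default request, and that a container with only a limit gets a matching request. The real LimitRanger admission plugin covers more cases.

```python
from decimal import Decimal


def parse_cpu(q):
    """CPU quantity -> millicores ("250m" -> 250, "4" -> 4000)."""
    if q.endswith("m"):
        return int(q[:-1])
    return int(Decimal(q) * 1000)


# Production row. Assumption: "Default CPU" serves as both `default` (limit)
# and `defaultRequest`.
PRODUCTION = {"min": "250m", "max": "4", "default": "500m", "max_ratio": 2}


def check_container_cpu(container, lr):
    """Fill in missing CPU values, then return (request, limit, violations)."""
    limit = container.get("limit", lr["default"])
    # A limit with no request makes the request equal to that limit.
    request = container.get("request", container.get("limit", lr["default"]))
    req_m, lim_m = parse_cpu(request), parse_cpu(limit)
    errors = []
    if req_m < parse_cpu(lr["min"]):
        errors.append(f"request {request} below min {lr['min']}")
    if lim_m > parse_cpu(lr["max"]):
        errors.append(f"limit {limit} above max {lr['max']}")
    if req_m > lim_m:
        errors.append("request exceeds limit")
    elif req_m and lim_m / req_m > lr["max_ratio"]:
        errors.append(f"limit/request ratio {lim_m / req_m:g} above {lr['max_ratio']}")
    return request, limit, errors


print(check_container_cpu({}, PRODUCTION))
print(check_container_cpu({"request": "250m", "limit": "1"}, PRODUCTION))
```

Tighter ratios in Production keep limits close to requests, so nodes are less likely to be overcommitted.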
📈 Pod-Level Signals (74+)
🩺 Health Signals (9)
LivenessProbe
ReadinessProbe
StartupProbe
ReadinessGates
PodConditions
ContainersReady
Ready
Initialized
PodScheduled
🔄 Lifecycle Signals (13)
PodPhase
ContainerStateWaiting
ContainerStateRunning
ContainerStateTerminated
ContainerLastState
RestartCount
ExitCode
TerminationMessage
TerminationGracePeriodSeconds
DeletionTimestamp
Finalizers
PreStopHook
PostStartHook
📊 Resource Signals (13)
CPUUsage
MemoryUsage
EphemeralStorageUsage
ResourceRequests
ResourceLimits
QoSClass
OOMKilled
CPUThrottling
Evicted
MemoryPressure
DiskPressure
PIDPressure
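The QoSClass signal is derived from container requests and limits, and it influences which pods the kubelet evicts and the kernel OOM-kills first under pressure. A Python sketch of the documented rules: Guaranteed if every container has equal CPU and memory requests and limits (an unset request defaults to its limit), BestEffort if no container sets any CPU or memory value, otherwise Burstable. The helper is hypothetical, ignores init containers, and assumes quantities are written consistently.

```python
def qos_class(containers):
    """Derive a pod's QoS class from its containers' cpu/memory settings.

    Each container is {"requests": {...}, "limits": {...}} with quantity
    strings, assumed to be written consistently (e.g. "500m" in both places).
    """
    any_set = False
    guaranteed = True
    for c in containers:
        requests, limits = c.get("requests", {}), c.get("limits", {})
        for res in ("cpu", "memory"):
            if res in requests or res in limits:
                any_set = True
            limit = limits.get(res)
            request = requests.get(res, limit)  # an unset request defaults to the limit
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{"limits": {"cpu": "500m", "memory": "256Mi"}}]))  # Guaranteed
print(qos_class([{"requests": {"cpu": "250m"}}]))  # Burstable
print(qos_class([{}]))  # BestEffort
```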
🚨 Production Failure Signals
CrashLoopBackOff
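Critical: Container crashes repeatedly; restarts back off exponentially
Check logs (kubectl logs <pod> --previous), fix the app
ImagePullBackOff
Warning: Cannot pull the container image
Verify image name/tag and registry credentials
OOMKilled
Critical: Container killed by the OOM killer, usually for exceeding its memory limit (exit code 137)
Increase the memory limit or fix the leak
NodeNotReady
Warning: Node is not reporting Ready
Check kubelet and container runtime; restart if needed
Evicted
Critical: Pod evicted by the kubelet under node resource pressure
Set accurate requests, add capacity, reduce load
FailedScheduling
Warning: Scheduler cannot place the pod
Add nodes, reduce requests, or relax affinity/taint rules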
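CrashLoopBackOff is the kubelet waiting between restarts. The documented default delay starts at 10 seconds, doubles after each crash, caps at 5 minutes, and resets once a container runs for 10 minutes without problems (newer releases may make these tunable). A Python sketch of the schedule; the function name is hypothetical.

```python
def crashloop_delays(restarts, initial_s=10, cap_s=300):
    """Wait before each restart: doubles from initial_s, capped at cap_s."""
    delays, delay = [], initial_s
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap_s)
    return delays


print(crashloop_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

After a handful of crashes the pod sits idle for up to 5 minutes between attempts, so fixing the root cause matters more than deleting the pod to "retry faster".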
🔧 Extending Kubernetes
CRD
Custom: Defines new resource types in the Kubernetes API
Define PostgreSQL custom resource
Operator
Automation: Custom controller that automates an application's lifecycle
Prometheus Operator
Admission Webhooks
Validation: Intercept API requests to mutate or validate them
Inject sidecar containers
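At the heart of an operator is a reconcile loop: compare the desired state in a custom resource's spec with what is actually running, then act to close the gap. A minimal Python sketch for a hypothetical PostgreSQL resource; `PostgresSpec`, `Observed`, and the action strings are illustrative. Real operators are typically built with frameworks such as controller-runtime/Kubebuilder and act through the API instead of returning strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostgresSpec:
    """Desired state from a hypothetical PostgreSQL custom resource."""
    replicas: int
    version: str


@dataclass(frozen=True)
class Observed:
    """What the operator currently sees running in the cluster."""
    replicas: int
    version: str


def reconcile(desired, observed):
    """Return the actions needed to move observed toward desired.

    Level-triggered and idempotent: when the states already agree it
    returns nothing, so it is safe to call on every event or resync.
    """
    actions = []
    if observed.version != desired.version:
        actions.append(f"upgrade {observed.version} -> {desired.version}")
    diff = desired.replicas - observed.replicas
    if diff > 0:
        actions += ["create replica"] * diff
    elif diff < 0:
        actions += ["delete replica"] * -diff
    return actions


print(reconcile(PostgresSpec(3, "16"), Observed(1, "15")))
print(reconcile(PostgresSpec(3, "16"), Observed(3, "16")))
```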
🤖 Operator Maturity Levels
Level 1
Basic install
Deploy Prometheus with defaults
Level 3
Full lifecycle
Backup, restore, failover
Level 5
Auto-pilot
Auto-scaling, auto-healing
📋 Complete API Reference
| Category | Resource | API Version | Namespaced |
|---|---|---|---|
| 📦 Workloads | Pod | v1 | ✅ Yes |
| 📦 Workloads | Deployment | apps/v1 | ✅ Yes |
| 🌐 Networking | Service | v1 | ✅ Yes |
| 🌐 Networking | IngressClass | networking.k8s.io/v1 | ❌ No |
| 💾 Storage | PersistentVolume | v1 | ❌ No |
| 💾 Storage | StorageClass | storage.k8s.io/v1 | ❌ No |
| 🔐 Security | Role | rbac.authorization.k8s.io/v1 | ✅ Yes |
| 🔐 Security | ClusterRole | rbac.authorization.k8s.io/v1 | ❌ No |
🎓 Certification Path
CKA
Certified Kubernetes Administrator
Suggested experience: 6-12 months
Cluster administration, networking, troubleshooting
CKAD
Certified Kubernetes Application Developer
Suggested experience: 3-6 months
Application design, configuration, multi-container pods
CKS
Certified Kubernetes Security Specialist
Suggested experience: 1-2 years; requires an active CKA
Security, RBAC, policy enforcement
🛠️ Essential kubectl Commands
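| Command | Purpose |
|---|---|
| `kubectl get all -A` | List common resources (pods, services, deployments, etc.) in all namespaces |
| `kubectl describe pod <name>` | Detailed info and recent events for a resource |
| `kubectl logs <pod> --tail=50` | View the last 50 lines of container logs |
| `kubectl exec -it <pod> -- /bin/sh` | Open a shell in a container |
| `kubectl port-forward <pod> 8080:80` | Forward local port 8080 to port 80 on the pod |
| `kubectl top pod -n prod` | Show pod CPU/memory usage (requires Metrics Server) |
| `kubectl get events --sort-by='.lastTimestamp'` | View events sorted by time, newest last |
| `kubectl api-resources` | List all resource types the API server supports |
| `kubectl explain pod` | Show field documentation for a resource |
| `kubectl auth can-i create pods` | Check whether you have a permission |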
📚 Learning Path
1. Week 1-2: Fundamentals (Pods, Deployments, Services, ConfigMaps)
2. Week 3-4: Storage & Networking (Volumes, PV/PVC, Ingress, Network Policies)
3. Week 5-6: Security & RBAC (Roles, Bindings, ServiceAccounts, Secrets)
4. Week 7-8: Policy & Governance (ResourceQuota, LimitRange, PodDisruptionBudget)
5. Week 9-10: Autoscaling & Scheduling (HPA, VPA, Affinity, Taints, Tolerations)
6. Month 3-6: Advanced Topics (Operators, CRDs, Service Mesh, Multi-cluster)
✅ Production Readiness Checklist
✓ Understand cluster architecture (Control Plane, Nodes, Addons)
✓ Deploy pods with liveness/readiness probes
✓ Configure Services (ClusterIP, NodePort, LoadBalancer)
✓ Set up ConfigMaps and Secrets (never commit plaintext Secrets to Git)
✓ Configure ResourceQuota for namespaces
✓ Set LimitRange for containers
✓ Implement NetworkPolicy for zero-trust (default deny, then explicit allows)
✓ Set up monitoring (Prometheus + Grafana)
✓ Configure HPA for autoscaling
✓ Implement RBAC (Roles, RoleBindings)
✓ Back up etcd regularly