K8sGPT: Redefining Kubernetes Troubleshooting Through AI Integration

Atul Srivastava
Apr 25, 2024 · 7 min read


Introduction

In the dynamic landscape of modern software development and deployment, Kubernetes has emerged as a cornerstone technology for managing containerized applications at scale. By automating the deployment, scaling, and management of containerized workloads, it has transformed how organizations build and operate their applications. Simultaneously, advances in artificial intelligence (AI) have opened new horizons for automation, optimization, and intelligent decision-making across various domains.

Enter K8sGPT, a tool for scanning your Kubernetes clusters and diagnosing and triaging issues in simple English. It detects issues in your Kubernetes cluster and uses supported AI backends to recommend solutions for the problems it finds.

In this blog, we embark on a journey to explore the convergence of Kubernetes and AI through the lens of K8sGPT.

Installation

K8sGPT offers two installation options: as a CLI on your workstation or as a Kubernetes operator within your cluster. For the purposes of this blog, we'll focus on the CLI installation. If you're interested in deploying the K8sGPT operator, instructions are available in the official documentation.

To install K8sGPT as a CLI on your workstation, follow the official installation guide.

Below are the steps for Mac:

brew tap k8sgpt-ai/k8sgpt
brew install k8sgpt

To verify the installation:

% k8sgpt --help
Kubernetes debugging powered by AI

Usage:
  k8sgpt [command]

Available Commands:
  analyze      This command will find problems within your Kubernetes cluster
  auth         Authenticate with your chosen backend
  cache        For working with the cache the results of an analysis
  completion   Generate the autocompletion script for the specified shell
  filters      Manage filters for analyzing Kubernetes resources
  generate     Generate Key for your chosen backend (opens browser)
  help         Help about any command
  integration  Integrate another tool into K8sGPT
  serve        Runs k8sgpt as a server
  version      Print the version number of k8sgpt

Flags:
      --config string        Default config file (/Users/atul.srivastava/Library/Application Support/k8sgpt/k8sgpt.yaml)
  -h, --help                 help for k8sgpt
      --kubeconfig string    Path to a kubeconfig. Only required if out-of-cluster.
      --kubecontext string   Kubernetes context to use. Only required if out-of-cluster.

Use "k8sgpt [command] --help" for more information about a command.

LocalAI Setup

K8sGPT supports various AI backends, with OpenAI as the default. This blog focuses on setting up LocalAI on your workstation and configuring K8sGPT to use it as the backend.

I preferred to build LocalAI from source on my Mac, following the steps below:

1. Install Xcode from the App Store.

2. Install the build dependencies.

% brew install abseil cmake go grpc protobuf wget

3. Clone the LocalAI GitHub repo and build the binary.

% git clone https://github.com/go-skynet/LocalAI.git
% cd LocalAI
% make build

4. Download any supported model(s) from https://huggingface.co/models?search=ggml into the models directory. Models supported by LocalAI include Vicuna, Alpaca, LLaMA, Cerebras, GPT4All, GPT4All-J, and Koala. I downloaded two models: ggml-gpt4all-j and llama-2-7b-chat.

# Download gpt4all model
% wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j

# Use a template from the examples
% cp -rf prompt-templates/ggml-gpt4all-j.tmpl models/

# Download llama-2-7b-chat model (use /resolve/, not /blob/, so wget fetches the binary rather than the HTML page)
% wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin -O models/llama-2-7b-chat

# Use a template from the examples
% cp -rf prompt-templates/llama2-chat-message.tmpl models/

5. Start LocalAI

% ./local-ai --models-path=./models/ --debug=true

1:39AM INF loading environment variables from file envFile=.env
1:39AM DBG Setting logging to debug
1:39AM INF Starting LocalAI using 4 threads, with models path: /Users/atul.srivastava/Documents/k8sgpt/LocalAI/LocalAI/models
1:39AM INF LocalAI version: v2.12.3-100-gd344daf (d344daf129e5d4504ce29ada434b6e6b1025ce31)
1:39AM INF Preloading models from /Users/atul.srivastava/Documents/k8sgpt/LocalAI/LocalAI/models
1:39AM DBG Extracting backend assets files to /tmp/localai/backend_data
1:39AM DBG processing api_keys.json
1:39AM DBG api keys loaded from api_keys.json
1:39AM DBG processing external_backends.json
1:39AM DBG external backends loaded from external_backends.json
1:39AM INF core/startup process completed!
1:39AM DBG No configuration file found at /tmp/localai/upload/uploadedFiles.json
1:39AM DBG No configuration file found at /tmp/localai/config/assistants.json
1:39AM DBG No configuration file found at /tmp/localai/config/assistantsFile.json
1:39AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080

We now have a working AI setup on our workstation with the models downloaded above, ready to take requests on http://0.0.0.0:8080.
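
Before wiring LocalAI into K8sGPT, it's worth a quick sanity check. LocalAI exposes an OpenAI-compatible API, so a minimal smoke test (assuming the model name matches a file in your models directory) looks like this:

# List the models LocalAI has discovered
% curl http://localhost:8080/v1/models

# Send a trivial chat completion to the llama-2-7b-chat model
% curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "llama-2-7b-chat", "messages": [{"role": "user", "content": "Say hello"}]}'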

Integrate LocalAI With K8sGPT

With our LocalAI instance up and running, let's integrate it as a backend for K8sGPT. We will use the "k8sgpt auth add" command to register the AI backend, specifying the backend type as "localai", the LocalAI URL endpoint, and the desired model (in this case, "llama-2-7b-chat") that was downloaded earlier.

% k8sgpt auth add -b localai -u http://localhost:8080/v1 -m llama-2-7b-chat
localai added to the AI backend provider list
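
To confirm the backend registered correctly, "k8sgpt auth list" prints the configured providers:

% k8sgpt auth list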

Detect And Fix Issues

With all components ready, it's time to unleash K8sGPT's power to detect and resolve issues within our Kubernetes cluster. I've set up a test cluster on my laptop using kind, and configured kubectl to point at it.
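
If you don't already have a cluster to experiment with, a throwaway kind cluster is quick to create (a minimal sketch; the cluster name is arbitrary):

% kind create cluster --name k8sgpt-demo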

The k8sgpt CLI discovers your cluster through your kubeconfig.

% kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:61960
CoreDNS is running at https://127.0.0.1:61960/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To initiate issue detection, I'll use the "k8sgpt analyze" command. Since there are no deployments in the cluster's default namespace yet, we expect no issues to be found.

% k8sgpt analyze --namespace default
AI Provider: AI not used; --explain not set

No problems detected

Note: I intentionally omitted the "--explain" flag, so k8sgpt does not contact the AI backend for solutions. I'll revisit this shortly.
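
As an aside, analyze can also produce machine-readable output, which is handy for feeding results into other tooling (the -o/--output flag in recent k8sgpt versions supports JSON):

% k8sgpt analyze --namespace default -o json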

Next, I'll deploy the following nginx deployment in the default namespace. Note the security context: readOnlyRootFilesystem: true prevents nginx from writing its cache and PID files, so once deployed, the containers are expected to enter the CrashLoopBackOff state.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        securityContext:
          readOnlyRootFilesystem: true

% kubectl apply -f nginx-deployment.yml
deployment.apps/nginx-deployment created

% kubectl get pods -w
NAME                                READY   STATUS             RESTARTS      AGE
nginx-deployment-866dc6df9c-94hgg   0/1     Error              2 (18s ago)   20s
nginx-deployment-866dc6df9c-mj7zw   0/1     Error              2 (18s ago)   20s
nginx-deployment-866dc6df9c-vv5cs   0/1     Error              2 (19s ago)   20s
nginx-deployment-866dc6df9c-94hgg   0/1     CrashLoopBackOff   2 (12s ago)   26s
nginx-deployment-866dc6df9c-mj7zw   0/1     CrashLoopBackOff   2 (12s ago)   28s
nginx-deployment-866dc6df9c-vv5cs   0/1     CrashLoopBackOff   2 (12s ago)   30s
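
Plain kubectl already hints at the root cause. With a read-only root filesystem, nginx fails while creating its temp directories; checking a pod's logs (the exact message varies by nginx version) reports something like "mkdir() '/var/cache/nginx/client_temp' failed (30: Read-only file system)":

% kubectl logs nginx-deployment-866dc6df9c-94hgg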

Running “k8sgpt analyze” should detect these issues now.

% k8sgpt analyze --namespace default
AI Provider: AI not used; --explain not set

0 default/nginx-deployment-866dc6df9c-94hgg(Deployment/nginx-deployment)
- Error: back-off 1m20s restarting failed container=nginx pod=nginx-deployment-866dc6df9c-94hgg_default(c5eb759a-048a-433a-a61c-99edc8aba72f)
- Error: the last termination reason is Error container=nginx pod=nginx-deployment-866dc6df9c-94hgg

1 default/nginx-deployment-866dc6df9c-mj7zw(Deployment/nginx-deployment)
- Error: back-off 1m20s restarting failed container=nginx pod=nginx-deployment-866dc6df9c-mj7zw_default(46cbddff-8299-48d8-a5d3-1dcf45c6d77d)
- Error: the last termination reason is Error container=nginx pod=nginx-deployment-866dc6df9c-mj7zw

2 default/nginx-deployment-866dc6df9c-vv5cs(Deployment/nginx-deployment)
- Error: back-off 1m20s restarting failed container=nginx pod=nginx-deployment-866dc6df9c-vv5cs_default(de927b1a-2457-40eb-a7c3-9807a3a23b8d)
- Error: the last termination reason is Error container=nginx pod=nginx-deployment-866dc6df9c-vv5cs

I'll now enable K8sGPT to communicate with my AI backend and provide solutions for the detected issues. To achieve this, I'll run the analyze command with the "--explain" flag, specifying the AI backend name as well.

% k8sgpt analyze --explain --backend localai --namespace default
100% |████████████████████████████████████████████████████████████████████████████| (3/3, 2 it/min)
AI Provider: localai

0 default/nginx-deployment-866dc6df9c-94hgg(Deployment/nginx-deployment)
- Error: back-off 5m0s restarting failed container=nginx pod=nginx-deployment-866dc6df9c-94hgg_default(c5eb759a-048a-433a-a61c-99edc8aba72f)
- Error: the last termination reason is Error container=nginx pod=nginx-deployment-866dc6df9c-94hgg
Note: {Any additional information or warning you want to add}
--- english --- language; back-off 5m0s restarting failed container=nginx pod=nginx-deployment-866dc6df9c-94hgg_default(c5eb759a-048a-433a-a61c-99edc8aba72f) the last termination reason is Error container=nginx pod=nginx-deployment-866dc6df9c-94hgg
Error: Container nginx failed to start due to an error.
Solution:
1. Check the container logs using kubectl exec to identify the error.
2. If the issue persists, try scaling down and up the deployment to reset the container.
3. If step 2 fails, try rolling back the deployment to a previous version.
Note: Be cautious when scaling or rolling back deployments, as it may cause data loss.

K8sGPT connected to the AI backend, presented it with the error messages it had identified, and prompted it for step-by-step solutions. The debug logs from LocalAI show the prompt generated by K8sGPT and the corresponding response returned by LocalAI.

2:41AM DBG Prompt (after templating): Simplify the following Kubernetes error message delimited by triple dashes written in --- english --- language; --- back-off 5m0s restarting failed container=nginx pod=nginx-deployment-866dc6df9c-94hgg_default(c5eb759a-048a-433a-a61c-99edc8aba72f) the last termination reason is Error container=nginx pod=nginx-deployment-866dc6df9c-94hgg ---.
Provide the most possible solution in a step by step style in no more than 280 characters. Write the output in the following format:
Error: {Explain error here}
Solution: {Step by step solution here}
2:41AM DBG Model already loaded in memory: llama-2-7b-chat
2:41AM DBG Model 'llama-2-7b-chat' already loaded
2:41AM DBG Response: {"created":1713989359,"object":"chat.completion","id":"e63a7b2e-3745-46e1-a43c-80068ef50ac9","model":"llama-2-7b-chat","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"\tNote: {Any additional information or warning you want to add}\n--- english --- language; back-off 5m0s restarting failed container=nginx pod=nginx-deployment-866dc6df9c-94hgg_default(c5eb759a-048a-433a-a61c-99edc8aba72f) the last termination reason is Error container=nginx pod=nginx-deployment-866dc6df9c-94hgg\nError: Container nginx failed to start due to an error.\nSolution:\n1. Check the container logs using kubectl exec to identify the error.\n2. If the issue persists, try scaling down and up the deployment to reset the container.\n3. If step 2 fails, try rolling back the deployment to a previous version.\nNote: Be cautious when scaling or rolling back deployments, as it may cause data loss."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

The suggestions offered by K8sGPT can serve as actionable insights for engineers to resolve the above CrashLoopBackOff errors. K8sGPT can also be integrated with tools like Slack to send out notifications and alerts along with remediation steps.
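
The model's advice is generic, though. For this specific failure, the root cause is the read-only root filesystem, so one concrete remediation (a sketch, not the only option) is to keep readOnlyRootFilesystem: true but give nginx writable emptyDir mounts for the paths it writes to. The volumeMounts fragment goes under the nginx container, and the volumes fragment under the pod spec:

        # Under the nginx container:
        volumeMounts:
        - name: cache
          mountPath: /var/cache/nginx
        - name: run
          mountPath: /var/run
      # Under the pod spec:
      volumes:
      - name: cache
        emptyDir: {}
      - name: run
        emptyDir: {}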

Conclusion

K8sGPT epitomizes the synergy between Kubernetes and AI. As we embrace its transformative potential, we embark on a journey towards a future where intelligent automation reshapes Kubernetes operations.

Our exploration of K8sGPT has merely scratched the surface of its potential. This powerful tool not only enhances Kubernetes troubleshooting but also integrates with other solutions, such as Trivy for vulnerability management. For a deeper dive, I encourage you to explore the K8sGPT documentation and unlock the full spectrum of its functionality.
