Jenkins has served as the backbone of the CI/CD landscape for over a decade. Throughout these years, CI/CD practices have transformed from jobs executed in companies’ own data centers to those running in the cloud. Jenkins has adapted and evolved throughout this time, remaining a workhorse in the ever-changing CI/CD domain.
If you looked at a typical AWS-based Jenkins setup, you would probably see a Jenkins master node running in EC2. When a job starts, the master dynamically spawns an EC2 worker instance to execute the task, then terminates the worker once it completes. This saves time and money compared to the old approach of keeping master and worker nodes on dedicated hardware, always running even when there are no jobs.
Even though running Jenkins in EC2 saves time and money, we can do better. Instead of spawning a whole EC2 instance for a single job, we can run the job in a Docker container. CI/CD platforms like GitHub Actions and CircleCI do just this; however, they don't offer the configurability and full control over worker nodes that Jenkins gives us.
We can solve this by once again evolving Jenkins: execute jobs in pods rather than EC2 instances, using tools like the Jenkins Operator and Karpenter in EKS.
Here is an example of Jenkins running in EKS with Karpenter node scaling. Let's get started!
Prerequisites
- A running EKS cluster and permissions to create IAM roles and policies.
- The AWS CLI installed and configured.
- The Kubernetes command-line tool (kubectl) installed.
- Helm installed for deploying services.
Jenkins Operator
I will install the Jenkins Operator to manage our CI/CD environment in EKS. The operator will create and maintain our Jenkins master server, seed test jobs from a GitHub repo, and manage the worker pods the master spawns for jobs.
I’m using Helm to install the operator with the following values.yaml file:
```yaml
jenkins:
  seedJobs:
    - id: jenkins-operator
      targets: "services/jenkins/cicd/jobs/*.jenkins"
      description: "Test Jenkins Jobs"
      repositoryBranch: blog/jenkins
      repositoryUrl: https://github.com/drogerschariot/gitops-playground
  basePlugins:
    - name: kubernetes
      version: 4029.v5712230ccb_f8
    - name: workflow-job
      version: 1342.v046651d5b_dfe
    - name: workflow-aggregator
      version: 596.v8c21c963d92d
    - name: git
      version: 5.2.1
    - name: job-dsl
      version: "1.85"
    - name: configuration-as-code
      version: 1670.v564dc8b_982d0
    - name: kubernetes-credentials-provider
      version: 1.234.vf3013b_35f5b_a
    - name: prometheus
      version: 2.5.0
  enabled: true
  namespace: jenkins
  latestPlugins: true
  resources:
    limits:
      cpu: 500m
      memory: 1.5Gi
    requests:
      cpu: 250m
      memory: 1Gi
  volumes:
    - name: backup
      persistentVolumeClaim:
        claimName: jenkins-backup
  backup:
    enabled: true
    pvc:
      enabled: true
      size: 5Gi
    resources:
      limits:
        cpu: 100m
        memory: 500Mi
      requests:
        cpu: 100m
        memory: 500Mi
    env:
      - name: BACKUP_DIR
        value: /backup
      - name: JENKINS_HOME
        value: /jenkins-home
      - name: BACKUP_COUNT
        value: "3"
    volumeMounts:
      - name: jenkins-home
        mountPath: /jenkins-home
      - mountPath: /backup
        name: backup
cert-manager:
  startupapicheck:
    enabled: false
operator:
  replicaCount: 1
```
Let’s go through some important configuration in the values.yaml file. First, the git repository, branch, and path that hold the DSL seed job code the Jenkins Operator will use to sync pipelines (I will cover the seed job process later):
```yaml
seedJobs:
  - id: jenkins-operator
    targets: "services/jenkins/cicd/jobs/*.jenkins"
    description: "Test Jenkins Jobs"
    repositoryBranch: blog/jenkins
    repositoryUrl: https://github.com/drogerschariot/gitops-playground
```
Here is the resource definition of the Jenkins master instance, and where backups will be saved:
```yaml
resources:
  limits:
    cpu: 500m
    memory: 1.5Gi
  requests:
    cpu: 250m
    memory: 1Gi
volumes:
  - name: backup
    persistentVolumeClaim:
      claimName: jenkins-backup
```
Installing the Operator
Run the following to install the operator and use the values.yaml mentioned above:
```shell
$ kubectl create namespace jenkins
$ helm repo add jenkins https://raw.githubusercontent.com/jenkinsci/kubernetes-operator/master/chart
$ helm install jenkins jenkins/jenkins-operator -n jenkins --values values.yaml
```
1. Watch Jenkins instances being created:
```shell
$ kubectl --namespace jenkins get pods -w
```
2. Get Jenkins credentials:
```shell
$ kubectl --namespace jenkins get secret jenkins-operator-credentials-jenkins -o 'jsonpath={.data.user}' | base64 -d
$ kubectl --namespace jenkins get secret jenkins-operator-credentials-jenkins -o 'jsonpath={.data.password}' | base64 -d
```
3. Port forward to the Jenkins master running in the cluster:
```shell
$ kubectl --namespace jenkins port-forward jenkins-jenkins 8080:8080
```
Now just browse to http://localhost:8080 and log in with the credentials from the commands above.
Seeding jobs
The seeding process uses the Jenkins Job DSL plugin to create and manage Jenkins jobs from Groovy scripts, often referred to as Job DSL scripts. These scripts live as IaC, usually in a git repo. Following this process lets you manage and maintain Jenkins job configurations through automation, which pays off especially in environments with many jobs or frequent changes.
For my tests, I have 4 jobs located at https://github.com/drogerschariot/gitops-playground/blob/blog/jenkins/services/jenkins/cicd/jobs/k8s_jobs.jenkins. When you run the seed job, it syncs with the repo and applies any changes:
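For reference, a minimal seed script in this style could look like the following. This is only a sketch in Job DSL syntax; the job name and script path are illustrative, not the actual contents of the linked file:

```groovy
// Hypothetical Job DSL seed script; the job name and scriptPath are
// illustrative placeholders, not the real repo layout.
pipelineJob('build-npm') {
    description('Example pipeline seeded by the Job DSL plugin')
    definition {
        cpsScm {
            scm {
                git {
                    remote {
                        url('https://github.com/drogerschariot/gitops-playground')
                    }
                    branch('blog/jenkins')
                }
            }
            // Assumed location of the Jenkinsfile inside the repo
            scriptPath('services/jenkins/cicd/Jenkinsfile')
        }
    }
}
```

Each run of the seed job re-evaluates scripts like this one, so adding, changing, or removing a job definition in git is all it takes to update Jenkins.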
Running Jobs
When a job starts, the Jenkins Operator creates a pod as defined in the DSL pod template. Let’s focus on the podTemplate() function:
```groovy
podTemplate(
    label: label,
    containers: [
        containerTemplate(
            name: 'build-npm',
            image: 'alpine:3.11',
            ttyEnabled: true,
            resourceLimitCpu: '500m',
            resourceLimitMemory: '500Mi',
            resourceRequestCpu: '250m',
            resourceRequestMemory: '250Mi'
        )
    ],
)
```
This template defines the pod for the “build-npm” job. The benefit is that every Jenkins pipeline can have resources and a Docker image designed specifically for its task.
Here I run 3 build-maven and 3 build-npm pipelines; notice the pods running compared to the executors.
The controller reads the pod template and creates the pod for the executor to connect to. When the job is done, the controller removes the pod. This drastically improves pipeline speed compared to starting an EC2 instance per task.
However, what if no resources are available and you start seeing the dreaded “Pending” status because the cluster has no nodes left with capacity? EKS has two ways of node scaling, and one of them works perfectly with the Jenkins Operator.
Karpenter
In the past, AWS recommended the Cluster Autoscaler for node scaling in EKS. The autoscaler runs as a pod inside EKS that monitors pending pods, then updates the managed node group’s ASG to scale up or down. This works fine; however, you are tied to the ASG you are scaling and have little control over how scaling happens in different scenarios.
Now AWS recommends Karpenter for just-in-time node scaling. Karpenter gives us the NodePool CRD, so we can define exactly what and how physical nodes scale without being tied to an AWS Auto Scaling group.
Using Karpenter with the Jenkins Operator gives us even more options to speed up pipelines and save money. Let’s look at using Karpenter vs. the cluster autoscaler.
IAM Permissions
Note: I won’t go into the permissions needed for the cluster autoscaler; however, they are very similar to Karpenter’s.
Karpenter uses pod identity and a Kubernetes service account to create EC2 instances and add them to EKS, so we need an IAM role with the following trust policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "pods.eks.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ]
    }
  ]
}
```
and policy:
```json
{
  "Statement": [
    {
      "Action": [
        "ssm:GetParameter",
        "ec2:DescribeImages",
        "ec2:RunInstances",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeAvailabilityZones",
        "ec2:DeleteLaunchTemplate",
        "ec2:CreateTags",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateFleet",
        "ec2:DescribeSpotPriceHistory",
        "iam:GetInstanceProfile",
        "iam:CreateInstanceProfile",
        "iam:TagInstanceProfile",
        "iam:AddRoleToInstanceProfile",
        "iam:PassRole",
        "pricing:GetProducts"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "Karpenter"
    },
    {
      "Sid": "AllowInterruptionQueueActions",
      "Effect": "Allow",
      "Resource": "${aws_sqs_queue.karpenter_queue.arn}",
      "Action": [
        "sqs:DeleteMessage",
        "sqs:GetQueueUrl",
        "sqs:ReceiveMessage"
      ]
    },
    {
      "Action": "ec2:TerminateInstances",
      "Condition": {
        "StringLike": {
          "ec2:ResourceTag/karpenter.sh/nodepool": "*"
        }
      },
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "ConditionalEC2Termination"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::<AWS_ACCOUNT>:role/KarpenterNodeRole-<EKS_CLUSTER_NAME>",
      "Sid": "PassNodeIAMRole"
    },
    {
      "Effect": "Allow",
      "Action": "eks:DescribeCluster",
      "Resource": "arn:aws:eks:${var.region}:<AWS_ACCOUNT>:cluster/<EKS_CLUSTER_NAME>",
      "Sid": "EKSClusterEndpointLookup"
    }
  ],
  "Version": "2012-10-17"
}
```
When we attach this role through a Pod Identity association, the Karpenter operator will have access to create EC2 instances and add them as worker nodes in EKS:
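Assuming the role above was created under a name like KarpenterControllerRole (an illustrative placeholder), the association can be created with the AWS CLI roughly like this:

```shell
# Associate the IAM role with Karpenter's service account via EKS Pod Identity.
# Cluster, role, and account values are placeholders; adjust to your environment.
aws eks create-pod-identity-association \
  --cluster-name my-eks-cluster \
  --namespace kube-system \
  --service-account karpenter \
  --role-arn arn:aws:iam::<AWS_ACCOUNT>:role/KarpenterControllerRole
```

Note that the EKS Pod Identity Agent add-on must be running on the cluster for the association to take effect.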
Install Karpenter
We will use Helm to install the Karpenter Operator:
```shell
$ helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version "0.35.0" \
    --namespace "kube-system" \
    --set "settings.clusterName=my-eks-cluster" \
    --set "settings.interruptionQueue=my-eks-cluster" \
    --set controller.resources.requests.cpu=250m \
    --set controller.resources.requests.memory=256Mi \
    --set controller.resources.limits.cpu=500m \
    --set controller.resources.limits.memory=512Mi \
    --wait
```
We see the operator running:
```shell
$ kubectl get pods -l "app.kubernetes.io/name=karpenter" -n kube-system
NAME                         READY   STATUS    RESTARTS   AGE
karpenter-84749cc94f-qsxn5   1/1     Running   0          5m21s
karpenter-84749cc94f-xczdw   1/1     Running   0          5m21s
```
The Karpenter operator will install the NodePool and EC2NodeClass CRDs. Here is an example of 3 NodePool configs using one EC2NodeClass:
```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: small
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["t4g.nano", "t4g.micro", "t4g.small", "t4g.medium"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot"]
      nodeClassRef:
        name: default
  limits:
    cpu: 250
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 * 24h = 720h
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: large
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["t4g.xlarge"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot"]
      nodeClassRef:
        name: default
  limits:
    cpu: 250
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 * 24h = 720h
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: on-demand
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["t4g.xlarge"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        name: default
  limits:
    cpu: 250
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 * 24h = 720h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "karpenter-node-role"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-eks-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-eks-cluster
```
Let’s go through each NodePool:
- NodePool small:
- Only use t4g.nano, t4g.micro, t4g.small, t4g.medium EC2 types
- Only use spot instances
- NodePool large:
- Only use t4g.xlarge EC2 types
- Only use spot instances
- NodePool on-demand:
- Only use t4g.xlarge EC2 types
- Only use on-demand instances
Now we can choose which NodePool our Jenkins jobs will use by adding the nodeSelector: 'karpenter.sh/capacity-type=spot' property to our podTemplate:
```groovy
podTemplate(
    label: label,
    nodeSelector: 'karpenter.sh/capacity-type=spot', // Matching label for NodePool
    containers: [
        containerTemplate(
            name: 'build-npm',
            image: 'alpine:3.11',
            ttyEnabled: true,
            resourceLimitCpu: '500m',
            resourceLimitMemory: '500Mi',
            resourceRequestCpu: '250m',
            resourceRequestMemory: '250Mi'
        )
    ],
)
```
To test this, I will run 4 pipelines at the same time:
- build-npm: requires 500m CPU, 500Mi memory, and spot instances; run 15 times.
- build-maven: requires 1000m CPU, 1Gi memory, and spot instances; run 15 times.
- build-npm-large: requires 4000m CPU, 2500Mi memory, and on-demand instances; run 5 times.
- build-maven-large: requires 4000m CPU, 2000Mi memory, and on-demand instances; run 5 times.
Here are the results when using cluster autoscaler (with time sped up):
Here it took 5:40 to run all 40 pipelines. The cluster autoscaler needed to spin up 8 t4g.xlarge instances, and it took 17 minutes until all added nodes were shut down again. That works out to $0.1692 to run all the tests.
Now here is the same test using Karpenter:
Here it took 3:56 to run all 40 pipelines. Karpenter needed to spin up 8 t4g.xlarge and 6 t4g.small instances, and all added nodes were shut down again after 6 minutes. That works out to $0.05424 to run all the tests. Even though Karpenter spun up 6 extra instances, it was able to use smaller, cheaper instances for most of the tests and, once done, shut the instances down quicker than the cluster autoscaler, saving roughly 68% of the cost.
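As a rough sanity check on these numbers, node cost can be estimated as instances × hourly rate × runtime. A sketch of that arithmetic (the hourly rates below are illustrative spot-price assumptions, not quoted AWS prices):

```python
def run_cost(nodes):
    """Estimate the cost of a scaling run.

    nodes: list of (instance_count, hourly_rate_usd, minutes_running).
    """
    return sum(count * rate * minutes / 60 for count, rate, minutes in nodes)

# Cluster autoscaler run: 8 x t4g.xlarge up for ~17 minutes.
autoscaler = run_cost([(8, 0.0747, 17)])

# Karpenter run: 8 x t4g.xlarge plus 6 x t4g.small, up for only ~6 minutes.
karpenter = run_cost([(8, 0.0747, 6), (6, 0.0094, 6)])

print(f"autoscaler ~ ${autoscaler:.4f}, karpenter ~ ${karpenter:.4f}")
```

With the assumed rate, the autoscaler run lands at roughly $0.169, in line with the figure above; the Karpenter estimate depends heavily on the spot prices in effect at run time, which is why the exact dollar amounts will vary between runs.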
Conclusion
Jenkins has significantly evolved from its early days as a standalone CI/CD tool to become a more robust, scalable, and efficient solution when running in a managed Kubernetes environment like EKS. The Jenkins Operator simplifies Jenkins’ deployment and management in Kubernetes environments and takes advantage of dynamic scaling capabilities.
Integrating Karpenter with Jenkins introduces an innovative approach to resource management, optimizing the use of computing resources and reducing costs. Karpenter’s ability to automatically adjust resource allocation based on workload demands ensures that Jenkins runs more efficiently, providing a seamless CI/CD pipeline that is both cost-effective and time-efficient.