
How to set up a bare metal Kubernetes cluster using kubeadm

Introduction

Kubernetes is a very popular container orchestrator used by many organisations. It is open source and managed by the Cloud Native Computing Foundation. Many of us want to know how to install and use a Kubernetes cluster. A typical Kubernetes cluster looks like the diagram below.

[Figure 1: Kubernetes Architecture]

Users can deploy single-node clusters like Minikube, Kind and MicroK8s on their individual machines to practise Kubernetes. Katacoda is another option where one can play around with Kubernetes. These are very easy to set up and experiment with. If you are interested, check my previous posts here and here to learn how to set up Minikube and run Spring Boot applications on it.

But when it comes to setting up a multi-node Kubernetes cluster, many of us are clueless. Kubernetes Playground is a good option: it allows you to create a multi-node cluster and play with it for 4 hours. Much of the setup is abstracted away, but you can get a feel for it. Learners who want a permanent cluster might have to go for a commercial cloud offering like Amazon EKS, VMware Tanzu, Red Hat OpenShift or Google GKE. Most of these offerings provide a trial period or initial free credits to let developers get acquainted with the platform and start charging after that. For self-learning purposes these charges are pretty high.

For such learners I am publishing this article, which will help them set up a multi-node bare metal Kubernetes cluster on their personal computers. We will create one VM for the master node and one VM for the worker node, then initialise the cluster, deploy workloads on it, and finally access the workloads from outside the cluster. This is done in the steps below.

Prerequisites

Hardware

For this exercise the user needs a computer with at least 100 GB of free disk space and 16 GB of RAM. These resources cover 1 master node VM (15 GB disk and 4 GB RAM) and 1 worker node VM (15 GB disk and 4 GB RAM), which is the minimum requirement. The RAM per VM could be reduced to 3 GB and the disk to 10 GB. If there are more node VMs, more computing resources must be available on the host computer.

Software

The machine needs two pieces of software. It must have Oracle VirtualBox installed; users can refer to the installation manual here. We will use VirtualBox as our hypervisor. We will use Ubuntu images for creating the VMs. Users can download the Ubuntu 20.04 image from here and store it on the computer.

Before moving to the next step, VirtualBox must be installed and the Ubuntu image must be downloaded.
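Optionally, you can verify the integrity of the downloaded image before using it. A minimal sketch, assuming the ISO was saved as ubuntu-20.04-desktop-amd64.iso in the Downloads folder (the exact file name depends on the release you downloaded):

 ##Verify the downloaded Ubuntu image against the published SHA256SUMS
 cd ~/Downloads
 sha256sum ubuntu-20.04-desktop-amd64.iso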

Set Up a Master Node

In this step we create the master node VM using VirtualBox. Start VirtualBox and press CTRL + N or choose Machine –> New from the menu bar. Enter the VM name as master, the Type as Linux and the Version as Ubuntu (64-bit). Use the image below for reference.

[Figure 2 : Step 1 of Master Node VM]

In the next step choose the RAM size for the VM. It should be a minimum of 2048 MB, and preferably more. Refer to the image below.

[ Figure 3 : Step 2 of Master Node VM]

In the next step choose “Create a virtual hard disk now” as displayed in the image below.

[ Figure 4 : Step 3 of Master Node VM]

In the next step choose VDI as your hard disk type, as per the image below.

[Figure 5 : Step 4 of Master Node VM]

In the next step choose a “Fixed Size” hard disk as shown in the image below.

[ Figure 6: Step 5 of Master Node VM]

In the next step we allocate disk space for the VM. Allocate 15 GB of disk space for the VM as shown in the image below.

[ Figure 7: Step 6 of Master Node VM]

Now the VM will be displayed in VirtualBox. We need to attach the Ubuntu ISO image we downloaded earlier, set up the networking, and add CPUs.

First, click on the VM and choose Settings –> System –> Processor as displayed in the image, and change the number of processors to 3.

[ Figure 8: Step 7 of Master Node VM]

Then click on the Storage option to attach the Ubuntu image we have already downloaded. Choose Storage –> Controller: IDE –> Empty, click on the blue CD icon next to IDE Secondary Master, and select Choose/Create a Virtual Optical Disk. Select the location of the Ubuntu .iso file, which will act as the installation image for the VM.

[ Figure 9: Step 8 of Master Node VM]

In the next step we set up the networking for the VM. We will configure 2 network adapters. The first adapter is attached to a Bridged Adapter (to give access between guest and host machines, between guest machines, and to the outside network).

[ Figure 10 : Step 9 of Master Node VM]

The second adapter is attached to a Host-only Adapter (to give access between guest and host machines).

[Figure 11: Step 10 of Master Node VM]

Once all these steps are completed, press the OK button. Now the hardware part of the VM is ready. In the next step we install Ubuntu in the VM and create a user/password. To do so, select the VM in VirtualBox and click the Start button.

On starting, it will show a window with the options Try Ubuntu and Install Ubuntu. Choose Install Ubuntu and proceed with the wizard.

[Figure 12: Step 11 of Master Node VM]

Choose the Minimal Installation option and select suitable options during the remaining steps. In one of the steps it will ask you to create a username and password. Enter the computer name, the username as master, and the password also as master. You could also choose a username and password of your choice. It will take some time to install Ubuntu depending on the available internet speed. After the installation you need to reboot the system.

Next we need to set up a few directories and install Docker (the container engine), kubeadm, kubelet and kubectl (the Kubernetes client) on the master node. All of this needs to be done as the super user. Open a terminal and execute the commands below one by one to finish the process. Please refer to the Kubernetes documentation for installing the runtime and kubeadm.

 sudo su [Press Enter and provide the root user password]

 mkdir -p /etc/apt/trusted.gpg.d
 touch /etc/apt/trusted.gpg.d/docker.gpg
 
##Install Docker Repository

 sudo apt-get update && sudo apt-get install -y \
 apt-transport-https ca-certificates curl software-properties-common gnupg2

## Install Repository Key
 curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key --keyring /etc/apt/trusted.gpg.d/docker.gpg add -
 
##Install the apt Repository
 sudo add-apt-repository \
 "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
 $(lsb_release -cs) \
 stable"
 
##Install the Docker Container Engine
 sudo apt-get update && sudo apt-get install -y \
 containerd.io=1.2.13-2 \
 docker-ce=5:19.03.11~3-0~ubuntu-$(lsb_release -cs) \
 docker-ce-cli=5:19.03.11~3-0~ubuntu-$(lsb_release -cs)
 
##Set up Docker Daemon
 cat <<EOF | sudo tee /etc/docker/daemon.json
 {
 "exec-opts": ["native.cgroupdriver=systemd"],
 "log-driver": "json-file",
 "log-opts": {
 "max-size": "100m"
 },
 "storage-driver": "overlay2"
 }
 EOF

 sudo mkdir -p /etc/systemd/system/docker.service.d
 sudo systemctl daemon-reload
 sudo systemctl restart docker
 sudo systemctl enable docker

 echo 'alias k=kubectl' >> ~/.bashrc
 echo 'swapoff -a' >> ~/.bashrc
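Before moving on, it can help to confirm that Docker came up and is using the systemd cgroup driver configured above. A quick check, run as the super user:

 ##Verify the Docker installation and cgroup driver
 docker info | grep -i "cgroup driver"
 systemctl status docker --no-pager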

Once the Docker installation steps are complete, we proceed with the installation of kubeadm, kubelet and kubectl. The following commands set things up. All these commands need to be executed as the super user.

 sudo su [Enter the root password]

 ##iptable set up
 cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
 net.bridge.bridge-nf-call-ip6tables = 1
 net.bridge.bridge-nf-call-iptables = 1
 EOF

 sudo sysctl --system

## Setting up the tools
 sudo apt-get update && sudo apt-get install -y apt-transport-https curl
 
 curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
 cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
 deb https://apt.kubernetes.io/ kubernetes-xenial main
 EOF

 sudo apt-get update

 sudo apt-get install -y kubelet kubeadm kubectl
 sudo apt-mark hold kubelet kubeadm kubectl

 ##kubelet restart

 systemctl daemon-reload
 systemctl restart kubelet

 ##Install Net tools
 apt install net-tools
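At this point you can optionally confirm the versions of the installed tools; a quick check:

 ##Verify the Kubernetes tools are installed
 kubeadm version
 kubelet --version
 kubectl version --client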

So far we have set up Docker, kubeadm, kubelet and kubectl. We now need to create a few directories and files which are required by both the master and worker nodes (they are used by the Calico network plugin we install later). They need to be created as the super user. Execute the commands below to create them.

 sudo su
 mkdir -p /var/lib/calico
 touch /var/lib/calico/nodename
 mkdir -p /var/run/bird
 touch /var/run/bird/bird.ctl

Now we can stop the master VM and proceed to create the worker node.

Set Up Worker Node(s)

We have already created the master node VM. We will clone it to create the worker node(s) instead of going through the whole process of creating a new VM from scratch.

To clone the master node VM, select the VM called master and press CTRL + O, which will open the clone window. Enter the name of the new VM as node01.

[ Figure 13: Cloning the master vm ]

In the next step choose the “Full Clone” option and the new node VM will be ready. Start the VM; the username and password are the same as for the master VM. We need to change the hostname of the VM to node01. This is done in a terminal as the super user. Open the terminal and execute the commands below.

sudo su [Enter root password]
echo node01 > /etc/hostname

The above action changes the hostname of the node from master to node01 (the change takes effect after a reboot). After that, stop the VM. If you want to create more nodes, keep repeating the above steps of cloning and renaming.
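Alternatively, Ubuntu ships hostnamectl, which changes the hostname immediately and persists it across reboots; a minimal sketch using the same node01 name:

 ##Set the hostname using hostnamectl (alternative to editing /etc/hostname)
 sudo hostnamectl set-hostname node01
 hostnamectl status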

Next we proceed to the master node VM to create the control plane.

Create the Control Plane

Start the master VM and log in. In this step we will initialise the single-node cluster and create the control plane. You must deploy a Container Network Interface (CNI) based Pod network add-on so that your Pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed, so we will install the Calico Pod network so that workloads can communicate. First check the IP address of your host machine. On Linux machines you can execute the “ifconfig” command and note down the address of the pattern “192.168.x.xxx”. The addresses of the VMs will have the same pattern.

Open a terminal and execute the commands below as the super user to create the Kubernetes cluster and control plane and install the Calico network.

 sudo su [Enter root password]

 ifconfig   [It will give you a list of ip addresses, note down the address which is of the pattern 
 192.168.x.xxx. That is the ip address of the VM]

 kubeadm config images pull
 kubeadm init --apiserver-advertise-address=<master-vm-ip> --pod-network-cidr=192.168.0.0/16 [Use the VM IP address noted above. Copy the kubeadm join command at the end of the output. It will be used by the worker nodes to join the cluster]

 mkdir -p $HOME/.kube
 sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
 sudo chown $(id -u):$(id -g) $HOME/.kube/config

 ##Install the pod network calico
 kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
 kubectl get nodes -w [This will provide details about the master node only]

 ## Currently no workload could be scheduled on the master node. We need to remove the taint on the master node to schedule pods on it
 kubectl taint nodes --all node-role.kubernetes.io/master-
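Before moving on, it can help to confirm that the Calico and CoreDNS pods come up and that the master node eventually reports Ready; a quick check:

 ##Watch the system pods until the calico and coredns pods are Running
 kubectl get pods -n kube-system -w

 ##The master node should show a Ready status
 kubectl get nodes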

The last part of the output of the “kubeadm init” command contains something like “kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>”

Copy this command, as we will use it on the worker node(s) to join the control plane.
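If you misplace this output, you do not need to re-initialise the cluster; kubeadm can print a fresh join command. A minimal sketch, run on the master node as the super user:

 ##Create a new bootstrap token and print the full join command
 kubeadm token create --print-join-command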

Worker Node joins Master

Start the worker node VM and log in with the same user (master) created earlier. Open the terminal, become the super user and execute the command copied from the master node VM.

 sudo su [Enter root password]
 
 ##Enter the kubeadm join command copied from the master node vm which would of below pattern
 kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>

After the command executes, the worker node joins the control plane and pods can be scheduled on it. If there is more than one worker node, repeat the same process on each.

Deploy and expose Workload

To verify if the worker node(s) joined the cluster or not execute the below command on the master node as a super user.

 sudo su
 kubectl get nodes -o wide

The output should look somewhat like the image below.

[Figure 14: Node Status]

If all the nodes are Ready we can create workloads on them. If the status is NotReady, wait a few minutes until the nodes become Ready. Note down the INTERNAL-IP of both nodes; we will access the pods using these IPs. Now we execute the commands below to create a namespace, a deployment, and a service that exposes the deployment so the application can be accessed from outside the cluster. Execute the commands in the terminal.

 
 ##Create namespace called alpha
 kubectl create namespace alpha  
 kubectl config set-context --current --namespace=alpha
 
 
 ##Create deployment called pages
 kubectl apply -f https://raw.githubusercontent.com/aditya-bhuyan/kube-ws-configs/master/YAML/probe/log-persistent-volumes.yml
 
 
 ##Create a NodePort service to expose the deployment  
 kubectl expose deploy pages --type=NodePort --port=8080
 
 
 ##Verify all objects are created
 kubectl get all
 
 
 ##Verify the service 
 kubectl get service pages 

The output of the last command has a PORT(S) column. Note down the port in the 30000–32767 range from that column. We can access the service using the URL http://<worker-node-internal-ip>:<nodeport>. Open the URL in the browser and try playing around with the links on the homepage.
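You can also test the endpoint from a terminal on the host machine. A minimal sketch, assuming a worker node INTERNAL-IP of 192.168.1.50 and a NodePort of 31000 (substitute the values you noted down):

 ##Fetch the application homepage through the NodePort service
 curl http://192.168.1.50:31000/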

Congratulations! You have just deployed a multi-node Kubernetes cluster and run a workload on it.

Conclusion

The cluster is fine to play around with on the host machine. It has one limitation: it is not accessible from outside the host machine.

It does, however, support a reasonable load and all the features of a full-fledged Kubernetes cluster. The user can scale the cluster up or down based on the hardware capacity of the host machine.

Happy playing.

Introduction to eXtreme Programming(XP)

What is XP :-

eXtreme Programming is a software development methodology which belongs to the “Agile Software Development” family. It advocates frequent releases of software in short development cycles. The intent is to improve productivity and quality and to accommodate changing customer requirements through frequent checkpoints.

Values of XP :-

The five values of XP are communication, simplicity, feedback, courage, and respect and are described in more detail below.

Communication

Software development is inherently a team sport that relies on communication to transfer knowledge from one team member to everyone else on the team. XP stresses the importance of the appropriate kind of communication – face to face discussion with the aid of a white board or other drawing mechanism.

Simplicity

Simplicity means asking “what is the simplest thing that will work?” The purpose of this is to avoid waste and do only what is absolutely necessary, such as keeping the design of the system as simple as possible so that it is easier to maintain, support, and revise. Simplicity also means addressing only the requirements that you know about; don’t try to predict the future.

Feedback

Through constant feedback about their previous efforts, teams can identify areas for improvement and revise their practices. Feedback also supports simple design. Your team builds something, gathers feedback on the design and implementation, and then adjusts the product going forward.

Courage

Kent Beck defined courage as “effective action in the face of fear” (Extreme Programming Explained P. 20). This definition shows a preference for action based on other principles so that the results aren’t harmful to the team. You need courage to raise organizational issues that reduce your team’s effectiveness. You need courage to stop doing something that doesn’t work and try something else. You need courage to accept and act on feedback, even when it’s difficult to accept.

Respect

The members of your team need to respect each other in order to communicate with each other, provide and accept feedback that honors your relationship, and to work together to identify simple designs and solutions.

Practices in XP :-

The XP methodology suggests using many great software engineering practices across the development cycle. These practices, collectively or individually, improve software deliveries and quality. They also help the development team mark the checkpoints required for delivery. They are time tested and have worked very well in software delivery models in the past. The famous 12 practices are described in detail below.

  1. Whole team:
    • The whole team practice is the idea that all the contributors to an XP project sit together in the same location, as members of a single team.
    • XP emphasizes the notion of generalizing specialist, as opposed to role specialists.

2. Planning Games:

XP has two primary planning Activities:

  • Releases are pushes of new functionality all the way to the production user.
  • Iterations are the short development cycles within a release, which Scrum calls “sprints”.

3. Small releases:

Frequent, small releases to a test environment are encouraged in XP, both at the iteration level, to demonstrate progress and increase visibility to the customer, and at the release level, to rapidly deploy working software to the end-users.

4. Customer Test:

The customer describes one or more test criteria that indicate the software is working as intended, then the team builds automated tests to prove to themselves and the customer that the software has met those criteria.

5.Collective Code Ownership:

Any pair of developers can improve any code. This means multiple people work on all of the code, which results in increased visibility and broader knowledge of the code base.

6. Code Standards:

  • Collective code ownership, allowing anyone to amend any code, can result in issues if team members take different approaches to coding.
  • To address this risk, XP teams follow a consistent coding standard so that all the code looks as if it has been written by a single, knowledgeable programmer.

7. Metaphor(Poetically calling things something else):

XP uses metaphor to explain designs and create a shared vision. These descriptions establish comparisons that all the stakeholders can understand, to help explain how the system should work.

8. Continuous Integration:

XP employs continuous integration, which means every time a programmer checks in code to the code repository, integration tests are run automatically.

9. Test-Driven Development:

XP teams often use the practice of test-driven development. In TDD, once a requirement is frozen, the developer writes test cases based on the requirement and then writes code to pass the unit tests. It helps towards zero-defect deliveries.

The cycle of TDD is:

  1. Write a test
  2. The test fails
  3. Write the code
  4. The test passes
  5. Refactor

10. Refactoring:

  • Refactoring is the process of improving the design of existing code without altering its external behaviour or adding new functionality.
  • It’s focusing on removing duplicated code, lowering coupling, and increasing cohesion.

11. Simple design:

An XP team can develop code quickly and adapt it as necessary. The design is kept appropriate for what the project currently requires, then revisited iteratively and incrementally to ensure it remains appropriate.

12. Pair Programming:

In XP, production code is written by two developers working as a pair. While one person writes the code, the other developer reviews it as it is being written, and the two change roles frequently.

Difference between Apache Storm and Apache Spark

Apache Storm: Slowly dying

Apache Spark: Booming

Apache Storm: Real-time stream processing framework.

Apache Spark: Diverse platform, which can handle all the workloads like: batch, interactive, iterative, real-time, graph, etc.

Spark Streaming is the ecosystem component of Spark which handles real-time streams, so let’s compare it with Storm.

Feature-wise differences between Apache Storm and Spark Streaming.

These differences will help you decide which of Apache Storm and Spark Streaming is better for your use case. Let’s look at each feature one by one.

1. Processing Model

  • Storm: It supports a true stream processing model through the core Storm layer.
  • Spark Streaming: Apache Spark Streaming is a wrapper over Spark batch processing.

2. Primitives

  • Storm: It provides a very rich set of primitives to perform tuple-level processing within a stream (filters, functions). Aggregations over messages in a stream are possible through group-by semantics. It supports left join, right join and inner join (default) across the stream.
  • Spark Streaming: It provides two broad categories of operators. The first is stream transformation operators, which transform one DStream into another DStream. The second is output operators, which write information to external systems. The former includes stateless operators (filter, map, mapPartitions, union, distinct and so on) as well as stateful window operators (countByWindow, reduceByWindow and so on).

3. State Management

  • Storm: Core Storm by default doesn’t offer any framework-level support to store any intermediate bolt output (the result of user operations) as state. Hence, any application has to create/update its own state as and when required.
  • Spark Streaming: The underlying Spark by default treats the output of every RDD operation (transformations and actions) as an intermediate state. It stores it as an RDD. Spark Streaming permits maintaining and changing state via the updateStateByKey API. However, no pluggable method is provided to implement state within an external system.

4. Message Delivery Guarantees (Handling message level failures)

  • Storm: It supports 3 message processing guarantees: at-least-once, at-most-once and exactly-once. Storm’s reliability mechanisms are distributed, scalable, and fault-tolerant.
  • Spark Streaming: Apache Spark Streaming defines its fault tolerance semantics in terms of the guarantees provided by the receivers and output operators. As per the Apache Spark architecture, the incoming data is read and replicated across different Spark executor nodes. This creates failure scenarios where data has been received but may not yet have been processed. It handles fault tolerance differently in the case of worker failure and driver failure.

5. Fault Tolerance (Handling process/node level failures)

  • Storm: Storm is designed with fault tolerance at its core. Storm daemons (Nimbus and Supervisor) are made to be fail-fast (the process self-destructs whenever an unexpected scenario is encountered) and stateless (all state is kept in ZooKeeper or on disk).
  • Spark Streaming: The driver node (an equivalent of the JobTracker) is a single point of failure. If the driver node fails, then all executors are lost along with their received and replicated in-memory data. Hence, Spark Streaming uses data checkpointing to recover from driver failure.

6. Debuggability and Monitoring

  • Storm: The Apache Storm UI supports a visualization of every topology, with a complete break-up of internal spouts and bolts. The UI additionally shows information about errors occurring in tasks and fine-grained stats on the throughput and latency of every part of the running topology. It helps in debugging problems at a high level. Metric-based monitoring: Storm’s inbuilt metrics feature provides framework-level support for applications to emit any metrics, which can then be easily integrated with external metrics/monitoring systems.
  • Spark Streaming: The Spark web UI displays an extra Streaming tab that shows statistics of running receivers (whether receivers are active, the number of records received, receiver errors, and so on) and completed batches (batch processing times, queueing delays, and so on). It is useful for observing the execution of the application. The following two pieces of information in the Spark web UI are particularly important for tuning the batch size:
  1. Processing Time – The time to process every batch of data.
  2. Scheduling Delay – The time a batch waits in a queue for the processing of previous batches to complete.

7. Auto Scaling

  • Storm: It allows configuring initial parallelism at various levels per topology – the number of worker processes, executors and tasks. Additionally, it supports dynamic rebalancing, which permits increasing or reducing the number of worker processes and executors without needing to restart the cluster or the topology. However, the number of initial tasks remains constant throughout the life of the topology.
    Once all supervisor nodes are fully saturated with worker processes and there is a need to scale out, one merely has to start a new supervisor node and point it to the cluster-wide ZooKeeper.
    It is possible to automate the logic of monitoring the current resource consumption on every node in a Storm cluster and dynamically adding more resources. STORM-594 describes such an auto-scaling mechanism using a feedback system.
  • Spark Streaming: The community is currently working on dynamic scaling for streaming applications. At the moment, elastic scaling of Spark Streaming applications is not supported.
    Essentially, dynamic allocation is not meant to be used in Spark Streaming at present (1.4 or earlier). The reason is that the receiving topology is currently static: the number of receivers is fixed. One receiver is allotted for every DStream instantiated, and it uses one core in the cluster. Once the StreamingContext is started, this topology cannot be modified. Killing receivers stops the topology.

8. Yarn Integration

  • Storm: Storm integration with YARN is recommended through Apache Slider. Slider is a YARN application that deploys non-YARN distributed applications over a YARN cluster. It interacts with the YARN ResourceManager to spawn containers for the distributed application and then manages the lifecycle of those containers. Slider provides out-of-the-box application packages for Storm.
  • Spark Streaming: The Spark framework provides native integration with YARN. Spark Streaming, as a layer above Spark, merely leverages that integration. Every Spark Streaming application runs as an individual YARN application. The ApplicationMaster container runs the Spark driver and initializes the SparkContext. Every executor and receiver runs in containers managed by the ApplicationMaster. The ApplicationMaster then periodically submits one job per micro-batch to the YARN containers.

9. Isolation

  • Storm: Each worker process runs executors for a particular topology. That is, mixing of tasks from different topologies is not allowed at the worker process level, which provides topology-level runtime isolation. Further, every executor thread runs one or more tasks of the same component (spout or bolt); that is, there is no mixing of tasks across components.
  • Spark Streaming: Each Spark application runs as a separate application on the YARN cluster, where every executor runs in a different YARN container. Thus, JVM-level isolation is provided by YARN, since two different topologies cannot execute in the same JVM. In addition, YARN provides resource-level isolation, so that container-level resource constraints (CPU, memory limits) can be configured.

11. Open Source Apache Community

  • Storm: The Apache Storm powered-by page has a healthy list of companies that are running Storm in production for many use cases. Many of them are large-scale web deployments that are pushing the boundaries for performance and scale. For instance, the Yahoo deployment consists of 2,300 nodes running Storm for near-real-time event processing, with the largest topology spanning across 400 nodes.
  • Spark Streaming: Apache Spark Streaming is still emerging and has limited experience in production clusters. But the overall umbrella Apache Spark community is easily one of the biggest and most active open source communities out there today. The overall charter is evolving rapidly given the large developer base. This should lead to maturity of Spark Streaming in the near future.

12. Ease of development

  • Storm: It provides extremely easy, rich and intuitive APIs that simply describe the DAG nature of the processing flow (topology). Storm tuples, which provide the abstraction of data flowing between nodes in the DAG, are dynamically typed. The motivation there is to simplify the APIs for ease of use. Any new custom tuple can be plugged in after registering its Kryo serializer. Developers can begin by writing topologies and running them in local cluster mode. In local mode, threads are used to simulate worker nodes, allowing the developer to set breakpoints, halt execution, inspect variables, and profile before deploying to a distributed cluster, where all of this is much harder.
  • Spark Streaming: It offers Scala and Java APIs that have more of a functional programming style (transformation of data). As a result, the topology code is far more concise. There is a rich set of API documentation and illustrative samples available to the developer.

13. Ease of Operability

  • Storm: It is a little tricky to deploy/install Storm through the various tools (Puppet, and so on) and deploy the cluster. Apache Storm has a dependency on a ZooKeeper cluster, so that it can handle coordination across the cluster and store state and statistics. It implements CLI support for actions like submit, activate, deactivate, list and kill topology. Strong fault tolerance means that daemon downtime does not impact executing topologies.
    In standalone mode, Storm daemons are compelled to run in supervised mode. In YARN cluster mode, Storm daemons come up as containers and are driven by the Application Master (Slider).
  • Spark Streaming: It uses Spark as the fundamental execution framework. It should be easy to bring up a Spark cluster on YARN. There are several deployment requirements. Usually we enable checkpointing for fault tolerance of the application driver, which brings a dependency on fault-tolerant storage (HDFS).

What is Amazon VPC

Amazon VPC is the networking layer for Amazon EC2.

Amazon Virtual Private Cloud (Amazon VPC) enables you to launch AWS resources into a virtual network that you’ve defined. This virtual network closely resembles a traditional network that you’d operate in your own data center, with the benefits of using the scalable infrastructure of AWS.

VPCs and Subnets

A virtual private cloud (VPC) is a virtual network dedicated to your AWS account. It is logically isolated from other virtual networks in the AWS Cloud. You can launch your AWS resources, such as Amazon EC2 instances, into your VPC. You can specify an IP address range for the VPC, add subnets, associate security groups, and configure route tables.

To protect the AWS resources in each subnet, you can use multiple layers of security, including security groups and network access control lists (ACL).

A subnet is a range of IP addresses in your VPC. You can launch AWS resources into a specified subnet. Use a public subnet for resources that must be connected to the internet, and a private subnet for resources that won’t be connected to the internet.

The original release of Amazon EC2 supported a single, flat network that’s shared with other customers called the EC2-Classic platform. Earlier AWS accounts still support this platform, and can launch instances into either EC2-Classic or a VPC. Accounts created after 2013-12-04 support EC2-VPC only.

By launching your instances into a VPC instead of EC2-Classic, you gain the ability to:

  • Assign static private IPv4 addresses to your instances that persist across starts and stops
  • Optionally associate an IPv6 CIDR block to your VPC and assign IPv6 addresses to your instances
  • Control the outbound traffic from your instances (egress filtering) in addition to controlling the inbound traffic to them (ingress filtering)
  • Assign multiple IP addresses to your instances
  • Define network interfaces, and attach one or more network interfaces to your instances
  • Change security group membership for your instances while they’re running
  • Add an additional layer of access control to your instances in the form of network access control lists (ACL)
  • Run your instances on single-tenant hardware

Accessing the Internet

You control how the instances that you launch into a VPC access resources outside the VPC.

Your default VPC includes an internet gateway, and each default subnet is a public subnet. Each instance that you launch into a default subnet has a private IPv4 address and a public IPv4 address. These instances can communicate with the internet through the internet gateway. An internet gateway enables your instances to connect to the internet through the Amazon EC2 network edge.

By default, each instance that you launch into a non default subnet has a private IPv4 address, but no public IPv4 address, unless you specifically assign one at launch, or you modify the subnet’s public IP address attribute. These instances can communicate with each other, but can’t access the internet.

Setting Up Ingress in Minikube for TCP and UDP

Introduction

Ingress is an addon available in minikube. It helps developers route traffic from their host (laptop, desktop, etc.) to a Kubernetes service running inside their minikube cluster. The ingress addon uses the ingress-nginx controller, which by default is only configured to listen on ports 80 and 443. By default, ingress doesn’t proxy arbitrary TCP and UDP ports. However, we can configure it to listen on additional TCP and UDP ports.

Prerequisites

Before starting the tutorial it is assumed that the reader is already familiar with Kubernetes, Minikube and ConfigMaps. To follow this tutorial we need the software/tools below.

  • Minikube
  • Telnet
  • kubectl
  • A Text Editor

Steps to achieve

  1. Enable Ingress Addon in Minikube cluster
  2. Update TCP and UDP ConfigMaps
  3. Create a redis deployment inside Minikube cluster
  4. Create a redis service for the redis deployment
  5. Allow traffic to redis from outside

Enable Ingress

To enable ingress you need to execute the following command.

minikube addons enable ingress
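To confirm the addon came up, check that the ingress controller pod is running. A quick check (depending on your minikube version the controller pod may live in the kube-system or the ingress-nginx namespace):

kubectl get pods --all-namespaces | grep -i ingress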

Update TCP/UDP ConfigMaps

As mentioned earlier, ingress doesn’t proxy TCP/UDP ports by default. We need to edit the existing ConfigMaps for our purpose. Below are examples of the ConfigMap definitions for TCP and UDP.

apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: udp-services
  namespace: ingress-nginx

Create a redis deployment

To create a redis deployment we first need a .yaml file to provide the deployment details.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-deployment
  namespace: default
  labels:
    app: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - image: redis
        imagePullPolicy: Always
        name: redis
        ports:
        - containerPort: 6379
          protocol: TCP

Create a file redis-deployment.yaml and paste the contents above. Then install the redis deployment with the following command:

kubectl apply -f redis-deployment.yaml

The above command installs the redis deployment. Then we need to expose the deployment as a service so that it can be consumed.
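You can optionally confirm the redis pod is up before exposing it, using the labels from the manifest above:

kubectl rollout status deployment/redis-deployment
kubectl get pods -l app=redis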

Create Redis Service from the deployment

We need to create a file called redis-service.yaml with the content below.

apiVersion: v1
kind: Service
metadata:
  name: redis-service
  namespace: default
spec:
  selector:
    app: redis
  type: ClusterIP
  ports:
    - name: tcp-port
      port: 6379
      targetPort: 6379
      protocol: TCP

Once the file is ready, install the redis service with the following command:

kubectl apply -f redis-service.yaml
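As a quick sanity check, confirm the service exists and exposes port 6379 on a ClusterIP:

kubectl get service redis-service -n default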

Allow traffic to redis deployment

To allow traffic to the redis service from outside the cluster, we need to patch the tcp-services ConfigMap in the kube-system namespace and then patch the nginx-ingress-controller deployment in kube-system. We could repeat the same steps for UDP.

To add a TCP service to the nginx ingress controller you can run the following command:

kubectl patch configmap tcp-services -n kube-system --patch '{"data":{"6379":"default/redis-service:6379"}}'

Where:

  • 6379 : the port your service should listen to from outside the minikube virtual machine
  • default : the namespace that your service is installed in
  • redis-service : the name of the service

We can verify that our resource was patched with the following command:

kubectl get configmap tcp-services -n kube-system -o yaml

We should see something like this as output of the above command:

apiVersion: v1
data:
  "6379": default/redis-service:6379
kind: ConfigMap
metadata:
  creationTimestamp: "2019-10-01T16:19:57Z"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
  name: tcp-services
  namespace: kube-system
  resourceVersion: "2857"
  selfLink: /api/v1/namespaces/kube-system/configmaps/tcp-services
  uid: 4f7fac22-e467-11e9-b543-080027057910

The only value you need to validate is that there is a value under the data property that looks like this:

  "6379": default/redis-service:6379

Patch the ingress-nginx-controller

There is one final step that must be done in order to obtain connectivity from outside the cluster. We need to patch our nginx controller so that it is listening on port 6379 and can route traffic to your service. To do this we need to create a patch file.

nginx-ingress-controller-patch.yaml

spec:
  template:
    spec:
      containers:
      - name: nginx-ingress-controller
        ports:
         - containerPort: 6379
           hostPort: 6379

Create a file called nginx-ingress-controller-patch.yaml and paste the contents above.

Next apply the changes with the following command:

kubectl patch deployment nginx-ingress-controller --patch "$(cat nginx-ingress-controller-patch.yaml)" -n kube-system

Test your connection

Test that you can reach your service with telnet via the following command:

telnet $(minikube ip) 6379

You should see the following output:

Trying 192.168.99.179...
Connected to 192.168.99.179.
Escape character is '^]'

To exit telnet enter the Ctrl key and ] at the same time. Then type quit and press enter.
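If you have the redis-cli client installed on your host machine, you can also confirm that redis itself responds through the exposed port; PING should return PONG:

redis-cli -h $(minikube ip) -p 6379 ping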

Summary

These are the steps required to set up a redis service in a minikube cluster so that it can receive outside traffic on port 6379. The same steps can be repeated for UDP.

Further Readings:

  1. Ingress Setup for TCP/UDP