Kubernetes is a useful tool for deploying and managing containerized applications. Applications deployed on Kubernetes are often stateless, meaning that they do not rely on any previously saved data in the cluster to function properly. However, there are scenarios where you need to deploy applications that collect and store data that must always be available, even after you terminate the pod. This is where Kubernetes persistent volumes come in.
Kubernetes persistent volumes provide persistent storage for your containerized applications: even after restarting, the application pod still has access to the previously stored data. Their most important function is providing storage that outlives any single pod. Use cases include storage for database applications, application logs, and user-generated files in content management applications. In this article, you will learn about Kubernetes persistent volumes, including some relevant use cases and how to set them up and use them effectively.
What are Persistent Volumes?
A persistent volume (PV) is a piece of storage provided by an administrator in a Kubernetes cluster. When a developer needs persistent storage for an application in the cluster, they request that storage by creating a persistent volume claim (PVC) and then mounting the claimed volume to a path in the pod. An administrator can create multiple PVs with different capacities and configurations. It is up to the developer to provide a PVC for storage, and Kubernetes then binds the PVC to a PV that meets its requirements (such as size and access mode). If no PV matches the PVC, the provisioner named in the PVC’s <terminal inline>StorageClass<terminal inline> dynamically creates a PV and binds it to the PVC. We will go into more detail about <terminal inline>StorageClass<terminal inline> later in this article.
It is important to note that PVs are cluster-scoped resources: they do not belong to any namespace. PVCs, on the other hand, are namespaced, so a pod reaches a PV through a PVC created in the pod’s own namespace, and a PVC in any namespace can bind to an available PV.
Persistent Volumes Use Cases
The common use cases for persistent volumes are providing storage for database applications and storage beyond the regular lifecycle of a pod.
Providing Storage for Database Applications
Database applications require persistent storage that lasts beyond the lifecycle of the pods they run in. A database application can collect and store millions of records from different sources while running. It becomes a huge problem if that data no longer exists after you restart or stop the database application. Therefore, to host a database application in a Kubernetes cluster, you need to configure its pod to use a PV so that the data is still available, even when the pod is no longer active.
Storage Beyond Regular Pod Lifecycle
Apart from database applications, several other types of applications require long-term storage. For example, applications that store error logs for future analysis require the logs to be available for an extended period, even if you terminate or replace the pod.
Understanding Persistent Volumes (PV)
Below is an example of a <terminal inline>PersistentVolume<terminal inline> YAML file used for creating persistent volume storage:
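As a minimal sketch, a manifest of this kind could be written to a file from the shell as follows. The name <terminal inline>example-pv<terminal inline>, the class name <terminal inline>manual<terminal inline>, and the path <terminal inline>/mnt/data<terminal inline> are illustrative assumptions rather than values from a real cluster.

```shell
# Write a sample PersistentVolume manifest to pv.yaml.
# All names and paths below are hypothetical and chosen for illustration.
cat > pv.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv            # hypothetical PV name
spec:
  capacity:
    storage: 10Gi             # size of the volume
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual    # hypothetical class name
  hostPath:
    path: /mnt/data           # hypothetical directory on the node
EOF

# With a running cluster, the PV could then be created with:
#   kubectl apply -f pv.yaml
```

Note that <terminal inline>hostPath<terminal inline> storage lives on a single node, so it is generally only suitable for single-node test clusters.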
You create a PV declaratively, by describing it in a manifest file and applying that file. <terminal inline>kubectl<terminal inline> doesn’t provide an imperative subcommand for creating a PV.
Now, let’s look at the specifications of this YAML file.
When creating a PV, you indicate its storage size in the <terminal inline>capacity<terminal inline> attribute. In the example above, you are creating a PV of 10 gibibytes.
There are currently four access modes for PVs in Kubernetes:
- <terminal inline>ReadWriteOnce<terminal inline>: This allows only a single node to mount the volume in read-write mode. All pods running on that node can read from and write to the volume.
- <terminal inline>ReadWriteMany<terminal inline>: Multiple nodes can read and write to the volume.
- <terminal inline>ReadOnlyMany<terminal inline>: This means that the volume will be in a read-only mode and accessible by multiple nodes.
- <terminal inline>ReadWriteOncePod<terminal inline>: Only a single pod in the entire cluster can read from or write to the volume.
However, not all storage providers support all four access modes, so the available modes will vary. The Kubernetes documentation lists the storage providers and the access modes each one supports.
Type of PV
Next, specify the PV type you want to use. Kubernetes supports several storage types as volume plug-ins, which are listed in the Kubernetes documentation. The YAML file above uses <terminal inline>hostPath<terminal inline>, with a path on the node where it should read and write data.
The <terminal inline>storageClassName<terminal inline> is the name of the storage class the PV belongs to. A PV of a given class can only be bound to a PVC that requests that class. When a developer needs storage, they request it by creating a PVC.
What are StorageClasses?
A StorageClass defines a class of storage offered in the Kubernetes cluster. It abstracts the underlying storage provider.
For instance, suppose you have 100Gi of AWS Elastic Block Store (EBS) storage that you want to make available in the cluster. Think of <terminal inline>StorageClass<terminal inline> as the link that makes that EBS storage available in your cluster. However, you wouldn’t want every application in your cluster to have access to all of it. Therefore, you create PVs to designate pieces of storage for the applications that need it.
StorageClass is also useful for creating PVs dynamically. Suppose a pod requires 10Mi of storage, and you create a PVC for it that references a StorageClass. If Kubernetes cannot find a PV to match your PVC, the provisioner configured in the StorageClass automatically creates a PV for the PVC.
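Most cloud providers’ managed Kubernetes services supply a default storage class when you set up a Kubernetes cluster. To check the default and available storage classes in your cluster, run <terminal inline>kubectl get storageclass<terminal inline>. The default class is marked with (default) next to its name.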
The following is an example of a <terminal inline>StorageClass<terminal inline> manifest file used in creating a Storage Class:
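One possible manifest is sketched below, on the assumption that the AWS EBS CSI driver is installed in the cluster. The class name <terminal inline>example-ebs<terminal inline> and the specific parameter and mount-option values are illustrative assumptions; other provisioners accept different parameters.

```shell
# Write a sample StorageClass manifest to storageclass.yaml.
# Assumes the AWS EBS CSI driver; all values are illustrative.
cat > storageclass.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-ebs           # hypothetical class name
provisioner: ebs.csi.aws.com  # volume plug-in that creates the PVs
parameters:
  type: gp3                   # provisioner-specific setting (EBS volume type)
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - noatime
volumeBindingMode: WaitForFirstConsumer
EOF

# With a running cluster, the class could then be created with:
#   kubectl apply -f storageclass.yaml
```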
We can better understand the above manifest file by explaining some of its keys.
The provisioner determines the volume plug-in the StorageClass uses to provision PVs. Several plug-ins, such as AWS EBS and GCE PD, are available for different storage providers.
The <terminal inline>parameters<terminal inline> key holds configuration options that are passed to the provisioner. Which options are accepted depends on the provisioner.
The <terminal inline>reclaimPolicy<terminal inline> value can be either <terminal inline>Retain<terminal inline> or <terminal inline>Delete<terminal inline>; if you omit it, it defaults to <terminal inline>Delete<terminal inline>. The StorageClass applies it to the PVs it creates dynamically. With <terminal inline>Retain<terminal inline>, the PV is kept after its PVC is deleted so that an administrator can reclaim it manually. The PV cannot be bound to another claim until the administrator has cleaned up the data in it and made it available again. With <terminal inline>Delete<terminal inline>, deleting the PVC removes the PV and the associated external storage asset.
Allow Volume Expansion
When you set <terminal inline>allowVolumeExpansion<terminal inline> to <terminal inline>true<terminal inline>, PVs provisioned by the StorageClass can be expanded, provided the underlying volume type supports it. To expand a PV, edit the PVC and set its requested storage to the new capacity you need.
It is important to note that <terminal inline>allowVolumeExpansion<terminal inline> only enables growing a volume; you cannot use it to shrink one.
The mount options listed under the <terminal inline>mountOptions<terminal inline> key are applied to the PVs that the StorageClass creates dynamically.
Volume Binding Mode
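The <terminal inline>volumeBindingMode<terminal inline> field controls when volume binding and dynamic provisioning of PVs occur. With <terminal inline>Immediate<terminal inline> (the default), both happen as soon as the PVC is created. With <terminal inline>WaitForFirstConsumer<terminal inline>, they are delayed until a pod that uses the PVC is created. Binding itself is covered in the Lifecycle of Persistent Volume and Persistent Volume Claim section of this article, and the Kubernetes documentation describes the binding modes further.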
Persistent Volume Claims
A persistent volume claim (PVC) is a developer’s request for storage in a Kubernetes cluster.
Here is an example of a persistent volume claim manifest file:
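A minimal sketch of such a claim, written to a file from the shell, might look like this. The claim name <terminal inline>example-pvc<terminal inline>, the class name <terminal inline>manual<terminal inline>, and the requested size are illustrative assumptions.

```shell
# Write a sample PersistentVolumeClaim manifest to pvc.yaml.
# Names and sizes are hypothetical and chosen for illustration.
cat > pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc           # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi            # amount of storage the application asks for
  storageClassName: manual    # hypothetical class name; must match the PV's class
EOF

# With a running cluster, the claim could then be created and inspected with:
#   kubectl apply -f pvc.yaml
#   kubectl get pvc example-pvc
```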
Access Modes, Volume Mode and Resources
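<terminal inline>accessModes<terminal inline>, <terminal inline>volumeMode<terminal inline>, and <terminal inline>resources<terminal inline> follow the same conventions as in persistent volumes. Under <terminal inline>resources<terminal inline>, the claim states how much storage it requests.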
Storage Class Name
You use this key to specify the StorageClass you want to use for storage. When you apply the manifest file, Kubernetes binds the claim to a PV that satisfies the request and has the same <terminal inline>storageClassName<terminal inline>. If no matching PV can be found, the StorageClass named in <terminal inline>storageClassName<terminal inline> dynamically provisions one.
Lifecycle of Persistent Volume and Persistent Volume Claim
Now that you have a deeper understanding of persistent volumes and persistent volume claims, we can look at how PVs and PVCs interact.
There are two ways of provisioning PVs in a Kubernetes cluster: static and dynamic.
With static provisioning, cluster administrators create a number of PVs in advance using PV manifest files. Each PV carries the details of the real storage it represents and is available for consumption in the cluster.
With dynamic provisioning, a StorageClass creates a PV on demand for a PVC, because none of the PVs created by the cluster administrator match the PVC’s requirements.
When you create a PVC, you specify the amount of storage your application needs. A control loop in the control plane watches for new PVCs. Once this loop detects a new PVC, it finds a PV that matches the PVC’s request and binds the two. However, if an appropriate PV for the PVC does not exist, the StorageClass dynamically creates the PV for the PVC.
In some scenarios, a PVC can remain indefinitely unbound because a matching PV does not exist or the associated StorageClass cannot create the PV.
You configure your application pod to use the PVC as a volume. Once you deploy the pod, Kubernetes looks for the PV associated with the PVC and mounts it to the pod. Once the claim is bound, the PV belongs to you as long as you need it, and no other developer in the cluster can use it.
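The pod side of this can be sketched as follows. The pod name, image, mount path, and the claim name <terminal inline>example-pvc<terminal inline> are all illustrative assumptions; the key part is the <terminal inline>persistentVolumeClaim<terminal inline> entry under <terminal inline>volumes<terminal inline>, which refers to the claim by name.

```shell
# Write a sample Pod manifest that mounts a PVC to pod.yaml.
# Names, image, and paths are hypothetical and chosen for illustration.
cat > pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data          # must match a volume name below
          mountPath: /data    # where the storage appears inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-pvc   # the PVC must exist in the pod's namespace
EOF

# With a running cluster, the pod could then be created with:
#   kubectl apply -f pod.yaml
```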
Storage Object in Use Protection
The cluster uses this protection to ensure that the system does not delete PVs and PVCs currently being used by a pod. For instance, if you accidentally delete a PVC while a pod is still using it, the system won’t remove it until it’s no longer in use by the pod. Likewise, if a cluster administrator deletes a PV that is already bound to a PVC, the system won’t delete the PV until Kubernetes unbinds the PVC from the PV.
Once you are done with a PV, you can release it by deleting the PVC object. The reclaim policy defined in the PV tells the cluster what to do with the volume after it has been released from its claim. The reclaim policy can have one of the following values: <terminal inline>Retain<terminal inline>, <terminal inline>Recycle<terminal inline> (deprecated), or <terminal inline>Delete<terminal inline>.
Expanding Persistent Volumes Claims
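There are scenarios where your application needs a larger volume, for example when its data is approaching the capacity limit. To increase the storage, edit the PVC object and specify the larger capacity you need. This only works if the PVC’s StorageClass has <terminal inline>allowVolumeExpansion<terminal inline> set to <terminal inline>true<terminal inline>.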
It is important to note that you shouldn’t directly edit the capacity of the PV, but rather the PVC. Furthermore, if you edit both the capacity of the PV and PVC to have the same size, the Kubernetes control plane will assume that the backing volume size has been manually increased and that it doesn’t need to resize it.
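One way to make that edit is with a small patch file, sketched below. The claim name <terminal inline>example-pvc<terminal inline> and the sizes are illustrative assumptions, and the patch only touches the PVC’s requested storage, never the PV.

```shell
# Write a patch that raises the storage request of a PVC.
# The sizes and the claim name used in the comments are hypothetical.
cat > expand-patch.yaml <<'EOF'
spec:
  resources:
    requests:
      storage: 10Gi   # new, larger request (for example, up from 5Gi)
EOF

# With a running cluster, the patch could be applied to the claim with:
#   kubectl patch pvc example-pvc --patch-file expand-patch.yaml
# and the result checked with:
#   kubectl get pvc example-pvc
```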
Bringing It Together
Let’s create a demo Nginx application bringing together all the concepts you have learned in this article. For this demo, you will be using minikube to create a local Kubernetes cluster on your computer.
Minikube comes with a default StorageClass after installation, so you will be using that StorageClass for this demo. To check the StorageClass, run <terminal inline>kubectl get storageclass<terminal inline>.
Minikube uses the virtual node’s filesystem for storage. Now, you will create a dummy file in the virtual node. First connect to the node with <terminal inline>minikube ssh<terminal inline>, then run the following command:
You can then confirm that the file exists by running the command below:
Exit the minikube node by running the <terminal inline>exit<terminal inline> command.
Then create a file called <terminal inline>nginx-pv.yaml<terminal inline> and paste the following:
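A sketch of what this file could contain is shown below, written from the shell for convenience. It assumes that the dummy file was saved as <terminal inline>index.html<terminal inline> under <terminal inline>/mnt/data<terminal inline> on the node and that minikube’s default StorageClass is named <terminal inline>standard<terminal inline>; the object names <terminal inline>nginx-pv<terminal inline>, <terminal inline>nginx-pvc<terminal inline>, and <terminal inline>nginx-pod<terminal inline> are likewise assumptions. Adjust them to match your setup.

```shell
# Write a PV, a PVC, and an Nginx pod to nginx-pv.yaml.
# Paths, names, sizes, and the class name are hypothetical assumptions.
cat > nginx-pv.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nginx-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # assumed name of minikube's default class
  hostPath:
    path: /mnt/data            # assumed directory holding the dummy file
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80
      volumeMounts:
        - name: nginx-storage
          mountPath: /usr/share/nginx/html   # directory Nginx serves files from
  volumes:
    - name: nginx-storage
      persistentVolumeClaim:
        claimName: nginx-pvc
EOF
```

Mounting the claim at <terminal inline>/usr/share/nginx/html<terminal inline> means that Nginx serves whatever is stored in the volume, which makes it easy to see that the pod is reading the data from the node.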
Then run <terminal inline>kubectl apply -f nginx-pv.yaml<terminal inline> to create the objects.
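You can check them by running <terminal inline>kubectl get pv,pvc,pods<terminal inline>.
Now open a shell in the Nginx pod by running <terminal inline>kubectl exec -it POD_NAME -- /bin/bash<terminal inline>, replacing POD_NAME with the name of your Nginx pod.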
Run the following commands:
This sequence of commands will output the content (Hello from ContainIQ testing Persistent Volume) you saved in the minikube virtual node.
Kubernetes persistent volumes provide storage that lets applications deployed in a Kubernetes cluster keep their data beyond the lifetime of any single pod. This article introduced you to the concept of PV and its relationship to PVCs and StorageClasses. Furthermore, you learned about some PV use cases and how to create and configure PVs, PVCs, and StorageClasses.
In summary, making persistent storage available in a cluster involves three steps. First, the cluster administrator creates classes of storage that link to an external or internal storage system using StorageClasses. Next, the administrator makes this storage available to cluster developers by creating PVs, which are chunks of storage with different capacities, access modes, and so on. Finally, in order to use a PV, a developer must claim it by creating a PVC. In scenarios where no available PV matches the PVC, the StorageClass creates a PV dynamically.