Overview: What is a Volume in Kubernetes?

In Kubernetes, volumes provide an abstraction that separates how storage is physically implemented from how applications consume it. A container's own filesystem is ephemeral: any data a container writes to it is lost when the container crashes or restarts. A Volume provides a directory, backed by some storage medium, that containers in a POD can mount to store and share data.

Kubernetes supports two major types of Volumes:

1. Ephemeral Volumes – These suit applications that need storage but do not need the data to survive a restart. Ephemeral volumes last only as long as their POD and are deleted when the POD is removed. A typical use is a caching service that is limited by memory size and can move infrequently used data onto storage that is slower than memory with little impact on overall performance. Kubernetes offers several kinds of ephemeral volumes for different uses, including:

a. emptyDir

b. Secrets, ConfigMaps and the downwardAPI

c. CSI Ephemeral Volumes

d. Generic Ephemeral Volumes

2. Persistent Volumes – A PersistentVolume (PV) is an API object that represents a piece of physical storage for use by PODs, with a lifecycle independent of any POD that uses it. PODs attach to a PV to store data that remains available after a container restarts or the POD itself is deleted.
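To make the ephemeral case concrete, here is a minimal sketch of a POD that mounts an emptyDir volume. The POD name, image and mount path are hypothetical; the data under /cache survives container crashes but is deleted together with the POD.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-demo              # hypothetical POD name
spec:
  containers:
    - name: app
      image: busybox:1.36       # illustrative image
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /cache     # hypothetical mount path
  volumes:
    - name: scratch
      emptyDir:
        sizeLimit: 500Mi        # optional cap on the volume's size
```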

In this article, we explore persistent volumes in detail and the problems they solve within the Kubernetes ecosystem.

Volume Plugins

Kubernetes adopted the Container Storage Interface (CSI), a standard for exposing block and file storage systems to container orchestrators, to standardize how third-party storage plugins are written. Kubernetes uses these plugins to expose storage to the kubelets running on the cluster's nodes, which lets Kubernetes abstractions provision storage for PODs and containers. Because CSI plugins are deployed separately from Kubernetes, vendors can add support for their storage systems without modifying core Kubernetes code or binaries.
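To show where a CSI plugin plugs in, here is a sketch of a statically defined PV that delegates to a CSI driver through the csi volume source. The driver name, volume handle and size are hypothetical placeholders; a real driver documents its own registered name and the format of its volume handles.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: csi-example-pv                 # hypothetical name
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: example.csi.vendor.com     # hypothetical driver name registered by the plugin
    volumeHandle: vol-0123456789       # hypothetical ID of an existing volume on the backend
    fsType: ext4                       # filesystem type to mount
```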

Some of the most popular CSI Plugins for Kubernetes include:

1. AWS Elastic Block Store

2. Azure Disk

3. BeeGFS

4. CephFS

5. Dell EMC PowerMax

6. GCE Persistent Disk

7. Google Cloud Filestore

8. GlusterFS

9. Huawei Storage CSI

10. HyperV CSI

11. IBM Block Storage

12. OpenEBS

13. Portworx

14. Pure Storage CSI

The complete list of CSI drivers available for Kubernetes is maintained in the Kubernetes CSI project documentation.

Persistent Storage in Kubernetes

Once a CSI plugin has been set up and is running in Kubernetes, resources and users can consume volumes using the Kubernetes Storage API objects: Persistent Volumes, Persistent Volume Claims and Storage Classes. This section explores these API objects and their role in providing persistent storage for containers in Kubernetes.

Persistent Volumes

A Persistent Volume is a piece of storage available to the cluster. The PV captures the implementation details of the underlying file or block storage system, whether that is NFS, iSCSI (SCSI over IP networks) or a storage system offered by a specific vendor or cloud provider. A PV has a lifecycle independent of any POD that consumes it, so the data it holds remains available to containers throughout the application lifecycle.

A PV is a Kubernetes API object with configurations similar to:

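apiVersion: v1
kind: PersistentVolume
metadata:
  name: darwin-pv
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi          # illustrative size
  accessModes:
    - ReadWriteOnce        # mountable read-write by a single node
  hostPath:                # a directory on the node; suitable for single-node testing only
    path: "/tmp/data01"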
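A hostPath PV is only useful on a single node. To show how a PV captures the details of a network protocol such as NFS, here is a sketch of an NFS-backed PV; the PV name, server and export path are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: darwin-nfs-pv           # hypothetical name
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany             # NFS lets many nodes mount the volume read-write
  nfs:
    server: nfs.example.com     # hypothetical NFS server
    path: /exports/darwin       # hypothetical export path
```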

Persistent Volume Claims

To request storage, a user creates a PersistentVolumeClaim (PVC), a Kubernetes object that specifies storage requirements such as access modes and size. A PVC is created by applying a YAML manifest to the cluster with a specification similar to:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: darwin-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi                  # minimum capacity requested
  storageClassName: darwin-dev-disk   # StorageClass used for dynamic provisioning

A POD uses a PVC by referencing the claim in the volumes section of its spec:

volumes:
  - name: app-data
    persistentVolumeClaim:
      claimName: darwin-pvc

Kubernetes binds each PVC to a single PV whose capacity, access modes and storage class satisfy the claim; a POD that references the claim then mounts that bound volume.
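Putting the pieces together, a complete POD manifest that mounts the darwin-pvc claim might look like the following. The POD name, image and mount path are illustrative assumptions; only the claim name comes from the PVC example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: darwin-app              # hypothetical POD name
spec:
  containers:
    - name: app
      image: nginx:1.25         # illustrative image
      volumeMounts:
        - name: app-data        # must match a volume name below
          mountPath: /usr/share/nginx/html
  volumes:
    - name: app-data
      persistentVolumeClaim:
        claimName: darwin-pvc   # the PVC defined earlier
```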

Static vs. Dynamic Provisioning

PVs can be provisioned either statically or dynamically. In static provisioning, a cluster administrator first creates and configures the storage on the backing system and then creates PV objects that describe it. Claims are bound to these pre-created PVs, each of which points to a specific portion of the storage.
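A minimal sketch of static provisioning, assuming the administrator has already created the backing directory: the PV and the PVC set storageClassName to the same value (manual here, a hypothetical class name that needs no StorageClass object), so the claim binds to the pre-created PV rather than triggering dynamic provisioning. Names, sizes and the path are illustrative.

```yaml
# Pre-created PV describing existing storage on the node
apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-pv
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data             # hypothetical directory created by the administrator
---
# Claim that binds to the PV above: same class, compatible size and access mode
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```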

In dynamic provisioning, a StorageClass object describes a class of storage: which provisioner (plugin) to use and the parameters to pass to it. When a PVC requests that class and no existing PV matches, the provisioner creates a new volume on the physical storage system along with a PV that matches the claim's specification. Storage classes thus enable the automatic, on-demand allocation of PVs.

The configuration file of a storage class object would look similar to:

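apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: darwin-dev-disk
provisioner: kubernetes.io/glusterfs       # legacy in-tree GlusterFS provisioner
parameters:
  resturl: "http://192.168.10.100:8080"    # Gluster REST (Heketi) service endpoint
  restuser: ""                             # REST user allowed to create volumes
  secretNamespace: ""                      # namespace of the Secret holding the user's key
  secretName: ""                           # name of that Secret
allowVolumeExpansion: true                 # PVCs of this class can be resized later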
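The in-tree kubernetes.io/glusterfs provisioner was removed from Kubernetes in v1.26, so current clusters typically name a CSI driver as the provisioner. As a hedged example, a StorageClass for the AWS EBS CSI driver could look like this; the class name is hypothetical, and the supported parameters depend on the driver and version you install.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: darwin-ebs-gp3                    # hypothetical class name
provisioner: ebs.csi.aws.com              # AWS EBS CSI driver
parameters:
  type: gp3                               # EBS volume type
reclaimPolicy: Delete                     # delete the backing volume once its claim is deleted
volumeBindingMode: WaitForFirstConsumer   # provision in the zone where the POD is scheduled
allowVolumeExpansion: true
```

A PVC that sets storageClassName: darwin-ebs-gp3 would then cause the driver to create a new EBS volume and a matching PV automatically.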

The Lifecycle of PVs and PVCs

PVCs represent requests for a PV resource. The interaction between the two objects follows this lifecycle:

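1. Provisioning – Storage is made available to the cluster as PVs, either statically by an administrator or dynamically through a StorageClass.

2. Binding – The control plane matches a PVC with a suitable PV and binds them together. The binding is exclusive: a PV is bound to at most one PVC.

3. Using – PODs use the claim as a volume, and the cluster mounts the bound PV for them.

4. Reclaiming – When a user no longer needs a volume, they delete the PVC, and the PV's reclaim policy determines what happens to the released storage. The reclaim policies are:

a. Retain – The PV and its data are kept after the claim is deleted, and an administrator reclaims the storage manually.

b. Delete – The PV and the associated storage asset in the backing infrastructure are deleted.

c. Recycle – The volume is scrubbed and made available for a new claim. This policy is deprecated in favour of dynamic provisioning.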
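The reclaim policy is set on each PV through the persistentVolumeReclaimPolicy field; dynamically provisioned PVs inherit it from their StorageClass's reclaimPolicy. Here is a sketch of a PV that keeps its data after its claim is deleted; the name, size and path are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: retained-pv                      # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain  # keep the volume and its data after the PVC is deleted
  hostPath:
    path: /mnt/retained                  # hypothetical path, single-node testing only
```

With Retain, deleting the claim leaves the PV in the Released phase, and it is not bound to another claim until an administrator cleans up or reuses the storage.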

A PV can also be reserved for a specific PVC through pre-binding: the PV's claimRef names the claim, and the PVC can name the PV in its volumeName field. The claim then always binds to that volume, regardless of whether any POD is currently running an application that uses it.
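Pre-binding can be sketched as a PV whose claimRef points at the claim it is reserved for, together with a PVC whose volumeName points back at that PV. Both use an empty storageClassName so that no dynamic provisioning is triggered. The names, namespace, size and path are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: reserved-pv
spec:
  storageClassName: ""
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  claimRef:                       # reserve this PV for one specific claim
    namespace: default
    name: reserved-pvc
  hostPath:
    path: /mnt/reserved           # hypothetical path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: reserved-pvc
  namespace: default
spec:
  storageClassName: ""            # empty class: no dynamic provisioning
  volumeName: reserved-pv         # bind to the reserved PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```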

PVs and PVCs in Monolithic Storage Systems

In traditional storage systems, Kubernetes interfaces with monolithic storage software that virtualizes and aggregates a number of storage devices, such as SAN arrays, disks on bare-metal servers or cloud-based block storage. This software integrates with Kubernetes through CSI plugins, and access to it is managed with PVs, PVCs and Storage Classes.

[Figure: Traditional Shared Storage Architecture]

Container Attached Storage (CAS) and Persistent Volumes

CAS lets organizations use the flexibility and scalability of cloud-native platforms to extend volume abstractions. In CAS, the storage solution itself is deployed as containerized microservices managed by an orchestrator such as Kubernetes. The data plane of a CAS cluster consists of replica PODs running the containers that provision volumes and serve storage access, while the control plane holds the policies, storage controllers and data plane configuration.

[Figure: Persistent Volume]

OpenEBS LocalPV Volumes

OpenEBS enables dynamic provisioning of PVs for Local Volumes in Kubernetes. A Local Volume is a piece of cluster storage that is accessible from only a single node, whether that node is a physical machine or a virtual machine (VM). Local Volumes expose local directories, partitions and disks to the cluster, and suit applications that can tolerate the volume being unavailable when its node is unhealthy. This makes the OpenEBS LocalPV plugin well suited to local storage that needs dynamic management and monitoring, and to high-performance applications that handle replication and data security themselves. Some use cases for local volumes are:

1. Replicated Databases

2. Edge workloads running on single-node clusters

3. Stateful Workloads with their own HA configuration
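As a hedged sketch based on the OpenEBS hostpath LocalPV StorageClass format (check the OpenEBS documentation for the release you run), a custom class and a claim that uses it might look like this. The class name, claim name, base path and size are assumptions.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-hostpath                   # hypothetical class name
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
      - name: BasePath
        value: /var/local-hostpath       # hypothetical directory on each node
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer  # create the volume on the node where the POD lands
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-hostpath-pvc               # hypothetical claim name
spec:
  storageClassName: local-hostpath
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```

Delaying binding until a POD is scheduled matters for local volumes, because the volume can only be created on the node where the POD will run.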

Summary

Persistent Volumes expose physical storage implementations to Kubernetes clusters so PODs can store and share data. With PVs, data generated and stored by immutable containers can be persisted for use throughout an application’s lifecycle. This article has explored the concepts needed to understand persistent storage for Kubernetes, focusing mainly on PVs and PVCs. Container Attached Storage (CAS) extends the functionality of volumes by relying on microservices and container orchestration. CAS allows for the creation of flexible, granular, and highly available storage infrastructure that is cloud-native.

The OpenEBS LocalPV volume allows for the creation of dynamic PVs for Local Volumes. To learn more about OpenEBS features, you can reach out on the OpenEBS Slack channel and connect with its team of developers and administrators.

This article was originally published at https://blog.mayadata.io/understanding-persistent-volumes-and-pvcs-in-kubernetes and is republished with MayaData's authorisation.