Embedded etcd
This feature is only available for the following:
- Host Nodes
- Private Nodes
This is an Enterprise feature. See our pricing plans or contact our sales team for more information.
An issue exists when upgrading etcd from a version between 3.5.1 and 3.5.19 (that is, earlier than 3.5.20) to version 3.6. This upgrade path can fail and break the virtual cluster. etcd version 3.5.20 includes a fix that migrates membership data to the v3 data store, which prevents the issue when upgrading to version 3.6.
To avoid this issue, vCluster does not upgrade etcd to version 3.6 until vCluster version 0.29.0.
Any vCluster running a version earlier than 0.24.2 must first be upgraded to a version between 0.24.2 and 0.28.x before upgrading to version 0.29.0.
For more information, see the official etcd documentation.
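For example, a two-step upgrade with Helm might look like the following sketch, using the release name and namespace conventions from the recovery examples later on this page. The chart versions shown are illustrative; substitute the releases you are actually targeting.
# Step 1: upgrade to a release between 0.24.2 and 0.28.x (version shown is illustrative)
helm upgrade my-vcluster vcluster --repo https://charts.loft.sh --namespace vcluster-my-team --reuse-values --version 0.28.1
# Step 2: upgrade to 0.29.0 or later
helm upgrade my-vcluster vcluster --repo https://charts.loft.sh --namespace vcluster-my-team --reuse-values --version 0.29.0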
When using this backing store option, etcd is deployed as part of the vCluster control plane pod to reduce the overall footprint.
controlPlane:
  backingStore:
    etcd:
      embedded:
        enabled: true
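One way to apply this configuration, assuming it is saved as vcluster.yaml and the virtual cluster is deployed with Helm under the same release name and namespace used in the examples below:
helm upgrade --install my-vcluster vcluster --repo https://charts.loft.sh --namespace vcluster-my-team --create-namespace -f vcluster.yaml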
How embedded etcd works
Embedded etcd starts the etcd binary with the Kubernetes control plane inside the vCluster pod. This enables vCluster to run in high availability (HA) scenarios without requiring a separate StatefulSet or Deployment.
vCluster fully manages embedded etcd and provides these capabilities:
- Dynamic scaling: Scales the etcd cluster up or down based on vCluster replica count.
- Automatic recovery: Recovers etcd in failure scenarios such as corrupted members.
- Seamless migration: Migrates from SQLite or deployed etcd to embedded etcd automatically (see the configuration sketch after this list).
- Simplified deployment: Requires no additional StatefulSets or Deployments.
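For example, to move a virtual cluster from a separately deployed etcd to embedded etcd, a minimal configuration sketch combines enabled with the migrateFromDeployedEtcd field documented in the config reference at the end of this page:
controlPlane:
  backingStore:
    etcd:
      embedded:
        enabled: true
        migrateFromDeployedEtcd: true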
Scaling behavior
vCluster dynamically builds the etcd cluster based on the number of desired replicas. For example, when you scale vCluster from 1 to 3 replicas, vCluster automatically adds the new replicas as members to the existing single-member cluster. Similarly, vCluster removes etcd members when you scale down the cluster.
When scaling down breaks quorum (such as scaling from 3 to 1 replicas), vCluster rebuilds the etcd cluster without data loss or interruption. This enables dynamic scaling up and down of vCluster.
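For example, assuming a vCluster named my-vcluster deployed as a StatefulSet in the vcluster-my-team namespace (the same names used in the recovery procedures below), scaling from 1 to 3 replicas is a single command, and vCluster joins the two new pods to the existing etcd member:
kubectl scale statefulset my-vcluster --replicas=3 -n vcluster-my-team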
Disaster recovery
When embedded etcd encounters failures, vCluster provides both automatic and manual recovery options to restore cluster capabilities.
Automatic recovery
vCluster recovers the etcd cluster automatically in most failure scenarios by removing and re-adding the failing member. Automatic recovery occurs in these cases:
- Unresponsive member: Etcd member is unresponsive for more than 2 minutes.
- Detected issues: Corruption or another alarm is detected on the etcd member.
vCluster attempts to recover only a single replica at a time. If recovering an etcd member results in quorum loss, vCluster does not recover the member automatically.
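To observe recovery activity on a replica, you can follow its logs; assuming the names used elsewhere on this page, something like the following works (the exact log wording varies between versions):
kubectl logs -f my-vcluster-0 -n vcluster-my-team | grep -i etcd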
Manual recovery
Recover a single replica
When a single etcd replica fails, vCluster can recover the replica automatically in most cases, including:
- Replica database corruption
- Replica database deletion
- Replica PersistentVolumeClaim (PVC) deletion
- Replica removal from the etcd cluster using etcdctl member remove ID
- Replica stuck as a learner
If vCluster cannot recover the single replica automatically, wait at least 10 minutes before deleting the replica pod and its PVC. Deleting them prompts vCluster to rejoin the member to the etcd cluster.
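For example, assuming the failing replica is my-vcluster-1 in the vcluster-my-team namespace and the default PVC naming used later on this page, the deletion would look like:
kubectl delete pod my-vcluster-1 -n vcluster-my-team
kubectl delete pvc data-my-vcluster-1 -n vcluster-my-team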
Recover the entire cluster
In rare cases, the entire etcd cluster requires manual recovery. This occurs when the majority of etcd member replicas become corrupted or deleted simultaneously (such as 2 of 3, 3 of 5, or 4 of 7 replicas). In this scenario, etcd fails to start and vCluster cannot recover automatically.
Normal pod restarts or terminations do not require manual recovery. These events trigger automatic leader election within the etcd cluster.
Recovery procedures depend on whether the first replica (the pod ending with -0) is among the failing replicas.
The recovery procedure for the first replica also depends on your StatefulSet's podManagementPolicy configuration (Parallel or OrderedReady). See the first replica recovery section for details on migrating between policies if needed.
If using VirtualClusterInstance (platform), the vCluster StatefulSet runs in a different namespace than the VirtualClusterInstance itself. Find the StatefulSet namespace with:
kubectl get virtualclusterinstance <instance-name> -n <vci-namespace> -o jsonpath='{.spec.clusterRef.namespace}'
For example, if your VirtualClusterInstance is named my-vcluster in the p-default namespace, the StatefulSet might be in vcluster-my-vcluster-p-default.
If using Helm, the namespace is what you specified during installation (e.g., vcluster-my-team).
Use the following procedures when some replicas are still functioning:
- First replica is not failing: scale the StatefulSet down to the healthy first replica and back up, as in the steps immediately below.
- First replica is failing: delete the first replica's pod and PVC so it rejoins the cluster, as described after the backup guidance below.
Scale the StatefulSet to one replica:
kubectl scale statefulset my-vcluster --replicas=1 -n vcluster-my-team
Verify only one pod is running:
kubectl get pods -l app=vcluster -n vcluster-my-team
Monitor the rebuild process:
kubectl logs -f my-vcluster-0 -n vcluster-my-team
Watch for log messages indicating etcd is ready and the cluster is in good condition.
Scale back up to your target replica count:
kubectl scale statefulset my-vcluster --replicas=3 -n vcluster-my-team
Verify all replicas are running:
kubectl get pods -l app=vcluster -n vcluster-my-team
kubectl logs my-vcluster-0 -n vcluster-my-team | grep "cluster is ready"
Before attempting any recovery procedure, create a backup of your virtual cluster using vcluster snapshot create --include-volumes. This ensures both the virtual cluster's etcd data and persistent volumes are backed up.
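For example, following the snapshot syntax used later on this page and the sample names from these procedures, a snapshot written to S3 might look like:
vcluster snapshot create my-vcluster --include-volumes s3://my-bucket/backup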
If the virtual cluster's etcd is in a bad state and the snapshot command fails, you can still back up from the host cluster (which has its own functioning etcd). Use your preferred backup solution (e.g., Velero, Kasten, or cloud-native backup tools) to back up the host cluster namespace containing the vCluster resources. Ensure the backup includes:
- All Kubernetes resources in the vCluster namespace (StatefulSet, Services, etc.)
- PersistentVolumeClaims and their associated volume data (contains the virtual cluster's etcd data)
- Secrets and ConfigMaps
When restored, the vCluster pods will restart and the virtual cluster will be recreated from the backed-up etcd data.
If using namespace syncing, back up all synced namespaces on the host cluster as well.
The recovery procedure depends on your StatefulSet podManagementPolicy configuration. vCluster version 0.20 and later use Parallel by default. Earlier versions used OrderedReady.
If more than one pod is down with podManagementPolicy: OrderedReady, you must first migrate to Parallel before attempting recovery.
Check your configuration:
kubectl get statefulset my-vcluster -n vcluster-my-team -o jsonpath='{.spec.podManagementPolicy}'
- Parallel (default): follow the steps immediately below.
- OrderedReady (legacy): migrate to Parallel first, as described further below.
First, identify the PVC for replica-0:
kubectl get pvc -l app=vcluster -n vcluster-my-team
The PVC name typically follows the pattern data-<vcluster-name>-0 but may vary if customized in your configuration. Note the exact name from the output above, then delete the corrupted pod and its PVC:
kubectl delete pod my-vcluster-0 -n vcluster-my-team
kubectl delete pvc data-my-vcluster-0 -n vcluster-my-team
The pod restarts with a new empty PVC. The initial attempts fail because the new member tries to join the existing etcd cluster but lacks the required data. After 1-3 pod restarts, vCluster's automatic recovery detects the empty member and properly adds it as a new learner, allowing it to sync data from healthy members and join the cluster.
Monitor the recovery process:
kubectl get pods -l app=vcluster -n vcluster-my-team -w
Check the logs to verify the pod rejoins successfully:
kubectl logs -f my-vcluster-0 -n vcluster-my-team
If more than one pod is down with podManagementPolicy: OrderedReady, migrate to Parallel first before attempting recovery.
Check that the StatefulSet retains PVCs on deletion:
kubectl get statefulset my-vcluster -n vcluster-my-team -o jsonpath='{.spec.persistentVolumeClaimRetentionPolicy}'
The policy should be Retain. This is the default but can be overridden by controlPlane.statefulSet.persistence.volumeClaim.retentionPolicy in your configuration.
Delete the StatefulSet without deleting the pods:
kubectl delete statefulset my-vcluster -n vcluster-my-team --cascade=orphan
Update your virtual cluster configuration to use the Parallel pod management policy.
If using a VirtualClusterInstance, edit the instance and update the podManagementPolicy:
kubectl edit virtualclusterinstance my-vcluster -n vcluster-my-team
Then add or update this section in the spec:
spec:
  template:
    helmRelease:
      values: |
        controlPlane:
          statefulSet:
            scheduling:
              podManagementPolicy: Parallel
If using Helm, update your values.yaml to set the pod management policy:
controlPlane:
  statefulSet:
    scheduling:
      podManagementPolicy: Parallel
Then apply the update:
helm upgrade my-vcluster vcluster --repo https://charts.loft.sh --namespace vcluster-my-team --reuse-values -f values.yaml
The StatefulSet is recreated with the Parallel policy and pods pick up the existing PVCs.
Now follow the same procedure as for Parallel mode.
First, identify the PVC for replica-0:
kubectl get pvc -l app=vcluster -n vcluster-my-team
The PVC name typically follows the pattern data-<vcluster-name>-0 but may vary if customized in your configuration. Note the exact name from the output above, then delete the corrupted pod and its PVC:
kubectl delete pod my-vcluster-0 -n vcluster-my-team
kubectl delete pvc data-my-vcluster-0 -n vcluster-my-team
The pod restarts with a new empty PVC. The initial attempts fail because the new member tries to join the existing etcd cluster but lacks the required data. After 1-3 pod restarts, vCluster's automatic recovery detects the empty member and properly adds it as a new learner, allowing it to sync data from healthy members and join the cluster.
Never clone PVCs from other replicas. Cloning PVCs causes etcd member ID conflicts and results in data loss.
Complete data loss recovery
This recovery method results in data loss up to the last backup point. Only proceed if you have verified that all etcd replicas are corrupted and no working replicas remain.
When the majority of etcd member replicas become corrupted or deleted simultaneously, the entire cluster requires recovery from backup.
Before starting recovery, ensure you have:
- Created a snapshot using vcluster snapshot create <vcluster-name> --include-volumes <storage-location>
- The snapshot location URL (for example, s3://my-bucket/backup or oci://registry/repo:tag)
- Access to the host cluster namespace where the vCluster is deployed
For detailed snapshot creation instructions, see Create snapshots.
Verify all PVCs are corrupted or inaccessible:
kubectl get pvc -l app=vcluster -n vcluster-my-team
kubectl describe pvc data-my-vcluster-0 data-my-vcluster-1 data-my-vcluster-2 -n vcluster-my-team
Stop all vCluster instances before beginning recovery:
kubectl scale statefulset my-vcluster --replicas=0 -n vcluster-my-team
Verify all pods have terminated:
kubectl get pods -l app=vcluster -n vcluster-my-team
PVC deletion timing
After scaling down, wait a few seconds to ensure pods have fully terminated before deleting PVCs. If a pod restarts immediately after PVC deletion, the PVC may get stuck in a "Terminating" state. If this happens, delete the pod again to allow the PVC deletion to complete.
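If a PVC does get stuck in Terminating, a sketch of the unblock step, using the sample names from this page (adjust the pod name to whichever replica restarted):
kubectl get pvc -l app=vcluster -n vcluster-my-team
kubectl delete pod my-vcluster-0 -n vcluster-my-team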
Delete all corrupted PVCs:
kubectl delete pvc data-my-vcluster-0 data-my-vcluster-1 data-my-vcluster-2 -n vcluster-my-team
Verify PVCs are deleted:
kubectl get pvc -l app=vcluster -n vcluster-my-team
Expected output:
No resources found
Why scale up before restore?
The vCluster CLI requires an accessible vCluster instance to execute the restore command. Scaling up creates a new, empty vCluster that the CLI can connect to. The vcluster restore command will then scale it back down automatically, restore the etcd data from the snapshot, and restart the vCluster with restored data.
Scale up to the desired number of replicas:
kubectl scale statefulset my-vcluster --replicas=3 -n vcluster-my-team
Wait for pods to be running:
kubectl get pods -l app=vcluster -n vcluster-my-team
Expected output showing all replicas running:
NAME READY STATUS RESTARTS AGE
my-vcluster-0 1/1 Running 0 45s
my-vcluster-1 1/1 Running 0 43s
my-vcluster-2 1/1 Running 0 41s
Use the vCluster CLI to restore from your snapshot. The restore process will:
- Pause the vCluster (scale down to 0)
- Delete the current PVCs
- Start a snapshot pod to restore etcd data
- Restore PVCs from volume snapshots
- Resume the vCluster (scale back up)
vcluster restore my-vcluster s3://my-bucket/backup -n vcluster-my-team
Expected output:
16:16:38 info Pausing vCluster my-vcluster
16:16:38 info Scale down statefulSet vcluster-my-team/my-vcluster...
16:16:39 info Deleting vCluster pvc vcluster-my-team/data-my-vcluster-0
16:16:39 info Deleting vCluster pvc vcluster-my-team/data-my-vcluster-1
16:16:39 info Deleting vCluster pvc vcluster-my-team/data-my-vcluster-2
16:16:39 info Starting snapshot pod for vCluster vcluster-my-team/my-vcluster...
...
Successfully restored snapshot
16:16:42 info Resuming vCluster my-vcluster
Authentication for remote storage
If using S3 or OCI registry, ensure you have the appropriate credentials configured:
- S3: Use AWS CLI credentials or pass credentials in the URL
- OCI: Use Docker login or pass credentials in the URL
See Create snapshots for authentication details.
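As a minimal sketch (the exact credential mechanisms are covered in Create snapshots), assuming AWS CLI credentials for S3 and a registry that accepts Docker logins for OCI:
aws configure        # configure AWS CLI credentials for S3 access
docker login ghcr.io # log in to the OCI registry (ghcr.io is an example)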
Connect to the vCluster and verify your workloads are restored:
vcluster connect my-vcluster -n vcluster-my-team
Check that your resources are present:
kubectl get pods -A
kubectl get pvc -A
If everything looks correct, disconnect:
vcluster disconnect
Config reference
embedded required object
Embedded defines to use embedded etcd as a storage backend for the virtual cluster.
enabled required boolean false
Enabled defines if the embedded etcd should be used.
migrateFromDeployedEtcd required boolean false
MigrateFromDeployedEtcd signals that vCluster should migrate from the deployed external etcd to embedded etcd.
snapshotCount required integer
SnapshotCount defines the number of snapshots to keep for the embedded etcd. Defaults to 10000 if less than 1.
extraArgs required string[] []
ExtraArgs are additional arguments to pass to the embedded etcd.