Cluster stuck in “Paused” state causing node registration failures

Environment
  • Rancher v2.6+
  • A Rancher-provisioned RKE2 or K3s cluster

Situation

In some scenarios, a cluster may enter a paused state due to failed or interrupted cluster operations. While the cluster is paused, newly added nodes are unable to complete registration.

During this time, the node installation script may repeatedly log the following error:

[ERROR] 000 received while downloading Rancher connection information.
    Sleeping for 5 seconds and trying again

As a result, nodes remain stuck during provisioning and the cluster does not progress.

Resolution

Unpause the Cluster API (CAPI) cluster by setting .spec.paused to false on the clusters.cluster.x-k8s.io object that corresponds to the affected cluster.

Identify the CAPI cluster name:

<span style="background-color:#efefef"><code>kubectl get clusters.cluster.x-k8s.io -n fleet-default
</code></span>
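When several clusters share the namespace, it can help to print the pause flag next to each name. The following is a minimal sketch, not an official Rancher procedure: the function name `list_capi_paused` is hypothetical, and it assumes the clusters live in `fleet-default`, as in the command above. It relies only on kubectl's standard `custom-columns` output format.

```shell
# List CAPI clusters in a namespace together with their .spec.paused value.
# Assumption: the namespace defaults to "fleet-default", as used elsewhere in this article.
# The function name is illustrative only.
list_capi_paused() {
  ns="${1:-fleet-default}"
  kubectl get clusters.cluster.x-k8s.io -n "$ns" \
    -o custom-columns='NAME:.metadata.name,PAUSED:.spec.paused'
}
```

Calling `list_capi_paused` with no arguments queries `fleet-default`. A paused cluster shows `true` in the PAUSED column; a cluster where the field is not set shows `<none>`.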

Edit the affected cluster:

<span style="background-color:#efefef"><code>kubectl edit clusters.cluster.x-k8s.io <cluster-name> -n fleet-default
</code></span>

In the cluster spec, locate the field:

<span style="background-color:#efefef"><code>spec:
  paused: true
</code></span>

Change it to:

<span style="background-color:#efefef"><code>spec:
  paused: false
</code></span>

Save and exit the editor.
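The same change can also be made without opening an editor. The sketch below is one possible non-interactive equivalent of the edit described above, under the same assumptions (namespace `fleet-default`, cluster name taken from the listing step). The function name `unpause_capi_cluster` is hypothetical. It uses a standard `kubectl patch --type merge` call and then reads the field back so the result can be confirmed.

```shell
# Set .spec.paused to false on one CAPI cluster with a merge patch,
# then print the resulting value of .spec.paused.
# Assumption: the namespace defaults to "fleet-default"; the function name is illustrative only.
unpause_capi_cluster() {
  name="$1"
  ns="${2:-fleet-default}"
  if [ -z "$name" ]; then
    echo "usage: unpause_capi_cluster <cluster-name> [namespace]" >&2
    return 1
  fi
  # A merge patch changes only the listed field and leaves the rest of the spec untouched.
  kubectl patch clusters.cluster.x-k8s.io "$name" -n "$ns" \
    --type merge -p '{"spec":{"paused":false}}' || return 1
  # Read the field back to confirm the change was stored.
  kubectl get clusters.cluster.x-k8s.io "$name" -n "$ns" \
    -o jsonpath='{.spec.paused}{"\n"}'
}
```

Run it as `unpause_capi_cluster <cluster-name>`. If the final line printed is `false`, the cluster object is no longer paused.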

Cause

Rancher intentionally pauses the CAPI cluster during snapshot restore, certificate rotation, and encryption key rotation operations. Pausing prevents Cluster API (CAPI) from reconciling resources while the cluster is in a potentially unsafe state.

If one of these operations fails or is interrupted, the cluster may remain paused; it is not unpaused automatically. This is by design, to avoid further reconciliation actions that could affect cluster stability.
