
Solution for Run Command Timeout / SSH Timeout Errors When Upgrading a Cluster with TiUP

PingCAP 2023-06-19

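(1) Symptom: During a TiUP cluster upgrade, stopping a TiKV node timed out with ERROR Run Command Timeout. However, logging in to 192.168.1.43 showed that the TiKV instance had in fact already stopped.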

2020-06-29T05:21:18.289+0800 INFO Stopping instance 192.168.1.43

2020-06-29T05:22:58.364+0800 INFO SSHCommand {"host": "192.168.1.43", "port": "22", "cmd": "export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H -u root bash -c \"systemctl daemon-reload && systemctl stop tikv-20160.service\"", "stdout": "", "stderr": "Run Command Timeout!\n"}

2020-06-29T05:22:58.364+0800 ERROR Run Command Timeout!

2020-06-29T05:22:58.364+0800 INFO Execute command finished {"code": 1, "error": "failed to upgrade: failed to stop 192.168.1.43: failed to stop: tikv 192.168.1.43:20160: executor.ssh.execute_timedout: Execute command over SSH timedout for 'tidb@192.168.1.43:22' {ssh_stderr: Run Command Timeout!\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H -u root bash -c \"systemctl daemon-reload && systemctl stop tikv-20160.service\"}", "errorVerbose": "executor.ssh.execute_timedout: Execute command over SSH timedout for 'tidb@192.168.1.43:22' {ssh_stderr: Run Command Timeout!\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H -u root bash -c \"systemctl daemon-reload && systemctl stop tikv-20160.service\"}\n at github.com/pingcap/tiup/pkg/cluster/executor.(*SSHExecutor).Execute()\n\tgithub.com/pingcap/tiup@/pkg/cluster/executor/ssh.go:172\n at github.com/pingcap/tiup/pkg/cluster/module.(*SystemdModule).Execute()\n\tgithub.com/pingcap/tiup@/pkg/cluster/module/systemd.go:89\n at github.com/pingcap/tiup/pkg/cluster/operation.stopInstance()\n\tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:574\n at github.com/pingcap/tiup/pkg/cluster/operation.Upgrade()\n\tgithub.com/pingcap/tiup@/pkg/cluster/operation/upgrade.go:99\n at github.com/pingcap/tiup/pkg/cluster/task.(*ClusterOperate).Execute()\n\tgithub.com/pingcap/tiup@/pkg/cluster/task/action.go:53\n at github.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute()\n\tgithub.com/pingcap/tiup@/pkg/cluster/task/task.go:189\n at github.com/pingcap/tiup/components/cluster/command.upgrade()\n\tgithub.com/pingcap/tiup@/components/cluster/command/upgrade.go:174\n at github.com/pingcap/tiup/components/cluster/command.newUpgradeCmd.func1()\n\tgithub.com/pingcap/tiup@/components/cluster/command/upgrade.go:50\n at github.com/spf13/cobra.(*Command).execute()\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\n at github.com/spf13/cobra.(*Command).ExecuteC()\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\n at github.com/spf13/cobra.(*Command).Execute()\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\n at github.com/pingcap/tiup/components/cluster/command.Execute()\n\tgithub.com/pingcap/tiup@/components/cluster/command/root.go:220\n at main.main()\n\tgithub.com/pingcap/tiup@/components/cluster/main.go:19\n at runtime.main()\n\truntime/proc.go:203\n at runtime.goexit()\n\truntime/asm_amd64.s:1357\nfailed to stop: tikv 192.168.1.43:20160\ngithub.com/pingcap/tiup/pkg/cluster/operation.stopInstance\n\tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:593\ngithub.com/pingcap/tiup/pkg/cluster/operation.Upgrade\n\tgithub.com/pingcap/tiup@/pkg/cluster/operation/upgrade.go:99\ngithub.com/pingcap/tiup/pkg/cluster/task.(*ClusterOperate).Execute\n\tgithub.com/pingcap/tiup@/pkg/cluster/task/action.go:53\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup@/pkg/cluster/task/task.go:189\ngithub.com/pingcap/tiup/components/cluster/command.upgrade\n\tgithub.com/pingcap/tiup@/components/cluster/command/upgrade.go:174\ngithub.com/pingcap/tiup/components/cluster/command.newUpgradeCmd.func1\n\tgithub.com/pingcap/tiup@/components/cluster/command/upgrade.go:50\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup@/components/cluster/command/root.go:220\nmain.main\n\tgithub.com/pingcap/tiup@/components/cluster/main.go:19\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357\nfailed to stop 192.168.1.43\nfailed to upgrade"}
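A log like this is long, so a small script can help. It pulls the host, the systemd unit and the stderr out of the SSHCommand entry, which tells you which instance to check by hand. The sketch below is minimal and assumes each line looks like the excerpt above: `timestamp LEVEL message {json}`. The function name `parse_ssh_command` is hypothetical and not part of TiUP.

```python
import json
import re


def parse_ssh_command(line):
    """Return (host, unit, stderr) from a TiUP "SSHCommand" log line, else None.

    Assumes the JSON payload starts at the first "{" on the line.
    """
    if "SSHCommand" not in line or "{" not in line:
        return None
    payload = json.loads(line[line.index("{"):])
    # Pick the unit out of commands such as "systemctl stop tikv-20160.service".
    match = re.search(r"systemctl \w+ (\S+\.service)", payload["cmd"])
    unit = match.group(1) if match else None
    return payload["host"], unit, payload["stderr"]


if __name__ == "__main__":
    log_line = (
        '2020-06-29T05:22:58.364+0800 INFO SSHCommand {"host": "192.168.1.43", '
        '"port": "22", "cmd": "export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin '
        'sudo -H -u root bash -c \\"systemctl daemon-reload && systemctl stop '
        'tikv-20160.service\\"", "stdout": "", "stderr": "Run Command Timeout!\\n"}'
    )
    host, unit, stderr = parse_ssh_command(log_line)
    print(host, unit, stderr.strip())  # 192.168.1.43 tikv-20160.service Run Command Timeout!
    # Command to confirm on that host whether the unit really stopped.
    print(f"ssh tidb@{host} systemctl is-active {unit}")
```

If `systemctl is-active` reports `inactive`, you are in the situation described above: the stop actually succeeded, and only the control machine gave up waiting.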

(2) Solution:

1. Upgrade TiUP and all of its components to the latest version: tiup update --self && tiup update --all

Why upgrade? Because we want the following two flags provided by the latest TiUP:

tiup cluster --help

Flags:
  -h, --help               help for tiup
      --ssh-timeout int    Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5)
  -v, --version            version for tiup
      --wait-timeout int   Timeout in seconds to wait for an operation to complete, ignored for operations that don't fit. (default 60)

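If you get an SSH-timeout error, raise --ssh-timeout. It is the timeout for the control machine to establish an SSH connection to the TiKV/PD/TiDB hosts, so increase it if the network is poor or unstable.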

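If you get ERROR Run Command Timeout, raise --wait-timeout. It is how long the control machine waits for a command executed on the TiKV/PD/TiDB hosts to complete, so increase it if those commands run slowly.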
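The two timeouts above can be illustrated with Python's standard library. This is only an analogy and not how TiUP is implemented. A connect timeout bounds only the time needed to open the connection. A command timeout bounds how long the caller waits for the command to finish. A command timeout therefore means only that the caller stopped waiting; as the symptom above shows, the remote stop can still complete. The helper names are hypothetical. The default values 5 and 60 are taken from the help output above.

```python
import socket
import subprocess
import sys


def can_connect(host, port, ssh_timeout=5.0):
    """Analogue of --ssh-timeout: bound only the time to open a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=ssh_timeout):
            return True
    except OSError:  # covers timeouts as well as refused/unreachable hosts
        return False


def run_with_wait_timeout(argv, wait_timeout=60.0):
    """Analogue of --wait-timeout: bound how long we wait for a command to finish.

    Returns the exit code, or None if the command did not finish in time.
    subprocess.run kills the local child on timeout, so this models only the
    waiting side, not what happens to a process on a remote host.
    """
    try:
        return subprocess.run(argv, timeout=wait_timeout).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(2)"]
    # A 2-second command with a 0.5-second budget times out ...
    print(run_with_wait_timeout(slow, wait_timeout=0.5))  # None
    # ... and finishes once the budget is raised, like raising --wait-timeout.
    print(run_with_wait_timeout(slow, wait_timeout=10))  # 0
    # Port 22 on localhost: True only if an SSH server is listening there.
    print(can_connect("127.0.0.1", 22, ssh_timeout=5))
```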

2. If the upgrade still fails after you have adjusted the timeouts and retried several times, bring out the heavy artillery: --force.

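A rolling upgrade upgrades all components one at a time. For TiKV, TiUP handles one instance at a time: it first moves all leaders off the instance and then stops it. This eviction has a default timeout of 5 minutes. If the timeout is exceeded, the instance is stopped anyway.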
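The leader-eviction step can be modeled as a loop that polls until the leaders are drained or a deadline passes. The sketch below is a simplified simulation and not TiUP's actual code. `FakeStore`, its `leader_count` attribute and the injected clock are all hypothetical. The 300-second default mirrors the 5-minute timeout mentioned above.

```python
class FakeStore:
    """Hypothetical TiKV store whose leaders drain by a fixed amount per poll."""

    def __init__(self, name, leader_count, drain_per_poll):
        self.name = name
        self.leader_count = leader_count
        self.drain_per_poll = drain_per_poll
        self.stopped = False

    def poll(self):
        # Simulate leaders being moved away, then report how many remain.
        self.leader_count = max(0, self.leader_count - self.drain_per_poll)
        return self.leader_count


def evict_then_stop(store, clock, timeout=300.0, interval=10.0):
    """Wait until the store has no leaders or the timeout expires, then stop it.

    `clock` is a one-element list holding simulated time in seconds, so the
    example runs instantly. Returns True if all leaders were moved off before
    the stop, False if the timeout forced the stop.
    """
    deadline = clock[0] + timeout
    while store.poll() > 0:
        if clock[0] + interval > deadline:
            break  # timeout reached: stop the instance anyway
        clock[0] += interval  # simulated sleep between polls
    drained = store.leader_count == 0
    store.stopped = True  # the actual upgrade and restart are omitted here
    return drained


def rolling_upgrade(stores, timeout=300.0):
    """Handle TiKV stores strictly one at a time, as in a rolling upgrade."""
    clock = [0.0]
    return {s.name: evict_then_stop(s, clock, timeout) for s in stores}


if __name__ == "__main__":
    stores = [FakeStore("tikv-1", 100, 20), FakeStore("tikv-2", 100, 0)]
    # tikv-2 never drains, so it is stopped once the 300 s timeout expires.
    print(rolling_upgrade(stores))  # {'tikv-1': True, 'tikv-2': False}
```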

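If you do not want to evict leaders and want to upgrade immediately, add --force to the tiup cluster upgrade command. This causes performance jitter but no data loss. It is strongly recommended to do this during off-peak hours, for example early in the morning, to minimize the impact.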
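The two steps together can be sketched as an escalating list of `tiup cluster upgrade` invocations. Each retry uses larger `--ssh-timeout`/`--wait-timeout` values, and `--force` is added only as a last resort. The helper `upgrade_attempts` is hypothetical. The cluster name `tidb-test`, the version `v4.0.2` and the specific timeout values are placeholders, not recommendations from this article.

```python
import shlex


def upgrade_attempts(cluster, version, timeouts=((5, 60), (30, 300), (60, 600))):
    """Build escalating `tiup cluster upgrade` command lines (argv lists).

    Each (ssh_timeout, wait_timeout) pair yields one attempt. A final attempt
    reuses the last pair's timeouts and adds --force, which skips leader eviction.
    """
    if not timeouts:
        raise ValueError("need at least one (ssh_timeout, wait_timeout) pair")
    base = ["tiup", "cluster", "upgrade", cluster, version]
    attempts = [
        base + ["--ssh-timeout", str(ssh), "--wait-timeout", str(wait)]
        for ssh, wait in timeouts
    ]
    # Last resort only: expect performance jitter, so run it off-peak.
    attempts.append(attempts[-1] + ["--force"])
    return attempts


if __name__ == "__main__":
    # "tidb-test" and "v4.0.2" are placeholders for your cluster name and version.
    for argv in upgrade_attempts("tidb-test", "v4.0.2"):
        print(shlex.join(argv))
```

Run the printed commands by hand, in order, and stop at the first one that succeeds. The final `--force` line is the one to schedule for a low-traffic window.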

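[Copyright notice] This article is original content by a modb.pro (墨天轮) user. When reposting, you must credit the source (modb.pro), the article link, the author and other basic information; otherwise the author and modb.pro reserve the right to pursue legal action. If you find content on modb.pro suspected of plagiarism or infringement, please report it to contact@modb.pro with supporting evidence; once verified, modb.pro will remove the content immediately.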
