
Warmth in the Depths of Winter: SUN E4500 Goes Down from Overheating

Original post by eygle, 2006-12-19
Last weekend one of our database servers, a SUN E4500, overheated because of a hardware fault and went down. So how hot did it get?
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 470940 kern.warning] WARNING: SBus FFB SOC+ IO board 1 still too hot (temperature: 68C).
Overtemp shutdown started

When the system shut down, the board temperature had reached 68°C. On a cold winter day like this, that is really rather warm.
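To read the peak temperature out of a longer log rather than by eye, the messages quoted above can be scanned with a short script. The following is only a minimal sketch: the regular expression is inferred from the handful of message formats shown in this post, and the name `extract_temperatures` is made up for illustration, not part of any Solaris tool.

```python
import re

# Assumption: temperature messages look like the ones quoted in this post,
# e.g. "WARNING: <board> is very hot (temperature: 68C)".
TEMP_RE = re.compile(
    r"(?P<level>WARNING|NOTICE): (?P<board>.+?) "
    r"(?:is very hot|is cooling|still too hot) "
    r"\(temperature: (?P<temp>\d+)C\)"
)

def extract_temperatures(lines):
    """Return (level, board, temperature in Celsius) for each matching line."""
    readings = []
    for line in lines:
        m = TEMP_RE.search(line)
        if m:
            readings.append((m.group("level"), m.group("board"), int(m.group("temp"))))
    return readings

# A few lines copied from the log above.
sample = [
    "[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)",
    "[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to",
    "[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)",
    "[ID 470940 kern.warning] WARNING: SBus FFB SOC+ IO board 1 still too hot (temperature: 68C).",
]

readings = extract_temperatures(sample)
peak = max(temp for _, _, temp in readings)
print(f"{len(readings)} readings, peak {peak}C")
```

Lines that carry no temperature, such as the shutdown scheduling notice, are simply skipped.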
A check after the system came back up showed that an IO board was at fault:
bash-2.03# /usr/platform/sun4u/sbin/prtdiag -v
System Configuration: Sun Microsystems sun4u 8-slot Sun Enterprise E4500/E5500
System clock frequency: 100 MHz
Memory size: 2048Mb
========================= CPUs =========================
Run Ecache CPU CPU
Brd CPU Module MHz MB Impl. Mask
--- --- ------- ----- ------ ------ ----
0 0 0 400 8.0 US-II 10.0
0 1 1 400 8.0 US-II 10.0
2 4 0 400 8.0 US-II 10.0
2 5 1 400 8.0 US-II 10.0
4 8 0 400 8.0 US-II 10.0
4 9 1 400 8.0 US-II 10.0
========================= Memory =========================
Intrlv. Intrlv.
Brd Bank MB Status Condition Speed Factor With
--- ----- ---- ------- ---------- ----- ------- -------
0 0 1024 Active OK 60ns 2-way A
2 0 1024 Active OK 60ns 2-way A
========================= IO Cards =========================
Bus Freq
Brd Type MHz Slot Name Model
--- ---- ---- ---------- ---------------------------- --------------------
1 SBus 25 0 SUNW,socal/sf (scsi-3) 501-5266
1 SBus 25 3 SUNW,hme
1 SBus 25 3 SUNW,fas/sd (block)
1 SBus 25 13 SUNW,socal/sf (scsi-3) 501-3060
1 UPA 100 2 FFB, Double Buffered SUNW,501-4790
Detached Boards
===============
Slot State Type Info
---- --------- ------ -----------------------------------------
3 failed disk Disk 0: no disk Disk 1: no disk
Failed Field Replaceable Units (FRU) in System:
==============================================
disk-board unavailable on IO Board #3
PROM fault string: fail
Failed Field Replaceable Unit is IO Board 3
Detected System Faults
======================
Board 1 fault: Overtemp
Detected Sat Dec 16 02:24:21 2006
Unit 2 Core Power Supply failure
Detected Fri Dec 15 23:24:23 2006
Unit 1 Core Power Supply failure
Detected Fri Dec 15 23:24:23 2006
PROM detected failure
Detected Fri Dec 15 23:24:23 2006
Most recent AC Power Failure:
=============================
Fri May 27 14:53:06 2005
========================= Environmental Status =========================
Keyswitch position is in Normal Mode
System Power Status: Minimum Available
System LED Status: GREEN YELLOW GREEN
WARNING ON ON BLINKING
Fans:
-----
Unit Status
---- ------
Rack OK
Key OK
AC OK
System Temperatures (Celsius):
------------------------------
Brd State Current Min Max Trend
--- ------- ------- --- --- -----
0 OK 39 36 43 stable
1 WARNING 66 46 67 stable
2 OK 39 36 43 stable
4 OK 53 50 55 stable
CLK OK 38 37 40 stable
Power Supplies:
---------------
Supply Status
--------- ------
0 OK
1 FAIL
2 FAIL
3 OK
PPS OK
System 3.3v OK
System 5.0v OK
Peripheral 5.0v OK
Peripheral 12v OK
Auxilary 5.0v OK
Peripheral 5.0v precharge OK
Peripheral 12v precharge OK
System 3.3v precharge OK
System 5.0v precharge OK
AC Power OK
========================= HW Revisions =========================
ASIC Revisions:
---------------
Brd FHC AC SBus0 SBus1 PCI0 PCI1 FEPS Board Type Attributes
--- --- -- ----- ----- ---- ---- ---- ---------- ----------
0 1 5 CPU 100MHz Capable
1 1 5 1 22 UPA-SBus-SOC+ 100MHz Capable
2 1 5 CPU 100MHz Capable
3 Unknown 100MHz Capable
4 1 5 CPU 100MHz Capable
Board 1 FFB Hardware Configuration:
-----------------------------------
Board rev: 2
FBC version: 0x3241906d
DAC: Brooktree 9070, version 1
3DRAM: Mitsubishi 130b, version 2
System Board PROM revisions:
----------------------------
Board 0: OBP 3.2.29 2001/06/18 17:28 POST 3.9.29 2001/06/18 17:50
Board 1: FCODE 1.8.29 2001/06/18 17:26 iPOST 3.4.29 2001/06/18 17:49
Board 2: OBP 3.2.29 2001/06/18 17:28 POST 3.9.29 2001/06/18 17:50
Board 4: OBP 3.2.29 2001/06/18 17:28 POST 3.9.29 2001/06/18 17:50
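The line that matters most in all of this output is the one board whose state is WARNING in the System Temperatures table. As a rough sketch of how that check could be automated, the script below pulls the non-OK rows out of that table. It assumes the table layout matches the output shown in this post (board, state, current, min, max, trend), and `hot_boards` is a hypothetical helper name, not something `prtdiag` provides.

```python
import re

# Assumption: a data row looks like "1 WARNING 66 46 67 stable".
ROW_RE = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\w+)\s*$")

def hot_boards(prtdiag_text):
    """Return (board, state, current, max) for rows whose state is not OK."""
    in_table = False
    flagged = []
    for line in prtdiag_text.splitlines():
        if line.startswith("System Temperatures"):
            in_table = True          # the table starts after this heading
            continue
        if line.startswith("Power Supplies"):
            in_table = False         # the next section ends the table
        if not in_table:
            continue
        m = ROW_RE.match(line)       # heading and ruler lines do not match
        if m:
            board, state, current, _low, high, _trend = m.groups()
            if state != "OK":
                flagged.append((board, state, int(current), int(high)))
    return flagged

# Excerpt copied from the prtdiag output above.
sample = """System Temperatures (Celsius):
------------------------------
Brd State Current Min Max Trend
--- ------- ------- --- --- -----
0 OK 39 36 43 stable
1 WARNING 66 46 67 stable
2 OK 39 36 43 stable
4 OK 53 50 55 stable
CLK OK 38 37 40 stable
Power Supplies:
"""

flagged = hot_boards(sample)
for board, state, current, high in flagged:
    print(f"board {board}: {state}, now {current}C, max {high}C")
```

On a live machine the text would come from running `/usr/platform/sun4u/sbin/prtdiag -v`. A different platform or locale may print the table differently, so the pattern would need adjusting.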

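What is even more frustrating is that this server is in a critical operational period right now, so it cannot be taken down again to replace the hardware.
All we can do is wait for the next downtime, whenever that turns out to be.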
-The End-