探究Oracle监听超时故障
2019-04-03江苏缪振龙
■ 江苏 缪振龙
编者按:本文是笔者亲历的一次Oracle TNS连接超时引起的故障,虽然看似疑难,但最终发现是由于一个简单的配置文件错误导致。
笔者最近准备对一台Oracle 10g数据库进行数据备份,采用exp方式全库导出。但尝试多次,一直提示如下错误:
EXP-00056: ORACLE error 12170 encountered
ORA-12170: TNS:Connect timeout occurred
EXP-00000: Export terminated unsuccessfully
经查阅相关资料得知,“EXP-00056”产生错误的原因是“ORA-12170”。具体来说是由于TNS连接超时导致。
TNS是Oracle Net的一部分,用来管理和配置Oracle数据库和客户端的连接工具,客户端只有正确配置TNS相关文件才能访问Oracle服务器。
故障分析
在日常维护中通常使用tnsping命令检查Oracle监听服务和网络连通性。笔者首先在服务器上tnsping实例名orcl,发现等了很久,返回操作超时错误。信息如下:
-bash-3.2$ tnsping orcl
TNS Ping Utility for Solaris: Version 10.2.0.1.0 - Production on 18-JAN-2019 19:58:22
Copyright (c) 1997,2005, Oracle.All rights reserved.
Used parameter files:
/u01/app/oracle/product/10.2.0/db_1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact(DESCRIPTION = (ADDRESS= (PROTOCOL = TCP)(HOST= SUN)(PORT = 1521))(CONNECT_DATA = (SERVER =DEDICATED) (SERVICE_NAME= orcl)))
TNS-12535: TNS:operation timed out
接下来,笔者tnsping本机IP,返回结果正常。如下:
-bash-3.2$ tnsping 172.28.28.36
TNS Ping Utility for Solaris: Version 10.2.0.1.0 - Production on 18-JAN-2019 20:08:50
Copyright (c) 1997,2005, Oracle.All rights reserved.
Used parameter files:
/u01/app/oracle/product/10.2.0/db_1/network/admin/sqlnet.ora
Used EZCONNECT adapter to resolve the alias
Attempting to contact
(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=172.28.28.36))(ADDRESS=(PROTOCOL=TCP)(HOST=172.28.28.36)(PORT=1521)))
OK (0 msec)
笔者怀疑监听服务存在问题,执行“lsnrctl stop” 和“lsnrctl start”命令来重启监听服务,发现故障提示依旧。接着执行“lsnrctl status”查看监听服务状态。返回结果如图1所示。
通过最后两行,我们可以看出实例orcl监听处于未知状态。orcl实例没有注册到监听服务中。于是,通过SQLPlus工具登录数据库进行实例注册。具体方法如下:
-bash-3.2$ sqlplus /nolog
SQL*Plus: Release 10.2.0.1.0 - Production on Fri Jan 18 20:10:51 2019
Copyright (c) 1982,2005, Oracle.All rights reserved.
SQL> conn system/rs_789 as sysdba;
Connected.
SQL> alter system register;
System altered.
图1 执行lsnrctl status查看监听服务状态
图2 查看Oracle监听服务相关日志
再次使用“lsnrctl status”命令查看监听服务状态,发现故障提示依旧。那看看配置文件是不是有问题呢?查看tnsnames.ora文件。配置如下:
ORCL =
(DESCRIPTION =
(ADDRESS =(PROTOCOL = TCP)(HOST =SUN)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME =orcl)
)
)
EXTPROC_CONNECTION_DATA =
(DESCRIPTION=
(ADDRESS_LIST =
(ADDRESS =(PROTOCOL = IPC)(KEY = EXTPROC1))
)
(CONNECT_DATA =
(SID =PLSExtProc)(PRESENTATION =RO)
)
该配置文件并未发现什么异常。接下来,笔者又查看了Oracle监听服务相关日志。最后几行日志如图2所示。
上面一直有警告提示,“Subscription for node down event still pending”。
故障解决
经上网查询相关资料,此错误引起的原因可能由于监听无法在指定端口启动,前一次监听服务未正常退出或有其他程序占用端口。提示查看/etc/hosts相关配置。于是,笔者打开/etc/hosts文件,配置如下:
-bash-3.2$ cat /etc/hosts
#
# Internet host table
#
::1 localhost
127.0.0.1 localhost
222.*.*.* SUN loghost
172.28.28.36 WWW
果然发现一些问题,由于当时服务器要对外网开放使用,服务器同时接入内网和外网。但后来处于网络安全考虑,所有服务器的外网都统一转移到上层硬件防火墙进行访问控制。服务器只接入内网。但/etc/hosts的配置信息并没有做相应修改。既然问题已找到,于是修改/etc/hosts文件,注意要先切换到root用户,配置完成后要使用“wq!”,强制保存退出。更改后配置如下:
-bash-3.2# cat /etc/hosts
#
# Internet host table
#
::1 localhost
127.0.0.1 localhost
172.28.28.36 SU
最后再切换到Oracle用户下,执行lsnrctl stop命令和lsnrctl start命令重启监听服务。刚开始监听服务还处于未知状态,稍等约1分钟,再次执行lsnrctl status命令,可以看到实例监听处于Ready状态。返回如下信息:
-bash-3.2$ lsnrctl status
LSNRCTL for Solaris:Version 10.2.0.1.0 -Production on 18-JAN-2019 20:11:35
Copyright (c) 1991,2005, Oracle.All rights reserved.
Connecting to (DESCRI PTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1)))
STATUS of the LISTENER
……
(DESCRIPTION=(AD DRESS=(PROTOCOL=tcp)(HOST=172.28.28.36)(PORT=1521)))
Services Summary...
Service "PLSExtProc"has 1 instance(s).
Instance"PLSExtProc", status UNKNOWN, has 1 handler(s)for this service...
Service "orcl" has 2 instance(s).
Instance "orcl",status UNKNOWN, has 1 handler(s) for this service...
Instance "orcl",status READY, has 1 handler(s) for this service...
Service "orclXDB" has 1 instance(s).
Instance "orcl",status READY, has 1 handler(s) for this service...
Service "orcl_XPT"has 1 instance(s).
Instance "orcl",status READY, has 1 handler(s) for this service...
The command completed successfully
通过本次故障可以发现,一个看似疑难的故障问题往往是由一个简单的配置文件错误导致。所以,在我们今后的工作中,一定要扎扎实实学好理论基础,面对故障问题才能迎刃而解。