副标题#e#
概述
最近生产环境有这么个现象,平时的订单调度只需要2s内可以出结果,但是多个人调度就会卡住,超过15分钟都没有结果出来,有时还会失败然后导致数据不准确。

下面记录一下生产环境卡顿时排查的过程。
1、获取ASH报告
- SQL> @?/rdbms/admin/ashrpt.sql
- --To specify absolute begin time:
- --[MM/DD/YY]] HH24:MI[:SS]
- --08/09/19 08:40:00




2、ASH分析
1、Top User Events

2、相关sql
Top SQL with Top Events

sql明细

3、存储过程

4、TOP sessions

从上面分析可以看到两个明显的等待事件:wait for stopper event to be increased 等待事件和wait for a undo record 等待事件,这个应该是批量任务调度的时候产生了大量的大事务,产生了一些回滚造成了严重的资源消耗
3、处理大事务并发回滚
一般情况下wait for stopper event to be increased 等待事件是跟wait for a undo record 等待事件联系起来的。
对于这个等待事件metalink上面有一篇文档
- 464246.1
- Sometimes Parallel Rollback of Large Transaction may become very slow. After killing a large running transaction
- (either by killing the shadow process or aborting the database) then database seems to hang, or smon and parallel query servers
- taking all the available cpu.
- In fast-start parallel rollback, the background process Smon acts as a coordinator and rolls back a set of transactions in parallel
- using multiple server processes. Fast start parallel rollback is mainly useful when a system has transactions that run a long time
- before comitting, especially parallel Inserts, Updates, Deletes operations. When Smon discovers that the amount of recovery work is
- above a certain threshold, it automatically begins parallel rollback by dispersing the work among several parallel processes.
- There are cases where parallel transaction recovery is not as fast as serial transaction recovery, because the pq slaves are interfering
- with each other. It looks like the changes made by this transaction cannot be recovered in parallel without causing a performance problem.
- The parallel rollback slave processes are most likely contending for the same resource, which results in even worse rollback performance
- compared to a serial rollback.
解决的办法:
- --关掉并发回滚,变成串行回滚(直接重启解决)
- sql> alter system set fast_start_parallel_rollback = false scope=spfile;

通常,如果有很多并发进程,可以根据v$px_session视图去查看,查看v$px_session视图,发现所有的并发进程都是由smon进程导致(即qcsid列为smon进程的session id)
而smon进程的等待事件为wait for stopper event to be increased
#p#副标题#e#
即smon进程在做大事务的回滚,默认参数fast_start_parallel_rollback参数为low,即回滚时会启动2*CPU个数 个并发进程。而由于是使用并发,所以可能由于并发之间相互使用共同的资源,导致回滚速度更慢。因为是生产环境,不能随便重启,所以我用了下面的方法来修改这个参数:
1.查找smon进程ID
- select pid,spid,pname,username,tracefile from v$process where pname='SMON'

2.禁用smon进程的事务清理(Disable SMON transaction cleanup)
- oradebug setorapid 'SMON's Oracle PID';
- oradebug event 10513 trace name context forever, level 2

3.查询V$FAST_START_SERVERS视图,将所有smon启用的并发进程杀掉

4.修改fast_start_parallel_rollback参数
- alter system set fast_start_parallel_rollback=false;
5.启用smon进程的事务清理(enable transaction recovery)
- oradebug setorapid 'SMON's Oracle PID';
- oradebug event 10513 trace name context off
6.获得tracefile name
- oradebug tracefile_name

7.验证

4、业务验证
修改后去业务验证,到高峰期还是有卡顿现象,不过频率减少了很多,报错之类的也没有了,同时观察新的报告可以发现并发回滚之类的等待事件已经没有了。
#p#副标题#e##p#分页标题#e#
【编辑推荐】
- 四种分布式数据库场景选型、优缺点对比分析和未来展望
- 到底选择PostgreSOL还是MySQL?看这里
- MySQL:常用的30种SQL查询语句优化方法
- SQLite使用内存数据库
- 到底选择SQL还是NoSQL?看这里!
【责任编辑:华轩 TEL:(010)68476606】
点赞 0