xiaobaoqiu Blog

Think More, Code Less

Mysql Thread_pool_size

最近项目中的beta环境,多个应用(10+个)连接数据库,使用的是连接池管理数据库连接.

最近应用中经常抛出数据库连接异常(我们使用的连接池是druid),简单的就是说无法获取Mysql的连接,创建连接失败(create connection error):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
[2015-05-14 15:50:51 ERROR com.alibaba.druid.pool.DruidDataSource:1363] create connection error 
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.7.0_45]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) ~[na:1.7.0_45]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.7.0_45]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526) ~[na:1.7.0_45]
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) ~[mysql-connector-java-5.1.21.jar:na]
    at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117) ~[mysql-connector-java-5.1.21.jar:na]
    at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:350) ~[mysql-connector-java-5.1.21.jar:na]
    at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2393) ~[mysql-connector-java-5.1.21.jar:na]
    at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2430) ~[mysql-connector-java-5.1.21.jar:na]
    at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2215) ~[mysql-connector-java-5.1.21.jar:na]
    at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:813) ~[mysql-connector-java-5.1.21.jar:na]
    at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47) ~[mysql-connector-java-5.1.21.jar:na]
    at sun.reflect.GeneratedConstructorAccessor34.newInstance(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.7.0_45]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526) ~[na:1.7.0_45]
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) ~[mysql-connector-java-5.1.21.jar:na]
    at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:399) ~[mysql-connector-java-5.1.21.jar:na]
    at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:334) ~[mysql-connector-java-5.1.21.jar:na]
    at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1296) ~[druid-0.2.18.jar:0.2.18]
    at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1352) ~[druid-0.2.18.jar:0.2.18]
    at com.alibaba.druid.pool.DruidDataSource$CreateConnectionThread.run(DruidDataSource.java:1361) ~[druid-0.2.18.jar:0.2.18]
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.7.0_45]
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) ~[na:1.7.0_45]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) ~[na:1.7.0_45]
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) ~[na:1.7.0_45]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.7.0_45]
        at java.net.Socket.connect(Socket.java:579) ~[na:1.7.0_45]
        at java.net.Socket.connect(Socket.java:528) ~[na:1.7.0_45]
        at java.net.Socket.<init>(Socket.java:425) ~[na:1.7.0_45]
        at java.net.Socket.<init>(Socket.java:241) ~[na:1.7.0_45]
        at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:257) ~[mysql-connector-java-5.1.21.jar:na]
        at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:300) ~[mysql-connector-java-5.1.21.jar:na]
        ... 14 common frames omitted

开始怀疑机器的压力过大,将一些应用停止,还是出现这个问题.

但是相同的代码在dev环境一切正常,beta环境的机器配置要优于dev环境的.

1.Druid配置

首先怀疑那块的代码问题导致的Mysql连接没有被释放.

由于我们使用Druid连接池,尝试使用Druid的removeAbandoned功能,这个配置的意义就是连接泄漏监测,通过这个监控没有发现异常.

当程序存在缺陷时,申请的连接忘记关闭,这时候,就存在连接泄漏了。Druid提供了RemoveAbandanded相关配置,用来关闭长时间不使用的连接.

配置removeAbandoned对性能会有一些影响,建议怀疑存在泄漏之后再打开。

1
2
3
<property name="removeAbandoned" value="true" /> <!-- 打开removeAbandoned功能-->
<property name="removeAbandonedTimeout" value="600" /> <!-- 600秒,也就是10分钟-->
<property name="logAbandoned" value="true" /> <!-- 关闭abanded连接时输出错误日志-->

` 当removeAbandoned=true之后,可以在内置监控界面datasource.html中的查看ActiveConnection StackTrace属性的,可以看到未关闭连接的具体堆栈信息,从而方便查出哪些连接泄漏了。

参考: https://github.com/alibaba/druid/wiki/%E8%BF%9E%E6%8E%A5%E6%B3%84%E6%BC%8F%E7%9B%91%E6%B5%8B

2.diff mysql status

既然dev环境正常,因此我们的做法是拿出dev环境和beta环境的show status结果进行diff,结果发现了一个比较可疑的参数的配置有差异:

1
thread_pool_size

dev环境配置是16,而beta环境是4,同时线上环境的配置是24.初步判断beta环境的thread_pool_size设置应该是有问题的.

3.thread_pool_size

参考官网的描述

1
2
3
4
5
6
7
thread_pool_size is the most important parameter controlling thread pool performance. It can be set only at server startup. Our experience in testing the thread pool indicates the following:

If the primary storage engine is InnoDB, the optimal thread_pool_size setting is likely to be between 16 and 36, with the most common optimal values tending to be from 24 to 36. We have not seen any situation where the setting has been optimal beyond 36. There may be special cases where a value smaller than 16 is optimal.

For workloads such as DBT2 and Sysbench, the optimum for InnoDB seems to be usually around 36. For very write-intensive workloads, the optimal setting can sometimes be lower.

If the primary storage engine is MyISAM, the thread_pool_size setting should be fairly low. We tend to get optimal performance for values from 4 to 8. Higher values tend to have a slightly negative but not dramatic impact on performance.

简单的翻译一下: thread_pool_size是控制线程池性能最重要的一个参数,这个参数只能在Mysql服务启动的时候设置(这是官网5.5版本的文档,5.6版本证明是可以动态修改的).官网的推荐设置:

1.如果存储引擎是InnoDB,thread_pool_size值设置16到36之间比较好,一般配置在24到36之间;
2.如果存储引擎是MyISAM,thread_pool_size值应该被设置得相当小,倾向于设置在4到8之间;

官方网站: https://dev.mysql.com/doc/refman/5.5/en/thread-pool-tuning.html

3.1 thread_pool_size意义

参数thread_pool_size的命令可能会让大家产生误解,它不是指的线程池的大小,而是线程组的大小。

类似所有创建的线程都在某一个group里,group的编号从1~thread_pool_size,每个group里的worker线程数可以通过参数thread_pool_oversubscribe来控制(默认为3).

同时活跃的最大worker线程数=thread_pool_size * (thread_pool_oversubscribe + 1);

thread_pool_size默认值为CPU核心数,最大为128(MAX_THREAD_GROUPS),在启动时,就会把128个Group对应的结构体(all_groups)初始化好。每个group(编号小于等于thread_pool_size)会创建一个epoll对象;

当MySQL建立connection时, MySQL根据connection的thread id对thread_pool_size取模,将 connection发起的sql语句分配到对应的group.若worker达到最大数量后还是不足以处理回话请求, 则连接在本group上等待,导致sql语句的RT(Response time)增大。

简单示意图如下:

所有之前的问题出现的情况是:

1.thread_pool_size为4,即mysql每个thread pool的group中work线程的大小为4,如果有多个应用,恰好多个线程的connection被分发到某一个group上(假设为group-2),并且相当长一段时间一致在占有(比如我们在beta环境的连接,会占用并忙碌长达半小时),这时候,新来的连接正好也分发到group-2上,这时候Mysql上的group-2就没有多余的work线程来为其服务,即其无法和mysql创建连接,从而出现create connection error的异常.

参考: http://get.jobdeer.com/908.get http://chuansong.me/n/1192563 http://mysqllover.com/?p=826