linux - Weird Tomcat outage, possibly related to maxConnections
Our company experienced a serious problem today: our production server went down. People accessing our software via a browser were unable to get a connection, but people who had already been using the software were able to continue using it. Our hot standby server was also unable to communicate with the production server, which it does using HTTP, without going out to the broader internet. The whole time, the server was accessible via ping and ssh, and in fact it was quite underloaded: it normally runs at about 5% CPU load and was lower at the time. There was no disk I/O.
A few days after the problem started we have a new variation: port 443 (HTTPS) is responding but port 80 stopped responding. The server load is low. After restarting Tomcat, port 80 started responding again.
We're using Tomcat 7 with maxThreads="200" and maxConnections=10000. We serve all data out of main memory, so each HTTP request completes quickly, but we have a large number of users doing simple interactions (this is high school subject selection). It seems unlikely that we would have 10,000 users with a browser open on our page at the same time.
My question has several parts:
- Is the "maxConnections" parameter the cause of our woes?
- Is there any reason not to set "maxConnections" to a ridiculously high value, e.g. 100,000? (i.e. what's the cost of doing so?)
- Does Tomcat output a warning message anywhere once it hits the "maxConnections" limit? (We didn't notice anything.)
- Is it possible there's an OS limit we're hitting? We're using CentOS 6.4 (Linux) and "ulimit -f" says "unlimited". (Do firewalls understand the concept of TCP/IP connections? Could there be a limit elsewhere?)
- What happens when Tomcat hits the "maxConnections" limit? Does it try to close down inactive connections? If not, why not? I don't like the idea that our server can be held to ransom by people having their browsers open on it, sending keep-alives to keep the connection open.
But the main question is, "How do we fix our server?"
More info, as requested by Stefan and Sharpy:
- Our clients communicate directly with this server
- TCP connections were in some cases refused and in other cases timed out
- The problem is evident even when connecting a browser to the server from within the network, and the hot standby server, which is in the same network, was unable to send database replication messages, which happens over HTTP
- iptables: yes. iptables6: I don't think so. Anyway, there's nothing between my browser and the server when I test after noticing the problem.
More info: It looked like we had solved the problem when we realised we were using the default Tomcat 7 setting of BIO, which has one thread per connection, and we had maxThreads=200. In fact 'netstat -an' showed 297 connections, which matches 200 + a queue of 100. So we changed to NIO and restarted Tomcat. Unfortunately the same problem occurred the following day. It's possible we misconfigured the server.xml.
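The connection count above was read off 'netstat -an' by hand. As a rough sketch, the same output can be broken down by TCP state for a single port, which makes it easier to see whether the connector is saturated with ESTABLISHED connections. The function name count_states is hypothetical, and the sketch assumes the Linux netstat column layout (local address in column 4, state in column 6).

```shell
# Count TCP connections on one local port, grouped by state.
# Reads `netstat -ant` style output on stdin; the port is the first argument.
# count_states is an illustrative name, not part of any tool.
count_states() {
    port=$1
    awk -v port="$port" '
        # Keep only TCP rows whose local address ends in ":<port>".
        $1 ~ /^tcp/ && $4 ~ (":" port "$") { n[$6]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}
```

It would be used as: netstat -ant | count_states 80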
The server.xml and an extract from catalina.out are available here: https://www.dropbox.com/sh/sxgd0fbzyvuldy7/aaczwobkxnkfxjssmkgkvgw_a?dl=0
More info: I did a load test. I'm able to create 500 connections from my development laptop, and do an HTTP request 3 times on each, without any problem. That is, unless my load test is invalid (the Java class is also in the above link).
It's hard to tell for sure without hands-on debugging, but one of the first things I would check is the file descriptor limit (that's ulimit -n). TCP connections consume file descriptors, and depending on which implementation is in use, NIO connections that do polling using SelectableChannel may eat several file descriptors per open socket.
To check if this is the cause:
- Find the Tomcat PIDs using ps
- Check the ulimit the process runs with: cat /proc/<pid>/limits | fgrep 'open files'
- Check how many descriptors are actually in use: ls /proc/<pid>/fd | wc -l
If the number of used descriptors is significantly lower than the limit, something else is the cause of your problem. If it is equal or very close to the limit, it's this limit that is causing the issues. In that case you should increase the limit in /etc/security/limits.conf for the user account Tomcat is running under, restart the process from a newly opened shell, check in /proc/<pid>/limits that the new limit is actually in use, and see if Tomcat's behavior has improved.
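The checks above could be combined into a small helper, sketched here under a few assumptions: fd_usage is a made-up name, the PROC variable exists only so the function can be pointed at a different directory when testing, and reading /proc/<pid>/fd for a process owned by another user typically requires root.

```shell
# Print descriptors in use and the soft "open files" limit for one PID.
# Usage: fd_usage <pid>
# fd_usage and PROC are illustrative names, not part of Tomcat or any tool.
fd_usage() {
    pid=$1
    proc=${PROC:-/proc}
    # In /proc/<pid>/limits the soft limit is the 4th field of "Max open files".
    limit=$(awk '/^Max open files/ { print $4 }' "$proc/$pid/limits")
    # Each entry under /proc/<pid>/fd is one open descriptor.
    used=$(ls "$proc/$pid/fd" | wc -l | tr -d ' ')
    echo "pid=$pid used=$used limit=$limit"
}
```

Running it every few seconds against the Tomcat PID found with ps while the outage is building up would show whether "used" is climbing toward "limit".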