Hi,
From checking the Connection Counts page (http://[IP address]:8086/connectioncounts), I found that right after an application is restarted, the number listed for the Application is very close to the sum of the connection counts listed for the streams running in that application. As time goes by, these numbers diverge drastically: after 15-30 minutes, the Application count tends to be 50% higher than (and sometimes close to double) the total I get when I add up the counts for all streams listed under that Application.
I’ve checked this mostly on the load balancer edge servers, but have also seen it on the load balancer origin. Funnily enough, when I track the average data rate per stream from the moment the application starts (when the Application total and the summed stream totals are close together) to the point where the totals are very different, the average data rate tracks the Application connection count, not the summed total of the individual streams. This would imply that the extra connections are real and are streaming, but are not linked to any of the streams running in the Application.
For example, just after starting the application (as it picks up connections sent by the load balancer listener), I saw patterns like the following:
Before application restart: Application count: 579, total of stream counts 316 (55%)
Right after application restart: Application count: 25, total of stream counts 23 (92%)
8 minutes after application restart: Application count: 441, total of stream counts 371 (84%)
30 minutes after application restart: Application count: 525, total of stream counts 381 (73%)
15 hours after application restart: Application count: 536, total of stream counts 347 (65%)
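The percentages in brackets are the summed stream counts as a share of the Application count. This short Python sketch recomputes them from the numbers above, along with how many connections are not attributed to any stream:

```python
# Samples copied from the observations above:
# (label, Application count, total of stream counts)
SAMPLES = [
    ("before restart", 579, 316),
    ("right after restart", 25, 23),
    ("8 minutes after restart", 441, 371),
    ("30 minutes after restart", 525, 381),
    ("15 hours after restart", 536, 347),
]


def stream_share(app_count, stream_total):
    """Percentage of Application connections accounted for by the streams."""
    return round(100 * stream_total / app_count)


for label, app_count, stream_total in SAMPLES:
    unattributed = app_count - stream_total
    print(f"{label}: {stream_share(app_count, stream_total)}% "
          f"({unattributed} connections not linked to a stream)")
```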
Other servers showed a similar pattern.
What would explain the difference between the two numbers? And which one is actually correct? Is the difference caused by connections that are not properly closed and keep streaming for a while even though they are no longer associated with a stream?
And, I guess most importantly, is there anything we can or should change in the setup to eliminate this discrepancy, or is it simply part of how the server works? It seems to be reducing the capacity of our servers by about 33%, which would mean we’d have to run 50% more servers. If we can change something in the setup to avoid that, that would of course be very welcome.
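The capacity figures follow from simple arithmetic, assuming the unattributed connections load a server just as much as real viewers do: if only a fraction f of the Application connections belong to streams, a fraction 1 - f of each server's capacity is lost, and you need 1/f times as many servers. A Python sketch using exact fractions:

```python
from fractions import Fraction


def capacity_impact(stream_share):
    """Given the fraction f of Application connections attributable to streams,
    return (share of capacity lost, extra servers needed as a fraction).

    ASSUMPTION: every connection, attributed or not, costs the same capacity.
    """
    f = Fraction(stream_share)
    lost = 1 - f       # capacity spent on connections not linked to a stream
    extra = 1 / f - 1  # additional servers needed for the same useful load
    return lost, extra
```

With f = 2/3 this gives a loss of 1/3 (about 33%) and 1/2 (50%) more servers; with the 65% seen 15 hours after restart, `capacity_impact("0.65")` gives 7/13, i.e. roughly 54% more servers.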
Thanks,
Peter