Thread: more CPU cores get weaker performance than less CPU cores

  1. #1

    Hi,
    I'm currently tuning the performance of Wowza 4.0.3 with the following hardware and test setup:
    2 x Intel(R) Xeon(R) CPU E5-2650 0 @ 2.60GHz (32 logical processors in total)
    48 GB RAM
    A single VOD file: 5 Mbps, about 5 minutes, codec: H264, profile: High, level: 3.1, frameSize: 1280x720, displaySize: 1280x720, frameRate: 23.98
    2 x 10GbE NICs
    Clients use HLS to access the VOD file, half through NIC 1 and half through NIC 2, so the NICs are not the bottleneck.
    The CPU governor is set to "performance" with the command "for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do [ -f $CPUFREQ ] || continue; echo -n performance > $CPUFREQ; done"
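
A quick way to confirm that the governor change took effect on every logical CPU is to read the same sysfs files back. The following is a minimal sketch, not part of the original post; the function names `read_governors` and `summarize` are hypothetical, and it assumes the standard Linux cpufreq sysfs layout that the command above already relies on.

```python
import glob
import os
from collections import Counter


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU exposing a cpufreq governor."""
    governors = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # path looks like <root>/cpuN/cpufreq/scaling_governor
        cpu_name = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as handle:
            governors[cpu_name] = handle.read().strip()
    return governors


def summarize(governors):
    """Count how many CPUs use each governor."""
    return Counter(governors.values())


if __name__ == "__main__":
    # Prints nothing on machines that do not expose cpufreq.
    for governor, count in sorted(summarize(read_governors()).items()):
        print(f"{governor}: {count} CPUs")
```

If any CPU reports a governor other than "performance", the loop in the post did not apply to it.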

    We have tested the following cases:
    1. When we use all 32 CPU cores, Wowza can only support about 1600 connections. If we add more connections, clients time out, meaning they cannot download a ts file within 10 seconds (each ts file is 10 seconds long). The symptom is the same as in "http://www.wowza.com/forums/showthread.php?37266-DirectRandomAccessReader-read()-and-seek()-generate-performance-bottleneck"; VisualVM shows that DirectRandomAccessReader.read() and seek() use the most CPU time.
    2. When we use only 16 CPU cores, Wowza can support about 1700 connections. With more connections, clients time out, but unlike the case above, VisualVM shows that nio.SocketIoProcessor$Worker.run() uses the most CPU time. We looked into the GC log and found that with 1700 connections, the objects surviving each Eden collection need about 50M-100M in the survivor area (per the "age 1" entries in the GC log), but with 1800 connections, far more objects survive each Eden collection and need 500M-2G in the survivor area.
    3. When we use only 8 CPU cores, Wowza can support about 1900 connections. With more connections, clients time out; the VisualVM and GC log results are similar to the 16-core case.
    4. Based on the results of cases 1 and 3, we started 2 Wowza instances on this server, each using 8 CPU cores. We first opened 1800 connections to the first instance, which worked well, and then opened about 200 connections to the second instance. The clients connected to the first instance then timed out; the VisualVM and GC log results are similar to case 2.
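
The "age 1" figures mentioned in case 2 come from the -XX:+PrintTenuringDistribution output in the GC log. As a rough sketch of how those values could be pulled out for comparison between runs, the snippet below parses tenuring lines with a regular expression. The sample excerpt and its byte values are made up for illustration and are not taken from the poster's log; the function name `age1_bytes` is hypothetical.

```python
import re

# Matches HotSpot tenuring lines such as "- age   1:   73400320 bytes,   73400320 total"
AGE_LINE = re.compile(r"- age\s+(\d+):\s+(\d+) bytes,\s+(\d+) total")


def age1_bytes(log_lines):
    """Return the byte count of every 'age 1' entry, in log order."""
    sizes = []
    for line in log_lines:
        match = AGE_LINE.search(line)
        if match and int(match.group(1)) == 1:
            sizes.append(int(match.group(2)))
    return sizes


# Hypothetical excerpt; the byte values are invented for illustration only.
SAMPLE = """\
Desired survivor size 1288490188 bytes, new threshold 15 (max 15)
- age   1:   73400320 bytes,   73400320 total
- age   2:    1048576 bytes,   74448896 total
Desired survivor size 1288490188 bytes, new threshold 1 (max 15)
- age   1: 1610612736 bytes, 1610612736 total
""".splitlines()

for size in age1_bytes(SAMPLE):
    print(f"age 1 survivors: {size / 2**20:.0f} MiB")
```

Running the same extraction over the 1700-connection and 1800-connection logs would show where the jump in surviving objects begins.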


    Based on the test cases above, a few things look strange:
    1. Why can 8 CPU cores handle more connections than 16 or 32 CPU cores? They have the same memory and the same configuration.
    2. Why, when the connection count reaches the limit, does each Eden collection leave so many more live objects? We are sure that clients create connections and fetch ts files evenly.
    3. In case 4, why does the second Wowza instance affect the first one? They use different CPU cores.
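
On question 2, it may help to compare the survivor volumes seen in the GC log with the survivor space the JVM actually has. The startup command in this post sets -XX:NewSize=12g and -XX:SurvivorRatio=3. The sketch below works out the resulting young generation layout, assuming HotSpot's meaning of SurvivorRatio (eden relative to one survivor space) and the default -XX:TargetSurvivorRatio of 50, which the command does not override. The JVM rounds real sizes for alignment, so the numbers are approximate, and the function name `young_gen_layout` is hypothetical.

```python
GIB = 2**30


def young_gen_layout(new_size_bytes, survivor_ratio, target_survivor_percent=50):
    """Split the young generation into eden and two survivor spaces.

    SurvivorRatio is the ratio of eden to ONE survivor space, so the young
    generation is divided into (survivor_ratio + 2) equal parts.
    """
    survivor = new_size_bytes / (survivor_ratio + 2)
    eden = survivor * survivor_ratio
    # Occupancy the JVM aims for in a survivor space after a young collection.
    desired = survivor * target_survivor_percent / 100
    return {"eden": eden, "survivor": survivor, "desired_survivor": desired}


layout = young_gen_layout(12 * GIB, 3)
for name, size in layout.items():
    print(f"{name}: {size / GIB:.1f} GiB")
```

Under these assumptions each survivor space is about 2.4 GiB with a target occupancy of about 1.2 GiB, so 500M-2G of age 1 objects at 1800 connections would sit at or beyond what the survivor space is meant to hold, which would likely cause early promotion into the old generation.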

    With a 1 Mbps VOD file, Wowza can reach 4500 connections using 8 CPU cores, so it seems that the configuration is not the problem.

    The following is one of our startup commands. We use the "taskset" command to restrict Wowza to a subset of CPU cores; for example, "taskset 0x0000FFFF java " restricts java to CPU cores 0 to 15, so it cannot use cores 16 to 31.

    java -server -Xms20g -Xmx20g -XX:PermSize=512m -XX:MaxPermSize=512m -XX:NewSize=12g -XX:MaxNewSize=12g -XX:+UseParNewGC -XX:MaxTenuringThreshold=15 -XX:SurvivorRatio=3 -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=32768 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=20 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:-OmitStackTraceInFastThrow -Xloggc:/home/lid/cms_gc1935.log -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.wowza.wms.runmode=standalone -Dcom.wowza.wms.native.base=linux -Dcom.wowza.wms.AppHome=/usr/local/WowzaStreamingEngine -Dcom.wowza.wms.ConfigURL= -Dcom.wowza.wms.ConfigHome=/usr/local/WowzaStreamingEngine -cp /usr/local/WowzaStreamingEngine/bin/wms-bootstrap.jar com.wowza.wms.bootstrap.Bootstrap start
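
The taskset mask mentioned above (0x0000FFFF) is a bitmask in which bit N selects logical CPU N. A small sketch for decoding and building such masks follows; the helper names `cores_in_mask` and `mask_for_cores` are hypothetical and only illustrate the bit arithmetic.

```python
def cores_in_mask(mask):
    """Return the CPU indices whose bit is set in an affinity mask."""
    cores = []
    index = 0
    while mask:
        if mask & 1:
            cores.append(index)
        mask >>= 1
        index += 1
    return cores


def mask_for_cores(cores):
    """Build the affinity mask that selects exactly the given CPU indices."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


print(cores_in_mask(0x0000FFFF))           # logical CPUs 0 through 15
print(hex(mask_for_cores(range(16, 32))))  # mask selecting logical CPUs 16 through 31
```

The indices are logical CPU numbers as the kernel enumerates them, so which physical cores and sockets a given mask covers depends on the machine.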


    We want to reach 20 Gbps on one server, and we believe that solving the problems above would let us reach that goal. Thanks.
    Last edited by lidabnu@126.com; 07-09-2014 at 02:08 AM.

  2. #2

    If necessary, we can provide the GC log and Wowza's access logs.

  3. #3

    Hi,
    This is now being handled in a support ticket (96994).

    Jason

  4. #4

    Hi,

    From what you are describing, it may be the network adaptors that are causing the bottleneck. It looks like the outgoing streams may be only using one of the adaptors, even though the requests are coming in on both. The numbers you are quoting support this.

    5 Mbps video file:
    1600 x 5 Mbps = 8 Gbps
    1700 x 5 Mbps = 8.5 Gbps
    1900 x 5 Mbps = 9.5 Gbps

    1 Mbps video file:
    4500 x 1 Mbps = 4.5 Gbps

    This would normally be the case if you haven't taken steps to utilize both adaptors for outgoing traffic. The best way is to use bonding which will split the load evenly between both adaptors.

    There are different bonding modes. Some are for failover and some for load balancing. In this case, you need to use a load balancing mode.
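
The throughput figures quoted in this reply are simply connections multiplied by the video bitrate. A minimal sketch of that arithmetic, using the connection counts reported earlier in the thread (the function name `aggregate_gbps` is hypothetical, and protocol overhead is ignored):

```python
def aggregate_gbps(connections, bitrate_mbps):
    """Payload throughput in Gbps for a number of equal-bitrate streams."""
    return connections * bitrate_mbps / 1000


# (connections, bitrate in Mbps) pairs reported in the thread.
CASES = [(1600, 5), (1700, 5), (1900, 5), (4500, 1)]

for connections, bitrate in CASES:
    total = aggregate_gbps(connections, bitrate)
    print(f"{connections} x {bitrate} Mbps = {total} Gbps")
```

All three 5 Mbps results land just under the 10 Gbps capacity of a single link, which is the pattern this reply points to.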

    Roger.

  5. #5

    Thanks, Roger, but I am sure that the network adapters are not the bottleneck: I used the command "sar -n DEV " to monitor network traffic, and it shows that outgoing traffic is distributed across both network adapters.


    After starting this thread I ran another test case: with a 1 Mbps file and 16 CPU cores, Wowza can only support 2500 connections. That is strange.
    Last edited by lidabnu@126.com; 07-10-2014 at 10:17 PM.

  6. #6

    Hello, Roger, are you still tracking this issue? We don't have much time. Thanks.

    Our goal is 20 Gbps or more per server. We have a high-performance server with 32 CPU cores and we can add NICs, but we really need your help to overcome the current software limit. Thanks.

  7. #7

    Hi,

    If it isn't your network configuration which is causing the issues then I am not sure. We don't have any way of reliably generating that amount of traffic to be able to test properly.

    I'm not trying to shift blame, but can you guarantee that the issue isn't caused by the way you are generating the traffic, or by some other part of the network, rather than by the actual server you are testing?

    What I don't understand is why people insist on wanting to use a single huge machine to do their streaming when multiple smaller machines would be more cost effective and provide automatic redundancy. If your one 20 Gbps machine goes down, you lose 20 Gbps of traffic. If you have five 4 Gbps machines and one goes down, you only lose 4 Gbps of traffic.

    An i7 or E3 based server with 16 GB RAM and quad 1 Gb NICs bonded will reach its network limit very easily. A pool of these and associated switches would be a lot more cost effective than a single larger machine and 10 Gb switches.
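
The pool-versus-single-machine comparison above comes down to simple capacity arithmetic. The sketch below uses the figures from this reply (a 20 Gbps target, 4 Gbps per node from four bonded 1 Gb NICs) and the 5 Mbps test file from the thread; the helper names are hypothetical, and protocol overhead is ignored, so the stream count is a theoretical ceiling.

```python
import math


def nodes_needed(target_gbps, node_gbps):
    """Smallest number of nodes whose combined capacity meets the target."""
    return math.ceil(target_gbps / node_gbps)


def fraction_lost_on_failure(node_count):
    """Share of total capacity lost when one of several equal nodes fails."""
    return 1 / node_count


def streams_per_node(node_gbps, bitrate_mbps):
    """Theoretical stream count per node, ignoring protocol overhead."""
    return int(node_gbps * 1000 // bitrate_mbps)


pool = nodes_needed(20, 4)
print(f"nodes for 20 Gbps: {pool}")
print(f"capacity lost if one node fails: {fraction_lost_on_failure(pool):.0%}")
print(f"capacity lost if the single 20 Gbps machine fails: {fraction_lost_on_failure(1):.0%}")
print(f"5 Mbps streams per 4 Gbps node: {streams_per_node(4, 5)}")
```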

    Roger.

  8. #8

    Hi Roger, our competitor's streaming product can stream 30 Gbps per server, which is why we need at least 20 Gbps per server; in fact, we hope Wowza can support much more than 20 Gbps. With higher streaming density, the operator also needs less power and less space.

    From your reply, I guess that Wowza was designed for "small and cheap hardware, acceptable but not excellent performance". Is that right? Do you plan to provide a version of Wowza that supports higher streaming capacity (for example, 30 Gbps) in the near future? What is the maximum streaming capacity at present, and is there a suggested hardware configuration for reaching it?

    Thanks.
    Last edited by lidabnu@126.com; 07-14-2014 at 07:15 AM.

  9. #9

    I'm not sure whether you know that our company recently ordered some Wowza licenses. If Wowza can reach 20-30 Gbps of streaming capacity per server, I think our company will need many more licenses.
    Thanks.

  10. #10

    Hi,

    We do have customers that are using hardware similar to yours with single 10 Gb NICs, but I'm not aware of anyone using multiple 10 Gb NICs.

    That being said, it may be possible but you will have to test different configurations and find one that works. As I mentioned already, we don't currently have the resources to generate the amount of traffic to test this type of configuration.

    I would suggest the following. You may have already done some of this.

    Confirm that it isn't the player side causing the issues. If generating test type connections, make sure the testing servers aren't getting overloaded. Remember that real connections don't all come from a handful of locations. Try to replicate real world situations.

    Make sure your OS is tuned properly to work with the NICs. Most OSes aren't specifically tuned to utilise these. Monitor the actual traffic at the NIC level to make sure it isn't something there that is causing the issues.

    On the Wowza side, you may need to look at different garbage collectors. The alternative for the Oracle JVM is the G1 garbage collector. It does improve the pausing issues seen with the concurrent garbage collector. You may also need to look at a commercial JVM such as Azul Zing. If garbage collection pauses are the issue, then they say their JVM will work a lot better.

    Also look at the thread pool sizes. You should monitor the server thread pools with VisualVM to make sure there are not too many threads in monitor state. If so, increasing the thread pool and processor counts may help. The handler pools are used for internal processing and the transport and processor pools are used to handle the actual network side of the streaming.

    There is already a ticket open for this, so if you need any further assistance or want someone to look at test results, please use this ticket.

    Roger.
