Performance Testing -- Part One: Resources
Whilst this topic is not explicitly about the performance of Livelink, it is applicable to those who are tasked with hosting Livelink and wish to understand the rough capacity and performance characteristics of their service. I'm going to discuss, at a pretty high level, the testing we are performing on our new hardware and data centre deployment. That means the services being tested are based on Drupal and not Livelink, but the procedure is similar.
Before we jump in, however, it is worth mentioning the general difficulty involved in performance testing of web sites, along with the special considerations necessary for dynamic web sites. This is definitely going to be a multi-part post; just how many parts, I'm not yet sure. Let's call this particular one an introduction.
Resource Intensive
The biggest problem one faces when doing performance testing is finding enough appropriate resources: client resources, network resources, and server resources. You want to make sure that you have sufficient resources to perform the tests you are attempting; otherwise you may well find that you aren't testing the performance of your web site/application but the performance of your client and/or network.
Server resources are probably what you are testing, so we'll ignore them for this post except to say that some calculations must be made ahead of time to decide what is possible and what is not. There is little point in banging your proverbial head against the data wall. Computers can't go 101%, let alone 125% (now you be quiet, you overclockers, I wasn't talking to you).
Client resources are often the toughest ones to estimate; they so often do far less than one might expect :-) While rendering the content is only rarely involved during load testing, parsing the content to some degree almost always is, in order to provide the data needed to determine timings and other statistics. But more significant than the work involved in handling the content are the OS resources required to "mimic" hundreds or thousands of users.
The CPU of a client is usually maxed out by the testing process, with almost all the activity down in the kernel (system level) ... there is a lot of work to do. RAM, on the other hand, is not usually something you need much of in a client: there is nothing to do with the data once it arrives, not even write it to a file. If there is any parsing of data, it is at a very high level (essentially looking for links), so the memory footprint is almost always very modest.
The best rule of thumb I can provide is that a single client using a 2.5GHz single-core CPU is capable of generating about 10 transactions per second for a sustained period of time in ideal circumstances, provided the server is capable of keeping up with the load. (If the server can't keep up, the backlog may well adversely affect the client.)
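As a rough planning aid, that rule of thumb can be turned into a quick estimate of how many client machines a test needs. This is only a sketch: the 10 transactions/second figure is the rule of thumb above, while the function name and the example targets are made up for illustration.

```python
import math

# Rule of thumb from the text: about 10 transactions/second per
# single-core client in ideal circumstances.
TPS_PER_CLIENT = 10


def clients_needed(target_tps, tps_per_client=TPS_PER_CLIENT):
    """Estimate how many load-generating clients a target rate requires.

    target_tps is the total transactions/second the test should generate.
    The result is rounded up, since a fraction of a client is not an option.
    """
    if target_tps <= 0:
        raise ValueError("target_tps must be positive")
    return math.ceil(target_tps / tps_per_client)


if __name__ == "__main__":
    # Hypothetical targets, purely for illustration.
    for target in (50, 255, 1000):
        print(target, "tps ->", clients_needed(target), "clients")
```

Treat the result as a lower bound; real clients rarely see ideal circumstances.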
Now, you might think that a dual-core CPU could, therefore, do 20 transactions/second and a quad-core 40 transactions/second, but that is very unlikely to be the case in the real world. Consider: 40 x 50,000 bytes = 2,000,000 bytes/second, which is 16,000,000 bits/second, or about 16 Mbit/s of payload. That is well within what a single Gigabit NIC can carry, so raw bandwidth is not what stops one client from scaling. The limit is the per-connection work down in the kernel, and it takes a very well balanced motherboard, backplane, and operating system for that work to scale linearly with the number of cores!
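Bytes and bits are easy to mix up in this kind of estimate, so here is a minimal sketch of the conversion. It assumes the 50,000 bytes is the average payload per transaction and ignores protocol overhead (headers, handshakes, retransmissions); the function name is hypothetical.

```python
def payload_bandwidth(tps, bytes_per_transaction):
    """Return (bytes/second, megabits/second) of payload for a given rate.

    Uses decimal megabits (1 Mbit = 1,000,000 bits), as network links do.
    """
    bytes_per_second = tps * bytes_per_transaction
    megabits_per_second = bytes_per_second * 8 / 1_000_000
    return bytes_per_second, megabits_per_second


if __name__ == "__main__":
    bytes_s, mbit_s = payload_bandwidth(40, 50_000)
    print(f"{bytes_s:,} bytes/s = {mbit_s:g} Mbit/s")
    # prints: 2,000,000 bytes/s = 16 Mbit/s
```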
Network resources are so often the bottleneck. IP is a protocol that tries hard to ensure a service is available, and it will quickly and effectively sacrifice the performance of any particular "connection" for the overall health of the network. The Internet Protocol (IP) and the Transmission Control Protocol (TCP) layer built on top of it were designed during the Cold War, and they were designed to ensure that communication would continue between all points within a connected network even if/when certain components of that network are not available ... if it is physically possible to move the data between hosts, then IP provides a dynamic method of determining a path or "route". As load grows, the protocols will also scale back the working parameters of the connections on a network in order to accommodate attempts to create new connections, degrading performance to a point where only simple messages (like text-based email or today's "tweets") can use the network effectively at all.
TCP is actually more responsible for performance than the IP protocol it was built on: it was built for resilience, guaranteeing to both end-points that data will arrive intact and in order, but with virtually nothing to say about "in time". It has two problematic algorithms that often play a significant part when attempting performance testing. The "slow start" and "congestion avoidance" algorithms are designed to ensure that TCP connections play nice with the IP network so that they all have a "fair" chance at resources. Both affect the TCP window size: how much data a sender may have in flight before it must wait for an acknowledgement.
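To give a feel for why this matters to short-lived test connections, here is a deliberately idealized sketch of how the sender's window grows per round trip. It assumes no packet loss, an initial window of one segment, and a threshold of 16 segments; real TCP stacks use different initial values and react to loss, so treat the numbers as illustrative only.

```python
def cwnd_growth(rtts, ssthresh_segments=16, initial_cwnd=1):
    """Idealized sender window (in segments) at the start of each round trip.

    Below the threshold the window doubles every round trip (slow start);
    at or above it, the window grows by one segment per round trip
    (congestion avoidance). Loss and timeouts are not modelled.
    """
    cwnd = initial_cwnd
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd < ssthresh_segments:
            # Slow start: exponential growth, capped at the threshold.
            cwnd = min(cwnd * 2, ssthresh_segments)
        else:
            # Congestion avoidance: linear growth.
            cwnd += 1
    return history


if __name__ == "__main__":
    print(cwnd_growth(8))
    # prints: [1, 2, 4, 8, 16, 17, 18, 19]
```

A test that opens a fresh connection for every request spends most of its time in the first few entries of that list, which is one reason measured throughput can sit well below what the link could carry.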
This post has already gone too deep into technology, and this information is available in great detail all over the place. All I want to say is that if you are trying to test SERVER performance, you should ensure that your network never exceeds 70% utilization; beyond that point the network itself will degrade performance and skew your results.
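A quick check against that 70% ceiling might look like the sketch below. The 70% figure comes from the text; the function names, the link speeds, and the traffic figures in the example are assumptions, and only payload bytes are counted.

```python
MAX_UTILIZATION = 0.70  # ceiling suggested in the text


def link_utilization(bytes_per_second, link_mbit_per_second):
    """Fraction of a link's capacity consumed by the given payload rate."""
    bits_per_second = bytes_per_second * 8
    capacity_bits_per_second = link_mbit_per_second * 1_000_000
    return bits_per_second / capacity_bits_per_second


def within_budget(bytes_per_second, link_mbit_per_second,
                  ceiling=MAX_UTILIZATION):
    """True if the planned load stays at or under the utilization ceiling."""
    return link_utilization(bytes_per_second, link_mbit_per_second) <= ceiling


if __name__ == "__main__":
    # Hypothetical plan: 2,000,000 bytes/s of test traffic on a 100 Mbit/s link.
    load = 2_000_000
    print(f"utilization: {link_utilization(load, 100):.0%}")
    print("within budget:", within_budget(load, 100))
```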
What's next?
My next post will talk about the software used for performance testing. I'll look at how it is used and how it is abused. And then I'll follow that up with what we actually did to test the Village along with some thoughts as to how that could be applied to Livelink.


