How resilient is your cloud? Part 1


Here are some of the questions most often asked by cloud administrators:

  • How many resources can you create in a cloud managed by Citrix CloudPlatform?
  • How far can I stretch my cloud and still provide an acceptable quality of service to my clients?
  • How many virtual machines can I deploy and manage on a given set of hosts?
  • How many accounts can be managed?
  • How many zones can I have?
  • How many VPCs can CloudPlatform handle?
  • What is the response time to list, say, 10,000 virtual machines distributed over 500 hosts?

And so on. You get the idea.

This blog series will address some of these questions over the course of several installments. The idea is to provide guidance and information to cloud administrators so that CloudPlatform can orchestrate resources efficiently while still serving incoming API requests with acceptable response times.

The first part focuses on the performance of the most common and basic use cases in a scaled environment. Examples include virtual machine deployment time and the response times of important and frequently used list API queries in an environment scaled up to 2,000 hosts.

Configuration

For a highly scaled setup with about 2,000 hosts, it is obviously impractical to provision such an infrastructure physically. Therefore, the most sensible way to test a cloud of this magnitude for performance is to use the simulator built into CloudPlatform. The simulator can be used to mock resources, including hosts, storage pools, virtual machines, etc., and it behaves the same as the actual resources in most cases.

As far as the CloudPlatform management server is concerned, there is no major difference between an actual resource and a simulated resource. For most tests, which are hypervisor independent, this serves our goal.

The configuration considered here is a scaled environment with approximately 2,000 simulator hosts and more than 4,000 accounts. Redundant virtual routers (RVR) are enabled, which means two routers per network.
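As an illustration of how mocked hosts end up in such a setup: in Apache CloudStack, on which CloudPlatform is based, simulated hosts are registered through the regular API using the Simulator hypervisor type and an http://sim/... style URL. The sketch below is only a rough Python illustration of that idea, not the exact tooling used for this test; it assumes the unauthenticated integration API port (commonly 8096 in developer builds) is enabled, and the zone/pod/cluster IDs are placeholders.

```python
# Rough sketch: registering simulated hosts with the management server.
# Assumes the unauthenticated integration API port (e.g. 8096) is open and
# that a zone, pod and cluster for the Simulator hypervisor already exist.
import requests

API = "http://localhost:8096/client/api"   # integration port, no request signing
ZONE_ID, POD_ID, CLUSTER_ID = "zone-uuid", "pod-uuid", "cluster-uuid"  # placeholders

def add_simulator_host(index):
    """Register one mocked host; the http://sim/... URL marks it as simulated."""
    params = {
        "command": "addHost",
        "response": "json",
        "zoneid": ZONE_ID,
        "podid": POD_ID,
        "clusterid": CLUSTER_ID,
        "hypervisor": "Simulator",
        "url": f"http://sim/c0/h{index}",
        "username": "root",       # credentials are not actually checked for simulated hosts
        "password": "password",
    }
    r = requests.get(API, params=params, timeout=60)
    r.raise_for_status()
    return r.json()

for i in range(2000):             # scale this count to the size you want to test
    add_simulator_host(i)
```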

Use Case 1

Given the above configuration, how much time does it take to deploy a virtual machine?

I deployed a total of 12,000 simulator virtual machines and measured the time taken for each deployment, from the first VM to the last. This test was conducted on CloudPlatform version 4.3.0; the management server configuration used is covered in the tuning notes at the end of this post.

Metric: Time to deploy a virtual machine

This is the time taken for the asynchronous deployVirtualMachine job to finish and bring the virtual machine to the Running state.

Here is a chart showing the trend in the time taken, in seconds, to deploy the virtual machines, from the first to the 12,000th.

[Chart: deployvm_time]

As can be seen from the chart above, the management server takes about 5-10 seconds to select a deployment target and deploy the VM. The peaks in the early part of the graph account for the time taken to deploy the virtual routers created in each network (two routers per network, since RVR is enabled).

The other evident observation is that beyond 10K VMs the deployment time is higher compared to the first few VMs. This is to be expected: most of the hosts are already nearly full of virtual machines, so the management server spends more time finding available hosts than in the baseline established earlier.

Metric: deployVirtualMachine API response time

I also measured the response time of the deployVirtualMachine API. This differs from the async job completion time in that the API response time is essentially the time taken by the management server to do the initial processing of the API call and respond with a job ID.
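To make the distinction between the two timings concrete, here is a minimal Python sketch (mine, not the harness actually used for this test) that records both the API response time and the async job completion time for one deployment. It assumes the unauthenticated integration API port; the service offering, template, and zone IDs are placeholders.

```python
# Minimal sketch of the two metrics discussed above: API response time (until a
# job ID comes back) versus async job time (until the deploy job completes).
import time
import requests

API = "http://localhost:8096/client/api"   # assumed integration port, no signing

def call(command, **params):
    params.update(command=command, response="json")
    r = requests.get(API, params=params, timeout=300)
    r.raise_for_status()
    return r.json()

def timed_deploy(serviceofferingid, templateid, zoneid):
    # API response time: submit the command and stop the clock at the job ID.
    t0 = time.time()
    resp = call("deployVirtualMachine",
                serviceofferingid=serviceofferingid,
                templateid=templateid,
                zoneid=zoneid)
    job_id = resp["deployvirtualmachineresponse"]["jobid"]
    api_response_time = time.time() - t0

    # Async job time: poll queryAsyncJobResult until the job leaves the
    # pending state (jobstatus 0 = pending, 1 = succeeded, 2 = failed).
    while True:
        status = call("queryAsyncJobResult", jobid=job_id)["queryasyncjobresultresponse"]
        if status["jobstatus"] != 0:
            break
        time.sleep(2)
    job_completion_time = time.time() - t0
    return api_response_time, job_completion_time

# Example usage with hypothetical IDs:
# print(timed_deploy("offering-uuid", "template-uuid", "zone-uuid"))
```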

Here is a chart showing the response time, in seconds, of the deployVirtualMachine API from the first VM to the 12,000th.

[Chart: responsetime]

As is evident from the results, the majority of the API calls take between 0 and 1 second to process the request, with a few of them taking up to about 5 seconds.

Use Case 2

Another important use case in a scaled environment is the time taken by the various list APIs to return a response. This also directly affects UI performance, since these are the most common APIs triggered when a user browses the user interface.

A number of important list APIs were considered for this test, and here are the results.

The graph below shows the response time, in seconds, for calls to the various list APIs.

[Chart: listapi_responsetime]

The data above depends on the page size specified in the list API query. For example, there were 12K VMs, 4K accounts, 8K routers, 20K events, 2K hosts, 4K users, and 12K volumes in the test setup. The list calls were made with no limit on the page size, so that all objects were fetched.
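A minimal sketch of this kind of measurement, again assuming the unauthenticated integration API port and a root admin view, might look as follows; the set of list commands mirrors the object counts mentioned above.

```python
# Minimal sketch: time each list API call issued with no page/pagesize,
# so that all objects are returned, and record the wall-clock response time.
import time
import requests

API = "http://localhost:8096/client/api"   # assumed integration port

LIST_COMMANDS = [
    "listVirtualMachines",
    "listAccounts",
    "listRouters",
    "listEvents",
    "listHosts",
    "listUsers",
    "listVolumes",
]

def time_list_api(command):
    t0 = time.time()
    r = requests.get(API, params={"command": command, "response": "json"}, timeout=600)
    r.raise_for_status()
    return time.time() - t0

for cmd in LIST_COMMANDS:
    print(f"{cmd}: {time_list_api(cmd):.2f} s")
```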

To set up a cloud of this magnitude with CloudPlatform, it is important to note that only a few configuration parameters need to be tuned so that CloudPlatform can orchestrate the cloud effectively.

A few of the tuning parameters are mentioned below; a small sizing sketch based on them follows the list.

  • For example, the test setup had 3 management servers; each management server was a machine with 16 GB of RAM and 4 dual-core processors. It is recommended to add a management server for roughly every 6-7K VMs, taking load balancing across the management servers into account as well.
  • The cloud database was hosted on a remote server with 32 GB of RAM and 8 processors.
  • The database buffer pool size (innodb_buffer_pool_size) was set to 80% of the RAM.
  • Also note that with this many hosts and VMs for the management server to orchestrate, the Java heap size needs to be set to at least 8 GB.
  • The cloud.maxActive connections setting in db.properties was set to 1000.
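For convenience, the rules of thumb above can be folded into a small back-of-the-envelope sketch; the helper names and the 6,500-VM midpoint are mine, not from the test itself.

```python
# Back-of-the-envelope helpers for the rules of thumb listed above:
# one management server per ~6-7K VMs, DB buffer pool at ~80% of DB-server RAM,
# Java heap of at least 8 GB, cloud.maxActive of 1000 in db.properties.
import math

def recommended_management_servers(expected_vms, vms_per_server=6500):
    return max(1, math.ceil(expected_vms / vms_per_server))

def recommended_innodb_buffer_pool_gb(db_server_ram_gb, fraction=0.8):
    return db_server_ram_gb * fraction

MIN_JAVA_HEAP_GB = 8               # per the list above, for a setup of this size
DB_MAX_ACTIVE_CONNECTIONS = 1000   # cloud.maxActive in db.properties

if __name__ == "__main__":
    print(recommended_management_servers(12000))   # the test setup used 3 for ~12K VMs
    print(recommended_innodb_buffer_pool_gb(32))   # 25.6 GB for a 32 GB DB server
```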

This brings us to the end of the first part of this blog series. Look out for the next parts, which will have more metrics on the latest CloudPlatform performance, tuning tips, and notes to make your cloud perform better!
