Performance best practices

vSphere Performance Best Practices

Rob Moran

Premier Services Engineer – VMware Global Support Services – Cork, Ireland

Global Support Services and Customer Advocacy

Global coverage: 24x7, 365 days/year

• 6 Support Centers: Burlington, Canada; Palo Alto, CA; Cork, Ireland; Broomfield, CO; Tokyo, Japan; Bangalore, India
• 1000+ Support Engineers
• Local language support: Spanish, Portuguese, French, German, Japanese, Chinese
• Follow-the-sun support for Severity 1 issues
• Support relationships with 100% of the Fortune 100; 99% of Fortune 500

Customer Support Day Events

Coming to a location near you: sharing of VMware best practices!

Support Days are a collaboration between VMware Support, Sales and customers – you learn directly from the experts

Topics are driven by customer input, and typically include:

• Best practices
• Tips/tricks
• Top issues
• Product roadmaps/demos
• Certification offerings

http://www.vmware.com/go/supportdays

Overview

What a performance problem sounds like:

• “My VM is running slow and I don’t know what to do!”
• “I tried adding more memory and CPUs but the problem got worse!”
• “My VM is slow on one host but fast on another!”

What to look for? Where to start?

We will explore some of the most common performance-related issues that our support centers receive cases for

A word about performance….



Troubleshooting methodology must define:

• How to find root cause
• How to fix the problem

Must answer these questions:

1. How do we know when we are done?
2. Where do we start looking for problems?
3. How do we know what to look for to identify a problem?
4. How do we find the root cause of a problem we have identified?
5. What do we change to fix the root cause?
6. Where do we look next if no problem is found?

Agenda



Benchmarking & Tools



Best Practices and Troubleshooting



The 4 “food groups”

• Memory
• CPU
• Storage
• Network

BENCHMARKING & TOOLS

Benchmarking



Consistent and reproducible results



Important to have base level of acceptable performance

• Expectation vs. Acceptable 

Determine baseline of performance prior to deployment

• Benchmark on a physical system if applicable 

Avoid subjective metrics, stay quantitative

• “The system seems slower”
• “This worked better last year”

Benchmarking



Benchmarking should be done at the application layer

• Use application-specific benchmarking tools and load generators
• Check with the application vendor

Isolate variables, benchmark optimum situation before introducing load



Understand dependencies

• Human interaction
• Other “food groups”
• Compare apples-to-apples
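The advice to stay quantitative and compare against a baseline can be sketched in Python. This is only an illustration, not a VMware tool: the function name `compare_to_baseline`, the sample latencies, and the 10% tolerance are all assumptions made for the example.

```python
from statistics import mean, stdev

def compare_to_baseline(baseline, current, tolerance=0.10):
    """Compare repeated benchmark runs against a recorded baseline.

    baseline, current: lists of the same application-level metric
    (here: latency in ms, so lower is better).
    tolerance: hypothetical acceptable relative slowdown (10%).
    """
    base_avg = mean(baseline)
    cur_avg = mean(current)
    change = (cur_avg - base_avg) / base_avg
    return {
        "baseline_mean": base_avg,
        "current_mean": cur_avg,
        "baseline_stdev": stdev(baseline),  # run-to-run noise in the baseline
        "relative_change": change,
        "within_tolerance": change <= tolerance,
    }

# Made-up sample data: five runs before and five runs after a change.
result = compare_to_baseline([100, 102, 98, 101, 99],
                             [120, 118, 122, 121, 119])
print(result)
```

Several runs per measurement make the result reproducible; a single "it seems slower" observation cannot be compared against anything.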

Tools – vCenter Operations


Aggregates thousands of metrics into Workload, Capacity, Health scores



Self learns “normal” conditions using patented analytics



Smart alerts of impending performance and capacity degradation



Identifies potential performance problems before they start

Tools – esxtop



Valuable tool built in to vSphere hosts



View or capture real-time data

• View or play back data later
• Import data into third-party tools

vSphere Client performance graphs get their data from the kernel and VSI

• Presentation/unit may be different (e.g. %RDY)
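Captured data can be post-processed with a short script. esxtop has a batch mode (for example `esxtop -b -d 5 -n 12 > capture.csv`) that writes CSV. The sketch below averages every column whose header contains a given fragment. The header names in `SAMPLE` are simplified assumptions and the values are made up, so check the headers in a real capture before relying on the fragment.

```python
import csv
import io

def column_average(csv_text, name_fragment):
    """Average every CSV column whose header contains name_fragment."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name_fragment in name]
    values = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            values[header[i]].append(float(row[i]))
    return {name: sum(v) / len(v) for name, v in values.items() if v}

# Simplified, made-up capture: one timestamp column and two metric columns.
SAMPLE = (
    '"Time","vm-a % Ready","vm-b % Ready"\n'
    '"10:00:00","2.0","12.0"\n'
    '"10:00:05","4.0","8.0"\n'
)

averages = column_average(SAMPLE, "% Ready")
print(averages)
```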

MEMORY

Memory – Overhead



A VM’s RAM is not necessarily machine RAM

• vRAM + overhead = maximum machine RAM
• Note: These are estimated values

Source: vSphere 5.1 Resource Management Guide
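The relationship "vRAM + overhead = maximum machine RAM" is simple arithmetic, sketched here. The overhead figures are placeholders chosen for the example and are not taken from the Resource Management Guide; look up the estimated overhead for your vRAM size and vCPU count.

```python
def max_machine_ram_mb(vram_mb, overhead_mb):
    """Upper bound on machine RAM one VM can consume: vRAM plus overhead."""
    return vram_mb + overhead_mb

# Hypothetical VMs: (configured vRAM in MB, placeholder overhead in MB).
vms = {
    "web01": (4096, 50),
    "db01": (16384, 150),
}

worst_case = {name: max_machine_ram_mb(vram, ovh)
              for name, (vram, ovh) in vms.items()}
print(worst_case)                # per-VM upper bound
print(sum(worst_case.values()))  # upper bound for the whole set
```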

Memory – Transparent Page Sharing

Memory – Host Memory Management

Occurs when memory is under contention



Ballooning



Compression



Swapping

Memory – Ballooning

Memory – Compression

Memory – Swapping

Memory – VM Resource Allocation

Memory – Resource Pool Allocation

Memory – Ballooning vs. Swapping



Ballooning is better than swapping



Guest can surrender unused/free pages



Guest chooses what to swap, can avoid swapping “hot” pages

Memory – Rightsizing



Generally it is better to OVER-commit than UNDER-commit



If the running VMs are consuming too much host/pool memory…

• Some VMs may not get physical memory
• Ballooning or host swapping
• Higher disk IO
• All VMs slow down

Memory – Rightsizing



If a VM has too little vRAM …

• Applications suffer from lack of RAM
• The guest OS swaps
• Increased disk traffic, thrashing
• SAN slows down as a result of increased disk traffic

If a VM has too much vRAM…

• Higher overhead memory
• Possible decreased failover capacity
• Longer vMotion time
• Larger VSWP file
• Wasted resources

Memory – Troubleshooting



Wrong resource allocation

• May not notice a limit, e.g. VM or template with a limit gets cloned
• Custom share values

Ballooning or swapping at the host level

• Ballooning is a warning sign, not a problem
• Swapping is a performance issue if seen over an extended period

Swapping/paging at the guest level

• Under-provisioned guest memory 

Missing balloon driver (Tools)

Memory – Best Practices



Avoid high active host memory over-commitment

• No host swapping occurs when total memory demand is less than the physical memory (Assuming no limits) 

Right-size guest memory

• Avoid guest OS swapping 

Ensure there is enough vRAM to cover demand peaks



Use a fully automated DRS cluster

• Use Resource Pools with High/Normal/Low shares
• Avoid using custom shares
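The rule that no host swapping occurs while total memory demand stays below physical memory (assuming no limits) can be checked with a few lines. This is a rough sketch: the function name and the demand figures are assumptions, and it ignores overhead memory and page sharing.

```python
def host_memory_check(physical_mb, vm_demand_mb):
    """Compare summed active memory demand with host physical memory."""
    total = sum(vm_demand_mb.values())
    return {
        "total_demand_mb": total,
        "headroom_mb": physical_mb - total,
        # Host swapping becomes possible once demand reaches physical RAM.
        "swap_possible": total >= physical_mb,
    }

# Made-up active demand per VM on a host with 64 GB of RAM.
demand = {"web01": 3000, "db01": 14000, "app01": 6000}
check = host_memory_check(65536, demand)
print(check)
```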

CPU

CPU – Overview



Raw processing power of a given host or VM

• Hosts provide CPU resources
• VMs and Resource Pools consume CPU resources

CPU cores/threads need to be shared between VMs



Fair scheduling of vCPU time

• Hardware interrupts for a VM
• Parallel processing for SMP VMs
• I/O

CPU – esxtop



Interpret the esxtop columns correctly



%RDY – The percentage of time a VM is ready to run but no physical processor is available to run it, which may result in decreased performance

%USED – Physical CPU usage

%SYS – Percentage of time spent in the VMkernel

%RUN – Percentage of total scheduled time to run

%WAIT – Percentage of time in blocked or busy wait states

%IDLE – Percentage of time idle; %WAIT minus %IDLE can be used to estimate I/O wait time
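The I/O wait estimate from these two columns is a single subtraction, shown here as a sketch. The function name and the sample values are assumptions; the inputs would be read from the esxtop CPU view for one VM.

```python
def estimate_io_wait_pct(wait_pct, idle_pct):
    """Estimate I/O wait as %WAIT minus %IDLE, floored at zero."""
    return max(0.0, wait_pct - idle_pct)

# Made-up reading: the VM waits 85% of the time, 70% of which is plain idle.
io_wait = estimate_io_wait_pct(85.0, 70.0)
print(io_wait)
```

%WAIT includes idle time, so a high %WAIT alone says little; the difference is what points toward blocked I/O.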

CPU – Performance Overhead & Utilization



Different workloads have different overhead costs (%SYS) even for the same utilization (%USED)



CPU virtualization adds varying amounts of system overhead

• Direct execution vs. privileged execution
• Paravirtual adapters vs. emulated adapters
• Virtual hardware (Interrupts!)
• Network and storage I/O

CPU – vSMP



Relaxed Co-Scheduling: vCPUs can run out-of-sync



Idle vCPUs incur a scheduling penalty

• Configure only as many vCPUs as needed
• Unused vCPUs impose unnecessary scheduling constraints

Use Uniprocessor VMs for single-threaded applications

CPU – Scheduling

Over committing physical CPUs

VMkernel CPU Scheduler

CPU – Ready Time



The percentage of time that a vCPU is ready to execute, but waiting for physical CPU time



Does not necessarily indicate a problem

• Indicates possible CPU contention or limits
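The vSphere Client reports CPU ready as a summation in milliseconds per sample interval, while esxtop shows a percentage, so the two need converting before they can be compared. The sketch below assumes the 20-second interval of the real-time chart; other chart intervals need a different `interval_s`. The function name and sample values are illustrative.

```python
def ready_percent(ready_ms, interval_s=20):
    """Convert a CPU ready summation (ms) into a percentage of the interval."""
    return ready_ms * 100.0 / (interval_s * 1000)

# Made-up sample: 1000 ms of ready time within one 20 s real-time sample.
pct = ready_percent(1000)
print(pct)
```

For a multi-vCPU VM the summation covers all vCPUs, so dividing by the vCPU count gives a per-vCPU figure.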

CPU – NUMA nodes



Non-Uniform Memory Access system architecture



Each node consists of CPU cores and memory



A CPU core in one NUMA node can access memory in another node, but at a small performance cost

[Figure: NUMA node 1 and NUMA node 2]

CPU – Troubleshooting



vCPU to pCPU over allocation

• HyperThreading does not double CPU capacity!



Limits or too many reservations

• Can create artificial limits.



Expecting the same consolidation ratios with different workloads

• Virtualizing “easy” systems first, then expanding to heavier systems

Compare Apples to Apples

• Frequency, turbo, cache sizes, cache sharing, core count, instruction set…
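vCPU to pCPU over-allocation is easy to quantify. The sketch below computes the ratio against physical cores and, for contrast, against logical threads, to show how counting HyperThreading threads as full cores makes the ratio look half as large. All names and VM sizes are made up, and no particular ratio is claimed to be safe.

```python
def vcpu_ratio(vcpus_per_vm, cpu_count):
    """Total configured vCPUs divided by the given CPU count."""
    return sum(vcpus_per_vm) / cpu_count

vcpus = [4, 4, 2, 2, 8, 4]        # hypothetical VMs on one host
physical_cores = 12
logical_threads = physical_cores * 2  # with HyperThreading enabled

per_core = vcpu_ratio(vcpus, physical_cores)
per_thread = vcpu_ratio(vcpus, logical_threads)
print(per_core, per_thread)
```

Because HyperThreading does not double capacity, the per-core figure is the more conservative one to track.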

CPU – Best Practices



Right-size vSMP VMs



Keep heavy-hitters separated

• Fully automated DRS should do this for you
• Use anti-affinity rules if necessary

Use a fully automated DRS cluster

• Test that vMotion works
• Use Resource Pools with High/Normal/Low shares
• Avoid using custom shares

STORAGE

Storage – esxtop Counters



Different esxtop storage views

• Adapter (d)
• VM (v)
• Disk Device (u)

Key Fields:

• DAVG + KAVG = GAVG
• QUED/USD – Command Queue Depth
• CMDS/s – Commands Per Second
• MBREAD/s
• MBWRTN/s

Storage – Troubleshooting with esxtop



High DAVG: issue beyond the adapter

• Bad/overloaded zoning, over-utilized storage processors, too few platters in the RAID set, etc.



High KAVG: issue in the kernel storage stack

• Driver issue
• Full queue

Aborts: GAVG exceeding 5000 ms

• Command will be repeated, storage delay for the VM
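The triage logic above (GAVG = DAVG + KAVG, high DAVG points beyond the adapter, high KAVG points at the kernel storage stack) can be written down as a small function. Only the 5000 ms abort figure comes from the slide; the "high" thresholds of 25 ms and 2 ms are hypothetical defaults for the example and should be replaced with values agreed with your array vendor.

```python
ABORT_MS = 5000  # from the slide: GAVG above this leads to aborts

def diagnose(davg_ms, kavg_ms, davg_high_ms=25.0, kavg_high_ms=2.0):
    """Return (GAVG, findings) for one device's latency counters.

    davg_high_ms and kavg_high_ms are hypothetical thresholds.
    """
    gavg_ms = davg_ms + kavg_ms
    findings = []
    if gavg_ms > ABORT_MS:
        findings.append("abort: GAVG exceeds 5000 ms")
    if davg_ms > davg_high_ms:
        findings.append("high DAVG: look beyond the adapter")
    if kavg_ms > kavg_high_ms:
        findings.append("high KAVG: look at the kernel storage stack")
    return gavg_ms, findings

# Made-up reading: slow device, healthy kernel.
gavg, findings = diagnose(30.0, 0.5)
print(gavg, findings)
```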

Storage – Benchmarking with iometer

Storage – Storage I/O Control



Allows the use of Shares per VMDK



Throttling occurs when datastore reaches latency threshold

• Higher share VMDKs perform IO first 

vCenter monitors latency across all hosts

• Not effective if datastore shared with other vCenters
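One way to picture shares per VMDK is a proportional split of the available I/O slots once the latency threshold is reached. The sketch below is a simplified proportional-share model, not the actual Storage I/O Control algorithm; the slot count, the VMDK names, and the share values are assumptions.

```python
def split_by_shares(total_slots, shares):
    """Divide I/O slots among VMDKs in proportion to their shares."""
    total_shares = sum(shares.values())
    return {vmdk: total_slots * s / total_shares
            for vmdk, s in shares.items()}

# Hypothetical datastore under contention with 64 slots to hand out.
allocation = split_by_shares(64, {
    "db.vmdk": 2000,   # assumed "high" share value
    "app.vmdk": 1000,
    "os.vmdk": 1000,
})
print(allocation)
```

The point of the model is that shares only matter under contention: below the latency threshold no throttling occurs and every VMDK can issue I/O freely.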

Storage – Storage DRS



Datastore clusters

• Maintenance mode
• Anti-affinity rules

vCenter monitors for latency and disk space

• Migrate VMDKs for better performance or utilization 

Not effective with automated tiering SANs

• Check HCL to confirm these features are compatible

Storage – Troubleshooting



Snapshots



Excessive traffic down one HBA / Switch / SP can cause latency

• Consider using Round Robin in conjunction with ALUA
• Always be paranoid when it comes to monitoring storage I/O

Consider your I/O patterns

• Peak time for storage IO?

• Virus scans, database maintenance, user logins 

Always consult with array vendor

• They know the best practices for their array!

Storage – Best Practices



Use different tiers of storage for different VM workloads

• Slower storage for OS VMDKs
• Faster storage for databases or other high-IO applications

Use the Paravirtual SCSI adapter

• Reduced overhead, higher throughput 

Use path balancing where possible, either through third-party plugins or Round Robin and ALUA, if supported.



Use Storage DRS with SIOC

• Balance for both free space and latency
• Simplified datastore management

NETWORK

Network – Load Balancing



Load balancing defines which uplink is used

• Route based on Port ID
• Route based on IP hash
• Route based on MAC hash
• Route based on NIC load (Load Based Teaming)

Probability of high-bandwidth VMs being on the same physical NIC



Traffic will stay on elected uplink until an event occurs

• NIC link state change, adding/removing NIC from a team, beacon probe timeout…
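Why two busy VMs can end up on the same physical NIC is easier to see with a toy model of uplink selection. The sketch below is a simplified illustration and not the vSwitch implementation: port-ID routing is modeled as the port number modulo the uplink count, and IP-hash routing as the XOR of source and destination address modulo the uplink count. The uplink names, port numbers, and addresses are made up.

```python
import ipaddress

def uplink_by_port_id(port_id, uplinks):
    """Toy model: the virtual port number picks one fixed uplink."""
    return uplinks[port_id % len(uplinks)]

def uplink_by_ip_hash(src_ip, dst_ip, uplinks):
    """Toy model: XOR of source and destination IPv4 picks the uplink."""
    x = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return uplinks[x % len(uplinks)]

uplinks = ["vmnic0", "vmnic1"]

# Port ID: ports 5 and 7 both land on vmnic1, regardless of their load.
print(uplink_by_port_id(5, uplinks), uplink_by_port_id(7, uplinks))

# IP hash: the same VM can use different uplinks per destination.
print(uplink_by_ip_hash("10.0.0.5", "10.0.0.8", uplinks))
print(uplink_by_ip_hash("10.0.0.5", "10.0.0.9", uplinks))
```

In this model neither policy looks at actual traffic volume, which is the gap that routing based on NIC load is meant to close.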

Network – Troubleshooting



Check counters for NICs and VMs

• Network load imbalance
• 10 Gbps NICs can incur a significant CPU load when running at 100%

Ensure hardware supports TSO

• Use latest drivers and firmware for your NIC on the host 

For multi-tier VM applications, use DRS affinity rules to keep VMs on same host

• Same vSwitch / VLAN, rules out physical network 

If using Jumbo Frames, ensure it is enabled end-to-end
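An end-to-end jumbo frame check is usually done with a non-fragmenting ping sized to fill the MTU, for example `vmkping -d -s 8972 <target>` on the host. The payload size follows from the MTU minus the IPv4 header (20 bytes, no options) and the ICMP header (8 bytes). The sketch below does that arithmetic and checks a list of hop MTUs; the hop values are made up for illustration.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8

def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES

def path_supports(mtu, hop_mtus):
    """Every hop must carry the MTU for it to be enabled end-to-end."""
    return all(hop >= mtu for hop in hop_mtus)

print(max_ping_payload(9000))  # payload size for a 9000-byte MTU
print(max_ping_payload(1500))  # payload size for the standard MTU

# Hypothetical path: vmkernel port, vSwitch, physical switch, array port.
print(path_supports(9000, [9000, 9000, 1500, 9000]))
```

A single hop left at 1500 is enough to break the path, which is why the setting has to be verified on every device.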

Network – Best Practices



Use the vmxnet3 virtual adapter

• Less CPU overhead
• 10 Gbps connection to vSwitch

Use the latest driver/firmware for the NICs on the host



Use network shares

• Requires Virtual Distributed Switch 4.1



Isolate vMotion and iSCSI traffic from regular VM traffic

• Separate vSwitches with dedicated NIC(s)
• Most applicable with Gigabit NICs

How to measure the network?



scp from/to an ESXi host is not a valid check!

scp involves the underlying storage on the source and destination VM/host

CPU can affect the test, because scp encrypts/decrypts the network flow

Copying to an ESXi host can give a false result, as the management interface has very limited resources

How to check network performance?



VM – VM on the same ESXi host. This excludes physical network problems

VM – VM on different ESXi hosts. This involves the physical NICs and switches as well

Physical – VM. This also tests physical devices, but we can focus on one VM

Physical – Physical. This gives us a baseline figure for what to expect

Use iperf/jperf/netperf, free tools for network testing

Iperf



Windows and Linux versions

Does not use storage

Different test options are available (UDP/TCP)

Automatically calculates bandwidth
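The principle behind such a tool, a memory-to-memory transfer with no disk and no encryption, can be shown with a small Python sketch. This is not iperf and is no substitute for it: it sends zero bytes over a TCP connection on the loopback interface and derives the bandwidth from bytes and elapsed time. The function names and the 8 MiB transfer size are assumptions for the example.

```python
import socket
import threading
import time

def _sink(server_sock, result):
    """Accept one connection and count the bytes received."""
    conn, _ = server_sock.accept()
    received = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # sender closed the connection
                break
            received += len(chunk)
    result["bytes"] = received

def measure_loopback_mbps(total_bytes=8 * 1024 * 1024):
    """Send total_bytes over TCP on 127.0.0.1; return (Mbit/s, bytes)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = b"\0" * 65536  # data comes from memory, never from disk
    start = time.perf_counter()
    with socket.create_connection(server.getsockname()) as client:
        sent = 0
        while sent < total_bytes:
            client.sendall(payload)
            sent += len(payload)
    receiver.join()
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"] * 8 / elapsed / 1e6, result["bytes"]

mbps, received = measure_loopback_mbps()
print(f"{received} bytes received, {mbps:.0f} Mbit/s")
```

Pointing the client at another machine instead of the loopback address would bring NICs and switches into the measurement, which is the comparison the test matrix above is built on.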

In conclusion…

Key Takeaways – Performance Best Practices



Understand your environment

• Hardware, storage, networking
• VMs & applications

Advanced configuration values do not need to be tweaked or modified

• In almost all situations 

Use fully automated DRS



Use Paravirtual hardware

Important Links
