How to build a rocket accelerator for PowerCLI scripts 

Sooner or later, every VMware system administrator gets around to automating routine tasks. It starts with the command line, then comes PowerShell or VMware PowerCLI.

Suppose you have mastered PowerShell a little beyond launching ISE and using standard cmdlets from modules that work by some kind of magic. Once you start counting virtual machines in the hundreds, you will find that scripts that served well at small scale run noticeably slower at large scale.

In this situation, two tools will help:

  • PowerShell Runspaces: an approach that lets you run tasks in parallel in separate threads;
  • Get-View: a basic PowerCLI cmdlet, analogous to Get-WMIObject on Windows. It does not pull in related objects; it returns a simple object with simple data types. In many cases this is faster.

Next, I will briefly describe each tool and show examples of use. We will analyze specific scripts and see when one works better and when the other does. Let's go!



First Stage: Runspace


A Runspace is designed for processing tasks in parallel outside the main script. Of course, you could also start another process, but it will eat memory, CPU and so on. If your script finishes in a couple of minutes and uses a gigabyte of memory, you probably do not need Runspaces. For scripts that touch tens of thousands of objects, you do.
A good place to start learning:
Beginning Use of PowerShell Runspaces: Part 1

What Runspaces give you:

  • speed, by limiting the list of executable commands;
  • parallel tasks;
  • safety.
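
To make the "parallel tasks" point concrete, here is a minimal sketch of a runspace pool in plain PowerShell, without PowerCLI. The `$work` script block is only a stand-in for a slow call, and the pool size of 4 is an arbitrary assumption.

```powershell
# Minimal runspace pool sketch: run a placeholder task for each input in parallel.
$pool = [runspacefactory]::CreateRunspacePool(1, 4)   # at least 1, at most 4 runspaces
$pool.Open()

# Stand-in for a slow operation (hypothetical workload).
$work = { param($n) Start-Sleep -Milliseconds 200; $n * 2 }

# Start one asynchronous PowerShell instance per input value.
$jobs = foreach ($n in 1..8) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    $null = $ps.AddScript($work).AddArgument($n)
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke blocks until the task has finished and returns its output.
$results = foreach ($job in $jobs) {
    $job.Shell.EndInvoke($job.Handle)
    $job.Shell.Dispose()
}

$pool.Close()
$pool.Dispose()
$results
```

With four runspaces the eight 200 ms tasks run in roughly two batches instead of eight sequential waits, which is the whole point of the approach.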

Here is an example from the internet of a case where Runspaces help:
[The quoted passage was lost in extraction. The surviving fragments mention vSphere, VMware vCenter, ESXi and PowerShell runspaces.]

Source: How to Show Virtual Machine I/O on an ESXi Dashboard

In the case below, Runspaces no longer help:
“I'm trying to write a script that collects a lot of data from the VM and, if necessary, writes new data. The problem is that there are a lot of VMs, and it takes 5-8 seconds for one machine.”

Source: Multithreading PowerCLI with RunspacePool

This is where Get-View is needed, so let's move on to it.

Second stage: Get-View


To understand the usefulness of Get-View, remember how cmdlets work in general. 

Cmdlets exist so that you can get information conveniently, without studying API references and reinventing the wheel. What used to take a hundred or two lines of code, PowerShell lets you do with a single command. We pay for this convenience with speed. There is no magic inside the cmdlets themselves: it is the same kind of script, only lower-level, written by someone else's skilled hands.

Now, for comparison with Get-View, take the Get-VM cmdlet: it queries the virtual machine and returns a composite object, that is, it attaches other related objects to it: VMHost, Datastore and so on.

Get-View, by contrast, does not attach anything extra to the returned object. Moreover, it lets you specify exactly which information you need, which makes the output object lighter. In Windows Server in general, and in Hyper-V in particular, the Get-WMIObject cmdlet is a direct analogue: the idea is exactly the same.

Get-View is inconvenient for routine operations on individual objects. But when it comes to thousands and tens of thousands of objects, it is priceless.
Read more on the VMware Blog: Introduction to Get-View
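
As a rough illustration of "specify exactly which information you need", here is a sketch that asks vCenter only for the name and power state of powered-on VMs. It assumes VMware PowerCLI is installed and that Connect-VIServer has already been run; the function name `Get-PoweredOnVMView` is made up for this example.

```powershell
# Assumes VMware PowerCLI is loaded and a vCenter connection already exists.
# Get-PoweredOnVMView is a hypothetical helper name used only for this sketch.
function Get-PoweredOnVMView {
    # -Property limits which properties are fetched from vCenter;
    # -Filter is evaluated on the server side (values are treated as regular expressions).
    Get-View -ViewType VirtualMachine `
             -Property Name, Runtime.PowerState `
             -Filter @{ 'Runtime.PowerState' = 'poweredOn' } |
        Select-Object Name, @{ N = 'PowerState'; E = { $_.Runtime.PowerState } }
}
```

The Get-VM equivalent would fetch every VM with all of its related objects and only then filter on the client, which is where the time goes on large inventories.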

Now I'll demonstrate all of this on a real case.

Writing a script to export VMs


Once, a colleague asked me to optimize his script. The task is ordinary routine: find all VMs with a duplicate cloud.uuid parameter (yes, this can happen when cloning a VM in vCloud Director).

The obvious solution that comes to mind:

  1. Get a list of all VMs.
  2. Somehow parse the list.

The original version was this simple script:

function Get-CloudUUID1 {
   # Get the list of all VMs
   $vms = Get-VM
   $report = @()

   # For each VM, build an object with 2 properties: the VM name and its Cloud UUID,
   # then add it to the report
   foreach ($vm in $vms)
   {
       $table = "" | select VM,UUID

       $table.VM = $vm.name
       $table.UUID = ($vm | Get-AdvancedSetting -Name cloud.uuid).Value

       $report += $table
   }
   # Return the report
   $report
}

Everything is extremely simple and clear. It takes a couple of minutes to write, coffee break included. Bolt on a filter and it's done.

But let's measure the time:





2 minutes 47 seconds to process almost 10k VMs. As a bonus, there are no filters and the result has to be sorted by hand. Obviously, the script is begging for optimization.
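
One way to take a timing like this is Measure-Command. The sketch below times a stand-in workload so that it runs anywhere; in practice you would put the call to Get-CloudUUID1 inside the braces instead.

```powershell
# Measure-Command runs the script block and returns a TimeSpan instead of the block's output.
# The block below is a stand-in workload; replace it with the function you want to time.
$elapsed = Measure-Command {
    1..3 | ForEach-Object { Start-Sleep -Milliseconds 100 }
}

'{0:N1} seconds' -f $elapsed.TotalSeconds
```

Keep in mind that the measured block's output is discarded, so run the function separately if you also need its results.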

Runspaces are the first thing to reach for when you need to collect host metrics from vCenter in one go or process tens of thousands of objects. Let's see what this approach gives us.

Shifting into first gear: PowerShell Runspaces

The first idea for this script is to run the loop in parallel threads rather than sequentially, collect all the data into one object, and filter it.

But there is a problem: PowerCLI will not let us open many independent sessions to vCenter and throws a funny error:

You have modified the global:DefaultVIServer and global:DefaultVIServers system variables. This is not allowed. Please reset them to $null and reconnect to the vSphere server.

To solve this, you first have to pass the session information into the thread. Recall that PowerShell works with objects, and an object can be passed as a parameter either to a function or to a ScriptBlock. Let's pass the session as such an object, bypassing $global:DefaultVIServers (Connect-VIServer with the -NotDefault switch):

$ConnectionString = @()
foreach ($vCenter in $vCenters)
   {
       try {
           $ConnectionString += Connect-VIServer -Server $vCenter -Credential $Credential -NotDefault -AllLinked -Force -ErrorAction stop -WarningAction SilentlyContinue -ErrorVariable er
       }
       catch {
           if ($er.Message -like "*not part of a linked mode*")
           {
               try {
                   $ConnectionString += Connect-VIServer -Server $vCenter -Credential $Credential -NotDefault -Force -ErrorAction stop -WarningAction SilentlyContinue -ErrorVariable er
               }
               catch {
                   throw $_
               }
              
           }
           else {
               throw $_
           }
       }
   }

Now we implement multithreading through Runspace Pools.  

The algorithm is as follows:

  1. Get a list of all VMs.
  2. Get cloud.uuid in parallel threads.
  3. Collect the data from the threads into one object.
  4. Filter the object by grouping on the CloudUUID field: groups that contain more than one VM are the ones we are looking for.
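
Step 4 can be tried in isolation. The sketch below uses made-up sample objects in place of the data collected from the threads; only the property names (VMName, CloudUUID, PowerState) match the script.

```powershell
# Hypothetical sample data standing in for the objects collected from the threads.
$collected = @(
    [pscustomobject]@{ VMName = 'vm-a'; CloudUUID = 'uuid-1'; PowerState = 'PoweredOn'  }
    [pscustomobject]@{ VMName = 'vm-b'; CloudUUID = 'uuid-2'; PowerState = 'PoweredOff' }
    [pscustomobject]@{ VMName = 'vm-c'; CloudUUID = 'uuid-1'; PowerState = 'PoweredOn'  }
)

# Group by CloudUUID, keep only groups with more than one member,
# then expand the groups back into the original objects.
$duplicates = $collected |
    Group-Object -Property CloudUUID |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Group

$duplicates
```

Here vm-a and vm-c share a CloudUUID, so they are the only two objects that survive the filter.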

As a result, we get the script:


function Get-VMCloudUUID {
   param (
       [string[]]
       [ValidateNotNullOrEmpty()]
       $vCenters = @(),
       [int]$MaxThreads,
       [System.Management.Automation.PSCredential]
       [System.Management.Automation.Credential()]
       $Credential
   )

   $ConnectionString = @()

   # Connect to each vCenter and collect the session objects
   foreach ($vCenter in $vCenters)
   {
       try {
           $ConnectionString += Connect-VIServer -Server $vCenter -Credential $Credential -NotDefault -AllLinked -Force -ErrorAction stop -WarningAction SilentlyContinue -ErrorVariable er
       }
       catch {
           if ($er.Message -like "*not part of a linked mode*")
           {
               try {
                   $ConnectionString += Connect-VIServer -Server $vCenter -Credential $Credential -NotDefault -Force -ErrorAction stop -WarningAction SilentlyContinue -ErrorVariable er
               }
               catch {
                   throw $_
               }
              
           }
           else {
               throw $_
           }
       }
   }

   # Get the list of all VMs
   $AllVMs = Get-VM -Server $ConnectionString

   # Let's go: create and open the runspace pool
   $ISS = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
   $RunspacePool = [runspacefactory]::CreateRunspacePool(1, $MaxThreads, $ISS, $Host)
   $RunspacePool.ApartmentState = "MTA"
   $RunspacePool.Open()
   $Jobs = @()

   # The ScriptBlock that each thread executes:
   # it reads cloud.uuid for one VM through the session that was passed in
   $scriptblock = {
       Param (
       $ConnectionString,
       $VM
       )

       $Data = $VM | Get-AdvancedSetting -Name Cloud.uuid -Server $ConnectionString | Select-Object @{N="VMName";E={$_.Entity.Name}},@{N="CloudUUID";E={$_.Value}},@{N="PowerState";E={$_.Entity.PowerState}}

       return $Data
   }
   # Create one thread per VM

   foreach($VM in $AllVMs)
   {
       $PowershellThread = [PowerShell]::Create()
       # Add the script
       $null = $PowershellThread.AddScript($scriptblock)
       # ...and the objects that are passed to it as arguments
       $null = $PowershellThread.AddArgument($ConnectionString)
       $null = $PowershellThread.AddArgument($VM)
       $PowershellThread.RunspacePool = $RunspacePool
       $Handle = $PowershellThread.BeginInvoke()
       $Job = "" | Select-Object Handle, Thread, object
       $Job.Handle = $Handle
       $Job.Thread = $PowershellThread
       $Job.Object = $VM.ToString()
       $Jobs += $Job
   }

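   # Show a progress bar while waiting for the threads
   # and collect the output of finished threads, disposing of them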
   While (@($Jobs | Where-Object {$_.Handle -ne $Null}).count -gt 0)
   {
       $Remaining = "$($($Jobs | Where-Object {$_.Handle.IsCompleted -eq $False}).object)"

       If ($Remaining.Length -gt 60) {
           $Remaining = $Remaining.Substring(0,60) + "..."
       }

       Write-Progress -Activity "Waiting for Jobs - $($MaxThreads - $($RunspacePool.GetAvailableRunspaces())) of $MaxThreads threads running" -PercentComplete (($Jobs.count - $($($Jobs | Where-Object {$_.Handle.IsCompleted -eq $False}).count)) / $Jobs.Count * 100) -Status "$(@($($Jobs | Where-Object {$_.Handle.IsCompleted -eq $False})).count) remaining - $remaining"

       ForEach ($Job in $($Jobs | Where-Object {$_.Handle.IsCompleted -eq $True})){
           $Job.Thread.EndInvoke($Job.Handle)     
           $Job.Thread.Dispose()
           $Job.Thread = $Null
           $Job.Handle = $Null
       }
   }

   $RunspacePool.Close() | Out-Null
   $RunspacePool.Dispose() | Out-Null
}


function Get-CloudUUID2
{
   [CmdletBinding()]
   param(
   [string[]]
   [ValidateNotNullOrEmpty()]
   $vCenters = @(),
   [int]$MaxThreads = 50,
   [System.Management.Automation.PSCredential]
   [System.Management.Automation.Credential()]
   $Credential)

   if(!$Credential)
   {
       $Credential = Get-Credential -Message "Please enter vCenter credentials."
   }

   # Call Get-VMCloudUUID, which collects the data in parallel threads
   $AllCloudVMs = Get-VMCloudUUID -vCenters $vCenters -MaxThreads $MaxThreads -Credential $Credential
   $Result = $AllCloudVMs | Sort-Object CloudUUID | Group-Object -Property CloudUUID | Where-Object -FilterScript {$_.Count -gt 1} | Select-Object -ExpandProperty Group
   $Result
}

The beauty of this script is that it can be reused in other similar cases simply by replacing the ScriptBlock and the parameters passed to the thread. Use it!

We measure time:



55 seconds. Better already, but it can be faster still.

Shifting into second gear: Get-View

Let's find out what is wrong.
First and most obvious: the Get-VM cmdlet takes a long time to complete.
Second: the Get-AdvancedSetting cmdlet takes even longer.
Let's deal with the second one first.

Get-AdvancedSetting is convenient for individual VM objects, but very slow when working with many objects. We can get the same information from the virtual machine object itself (Get-VM); it is just buried deep in the ExtensionData object. Armed with filtering, we speed up retrieval of the data we need.

With a flick of the wrist, this:


$VM | Get-AdvancedSetting -Name Cloud.uuid -Server $ConnectionString | Select-Object @{N="VMName";E={$_.Entity.Name}},@{N="CloudUUID";E={$_.Value}},@{N="PowerState";E={$_.Entity.PowerState}}

Turns into this:


$VM | Where-Object {($_.ExtensionData.Config.ExtraConfig | Where-Object {$_.key -eq "cloud.uuid"}).Value -ne $null} | Select-Object @{N="VMName";E={$_.Name}},@{N="CloudUUID";E={($_.ExtensionData.Config.ExtraConfig | Where-Object {$_.key -eq "cloud.uuid"}).Value}},@{N="PowerState";E={$_.PowerState}}

The output is the same as with Get-AdvancedSetting, but it works many times faster.
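
The nested lookup is easier to follow on mock data. ExtraConfig is a list of key/value pairs, and the inner Where-Object simply picks the pair with the right key. In this sketch both the `$vm` object and the `Get-ExtraConfigValue` helper are hypothetical; real objects come from Get-VM.

```powershell
# Hypothetical mock of the relevant part of a VM object; real objects come from Get-VM.
$vm = [pscustomobject]@{
    Name          = 'vm-a'
    ExtensionData = [pscustomobject]@{
        Config = [pscustomobject]@{
            ExtraConfig = @(
                [pscustomobject]@{ Key = 'svga.present'; Value = 'TRUE'   }
                [pscustomobject]@{ Key = 'cloud.uuid';   Value = 'uuid-1' }
            )
        }
    }
}

# Hypothetical helper: return the value stored under the given ExtraConfig key,
# or nothing if the key is absent.
function Get-ExtraConfigValue {
    param($VM, [string]$Key)
    ($VM.ExtensionData.Config.ExtraConfig | Where-Object { $_.Key -eq $Key }).Value
}

Get-ExtraConfigValue -VM $vm -Key 'cloud.uuid'
Get-ExtraConfigValue -VM $vm -Key 'missing.key'
```

The second call yields nothing, which is why the outer filter compares the result with $null to skip VMs that have no cloud.uuid at all.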

Now to Get-VM. It is slow because it deals with complex objects. A logical question arises: why do we need all the extra information and a monstrous PSObject when all we want is the VM name, its power state and the value of one tricky attribute?

In addition, the bottleneck that was Get-AdvancedSetting is now gone from the script. Using Runspace Pools now looks like overkill, since there is no longer a slow task to parallelize across threads, with all the contortions of passing a session around. The tool is good, but not for this case.

Look at what ExtensionData contains: it is nothing other than a Get-View object.

Let's call on the ancient technique of the PowerShell masters: one line with filters, sorting and grouping. All the previous horror elegantly collapses into one line and runs in a single session:


$AllVMs = Get-View -viewtype VirtualMachine -Property Name,Config.ExtraConfig,summary.runtime.powerstate | Where-Object {($_.Config.ExtraConfig | Where-Object {$_.key -eq "cloud.uuid"}).Value -ne $null} | Select-Object @{N="VMName";E={$_.Name}},@{N="CloudUUID";E={($_.Config.ExtraConfig | Where-Object {$_.key -eq "cloud.uuid"}).Value}},@{N="PowerState";E={$_.summary.runtime.powerstate}} | Sort-Object CloudUUID | Group-Object -Property CloudUUID | Where-Object -FilterScript {$_.Count -gt 1} | Select-Object -ExpandProperty Group

We measure time:



9 seconds for almost 10k objects, including filtering by the desired condition. Excellent!

Instead of a conclusion

An acceptable result depends directly on the choice of tool. It is often hard to say in advance which tool is the right one. Each of the script acceleration methods described here is good within the limits of its applicability. I hope this article helps you with the difficult task of understanding the basics of process automation and optimization in your infrastructure.

PS: The author thanks all the members of the community for their help and support in preparing the article. Even those with paws. And even those with no legs at all, like a boa constrictor.
