Friday, 22 September 2017

A PRTG Sensor for the Pure Storage FlashArray






 


Why A PRTG Sensor?


I was in a meeting with a good customer of Pure Storage when I noticed that the operational panel on their floor was showing a PRTG dashboard. PRTG Network Monitor from Paessler is a very nice agentless monitoring solution that can use a wide range of methods to monitor any accessible system or application on your network. It is a Windows-based solution, but it supports all the methods you need to gain remote insight into your services, such as SNMP, SSH, scripts, REST, WMI and packet sniffing.


I am familiar with PRTG as I use it at home to monitor my home network and its various components. This is possible because you can run 100 sensors on a free license. The wonderful world of home automation and online media has made my network as critical to my household as a corporate network is to a corporation's own services. Plus, it is fun to work with solutions like this, not to mention that it provides some interesting insights into your home network when combined with a Ubiquiti setup.

Anyway, back to the task at hand. I thought that looking at how we could monitor Pure FlashArrays with PRTG would help me familiarise myself with the Purity API. It turns out we do have a couple of options in the Pure open source community, Pure/Code, within the Python and PowerShell script packs written by some clever dudes at Pure (nod to Barkz). The only consideration was that these focus on array-level performance metrics, whereas I wanted to see what else we could keep an eye on. Also, as a new Pure employee, it gives me the chance to work with the RESTful API of Purity.

I can say that writing this sensor was a bit of fun, and the RESTful API of Purity has to be one of the most logical I have worked with. It is very simple in structure, versioned to allow for supportability across a range of Purity versions, and rich in supporting capabilities such as filtering. It took no time at all to get comfortable with the structure; the only limits were my own and those I discovered within PRTG.








So What Does It Do?

I wanted to provide a good overview of the current operational health of the arrays monitored while ensuring that the reporting was not overly complicated. As a first release my focus is on array level conditions covering:
  • Array capacity status
  • Array performance metrics
  • General hardware status of Controllers, Shelves and Chassis
  • General drive health
Array object sensor summary page
I have plans to add further granularity to cover 'volume performance', 'host performance' and 'protection compliance' in upcoming releases. With the current method, however, there is a limit of 50 channels per sensor, so I am still determining the best way to handle the number of instances required to monitor at that level, and how to cater for the lifecycle of objects that are subject to CRUD operations (well, create, update and delete anyway).

With the current script, each sensor reads its metrics from the array via a PowerShell script that directly queries the array's RESTful API. Although there is only a single script, it supports multiple sensors through the 'Scope' parameter provided in the arguments section of each sensor (more details in the installation instructions).
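To illustrate the idea of one script serving several sensors, here is a minimal sketch of how a 'Scope' value could be mapped to a REST query. It is written in Python rather than PowerShell purely for illustration and is not taken from Get-PureFA-Sensor.ps1. The resource paths, query parameters, the API version string and the function name `build_query_url` are all assumptions and should be checked against the REST API reference for your Purity version.

```python
from urllib.parse import urlencode

# Assumed mapping from the sensor's 'Scope' value to a REST resource and
# query parameters. These paths are illustrative, not taken from the script.
SCOPE_QUERIES = {
    "capacity": ("array", {"space": "true"}),
    "performance": ("array", {"action": "monitor"}),
    "hardware": ("hardware", {}),
    "drive": ("drive", {}),
}


def build_query_url(array_address, scope, api_version="1.6"):
    """Return the URL a sensor with the given scope might query.

    api_version is a placeholder default, not a recommendation.
    """
    key = scope.lower()
    if key not in SCOPE_QUERIES:
        raise ValueError(f"Unknown scope: {scope!r}")
    resource, params = SCOPE_QUERIES[key]
    url = f"https://{array_address}/api/{api_version}/{resource}"
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    for scope in SCOPE_QUERIES:
        print(scope, "->", build_query_url("array1.example.com", scope))
```

Keeping the mapping in one table means a new scope only needs a new entry rather than a new script.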

Each sensor can contain one or more channels. A channel can be thought of as a single value belonging to the sensor, such as 'capacity consumed' within the 'capacity' sensor. Where appropriate, limits are assigned to the channel to define the thresholds that its value must go above (maxlimit) or below (minlimit) to raise a warning or error condition for the sensor. Some channels have explicit limits set, such as those related to capacity and health, but others, such as the performance metrics, are left for PRTG to determine when an exceptional condition occurs based on the operational patterns already recorded. For example, latency may stay steady under a millisecond during normal operations, but if it suddenly increases to 20 milliseconds for multiple recordings, PRTG will treat this as an exception and raise a warning condition.
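As a rough illustration of channels and limits, the sketch below builds the kind of JSON result an EXE/Script Advanced sensor hands back to PRTG. It is written in Python for illustration only and is not the author's PowerShell. The helper names, channel names and threshold values are hypothetical, and the field names follow my understanding of PRTG's custom sensor format, so verify them against the PRTG manual.

```python
import json


def make_channel(name, value, unit="Custom",
                 max_warning=None, max_error=None,
                 min_warning=None, min_error=None):
    """Build one channel entry, attaching only the limits that were given."""
    channel = {"channel": name, "value": value, "unit": unit}
    if isinstance(value, float):
        channel["float"] = 1  # tell PRTG the value is not an integer
    limits = {
        "limitmaxwarning": max_warning,
        "limitmaxerror": max_error,
        "limitminwarning": min_warning,
        "limitminerror": min_error,
    }
    given = {key: val for key, val in limits.items() if val is not None}
    if given:
        channel.update(given)
        channel["limitmode"] = 1  # enable limit checking for this channel
    return channel


def make_sensor_result(channels):
    """Wrap the channels in the top-level structure PRTG expects."""
    return json.dumps({"prtg": {"result": channels}}, indent=2)


if __name__ == "__main__":
    channels = [
        # Explicit limits (hypothetical thresholds) on a capacity channel.
        make_channel("Capacity consumed", 72.5, unit="Percent",
                     max_warning=80, max_error=90),
        # No limits on a performance channel; left to PRTG to assess.
        make_channel("Read latency", 1, unit="Custom"),
    ]
    print(make_sensor_result(channels))
```

Channels created without limits carry no limit fields at all, which matches the approach of leaving the performance metrics for PRTG to judge from recorded history.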

Array capacity sensor

Because each channel needs to provide a value that supports numeric comparison to determine whether limits are met, textual values returned from the array had to be converted to representative numeric values. Reporting these back as text, such as a hardware component's status being shown as 'healthy', is enabled through the lookup tables included in the pack. Currently two are included, providing lookup values for the status channels in the 'Drive Health' and 'Hardware Health' sensors.
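The conversion described above can be sketched as follows, in Python for illustration only. The status strings and numeric codes are placeholders and will not necessarily match the values in the shipped .ovl files. The lookup ID assumes PRTG identifies a lookup by its file name without the extension, and `status_channel` is a hypothetical helper.

```python
# Placeholder status-to-code table; the real lookup files may use
# different states and numbers.
STATUS_CODES = {"healthy": 0, "degraded": 1, "failed": 2}
UNKNOWN_CODE = 99

# Assumed lookup ID: the lookup file name without its .ovl extension.
HARDWARE_LOOKUP_ID = "prtg.standardlookups.purestorage.hardwarestatus"


def status_channel(component, status_text):
    """Convert a textual status into a numeric channel tied to a lookup."""
    code = STATUS_CODES.get(status_text.strip().lower(), UNKNOWN_CODE)
    return {
        "channel": component,
        "value": code,
        "valuelookup": HARDWARE_LOOKUP_ID,
    }


if __name__ == "__main__":
    print(status_channel("Controller", "Healthy"))
    print(status_channel("Shelf", "something unexpected"))
```

The sensor only ever sends the number; the lookup file is what lets PRTG show the matching text and state.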

Array Hardware Sensor
Where the array performance and capacity sensors provide a collection of metrics scoped to the whole array, the Hardware Health sensor provides summary graphs for each major component type (Chassis, Controller, Shelf), and the Drive Health sensor provides a status for each drive.

Hardware Health Sensor


For the Hardware Health sensor, summary values were required because reporting each individual hardware component that can be monitored would exceed the limit of 50 channels per sensor (the limit is not enforced, but supportability and reliability can be compromised beyond it). I am open to looking at alternative methods that would allow all components to be reported individually in the future.
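One way to produce such summary values is to collapse the individual components into a single worst-case status per component type. This is a sketch in Python for illustration only, and is not necessarily how the script does it. The record shape (`type`, `status`), the severity codes and the function name are all hypothetical.

```python
# Placeholder severity ranking; higher numbers are worse.
SEVERITY = {"healthy": 0, "degraded": 1, "failed": 2}
WORST = max(SEVERITY.values())


def summarise_by_type(components):
    """Return {component type: worst severity seen for that type}.

    Unrecognised statuses count as the worst severity so that they are
    never silently ignored.
    """
    summary = {}
    for item in components:
        code = SEVERITY.get(item["status"].lower(), WORST)
        kind = item["type"]
        summary[kind] = max(summary.get(kind, 0), code)
    return summary


if __name__ == "__main__":
    sample = [
        {"type": "Controller", "status": "healthy"},
        {"type": "Controller", "status": "degraded"},
        {"type": "Shelf", "status": "healthy"},
        {"type": "Chassis", "status": "healthy"},
    ]
    # One channel per component type instead of one per component.
    print(summarise_by_type(sample))
```

With this approach the channel count depends on the number of component types rather than on how many parts the array contains, which keeps the sensor well under the 50-channel guideline.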


Drive Health Sensor


Installation


Requirements

  • First, get the scripts and associated lookup files from GitHub at Pure Storage Sensor Module. This will provide a Zip file of the required files, but you can also download them directly from the GitHub site at 'https://github.com/davlloyd/purestorage-prtg'
    • This set contains 4 files including:
      • Get-PureFA-Sensor.ps1 - The main PowerShell script
      • prtg.standardlookups.purestorage.drivestatus.ovl - Drive status lookup file
      • prtg.standardlookups.purestorage.hardwarestatus.ovl - Hardware status lookup file
      • readme.md - Markdown file with Installation instructions 
    • The PowerShell script and lookup files need to be copied onto the PRTG server, and the associated sensors then created accordingly.
  • Ensure the PRTG server (or the PRTG probe, if running the sensor from a probe) is running PowerShell 4.0

Script Installation

  1. Copy the script Get-PureFA-Sensor.ps1 to the directory 'C:\Program Files (x86)\PRTG Network Monitor\Custom Sensors\EXEXML' 
  2. Copy the two lookup files prtg.standardlookups.purestorage.drivestatus.ovl and prtg.standardlookups.purestorage.hardwarestatus.ovl to the directory 'C:\Program Files (x86)\PRTG Network Monitor\lookups\custom'
  3. Restart the Windows service 'PRTG Core Server Service'. This step is required for PRTG to read in the new custom lookup files.

Sensor Creation


  1. Create a new device in PRTG with the address (IP or FQDN) of the FlashArray you want to monitor
  2. Now, select <Add Sensor>. In the search field, type "Script Advanced" and then select the sensor type <EXE/Script Advanced Sensor> from the result list. 
  3. On the Add Sensor to Device screen, enter the following: 
    1. The sensor's name and tags (optional): There are currently four sensors to create. I have been naming them as follows:
      1. Capacity
      2. Performance
      3. Drive Health
      4. Hardware Health
    2. Under Exe/Script, use the drop-down to select the script 'Get-PureFA-Sensor.ps1' from the list.
    3. Set the parameters for array access and sensor scope as follows:
      1. [ArrayAddress] (use the PRTG variable '%host' to inherit from the device entry)
      2. Either of the following combinations for security authentication:
        1. [username] and [password] - This is OK for testing purposes, but as the password is stored in clear text it is not recommended for production. The script will generate or read the APIKey for the account
        2. [apikey] - preferred access method. The APIKey is generated from within the Purity Console for the preferred account. You do not need to enter account details if specifying an APIKey
      3. [Scope] to set what is monitored for this sensor. Scope option values are:
        • capacity
        • performance
        • hardware
        • drive
      4. By default the sensors run every 60 seconds; adjust accordingly. You can see how long execution takes by running the script in debug mode, but ensure debug mode is disabled when running the script as a sensor.
-arrayaddress '%host' -scope 'hardware' -apikey '3bdf3b60-f0c0-fa8a-83c1-b794ba8f562c'
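Since all four sensors differ only in their scope value, the parameter strings can be generated rather than typed by hand. A small sketch in Python, for illustration only: the helper name and the API key placeholder are hypothetical, and the pairing of sensor names to scope values is inferred from the two lists above.

```python
# Sensor names suggested above, paired with their assumed scope values.
SENSOR_SCOPES = {
    "Capacity": "capacity",
    "Performance": "performance",
    "Drive Health": "drive",
    "Hardware Health": "hardware",
}


def sensor_parameters(scope, api_key, host="%host"):
    """Build the parameter string for one sensor.

    '%host' is the PRTG placeholder that inherits the device address.
    """
    return f"-arrayaddress '{host}' -scope '{scope}' -apikey '{api_key}'"


if __name__ == "__main__":
    for name, scope in SENSOR_SCOPES.items():
        print(f"{name}: {sensor_parameters(scope, '<your-api-key>')}")
```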
Sensor Setup

You are now ready to go. To see how to set up a sensor, watch the video on this blog.



Sensor Screen

Performance Chart