Consolidating and virtualizing infrastructure brings many benefits; however, those benefits should not come at the cost of increased management overhead or complex processes to maintain through the life of the asset and solution.  One key to this is automation – notably, the ability for tasks that normally require administrative intervention or complex policy/script writing to be automated and initiated on demand, or scheduled, without user analysis and execution.


Today’s storage arrays come with a broad set of software management capabilities that allow for intelligent, automated provisioning of things that traditionally had to be performed manually: RAID parity, striping, LUN creation, device discovery, and so on.  However, the most notable capabilities in today’s generation of storage solutions are wide striping, thin provisioning, and dynamic data placement/caching.

Wide Striping places data within a LUN (Volume, Filesystem) across as many disks as possible.  Sometimes this is limited to a pool of disks; other times it spans the entire set of disks.  The objective is to generate as much parallel IO as possible across as many devices as makes sense, driving up IO rates for all co-located data.  In some cases, this capability is tied to a defined set of disks grouped together in a pool or aggregate.  That can be good or bad for manageability, depending on your perspective – and your particular storage vendor will have their own opinions they’re more than happy to share.  Typically you will hear wide striping discussed at the “array”, “pool”, or “aggregate” level, depending on a particular vendor’s implementation.
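The round-robin idea behind wide striping can be sketched in a few lines of Python.  This is purely illustrative – the pool size, chunk size, and address math are assumptions for the example, not any vendor’s actual layout:

```python
# Hypothetical wide-striping layout: logical blocks of a LUN are laid out
# round-robin in chunks across every disk in a pool, so a sequential run
# of IO lands on many spindles in parallel.
POOL_DISKS = 8          # disks in the pool (illustrative)
CHUNK_BLOCKS = 4        # blocks per stripe chunk on each disk (illustrative)

def place_block(lba):
    """Map a logical block address to a (disk, offset) pair in the pool."""
    chunk = lba // CHUNK_BLOCKS                 # which stripe chunk the LBA falls in
    disk = chunk % POOL_DISKS                   # chunks rotate round-robin across disks
    offset = (chunk // POOL_DISKS) * CHUNK_BLOCKS + (lba % CHUNK_BLOCKS)
    return disk, offset

# 32 consecutive logical blocks touch all 8 disks, 4 blocks each
disks_touched = {place_block(lba)[0] for lba in range(32)}
```

The point is simply that consecutive logical addresses fan out across the whole pool, which is what lets co-located data drive parallel IO.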

Generally, wide striping is enabled hand-in-hand with Thin Provisioning, another feature commonly available at the same array, pool, or aggregate level.  Thin provisioning allocates storage across a set of disks as it is needed rather than preallocating it ahead of time, greatly improving storage efficiency and enabling administrators to provision for the life of the asset rather than the initial requirements – saving future administration activities, avoiding the related impacts to application availability, and reducing the overtime spent implementing change-window activities.
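A minimal sketch of the allocate-on-first-write behaviour, assuming a shared pool handing out fixed-size extents (class names, extent size, and the 5:1 oversubscription ratio are all illustrative):

```python
# Thin provisioning sketch: a LUN advertises a large logical size, but
# physical extents come out of the shared pool only when a block range
# is first written.
class ThinLUN:
    def __init__(self, logical_size_gb):
        self.logical_size_gb = logical_size_gb   # what the host sees
        self.extent_map = {}                     # logical extent -> pool extent

class Pool:
    def __init__(self, physical_gb, extent_gb=1):
        self.free_extents = list(range(physical_gb // extent_gb))

    def write(self, lun, logical_extent):
        """Allocate a physical extent on first write; reuse it afterwards."""
        if logical_extent not in lun.extent_map:
            lun.extent_map[logical_extent] = self.free_extents.pop(0)
        return lun.extent_map[logical_extent]

pool = Pool(physical_gb=100)
lun = ThinLUN(logical_size_gb=500)   # oversubscribed 5:1 against the pool
pool.write(lun, 7)                   # first write consumes one extent
consumed = len(lun.extent_map)       # 1 extent used of 500 advertised
```

In practice the oversubscription is what has to be monitored – a real array alerts as the pool fills, which this sketch omits.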

Thin Provisioning implies a certain amount of block-level virtualization, which is why wide striping is discussed at the same time – and once the logic to virtualize a block’s location exists, other functions that work at the block level can be implemented on top of it.  These include encryption, compression, and data deduplication, features that generally improve security and storage efficiency.  Notably, block-level deduplication of shared blocks can be of great benefit in environments where those blocks end up being served from cache frequently – it means less cache is required to improve application performance for a large number of workloads, without creating IO spikes.  This is very noticeable in applications such as Virtual Desktops, particularly during the boot and anti-virus storms which tend to occur with regularity.
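The cache benefit follows directly from how block-level deduplication works: identical blocks are fingerprinted and stored once, so one physical (and cacheable) copy serves every logical owner.  A hedged sketch, with the fingerprinting and reference counting simplified for illustration:

```python
import hashlib

# Block-level deduplication sketch: identical blocks (e.g. the same guest
# OS files across many virtual desktops) are stored once and shared by
# reference, so a single cached copy serves every reader.
class DedupStore:
    def __init__(self):
        self.blocks = {}     # fingerprint -> block data, stored once
        self.refcount = {}   # fingerprint -> number of logical owners

    def write(self, data):
        fp = hashlib.sha256(data).hexdigest()    # content fingerprint
        if fp not in self.blocks:
            self.blocks[fp] = data               # first copy is the only copy
        self.refcount[fp] = self.refcount.get(fp, 0) + 1
        return fp

store = DedupStore()
golden = b"identical guest OS block" * 128
for _ in range(100):              # 100 desktops write the same block
    store.write(golden)
unique_blocks = len(store.blocks) # still only one physical copy
```

One physical block with 100 owners is exactly the block a boot storm hammers – and it only needs to occupy cache once.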

Dynamic Data Placement is a capability that puts data on the right tier at the right time – at the block (or sub-LUN, in some cases) level.  One of the issues traditional storage tiering has is that its granularity is at the Volume level.  Sometimes it is not possible to determine what tier data should be on prior to deployment, and a workload cannot easily be separated into multiple Volumes on different tiers when only part of it requires higher-performance (and thus more costly) storage to perform well.  Dynamic Data Placement lets the intelligence in the storage array figure this out based on real-time statistics gathering, moving data to the appropriate tier on a scheduled or dynamic basis.
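The scheduled sweep described above amounts to ranking extents by recent access counts and promoting the hottest ones.  A sketch under stated assumptions – the tier names, extent statistics, and “top N fit on SSD” policy are illustrative, not any vendor’s algorithm:

```python
# Dynamic data placement sketch: rank extents by accesses observed in the
# sample window, keep the hottest on SSD, demote the rest to SATA.
def rebalance(access_counts, ssd_capacity):
    """Return {extent: tier}, keeping the hottest extents on SSD."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    placement = {}
    for i, extent in enumerate(ranked):
        placement[extent] = "SSD" if i < ssd_capacity else "SATA"
    return placement

# Illustrative per-extent access counts from one sample window
stats = {"ext0": 9000, "ext1": 12, "ext2": 4500, "ext3": 3}
placement = rebalance(stats, ssd_capacity=2)
```

Only the two hot extents are promoted – the cold majority of the Volume never pays for SSD, which is the whole argument for sub-LUN granularity.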

This is typically implemented, again, at the array, pool, or aggregate level, and done across multiple tiers – SSD, SAS/FC, and SATA – generally with different, and sometimes multiple, RAID types for each (although this may not always be supported).

Associated with this type of data movement is Dynamic Data Caching.  Rather than actually moving a block between tiers, the block is cached in flash or SSD.  The benefit of this approach is that the array’s read cache becomes much larger and works well for large working sets (such as virtualization), without all the data actually having to be relocated onto expensive SSD.  As data in the cache ages, it automatically falls out and is replaced by more active data.  The negatives are that you can’t pin data in cache, as you could if it were placed on SSD, and that the performance improvement is generally targeted at READ data only – write speed is not improved.  Notably, if this is combined with a storage solution that caches writes first, then de-stages them in full stripes to new (virtualized) locations on disk (i.e., RAID3 or RAID4), write performance will generally still be very good when there is a large number of drives in the RAID group (or pool, or aggregate).  The typical RAID write penalty is eliminated because all writes normally go to new disk locations, avoiding the read-calculate-write data/parity activity entirely.
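The write-penalty arithmetic behind that last point is worth making concrete.  A small random write to RAID5 costs four back-end IOs (read data, read parity, write data, write parity), while destaging a full stripe computes parity from the cached data and costs roughly (N+1)/N IOs per data block on an N+1-drive group.  The drive counts below are illustrative:

```python
# Back-of-envelope comparison of random RAID5 writes vs full-stripe destage.
def backend_ios_random_raid5(host_writes):
    # Each small write: read data + read parity + write data + write parity
    return host_writes * 4

def backend_ios_full_stripe(host_writes, data_drives):
    # Parity is computed in cache; one sweep writes N data chunks + 1 parity
    stripes = host_writes / data_drives
    return stripes * (data_drives + 1)

random_cost = backend_ios_random_raid5(1000)     # 4000 back-end IOs
destage_cost = backend_ios_full_stripe(1000, 8)  # 1125 back-end IOs
```

For 1,000 host writes on an 8+1 group, that is 4,000 back-end IOs versus 1,125 – which is why full-stripe destaging keeps write performance strong even on wide groups.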

Having gotten this far, you must be asking yourself – does my storage array have any of these features?  Well, maybe – but probably not unless you purchased it recently, and even then, whether the features are licensed or in use is another matter altogether.  All too often, features get marketed but never sold or properly implemented; sometimes due to budgeting, but more often they just don’t work as advertised and get turned off prior to deployment into production.

If you find yourself in this situation, or are looking at options to replace, upgrade, augment, or properly deploy what you already have, give us a call.  Sometimes it’s just a matter of making better use of what you already own – or replacing it when its lifecycle ends with something capable of delivering the value you’re looking for in a consolidated, virtualized storage infrastructure to support your application and business services.

Michael Traves
