Shredded Storage is a new feature introduced in SharePoint 2013. It changes the way SharePoint stores documents and files as Binary Large Objects (BLOBs) in SQL Server, with the aim of improving both storage utilization and I/O performance.
Although ‘shredded’ sounds a bit ominous, it means that SharePoint now turns each file into a number of shreds and writes these shreds to SQL as separate BLOBs. SharePoint also maintains an index that keeps track of how the various shreds fit together, so that a file can be reassembled when it is requested.
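To make the idea of shreds plus an index a bit more tangible, here is a minimal sketch in Python. It is a toy model and not SharePoint's actual algorithm: the fixed-size splitting, the function names and the in-memory dictionary are all assumptions for illustration. The shred size of 64KB is borrowed from the default chunk size discussed further on.

```python
from typing import Dict, List, Tuple

SHRED_SIZE = 64 * 1024  # assumed shred size in bytes (64KB)


def shred(data: bytes, shred_size: int = SHRED_SIZE) -> Tuple[Dict[int, bytes], List[int]]:
    """Split data into fixed-size shreds.

    Returns a blob store (blob id -> shred bytes) and an ordered index of blob ids.
    """
    blobs: Dict[int, bytes] = {}
    index: List[int] = []
    for blob_id, offset in enumerate(range(0, len(data), shred_size)):
        blobs[blob_id] = data[offset:offset + shred_size]
        index.append(blob_id)
    return blobs, index


def reassemble(blobs: Dict[int, bytes], index: List[int]) -> bytes:
    """Rebuild the original file by concatenating the shreds in index order."""
    return b"".join(blobs[blob_id] for blob_id in index)


document = bytes(range(256)) * 1024  # 256KB of sample data
blobs, index = shred(document)
restored = reassemble(blobs, index)
```

The index is the only thing that knows the order of the shreds, which is why it has to be stored alongside them.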
Now this in itself will obviously not improve storage or I/O. To see where the gains come from, let’s go into a bit more depth by looking at a document library in SharePoint which has versioning enabled.
In SharePoint 2010, updating a file in a version-enabled library created a new version record for that file, including a new BLOB which held the entire updated file. Updating meta-data such as the title, the contact or any other field also resulted in a new version. SharePoint would then create yet another BLOB for that file even though the file itself had not changed.
So if you had a 1MB file in a versioned library for which the meta-data was changed 9 times, 10 identical BLOBs with a total of 10MB would be stored in SQL, one for the latest version and 9 for the previous versions.
Shredded storage improves on this process by working only with those shreds that actually changed when a file is updated. In a versioned library, a new version record is still created with the latest meta-data. However, the only BLOBs that are added are the ones that correspond to the shreds that actually changed. The shred-index for this version record, which is needed to recreate the full file, is a combination of entries that point to the unchanged shreds of the previous version(s) and entries that point to the newly added, changed shreds. For an update that only includes meta-data changes, the number of changed shreds is basically zero. This means that no new BLOBs are written to SQL and the shred-index for the new version is the same as the previous one.
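The versioning behaviour described above can be sketched as a small toy model. This is an illustration under stated assumptions, not SharePoint's implementation: the class name `VersionedShredStore`, the fixed 64KB shreds and the position-by-position comparison with the previous version are all hypothetical.

```python
from typing import Dict, List

SHRED_SIZE = 64 * 1024  # assumed fixed shred size in bytes


class VersionedShredStore:
    """Toy model: every version has a shred index; blobs are written only for changed shreds."""

    def __init__(self) -> None:
        self.blobs: Dict[int, bytes] = {}  # blob id -> shred contents
        self.versions: List[dict] = []     # one record per version: meta-data + shred index

    def save(self, data: bytes, metadata: dict) -> int:
        """Store a new version and return the number of blobs that had to be written."""
        previous = self.versions[-1]["index"] if self.versions else []
        index: List[int] = []
        written = 0
        for position, offset in enumerate(range(0, len(data), SHRED_SIZE)):
            shred = data[offset:offset + SHRED_SIZE]
            unchanged = position < len(previous) and self.blobs[previous[position]] == shred
            if unchanged:
                index.append(previous[position])  # point at the existing blob
            else:
                blob_id = len(self.blobs)
                self.blobs[blob_id] = shred       # write only the changed shred
                index.append(blob_id)
                written += 1
        self.versions.append({"metadata": dict(metadata), "index": index})
        return written

    def read(self, version: int) -> bytes:
        """Reassemble the file contents of the given version from its shred index."""
        return b"".join(self.blobs[i] for i in self.versions[version]["index"])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


store = VersionedShredStore()
original = bytes(1024 * 1024)  # 1MB file
first_write = store.save(original, {"title": "Draft"})

# Meta-data-only update: same content, new title.
metadata_write = store.save(original, {"title": "Final"})

# Content update: change a single byte in the first shred.
edited = b"\x01" + original[1:]
content_write = store.save(edited, {"title": "Final"})
```

In this model the first save writes 16 blobs, the meta-data-only update writes none and the one-byte edit writes a single 64KB blob, while all three versions can still be read back in full.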
As you can imagine this could dramatically reduce storage for versioned document libraries and also improve the I/O performance as only the changed shreds have to be written to SQL. Because of the way Microsoft Office documents are now structured (XML) and the use of a technology called Cobalt (introduced in SharePoint 2010 to make sure that only changes are sent when editing documents on SharePoint directly in Microsoft Office), Microsoft Office documents will benefit even more from the new shredded storage technology.
Another thing you should know about shredded storage is that it can be enabled (default) or disabled on a web application level. The desired setting can be chosen by manipulating the FileOperationSettings property of the WebService member of the web application (http://msdn.microsoft.com/en-us/library/microsoft.sharepoint.administration.spwebservice.fileoperationsettings.aspx).
This property has 3 possible settings:
- UseWebSetting (=0)
- AlwaysDirectToShredded (=1)
- NeverDirectToShredded (=2)
The FileWriteChunkSize property can subsequently be used to change the maximum size that will be used for each shred (default: 64KB), see http://msdn.microsoft.com/en-us/library/microsoft.sharepoint.administration.spwebservice.filewritechunksize.aspx.
As the first setting for FileOperationSettings suggests, it should also be possible to control the setting on a per-subsite basis. This would be done by manipulating the EnableDirectToShreddedStorage property of the SPWeb class, but unfortunately this property is currently not (yet) documented at http://msdn.microsoft.com/en-us/library/microsoft.sharepoint.spweb_properties.aspx.
Something else that is worth mentioning is that shredded storage is a per-document feature. So if two identical documents are stored in two different libraries, each document will still have its own set of shreds, and together they will take up twice the space of a single document.
The only solution that would actually save on storage for these kinds of situations is to enable Remote BLOB Storage (RBS) using an implementation that supports de-duplication. This way BLOBs are stored outside of SQL on a file system and the RBS layer will make sure that identical BLOBs only take up space once. If shredded storage is enabled in combination with RBS, each shred is stored on the file system as a separate BLOB. If the chosen RBS solution supports de-duplication, identical shreds will only take up space once, even if they belong to different files in SharePoint.
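How de-duplication at the shred level could work is sketched below. This is a hypothetical, content-addressed store keyed by a SHA-256 hash. It does not represent any particular RBS provider, and the class and method names are made up for illustration.

```python
import hashlib
from typing import Dict, List

SHRED_SIZE = 64 * 1024  # assumed shred size in bytes


class DedupBlobStore:
    """Toy content-addressed store: identical shreds are kept once, keyed by SHA-256."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put_file(self, data: bytes) -> List[str]:
        """Store a file as shreds and return its index (a list of shred hashes)."""
        index: List[str] = []
        for offset in range(0, len(data), SHRED_SIZE):
            shred = data[offset:offset + SHRED_SIZE]
            key = hashlib.sha256(shred).hexdigest()
            self.blobs.setdefault(key, shred)  # no-op when this shred is already stored
            index.append(key)
        return index

    def get_file(self, index: List[str]) -> bytes:
        return b"".join(self.blobs[key] for key in index)

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


# A 256KB document made of four distinct shreds, uploaded to two "libraries".
document = b"".join(bytes([i]) * SHRED_SIZE for i in range(4))

rbs = DedupBlobStore()
index_a = rbs.put_file(document)
size_after_first = rbs.stored_bytes()
index_b = rbs.put_file(document)
size_after_second = rbs.stored_bytes()
```

Storing the same document a second time adds no bytes to the store, because every shred hash is already present. Both copies keep their own index, so each can still be reassembled independently.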
Although shredded storage is enabled by default, it only applies to documents from the first time they are uploaded or changed. For a new site this means that all new documents will be stored as shreds. However, when you upgrade a site, the already existing files are not converted to shreds during the upgrade. Existing files are only ‘shredded’ once they are changed.
Based on the information discussed so far, the conclusion would be that shredded storage potentially lowers the amount of storage required for files in SharePoint. On the subject of I/O reduction, Cobalt (introduced in SharePoint 2010) takes care of reducing the amount of data that is sent when users update Microsoft Office documents in Office 2010. Shredded storage adds to the I/O reduction by reducing the amount of data that is sent to the SQL server when users update documents in SharePoint.
But what about read I/O performance?
Because shredded storage breaks files into multiple chunks when they are added to SharePoint, reading such a file is more complex: SharePoint needs to get the correct set of shreds to be able to assemble the original file. You would expect this to take longer than reading the entire file at once, as was the custom in SharePoint 2010.
During the SharePoint Conference 2012 in Las Vegas, Dan Holme (Intelliem) and Jeremy Thake (AvePoint) presented some test results with regard to the impact of shredded storage on read I/O (and data storage). For the test a SharePoint site was used that contained 24GB of files. Tests were done with and without the use of RBS. For the base test, both shredded storage and RBS were disabled. In this case the database size was around 24GB and the RBS storage size was 0GB. The read time measured for this base test was 1477ms. After RBS was turned on, the database size shrank to a couple of MBs and the RBS storage size was about 23GB. Read times with the added RBS layer increased by approx. 27% to 1882ms.
The next test was done with shredded storage enabled (default chunk size of 64KB) and RBS disabled again. In this case the database shrank by approx. 75% to 6GB. Read times increased to 2471ms, which is an increase of approx. 67% compared to the base situation where shredded storage and RBS were disabled.
After RBS was enabled for this situation, the database size shrank to some MBs again (although a bit more than in the first RBS test) and the RBS storage size was about 6GB. In this case read times increased again to 3502ms, which is approx. 86% longer than the RBS test without shredded storage and even 137% longer than the base test (no shreds & no RBS).
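For readers who want to check the percentages, the relative increases can be recomputed from the read times reported above. The snippet only uses those four measurements; the dictionary keys and the helper name are my own.

```python
# Read times in milliseconds as reported in the test results above.
READ_TIMES_MS = {
    "base": 1477,          # no shredded storage, no RBS
    "rbs": 1882,           # RBS only
    "shredded": 2471,      # shredded storage only (64KB chunks)
    "shredded_rbs": 3502,  # shredded storage and RBS
}


def percent_increase(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100.0


increases = {
    "RBS vs base": percent_increase(READ_TIMES_MS["rbs"], READ_TIMES_MS["base"]),
    "shredded vs base": percent_increase(READ_TIMES_MS["shredded"], READ_TIMES_MS["base"]),
    "shredded+RBS vs RBS": percent_increase(READ_TIMES_MS["shredded_rbs"], READ_TIMES_MS["rbs"]),
    "shredded+RBS vs base": percent_increase(READ_TIMES_MS["shredded_rbs"], READ_TIMES_MS["base"]),
}

for label, value in increases.items():
    print(f"{label}: +{value:.0f}%")
```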
As a last step, the same situations were tested with larger chunk sizes (1MB and 1GB). This obviously reduces the number of shreds that need to be gathered for each document, which resulted in better read times. However, they never came down to the 1477ms read time that was the result of the base test.
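To get a feel for how the chunk size affects the number of shreds, here is a rough back-of-the-envelope calculation. It assumes a hypothetical 10MB file and assumes that a file is simply cut into chunks of the maximum size, which is a simplification of what SharePoint really does.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def shred_count(file_size: int, chunk_size: int) -> int:
    """Number of fixed-size chunks needed for a file (ceiling division, positive sizes)."""
    return -(-file_size // chunk_size)


FILE_SIZE = 10 * MB  # hypothetical example file

counts = {
    "64KB": shred_count(FILE_SIZE, 64 * KB),
    "1MB": shred_count(FILE_SIZE, 1 * MB),
    "1GB": shred_count(FILE_SIZE, 1 * GB),
}
```

Under these assumptions the 10MB file consists of 160 shreds at 64KB, 10 shreds at 1MB and a single shred at 1GB. With the largest chunk size, practically every file fits in one shred, which also means that an update rewrites practically the whole file again.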
What the tests described above confirm is that shredded storage can definitely impact the amount of required storage in a positive way. Based on the rest of the story, it should also impact write I/O in a positive way. However, with regard to getting data out of SharePoint, there’s an increase in the response times.
So all in all shredded storage is a new feature that could definitely be beneficial to SharePoint, but not necessarily at all times. As with all things, it depends on what you are looking for. For instance, a site where most or all of the files are read-only might not be a good candidate for shredded storage. A site with lots of small files might not benefit that much either. Whatever configuration seems best for your particular situation, always make sure that you test it to verify whether you get the expected results.