COM Structured Storage in .NET
Written by Harry Fairhead   
Wednesday, 03 February 2010

Structured Storage is used in many applications to create compound documents. Although .NET doesn't offer any way of working with it, it can be done with a little help.



.NET rules but the old technologies linger on!

Take, for example, structured storage, also known as OLE 2 compound documents. This is a COM technology that essentially lets you store the equivalent of a directory structure within a single file.

On operating systems that use FAT filing systems, the software simulates the multiple streams that NTFS provides natively. You can use multiple streams in NTFS without bothering with the complexities of structured storage and, given that structured storage is a COM-based technology, you might be wondering why you should bother with it at all.

The answer is that this technology is so deeply embedded within Windows and Windows applications that you might not be able to avoid it. For example, Office documents (.doc) are stored in OLE compound file format and, despite the fact that the latest versions of Office have an "open" XML file format, compound documents are going to be a nuisance for some time to come.

HTML help is also stored using structured storage – so if you want to read .CHM files you need to master structured storage. Another example is the Thumbs.db file that Windows generates to store all of the thumbnail images used to display graphics files in a directory. It might well end in "db" but it isn't a standard database file – it's structured storage.

There are no structured storage classes within the .NET framework and so to work with it you'll have to implement your own, and this means dealing with several COM Interfaces. Initially this looks very difficult but, with some help, it begins to seem more manageable.


Accessing the information in Thumbs.db provides an excellent example of working with structured storage. In most cases you are going to want to read and extract the information within a structured storage file, and reading the Thumbs.db file demonstrates just how to do this.

It is also an excellent general example of how to work with COM interfaces in C#. As well as the interfaces, there are a lot of "helper" functions which can be called using PInvoke.

For example, before you start processing a file as if it was structured storage you should use the StgIsStorageFile function to test that it really is. The definition of this function is simply:

[DllImport("ole32.dll")]
static extern int StgIsStorageFile(
 [MarshalAs(UnmanagedType.LPWStr)] string pwcsName);

To use it to test if a file is structured storage you would use:
string file = @"path\Thumbs.db";
int result = StgIsStorageFile(file);

The result is 0 (S_OK) if the file is a structured storage file, 1 (S_FALSE) if it isn't and 0x80030002 (STG_E_FILENOTFOUND) if the file isn't found.
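As a minimal sketch of how the definition and the call fit together, the following console program wraps the test in a small helper. The class name StorageCheck and the Describe method are illustrative names rather than part of the downloadable code, and treating every negative result as a failure is an assumption based on the usual HRESULT convention.

```csharp
using System;
using System.Runtime.InteropServices;

static class StorageCheck
{
    // StgIsStorageFile lives in ole32.dll and takes a wide-character path.
    [DllImport("ole32.dll")]
    static extern int StgIsStorageFile(
        [MarshalAs(UnmanagedType.LPWStr)] string pwcsName);

    // Hypothetical helper: turns the raw result into a readable message.
    public static string Describe(int hr)
    {
        if (hr == 0) return "structured storage file";       // S_OK
        if (hr == 1) return "not a structured storage file"; // S_FALSE
        if (hr < 0)                                          // failure HRESULT
            return "failed with HRESULT 0x" + hr.ToString("X8");
        return "unexpected result " + hr;
    }

    static void Main(string[] args)
    {
        // Pass a real path on the command line; the default is a placeholder.
        string file = args.Length > 0 ? args[0] : @"path\Thumbs.db";
        int result = StgIsStorageFile(file);
        Console.WriteLine(file + ": " + Describe(result));
    }
}
```

Printing the failure code in hexadecimal makes it easy to compare against the STG_E_ values listed in the Windows headers.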

It should return 0 if you specify a Thumbs.db file that exists. Remember that Thumbs.db files are hidden system files, so you won't see them unless you use the Tools, Folder Options command to make them visible. You also have to deselect the "Do not cache thumbnails" option to make sure that Windows generates the Thumbs.db file in that directory.
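Because the file is hidden, it can be handy to confirm from code that a folder really contains one before testing it. The sketch below is one way this might be done using only System.IO; the class ThumbsLocator and its two helpers are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;

static class ThumbsLocator
{
    // Hypothetical helper: builds the expected cache path for a folder.
    public static string ThumbsPath(string folder)
    {
        return Path.Combine(folder, "Thumbs.db");
    }

    // True when the attributes mark the file as both hidden and system.
    public static bool IsHiddenSystem(FileAttributes attrs)
    {
        FileAttributes mask = FileAttributes.Hidden | FileAttributes.System;
        return (attrs & mask) == mask;
    }

    static void Main(string[] args)
    {
        string folder = args.Length > 0
            ? args[0]
            : Directory.GetCurrentDirectory();
        string path = ThumbsPath(folder);

        // File.Exists finds the file even though Explorer hides it.
        if (!File.Exists(path))
        {
            Console.WriteLine("No Thumbs.db in " + folder);
            return;
        }
        bool hidden = IsHiddenSystem(File.GetAttributes(path));
        Console.WriteLine(path + " hidden and system: " + hidden);
    }
}
```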

PInvoke the easy way

This is easy enough, but it is even easier if you know about PINVOKE.NET, a Wiki dedicated to providing a complete library of the DllImport, interface, constant and struct definitions needed to work with the Windows API from a managed environment.

All you have to do is enter the name of the API function, interface, constant or struct you are interested in and you will most often be presented with a ready-made definition.

Sometimes the definitions are clearly well used and tested, as evidenced by the availability of examples and user comments. Other, less well-trodden areas have just brief definitions and a plea to add some sample code and notes. In these areas you do have to read and check that the definitions make sense, although surprisingly they usually do. You might also want to change the way some are defined to fit in with how you want to use them. In short, this isn't a complete solution but it saves a huge amount of time, and if you do check that something works, please contribute to the Wiki.

It also raises the question of why Microsoft hasn't done as much to make PInvoke easier to use. In the rest of this article the definitions available at PINVOKE.NET are used, and only changes to, or specific uses of, parts of the definitions are detailed.

You can also see the entire code by downloading the ZIP file from the CodeBin.





Last Updated ( Thursday, 04 February 2010 )