On a current project we’re integrating Documentum 6.5 and SharePoint 2010 through the EMC product EDRSMS 6.6, which is short for EMC Documentum Repository Services for Microsoft SharePoint. That is quite a mouthful, so on the project we just call it Repository Services.
As part of the project, we’ve been educating the client project members on how this integration works. In this blog post, we’ll share some of that knowledge, including thoughts on how to make the integration work in a global implementation of SharePoint that is connected to a central instance of Documentum.
In itself that is not such a rare situation. Typically, the Documentum content server is implemented in a central location after careful consideration, and additional measures can be taken to speed things up at the global hubs. One of them is using Branch Office Caching Servers (BOCS). The catch is that BOCS can’t be used as part of the Repository Services solution. In the typical usage scenario that wouldn’t make much sense anyway: SharePoint users work in and with SharePoint, and somewhere deep down the data is not stored in a SharePoint Content Database (as part of the SQL Server instance or cluster) but cached and partially forwarded into Documentum. More on this later.
With SharePoint the story varies a little more. Some say it grows organically from a departmental implementation to corporate level. Some start with a central corporate instance and spin out to all regions. The stories about well-thought-out designs for farms, site collections and content types somehow don’t surface where I can see them. Maybe I’m looking in the wrong direction. Then again, from talking to colleagues, project members and others in the Documentum-SharePoint playground, it appears as if this is an afterthought that is getting more and more attention. The point is: when you start using Repository Services, you had better think about it beforehand!
Why do you have to think carefully about the SharePoint farms when using Repository Services?
Imagine that you need content from SharePoint that has been journaled (meaning: moved) into Documentum and is no longer directly available to SharePoint. Repository Services takes care of retrieving that content from Documentum and making it available to SharePoint. It does so by putting it in a temporary cache. This temporary cache is a location in the file system that only exists at farm level.
Meaning: for the whole farm, there is just one (1).
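To make that concrete, here is a minimal conceptual sketch in Python. It is not the Repository Services API – the cache path, the function names and the retrieval callback are all made up for illustration – but it shows what a single farm-level temporary cache implies: every retrieval of journaled content resolves against one file-system location.

```python
import os

# Illustrative sketch only -- not the Repository Services API or its actual cache layout.
# One temporary cache location exists for the entire farm.
FARM_TEMP_CACHE = r"\\central-filer\rs-temp-cache"  # hypothetical path next to the farm

def get_journaled_content(doc_id, fetch_from_documentum):
    """Return a local path for journaled content, pulling it from the central
    Documentum repository into the single farm-level temporary cache if needed."""
    cached_path = os.path.join(FARM_TEMP_CACHE, doc_id)
    if not os.path.exists(cached_path):
        # The content always lands here, next to the farm --
        # never in a location near the user who asked for it.
        content = fetch_from_documentum(doc_id)
        with open(cached_path, "wb") as handle:
            handle.write(content)
    return cached_path
```

Whoever requests the content, and from wherever, the retrieval resolves against that one path; only the location of the farm determines how far the content has to travel before SharePoint can serve it.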
Now imagine a farm located in New York that also serves users in Sydney, Bangalore, London and Chicago. They probably don’t care, since they’re on the global high-performance network between these hubs. Now imagine users in Wellington (NZ), Vancouver and Madrid. They might not be such happy users, because their content needs to come from downtown New York even if that content is only created and used by them.
Of course, more factors apply, but the key is: think about multiple farms when content needs to travel around the globe while users are only working on it in a single country or region.
Why do you have to think carefully about the SharePoint Site Collections when using Repository Services?
There are two simple reasons:
1. The current version of Repository Services is granular up to the level of a Site Collection. This means that a complete Site Collection is either under control of Repository Services or not. The effect is that even if you only need the content of one single site to be journaled into Documentum, all content of the Site Collection is moved out of the SharePoint Content Database. The content that does not need to be journaled into Documentum is kept in the Performance Cache, a location in the file system.
2. There is only a single Performance Cache for each Site Collection; both points are sketched below.
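Again a purely illustrative Python model – not the Repository Services object model, and the names and paths are invented – just to show the combined effect of the two points above: once a Site Collection is under Repository Services control, all of its content leaves the Content Database and ends up in that collection’s single Performance Cache location.

```python
from dataclasses import dataclass, field

# Illustrative model only -- not the Repository Services object model.

@dataclass
class SiteCollection:
    name: str
    performance_cache_path: str                    # exactly one cache location per Site Collection
    documents: dict = field(default_factory=dict)  # doc id -> content, standing in for the Content Database

def put_under_repository_services(collection: SiteCollection) -> dict:
    """Model of the Site Collection-level granularity: once the collection is under
    Repository Services control, *all* of its content leaves the SharePoint Content
    Database, even if only one site's documents ever need to be journaled."""
    performance_cache = dict(collection.documents)  # everything moves, not a subset
    collection.documents.clear()                    # nothing stays behind in the Content Database
    return performance_cache                        # lives at collection.performance_cache_path

hr = SiteCollection("HR", r"\\central-filer\rs-perf-cache\hr",
                    {"policy.docx": b"...", "picnic-photos.zip": b"..."})
cache = put_under_repository_services(hr)           # both documents end up in the cache,
                                                    # including content never meant for Documentum
```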
Again, imagine the global usage scenario as explained before. SharePoint allows you to have multiple Content Databases, and you can force a Site Collection to be stored in a particular Content Database (for example, by specifying the target Content Database when creating the site collection with the New-SPSite cmdlet). As part of a SQL Server cluster, you could bring the content near the user by creating regional Content Databases.
With Repository Services, the content is not stored in the regional Content Database but in the Performance Cache of the Site Collection, a location in the file system that can be anywhere on the globe.
The key is that, just as with multiple farms, you have to consider using multiple Site Collections and decide which sites – and thus which user populations – become part of which Site Collection. Creating regional Site Collections allows you to have regional Performance Caches and keep the content as close to the user as possible.
Another key consideration: put content that can live on its own, and doesn’t need to be under control of Repository Services, in a separate Site Collection.
This may have some implications for the user experience (users have to work with distinct Site Collections), but then again, you may not be too keen on large volumes of content sitting in the Performance Cache that will never be journaled into Documentum.
It’s not a matter of right or wrong. It’s a matter of making a well-thought-out decision.
Why do you have to think carefully about the SharePoint content types when using Repository Services?
By definition (MSDN), a content type is a reusable collection of metadata (columns), workflow, behavior, and other settings for a category of items or documents in a Microsoft SharePoint Foundation 2010 list or document library. Content types enable you to manage the settings for a category of information in a centralized, reusable way.
When working with Repository Services, the metadata bit is what we’re looking for.
This may sound trivial, but way too often modifications to a document library in SharePoint are not based on the definition of content types. Good practice in document management is to first think about the information (attributes, properties, metadata – you name it) you need to capture about content as part of your data model, before you think about how it is presented in your presentation layer, in this case a SharePoint document library.
The key is to think with a document management mindset.
As said, with Repository Services content is journaled into Documentum. This is done via Journaling rules, and these rules select content based on… metadata values! So, even though all content of a Site Collection is moved into the Performance Cache, only the content that is selected by a Journaling rule is moved into Documentum. It goes without saying that the more metadata there is to select on, the more granularly content can be selected for journaling.
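To illustrate the principle – and only the principle, since the actual Journaling rules are defined in the Repository Services configuration rather than in code, and the column names below are made up – here is a small Python sketch of metadata-driven selection:

```python
# Illustrative only: Repository Services defines Journaling rules in its own
# configuration, not in Python. This merely shows the principle of selecting
# content for journaling by metadata values.

performance_cache = [
    {"Name": "contract-0815.docx", "ContentType": "Contract",        "Region": "EMEA"},
    {"Name": "minutes-feb.docx",   "ContentType": "Meeting Minutes", "Region": "EMEA"},
    {"Name": "contract-4711.docx", "ContentType": "Contract",        "Region": "APAC"},
]

def journaling_rule(doc):
    """Hypothetical rule: journal all contracts into Documentum."""
    return doc["ContentType"] == "Contract"

journaled_to_documentum = [d for d in performance_cache if journaling_rule(d)]
stays_in_performance_cache = [d for d in performance_cache if not journaling_rule(d)]

print([d["Name"] for d in journaled_to_documentum])     # the two contracts
print([d["Name"] for d in stays_in_performance_cache])  # the meeting minutes stay cached
```

The richer the content type – the more columns it defines – the finer the distinctions a rule like this can make, which is exactly why content type design deserves attention up front.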
Next to the actual content, a convenience copy of the metadata in XML format is also stored in Documentum. Although this is just a copy (changes are not propagated back into SharePoint), it can be used for further actions inside Documentum. Think of a scenario where you need to put content under Records Management: the basic metadata that resides as attributes on the Documentum doctype for this journaled content is limited and may not be enough to make the proper RM decisions.
The key is to consider the usage of the content well beyond the moment it has left SharePoint, and to reflect those requirements in the SharePoint content type.
In summary.
Using Repository Services to bridge the SharePoint and Documentum worlds can give you a great user experience (SharePoint) and great document management (Documentum). It can also give you headaches if you forget about document management core values and information architecture design.
And if you have such a headache… there is aspirin available.
Ed Steenhoek
ECM Solution Principal