I’ve been away from this blog for a while, busy on projects for clients.
I learned something on one of these projects that I thought was worth sharing.
In a nutshell
Importing BIG files into Documentum, or exporting them from it, is a challenge, but you can work around the out-of-the-box limits.
Here is what happened
I was asked by a client with an existing Documentum system to help them with document import/export. They were unhappy with the solution that the previous contractor had built using TaskSpace and UCF. They complained that imports often failed. They also wanted to add the ability for external systems to automatically import and export documents.
I asked about the kinds of documents they were storing, and they turned out to be somewhat atypical for a Documentum system. In my experience most Documentum systems are filled with documents of kilobytes to megabytes in size, with 1 GB being considered very big. For my customer, most files were between 10 and 50 GB, with some as big as 500 GB. That’s BIG.
Documentum has no problem storing files of that size. The challenge is in getting the files from the client to the server and back.
Since they were asking for import/export functionality for interactive clients as well as back-end integration with other systems, I proposed creating a web service using Documentum Foundation Services (DFS), Documentum’s web services framework.
Now DFS has several options for content transfer:
- Base64: The content is embedded directly in the reply message to the web service client. This is the easiest option, but also the most restrictive; it is only advisable for very small content files.
- UCF: This is Documentum’s proprietary content transfer method. It has many cool features for XML files, virtual documents and such, but it had proven unreliable in my customer’s environment with the BIG files they have.
- MTOM: The Message Transmission Optimization Mechanism is a W3C standard specifically designed to send binary data reliably in SOAP web service calls.
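To illustrate why Base64 embedding is only advisable for small files: the encoding inflates the payload by roughly a third, and the whole encoded message has to be built and parsed in memory on both ends. A quick sketch (plain `java.util.Base64`, not Documentum-specific code):

```java
import java.util.Base64;

public class Base64Overhead {
    public static void main(String[] args) {
        // One megabyte of raw content...
        byte[] content = new byte[1_000_000];

        // ...becomes about a third larger once encoded as Base64 text,
        // and the entire string sits in memory while the SOAP message
        // is assembled. Scale this to a 50 GB file and it is obvious
        // why Base64 is off the table.
        String encoded = Base64.getEncoder().encodeToString(content);
        System.out.println(encoded.length()); // 1333336, i.e. ~33% overhead
    }
}
```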
MTOM looked promising, but I had run into limits using MTOM for big files in a previous project. When exporting several big files simultaneously, the application server running the web services would run into Java memory issues. That previous project had considered 10 MB big, so we were sure to hit the same limits here.
I solved this by cutting the content transfer up into pieces.
Exporting a file now goes like this:
- The web service client starts an export by specifying which file it wants to receive. The web service returns an export token (a unique ID for this export request).
- The web service client then calls the web service again, supplying the token and the maximum number of bytes it wishes to receive (the default being 1 MB). The web service returns that part of the content file using MTOM.
- The web service client keeps calling until the full content file is transferred.
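The steps above can be sketched as follows. This is a self-contained simulation of the protocol, not the actual DFS service interface; the class and method names (`startExport`, `nextChunk`) are illustrative assumptions, and the "file" is an in-memory byte array standing in for content streamed from the repository:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.UUID;

public class ChunkedExport {

    // Server side: each export token maps to the content and a read offset.
    static class ExportSession {
        final byte[] content;
        int offset = 0;
        ExportSession(byte[] content) { this.content = content; }
    }

    private final Map<String, ExportSession> sessions = new HashMap<>();

    // Step 1: the client starts an export; the service returns a token
    // (a unique ID for this export request).
    public String startExport(byte[] content) {
        String token = UUID.randomUUID().toString();
        sessions.put(token, new ExportSession(content));
        return token;
    }

    // Step 2: the client asks for the next chunk, up to maxBytes.
    // Returns null once the whole file has been transferred.
    public byte[] nextChunk(String token, int maxBytes) {
        ExportSession s = sessions.get(token);
        if (s == null || s.offset >= s.content.length) return null;
        int end = Math.min(s.offset + maxBytes, s.content.length);
        byte[] chunk = Arrays.copyOfRange(s.content, s.offset, end);
        s.offset = end;
        return chunk;
    }

    public static void main(String[] args) throws Exception {
        ChunkedExport service = new ChunkedExport();
        byte[] file = new byte[2_500_000]; // pretend this is a big document
        new Random(42).nextBytes(file);

        // Step 3: the client keeps calling until the full file arrives.
        String token = service.startExport(file);
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        byte[] chunk;
        while ((chunk = service.nextChunk(token, 1_000_000)) != null) {
            received.write(chunk); // in reality: append to a local file
        }
        System.out.println(Arrays.equals(file, received.toByteArray()));
    }
}
```

Because each call only ever holds one chunk in memory, the app server's heap usage stays bounded no matter how large the file is, and a failed call can simply be retried for that chunk.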
This very simple protocol turned out to work like a charm, even when simultaneously transferring files of many GB. We did advise the client to use a separate DFS server machine, so the Documentum Content Server is not congested by all the disk and network traffic the big files cause, and TaskSpace keeps running smoothly for the users.
For the interactive clients we used one more trick so they could use the new export/import web service.
Normally you would have a component on the TaskSpace application server that acts as a web service client, but then the content would be sent to the application server, which would in turn send it to the user’s browser. The big files would cross the network twice, causing unnecessary delays.
Documentum has a feature called Accelerated Content Services (ACS), but we could not use it in this project. We did, however, find a way to get the content from the DFS server directly to the user’s browser.
Let me know what you think.