I was staring at the impossible in early 2004: Two folders named “Reports” in the same place. That’s just not supposed to happen. (Refresh your memory on why with Part I.) After some idql and iapi snooping, here were the facts:
- Both folders did indeed have identical object names without leading or trailing whitespace.
- They did NOT have any values in common between their i_folder_ids, so they weren’t really in the same place. But Desktop Client though otherwise.
- One had r_folder_path values with trailing whitespace in one of its containing folders.
e.g. “/cabinet/folder1 /folder2/folder3” — Hard to eyeball the space after “folder1” unless you’re looking for it!
After that, I couldn’t reproduce the problem. This docbase was well known for having “issues”, so nobody (including Documentum) was surprised that something impossible and inexplicable was happening. I mentally filed it under “Documentum moving in mysterious ways”, gave the business administrators a tedious but effective work-around, and got back to more urgent concerns.
The next year I ran into this problem again from the engineering side instead of the business unit. As a particular project wrapped up or budget ran out, the company passed me around to different groups. I’d worked in engineering back in 2001, and my knowledge of the company’s custom environment made letting me go difficult.
I ended up helping out with the 4.2.8/5.2.5 upgrades by developing a triage procedure and fixing some of the more dire problems. Ever see those pictures of tumorous, sooty lungs the Cancer Society uses to scare people away from cigarettes? That’s what some of these docbases looked like on the inside. Not fun. Not pretty.
After triaging a hundred docbases, I discovered some of them also had corrupt r_folder_paths, although in smaller numbers than the original-problem docbase. Thankfully it didn’t interfere with the upgrade procedure. I made a few more notes before continuing onto my next engineering task, the arduous quest to get Documentum to fix the horribly-broken-under-Sybase content replication process.
Another year passed, I was passed along to another business unit, and a new folder-related problem emerged. Business users (in webtop) reported empty folders and missing documents. The document management group (in desktop client) reported the documents were right where they filed them. If you guessed those empty folders had corrupt r_folder_paths, you were right. Webtop uses a slightly different query to determine folder contents than desktop client and as a result couldn’t “see” the contents of such a corrupt folder.
There Will Be Pain
You may want to take some ibuprofen before continuing …
This particular client as a rule doesn’t allow anybody to talk directly to the Sybase DBAs. Don’t ask why. Healthy docbases require attentive, knowledgeable DBAs working with developers and docbase administrators. I would probably have figured this problem out by year one if I’d had that kind of healthy relationship, but I didn’t.
Some suggested I master Sybase. Frankly, that’s wasted effort since it’s such a small part of Documentum’s market. If this were Oracle or SQL Server, sure. Thing is, the client desperately needs DBAs cross-trained in Documentum if they were going to tame the growing performance and stability problems. Despite getting a few people in management to shake their heads yes when I said this, nothing ever came of it. Sigh.
So I found the next-best thing to a current Sybase DBA, an ex-DBA in my part of the organization. I started asking questions, and he casually mentioned the “obvious” fact that the Sybase API strips all leading and trailing whitespace from VARCHARs passed to it. I was dumbfounded.
Call me a purist, but this is data corruption by design. If I programmatically pass a string to a database, I expect the exact same string back when I ask for it. That’s just common sense–unless you live on Planet Sybase–but it didn’t really explain the problems.
Most of the affected docbases either had custom scripts or Input Accel adding content, and they all had several layers of folders that related to content metadata. This was the second big clue; both methods of adding content had to sometimes create several layers of new folders to file the first document for a particular client, counterparty, or deal. That’s still didn’t explain it all.
The DMCL caches metadata for newly-created, fetched, and modified objects. That includes folders. It turns out that the cache doesn’t automatically refresh from the database after a save. Now we have the three components needed for the impossible to happen. Let’s walk through the process.
- Open iapi and connect to a docbase.
- Create a new folder “B ” whose name has a space at the end (OCR, bad typing, etc.), link it into cabinet “A”, and save it. The cache has object_name as “B ” and r_folder_path as “/A/B “. Sybase stores them as “A” and “/A/B”.
- Create another new folder “C” in the same session, link it to “B”, and save it. It fetches “B”‘s r_folder_path from the cache and generates its own r_folder_path as “/A/B /C”. Note the space after “B”, and also note that sybase doesn’t have anything to trim because the space is within the string.
- Put some stuff in “C” and close this session.
- Navigate down through the new folders in desktop client and “C” has content.
- Navigate down through the new folders in webtop and “C” appears empty.
Both client applications use “r_folder_path” in the query to find a folder’s content. One remembers and builds the path with some whitespace trimming as it goes along, the other uses literal values from r_folder_path. Can you figure out which is which? That difference is the root of the disappearing content mystery.
The duplicate folders mystery arises from the fact that the DMCL won’t remove corrupt r_folder_path entries when relinking a parent folder triggers a cascade update on its children. That junk stayed in the r_folder_path of the moved “Reports” folder but didn’t block scripts from creating another “Reports” folder there since the validation is on i_folder_id. Both end up with the same corrupt path, as would a third after moving the second, and so on.
There you go. A combination of the DMCL cache, Sybase trimming whitespace, a DMCL application building several levels of folders, one of the non-leaf folders having trailing whitespace, and relinking predecessor folders causes data corruption that manifests itself in several different ways. Fighting other fires, Sybase strangeness, and the improbability of it all kept everybody guessing for years.
By the way, I couldn’t retrieve the API code from the client that demonstrates the problem, but savvy readers should be able to reproduce it.
What should developers do about this?
In custom DMCL code, either trim the whitespace yourself before setting the folder’s name or do a revert/refresh on each folder object right after saving it. I seem to recall that was advice often ignored in the early days of DMCL programming.
In Input Accel, if you can’t hack it with VB or something to do the same, it’s time for a dm_job running dm_fix_folders as often as necessary. Running it regularly would be a good thing for everybody to do actually.
I think the DFC didn’t manifest the problem, and that may explain why I couldn’t reproduce the problem with webtop or Desktop Client–that or those apps trim whitespace along with other client-side validations.
What should Documentum and Sybase do about this?
It’s probably unreasonable to expect Sybase to simply stop trimming whitespace since it may break many systems depending on the behavior. They could however add configuration options to turn off trimming at the database and session levels. Then Documentum could provide options for both on their end, maybe in server.ini. Some installations may depend on standard Sybase behavior when sharing or exposing registered tables to other Sybase instances, so a session option would be polite.
Until and unless Sybase gives the docbase server or DBA some way to deactivate the offending behavior, Documentum could do the same things a DMCL programmer would do explicitly in the DMCL. I still can’t stand the idea of an API trimming whitespace, so I’d prefer they force a cache update on a folder object after a save. The representation in the persistent store should take precedence over a cache to assure consistency with other processes that may have pulled a fresh copy of their own.
Does SQL Server have the same problem?
I never got around to testing SQL Server. Being a Sybase derivative, I wonder. Oracle definitely doesn’t have the problem, but I’d appreciate if somebody with a SQL Server docbase would test this out and post a comment.