Word to XML, Then and Now

The Scream by Edvard Munch

I was lucky that last month’s XML Philly meeting didn’t trigger my post-traumatic stress disorder. Quark’s presentation on their XML Author product took me back to the front lines, where I did something similar with Word and SGML over a dozen years ago.  Quark says it always produces valid XML for any schema.  I can testify that it’s no small feat if true:  Although Word now produces XML directly, it’s a generic schema that represents formatting, not semantics.  Wasn’t this the schema Microsoft wanted to patent as part of their contribution to “Open Standards”?  Anyway, this is still a hard problem with no obvious solution.

Their secret is that the plug-in completely replaces the implementation of the Word data model.  XML is always valid because users are always working in XML; there is no messy conversion between the flat, unstructured Word model and the deep, structured XML model.  What XML Author gets from Word is the familiar GUI and a clear list of features to support, like Track Changes.  In theory, this gets around several common XML acceptance problems:  Users don’t have to learn a new interface, and business owners don’t have to pay for two separate word processors on everybody’s desktop.

Both justifications fall apart under closer scrutiny. Authoring XML changes how users work due to structural requirements; in particular, cut-copy-paste between vanilla Word or different schemas requires skill and patience because of the always-on validation.  Although users won’t have extra icons on their desktops, the business will have to cough up significant licensing fees that will feel like having two separate, high-end products installed.  Quark was also pushing their professional services for getting things up and running–both an added cost and an indication that things aren’t as simple as they seem.

Then there’s the question that always comes up at these meetings:  What if you share XML documents with people outside your company? There might be something webbie in the future, but for now let’s not even go there.

We didn’t get a live demo of the product, and an acquaintance who evaluated it warns that it’s not ready for prime time if your business depends on complex XML or heavy-duty Word features.  I would also be wary of the product constantly lagging behind Word features because it is essentially a reverse-engineered product, and it’s an acquisition that Quark’s still trying to fit into its existing product line.  Still, it’s easier than trying to mimic, maintain, and synchronize XML structures in actual Word documents.  I have the scars to prove it.

The Origin of Species of Information

Happy Birthday Chuck!  You've given me so much!

Last night’s Philadelphia XML Users Group was a pleasant mix of the old and the new: Jim Caine of Jaquette Consulting revisited an earlier talk on content reuse that touched on DITA and Documentum among other things.

Named for today’s birthday boy, the Darwin Information Typing Architecture (DITA) is a simple XML application (in the XML sense) that models information around authoring units like topics and references instead of publishing units like documents and books.  It’s meant to be extensible (in the OO sense) rather than definitive.  Somehow DITA never crossed my path until a few months ago, but it represents another step towards the Grail of structured authoring/publishing that I worked on 15 years ago.
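For the curious, here’s roughly what one of those authoring units looks like.  The element names below come from DITA’s standard concept topic type; the id and content are invented for illustration.  A quick sketch using Python’s standard ElementTree:

```python
import xml.etree.ElementTree as ET

# A minimal DITA concept topic: a self-contained authoring unit,
# not a chapter or a book.  The id and text are made up.
topic_xml = """
<concept id="annuity-basics">
  <title>Annuity Basics</title>
  <shortdesc>What an annuity is and why it matters.</shortdesc>
  <conbody>
    <p>An annuity is a contract that pays out over time.</p>
  </conbody>
</concept>
"""

topic = ET.fromstring(topic_xml)
print(topic.get("id"), "-", topic.findtext("title"))
```

The point of the shape: everything a topic needs travels with it, so a map can assemble topics into a book, a course, or a flash card without touching the topics themselves.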

Jim’s project involved moving an insurance institute’s learning resources into a single repository and allowing it to create a variety of products (real books, eLearning, flash cards, etc.) from the same content.  The project started last year; Jim first presented on it back then, and this time he gave the group a look at how practice deviated from theory.  He did some really smart things to facilitate reuse, like referencing XML wrappers for external entities such as images, which allows reuse of both the data and its metadata.  Kudos to WordPress for a similar, albeit non-XML, approach to images and galleries.  I’ll post a link to his presentation when it hits the web.
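As I understand the wrapper trick (the element names below are my own guess at the idea, not Jim’s actual schema): instead of pointing at an image file directly, content points at a small XML wrapper that carries the file reference plus its metadata, so reusing the image automatically reuses the caption, rights, and so on.  A sketch:

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper document; names are illustrative only.
wrapper = ET.Element("image-wrapper", id="img-flood-chart")
ET.SubElement(wrapper, "file").text = "flood-chart.png"
meta = ET.SubElement(wrapper, "metadata")
ET.SubElement(meta, "caption").text = "Flood losses by year"
ET.SubElement(meta, "rights").text = "Institute-owned"

# Topics reference the wrapper's id; reuse the wrapper and the
# metadata comes along for free.
print(ET.tostring(wrapper, encoding="unicode"))
```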

Turns out that authoring structured content is still the hard part.  The original plan involved a Word plug-in to allow authors to create valid structured content from the very beginning.  This good idea hit some bumps because of vendor support issues, and it was the hardest conceptual change to make in the whole process.  Authors used to writing a single document now wrote up to a dozen separate learning objects, a subtype of topic.  Deja vu all over again.

A few actors in the content creation process have a very lively editorial cycle.  We’re talking major rewrites, not “you missed a comma here” kinds of things.  This wasn’t a problem back on RDMS: We dealt more with multiple authors and a review process than with the more traditional author/editor interaction going on here.  Even in legal review and approval, I’m used to all actors being subject matter experts, often getting more experty the further along in the lifecycle you go.  Not so in this case–and in publishing in general, I’d guess.

Here comes more deja-vu-all-over-again:  The plug-in couldn’t handle the actors’ heavy dependence on Word collaboration features like Track Changes.  It’s easy to get lulled into a false sense of security by an oh-so-pretty model for the final product of the authoring process.  That Emerald City architectural view of content hides all the information and processing necessary to get to that end.  This particular problem has sparked some heavy flirtation among authoring, wikis, and DITA in my head, just in time for Valentine’s Day.

Jim’s use of XML Applications (in the Documentum sense) worked well with DITA’s topics and maps.  No big surprise there, but the marriage of DITA maps and Documentum virtual documents came with the usual toilet-seat-down relationship problems, especially because of Webtop’s weak handling of virtual documents.  A post-editorial staff using XMetaL bears the brunt of the bickering, so authors are left to worry about intellectual property, not scaffolding, as it should be.

Most of my work lately has centered on document dumping grounds.  Records management, eDiscovery, and transactional content management don’t concern themselves with the processes of actually making content.  It was great to see what’s happening on the other side again, and I’ve been stupid for not attending this group sooner.  Such is the life of a freelancer.

One special note: The Users Group had brownies for Valentine’s Day.  Mmm, tasty!  I suggested that publicizing food at meetings might be some great marketing.  It might also require a bigger conference room for several reasons!


SEC Nomination for FINRA’s Schapiro

A former client may soon lose its top officer to the coalescing Obama administration.  Mary Schapiro, current CEO of the Financial Industry Regulatory Authority (FINRA), has been nominated to chair the Securities and Exchange Commission (SEC). Who is FINRA?

FINRA is the largest independent regulator for all securities firms doing business in the United States. We oversee nearly 5,000 brokerage firms, 172,000 branch offices and 665,000 registered securities representatives. Our chief role is to protect investors by maintaining the fairness of the U.S. capital markets. — finra.org Homepage

This is good news to me, just like Nobel-winning physicist Steven Chu’s appointment to head the Energy Department.  Shapiro promises to be a tough enforcer, something sorely lacking these last eight years.  I’d happily authorize her to use water boarding and extraordinary rendition on these billionaire market fundamentalists whose financial terrorism has left the entire world in chaos.

This may also be good news for FINRA.  I sincerely hope there’s a broader role for them in the post-market-fundamentalism finance ecology, regulating a wider range of investments and having a bigger stick by forwarding cases to a more aggressive SEC.  They should also get some of the funds being flung around to expand their capabilities.  That’s a better investment than dropping $25 billion on Bank of America, no questions asked.

BrilliantLeap – EMC Layoffs: Is there really no better alternative?

Ballmer and Tucci?

For those living under a rock …

BrilliantLeap – EMC Layoffs: Is there really no better alternative?

It’s hard to say what layoffs at EMC mean for Documentum.  Where does EMC expect the greatest shrinkage in revenue: hardware or software?  The lion’s share of their business is still disk, but I can come up with arguments both ways.  That won’t matter if the process is politically driven, which would be bad for smaller, acquired former companies like Documentum.

Here’s a horrible thought: Another way to improve the bottom line is to sell off assets.  Maybe we’ll finally see Documentum end up in the hands of Microsoft, a player that still has plenty of cash and a hunger for acquisitions.  EMC is big enough and the economy bad enough that I don’t see Microsoft going all boa constrictor and swallowing it whole, but it’s a strange time. Just rumor-mongering here, mind you.

Optimists out there might imagine EMC being inspired by Jason Calacanis and his well-reasoned preemptive layoffs at Mahalo.  It’s superficially apples and oranges, comparing a large public company to a small startup, but everybody needs vision and adaptability to survive the next 2-3 years.  Calacanis makes a solid case for his layoffs in “How to Handle Layoffs” by explaining it in terms of managing burn rate and building the core product.  Will EMC give us a similarly candid rationale?

OT: Aside from shrewd business advice, Calacanis often does a fabulous impression of John C. Dvorak on This Week in Tech.

A company that flails around to meet arbitrary expectations set by a market of speculators (instead of actual investors–a vanishing breed) is going to fail in these tough times.  Will EMC be another casualty of a knee-jerk response to the Market Fundamentalism at the heart of the current crisis?

Beer Is Never Off-Topic: The Case for RSS, Part 1


My Personal Favorite from DFH

A Perl Mongers tradition is that beer is never off-topic, but here’s a case where sudsy pleasure and geeky relevance collide.  I subscribe to Beer Advocate’s RSS feed and came across a particularly useful post:

Dogfish Head 2009 Beer Release Schedule – BeerAdvocate

Dogfish Head is a great craft brewer, and their brewpub in Gaithersburg is something I miss about working in Rockville, MD. Philly Beer Week (soon, soon) is the only time I’m tied into the beer scene enough to get news like this directly.  That’s why feeds from places like Beer Advocate and Joe Six Pack are so useful; the news comes to me rather than me having to go out and get it.  I hope DFH will follow my suggestion and publish their site’s news items as a feed, including posts about the schedule and as each beer becomes available.

UPDATE: Mariah from Dogfish tells me that RSS is already in the works. Please follow them on Twitter at dogfishbeer.  Also note that Twitter and RSS have common benefits and potential synergies to be discussed later.

I’m always championing RSS as TiVo for the Internet.  Living without RSS now is as unthinkable as living without cash machines or going back to a plain, non-smart mobile phone.  Go to one place (Google Reader for me, because it spans all of my devices) and everything is there waiting for me.  Think of it as a custom-built newspaper if you’re so Luddite that you’re wondering what TiVo is.

The core idea here is familiar to OOP geeks: the Observer Pattern.  It defines “a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically” (Design Patterns, GoF).  While RSS mechanics are a little different from the traditional implementation of the pattern, the intent is the same:  One thing changes, and many (only the interested, not necessarily all) know about it.
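For the non-OOP crowd, the pattern fits in a few lines of Python; the Feed/subscribe names below are my own, not from the book:

```python
class Feed:
    """Subject: holds state and notifies every registered observer on change."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        """Register an interested party; uninterested code never registers."""
        self._subscribers.append(callback)

    def publish(self, item):
        """One thing changes; all dependents are notified automatically."""
        for notify in self._subscribers:
            notify(item)


seen = []
feed = Feed()
feed.subscribe(seen.append)                              # observer one
feed.subscribe(lambda item: seen.append(item.upper()))   # observer two
feed.publish("new post")
# seen now holds ["new post", "NEW POST"]
```

The key property, for RSS and for the classic pattern alike: the publisher never needs to know who is listening, only how to announce.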

Companies can use RSS to deal with communication issues like email overload and collaboration site proliferation.  This isn’t a silver bullet to replace these technologies, but it can tame otherwise unruly beasts.  In subsequent posts I’m going to talk about using RSS as publisher, as subscriber, and then dive deeper into how RSS can make for a better communication architecture.
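As a teaser for the publisher side, here’s how little it takes to emit a minimal RSS 2.0 feed with Python’s standard library; the channel and items are invented examples, and a real feed would add dates, GUIDs, and descriptions per item:

```python
import xml.etree.ElementTree as ET

def build_feed(title, link, description, items):
    """Build a minimal RSS 2.0 document from (title, link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")

feed = build_feed(
    "Project Status", "http://example.com", "What changed and when",
    [("Server maintenance tonight", "http://example.com/news/1")],
)
```

Point that at an intranet URL and every interested reader’s aggregator does the rest; nobody else gets another email.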

10 Mobile Social Networks to Check Out – ReadWriteWeb

Wikipedia - Trilobite

Putting aside enough time to honor your New Year’s resolutions?  A good extra New Year’s resolution might be to avoid everything on this list from ReadWriteWeb:

10 Mobile Social Networks to Check Out – ReadWriteWeb 

The prevailing wisdom (folklore?) is that the value of a network is proportional to the square of the number of nodes: See Metcalfe’s law and the less-familiar, more optimistic Reed’s law.  That might be true of the potential of the entire network, but what about the value to a particular node, i.e., an individual?  Because mobile networks are geographically centered, they will have much smaller pockets of useful nodes due to population density and market popularity–think Orkut in Brazil.
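To see how fast these laws diverge, here’s a quick back-of-the-envelope in Python: Metcalfe counts possible pairwise links, while Reed counts possible subgroups of two or more members:

```python
def metcalfe(n):
    """Metcalfe's law: value ~ number of possible pairwise connections."""
    return n * (n - 1) // 2

def reed(n):
    """Reed's law: value ~ number of possible subgroups of size >= 2."""
    return 2 ** n - n - 1

# The whole network's potential explodes with n, but any one node
# only ever touches its own small pocket of useful neighbors.
for n in (10, 50):
    print(f"{n} nodes: {metcalfe(n)} links, {reed(n)} groups")
```

Ten nodes already give 45 links versus 1,013 groups, which is exactly why the optimists love Reed; my complaint is that neither number says anything about the handful of nodes actually useful to me.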

They’re all competing for the same thing–my time–with often-redundant features.  Cross-posting helps: This blog goes to LinkedIn and Twitter; Twitter, Facebook, and Brightkite cross-post to each other.  However, I often ignore cool features to cater to the least common denominator among my network of networks.  There’s also the occasional SNAFU, like when a blog bulk edit pushes dozens of bad posts to each.  Diminishing returns rapidly give way to value lost with too many active networks, cross-posting or not.

Right now we’re in the Cambrian Explosion of social networking; such booms rarely end well for the participants.  How many of ReadWriteWeb’s 10 will only be remembered in the fossil record of such articles?

Like Comparing Apples and Content Addressable Storage Arrays

Don’t blame me!  Brilliant Leap baited me into talking about Apple with her post and subsequent tweet about Rob Enderle’s article in Enterprise Storage Forum: Apple Could Learn A Lot From EMC.  Oh?

Let’s deal with the underlying issue here:  The ESF article is talking about the other 99% of EMC that bought Documentum to sell disk.  EMC friends, I’m just kidding!  I can joke, right?  Since storage is not my expertise beyond some hacking around with Centera as a primary content store, I suppose EMC might really knock their customers’ socks off in the storage arena, but I doubt it.

Network disk isn’t something you see until it isn’t there, just like any good support technology.  It’s not sexy.  Apple’s products are shameless show-boaters meant to hog the spotlight.  They are meant to be seen, to be touched, and–dare I admit–even licked. Maybe that’s not a recommended way to unlock an iPhone, but what else can you do about winter, thick gloves, and a touch screen?

Would the average EMC storage user know the EMC logo on sight?  Would the average Apple user?

The Apple and EMC logos

OK, so the EMC logo actually says “EMC” in it.  You get the point though, right?

Enderle’s article talks about quality, metrics, and customer loyalty.  All those things are important to Apple, although the often-excellent quality of Apple products is regularly marred by things like incendiary power supplies and the worst product launch ever: iPhone 3G + MobileMe.  WORST!  LAUNCH!  EVER!

Only Starbucks matches Apple’s skill at selling Lifestyle.  The synergy of Apple’s well-designed, well-integrated components had this caged bird singing gaily a few posts ago, despite the healthy fear of monoculture that comes with a science background.  That all misses the point, the one reason the whole discussion is apples and oranges:  Innovation.

Apple sets the bar for technology after technology:  operating systems, mp3 players, online music retailing, and of course smart phones.  The integration, the cool, the marketing are all icing on the cake because Apple does something better than anybody else:  They innovate, and they do it where they can define the market rather than chase it.

I don’t think Big Disk lives or dies on such radical innovation. In fact, their customers probably fear change more than most.  Change is not good for 24/7 availability.  I can hear the compliance officers, archivists, and system admins shrieking in terror at the thought of something that might as likely store their entire repositories on a postage stamp or burst into flames if looked at the wrong way.

There is a relentless integrity of concept, simplicity, and message spanning all Apple products that likely has a single source, Steve Jobs.  Such single-mindedness is what makes big, ambitious, risky, not-for-the-faint-of-heart products succeed or fail spectacularly.  Apple’s done both regularly.  It allows org-wide turn-on-a-dime changes, something that another industry titan *cough*Bill Gates*cough* executed brilliantly after completely missing the Internet as the Next Big Thing.

That conceptual integrity, that vision from the top is also why Apple clung to its single-button mouse a decade too long and why the iPod Touch and iThingThatWillNotBeNamed are missing the aesthetically unpleasing extra two or three buttons needed for touch-without-sight operation.  Because that’s how Steve Jobs sees it, end of story.

I’m sure Joe could give Steve a few helpful hints on running disk farms for MobileMe or handling eDiscovery for the next options scandal, but that’s not the point.  What matters is what Jobs teaches his successor, and whether that successor has the Right Stuff to wield Apple as a single instrument of innovation, lest Apple repeat the recent catastrophes of their rivals to the North.