Blog

Serious Software Engineer

Adam Ant

The latest post on Brilliant Leap asks “What is a developer anyway?” Wikipedia has some interesting things to say about programmers and software developers; the article on software engineering says this:

The term software engineer is used very liberally in the corporate world. Very few of the practicing software engineers actually hold Software Engineering degrees from accredited universities. In fact, according to the Association for Computing Machinery, “most people who now function in the U.S. as serious software engineers have degrees in computer science, not in software engineering.”

I call myself a software engineer (when not compelled to bear the also-liberally-used title “Software Architect”) and have a degree in Computer Science. Do I function as a serious software engineer? Me, serious? I might be desperate, but I’m not serious. Let’s save the semantic cage match between programmers, engineers, and architects for another time.

The Brilliant Leap article is more concerned with commerce than convention: What do customers expect to get from a software developer, and how much are they willing to pay? After a few decades, the business world still can’t gauge developers on individual merits. They’re still fumbling around with college degrees, certifications, and “reputable” consulting firms, none of which assures top talent or good fits. The problem may no longer be just a search issue.

Welcome to the decline and fall of the IT developer. IT software development is a costly support function; it may be important to the business, but so are fax machines and photocopiers. Businesses love to reduce or eliminate costs, and it’s easy for a CEO cruising at 40,000 feet above actual business processes to see anything not directly contributing to the bottom line as taking away from it. The lack of metrics for assessing the bottom-line benefit of new software systems only bolsters the impression that IT is cost without profit.

Goose That Laid the Golden Eggs

Many companies actually make their money by selling code or silicon. Developers are the golden-egg-laying geese for companies like Apple, Google, and Bioware. These companies thrive on innovation and creativity, something that generates a real market for talent. Such companies also draw talent directly to themselves. Like many of my coworkers at Commodore, I was drawn to the company by being an enthusiastic user first. That, a little luck, and a good friend led to the best three years of my professional life. Maybe not the highest paying, but imagine that you work at Apple and somebody whips out their iPod Touch and starts raving about it. You contributed some part, however small, to something that is inherently cool and generates real enthusiasm–fanaticism even. How do you think that would feel? Job satisfaction can be such a wonderful thing.

I’ve noticed some trends over the last few years that support the notion of a feedback loop causing the decline and fall of the IT developer, especially in the Documentum world. Each one deserves some serious discussion, so I’ll elaborate on each in separate posts:

  • Fewer interesting projects
  • More mediocre developers
  • Poor management decisions

Back on the Market in March

I’ve decided I’m in no rush back to work because of a side effect of being a W-2 contract employee. A W2CE doesn’t submit quarterly tax estimates; instead, the withholding on each paycheck extrapolates my yearly income from that period’s gross pay. This also leads to the annual fleecing of the bonus many wage slaves experience: you’re taxed on that period’s earnings and bonus as if you made that much every single paycheck. It’s actually a pretty clever way to avoid withholding too little, because it’s based on actuals instead of estimates. (I’ve been hanging around with accountants too much lately.) My actual annual income is always lower because of unpaid time off, like being between contracts. The result this year is a fat refund because I took the last three months of 2007 off. Half of that goes to the mortgage, the other half to my discretionary fund.
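The annualizing math is simple enough to sketch. This toy version assumes a single flat bracket and made-up numbers (real IRS withholding tables are bracketed and change yearly), but it shows why the paycheck with the bonus in it gets hammered:

```python
# Simplified per-paycheck withholding: annualize this period's gross,
# tax it at a flat rate above a deduction, then divide back down.
# All numbers here are illustrative, not actual tax figures.
PERIODS_PER_YEAR = 26  # biweekly pay

def withholding(gross_this_period, deduction=8000, rate=0.25):
    """Withhold as if this period's gross were earned every period."""
    annualized = gross_this_period * PERIODS_PER_YEAR
    annual_tax = max(0.0, (annualized - deduction) * rate)
    return annual_tax / PERIODS_PER_YEAR

regular = withholding(4000)            # a normal paycheck
bonus_period = withholding(4000 + 10000)  # bonus lands in one period
```

The bonus period is withheld as though $14,000 arrived every paycheck all year, so its effective rate is higher than 3.5× a regular check–and any unpaid time off later in the year turns that over-withholding into a refund.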

With a little belt-tightening, that boost to the discretionary fund means more time off and a few road trips I’ve been meaning to take. Going to SF this week, reconnecting with friends out there, has me thinking about traveling along the East Coast to catch up with more friends and spend quality time with the Civic and my mobile technologies. If this unseasonably warm weather continues, I may even temporarily lift my “not above 72nd Street in Manhattan” winter prohibition after that oh-so-cold winter in Marlborough, MA.

My domestic agenda should be clear by the end of February. I’ll ramp up the search in March; something starting after 16 March 2008 would be ideal. I also have plenty of draft posts to finish, like part 2 of the Duplicate Folders mystery. I’m trying to secure the original test code for that article before posting it. Maybe a kind soul will run it against SQL Server and we’ll learn if Microsoft snipped out some junk DNA inherited from its common ancestor with modern-day Sybase.

I say “modern-day” with tongue firmly in cheek because I think of Sybase as the Neanderthal database that just doesn’t know it’s extinct yet. To be fair, my opinion is about Documentum running atop Sybase; Sybase itself might be fine for some things. Some people think there’s value there and even wonder why Sun bought MySQL instead of Sybase [Computer World: Why Sun should’ve – and probably could’ve – bought Sybase]. I’m just glad Sybase isn’t my problem anymore.

Hell Freezes Over As I Update My Resume

Whip It!

I’ve enjoyed my time off, but it’s time to get back out there–after I get back on the 12th from a birthday trip to San Francisco, of course. A certain whip-cracking recruiter made me update my resume (pdf) in the middle of the night. The result isn’t pretty, but it stopped the beatings. I hate writing resumes, and I keep thinking I need to solve the underlying mechanical problems with some kind of technology and document management know-how.

The first problem is targeting. I’m not exactly a young pup anymore, and my updated, pared-down resume is still seven pages. When doing the hiring, I hate long resumes. Anything over three pages starts with a strike against it. Putting a resume on Jenny Craig by targeting (industry, technology, project type, etc.) helps the reader concentrate on relevant facts, something the whip-cracker emphasizes with supersonic booms. Thing is, it’s hard to know when some seemingly unrelated item will interest or inspire a reader. While I agree with the whip-cracker about targeted resumes, I think it’s important to provide a way for the curious to find out more on their own.

The second problem is repurposing. Some people like pretty resumes printed on fancy paper. Some companies require submitting your whole life in a single plain-text box on a website. Some unscrupulous sorts will take an electronic resume, doctor it up, and misrepresent themselves as your agent. Some resume formats play better to certain industries or nationalities. Even if you’ve narrowed down your content to just what the audience wants to see, how you present it is at least as important.

A Perl-monger friend makes an interesting case for making code look pretty (visually well-organized), and it applies to resumes as well. He says that the visual parts of our brain have had much more time to evolve than the parts responsible for language, reading, and symbolic processing. A well-formatted piece of code can convey plenty of meaning just by how it looks, without all the overhead of reading words and constructing that pretty picture in your head. My favorite personal examples of this are “make versus ant” and “DTD versus XML Schema”:

XML Schema is vastly superior to DTDs for expressing complex document structure; data typing and constraints alone are worth their weight in golden angle brackets. For Java development, the same is arguably true for ant over make. The thing is, XML Schema and ant files are barely human readable: the hierarchical structure doesn’t convey any useful context–there’s no grammar or narrative like you’d see in a makefile or DTD. You have to read the tags and figure out their “part of speech” (operator, expression, value, etc.), which entails a much higher cognitive load. All that increased capability comes at a price in human readability. Compare this ant build file from an Apache tutorial:

<project>
	<target name="clean">
		<delete dir="build"/>
	</target>
	<target name="compile">
		<mkdir dir="build/classes"/>
		<javac srcdir="src" destdir="build/classes"/>
	</target>
	<target name="jar">
		<mkdir dir="build/jar"/>
		<jar destfile="build/jar/HelloWorld.jar" basedir="build/classes">
			<manifest>
				<attribute name="Main-Class" value="oata.HelloWorld"/>
			</manifest>
		</jar>
	</target>
	<target name="run">
	<java jar="build/jar/HelloWorld.jar" fork="true"/>
	</target>
</project>

To this hackish, slapdash makefile I just created in vi(m):

BUILDDIR=./build
CLASSPATH=./build

compile: ${BUILDDIR} HelloWorld.java
	javac -d "${BUILDDIR}" HelloWorld.java

${BUILDDIR}:
	mkdir ${BUILDDIR}

run:
	java -cp "${CLASSPATH}" HelloWorld

jar:
	cd ${BUILDDIR}; jar cvf HelloWorld.jar *

clean:
	rm -rf ${BUILDDIR}

Even though we’re on ant’s home turf, the makefile is easier to understand at a glance if you know a little make grammar. They don’t do exactly the same thing, but I hacked this up in five minutes. There are lots of hidden benefits to ant understanding Java intimately; the downside is that most developers can remain ignorant of what’s really going on. The makefile is easier to understand and faster to write, while the ant file is much more powerful but is best created in an IDE that groks ant.

I only use XML Schema and ant when I have tools that take those unreadable piles of markup and translate them into something more visually organized and pleasing. For quick-and-dirty projects that I need to roll by hand from the shell prompt, make and DTD are faster to write and easier to debug. Several hundred million years of evolution behind recognizing shapes and colors trumps a few hundred thousand (at best) behind linguistics and abstract thinking. Project complexity plays a part here too–simpler tools and languages for simpler problems. If I need constraints in my XML, for instance, then it has to be XML Schema.

So I’ve been thinking for a few years now that my resume’s grown to a level of complexity where a single resume, even a small set of targeted resumes, just doesn’t work anymore. I can envisage a web application over a database (relational or XML would do fine) that breaks down and tags all the content of my resume in a way that allows easy targeting and repurposing. It’s not all that different from what we did for international drug submissions in RDMS, just with much smaller documents going in and coming out.
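To make the idea concrete, here’s a toy sketch of that tagging scheme. Everything in it–the field names, the tags, the filter–is illustrative, not a design:

```python
# Toy resume database: each entry carries a tag set (industry,
# technology, project type); a targeted resume is just a filtered view.
resume = [
    {"role": "Software Engineer", "tags": {"documentum", "pharma", "java"}},
    {"role": "Consultant",        "tags": {"documentum", "finance"}},
    {"role": "Game Programmer",   "tags": {"c", "commodore"}},
]

def target(entries, wanted):
    """Keep entries sharing at least one tag with the target set."""
    return [e for e in entries if e["tags"] & wanted]

pharma_resume = target(resume, {"pharma", "documentum"})
```

Repurposing then becomes a rendering problem: the same filtered entries could feed a pretty PDF, a plain-text paste, or a browsable site broken down by company, project, technology, and location.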

Anybody seen or worked on something like this? Part of me thinks there’s a business opportunity here, and the other part thinks it’s obvious enough that somebody’s already done it. I just don’t update my resume enough to make this a personal project, especially with repeat clients. They already know what I can do and just dump the resume into my compliance record without a glance; that didn’t inspire me to agonize or self-flagellate over my resume four years ago.

Really, how many resumes actually get studied beginning-to-end? I keep wondering if we need to move back to a cover-letter-like summary that points to a URL instead of clogging the hiring channels with more paper and bytes. In any event, I’d want any published resume to point back to a site that breaks down the whole database by company, project, technology, location. I just need to find somebody willing to pay me to write a system to manage my resume!

Macbook Air and Kindle Annoyances


My Worst Nightmare — from royal.pingdom.com via WebbAlert

I hate cables. The overgrown yellow “mesh network” above would hospitalize me if encountered in person. Cables offend my minimalist sensibilities while triggering my obsessive-compulsive need to always have the perfect cable on hand. I hide most of my dirty little secrets in big Rubbermaid tubs in the closet–all those combinations of type, length, color–but guests bear witness to the public shame that my workstation has become:

The Public Shame of Cable Clutter

I’m not a huge fan of wireless technologies either; they’re slower, less secure, less robust, and often require batteries. Being saturated by a hundred low-power radio transmitters just doesn’t seem like a good idea. Bluetooth and the alphabet soup of 802.11 can be handy, but they always betray me when I need them most. That’s why I still have a land line with at least one regular, corded phone attached at all times. That’s also why I laid gigabit Ethernet and fiber all around the apartment after the walls came down.

Along comes the Macbook Air. I drooled like a teen-aged fanboy at first, but it’s really starting to annoy me. Others find fault with the lack of an optical drive and no removable battery; understandable concerns for road warriors and jet setters. The average laptop battery barely makes it to cruising altitude, and now there’s all this nonsense about not carrying on extra batteries. As if I needed another reason to never fly again. How much worse an experience can flying become?

What personally ticks me off is another Apple laptop without anything resembling a docking station or unified connector. I attach my computers via a KVM to a big monitor and a full-sized natural keyboard; the laptop’s poor ergonomics would reduce me to a cramped, gnarly mess of digits and vertebrae in days. So the trade-off is physical health for mental, having to look at a cramped, gnarly mess of cables all day long.

Another trade-off is treating the laptop like a desktop. I end up lugging around my old 15-inch PowerBook instead of my smaller, lighter, more powerful Macbook because I dread connecting and disconnecting the eight cables that integrate the Macbook into my workstation. I’ll go through that hassle if I’m doing a weekly commute, but it’s too much work to throw into the backpack in case I have a few spare cycles while wandering the city.

Even Leopard’s getting on my bad side lately: stability as a whole took a hit in this release, but I’ve had much more trouble with my KVM. The Macbook stops recognizing the keyboard; even fiddling with the USB cables won’t fix it. Reboot time. Imagine how much worse those issues are going to be when crowding everything into the Macbook Air’s one USB port instead of the two USB ports and one FireWire port on a Macbook.

Maybe I’m being a bit paleolithic here. A lighter laptop with faster 802.11 appeals to people who lug their laptops everywhere. I’ve wanted to do that since I first held a TRS-80 Model 100, but it’s just not practical for a computer professional who needs tons of screen real estate and will be grafted to his tech for hours at a time. I would really rather have seen Apple release a jumbo iPod Touch or uber-Newton. A 5×7 multi-touch glass slab that wirelessly melds with my computer when in range would be perfect. It would also close the casket on eBook readers like the Kindle, another recent annoyance. There are two things that nobody in the eBook community seems to get:

First, this is the age of convergence. A device that does one and only one thing is a step backwards for a generation with phones that also take pictures, play music, and make coffee. Books are just another form of media, so give us a media player with enough real estate to make print (and video) as convenient as audio. Podcast pundits with Kindles have even been saying that they prefer reading books on their iPhones, devices they’d always have with them anyway. A 5×7 high-res multi-touch display in landscape mode could show two swipeable, pinchable side-by-side pages as well as play movies in a space bigger than a postage stamp. The Touch is at the lower end of tolerable as far as video real estate goes.

Second, the average fiction junkie doesn’t need to carry around a hundred Harlequin romances. The real market for a stand-alone reader with lots of capacity is somebody like me, a freelancer who travels for work and a techie who lugs around a massive technical library. I’d also want my books available on my other devices: don’t give me a single device that hoards my stuff. Give me a system that handles my entire physical library (video, print, records, etc.) like iTunes handles my music across multiple computers and iPods. My inner minimalist quivers with delight at the mental image of bookshelves devoid of everything but that glass slab on a plate stand.

My initial infatuation with the Macbook Air has faded. It’s back to waiting for the tablet that’s a media device, eBook reader, portable home directory, and espresso maker. Just don’t make it a phone too. Apple’s involvement with AT&T and that whole plague-ridden industry has tainted its products and tarnished its reputation.

Macbook Air (and other things) Announced

TUAW reports the Keynote is done. Steve announces the Macbook Air, world’s thinnest laptop, with pictures and movies now on apple.com. The ad has one slipping out of a manila folder. Drool drool drool. It sure is thin and pert-ty.

TUAW was right about no Ethernet port; it’s really banking on 802.11n and networked devices like Time Capsule and other computers. Only four ports–USB 2.0, micro-DVI, MagSafe power, and a headphone jack–and three are in a somewhat fragile-looking flip-thingie. 80GB regular hard drive or 64GB solid state. Large multi-touch trackpad with some interesting new gestures. One in particular allows paging through things (swiping), something I got used to doing on my Touch and sometimes erroneously try to do on my Macbook with the mouse and Cover Flow. Five-hour battery life. Starting at $1799. Alas, the Apple Store itself is unavailable, so it’s hard to get a feel for the range of options and prices. I’m glad the store’s down…I feel woozy. Do we finally have a full-featured subnotebook?

The other announcements were mostly ho-hum, although the Touch is finally getting the full suite of iPhone apps (tiny yay). Movie rentals might be better than expected with HD and improved AppleTV functionality. It wasn’t clear if it’ll also get the user-configurable home screens that the iPhone’s getting at the end of February. TUAW didn’t mention anything about 3G for the iPhone–not good.

Time to humanize, grocery shop, and try not to obsess over the Air. I really hope to come home and find a new iTunes waiting to be downloaded that’s less buggy.

My MacWorld Wish List

  1. A subnotebook, tablet, or uberNewton. Let it be more like a giant iPhone or iPod Touch than some of the horrors circulating around the internet [Yuck 1, Yuck 2]
  2. A Less Buggy iTunes. This latest version is stuttering and crashing with PC-like frequency; it seems especially bothered by video. Smart Playlists have been broken for months with some conditionals not working and the match AND/OR bug. The lack of attention to easy fixes for so long lends some credence to a revamped iTunes to handle rentals. Ho hum.
  3. A wired remote accessory for my iPod Touch. The one big flaw of having to look at it to control it is so annoying when I’m roaming.
  4. A non-AT&T 3G iPhone. I could be tempted away from my Blackberry, but this is more of a “peace on Earth” wish for everybody, not just me.
  5. A real surprise. The sheer mania Steve’s keynote brings won’t be complete without something that completely blindsides the pundits. I want for them what the pollsters and media got served in New Hampshire.

The Duplicate Folders Mystery, Part I

The Folder Is a Lie

To paraphrase Douglas Adams: Documentum is big. You just won’t believe how vastly, hugely, mind-bogglingly big it is. All that bigness can lead to some unexpected behavior as gears deep in the belly of the clockwork monstrosity grind against each other. Sometimes the impossible happens.

I’m going to tell you a story–it’s a mystery about folders and databases and conclusion jumping and the relativism of obviousness–but appreciating the story requires some understanding of how folders really work. Be warned that things underneath look completely different from how they appear on the surface.

Documents and folders make up the bulk of the visible universe in Documentum. One thing they have in common is the need to “be somewhere”, inside a folder or a cabinet. Here’s where things start going Looking Glass: An object keeps a list of the folders and cabinets that contain it. Most people would expect parents (cabinets or folders) to keep lists of their children (folders and documents), but no! There’s no mammalian parental love here; children must fend for themselves.

That list is stored in a repeating attribute called i_folder_id. It contains one or more of Documentum’s internal identifiers, each being the r_object_id of one of its parents. (Every Documentum object has an r_object_id, including plenty of “dark matter” objects casual users never see.) Object IDs are those sixteen-digit hexadecimal numbers. They’re great for the system because they’re guaranteed unique and never get reused, but not so nice on human eyes. In fact, the “i_” prefix here is Documentum’s shorthand for saying this is an internal attribute and people really shouldn’t look at it–and they absolutely shouldn’t ever try to change it themselves. Unless they’re mad, totally mad! Bwah-hahahaha! But I digress.

There are some good reasons to do it this way. Repeating attributes can really kill performance if they get too big, so it’s better to have a bunch of small lists on the children than a few really long lists on the parents. It also makes more sense to deal with containment on the child when you start thinking about other behaviors like change tracking, permissioning, and versioning. This is also why the folder metaphor really falls apart when things start getting interesting in a document management sense.

It does create some problems, the biggest of which is that it’s very hard to use a single query to walk back up this list or find things at an arbitrary depth. Walking back up a reverse linked list is an iterative (procedural) process, something that (declarative) query languages don’t do very well–hence database procedures, by the way. Documentum can’t make the folder metaphor work at all without something besides i_folder_id.
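Here’s a toy illustration of that iterative walk, with plain Python dicts standing in for repository objects and single-letter names in place of real sixteen-digit hex ids. Each pass of the loop amounts to another round-trip query, which is exactly what one non-procedural statement can’t express:

```python
# Toy docbase: each object maps to its i_folder_id list (its parents).
# "D" is a cabinet, so its list is empty.
i_folder_id = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

def ancestors(obj_id):
    """Walk parent links upward, one batch (one 'query') per pass."""
    seen = set()
    frontier = set(i_folder_id[obj_id])
    while frontier:
        seen |= frontier
        # fetch the next layer of parents, skipping ones already seen
        frontier = {p for f in frontier for p in i_folder_id[f]} - seen
    return seen
```

You can’t know in advance how many passes the loop needs–that depends on the depth of the tree–which is why the work gets pushed into procedural code rather than a single query.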

The solution was for each folder to keep a list of its own explicit paths–one kind of location in current Webtop speak–in another repeating attribute called r_folder_path. Unlike i_folder_id, this is something you can see very easily by choosing “View > Locations”, and it looks like “/John Kominetz/Private Documents/World Domination Plans”. If you know that you’re looking for things in that exact location, you can write a query to find things in a snap by adding where folder(‘/yadda/yadda/yadda’). No arcane object ids or iterative processes required. It’s even what makes it easy to “do a descend” and find everything inside all the other folders inside the folder at “/yadda/yadda/yadda”.
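The path test behind those two queries is easy to sketch. In this toy model, each document maps to the r_folder_path values of the folders it’s linked into (the real server evaluates folder() in SQL against its underlying tables; the names and paths here are invented):

```python
# Each document's parent-folder paths (its containments).
docs = {
    "report.xls": ["/Accounts/2007/Q4"],
    "plan.doc":   ["/Accounts/2007", "/Archive"],
    "notes.txt":  ["/Archive"],
}

def in_folder(parent_paths, location, descend=False):
    """folder('/x') is an exact path match; folder('/x', descend)
    also matches any parent path underneath '/x'."""
    if descend:
        return any(p == location or p.startswith(location + "/")
                   for p in parent_paths)
    return location in parent_paths

hits = sorted(name for name, paths in docs.items()
              if in_folder(paths, "/Accounts/2007", descend=True))
```

Exact matching is a lookup; descend is a prefix match–neither needs object ids or an iterative walk, which is the whole point of storing the paths.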

This works only if each explicit path is unique like a phone number or a mailing address. No two child folders in the same parent can have the same name. Documents don’t care; you can (and people often do) have hundreds of “report.xls” documents in the same folder. They also don’t have r_folder_paths–only folders and cabinets do–which is why they can get away with that. (If you think they should, then you need to think about versioning and how enforcing unique document names would make things really unpleasant.) So Documentum makes sure you can’t have two folders in the same location with the same name. Not unless something goes horribly, horribly wrong.

I mentioned both attributes are repeating–they’re lists of values–which means that one thing can be in more than one place at a time. It’s more like the UNIX idea of a hard link than a Windows shortcut. The latter is really a separate file that points to another file, which is really only in one place. (Documentum does have something like a shortcut, but it’s for pointing to objects in different docbases.) Here again the traditional folder metaphor breaks down and leads to confusion. Some users say “link” and mean the additional locations they put something. Except for one very special case during object creation, every location is a link, and no one link is more significant than another.

There’s another consequence to these two lists that will again seem irrational to the uninitiated. Most people would expect that both lists would have the same number of values. If i_folder_id tells me this folder is linked to two parents, then it should have two folder paths, right? Wrong. Let’s say folder (A) is in two folders (B and C) and each of those folders is in two cabinets (D and E, F and G respectively).

Here’s what i_folder_id looks like on folder A:

  1. B.r_object_id
  2. C.r_object_id

Here’s what r_folder_path looks like on folder A:

  1. /D/B/A
  2. /E/B/A
  3. /F/C/A
  4. /G/C/A

This gets back to the fact that i_folder_id is the independent variable, the true representation of where something is. Documentum derives r_folder_path from i_folder_id. When you save a folder, it populates r_folder_path by getting all the r_folder_paths of its immediate parents and stapling its name onto the end of each. Assuming the r_folder_paths on its immediate parents are correct, it’s a great optimization to avoid having to walk up who-knows-how-many levels of that reverse linked list. Caveats and aphorisms about assumptions do apply.
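That stapling step is easy to sketch. This toy version derives folder A’s four paths from the example above, using letters in place of real objects; it recurses to the cabinets for self-containment, whereas the real server just reads the parents’ already-stored r_folder_paths:

```python
# Parent links (i_folder_id) for the A..G example: A sits in B and C;
# B sits in cabinets D and E; C sits in cabinets F and G.
parents = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": [], "E": [], "F": [], "G": [],
}

def r_folder_path(name):
    """A folder's paths: each parent's paths with '/name' stapled on."""
    if not parents[name]:                  # a cabinet: path is its name
        return ["/" + name]
    return [path + "/" + name
            for parent in parents[name]
            for path in r_folder_path(parent)]
```

Two parents with two paths each yields four paths on the child–which is why the lengths of i_folder_id and r_folder_path needn’t match.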

The server also verifies during save that there aren’t any other folders in the same place with the same name. The save is an atomic operation, meaning that it either completely succeeds or fails. There’s no danger of having the folder left in some horrible transitional state like a Brundle folder. The server does this work, and it tells clients when things like saves fail so they (and their users) have a chance to correct the problem and try again.

One final point here. Have you ever noticed how long it takes to save a cabinet or high-level folder with lots of children when you change its name or link/unlink it? That’s because the save has to update all the r_folder_path strings of all of the folders it contains as well as its own r_folder_paths. That cascade update is also hitting a repeating attribute, and repeating attributes are notoriously unforgiving on performance when poked en masse like this. (Nested groups had a similar problem until Documentum deprecated dm_group’s equivalent to r_folder_path, i_all_users_names.) It takes so long because you’re not just updating that one object, you’re updating every single folder it contains! Either be patient or get the name right before filling it up.
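The cascade itself is just a prefix rewrite repeated once per descendant–which is where the time goes. A toy version, with folder names and paths invented for illustration:

```python
# Each folder's r_folder_path list; renaming the top-level cabinet
# forces a rewrite in every descendant's list, one object at a time.
r_folder_path = {
    "Accounts": ["/Accounts"],
    "2007":     ["/Accounts/2007"],
    "Q4":       ["/Accounts/2007/Q4"],
}

def rename_prefix(old, new):
    """Swap the old path prefix for the new one wherever it appears."""
    for name, paths in r_folder_path.items():
        r_folder_path[name] = [
            new + p[len(old):] if p == old or p.startswith(old + "/") else p
            for p in paths
        ]

rename_prefix("/Accounts", "/Ledgers")
```

Three folders means three updates here; a cabinet with ten thousand descendants means ten thousand repeating-attribute updates inside one atomic save.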

Congratulations! You have now seen the man behind the curtain. This is how Documentum creates the illusion of folders and why the metaphor sometimes breaks down. Pretty clever, really, but that’s exactly why it took me three years to solve the case of the duplicate folders, coming in Part II.