Category: Development

Todd Motto is maintaining a categorized list of public JSON APIs here. As of right now he’s listed 322 endpoints in 26 categories. There are 174 that don’t need authentication, 63 that require OAuth, 80 that take an API key, and 245 that support HTTPS.

It’s a great list, though certainly there must be more.

If you’d like to contribute, Todd has kindly provided some instructions.

Development Web APIs

My friend Brent Simmons has recently written a series of blog posts—seven parts so far—on How Not to Crash, for Cocoa and iOS developers. Brent is an experienced and thoughtful programmer, and these are well worth a read. Most are probably useful even to programmers working in other languages.

Check them out!

How Not to Crash #1: KVO and Manual Bindings
How Not to Crash #2: Mutation Exceptions
How Not to Crash #3: NSNotification
How Not to Crash #4: Threading
How Not to Crash #5: Threading, part 2
How Not to Crash #6: Properties and Accessors
How Not to Crash #7: Dealing with Nothing

Update: Brent added two more How Not to Crash posts since I originally wrote this:

How Not to Crash #8: Infrastructure
How Not to Crash #9: Mindset

… and wrapped them all up in this post on inessential.com.

CocoaDev Development Uncategorized

Today I needed to start figuring out how to install an open source analytics package on my dev machine. It’s implemented in Java, and needs Tomcat. I groaned. “Great. Another complicated dependency to install,” I thought.

Turns out that installing Tomcat on a Mac is actually pretty easy. I ended up following Wolf Paulus’ tutorial here.

Nice write-up, Wolf. Thanks!

Development

Version control and I go back a long way.

Back in the late 1990s, I was working in the QA team at Sonic Solutions, and was asked to look into our build scripts and source code control system, to investigate what it would take to get us to a cross-platform development environment—one that didn’t suck.

At the time, we were running our own build scripts implemented in the MPW Shell (which was weird, but didn’t suck), and our version control system was Projector (which did suck). I ended up evaluating and benchmarking several systems including CVS, SourceSafe (which Microsoft had just acquired), and Perforce.

In the end we landed on Perforce because it was far and away the fastest and most flexible system, and we knew from talking to folks at big companies (Adobe in particular) that it could scale.

Recently I’ve been reading about some of the advantages and disadvantages of Git versus Mercurial, and I realized I haven’t seen any discussion about a feature we had in the Perforce world called change lists.

Atomic commits, and why they’re good

In Perforce, as in Git and Mercurial, changes are always committed atomically, meaning that for a given commit to the repository, all the changes are made at once or not at all. If anything goes wrong during the commit process, nothing happens.

For example, if there are any conflicting changes between your local working copy and the destination repository, the system will force you to resolve the conflict first, before any of the changes are allowed to go forward.

Atomic commits give you two things:

First, you’re forced to only commit changes that are compatible with the current state of the destination repo.

Second, and more important, it’s impossible (or very difficult) to accidentally put the repo into an inconsistent state by committing a partial set of changes, whether you’re stopped in the middle by a conflicting change, or by a network or power outage, etc.

Multiple change lists?

In Git and Mercurial, as far as I can tell there is only one set of working changes that you can potentially commit on a given working copy. (In Git this is called the index, or sometimes the staging area.)

In Perforce, however, you can have multiple sets of changes in your local working copy, and commit them one at a time. There’s a default change list that’s analogous to Git’s index, but you can create as many additional change lists as you want locally, each with its own set of files that will be committed atomically when you submit.

You can move files back and forth between change lists before you commit them. You can even author change notes as you go by updating the description of an in-progress change list, without having to commit the change set to the repository.

Having multiple change lists makes it possible, for example, to quickly fix a bug by updating one or two files locally and committing just those files, without having to do anything to isolate the quick fix from other sets of changes you may be working on at the time.

Each change list in Perforce is like its own separate staging area.
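
For concreteness, here’s roughly what that workflow looks like on the p4 command line (the changelist number 1234 and the file name are hypothetical):

    p4 change                          # create a new pending changelist; opens your editor for the description
    p4 reopen -c 1234 src/quickfix.c   # move an open file from the default changelist into changelist 1234
    p4 submit -c 1234                  # submit only changelist 1234; everything else stays pending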

So what’s the corresponding DVCS workflow?

While it’s possible with some hand-waving to make isolated changes using Git or Mercurial, it seems like it would be easy to commit files unintentionally, unless you create a separate local branch for each set of changes.
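
For example, the hand-waving version of the quick-fix scenario in Git looks something like this (file name hypothetical):

    git add -p src/quickfix.c    # stage only the hunks that belong to the quick fix
    git commit -m "Fix crash when the config file is missing"    # commit just what's staged

Git’s stash (and Mercurial’s shelve extension) can also set uncommitted work aside temporarily, but neither gives you several named, long-lived pending change lists the way Perforce does.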

I understand that one of the advantages people see philosophically in distributed version control systems is that they encourage frequent local commits by decoupling version control from a central authority.

But creating lots of local branches seems like a pretty heavy operation to me, in the case where you just need to make a small, quick change, or where you have multiple change sets you’re working on concurrently, but don’t want to have to keep multiple separate local branches in sync with a remote repo.

In the former case, cloning the repo to a branch just to make a small change isn’t particularly agile, especially if the repo is large.

In the latter case, if you’re working on multiple change lists at the same time, keeping more than one local branch in sync with the remote repo creates more and possibly redundant work. And more work means you’re more likely to make mistakes, or to get lazy and take riskier shortcuts.

But maybe I’m missing something.

What do you do?

In this situation, what’s the recommended workflow for Git and Mercurial? Any experts care to comment?

Development

Dave and I released some updates to Manila.root, the version of Manila that runs as a Tool inside the OPML Editor.

Instructions and notes are on the Frontier News site:

If you’ve been following me for the last couple of months, you may have noticed that I’ve been spending some time looking at Manila again.

Recently, I completed a set of updates to bring Manila up to speed when running in the OPML Editor, and with Dave Winer’s help, that work is now released as a set of updates to the Manila.root Tool.

If you’re one of the people who still runs websites with Manila, I’d love to hear from you. Leave a comment here and say “Hi!”, or if you run into any problems with Manila as a Tool in the OPML Editor, please ask on the frontier-user mail list / Google group. 🙂

Development Manila

I’m looking at various Mac options for JavaScript / Node.js IDEs, and decided to try out the Eclipse-based Aptana Studio 3 (now part of Appcelerator). But I ran into a problem when trying to run it—I kept getting an error saying:

The JVM shared library “/Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk” does not contain the JNI_CreateJavaVM symbol

After much searching and reading of Stack Overflow posts, I decided after reading this to completely uninstall the JDK and browser plugin from my machine, and start fresh with a clean install of Java for Yosemite.
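
For reference, the clean-up amounts to deleting the JDK and the browser plug-in (the JDK path is from the error message above; the other paths follow Oracle’s uninstall notes, and versions will vary):

    sudo rm -rf /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk
    sudo rm -rf "/Library/Internet Plug-Ins/JavaAppletPlugin.plugin"
    sudo rm -rf "/Library/PreferencePanes/JavaControlPanel.prefPane"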

Now I don’t know if it’s just me, but oddly the page on the Apple Support site comes back blank. (I’ll assume there’s no conspiracy for the moment, and that this is just a bug.)

So I plugged the URL to the page into Google and loaded up the cached version. There I found the direct download link to the installer here: http://support.apple.com/downloads/DL1572/en_US/JavaForOSX2014-001.dmg

After running that installer, Aptana Studio (and also Eclipse) now launch just fine. Phew.

I’m posting this here to help others who are running into the JNI_CreateJavaVM error, and so I can find it again the next time I need to set up a new machine with Eclipse or Aptana.

ps. See also comment thread on Facebook.

Development

In the last post on this topic, I discussed some of the differences between Manila and WordPress, and how understanding those differences teased out some of the requirements for this project.

In this post I’m going to talk about the design and implementation of a ManilaToWXR Tool, some more requirements that were revealed through the process of building it, and a few of the tricky edge cases I had to deal with.

A little history first…

One of the more interesting things I did while I was a developer at UserLand was to build a framework we called the Tools Framework, which brought together many different points of extensibility and made it easy for developers to customize the environment.

In Frontier, Radio UserLand, and the OPML Editor, a Tool is a collection of code and data in a database, which extends or overrides some platform- or application-level functionality. It’s sort of analogous to a Plugin in the WordPress universe, but Tools can also do things like run code periodically (or continuously) in the background, or implement entirely new web applications, or even customize Frontier’s native UI.

For example, you could implement a Tool that hooks into the windowTypes framework and File menu callbacks to implement a new document type corresponding to a WordPress post. Commands in the File menu call the WordPress API, and present a native interface for editing your blog—probably in an outline. Radio UserLand did exactly this for Manila sites, and it was fantastic. (More on that later.)

Another example of a Tool is one that implements some new XML-RPC endpoints (RPC handlers in Frontier) to provide a programmatic API for accessing some content in a database on your server.

For my purposes, I’m not doing anything nearly so complicated. The main thing I wanted comes from the Tools > New Tool… menu command. This creates a new database and pre-populates it with a bunch of placeholders for things like its menu, a table for data and preferences, and of course a table where my code will live.

It gives me an easy, standard way to create a database with the right structure, and the hooks into the menu bar that I wanted to make my exporter easy to use.

Code Components

Now some of this may sound pedantic to the developer-types who are reading this, but please bear with me on behalf of our non-nerd cohorts.

Any time you need to write a lot of code, it makes sense to break the work down into small, bite-sized problems. By solving each of those problems one at a time, sometimes in layers, you eventually work your way towards a complete solution.

Each little piece should be simple enough that you can compartmentalize it and separate it from the other pieces. This is called factoring, and it’s good for lots of reasons, including readability, maintainability, debuggability, and reuse. And if you miss something, make a mistake in your design, or discover that some part of your system doesn’t perform well, it’s far easier to rewrite just one or a couple of parts than it is to de-spaghettify a big, monolithic mess.

Components and sub-components should have simple and consistent interfaces so that other code that talks to them can in turn be made simple and consistent. Components should also have minimal or no side-effects, meaning that they don’t change data that some other code depends on. And components should usually perform one or a very small number of tasks in a predictable way, to keep them small, and make them easy to test and debug. If you find yourself writing hundreds of lines of code in one place, you probably need to break the problem down into smaller components.

So with these concepts in mind, I set about coming up with a component-level design for my Tool. I initially came up with four types of components that I would need, and each type of component may have a specific version depending on the type of object it knows about.

Iterators

First, I’m going to need an easy way to iterate across posts, stories, pictures, and other objects. As my code iterates over the objects in my site, the tool will create a fragment of XML for each one, which will go into a WXR file on disk.

By separating the iteration from everything else, I can easily change the order in which objects are exported, apply filters for specific object types, or only export objects in a given date or ID range. (It turned out that ranges and filters were useful for debugging later on.)

Manila stores most content in its #discussionGroup in a sub-table named messages. User information is in #membershipGroup, and there’s some other data scattered around too. But the most important content—posts, pages, pictures, and comments—is all in the #discussionGroup.

Initially I’d planned to make multiple passes over the data, with one pass for each type of data I wanted to export. So first export all the posts, next the pages, next pictures, etc. As it turned out however, in both Manila and WordPress, a post, a page, and a picture have more in common than not in terms of how they’re stored and the data that comes along with them. Therefore it actually made more sense to do just one pass, and export all the data at one time.

There was one exception, however: in WordPress, unlike Manila, comments are stored in a separate table from other first-class site content, and they appear in a WXR file as children of an <item> rather than as their own <item> under the <channel> element.
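
Roughly like this (a trimmed sketch; element names as in a WordPress 4.x export):

    <channel>
        <item>
            <title>My post</title>
            <wp:post_id>1234</wp:post_id>
            ...
            <wp:comment>
                <wp:comment_id>5678</wp:comment_id>
                <wp:comment_parent>0</wp:comment_parent>
                <wp:comment_author><![CDATA[A. Reader]]></wp:comment_author>
                <wp:comment_content><![CDATA[Nice post!]]></wp:comment_content>
            </wp:comment>
        </item>
    </channel>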

In the end I decided to write two iterators. Each of them would take the address of the site (so they can find other required metadata about a person for instance), and the address of a function to call for each object as it goes along:

wxr.visit.messages – iterates over all of the messages in my site’s #discussionGroup, skipping over deleted items and comments, since they won’t be exported as an <item> in my WXR file.

wxr.visit.comments – recurses over responses to a message to generate threaded comment information.

It turned out later on that I needed two more iterators—one for categories, and one for “Gems” (non-picture files), but the two above were a great starting point that would give my code easy access to the bulk of the content.
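
In use, the division of labor looks something like this (a sketch: the visitor script, its location, and wxr.writeItem are hypothetical):

    «called once per exportable message; returns true to keep iterating
    on visitMessage (adrMsg)
        wxr.writeItem (wxr.post.data (adrMsg))
        return (true)

    «kick off the export by passing the site and the visitor's address
    wxr.visit.messages (adrSite, @myTool.visitMessage)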

Data Extractors

Next I needed some data extractors. These are type-specific components that pull the data for a post, picture, comment, etc. out of the database, and normalize it to a native data structure that can then easily be output as XML for my WXR file.

The most important data extractor is wxr.post.data, which takes the address of a message containing a blog post that’s in my site’s #discussionGroup—and returns a table (struct) that has all of the data elements that will go into an <item> in the exported WXR file.

Because the WordPress importer expects the comments as <wp:comment> sub-elements of <item>, the post data extractor will also call into another data extractor that generates normalized data representing a comment.

For other types of objects I’ll need code that extracts data for that type as well. So I’ll need code to extract data for a picture, code to extract data for a page (story), and code to extract data for a gem (file).

Here’s part of the code that grabs the data for a comment:
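
It goes something like this (a sketch: the field names on the message table, flApproved and body, are stand-ins; the wxr.* calls are the ones discussed below):

    on dataForComment (adrComment)
        local (t)
        new (tableType, @t)
        «keep the content even if the comment was never approved
        t.["wp:comment_approved"] = adrComment^.flApproved
        «preserve threading by recording the parent comment's ID
        t.["wp:comment_parent"] = wxr.comment.parent (adrComment)
        «expand macros and #glossary references, as a live page render would
        t.["wp:comment_content"] = wxr.string.processMacros (adrComment^.body)
        return (t)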

There are a few interesting things to point out here:

  1. I chose to capture comment content even if it’s not approved. Better to keep the content than lose it, just in case I decide to approve it later.
  2. The call to wxr.comment.parent gets the ID of the comment’s parent. This preserves the threaded nature of the conversation, even if I decide not to have threaded comments in my WordPress site later on. It turns out that supporting both threaded and unthreaded comments was the source of some pain that I’ll explain in a future post.
  3. The call to wxr.string.processMacros is especially important. This call emulates what Manila, mainResponder, and the Frontier website framework do when a page is rendered to HTML. Without this capability, Frontier macro source code would leak through into my WordPress site, and possibly many internal links from #glossary items would be broken. Getting this working was another source of pain that took a while to work through—again, more in a future post.
  4. All sub-items in the table that gets returned have names that start with “wp:”, which I’ll explain below…

Encoders

Once I had some structured data, I was going to need to use it to encode some XML. It turns out that this component could be done in a very generic way that would work with any of my data extractors.

Frontier actually has fairly comprehensive XML capabilities. But the way they’re implemented requires very verbose code that I really didn’t want to write. I had done quite enough of that in a past life. 😉

So I decided to write a much simpler one-way XML-izer that I could easily integrate with my data extractors.

The solution I came up with was to recurse over the data structure that an extractor passed to it, and generate an XML tree whose element names match sub-items’ names, and whose element content were the contents of each sub-item.

There were three features I needed to add in order to make this work well:

Namespaces: Many elements in a WXR file are in a non-default namespace—either wp: for the WordPress-specific data, or dc: for the Dublin Core extension. This feature was easy to deal with by just naming sub-items with the namespace prefix, i.e. an element named parent in the wp: namespace would simply be called wp:parent when returned by the data extractor.

Multiple elements: Often I needed to create multiple elements at a given level in the XML file that all have the same name. <wp:comment> is a good example. The solution I came up with here is similar to the one Frontier implements in its native XML verbs.

A compiled XML table in Frontier has sub-items representing elements, which have a number, a tab character, and the element’s name. The Frontier GUI hides the number and the tab character when you view the table, so you can see multiple same-named elements in the table editor. When you click an item’s name, the number and tab character are revealed, and you can edit them if you want. That said, you’re supposed to use the XML verbs, xml.addTable or xml.addValue to add elements.

Most of this is not particularly well documented, and personally I don’t think it was the most elegant solution, but it was effective at working around Frontier’s limitation that items in tables had to have unique names, whereas in XML they don’t.

I wanted something simpler, so I decided instead to simply strip anything after a comma character from the sub-item’s name. This way whenever my data extractor is adding an item, it can just use table.uniqueName with a prefix ending in a comma character, and then add the item at that address. Two lines of code, or one if we get just a little bit fancy:
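
Something like this, where commentTable and itemTable stand in for the extractor’s output and the destination table:

    local (name = table.uniqueName ("wp:comment,", @itemTable))
    itemTable.[name] = commentTable

…or, the fancy one-liner:

    itemTable.[table.uniqueName ("wp:comment,", @itemTable)] = commentTable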

XML attributes: The last problem to solve was generating attributes on XML elements, for example <guid isPermaLink="false">...</guid>. It turns out that if there were an xml.addAttributeValue in Frontier, it could have handled this pretty easily, but that was never implemented. Instead I’d have to add an /atts sub-table, and add the attribute manually—which takes multiple lines of code just to set a single attribute. Of course I could implement xml.addAttributeValue, but I don’t have a way to distribute it, so nobody else could use it! 🙁

In addition, I really didn’t want big, deeply-nested data structures flying around my call-stack, since I’m going to be creating thousands of tables at run-time, and I was concerned about memory and performance.

In the end I decided to do a hack: By using the | character to delimit attribute/value pairs in the name of table sub-elements, I could include the attributes and their values in the element name itself. So the <guid isPermaLink="false"> element would come from a sub-item named guid|isPermaLink=false.
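
On the encoder side, unpacking a name like that takes a few calls to Frontier’s string.nthField verb (shown here for a single attribute/value pair):

    local (rawName = "guid|isPermaLink=false")
    local (elementName = string.nthField (rawName, '|', 1))  «"guid"
    local (att = string.nthField (rawName, '|', 2))  «"isPermaLink=false"
    local (attName = string.nthField (att, '=', 1))  «"isPermaLink"
    local (attValue = string.nthField (att, '=', 2))  «"false"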

Normally I would avoid doing something like this since hacks have a tendency to be fragile, but in this case I know in advance what all of the output needs to look like, so I don’t need a robust widely-applicable solution, and the time I save with the hacky version is worth it.

Utility Functions

Then there’s a bunch of miscellany:

  • A way to easily wrap the body of a post with <![CDATA[…]]> tokens, and properly handle the edge case where the body actually contains those tokens (see the sketch after this list).
  • A non-buggy way to encode entities in text destined for XML. (xml.entityEncode has had some bugs forever, which weren’t fixed because of Rule 1.)
  • Code to deal with encoding various date formats, and converting to GMT.
  • Code to convert non-printable characters into the appropriate HTML entities (which in turn get encoded in XML).
  • Other utility functions dealing with URLs, calculating permalinks, getting people’s names from their usernames, etc.
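
The CDATA edge case in the first bullet works by splitting the terminator so the body can never close the section early. A minimal sketch, assuming Frontier’s string.replaceAll verb:

    on wrapCdata (s)
        «close the CDATA section just before any literal "]]>" and reopen it after
        s = string.replaceAll (s, "]]>", "]]]]><![CDATA[>")
        return ("<![CDATA[" + s + "]]>")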

The Elephants in the Room

At this point there were a few more things I knew I would need to address. I’ll talk about these along with handling media objects in my next post. In the meantime, here’s a teaser:

  1. Lots of stuff in Manila just doesn’t work at all unless you actually install the site, with Manila’s source code available.
  2. The macro and glossary processors aren’t easy to get working unless the code is running in the context of a real web request.
  3. What should I do about all the incoming links to my site? Are they all going to simply break?

I’ll talk about how I dealt with these and other issues in the next post.

More soon…

Development Manila WordPress

In my earlier post about porting from Manila to WordPress, I covered some basics around how and why I decided on the approach I took, and some of the requirements for the new site.

I’ve made a ton of progress—what you’re reading right now is coming from WordPress 4.0, hosted on my own server—but I’ve been remiss on follow-up posts. Fortunately I took lots of notes during this process, since I knew I wanted to write more about it. Probably too many notes in fact. 😉

I also found myself diving into the rabbit hole: I’ve been debugging the WordPress importer plug-in, while slowly and osmotically learning PHP, and discovering the wonders … um … fun that is XDebug, Eclipse, PhpStorm, and MAMP. (PhpStorm is great so far, but still very unfamiliar.) Why? Two reasons, one of which I touched on before:

  • I get to learn about WordPress internals, PHP, and debugging PHP sites—and learning is always a Good Thing™
  • It turns out that Manila, a product developed over the better part of a decade, is quite complicated (duh), and I get to figure out how to re-simplify my legacy websites

I know Manila better than almost(?) anyone, so even years after developing in that environment full-time, its nooks and crannies are mostly familiar to me. Manila is an old friend, and we have a relationship complicated by the history of our mutual growth. Because Manila and I learned the Web organically over the last decade or so, we share, shall we say, breadth. 😉

It’s a valuable trait that makes us both very flexible, but it also means that we’re sometimes hard to understand. And in doing this project, it’s likely I would need to make some difficult trade-offs, or else suffer endless debugging and long-term maintenance complexity, both with rapidly diminishing returns.

I’ll give you a few examples, and will tease out requirements as I talk through them:

Home Pages vs News Items

In its original incarnation, Manila only understood one “Home Page” per day. You could write as much as you want, add as many links as you want, and format however you want. But the content for a given day had no set structure or order.

√ Requirement: Ability to link to a day-archive in my WordPress site, not just to a post

Relatively early in Manila’s product lifetime, Brent Simmons implemented News Items in Manila, which enabled Manila sites to have the same kind of structure we think of today as a Blog—a series of reverse-chronological posts, usually with a title, and sometimes with a link. As I recall, Manila’s News Items were inspired by Slashdot’s format which was essentially blogs+categories—but the reverse-chronological collection of posts was key.

As the platform grew, News Items eventually had other data associated with them too: They supported per-item comments and trackbacks (like WordPress posts), and they separated the concept of “last update” from “published”, though differently than WordPress does.

For the purpose of this project, it’s important to understand that a “news-day” post and a “news item” post are different things, and needed to be dealt with accordingly.

√ Requirement: Handle both day-post-style sites and item-post-style sites

√ Requirement: Translate News Item departments into WordPress categories

To make matters more complicated, the Managing Editor (admin) of a Manila site could switch between News Items and Home Pages at will. So some days might have a single, monolithic post while other days may have many separate posts.

√ Requirement: Support both per-day and per-post styles within a single site

For content on JakeSavin.com this won’t be much of an issue since it’s always been a News Items (per-post) style site, and I’ve rarely made more than one post per day. But in the long run I also want to bring in content from Jake.EditThisPage.Com—years’ worth of content that I don’t want to lose—and it’s one of these mixed sites with some day-page style content, and some blog post style content, and a mix that sometimes included many posts each day.

⇒ Insight: I don’t need to deal with day-type sites right now, but I shouldn’t design myself into a corner that precludes them.

Permalinks, GUIDs, and IDs

So what the heck are these things? I mean I’ve heard of a permalink but a GUID? I get what an ID is, but why do I need to understand it?

Permalink

The permalink to a post is a URL which doesn’t change over time, which goes straight to the post. It’s important to preserve these links, since every time someone links to a post on your site, the place they’re linking to (ideally) is your post’s permalink. If that URL changes then all of those incoming links will break, and The Web will be just a tiny bit more lonely: On The Web, broken links == sadness.

It turns out that by default, WordPress and Manila format blog post URLs quite differently. Moreover, WordPress pages typically live at only one URL (really two—one by its link [path], and one by its ID), whereas in Manila, “Stories” (Pages in WordPress) and sometimes even individual posts can live at any number of URLs, some of which are generated, and some of which may be added by the user.

For example a blog post (news item) in Manila is most often accessed via a calendar-style URL off of the root of the site, like http://example.com/2014/10/01#1234, but it may also appear at any of the following (or more) URLs:

  • http://example.com/discuss/msgReader$1234 — note the $ delimiter
  • http://example.com/stories/storyReader$1234 — if promoted to a story [page]
  • http://example.com/my-super-awesome-post — user-entered path
  • http://example.com/awesome/firstPost — another user-entered path
  • http://example.com/newsItems/departments/superAwesome/2014/10/01#1234 — from department (category)

If I want to preserve my site’s existing web presence, then I should do whatever I can to make sure that incoming links continue to work. And while I control all the domains involved, I also don’t want to have to maintain a giant list of redirects…

√ Requirement: Support at least one of Manila’s canonical URLs for transferred content

√ Stretch-goal: Support all URLs for a given bit of content, including user-generated ones

GUID

A post’s GUID is its canonical and unchanging identifier. It signals to feed readers (RSS, Atom, etc.) that if they see this post again, they don’t need to show it to users, since they’ve already seen it.

But if the post’s URL ever changes, a well-behaving content management system should remember the original GUID and not change it, so that folks who subscribe to the site in a feed reader don’t get blasted with a whole lot of repeat posts.

There are other potential uses for a post’s GUID. Some systems might use it to identify a post when accessing it via an API. Some (like Manila) use a combination of the site’s URL and a post’s ID instead for API access.

Sometimes it’s easy to generate a GUID just by reusing the value of the post’s permalink. In this case you could add an attribute called isPermaLink and make its value true, to signal to consuming apps that the GUID actually points at a real web resource. (WordPress doesn’t do this, even when the permalink and GUID are the same.) This could be especially useful if the post has a link which is not a link to the post itself.

Then there’s the ID. Manila and WordPress both have sequential IDs for the super-set of posts and pages. Unlike WordPress though, Manila also keeps comments in the same “table” as posts and pages, whereas WordPress treats comments completely separately. Going from Manila to WordPress then shouldn’t create any issues, since there are no inherent ID conflicts.

Data Hierarchy: What’s the same, what’s different?

Among the reasons I picked WordPress instead of some other platform, is that WordPress and Manila actually have a great deal in common:

  • They both separate content from layout by flowing content through a Theme
  • They both use a database to store the content
  • They both have posts, media, and pages (in Manila, News Items, Pictures & Gems, and Stories)
  • The table used for posts, stories, and media is the same (in Manila it’s the site’s Discussion Group)
  • Both systems use the filesystem for blob storage for media files

But there are some differences:

  • In a Manila site, you can have threaded discussions that aren’t attached to a post or page. Not so in WordPress.
    • This could be faked up with private posts/pages in WordPress, but depending on the site this may not be worth the extra development effort.
  • In Manila, comments are stored in the same table as posts, pages, and media objects, but in WordPress, comments are stored separately.
    • In theory this shouldn’t be an issue, since as long as I build the WXR file such that WordPress understands it, comment content will import just fine.

I was thrilled to discover that WordPress supports threaded discussions. Though it’s not an issue for JakeSavin.com since it’s always had flat comment threads, when I get around to porting over my other sites, I will want to preserve threaded discussions.


That’s it for this post. In the next post, I’ll talk about the code that I wrote, how I tested and debugged it, and what kind of crazy edge cases I found (and continue to find).

Development Manila WordPress

About a week ago I started a project to port this site from Manila to WordPress. While there are probably very few Manila users still out there who might want to do this, I thought it would still be a useful exercise to document the process here, in case anything I’ve learned might be useful.

This is something I have been wanting to do in my spare time for many months now — probably two years or more. But with family and work obligations, a couple of job changes, and a move from Woodinville to Seattle last fall, carving out the time to do this well was beyond my capacity.

Now that I’m recently between jobs, the only big obligation I have outside of spending time with my wife and son is to find another job. Some learning can’t hurt that effort. Plus I seem to have a surplus of spare time on my hands right at the moment.

Managed or Self-hosted?

The first question I needed to answer before even starting this process is whether I want to host on a managed service (most likely WordPress.com), or if I should self-host. There are trade-offs either way.

The biggest advantages of the managed option come from the very fact that the servers are run by someone else. I wouldn’t have to worry about network outages, hardware failures, software installation and updates, and applying an endless stream of security patches.

But some of the same features that make a managed service attractive are also limiting. I would have limited control over customization. I wouldn’t be able to install additional software alongside WordPress. I would be limited to the number of sub-sites I was willing to pay for. I wouldn’t necessarily have direct access to the guts of the system (database, source code, etc).

Most importantly, I wouldn’t be in control of my web presence end-to-end — something which has been important to me ever since I first started publishing my own content on the Web in 1997.

There’s one more advantage of self-hosting which is important to me: I want to learn how WordPress itself actually works. I want to understand what’s actually required to administer a server, and also start learning about the WordPress source code. The fringe benefit is learning some PHP: while some web developers prefer alternate languages like Ruby, Python, or Node.js, the install-base of WordPress is so enormous that, from a professional development perspective, learning some PHP is a pretty sensible thing to do.

I decided to go self-hosted, on my relatively new Synology DS-412+ NAS. It’s more than capable of running the site along with the other services I use it for. It’s always on, always connected to the Internet, and has RAID redundancy, which will at least somewhat limit the risks associated with hardware failure.

Develop a Strategy

The next thing I needed to work out was an overarching plan for how to do this.

Aside from getting WordPress installed and running on my NAS, how the heck am I going to get all the data ported over?

First, I made a list of what’s actually on the site:

  1. A bunch of blog posts (News Items) sitting in a Frontier object database
  2. Comments on those posts
  3. A small amount of discussion in the threaded discussion group
  4. User accounts for everyone who commented or posted in the discussion group
  5. A bunch of pictures and other media files
  6. A few “stories”
  7. Some “departments” that blog posts live in
  8. A site structure that put some pages on friendlier URLs
  9. Logs and stats that I don’t care much about
  10. A sub-site I never posted much to, and abandoned years ago

For the most part, there aren’t any types of information that don’t have an analog in WordPress. News items are blog posts, comments are comments, stories are pages, pictures are image attachments, and departments are categories. The stats and logs I’m happy to throw away. I’m not sure what to do with the site structure, but if push comes to shove, I can just use .htaccess files to redirect the old URLs to their new homes.
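
If it does come to that, the .htaccess entries would look something like this (hypothetical paths, using Apache’s mod_alias directives):

    # a post that lived at a Manila-generated URL
    RedirectMatch 301 "^/stories/storyReader\$1234$" "/my-super-awesome-post/"

    # a user-entered path
    Redirect 301 /awesome/firstPost /awesome/first-post/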

Next I needed a development environment — someplace where I can write and refine code that would extract the data and get it into WordPress.

On the Manila side, I did some work a little over a year ago to get Manila nominally working in Dave Winer’s OPML editor, which is based on the same kernel and foundation as UserLand Frontier, over which Manila was originally developed. The nice thing about this is that I have a viable development environment that I can use separately from the Manila server that’s currently running the production site.

On the WordPress side it makes sense to just host my development test sites on my MacBook Air, and then once I have the end-to-end porting process working well, actually port to my production server — the Synology NAS.

Data Transmogrification

Leaving the media files and comments aside for a moment, I needed to make a big decision about how to get the blog post data out of my site, and into my WordPress site. This was going to involve writing code somewhere to pull the data out, massage it in an as-yet unknown way, and then put it somewhere that WordPress could use it to (re-)build the site.

It seemed like there were about five ways to go and maybe only one or two good ones. Which method I picked would determine how hard this would actually be, how long it might take, and if it’s even feasible at all.

Method 1: Track down Erin Clerico

A bunch of years ago, Erin Clerico (a long-time builder and hoster of Manila sites in the 2000s) had developed some tools to port Manila sites to WordPress.

As it turned out, a couple years back I’d discussed with Erin the possibility of porting my site using his tools. Sadly he was no longer maintaining them at that time.

If I remembered correctly, his tools used the WordPress API to transmit the content into WordPress from a live Manila server — I have one of those. It might be possible, I thought, to see if Erin would share his code with me, and I could update and adapt it as necessary for my site, and the newer versions of WordPress itself.

But this was unknown territory: I’ve never looked at Erin’s code, know very little about what may have changed over the years in the WordPress API, and don’t even know if Erin still has that code anywhere.

Method 2: Use the WordPress API

I could of course write my own code from scratch that sends content to WordPress via its API.

This would be a good learning exercise, since I would get to know the API well. And the likelihood that WordPress will do the right thing with the data I send it is obviously pretty high. Since that component is widely used, it’s probably quite well tested and robust.

This approach would also work equally well, no matter where I decided to host the site — on my own server or whatever hosted service I chose.

But there were potential problems:

  • Manila/Frontier may speak a different dialect on the wire than WordPress — I haven’t tested it myself.
  • Client/server debugging can be a pain, unless you have good debugging tools on both sides of the connection. I’ve got great tools on the Manila side, but basically no experience debugging web services in PHP on the WordPress side.
  • It’s likely to be slow because of all the extra work the machines will have to do in order to get the data to go between the “on-the-wire” format and their native format. (This will also make debugging more tedious.)

Method 3: Use Manila’s RSS generator

Of course Manila speaks RSS (duh). And WordPress has an RSS import tool — Cool!

In theory I should be able to set Manila’s RSS feed to include a very large number of items (say 5,000), and then have WordPress read and import from the feed.

The main problem here is that I would lose all the comments. Also I’m not sure what happens to images and the like. Would they be imported too? Or would I have to go through every post that has a picture, upload the picture, and edit the post to link to the new URL?

I’m less worried about the images, since I can just maintain them at their current URLs. It’s a shame not to have real attachment objects in my WordPress site, but not the end of the world.

Loss of the comments however would be a let-down to my users, and would also limit the export tool’s potential usefulness for other people (or my other sites).

Method 4: Make Manila impersonate another service

In theory it should be possible to make Manila expose RPC interfaces that work just like Blogger, LiveJournal, or Tumblr. WordPress has importers that work with all of these APIs against the original services.

Assuming there aren’t limitations of Frontier (for example no HTTPS, or complications around authentication) that would prevent this from working, this should get most or all of the information I want into WordPress.

But there are limitations with some of the importers:

  • The Tumblr importer imports posts and media, but I’d lose comments and users (commenters’ identities)
  • The LiveJournal importer seems to only understand posts
  • The Movable Type and TypePad importer imports from an export/backup file, and understands posts and comments, but not media

The only importer that appears to work directly from an API, and supports posts, comments, and users is the Blogger importer. (It doesn’t say it’ll pick up media however.)

In the Movable Type / TypePad case, I’d have to write code to export to their file format, and it’s not clear what might get lost in that process. It’s probably also roughly the same amount of work that would be needed to export to WordPress’ own WXR format (see below), so that’s not a clear win.

When it comes to emulating the APIs of other services (Blogger, Tumblr, LiveJournal), there’s potentially a large amount of work involved, and except for Blogger, there would be missing data. There’s also the non-trivial matter of learning those APIs. (If I’m going to learn a new API, I’d rather learn the WordPress API first.)

Method 5: Make Manila pretend to be WordPress

While researching the problem, I discovered quickly that WordPress itself exports to a format they call WXR, which stands for WordPress eXtended RSS. Basically it’s an XML file containing an RSS 2.0 feed, with additional elements in an extension namespace (wp:). The extension elements provide additional information for posts, and also add comments and attachment information.
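
The overall shape of the file is something like this (trimmed to the interesting parts; the namespace URIs are from a WordPress 4.x-era export):

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0"
        xmlns:content="http://purl.org/rss/1.0/modules/content/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:wp="http://wordpress.org/export/1.2/">
        <channel>
            <title>My Site</title>
            <wp:wxr_version>1.2</wp:wxr_version>
            <item>
                <title>A post</title>
                <pubDate>Wed, 01 Oct 2014 12:00:00 +0000</pubDate>
                <content:encoded><![CDATA[The post body…]]></content:encoded>
                <wp:post_id>1234</wp:post_id>
                <wp:post_type>post</wp:post_type>
                <wp:status>publish</wp:status>
            </item>
        </channel>
    </rss>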

At first glance, this seemed like the best approach, since I wouldn’t be pretending to understand the intricacies of another service, and instead would be speaking RSS with the eXtended WordPress elements — a format that WordPress itself natively understands.

Also since I’m doing a static file export, my code-test-debug cycle should be tighter: More fun to do the work, and less time overall.

Method 6: Reverse-engineer the WordPress database schema

I did briefly consider diving into MySQL and trying to understand how WordPress stores data in the database itself. It’s theoretically possible to have Manila inject database records into MySQL directly, and then WordPress wouldn’t be the wiser that the data didn’t come from WordPress itself.

This idea is pretty much a non-starter for this project though, for the primary reason that reverse-engineering anything is inherently difficult, and the likelihood that I would miss something important and not realize it until much later is pretty high.

Time to get started!

I decided on Method 5: Make Manila pretend to be WordPress. It’s the easiest overall from a coding perspective, the least different from things I already know (RSS 2.0 + extensions), and should support all of the data that I want to get into WordPress from my site. It also has the advantage of being likely to work regardless of whether I stick with the decision to self-host, or decide to host at WordPress.com or wherever else.

Implementing the Blogger API was a close second, and indeed if Manila still had a large user-base I almost certainly would have done this. (There are many apps and tools that know how to talk to Blogger, so there would have been multiple benefits from this approach for Manila’s users.)


In the next post I talk about some differences between the way Manila and WordPress store data, and some requirements that surfaced while investigating how to export data from Manila to WXR.

Development Manila WordPress

A few weeks ago, “Uncle Bob” Martin put up a great post on the 8th Light blog about disciplined software development being just as important, if not more important, for start-ups as for established companies. – The Start-Up Trap:

“As time passes your estimates will grow. You’ll find it harder and harder to add new features. You will find more and more bugs accumulating. You’ll start to parse the bugs into critical and acceptable (as if any bug is acceptable!) You’ll create modules that are so fragile you won’t trust yourself, or anyone else, to modify them; so you’ll work around them. You’ll build a festering pile of code that, with every passing week, requires more and more effort just to keep running. Forward progress will slow and falter. It may even reverse as each release becomes buggier and buggier, and less and less stable. Catastrophes will become more and more common as errors, that should never have happened, create corruptions and damage that take huge traunches of time to repair…

“… If you want to go fast. If you want the best chance of making all your deadlines. If you want the best chance of success. Then I can give you no better advice than this: Follow your disciplines! Write your tests. Refactor your code. Keep things simple and clean. Do Not Rush! You hold the life-blood of your start-up in your hands. Don’t be careless with it!”

If you’re an indie developer, working on a startup, or, come to think of it, doing any software engineering at all, go read it.

And while you’re at it, have a listen to David Smith’s interview with Brent on Developing Perspective, where he talks about technical debt in NetNewsWire as the iPhone and iPad versions’ deadlines loomed. (Bonus: Identical Cousins’ inaugural episode, Software Is Hard.)

Development