The Great Data Migration, Part 2
Once more, with feeling
From the beginning, it was clear that data migration was going to be this redesign’s biggest, most cumbersome task, as the site was growing from 600-some blog posts to untold thousands. I assumed that reformatting the mountain of data arriving in disparate configurations from over a dozen external sources (as described in my previous post) would be the lion’s share of the work, and it would be smooth sailing from there. How wrong I was! Reformatting content previously published on my own site—and merging it with the newly reformatted external data—introduced some thorny complications and wound up taking me an entire month.
Part of this was due to the old posts moving into a richer environment, with some bits of metadata not included in the previous version of the site needing to be added manually (like location data for posts about travel). Other work resistant to automation involved cleaning up somewhat haphazardly assembled figure includes and bits of hybrid HTML/Markdown for article furniture that I never got around to componentizing (like tables of contents for longer posts). But it’s all tidied up now. In the seven years (almost to the day!) since I last redesigned the site, I’ve become much more adept at working with structured data, so now virtually every blog post component whose needs exceed the limitations of straightforward Markdown is stored in carefully structured JSON or YAML. It’s a gift to my future self, although I’m sure future me will be disappointed in plenty of other things I don’t currently know I’m doing wrong. C’est la vie!
Anyway, for as much trouble as that stuff was, the biggest content strategy headache turned out to be consolidation. Take Tinnitus Tracker. I first launched it in 2019, but its content goes back to the early 1990s, and many of its backdated posts incorporate bits previously posted elsewhere, like Flickr photos, Instagram videos, and text excerpts from blog posts. In deciding how best to minimize repetition on the site, I had to rethink some of my ideas about how the site is organized and how it addresses the origins of its content.
I had always planned to allow users to browse posts by source; for example, browsing the posts on my site sourced from Instagram would effectively replicate the experience of scrolling through my profile on Instagram. But probably more than half of those Instagram posts are incorporated into Tinnitus Tracker posts, which means if you were browsing my site in some other way, like by date, those duplicate posts would show up right next to each other. I didn’t love it, and it ultimately led me to scrap the browse-by-source feature, which was strangely liberating: Instagram isn’t the source, I am. Posts that originated somewhere other than this version of my site will still say so, and will link back to where they were first published, but browsing by source is dead. Here’s how I handled the consolidation challenges:
- As mentioned previously, for anything that was originally cross-posted in multiple places for maximum exposure, only the first post was kept, and the others removed. (This came up a lot when I was reformatting Flickr and Instagram data.)
- I never felt great about my Flickr photos being 1,000+ individual posts, especially since most of them are part of an album on Flickr. Luckily, very few are part of multiple albums, so each album could consolidate all its photos into a single post with a gallery. In many cases, that gallery could then be further consolidated with an existing blog post (like one of my SXSW recaps).
- Tinnitus Tracker posts swallowed a lot of Instagram posts, tweets, and blog posts. Blog posts that were merely excerpted still survive separately in their original form, and the Tinnitus Tracker post makes note of where its excerpt(s) came from, so there will be some repetition on the site, though hopefully it won’t be very noticeable. Each Tinnitus Tracker post’s metadata also notes the URLs of the posts it swallowed, which I may or may not surface on the post page.
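For the curious, the rules above could be sketched roughly like this in Python. This is an illustration only, not the actual migration code: the `Post` fields, the `content_key` fingerprint used to recognize cross-posted copies, and both function names are hypothetical, and it assumes something upstream has already worked out which posts are copies of each other.

```python
from collections import defaultdict
from dataclasses import dataclass, field, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Post:
    url: str
    source: str                 # e.g. "flickr", "instagram", "blog"
    published: date
    content_key: str            # hypothetical fingerprint shared by cross-posted copies
    album: Optional[str] = None
    absorbed_urls: List[str] = field(default_factory=list)


def keep_first_published(posts: List[Post]) -> List[Post]:
    """For each group of cross-posted copies, keep the earliest one.

    The surviving post records the URLs of the copies it replaced.
    """
    by_key: Dict[str, List[Post]] = defaultdict(list)
    for post in posts:
        by_key[post.content_key].append(post)

    kept = []
    for copies in by_key.values():
        copies.sort(key=lambda p: p.published)
        first, later = copies[0], copies[1:]
        absorbed = first.absorbed_urls + [p.url for p in later]
        kept.append(replace(first, absorbed_urls=absorbed))
    return sorted(kept, key=lambda p: p.published)


def group_into_galleries(posts: List[Post]) -> Dict[str, List[str]]:
    """Collect photo URLs per album, in publication order.

    Posts that do not belong to an album are left out.
    """
    galleries: Dict[str, List[str]] = defaultdict(list)
    for post in sorted(posts, key=lambda p: p.published):
        if post.album is not None:
            galleries[post.album].append(post.url)
    return dict(galleries)
```

Running `keep_first_published` before `group_into_galleries` means each gallery only ever contains the first-published copy of a photo. The hard part in practice is deciding what counts as "the same post" across services, which this sketch conveniently hides inside `content_key`.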
In the end, I eradicated over 1,300 duplicate posts!
With the exception of a handful of loose ends I intend to address post-launch, this brings us to the final tally of all the posts of many different shapes and sizes now ready to publish on V7, including this one (drumroll please!): 10,419.
I wasn’t really surprised by that number, but I still wasn’t quite prepared for it. It’s… big. And I’m crossing my fingers very, very hard that in the long, messy journey this massive amalgam of content took to get here, I didn’t make any mistakes big enough for Eleventy to refuse to turn it into a website. We’ll find out soon….