V7: The Great Data Migration, Part 2

Once more, with feeling

September 10, 2024

From the beginning, it was clear that data migration was going to be this redesign’s biggest, most cumbersome task, as the site was growing from 600-some blog posts to untold thousands. I assumed that reformatting the mountain of data arriving in disparate configurations from over a dozen external sources (as described in my previous post) would be the lion’s share of the work, and it would be smooth sailing from there. How wrong I was! Reformatting content previously published on my own site—and merging it with the newly reformatted external data—introduced some thorny complications and wound up taking me an entire month.

Part of this was due to the old posts moving into a richer environment: some metadata that the previous version of the site never included had to be added by hand (like location data for posts about travel). Other work resistant to automation involved cleaning up somewhat haphazardly assembled figure includes and bits of hybrid HTML/Markdown for article furniture that I never got around to componentizing (like tables of contents for longer posts). But it’s all tidied up now. In the seven years (almost to the day!) since I last redesigned the site, I’ve become much more adept at working with structured data, so now virtually every blog post component whose needs exceed the limitations of straightforward Markdown is stored in carefully structured JSON or YAML. It’s a gift to my future self, although I’m sure future me will be disappointed in plenty of other things I don’t currently know I’m doing wrong. C’est la vie!
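As a rough illustration of what structured data buys you, here’s a minimal Python sketch that loads one post’s metadata from JSON into typed records, so a malformed entry fails early instead of quietly rendering wrong. The schema (a `location` object and a `toc` list) is a hypothetical stand-in for the travel location data and tables of contents mentioned above, not the site’s actual format, and the site itself is built with Eleventy rather than Python.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema: these field names are illustrative assumptions,
# not the site's real data format.


@dataclass
class Location:
    name: str
    lat: float
    lon: float


@dataclass
class TocEntry:
    anchor: str  # id of the heading this entry links to
    title: str


@dataclass
class PostMeta:
    title: str
    date: str
    location: Optional[Location] = None
    toc: list[TocEntry] = field(default_factory=list)


def load_post_meta(raw: str) -> PostMeta:
    """Parse post metadata from JSON into typed objects.

    A missing top-level key raises KeyError; a missing or unexpected key
    inside a location or TOC entry raises TypeError.
    """
    data = json.loads(raw)
    loc = data.get("location")
    return PostMeta(
        title=data["title"],
        date=data["date"],
        location=Location(**loc) if loc else None,
        toc=[TocEntry(**entry) for entry in data.get("toc", [])],
    )
```

Posts without a location or table of contents simply get `None` and an empty list, so templates can check for those components rather than guessing.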

Anyway, for as much trouble as that stuff was, the biggest content strategy headache turned out to be consolidation. Take Tinnitus Tracker. I first launched it in 2019, but its content goes back to the early 1990s, and many of its backdated posts incorporate bits previously posted elsewhere, like Flickr photos, Instagram videos, and text excerpts from blog posts. In deciding how best to minimize repetition on the site, I had to rethink some of my ideas about how the site is organized and how it addresses the origins of its content.

I had always planned to allow users to browse posts by source; for example, browsing the posts on my site sourced from Instagram would effectively replicate the experience of scrolling through my profile on Instagram. But probably more than half of those Instagram posts are incorporated into Tinnitus Tracker posts, which means if you were browsing my site in some other way, like by date, those duplicate posts would show up right next to each other. I didn’t love it, and it ultimately led me to scrap the browse-by-source feature, which was strangely liberating: Instagram isn’t the source, I am. Posts that originated somewhere other than this version of my site will still say so, and will link back to where they were first published, but browsing by source is dead, and here’s how I handled consolidation challenges:

As mentioned previously, for anything that was originally cross-posted in multiple places for maximum exposure, only the first post was kept, and the others removed. (This came up a lot when I was reformatting Flickr and Instagram data.)
I never felt great about my Flickr photos being 1,000+ individual posts, especially since most of them are part of an album on Flickr. Luckily, very few are part of multiple albums, so each album could consolidate all its photos into a single post with a gallery. In many cases, that gallery could then be further consolidated with an existing blog post (like one of my SXSW recaps).
Tinnitus Tracker posts swallowed a lot of Instagram posts, tweets, and blog posts. Blog posts that were merely excerpted still survive separately in their original form, and the Tinnitus Tracker post makes note of where its excerpt(s) came from, so there will be some repetition on the site, though hopefully it won’t be very noticeable. Each Tinnitus Tracker post’s metadata also records the URLs of the posts it swallowed, which I may or may not surface on the post page.

In the end, I eradicated over 1,300 duplicate posts!
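For anyone facing a similar cleanup, here’s a minimal Python sketch of the first two consolidation rules: keep only the earliest copy of anything cross-posted, then fold each album’s photos into a single gallery post. It assumes every post record carries an ISO 8601 date, a hypothetical `crosspost_key` shared by all copies of the same item, and at most one album name; the `/albums/<name>/` URL scheme is likewise made up. It’s an illustration under those assumptions, not the script behind the numbers above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Post:
    url: str
    date: str                    # ISO 8601, so string order is chronological order
    crosspost_key: str           # hypothetical: shared by copies of the same item
    album: Optional[str] = None  # assumes at most one album per photo
    swallowed: list[str] = field(default_factory=list)  # URLs folded into this post


def keep_first_crossposts(posts: list[Post]) -> list[Post]:
    """Keep only the earliest post in each cross-post group."""
    first: dict[str, Post] = {}
    for post in sorted(posts, key=lambda p: p.date):
        first.setdefault(post.crosspost_key, post)  # later copies are ignored
    return list(first.values())


def gallery_from_album(album: str, photos: list[Post]) -> Post:
    """Fold an album's photos into one gallery post that records what it absorbed."""
    photos = sorted(photos, key=lambda p: p.date)
    return Post(
        url=f"/albums/{album}/",  # hypothetical URL scheme
        date=photos[0].date,      # the gallery takes its earliest photo's date
        crosspost_key=f"album:{album}",
        swallowed=[p.url for p in photos],
    )


def consolidate(posts: list[Post]) -> list[Post]:
    """Drop cross-post duplicates, then merge album photos into gallery posts."""
    by_album: dict[str, list[Post]] = defaultdict(list)
    result: list[Post] = []
    for post in keep_first_crossposts(posts):
        if post.album:
            by_album[post.album].append(post)
        else:
            result.append(post)
    result.extend(gallery_from_album(name, photos) for name, photos in by_album.items())
    return sorted(result, key=lambda p: p.date)
```

The same `swallowed` field could serve a Tinnitus Tracker post, recording the URLs of the Instagram posts and tweets it absorbs, which matches the metadata approach described above.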

With the exception of a handful of loose ends I intend to address post-launch, this brings us to the final tally of all the posts of many different shapes and sizes now ready to publish on V7, including this one (drumroll please!): 10,419.

I wasn’t really surprised by that number, but I still wasn’t quite prepared for it. It’s… big. And I’m crossing my fingers very, very hard that in the long, messy journey this massive amalgam of content took to get here, I didn’t make any mistakes big enough for Eleventy to refuse to turn it into a website. We’ll find out soon…

21 posts in this series

January 1, 2020

V7: Introduction

January 4, 2020

V7: The “viewport” meta tag

January 8, 2020

V7: Content priorities

January 14, 2020

V7: Structural challenges

February 9, 2020

V7: Timeline section inventory

March 3, 2020

V7: The timeline is taking shape

June 24, 2020

V7: On dependency

December 5, 2020

V7: Choosing a CMS

January 24, 2021

V7: Beginning data migration

November 25, 2022

V7: Renewed purpose

May 4, 2023

V7: The Procrastination Destination

May 24, 2023

V7: Eleventy it is

June 1, 2023

V7: Expanding scope

August 8, 2023

V7: Metadata structure and sitemap

July 26, 2024

V7: The Great Data Migration

September 10, 2024

V7: The Great Data Migration, Part 2

September 9, 2025

V7: Launch day

October 13, 2025

V7: Video Killed the Web Browser Star

January 4, 2026

V7: Typographic scales and technical pens

June 22, 2026

V7: Backfilling metadata

July 1, 2026

V7: Say hello to my listening diary