Kairos: It's the Metadata, Stupid

Peter Agelasto is one of the most fascinating people I know. As with many great stories, we met in a chance encounter after a long series of other chance encounters. Some would say this is fate. Others? Chaos theory in action. Whatever the case, my world forever changed the day I sent him an email asking him to join me on a massive website project. He didn't know this at the time, but I would have been 100% screwed had he not accepted. I had zero experience in the industry back then, and I could have easily botched the project and burned several relationships along the way.

Luckily, "fate" intervened.

Many things made Peter special, but I most appreciated the combination of creativity and conviction. There was also a curious juxtaposition between his profession (web development) and his academic training (archeology). But to him, there was no conflict. The web was all about giving the world a platform to create and share media. Archeology was all about discovering and deciphering artifacts and media from previous cultures.

It was this experience that gave him a long-term view while most people were hyper-focused on the present. To Peter, it didn't matter if we could create content that could be consumed with today's devices for today's business models. What happens when we try to show our children videos from today? What about our grandkids? There are video games from my childhood that can never be played again because I lack access to a functional computer from that time period.

To truly unlock the media, a system needed to be designed that could convert, distribute, and play it back at any point in the future.

Conversion, Distribution, and Playback

Conversion: This is critical because a 100 GB, 4K video might look amazing on a new smart TV but would completely destroy a low-end smartphone with a weak network connection. Beyond file size, different hardware devices (think the 100 variants in the fragmented Android ecosystem) require different formats. In short, optimizing for playback on Firefox in 2009 resulted in media that wouldn't even play in any browser in 2019.

Distribution: This is critical because the fragmentation of social media and content platforms has exploded. Customers cut their one cable subscription and now subscribe to one to six different à la carte services. For the content to make an impact, it had to be able to live where users wanted to find and consume it... and that meant it had to be everywhere.

Playback: This is critical because this is the moment where the media comes to life and is experienced (be it by humans or by machines within some form of data processing). And as technology changes ever more rapidly, it becomes even more challenging to retain devices from our past that can still play media optimized for a previous generation.

Most of the time, these three processes hinged upon correct and comprehensive metadata (that is, data about the data). Everything from a video's recording codec and compression rates down to the ownership and licensing information was critical. All told, there are up to 200 standard metadata fields for video alone, all of which can provide vital context and valuable information for the processes of conversion, distribution, and playback.
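To make that dependence concrete, here is a minimal Python sketch of metadata driving conversion and distribution decisions. Everything in it is an assumption for illustration: the field names, the `Device` type, and the license labels are hypothetical and are not drawn from any metadata standard or from Kairos itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoMetadata:
    # Hypothetical subset of the many fields a video can carry.
    codec: str
    bitrate_kbps: int
    owner: str
    license: str


@dataclass(frozen=True)
class Device:
    # Hypothetical description of a playback target.
    name: str
    supported_codecs: frozenset
    max_bitrate_kbps: int


def needs_conversion(meta: VideoMetadata, device: Device) -> bool:
    """True if the device cannot decode the codec or sustain the bitrate."""
    return (
        meta.codec not in device.supported_codecs
        or meta.bitrate_kbps > device.max_bitrate_kbps
    )


def may_distribute(meta: VideoMetadata, allowed_licenses: set) -> bool:
    """True if the asset's license is one the destination accepts."""
    return meta.license in allowed_licenses


master = VideoMetadata(
    codec="prores", bitrate_kbps=150_000,
    owner="Example Studio", license="all-rights-reserved",
)
phone = Device("low-end phone", frozenset({"h264"}), max_bitrate_kbps=2_500)

print(needs_conversion(master, phone))          # conversion decision
print(may_distribute(master, {"cc-by", "cc0"})) # distribution decision
```

If the codec or license field were missing or wrong, neither question could be answered reliably, which is the sense in which the three processes hinge on metadata.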

Decoupling Media and Metadata

Unfortunately, most systems still decouple metadata from its content. Content management systems (CMSs) like Drupal have incredibly powerful and flexible content models that allow you to add external fields describing the data. But why couldn't the media speak on its own behalf? And what happened when a video or image was downloaded and detached from its data?

The scientific journal industry solved this one way through the issuance of a Digital Object Identifier, or DOI. Effectively, this became a global, public pointer that could link to the current location of the record describing the paper. Journals may come and go, split off, merge, go defunct, etc. However, in theory, the pointers would get updated such that a paper could be cited for decades (centuries?) to come.
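The pointer idea can be sketched in a few lines of Python. This is only an illustration of the indirection, not how the actual DOI system is implemented; the `PointerRegistry` class, the identifier, and the URLs are all made up for the example.

```python
class PointerRegistry:
    """Toy registry: a stable identifier maps to a location that can change."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier: str, url: str) -> None:
        # First publication: record where the work currently lives.
        self._locations[identifier] = url

    def move(self, identifier: str, new_url: str) -> None:
        # The journal merged or moved: update the pointer, keep the identifier.
        if identifier not in self._locations:
            raise KeyError(f"unknown identifier: {identifier}")
        self._locations[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        # Citations store only the identifier and resolve it when needed.
        return self._locations[identifier]


registry = PointerRegistry()
paper_id = "10.1234/example.paper"  # hypothetical identifier
registry.register(paper_id, "https://old-journal.example.org/papers/42")
registry.move(paper_id, "https://new-publisher.example.org/archive/42")
print(registry.resolve(paper_id))
```

The citation never changes; only the registry entry does. The trade-off is that someone has to keep the registry updated for every record it tracks.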

While the DOI solution worked for journals, it's not appropriate for all media. Scientific papers are typically laboriously reviewed and infrequently edited once they have been peer-reviewed and published. They are also text-heavy, with only a few graphics and figures, which keeps the overall storage size low. And while the number of publications is increasing, the total number of records to track on a public pointer system is still relatively easy to manage.

Contrast this with images. On Instagram alone in 2018, over 95 million photos were uploaded per day. In total, over 40 billion (with a 'b') have been uploaded to the platform since its launch. That's an incredible amount of media. And for the most part, as long as Instagram exists, there is no problem. Every photo has some amount of information associated with it. We know its author, the time it was published, the people in the picture (if they were tagged), and perhaps location information (if it was tagged).

However, imagine a world where Instagram went away (or at least drastically transformed from what it is today). Sounds crazy, I know. However, users were equally surprised when Flickr and Photobucket met a similar fate. Millions of photos, some of which were in the public domain or under Creative Commons licenses, were at risk of deletion. Theoretically, the Internet Archive could take low-res snapshots of some of the publicly available information on the site, but what about the information behind paywalls or requiring authentication? And what about the photos in people's private iPhoto libraries? All of this data (and the information about it) is not necessarily stored in the one place that moves with it (e.g., the metadata fields themselves). If the media had to be moved in an urgent rush, would we lose its rich history along with it?

Enter Kairos

With Kairos, we inverted the approach. We believed it was inappropriate to separate the metadata from the source media. To that end, we trained our production team to take the highest resolution version out of Final Cut and describe the content in as much detail as possible. This way, the information wouldn't be lost during the conversion or distribution processes. All we had to do was map the internal metadata fields to the destinations where playback occurred (e.g., RSS feeds, iTunes, a Drupal CMS, etc.).
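As a rough illustration of that mapping step, here is a minimal Python sketch. It is not the Kairos implementation: the embedded field names, the destination field names, and the `map_metadata` helper are all hypothetical, chosen only to show the shape of the idea.

```python
# Hypothetical metadata embedded in a master video asset.
EMBEDDED = {
    "title": "Studio Session, Take 3",
    "description": "Full-length recording of the afternoon session.",
    "creator": "Production Team",
    "copyright": "2010 Example Studio",
}

# One mapping per destination: destination field -> embedded field.
# The destination field names are illustrative, not real schemas.
MAPPINGS = {
    "rss": {"title": "title", "description": "description", "author": "creator"},
    "cms": {
        "field_title": "title",
        "field_body": "description",
        "field_rights": "copyright",
    },
}


def map_metadata(embedded: dict, mapping: dict) -> dict:
    """Build a destination record from embedded fields, skipping missing ones."""
    return {
        dest_field: embedded[src_field]
        for dest_field, src_field in mapping.items()
        if src_field in embedded
    }


rss_item = map_metadata(EMBEDDED, MAPPINGS["rss"])
cms_node = map_metadata(EMBEDDED, MAPPINGS["cms"])
print(rss_item)
print(cms_node)
```

Supporting a new destination means adding one more entry to `MAPPINGS`; the media and its embedded description stay untouched.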

It was a thing of beauty! And while the website that we originally built the system for is no longer there, the media in storage still has all the information inside each video asset. So if the business or owners later decide to do something with their media library, they can create a new set of mappings and connections and push the content to wherever they can best serve their customers.

A later version of Kairos (now called Starchive) powers the entire 50-year Bob Dylan archive. This was a fantastic feat. And it's smart business given that the content will continue to need to be mined, converted, distributed, and licensed for generations to come.

Sure, it is possible to update a content management system (CMS) or digital asset management system (DAMS) every few years to keep up with the changing state of the art. Or you can take our approach and store the context within the media itself so that systems can wrap around the media.

Sure, AI and machine learning might be able to take a random hunk of media and start to identify and add context to a metadata-less photo or video. But it's unlikely they will ever capture the artist's perspective: why the content was produced and the story they wished to tell.

Kairos: it's all about the metadata. And I have Peter to thank for the fantastic education and experience.

See previous: Create Once, Play Everywhere
