Return on Human Input in Content
Or maybe a better title would be The Automation Valuation Framework. The goal here is to think concretely about the decision to automate previously human tasks in content and media. A disclaimer up front: this is an ugly work in progress, and I’m hoping some smarter folks will share their thoughts or direct me to some good articles.
Also, read on to see me trying and failing at Stratechery-style drawings.
A Spoken Newsletter?
In a conversation at the (always wonderful) Machines+Media conference, someone mentioned they would love to have our Edge-produced newsletters read out loud. They could be delivered on an Amazon Echo or Google Home, or even pushed to a podcasting platform.
We got excited because we already take great care to write our newsletter blurbs in a conversational voice, and realized this could be a perfect extension of already existing efforts. We have also been wanting to experiment with Amazon’s text-to-speech API, and it felt like the gods had decided our next product test for us.
We began copying in the text of a few newsletter blurbs and it felt promising. There were certainly some quirks: if you write augmented reality as AR and not A.R., it is spoken like a pirate, Arrr. Sarcasm is not remotely discernible; neither is any parenthetical text. Any reference to external articles is difficult to understand. But overall, it wasn’t that bad.
Automating Newsletter Blurbs
This wasn’t the first time we’d weighed automation against good old-fashioned human input. The blurbs in our client newsletters are typically written by one of our writers. We’ve done a good amount of testing with AI-powered summarization tools, but the qualitative feedback we received was that our writing style made a big difference in making the newsletters engaging.
In terms of newsletter production, we still ascribe a good deal of value to our machine-learning driven curation, but in reality, personality, humor and insight take them to the next level. The projects featuring original blurbs are always the best performing, but they certainly take a good deal of time and don’t come cheap 🙂
We began talking about ways to overcome the audio quirks of text-to-speech. Could we “invent” or adapt a specific writing style that would translate directly to an automated spoken voice? Or could we approach the problem from the other end? There is apparently a Speech Synthesis Markup Language (SSML) that allows you to add a high level of customization to Amazon’s text-to-speech.
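As a rough sketch of the SSML route: SSML has a `sub` element whose `alias` attribute tells the synthesizer what to say in place of the written text, which is one way to keep "AR" from coming out as "Arrr". The helper `to_ssml` and its `aliases` mapping below are hypothetical names made up for illustration, not something we have built. It only produces the markup string and does not call any speech service, and it assumes the abbreviations are plain alphanumeric words.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text, aliases):
    """Wrap a plain-text blurb in <speak> and mark abbreviations with <sub alias=...>.

    `aliases` is a hypothetical mapping of written form -> spoken form,
    e.g. {"AR": "augmented reality"}.
    """
    # Escape &, < and > first so the blurb is valid XML inside <speak>.
    escaped = escape(text)
    if not aliases:
        return "<speak>" + escaped + "</speak>"

    # One combined whole-word pattern, so each abbreviation is expanded in a
    # single pass and inserted tags are never re-scanned.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in aliases) + r")\b"
    )

    def expand(match):
        word = match.group(1)
        # quoteattr adds the surrounding quotes and escapes the spoken form.
        return "<sub alias=" + quoteattr(aliases[word]) + ">" + word + "</sub>"

    return "<speak>" + pattern.sub(expand, escaped) + "</speak>"


print(to_ssml("The future of AR is here.", {"AR": "augmented reality"}))
# <speak>The future of <sub alias="augmented reality">AR</sub> is here.</speak>
```

The resulting string could then be passed to the text-to-speech API with the input type set to SSML rather than plain text. This only addresses the abbreviation quirk; sarcasm and parenthetical asides would still need a human decision about how they should sound.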
The Simple and Boring Truth
As visions of a scalable text-to-voice service danced in our heads, someone pointed out the obvious. It would be cheap and quick to have someone read and record the newsletter blurb text, making any needed verbal adjustments on the fly. They could probably deliver a decent spoken recording in one, maybe two takes, and even a merely “decent” recording would be far more engaging and natural than the text-to-speech output. Someone who is not a professional voice artist, using amateur equipment, could create a spoken newsletter that is light years better than the automated product in less than an hour.
It got us thinking about the business decision behind automating content production: what is the return on human input? In this case, a small amount of non-expert human input has an outsized return relative to the automated product. With our newsletter blurbs, a strong writer needs a few hours to achieve the same incremental increase in quality relative to the automated summaries. Again, that quality gap is what makes for a great product, but it is not scalable or cheap.
The flipside, a process that lends itself to automation, would be anything involving a large number of calculations or heavy data processing, where each incremental unit of human input most likely increases the chance of mistakes and reduces the overall quality of the output.
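One way to make “return on human input” concrete is quality gained per hour of human effort, measured against the automated baseline. This is a minimal sketch of that arithmetic, not a validated metric. The function name, the 0-10 quality scale, and every number below are made up purely to illustrate the three cases above; none of them are measurements.

```python
def return_on_human_input(automated_quality, human_quality, human_hours):
    """Quality gained per hour of human effort, relative to the automated baseline."""
    if human_hours <= 0:
        raise ValueError("human_hours must be positive")
    return (human_quality - automated_quality) / human_hours


# Hypothetical 0-10 quality scores and hours, chosen only to show the shapes.
# Spoken newsletter: an amateur reader lifts quality by 5 points in about an hour.
spoken = return_on_human_input(automated_quality=2, human_quality=7, human_hours=1)

# Article blurbs: the same 5-point lift takes a strong writer a few hours.
blurbs = return_on_human_input(automated_quality=3, human_quality=8, human_hours=4)

# Heavy data processing: more human input makes the output worse.
number_crunching = return_on_human_input(automated_quality=9, human_quality=6, human_hours=10)

print(spoken, blurbs, number_crunching < 0)
```

With these illustrative inputs, the spoken newsletter has the highest return per hour, the blurbs a lower but still positive one, and the data-processing case a negative one. That ordering is the argument of this section in numeric form.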
Graph — Spoken Newsletters
As the proud owner of a Microsoft Surface Pen, I tried to Stratecherize this. The x-axis goes from devoting significant resources to automation on the far left, moving towards a very high level of effort of human input, with no added technological layer on the right. The y-axis measures quality and usability of the product.
Simply plugging in the text-to-speech API makes the newsletter unlistenable. As you move to the right, just a small amount of human input dramatically increases the usability and quality of the “spoken newsletter”. It doesn’t need to be NPR-quality, but someone could listen through it. As you put more human effort into the production, you see only incremental returns. On the other side, if you invest a good deal (hypothetically) into SSML, maybe you will move into a somewhat “listenable” product.
Graph — Article Blurbs
If you feed article text into a plug-and-play summarization API, the output won’t be unusable, but it won’t be very good either. If you begin investing more in the summarization side (moving left on the x-axis), you can improve the product to where it’s not too bad, or at least usable, for the reader.
Moving to the right, there is a fairly linear relationship between input (finding a good writer, increasing the time they spend on it) and the quality of the blurb.
Graph — Analyzing Millions of Documents
We’ve done a few data projects that begin with parsing large numbers of documents. This is a situation where the task is effectively impossible with no technological aid, and the output improves roughly linearly the more time and effort you put into automation.
Graph — Self-Driving Cars
I acknowledge this one doesn’t really make sense, but I’m curious if anyone has thoughts on what it might look like. Humans driving with no technological aid (I’m thinking of any safety mechanism) is a scary thing, but still functional. Adding technology should improve quality, but at least until Level 5 self-driving is achieved, full automation with no human check is usable but not great.
A 2D visualization might not be the right way to represent these ideas, and I’m still thinking through alternatives. The decision-making behind automating existing human processes (especially related to content) is a fascinating one for us, and if you can point me to any resources, frameworks or visualizations that would help me think through this, I’d love to hear from you (ranjan at theeedge.group or @ranjanxroy).