The transfer of content fascinates me. In his model of information theory, Claude Shannon breaks communication down into components including an information source, a channel, a signal, and a message.
An information source can potentially transmit any message, and strange things happen when you define even a part of that message: the next thing you add becomes more predictable, because its probability now depends on what came before. (If you want to read more, this is modeled as a Markov process; predictive text is a good example.)
He also introduces “noise” to the system, and that noise typically enters the message in the signal path. This leads to degradation. Cryptography and lossy compression can also add noise to the message.
And so when we transfer data, there is often some loss of signal due to noise. The message can still be reconstructed, however, thanks in part to the predictability that a Markov process describes, and the meaning can be conveyed.
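To make the Markov idea concrete, here is a minimal Python sketch, not Shannon's own formulation. It assumes a toy first-order model over characters; the names `build_bigram_model`, `entropy`, and `repair`, and the use of `?` to mark a character lost to noise, are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count, for each character, which characters have followed it."""
    model = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        model[current][following] += 1
    return model


def entropy(counts):
    """Shannon entropy, in bits, of the distribution implied by the counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def repair(garbled, model, unknown="?"):
    """Fill each lost character with the likeliest successor of the one before it."""
    out = []
    for ch in garbled:
        if ch == unknown and out and out[-1] in model:
            ch = model[out[-1]].most_common(1)[0][0]
        out.append(ch)
    return "".join(out)


sample = "the the the then"
model = build_bigram_model(sample)
overall_bits = entropy(Counter(sample))   # uncertainty with no context
after_t_bits = entropy(model["t"])        # uncertainty once we know "t" came first
restored = repair("t?e", model)
```

Trained on this tiny sample, the model has only ever seen "h" after "t", so the uncertainty about what follows "t" drops to zero and the lost character in "t?e" can be filled back in. Real predictive text uses far longer contexts and far more data.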
Message integrity aside, this interplay between information and loss of information is a fairly profound area of discussion. If you go deeper, you will quite literally end up on the current theoretical edge of modern black hole cosmology. That’s not something I want to go deeper into here, although I’m always happy to chat about it over a few pints. But it sets the scene for a “thought experiment” looking at something closer to home for Streaming Media Europe readers—what we usually think about in terms of information storage. I want to spend some time exploring the very, very long view on that topic, but before I do, I want to add one further piece of contextual thinking.
In these pages, we think about information, message, and signal in terms of telecommunications channels and, more loosely, “streaming.” Over the years, I have been asked to produce a simple description of what “streaming,” or more specifically a “stream” is. I don’t think there is an exact definition, and certainly not a narrow one. It is more a range of technologies that work together to convey a message.
While ill-defined, I feel there is a role for “time” in making a separation between a video-on-demand file transfer and a stream. A file transfer is thought of as a discrete process. Typically the message (the video) is not decoded before being conveyed in entirety. On the other hand, the message in a stream can begin to be decoded long before the complete message is conveyed.
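As a rough illustration of that distinction, the Python sketch below logs the order of events in each case. It is a toy under stated assumptions: the "message" is just a string split into chunks, "decoding" is only a log entry, and the function names `chunks`, `file_transfer`, and `stream` are hypothetical.

```python
def chunks(message, size):
    """Yield the message in fixed-size pieces, as a channel might deliver it."""
    for start in range(0, len(message), size):
        yield message[start:start + size]


def file_transfer(source):
    """Receive every chunk first; decoding starts only after the last one arrives."""
    log = []
    buffered = []
    for chunk in source:
        buffered.append(chunk)
        log.append(f"recv {chunk}")
    for chunk in buffered:
        log.append(f"decode {chunk}")
    return log


def stream(source):
    """Decode each chunk as soon as it arrives, before the message is complete."""
    log = []
    for chunk in source:
        log.append(f"recv {chunk}")
        log.append(f"decode {chunk}")
    return log


file_log = file_transfer(chunks("abcdef", 2))
stream_log = stream(chunks("abcdef", 2))
```

In `file_log` the first decode entry comes after all three receive entries; in `stream_log` it comes straight after the first one. The role of time is the only difference between the two.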
In the streaming world, we often think about time in terms of latency and synchronization. Our timescales are usually short. Typically in the industry we encounter milliseconds, seconds, hours, days, weeks, months, years and, most important for this discussion, the “long tail.”
Long-tail content in modern “cat video” terms means it has been stored on disc or disk (or comparable media) somewhere, typically for many years, and is always available for random access.
Over time, it may well have been stored on a number of different physical media. Indeed, it may have been replicated to many locations, “striped” across multiple devices, and even transcoded and compressed for storage efficiency.
Returning to information theory, all of those processes can be interpreted as “the message being transmitted from an information source to a destination.”
Digital long-tail content is constantly, if slowly, moving from storage media to storage media. At the moment, its longevity is largely a matter of economics. Eventually, the economic benefit of storing it no longer justifies the cost of ensuring its integrity, or even of keeping a copy at all. Even looking back via waybackmachine.org, you get a feel for how briefly website data sticks around in any depth: the older sites are only one or two pages deep.
In a way, long-term storage can be compared to a content delivery network cache, but operating with a time-to-flush timescale of decades instead of hours or days. That cache has certain operating limits. They can be driven by the cost of “live” online performant storage or the cost of cheaper “archive storage,” and compression can increase the availability of the storage to some extent. Those costs are underpinned by the raw materials for the media and the energy requirements to keep the systems available.
Regardless of the fact that we may well find other ways to destroy the data ourselves (or simply fail as a species to keep the energy flowing to sustain the storage media), our cat videos are at the forefront of a decision. We will soon have to decide what data humankind needs to let go of, and what doesn’t make the grade of being “legacy” content for future humans to benefit from.
The 4,000-year-old clay tablets inscribed with Sumerian cuneiform convey a human information legacy through time to our generation. Clay and rock are good information channels! There are few “inter-generational legacy” systems that have proven as resilient.
I don’t think there is any engineer or streaming practitioner among us who would offer such a futureproof guarantee on the ability of any modern streaming and information systems to preserve our cat videos for 4,000 years. I can’t even play a cassette from my own childhood any more!
But let’s suppose we start to form a practical model for our “4,000-year Guaranteed Intergenerational Legacy Information Channel.” It would presumably have some inherent capacity limitations. There would need to be an energy model and an economic model for maintaining it. In realistic terms, it is hard to envision such a system lasting 100 years, let alone 4,000. Even if we build a series of such systems, we have to accept a risk of loss each time the videos are transferred and transcoded from one to the next. That puts a finite limit on how many times we can move the messages between systems without loss, or without extra care (cost, etc.).
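That limit can be put into rough numbers. The Python sketch below assumes, purely for illustration, that every migration independently carries the same small chance of losing a given video; the function names and the loss figures are hypothetical, not measurements.

```python
def survival_probability(loss_per_migration, migrations):
    """Chance a video survives n migrations, each losing it with probability p."""
    return (1 - loss_per_migration) ** migrations


def max_migrations(loss_per_migration, min_survival):
    """Largest number of migrations that keeps survival odds at or above a target."""
    if not 0 < loss_per_migration <= 1:
        raise ValueError("loss_per_migration must be in (0, 1]")
    if not 0 < min_survival <= 1:
        raise ValueError("min_survival must be in (0, 1]")
    count = 0
    while survival_probability(loss_per_migration, count + 1) >= min_survival:
        count += 1
    return count


# Assumed figures: 1% loss per move, one move per decade for 4,000 years.
odds_after_4000_years = survival_probability(0.01, 400)
moves_before_coin_flip = max_migrations(0.01, 0.5)
```

Under those assumptions, a 1% chance of loss per migration and one migration every decade would leave a given video with a little under 2% odds of surviving 400 moves across 4,000 years.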
This means we will have to become selective about what messages we decide to preserve and pass forward in time. Not just to our kids, but down through many generations to our descendants.
Logically, new messages may be more relevant than old, and the long tail may be expired that way, so CCTV videos may be relegated to oblivion sooner than the cat videos. But at least some of those old videos will need to be flagged, somehow, as “important: preserve.” But what is “important,” and to whom? And if storage is limited, won’t the “important” videos eventually fill up all the drives we can ever manufacture or provide energy for, at which point new information will have nowhere to be stored at all without losing some “important” information? (And wouldn’t we then need to classify that lost information as “not quite as important as we originally thought”?)
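The bind described above can be sketched as a fixed-capacity archive with a “preserve” flag. This is a thought-experiment toy in Python, not a description of any real storage system; the `Item` and `Archive` classes, the eviction rule, and the example names and years are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    stored_year: int
    important: bool = False


class Archive:
    """A store with a hard capacity that tries to honor an 'important' flag."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.demoted = []  # flagged items that had to be dropped anyway

    def store(self, item):
        if len(self.items) >= self.capacity:
            self._evict()
        self.items.append(item)

    def _evict(self):
        # Drop the oldest unflagged item if there is one (False sorts before True);
        # otherwise the oldest flagged item is the one that has to go.
        victim = min(self.items, key=lambda i: (i.important, i.stored_year))
        self.items.remove(victim)
        if victim.important:
            self.demoted.append(victim.name)


archive = Archive(capacity=2)
archive.store(Item("cctv", 2020))
archive.store(Item("cat video", 2010, important=True))
archive.store(Item("wedding", 2030, important=True))       # pushes out "cctv"
archive.store(Item("moon landing", 2040, important=True))  # forces a demotion
```

Once every slot holds a flagged item, the next arrival can only be stored by quietly reclassifying something. In this run, `archive.demoted` ends up holding the cat video.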
So considering streaming, file transfer, or even leaving the long tail simply unused and archived as the storage media changes around it over years, there’s much to think about for anyone wanting to leave a video as an intergenerational legacy.
In the 1970s, the Voyager probes carried a gold-plated disc, with some etchings and a phonographic audio recording, on a path out of the solar system. That tiny bit of information preserved in gold may turn out to be the last bit of human information that our species tried to protect from the inevitability of our “local” future, beyond the point where our sun returns everything in the solar system back to disordered energy. Even if we managed to compress all the long-tail content (and all the short-tail too, for that matter) and pack it off safely on a spaceship, we’d probably have to accept that it would eventually be lost into a black hole.
Until then, we need to keep thinking about the problem of passing on our digital legacy through the generations. Otherwise, the information age may ironically leave behind almost no long-term legacy. In this particular regard, that 4,000-year-old Sumerian clay tablet is clearly superior to anything digital!