Understanding IP Video Transcoding, Packaging and Delivery Requirements for Carrier-Class Providers
Introduction Multi-Screen Strategy
Online and mobile viewing of widely-available, high-quality, IP-delivered video content, including TV programming, movies, sports events, and news, is now poised to go mainstream. Driven by the recent availability of low-cost, high-resolution desktop/laptop/tablet PCs, smart phones, IP-based set-top boxes and now Internet-enabled television sets, consumers are [...]
The first set of questions focuses on the actual multi-screen delivery requirements associated with content preparation, delivery and monetization:
How will source content be ingested and transcoded into appropriate video and audio container, codec formats, bit rates, etc.?
How many output container, video codec and audio codec formats must a provider be prepared to supply, given this new world of TV, desktop and mobile devices?
How many bit rates are required to support those formats given the fluctuating bandwidth environment of the Internet?
How many media player formats need to be supported per bit rate?
How can advertisements be inserted into these different video delivery containers?
How is rights management incorporated?
The second set of questions concerns itself more with how the above can be accomplished while meeting carrier-grade requirements, which dictate a cost-effective, scalable, reliable and manageable solution for multi-screen IP video delivery:
What equipment is required to perform these functions?
Should transcoding, packaging and delivery equipment be centralized or decentralized?
How will quality of experience be preserved such that premium content owners and viewing subscribers are satisfied?
How does it scale?
Is it designed for high availability?
What will you learn by reading this article?
Before discussing specific requirements, it is useful to understand the evolution of IP delivery protocols – a vital element of providing TV-quality content over the Internet – and without which, true multi-screen service models could not be a reality.
Certainly, tremendous advancements in core and last mile IP network bandwidth have been achieved in the last decade around the world – primarily driven by web-based data consumption. However, video traffic is creating an exponential increase in bandwidth requirements. This, coupled with the fact that the Internet at large is not a managed quality-of-service environment, requires new methods of video transport to be considered in order to provide the quality of video experience across any device and network that consumers have come to expect from managed TV-delivery networks.
The evolution of video delivery transport has led to a new set of de facto standard adaptive delivery protocols from Apple, Microsoft and Adobe that are now positioned for broad adoption. Consequently, networks must be equipped with network elements that can take high-quality live and file-based video outputs from transcoders and ‘package’ them for delivery to devices ready to accept these new delivery protocols. The world of IP video delivery now finds itself rapidly evolving to the modern era of HTTP adaptive streaming. But, there were two distinctly different eras that preceded it, as reviewed below.
For many years, stateful protocols including Real Time Streaming Protocol (RTSP), Adobe’s Real Time Messaging Protocol (RTMP) and Real Networks’ RTSP over Real Data Transport (RDT) protocol, were utilized to stream video content to desktop and mobile clients. Stateful protocols require that, from the time a client connects to a streaming server until the time it disconnects, the server must track client state. If the client needs to perform any video session control commands like start, stop, pause or fast-forward it must do so by communicating state information back to the streaming server. Once a session between the client and the server has been established, the server sends media as a stream of small packets typically representing a few milliseconds of video. These packets can be transmitted over UDP or TCP. TCP overcomes firewall blocking of UDP packets, but may also incur increased latency – to the point of causing video stuttering artifacts – as packets are occasionally resent due to network packet loss.
These protocols served the market relatively well, particularly during the era when desktop and mobile viewing was limited in frequency, quality, duration and screen/window size and resolution, and when mobile devices had constrained processor, memory and storage capabilities.
However, the above factors have changed dramatically in the last few years, and that has exposed a number of stateful protocol implementation weaknesses:
Stateful media protocols have difficulty getting through firewalls and routers
Stateful media protocols require special proxies / caches for distribution
Stateful media protocols cannot react quickly or gracefully to rapidly fluctuating network conditions
Stateful media client server implementations are vendor specific, and thus require the purchase of vendor-specific servers and licensing arrangements – which are more expensive to operate and maintain
The Era of the Stateless Protocol – HTTP Progressive Download
A newer type of media delivery is HTTP progressive download. Progressive download (as opposed to ‘traditional’ file download) pulls a file from an HTTP web server and allows the video file to start playing before the entire file has been downloaded. Most media players including Adobe Flash, Windows Media Player, Apple QuickTime, etc., support progressive download. Further, most video hosting websites use progressive download extensively, if not exclusively.
HTTP progressive download differs from traditional file download in one important respect. Traditional files have audio and video data separated in the file. At the end of the file, a record of the location and structure of the audio and video tracks (track data) is provided. Progressively downloadable files have track data at the beginning of the file and interleave the audio and video data. A player downloading a traditional file must wait until the end of the file is reached in order to understand the track data, but a player downloading a progressively downloadable file gets the track data immediately and can, therefore, play back the audio and video as it is received.
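The difference in file layout can be checked mechanically. The sketch below assumes an MP4-style (ISO base media) container, which the text above does not name; in that container the track data sits in a `moov` box and the interleaved media in an `mdat` box. The function names and the `make_box` helper are illustrative only.

```python
import io
import struct
from typing import BinaryIO, List, Tuple


def top_level_boxes(stream: BinaryIO) -> List[Tuple[str, int]]:
    """Return (type, size) for each top-level box in an MP4-style stream."""
    boxes = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of stream
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            size = struct.unpack(">Q", stream.read(8))[0]
            header_len = 16
        name = box_type.decode("latin-1")
        if size == 0:
            # A size of 0 means the box runs to the end of the file.
            boxes.append((name, 0))
            break
        if size < header_len:
            break  # malformed box; stop rather than loop forever
        boxes.append((name, size))
        stream.seek(size - header_len, io.SEEK_CUR)
    return boxes


def is_progressive(stream: BinaryIO) -> bool:
    """True when the track data ('moov') precedes the media data ('mdat')."""
    order = [name for name, _ in top_level_boxes(stream)]
    return ("moov" in order and "mdat" in order
            and order.index("moov") < order.index("mdat"))


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a minimal box for experimentation (hypothetical helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload
```

A player can only begin playback early when `is_progressive` would return true for the file it is fetching; otherwise it must read to the end before it can locate the tracks.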
Unfortunately, for live streams, it isn’t possible to both efficiently store the audio/video and to create progressive download files. Audio and video track data needs to be computed and written to the front of the file after the entire file is created. Thus, it isn’t possible to deliver a live stream using progressive download, because the track data can never be available until after the entire file has been created.
Even so, HTTP progressive download greatly improves upon its stateful protocol predecessors as a result of the following:
Progressive download has no issue getting through firewalls and routers as HTTP traffic is passed through port 80 unfettered
Progressive download employs the same HTTP download infrastructure utilized by content delivery networks (CDNs) and hosting providers to provide web data content, making it much easier and less expensive to deliver rich media content
Progressive download takes advantage of newer desktop and mobile clients’ formidable processing, memory and storage capabilities to start video playback quickly, maintain flow, and preserve a high quality experience
Adaptive HTTP streaming takes HTTP video delivery to the next level. In this case, the source video, whether a file or a live stream, is encoded into segments – sometimes referred to as ‘chunks’ – using a desired delivery format, which includes a container, video codec, audio codec, encryption protocol, etc. Segments typically represent two to ten seconds of video. Each segment is sliced at video Group of Pictures (GOP) boundaries beginning with an instantaneous decoder refresh (IDR) frame, giving the segment complete independence from previous and successive segments. Encoded segments are subsequently hosted on a regular HTTP web server.
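The slicing rule described above can be sketched in a few lines. This is a simplified model rather than a real packager: frames are reduced to a presentation time and an IDR flag, and the `Frame` type, the function name and the target duration are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    pts: float     # presentation time in seconds
    is_idr: bool   # True if this frame is an IDR frame


def segment_at_idr(frames: List[Frame], target_s: float) -> List[List[Frame]]:
    """Cut frames into segments that each begin on an IDR frame.

    A new segment starts at the first IDR frame that arrives once the
    current segment has reached the target duration.
    """
    segments: List[List[Frame]] = []
    current: List[Frame] = []
    for frame in frames:
        long_enough = bool(current) and frame.pts - current[0].pts >= target_s
        if frame.is_idr and long_enough:
            segments.append(current)
            current = []
        current.append(frame)
    if current:
        segments.append(current)
    return segments


# Example: 10 seconds of 30 fps video with an IDR frame every 2 seconds,
# cut into segments of roughly 4 seconds.
frames = [Frame(pts=i / 30, is_idr=(i % 60 == 0)) for i in range(300)]
segments = segment_at_idr(frames, target_s=4.0)
```

Because a cut only ever happens on an IDR frame, every segment can be decoded without reference to its neighbours, which is what allows a client to switch bit rates at segment boundaries.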
Clients request segments from the web server, downloading them via HTTP. As the segments are downloaded to the client, it plays back the segments in the order requested. Since the segments are sliced along GOP boundaries with no gaps between, video playback is seamless – even though it is actually just a file download via a series of HTTP GET requests.
Adaptive delivery enables a client to ‘adapt’ to fluctuating network conditions by selecting video file segments encoded at different bit rates. As an example, suppose a video file had been encoded using 11 different bit rates from 500 Kbps to 1 Mbps in 50 Kbps increments, i.e., 500 Kbps, 550 Kbps, 600 Kbps, etc. The client then observes the effective bandwidth throughout the playback period by evaluating its buffer fill/depletion rate. If a higher quality stream is available, and network bandwidth appears able to support it, the client will download the higher quality bit rate segment. If a lower quality stream is available, and network bandwidth can’t support the currently used flow of segments, the client will download a lower quality bit rate segment flow. The client can switch between segments encoded at different bit rates every few seconds.
This delivery model works for both live- and file-based content. In either case, a manifest file is provided to the client, which defines the location (that is, the URL) and parameters (that is, the bit rate or other data) of each segment. In the case of an on-demand file request, the manifest is sent at the beginning of the session. In the case of a live feed, updated ‘rolling window’ manifest files are sent as new segments are created, as shown in the figure below.
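As a text-only illustration of a rolling-window manifest, the sketch below renders an Apple HLS style media playlist that lists only the newest few segments of a live feed. The base URL, the `seg_<n>.ts` naming scheme and the function name are hypothetical; the playlist tags are those defined for HLS.

```python
import math
from typing import List, Tuple


def live_playlist(segments: List[Tuple[int, float]], window: int, base_url: str) -> str:
    """Render an HLS-style media playlist for the newest `window` segments.

    `segments` holds (sequence_number, duration_seconds), oldest first.
    """
    if not segments or window < 1:
        raise ValueError("need at least one segment and a positive window")
    visible = segments[-window:]
    target = max(math.ceil(duration) for _, duration in visible)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        # Sequence number of the first segment still listed in the window.
        f"#EXT-X-MEDIA-SEQUENCE:{visible[0][0]}",
    ]
    for sequence, duration in visible:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"{base_url}/seg_{sequence}.ts")
    # No end-of-list tag is written: the client re-fetches a live playlist.
    return "\n".join(lines) + "\n"


# Ten 4-second segments have been produced; only the last three are advertised.
playlist = live_playlist([(i, 4.0) for i in range(10)], window=3,
                         base_url="http://example.com/live")
```

Each time a new segment is created the playlist is regenerated, so the oldest entry drops out and the media sequence number advances by one.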
Since a web server can typically send data as fast as its network connection will allow, the client can evaluate its buffer conditions and make forward-looking decisions on whether future segment requests should be at a higher or lower bit rate to avoid buffer overrun or starvation. Each client will make this decision based on trying to select the highest possible bit rate for maximum quality of playback experience.
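The client-side decision can be sketched as follows, using the 500 Kbps to 1 Mbps ladder from the earlier example. Real players use more elaborate heuristics; the safety margin, the low-buffer threshold and the function names here are assumptions chosen only to show the shape of the logic.

```python
from typing import List


def bitrate_ladder(low_kbps: int, high_kbps: int, step_kbps: int) -> List[int]:
    """Build the list of encoded bit rates, lowest first."""
    return list(range(low_kbps, high_kbps + 1, step_kbps))


def choose_bitrate(ladder: List[int], throughput_kbps: float, buffer_s: float,
                   low_buffer_s: float = 5.0, safety: float = 0.8) -> int:
    """Pick the bit rate for the next segment request.

    If the playback buffer is nearly empty, drop to the lowest rate to
    avoid starvation. Otherwise take the highest rate that fits within
    a safety fraction of the measured throughput.
    """
    if buffer_s < low_buffer_s:
        return ladder[0]
    budget = throughput_kbps * safety
    fitting = [rate for rate in ladder if rate <= budget]
    return fitting[-1] if fitting else ladder[0]


ladder = bitrate_ladder(500, 1000, 50)   # 500, 550, ..., 1000 Kbps
```

With about 1010 Kbps of measured throughput and a healthy buffer, this sketch requests the 800 Kbps segment; if the buffer falls below five seconds it requests the 500 Kbps segment regardless of throughput.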
A number of advantages accrue with adaptive HTTP streaming:
Lower infrastructure costs for content providers result from eliminating specialty streaming servers in favor of generic HTTP caches/proxies already in place for HTTP data serving
Content delivery is dynamically adapted to the weakest link in the end-to-end delivery chain, including highly varying last mile conditions
Subscribers no longer need to statically select a bit rate on their own, as the client can now perform that function dynamically and automatically
Subscribers enjoy fast start-up and seek times as playback control functions can be initiated via the lowest bit rate and subsequently ratcheted up to a higher bit rate
Annoying user experience shortcomings including long initial buffer time, disconnects and playback start/stop are virtually eliminated
Client can control bit-rate switching – with no intelligence in the server – taking into account CPU load, available bandwidth, resolution, codec and other local conditions
Simplified ad insertion accomplished by file substitution
With that explanation of Internet-based delivery protocols and why they have evolved to adaptive HTTP streaming, we will now look at the core requirements across encoding/transcoding, packaging and delivery required to make multi-screen services a viable carrier-class offering.
The transcoder (or encoder, if the input is not already compressed) is responsible for ingesting the content, encoding to all necessary outputs, and preparing each output for advertising readiness and delivery to the packager for segmentation. The transcoder must perform the following functions for multi-screen adaptive delivery – and at high concurrency, in real time and with high video quality output.
Video Transcoding
Transcode the output video to a progressive format, which requires the transcoder to support input de-interlacing
Transcode the input to each required output profile – where a given profile will have its own resolution and bit rate parameters – including scaling to resolutions suitable for each client device. Because the quality of experience of the client depends on having a number of different profiles, it is necessary to encode a significant number of output profiles for each input. Deployments may use anywhere from 4 to 16 output profiles per input. The table below shows a typical use case for the different output profiles:
|Width||Height||Video Bit rate|
GOP-align each output profile such that client playback (shifting between different bit rate ‘chunks’ created for each profile) is continuous and smooth
Audio Transcoding
Transcode audio into AAC – the codec used by adaptive delivery protocols from Apple, Microsoft and Adobe
Add IDR frames at ad insertion points, so that the video is ready for SCTE 35 ad insertion. It is also potentially possible to align chunk boundaries with ad insertion points so that ad insertion can be done via chunk-substitution rather than traditional stream splicing.
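Aligning chunk boundaries with ad insertion points can be pictured as adding forced cuts to a regular chunk grid. The sketch below works on whole seconds and treats cue times as already extracted from the SCTE 35 signaling; the function name and the example values are hypothetical.

```python
from typing import List


def chunk_boundaries(duration_s: int, chunk_s: int, ad_cues_s: List[int]) -> List[int]:
    """Return chunk start times: a regular grid plus a cut at every ad cue.

    The transcoder would place an IDR frame at each returned time so that
    the packager can start a new chunk there.
    """
    regular = set(range(0, duration_s, chunk_s))
    cues = {cue for cue in ad_cues_s if 0 <= cue < duration_s}
    return sorted(regular | cues)


# A 30-second program with 10-second chunks and one ad cue at 14 seconds.
boundaries = chunk_boundaries(30, 10, [14])
```

With a boundary at the cue, the chunk that begins there can be swapped for an ad chunk without splicing inside a chunk.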
Ingest Fault Tolerance
The transcoding system needs to allow two different transcoders that ingest the same input to create identically IDR-aligned output – contributing to strong fault tolerance. This can be used to create a redundant backup of encoded content in such a way that any failure of the primary transcoder is seamlessly backed up by the secondary transcoder.
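Whether two transcoders, or several output profiles, really are IDR-aligned can be verified by comparing the timestamps of their IDR frames. The sketch below assumes those timestamps have already been extracted as integer ticks of the 90 kHz MPEG transport stream clock; the function name and the output labels are illustrative.

```python
from typing import Dict, List, Set


def misaligned_idr_times(idr_pts_by_output: Dict[str, List[int]]) -> Dict[str, Set[int]]:
    """For each output, return the IDR timestamps not shared by every output.

    An empty result means all outputs place IDR frames at identical times.
    """
    as_sets = {name: set(pts) for name, pts in idr_pts_by_output.items()}
    if not as_sets:
        return {}
    common = set.intersection(*as_sets.values())
    return {name: pts - common for name, pts in as_sets.items() if pts - common}


# IDR frames every 2 seconds (180000 ticks at 90 kHz) on both transcoders.
aligned = misaligned_idr_times({
    "primary": [0, 180000, 360000],
    "backup": [0, 180000, 360000],
})
```

An empty result for the primary and backup outputs indicates that a chunk produced by either transcoder can replace the corresponding chunk from the other.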
To realize the benefits of HTTP adaptive streaming, a ‘packager’ function – sometimes referred to as a ‘segmenter’, ‘fragmenter’ or ‘encapsulator’ – must take each encoded video output from the transcoder and ‘package’ the video for each delivery protocol. To perform this function, the packager must be able to:
Ingest
Ingest live streams or files, depending on whether the work flow is live or on-demand
Segmentation
Segment chunks according to the proprietary delivery protocols specified by Microsoft Smooth Streaming, Apple HTTP Live Streaming (HLS), and Adobe HTTP Dynamic Streaming
Encrypt segments on a per delivery protocol basis (in a format compatible with each delivery protocol) as they are packaged, enabling content rights to be managed on an individual session basis. For HLS, this is file-based AES-128 encryption. For Smooth Streaming, it is also AES-128, but with PlayReady compatible signaling. Adobe HTTP Dynamic Streaming uses Adobe Flash Access for encryption.
Integrate with third party key management systems to retrieve necessary encryption information.
o Note: Third party key management servers manage and distribute the keys to clients. If the client is authorized, it can retrieve decryption keys from a location designated in the manifest file. Alternatively, depending on the protocol used, key location can be specified within each segment. Either way, the client is responsible for retrieving decryption keys, which are normally served after the client request is authenticated. Once the keys are received, the client is able to decrypt the video and display it.
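For the HLS case, the key location is designated in the manifest with an `EXT-X-KEY` tag, and when that tag carries no explicit initialization vector the segment's media sequence number is used as the IV. The sketch below shows only this signaling, not the AES-128 encryption itself; the key server URL and the function names are hypothetical.

```python
from typing import Optional


def default_iv(media_sequence: int) -> bytes:
    """IV used when the key tag has no IV attribute.

    The media sequence number is written as a 128-bit big-endian value.
    """
    return media_sequence.to_bytes(16, "big")


def key_tag(key_uri: str, iv: Optional[bytes] = None) -> str:
    """Build the playlist tag that tells the client where to fetch the key."""
    tag = f'#EXT-X-KEY:METHOD=AES-128,URI="{key_uri}"'
    if iv is not None:
        tag += f",IV=0x{iv.hex()}"
    return tag


# Hypothetical key server URL; the client requests it after authentication.
tag = key_tag("https://keys.example.com/k1")
```

Because the playlist carries only the key's URL, the key server remains the point at which a client request is authenticated before the key is released.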
It can be valuable to have an architecture that allows the transcoder and packager to be centralized or distributed.
Centralized Packaging
In this scenario, the input video is transcoded and packaged into multiple formats at a central location, and the packaged output is then delivered over the core network to each edge point.
While this approach minimizes the overall number of packagers and network elements, each end client final delivery format (number of client protocols x number of bit rates per client protocol) must be transmitted over the core network to each edge point, significantly increasing bandwidth consumption.
Decentralized Packaging
Alternatively, packaging can be disaggregated from the transcoder and moved to edge distribution points. In this scenario, however, MPEG-2 transport streams are transported over the core network via UDP – a connectionless protocol that can suffer data loss.
Service providers must, therefore, balance the bandwidth savings on the core network with potential lower quality of experience (depending on the quality of the network) or use a more reliable core transport protocol.
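The bandwidth side of this trade-off is simple arithmetic. The sketch below compares the core-network load per channel when every packaged format crosses the core against the load when only the transcoded profiles cross it once and are packaged at the edge. The profile bit rates and the protocol count are made-up inputs, and container overhead is ignored.

```python
from typing import List


def core_load_kbps(profile_kbps: List[int], protocols: int, packaging: str) -> int:
    """Estimate core-network load for one channel, per edge point.

    "centralized": each profile is sent once per delivery protocol.
    "edge": each profile is sent once and packaged at the edge.
    """
    per_copy = sum(profile_kbps)
    if packaging == "centralized":
        return per_copy * protocols
    if packaging == "edge":
        return per_copy
    raise ValueError(f"unknown packaging mode: {packaging}")


# Hypothetical ladder of four profiles delivered in three protocols.
profiles = [350, 650, 1200, 2500]
central = core_load_kbps(profiles, protocols=3, packaging="centralized")
edge = core_load_kbps(profiles, protocols=3, packaging="edge")
```

With these illustrative inputs the centralized case carries three times the traffic of the edge case, which is the saving a provider weighs against the risk of loss on an unreliable core transport.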
The final step in this process is the actual delivery of segments to end clients – the aforementioned desktop/laptop/tablet PCs, smart phones, IP-based set-top boxes and now Internet-enabled television sets.
Optimal delivery network design must take into consideration several content type, device type, delivery protocol type and DRM options. This section discusses a few technical considerations regarding the best protocols to use, the optimal number of profiles, and issues with DRM integration.
Live vs. File Delivery
In the case of live delivery, it is possible to serve segments directly from the packager when the number of clients is relatively small. However, the typical use case involves feeding the segments to a CDN, either via a reverse proxy ‘pull’ or via a ‘push’ mechanism, such as HTTP POST. The CDN is then responsible for delivering the chunks and playlist files to clients.
The same delivery model can also be utilized in video-on-demand (VOD), but VOD also offers the alternative of delivering directly from the packager, even to a large number of users. However, with VOD delivery, it is sometimes desirable to distribute one file (or a small number of files) that contains all the chunks together; referred to as an aggregate format for the content. Distributing one file allows service providers to easily preposition content in the CDN without having to distribute and manage thousands of individual chunks per piece of content. When a client makes a request, the aggregate file is segmented ‘on the fly’ for that client, using the client’s requested format. The trade-off is that while the CDN and file management is simpler, more packagers are required – ‘centralized’ packagers that create and aggregate the chunks and ‘distributed, edge-located’ packagers that segment the aggregation format (on demand) into actual chunks delivered to clients.
Output Profile Selection
The optimal number of profiles, bit rates and resolutions to use are very service-specific. However, there are a number of generally applicable guidelines. First, what are the end devices and what is the last-mile network? The end devices drive the output resolutions. It is desirable to have one or two profiles to service the high-quality video service for the device, and these would be encoded at the full resolution of the target device. For PCs, that’s typically 720p30.
Looking at the delivery network, for mobile distribution, it is typical to use very low bandwidth profiles. Even 3G mobile networks, which have relatively high peak bandwidths of several hundred kbps, may fall back to much lower sustained bandwidths required for video streaming. WiFi networks have higher capacity, but also suffer from potential degradation depending on the distance to the base station or composition of walls between the transmitter and receiver. DSL distribution to PCs also varies widely in bandwidth capacity. And almost all last-mile networks suffer bandwidth reduction caused by aggregation, for example at a cable node or at the DSLAM. The table below suggests the number of output profiles in different scenarios:
|Network||Device||Profile: Bit rate||Resolution|
|3G Mobile (3 profiles)||Phone||3G-Low: 100kbps||320×180|
|4G (3G profiles + 1)||Phone||4G-High: 650kbps||640×360|
|WiFi (5 profiles)||Smart Phone / Pad / PC||WiFi-ulow:350kbps||320×180|
|Broadband (8 profiles)||Pad / PC / Console / STB||350 Kbps||320×180|
Protocol Selection
Which of Apple HLS, Microsoft Silverlight Smooth Streaming or Adobe HTTP Dynamic Streaming is the optimal choice for a service provider? Each protocol has its own appeal, and so service providers must carefully consider the following in making a delivery protocol selection:
Adobe has a huge installed client base on PCs. For service operators that want to serve PCs and do not want to distribute a client, this is a big benefit. The availability of Adobe’s server infrastructure, including backwards compatibility with RTMP and Adobe Access, may also be appealing to service operators.
Apple HLS uses MPEG-2 transport stream files as chunks. The existing infrastructure for testing and analyzing TS files makes this protocol easy to deploy and debug. It also allows for the type of signaling that TS streams already carry, such as SCTE 35 cues for ad insertion points, multiple audio streams, EBIF data, etc.
Microsoft Smooth Streaming has a very convenient aggregate format and provides an excellent user experience that can adapt to changes in bandwidth rapidly, as it makes use of short chunks and doesn’t require repeated downloads of a playlist. Smooth Streaming is also an obvious choice when content owners require the use of PlayReady DRM.
Redundancy and Failover
Transcoder redundancy is typically managed using an N:M redundancy scheme in which a management system loads one of M standby transcoders with the configuration of a failed transcoder in a pool of N transcoders. The packager component can be managed similarly, but it can also be managed in a 1:1 scheme in which time-outs at the CDN root trigger failover to the secondary packager. Avoiding outages in these scenarios involves making sure that the primary and backup packagers are synchronized, that is, they create identical chunks.
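The N:M scheme can be modelled as a pool of active units plus a queue of standbys that inherit a failed unit's configuration. This is a toy model of the management logic only; the class, its attribute names and the identifiers in the example are all hypothetical.

```python
from typing import Dict, List, Optional


class TranscoderPool:
    """Toy model of N active transcoders protected by M standbys."""

    def __init__(self, active_configs: Dict[str, str], standbys: List[str]):
        self.active = dict(active_configs)   # transcoder id -> configuration
        self.standbys = list(standbys)       # idle units, ready to take over
        self.unserved: List[str] = []        # configurations with no unit left

    def fail_over(self, failed_id: str) -> Optional[str]:
        """Move the failed unit's configuration onto a standby.

        Returns the id of the standby that took over, or None when no
        standby remains and the configuration is left unserved.
        """
        config = self.active.pop(failed_id)
        if not self.standbys:
            self.unserved.append(config)
            return None
        replacement = self.standbys.pop(0)
        self.active[replacement] = config
        return replacement


# Three active transcoders protected by one standby (a 3:1 arrangement).
pool = TranscoderPool({"t1": "cfgA", "t2": "cfgB", "t3": "cfgC"}, ["s1"])
took_over = pool.fail_over("t2")
```

The model makes the sizing question visible: once the standbys are exhausted, the next failure leaves a configuration unserved, so M has to be chosen against the expected number of simultaneous failures.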
DRM Integration
DRM integration remains challenging. Broadly, there are two approaches:
The first uses unique encryption keys for every client stream. In this case, CDN caching provides no value. Every user view is a unicast connection back to the center of the network; network load is high; but the content is as secure as possible.
The second approach uses shared keys for all content, but keys are only distributed to authenticated clients. CDN caching can then lead to significant bandwidth savings, but key management still requires a unique connection to the core for each client. Fortunately, these connections are far lower bandwidth than the video streams.
Different DRM vendors provide different solutions, and interoperability between vendors for client authentication doesn’t exist:
Adobe uses Adobe Access to restrict access to streams, giving a unified, single vendor work flow
Apple HLS provides a description of the encryption mechanism, but leaves client authentication as an implementation decision
Microsoft’s PlayReady is a middle ground. Client authentication is well specified, but the interface between the key management server and the packager component is not. This means some integration is typically required to create a fully deployed system.
RGB Networks provides video delivery service providers with a comprehensive product family that addresses the full spectrum of IP video transcoding, segmentation and delivery requirements through its Video Multiprocessing Gateway (VMG™) and TransAct Packager.
The VMG can transcode input video streams into multi-bitrate, multi-resolution streams suitable for packaging into Apple-, Microsoft-, and Adobe-enabled client devices including desktops, laptops, tablet PCs, smart phones, set-top boxes and Ethernet-enabled TV sets.
Transcoded streams are then sent directly to a Packager – co-located or distributed at the network edge. Packaged streams are delivered directly to clients, service provider owned origin web servers, or a third party CDN partner for distribution to end devices. By separating transcoding and packaging functionality, content delivery providers have the flexibility to deploy in centralized or distributed manners.
Designed for Multi-Screen IP Delivery
RGB’s Video Multiprocessing Gateway (VMG) product line is the industry’s premier high-density, high-capacity, carrier-class platform designed for the delivery of advanced video processing solutions for standard definition (SD) and high definition (HD) MPEG-2 and MPEG-4/H.264 programming. Its capabilities include transcoding, HD-to-SD down conversion, transrating, program substitution and advanced ad insertion.
The system was designed from inception to be a truly converged platform capable of cost-effectively delivering video and audio solutions for cable TV transport, PC Internet and mobile WiFi/3GPP delivery.
All features are delivered in a highly integrated, yet flexible configuration. Application and control modules within the chassis perform a wide range of IP delivery functions including:
World-class transcoding/transrating video quality with support for MPEG-2 and H.264
Audio processing, including multiple audio formats and audio leveling
Ad and program insertion
Progressive, scaled IDR-aligned output, suitable for adaptive HTTP streaming
Inter-chassis IDR alignment
Designed for Carrier-Class Deployment
The VMG is specifically designed with carrier-class attributes, including scalability, reliability and manageability. No other product in the industry packs the same functionality, density, performance and carrier-grade attributes into a single platform for multi-screen IP video delivery.
Up to 144 transcoded HD streams or 432 different IDR-aligned profiles for adaptive streaming per 13 RU chassis – critical to the expansion of transcoded outputs required to satisfy multi-screen adaptive bit rate delivery
High Availability
Carrier-class platform with multi-level redundancy ensures uninterrupted service through:
Chassis component redundancy, including hot-swappable redundant fans and power supply and a redundant star topology back plane network
1:1 controller module redundancy
N:M application module redundancy
o Service/program-level redundancy on both inputs and outputs, allowing for dual-homing upstream network redundancy
Flexible Scalability
‘Pay-as-you-grow,’ modular design allows for initial capital deployment matched to the immediate service level requirements
Easy growth path managed by adding modules within the chassis, as well as licenses on the modules
Two different chassis sizes provide flexibility in meeting current service and space needs, while allowing for a growth path through enabling licenses or adding modules to available slots
Operational Simplicity
Functions typically handled by different devices in a legacy headend are now performed by blades within an integrated chassis
A unified management interface transparently pools all available resources and allows a single, simple interface to multiple chassis functions
Low Cost of Ownership
Designed to reduce capital expenditure via:
integrated
Designed to reduce operational expenses via:
Innovative Future-proof Technology
Expandable chassis functionality through addition of new modules
Best-of-breed optimal processor utilization:
Designed for Multi-Screen IP Delivery
RGB’s TransAct Packager ingests H.264 video over MPEG-2 transport streams from the VMG and outputs Apple HTTP Live Streaming (HLS), Microsoft Smooth Streaming, and Adobe HTTP Dynamic Streaming segmented streams for consumption by Ethernet-enabled TVs, set-top boxes, desktop/laptop computers and mobile devices.
Additionally, the Packager can encrypt traffic using AES-128 for Apple HTTP Live Streaming (HLS) and PlayReady for Microsoft Silverlight Smooth Streaming, integrating key exchange with leading Digital Rights Management (DRM) servers.
Flexible Adaptive Streaming Protocol Support
Apple HLS
Microsoft Smooth Streaming
Adobe HTTP Dynamic Streaming
Flexible DRM Interoperability
AES-128 and PlayReady DRM handling
Interoperability with multiple third party key management systems
Flexible Delivery
Direct delivery to end clients or to CDN roots via reverse proxy
WebDAV or NFS push of HLS files
HTTP POST to IIS servers for Smooth Streaming
Designed for Carrier Class Deployment
High Performance Segmentation Engine
Intel-based processing farm equipped with significant processing and memory performance required for low latency live- or file-based content segmentation
Obsolescence Protection
Able to quickly take advantage of latest Intel processor/memory and architecture advancements
Able to quickly roll in new codecs and containers as video compression technologies evolve
Flexible Deployment
Can be deployed as a standalone appliance – RGB’s Application Media Server (AMS)
Available as a software license for customer-provided hardware
Deployable as part of the VMG on the AMP blade
Secure, Scalable Manageability
Web browser GUI
Secure shell (SSH) access
SNMP traps
Remote upgrades
Example Deployment Scenarios
Scenario 1 – Centralized Delivery
In this scenario, content is ingested by the VMG, transcoded into multiple bit rates, packaged into segments, and provided directly to Apple HLS clients, or to a Microsoft IIS server for Smooth Streaming delivery to Silverlight clients from a centralized point within a single content delivery provider network.
Scenario 2 – Distributed Delivery
In this scenario, content is ingested by the VMG, transcoded into multiple bit rates, and provided to an edge-located Packager for subsequent delivery to Apple HLS clients or to a Microsoft IIS server for Smooth Streaming delivery to Silverlight clients. The ‘edge’ can either be within the content provider’s network or in a third party CDN distribution infrastructure.
Online and mobile viewing of premium video content is rapidly becoming an accepted, if not required, complement to the traditional TV experience. As a consequence, video delivery service providers must now concern themselves with the challenge of extending their controlled TV delivery experience to a high-quality multi-screen IP video experience over the Internet.
This new world means providing quality delivery of premium content over uncontrolled, congested networks where evolving delivery protocols, a variety of devices, rights management, in-house or third party CDN interoperability, and creative monetization strategies abound.
High-performance transcoding and adaptive streaming delivery via Apple HLS, Microsoft Smooth Streaming and Adobe HDS enable high-quality video consumption experiences over the Internet.
Content providers must evolve traditional TV delivery infrastructure with products that not only embrace rapidly evolving transcoding and delivery technologies, but do so in a manner that meets strict carrier-grade standards.