Streaming video to iPhone FAQ

  1. What kinds of encoders are supported?

    The protocol specification does not limit the encoder selection. However, the current Apple implementation should interoperate with encoders that produce MPEG-2 Transport Streams containing H.264 video and AAC audio (HE-AAC or AAC-LC). Encoders that are capable of broadcasting the output stream over UDP should also be compatible with the current implementation of the Apple-provided segmenter software.

    Apple has tested the current implementation with the following commercial encoders:

    • Inlet Technologies Spinnaker 7000
    • Envivio 4Caster C4

  2. What are the specifics of the video and audio formats supported?

    Although the protocol specification does not limit the video and audio formats, the current Apple implementation supports the following formats:

    • Video: H.264 Baseline Level 3.0
    • Audio:
      • HE-AAC or AAC-LC up to 48 kHz, stereo audio
      • MP3 (MPEG-1 Audio Layer 3) 8 kHz to 48 kHz, stereo audio
      Note: iPad, iPhone 3G, and iPod touch (2nd generation and later) support H.264 Baseline 3.1. If your app runs on older versions of iPhone or iPod touch, however, you should use H.264 Baseline 3.0 for compatibility.

  3. What duration should media files be?

    The main point to consider is that shorter segments result in more frequent refreshes of the index file, which might create unnecessary network overhead for the client. Longer segments will extend the inherent latency of the broadcast and initial startup time. A duration of 10 seconds of media per file seems to strike a reasonable balance for most broadcast content.

  4. How many files should be listed in the index file during a continuous, ongoing session?

    The specification requires at least 3 media files be listed in the index file, but the optimum number may be larger. The client identifies an ongoing session by the lack of an #EXT-X-ENDLIST tag in the index file. The client does not allow the user to seek into the last two files in the index for ongoing broadcasts. Periodically, the client requests a new copy of the index.

    The important point to consider when choosing the optimum number is that the number of files available during a live session constrains the client’s behavior when doing play/pause and seeking operations. The longer the list, the longer the client can be paused without losing its place in the broadcast, the further back in the broadcast a new client begins, and the wider the time range within which the client can seek. The trade-off is that a longer index file adds to network overhead—during live broadcasts, the clients are all refreshing the index file regularly, so it does add up, even though the index file is typically small.

    Another point to consider is that clients typically request new copies of the index file at a higher rate when the index contains a shorter list of files.
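
    The trade-off above can be put into rough numbers. The following is a minimal sketch, not part of the Apple tools; the function name live_window and its parameters are hypothetical. It assumes every media file has the same duration and that the last two files cannot be sought into, as described above.

```python
def live_window(segment_count, target_duration_s, unseekable_tail=2):
    """Return (listed_seconds, seekable_seconds) for a live index file.

    Assumes all media files share one duration and that the client
    cannot seek into the last `unseekable_tail` files.
    """
    if segment_count < 3:
        raise ValueError("the specification requires at least 3 media files")
    listed = segment_count * target_duration_s
    seekable = (segment_count - unseekable_tail) * target_duration_s
    return listed, seekable


if __name__ == "__main__":
    for count in (3, 6, 30):
        listed, seekable = live_window(count, 10)
        print(f"{count} files: {listed} s listed, {seekable} s seekable")
```

    With 10-second files, listing 30 files instead of 3 widens the listed window from 30 seconds to 5 minutes, at the cost of a longer index file on every refresh.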

  5. What data rates are supported?

    The data rate that a content provider chooses for a stream is most influenced by the target client platform and the expected network topology. The streaming protocol itself places no limitations on the data rates that can be used. The current implementation has been tested using audio-video streams with data rates as low as 100 Kbps and as high as 1.6 Mbps to iPhone. Audio-only streams at 64 Kbps are recommended as alternates for delivery over slow cellular connections.

    Note: If the data rate exceeds the available bandwidth, there is more latency before startup and the client may have to pause to buffer more data periodically. During a broadcast using an index file that provides a moving window into the content, the client will eventually fall behind in such cases, causing one or more segments to be dropped. In the case of VOD, no segments are lost, but inadequate bandwidth does cause slower startup and periodic stalling while data buffers.
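
    As a rough illustration of the note above, the time needed to fetch one media file can be compared with its playback duration. This is a simplified sketch that assumes a constant stream data rate and constant available bandwidth; the function name is hypothetical.

```python
def segment_download_time_s(stream_bps, bandwidth_bps, segment_duration_s=10):
    """Seconds needed to download one media file.

    Simplification: constant stream data rate, constant bandwidth,
    no protocol overhead.
    """
    return segment_duration_s * stream_bps / bandwidth_bps


if __name__ == "__main__":
    # An 800 Kbps stream over a 400 Kbps link: each 10-second file takes
    # 20 seconds to arrive, so playback cannot keep up.
    print(segment_download_time_s(800_000, 400_000))
    # A 64 Kbps audio stream over a 128 Kbps link downloads in half its
    # playback time.
    print(segment_download_time_s(64_000, 128_000))
```

    Whenever the result exceeds the segment duration, the client has to pause to buffer, or in a live broadcast it eventually falls behind the moving window.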

  6. What is a .ts file?

    A .ts file contains an MPEG-2 Transport Stream. This is a file format that encapsulates a series of encoded media samples—typically audio and video. The file format supports a variety of compression formats, including MP3 audio, AAC audio, H.264 video, and so on. Not all compression formats are currently supported in the Apple HTTP Live Streaming implementation, however. (For a list of currently supported formats, see “Media Encoder.”)

  7. What is an .M3U8 file?

    An .M3U8 file is an extensible playlist file format. It is an m3u playlist containing UTF-8 encoded text. The m3u file format is a de facto standard playlist format suitable for carrying lists of media file URLs. This is the format used as the index file for HTTP Live Streaming. For details, see the IETF Internet-Draft of the HTTP Live Streaming specification.
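
    To make the format concrete, here is a minimal sketch that writes a simple index file as text. The tags used (#EXTM3U, #EXT-X-TARGETDURATION, #EXT-X-MEDIA-SEQUENCE, #EXTINF, and #EXT-X-ENDLIST) come from the protocol specification; the function name build_index and the segment file names are hypothetical, and the sketch assumes every media file has the same duration.

```python
def build_index(segment_uris, target_duration_s=10, first_sequence=0, ended=False):
    """Return the text of a simple index file listing the given media files.

    Assumes every media file lasts exactly `target_duration_s` seconds.
    """
    lines = [
        "#EXTM3U",
        f"#EXT-X-TARGETDURATION:{target_duration_s}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for uri in segment_uris:
        lines.append(f"#EXTINF:{target_duration_s},")
        lines.append(uri)
    if ended:
        # Without this tag, the client treats the session as ongoing.
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical segment names, for illustration only.
    text = build_index(["segment0.ts", "segment1.ts", "segment2.ts"])
    print(text, end="")
    # An .M3U8 file is UTF-8 encoded text; encode it before saving or serving.
    data = text.encode("utf-8")
```

    Passing ended=True appends #EXT-X-ENDLIST, which is how a finished (for example, VOD) presentation is distinguished from an ongoing broadcast.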

  8. How does the client software determine when to switch streams?

    The current implementation of the client observes the effective bandwidth while playing a stream. If a higher-quality stream is available and the bandwidth appears sufficient to support it, the client switches to a higher quality. If a lower-quality stream is available and the current bandwidth appears insufficient to support the current stream, the client switches to a lower quality.

    Note: For seamless transitions between alternate streams, the audio portion of the stream should be identical in all versions.
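
    The switching behavior described above can be approximated with a simple rule. This is only an illustrative sketch, not the client's actual algorithm, whose heuristics are not documented here; the function name choose_variant and the (bandwidth, URI) tuples are hypothetical.

```python
def choose_variant(variants, measured_bps):
    """Pick the highest-bandwidth variant that fits the measured bandwidth.

    `variants` is a list of (bandwidth_bps, uri) tuples. If nothing fits,
    fall back to the lowest-bandwidth variant.
    """
    affordable = [v for v in variants if v[0] <= measured_bps]
    if affordable:
        return max(affordable, key=lambda v: v[0])
    return min(variants, key=lambda v: v[0])


if __name__ == "__main__":
    variants = [
        (500_000, "mid_video_index.m3u8"),
        (150_000, "3g_video_index.m3u8"),
        (64_000, "aacaudio_index.m3u8"),
    ]
    for bps in (1_000_000, 200_000, 100_000, 50_000):
        print(bps, "->", choose_variant(variants, bps)[1])
```

    A real client would also smooth its bandwidth measurements over time to avoid switching back and forth on short fluctuations; that is omitted here.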

  9. Where can I find a copy of the media stream segmenter from Apple?

    The media stream segmenter, file stream segmenter, and other tools are in the /usr/bin/ directory of Mac OS X computers, version 10.6 and later. These tools are frequently updated, so you should download the current version of the HTTP Live Streaming Tools from the Apple Developer website. See “Download the Tools” for details.

  10. What settings are recommended for a typical HTTP stream, with alternates, for use with the media segmenter from Apple?

    Your encoder should produce MPEG-2 transport stream (.ts) files with the following characteristics for the Apple segmenter:

    • H.264 Baseline 3.0 video
    • Keyframes every 3 seconds
    • HE-AAC (version 1) stereo audio at 44.1 kHz
    • Four streams:
      • Cellular Fallback—Audio only or audio with still image, 64 Kbps
      • Low—96 Kbps video, 64 Kbps audio
      • Medium—256 Kbps video, 64 Kbps audio
      • High—800 Kbps video, 64 Kbps audio

    These settings are the current recommendations. There are also certain requirements. The current mediastreamsegmenter tool works only with MPEG-2 Transport Streams as defined in ISO/IEC 13818. The transport stream must contain H.264 (MPEG-4, part 10) video and AAC or MPEG audio. If AAC audio is used, it must have ADTS headers. H.264 video access units must use Access Unit Delimiter NALs, and must be in unique PES packets.

    The segmenter also has a number of user-configurable settings. You can obtain a list of the command line arguments and their meanings by typing man mediastreamsegmenter from the Terminal application. A target duration (length of the media segments) of 10 seconds is recommended, and is the default if no target duration is specified.

  11. How can I specify what codecs or H.264 profile are required to play back my stream?

    Use the CODECS attribute of the EXT-X-STREAM-INF tag. When this attribute is present, it must include all codecs and profiles required to play back the stream. The following values are currently recognized:

    AAC-LC “mp4a.40.2”
    HE-AAC “mp4a.40.5”
    MP3 “mp4a.40.34”
    H.264 Baseline Profile level 3.0 “avc1.42001e” or “avc1.66.30”

    Note: Use “avc1.66.30” for compatibility with iPhone OS versions 3.0 to 3.1.2.

    H.264 Main Profile level 3.0 “avc1.4d001e” or “avc1.77.30”

    Note: Use “avc1.77.30” for compatibility with iPhone OS versions 3.0 to 3.1.2.

    The attribute value must be in quotes. If multiple values are specified, one set of quotes is used to contain all values, and the values are separated by commas. An example follows.

    #EXTM3U
    #EXT-X-STREAM-INF:PROGRAM-ID=1, BANDWIDTH=500000
    mid_video_index.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1, BANDWIDTH=800000
    wifi_video_index.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1, BANDWIDTH=3000000, CODECS="avc1.4d001e, mp4a.40.5"
    h264main_heaac_index.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1, BANDWIDTH=64000, CODECS="mp4a.40.5"
    aacaudio_index.m3u8
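
    The quoting rule is easy to get wrong when index files are generated by a script. The following sketch shows one way to build the tag so that a single pair of quotes encloses all codec values; the helper name stream_inf is hypothetical, and the output layout simply mirrors the example above.

```python
def stream_inf(bandwidth, codecs=(), program_id=1):
    """Format an EXT-X-STREAM-INF tag line.

    All codec values share one pair of quotes and are separated by commas.
    The CODECS attribute is omitted when no codecs are given.
    """
    attrs = [f"PROGRAM-ID={program_id}", f"BANDWIDTH={bandwidth}"]
    if codecs:
        attrs.append('CODECS="' + ", ".join(codecs) + '"')
    return "#EXT-X-STREAM-INF:" + ", ".join(attrs)


if __name__ == "__main__":
    print(stream_inf(3_000_000, ["avc1.4d001e", "mp4a.40.5"]))
    print(stream_inf(64_000, ["mp4a.40.5"]))
    print(stream_inf(500_000))
```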

  12. How can I create an audio-only stream from audio/video input?

    Add the -audio-only argument when invoking the stream or file segmenter.

  13. How can I add a still image to an audio-only stream?

    Use the --meta-file argument together with --meta-type=picture when invoking the stream or file segmenter to add an image to every segment. For example, the following command adds an image named poster.jpg to every segment of an audio stream created from the file track01.mp3:

    mediafilesegmenter -f /Dir/outputFile -a --meta-file=poster.jpg --meta-type=picture track01.mp3

    Remember that the image is typically resent every ten seconds, so it’s best to keep the file size small.
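
    To see why file size matters, the data rate added by the image can be estimated. This is a back-of-the-envelope sketch with a hypothetical function name; it assumes the image is sent once per media file and ignores any packaging overhead.

```python
def image_overhead_bps(image_bytes, segment_duration_s=10):
    """Approximate data rate added by resending one image per media file."""
    return image_bytes * 8 / segment_duration_s


if __name__ == "__main__":
    # A 20,000-byte image resent every 10 seconds adds about 16 Kbps,
    # a quarter of the 64 Kbps audio stream it accompanies.
    print(image_overhead_bps(20_000))
```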

  14. How can I specify an audio-only alternate to an audio-video stream?

    Use the CODECS and BANDWIDTH attributes of the EXT-X-STREAM-INF tag together.

    The BANDWIDTH attribute specifies the bandwidth required for each alternate stream. If the available bandwidth is enough for the audio alternate, but not enough for the lowest video alternate, the client switches to the audio stream.

    If the CODECS attribute is included, it must list all codecs required to play the stream. If only an audio codec is specified, the stream is identified as audio-only. Currently, it is not required to specify that a stream is audio-only, so use of the CODECS attribute is optional.

    The following is an example that specifies video streams at 500 Kbps for fast connections, 150 Kbps for slower connections, and an audio-only stream at 64 Kbps for very slow connections. All the streams should use the same 64 Kbps audio to allow transitions between streams without an audible disturbance.

    #EXTM3U
    #EXT-X-STREAM-INF:PROGRAM-ID=1, BANDWIDTH=500000
    mid_video_index.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1, BANDWIDTH=150000
    3g_video_index.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1, BANDWIDTH=64000, CODECS="mp4a.40.5"
    aacaudio_index.m3u8
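
    The rule that a stream listing only audio codecs is identified as audio-only can be sketched as follows. The function names are hypothetical, and treating every codec string that starts with "mp4a." as audio is an assumption based on the values recognized in question #11.

```python
import re


def codecs_of(stream_inf_line):
    """Return the list of CODECS values on a tag line, or None if absent."""
    match = re.search(r'CODECS="([^"]*)"', stream_inf_line)
    if not match:
        return None
    return [value.strip() for value in match.group(1).split(",")]


def is_audio_only(stream_inf_line):
    """True or False when CODECS is present; None when it cannot be determined."""
    codecs = codecs_of(stream_inf_line)
    if not codecs:
        return None  # CODECS is optional, so the answer is unknown
    return all(value.startswith("mp4a.") for value in codecs)


if __name__ == "__main__":
    print(is_audio_only('#EXT-X-STREAM-INF:PROGRAM-ID=1, BANDWIDTH=64000, CODECS="mp4a.40.5"'))
    print(is_audio_only('#EXT-X-STREAM-INF:PROGRAM-ID=1, BANDWIDTH=500000'))
```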

  15. What are the hardware requirements or recommendations for servers?

    See question #1 for encoder hardware recommendations.

    The Apple stream segmenter is capable of running on any Intel-based Mac. We recommend using a Mac with two Ethernet network interfaces, such as a Mac Pro or an Xserve. One network interface can be used to obtain the encoded stream from the local network, while the second network interface can provide access to a wider network.

  16. Does the Apple implementation of HTTP Live Streaming support DRM?

    No. However, media can be encrypted and key access can be limited using HTTPS authentication.

  17. What client platforms are supported?

    iPhone, iPad, and iPod touch (requires iPhone OS version 3.0 or later) or any device with QuickTime X or later installed.

  18. Is the protocol specification available?

    Yes. The protocol specification is an IETF Internet-Draft, at http://tools.ietf.org/html/draft-pantos-http-live-streaming.

  19. Does the client cache content?

    The index file can contain an instruction to the client that content should not be cached. Otherwise, the client may cache data for performance optimization when seeking within the media.

  20. Is this a real-time delivery system?

    No. It has inherent latency corresponding to the size and duration of the media files containing stream segments. At least one segment must fully download before it can be viewed by the client, and two may be required to ensure seamless transitions between segments. In addition, the encoder and segmenter must create a file from the input; the duration of this file is the minimum latency before media is available for download. Typical latency with recommended settings is in the neighborhood of 30 seconds.
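
    The 30-second figure follows from the recommended 10-second target duration under a simple model: one segment duration to create the file, plus the segments the client downloads before playing. The sketch below encodes that model; the function name is hypothetical, and encoder delay and network transfer time are deliberately ignored.

```python
def estimated_latency_s(target_duration_s=10, segments_buffered=2):
    """Rough latency estimate in seconds.

    One segment duration for the segmenter to produce a complete file,
    plus `segments_buffered` segments downloaded before playback starts.
    Ignores encoder delay and network transfer time.
    """
    return target_duration_s * (1 + segments_buffered)


if __name__ == "__main__":
    print(estimated_latency_s())       # recommended settings
    print(estimated_latency_s(5, 2))   # shorter files reduce latency
```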

  21. What is the latency?

    Approximately 30 seconds, with recommended settings. See question #20.

  22. Do I need to use a hardware encoder?

    No. Using the protocol specification, it is possible to implement a software encoder.

  23. What advantages does this approach have over RTP/RTSP?

    HTTP is less likely to be disallowed by routers, NAT, or firewall settings. No ports need to be opened that are commonly closed by default. Content is therefore more likely to get through to the client in more locations and without special settings. HTTP is also supported by more content-distribution networks, which can affect cost in large distribution models. In general, more available hardware and software works unmodified and as intended with HTTP than with RTP/RTSP. Expertise in customizing HTTP content delivery using tools such as PHP is also more widespread.

    Also, HTTP Live Streaming is supported in Safari and the media player framework on iPhone OS. RTSP streaming is not supported.

  24. Why is my stream’s overall bit rate higher than the sum of the audio and video bitrates?

    MPEG-2 transport streams can include substantial overhead. They utilize fixed packet sizes that are padded when the packet contents are smaller than the default packet size. Encoder and multiplexer implementations vary in their efficiency at packing media data into these fixed packet sizes. The amount of padding can vary with frame rate, sample rate, and resolution.
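
    A simplified calculation shows how the padding arises. MPEG-2 transport stream packets are 188 bytes long, including a 4-byte header. The sketch below assumes each media payload starts in a fresh packet and that the final packet is padded to full size; it ignores PES headers, adaptation fields, and program tables, so real overhead is somewhat higher. The function names are hypothetical.

```python
import math

TS_PACKET_SIZE = 188   # bytes per transport stream packet
TS_HEADER_SIZE = 4     # bytes of packet header


def ts_bytes_for(payload_bytes):
    """Bytes of transport stream needed to carry one payload."""
    per_packet = TS_PACKET_SIZE - TS_HEADER_SIZE
    packets = math.ceil(payload_bytes / per_packet)
    return packets * TS_PACKET_SIZE


def overhead_ratio(payload_bytes):
    """Extra bytes sent, as a fraction of the payload size."""
    return ts_bytes_for(payload_bytes) / payload_bytes - 1


if __name__ == "__main__":
    for size in (200, 1_000, 20_000):
        print(f"{size} bytes -> {ts_bytes_for(size)} bytes on the wire "
              f"({overhead_ratio(size):.0%} overhead)")
```

    Small payloads, such as individual audio frames in a low-bandwidth stream, suffer proportionally the most, which is consistent with the variation by frame rate and sample rate described above.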

  25. How can I reduce the overhead and bring the bit rate down?

    Using a more efficient encoder can reduce the amount of overhead, as can tuning the encoder settings. Also, the -optimize argument can be passed to the Apple mediastreamsegmenter. This removes some unnecessary padding and can significantly reduce the overhead, particularly for low-bandwidth streams.

  26. Do all media files have to be part of the same MPEG-2 Transport Stream?

    No. You can mix media files from different transport streams, as long as they are separated by EXT-X-DISCONTINUITY tags. See the protocol specification for more detail. For best results, however, all video media files should have the same height and width dimensions in pixels.
