Roku Integration

Unlike other devices, Roku devices cannot play full-screen HLS video+audio while also playing a separate music stream via the Feed.fm SDK. To support adding Feed.fm music to streaming HLS videos on Roku devices, we’ve created a service that dynamically mixes Feed.fm music into a single HLS video+audio stream suitable for consumption by the Roku media player. This service:

  • is distributed as a Docker image
  • provides a unique music stream to every end user, and supports all DMCA and First Play music stations streamed from Feed.fm
  • works with live and VOD streams
  • works with encrypted HLS streams
  • re-mixes in new music when a user rewinds, to ensure DMCA playback restrictions are maintained
  • can mix in audio at different volume levels, which may be selected by the end user, as alternate audio tracks
  • adds a subtitles playlist that allows the Roku device to display the title/artist/track name of the currently playing song

This service has been authored by Feed.fm, but must be run in your datacenter / cloud infrastructure.

This guide will discuss how the service works, how to set it up, and how to test it out.

Theory of Operation

An HLS stream starts with an .m3u8 Multivariant Playlist that contains URLs and metadata pointing to multiple .m3u8 media playlists. These media playlists are encodings of the same video at different bitrates and quality levels. The media player decides which media playlist to stream to the end user, and can switch between them while streaming, based on available bandwidth.

The HLS mixing service works by creating a new .m3u8 Multivariant Playlist that can be passed to the Roku media player in place of the original. The new playlist points to all the original media files, but it adds new alternate audio media playlists to it that point to the HLS mixer. The Roku media player recognizes and will play these alternate audio streams alongside the original video.
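
To make the playlist rewrite concrete, here is a minimal Python sketch of adding one alternate audio rendition to a Multivariant Playlist. It is an illustration only, not the mixer's actual implementation: the function name, the "feedfm" group id, the "music" rendition name, and the mixer.example.com URL are all hypothetical, and the sketch assumes the original variants carry no AUDIO attribute of their own.

```python
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=1280000,RESOLUTION=640x360
low/video.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2560000,RESOLUTION=1280x720
high/video.m3u8
"""


def add_alternate_audio(playlist, audio_uri, group_id="feedfm", name="music"):
    """Return a copy of a Multivariant Playlist with one extra audio rendition.

    Assumption: no variant already has an AUDIO attribute.
    """
    # EXT-X-MEDIA declares the alternate audio rendition and where to fetch it.
    media_tag = (
        f'#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="{group_id}",NAME="{name}",'
        f'DEFAULT=YES,AUTOSELECT=YES,URI="{audio_uri}"'
    )
    out = []
    inserted = False
    for line in playlist.splitlines():
        if line.startswith("#EXT-X-STREAM-INF:"):
            if not inserted:
                # Declare the rendition once, before the first variant.
                out.append(media_tag)
                inserted = True
            # Point every variant at the new audio group.
            line = f'{line},AUDIO="{group_id}"'
        out.append(line)
    return "\n".join(out) + "\n"


rewritten = add_alternate_audio(
    SAMPLE_PLAYLIST, "https://mixer.example.com/audio.m3u8"
)
```

The original media playlist URIs (low/video.m3u8, high/video.m3u8) are left untouched; only the audio group is added.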

The HLS mixer monitors the video and audio segments being downloaded by the Roku player, and if the player re-requests an audio segment, the HLS mixer will mix new music into the audio segment. This ensures that the end user isn’t able to replay DMCA-restricted music.
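
The re-request check can be pictured with a small Python sketch. This is a simplified model under stated assumptions, not the mixer's code: the SegmentTracker class and its method are hypothetical names, and a real service would also need to expire entries when a stream is cleaned up.

```python
class SegmentTracker:
    """Remember which audio segments have already been served per stream."""

    def __init__(self):
        # Set of (stream_key, segment_index) pairs that were served once.
        self._served = set()

    def should_remix(self, stream_key, segment_index):
        """Return True when this segment was served before (e.g. after a rewind)."""
        entry = (stream_key, segment_index)
        seen_before = entry in self._served
        self._served.add(entry)
        return seen_before


tracker = SegmentTracker()
first = tracker.should_remix("stream-a", 7)   # first request for segment 7
again = tracker.should_remix("stream-a", 7)   # same segment requested again
```

In this model, a True result is the signal to mix fresh music into the segment instead of replaying the earlier mix.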

Additionally, the HLS mixing service creates a new subtitle playlist that holds the title, artist, and release information for the songs that are mixed in. The Roku media player can display this information during playback.

Integration

Integrating the HLS mixing service (the “muxer”) consists of two tasks:

  • run one or more copies of the hls-mux docker image from Feed.fm’s Amazon ECR repository in your cloud infrastructure as a web service
  • modify your Roku application to hand the video player URLs to your hls-mux instance, rather than URLs that point directly to your HLS streams

Install hls-mux Docker image in your cloud infrastructure

The HLS mixing service is created by Feed.fm, but must be installed and run on your own computing infrastructure and accessible to all end users. The service is distributed as a Docker image, so it can be easily added to ECS or Kubernetes clusters, or run on a single machine. The service is stateful, so care must be taken when running multiple copies of the service behind a load balancer.

Installing and running Docker image

The mixing service is published by Feed.fm to Amazon’s Elastic Container Registry as 415109207418.dkr.ecr.us-east-1.amazonaws.com/hls-mux. Each release of the mixer is tagged with a new version number that follows semantic versioning, and the changes are recorded in the CHANGELOG.md file in the GitHub repository. The latest published version of the mixer is always tagged latest.

The mixer may be downloaded and run locally with the following commands:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 415109207418.dkr.ecr.us-east-1.amazonaws.com

docker pull 415109207418.dkr.ecr.us-east-1.amazonaws.com/hls-mux:latest

docker run -p80:80 415109207418.dkr.ecr.us-east-1.amazonaws.com/hls-mux:latest

If you are able to bring up http://localhost/ in your browser and see the following, then the mixer is successfully running:

Screenshot of browser. URL: localhost. Page: A form with fields such as “original m3u8 URL”, “token”, and “secret”.

Capacity Planning

The HLS mixer must run and manage multiple audio encoders and decoders for every end user that is retrieving an HLS stream through the mixer. Local storage is used to temporarily store video and audio segments during streaming. Once some period of time has passed since the last request for a stream (20 minutes for live streams, 60 minutes for VOD, by default), the mixer will stop the encoders and decoders associated with the stream and clear out any storage used for the stream. The maximum number of simultaneous users that a single instance can stream to is directly proportional to the CPU power available to the service. The service does not have any hard-coded upper limit of simultaneous streams; it will just run slower for each user as available CPU power is split among more encoder/decoders.
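
The idle-timeout rule can be expressed as a short Python sketch. It only models the defaults stated above (and the max_seconds override described in the parameter table); the function and constant names are hypothetical and the mixer's internal logic may differ.

```python
LIVE_IDLE_SECONDS = 1200   # 20 minutes, the stated default for live streams
VOD_IDLE_SECONDS = 3600    # 60 minutes, the stated default for VOD streams


def is_stream_expired(last_request_at, now, is_live, max_seconds=None):
    """Return True when a stream has been idle longer than its limit.

    Times are in seconds (for example, values from time.time()).
    """
    if max_seconds is not None:
        limit = max_seconds
    else:
        limit = LIVE_IDLE_SECONDS if is_live else VOD_IDLE_SECONDS
    return (now - last_request_at) > limit
```

For capacity estimates, this means an abandoned VOD stream can keep its encoders and storage allocated for up to an hour after the last request unless a shorter max_seconds is supplied.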

Running multiple instances of the service behind a load balancer is a good way to support a very large number of users. Because the service is stateful, once Client X has begun a stream on Instance 1, all future requests from X must be directed to Instance 1. We recommend using “sticky sessions”, whereby the load balancer returns a session cookie on the first request from a client so that future requests from the same client are directed to the same backend server instance.

We have included a sample Terraform configuration in the GitHub repository. The Terraform script in terraform/client will deploy an instance of this service to AWS Fargate, with server logs sent to CloudWatch. The script requires an AWS region and an ARN that points to an SSL certificate generated via the AWS Certificate Manager. There is no requirement that this service run at Amazon or use Fargate; please contact Feed.fm for assistance with deployment there or elsewhere.

Monitoring

By default, the HLS mixer writes info, warn, and error level log messages to the console. For more verbose logging, set the LOG_LEVEL environment variable to debug or silly. Note that debug logs are quite verbose and aren’t recommended on heavily loaded instances.

Alternatively, the mixer can be instructed to send logging directly to an AWS CloudWatch Logs endpoint by posting the following to the container:

curl -XPOST --data id=<ID> --data secret=<SECRET> --data region=<REGION> --data level=debug https://<URL-OF-INSTANCE>/log

where ID, SECRET, and REGION are your AWS access ID, your secret token, and the region that contains a CloudWatch log group named hls-mux.

CloudWatch logging can be turned off with:

curl -XDELETE https://<URL-OF-INSTANCE>/log
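
For deployments that toggle logging from a script rather than the command line, the two curl calls above could be built in Python roughly as follows. This is a sketch using only the standard library; the function names and the mixer.example.com host are hypothetical, and the requests are constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def enable_cloudwatch_request(instance_url, access_id, secret, region, level="debug"):
    """Build the POST /log request with a form-encoded body, like curl --data."""
    body = urlencode(
        {"id": access_id, "secret": secret, "region": region, "level": level}
    ).encode("ascii")
    return Request(f"{instance_url.rstrip('/')}/log", data=body, method="POST")


def disable_cloudwatch_request(instance_url):
    """Build the DELETE /log request that turns CloudWatch logging off."""
    return Request(f"{instance_url.rstrip('/')}/log", method="DELETE")


enable_req = enable_cloudwatch_request(
    "https://mixer.example.com/", "AKIAEXAMPLE", "s3cr3t", "us-east-1"
)
disable_req = disable_cloudwatch_request("https://mixer.example.com")
```

Either request can then be sent with urllib.request.urlopen(enable_req) from a host that can reach the instance.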

If you are having difficulty with your HLS mixer, Feed.fm can provide you with AWS credentials to send your logs directly to our CloudWatch Logs for review.

The HLS mixer will also respond to requests to https://<URL-OF-INSTANCE>/status with a JSON object that lists active streams and available disk space.

Point Roku media player to hls-mux instance URLs

To make use of the HLS mixer, a client only needs to craft a new URL to pass to the Roku media player in place of an original HLS URL. There are two ways to generate this URL: one is to build the URL locally (GET /prepare), and the second is to request a URL from the HLS mixing server (POST /prepare).

Craft a GET /prepare URL

Given an HLS stream with URL <STREAM>, an HLS mixer instance at <HOSTNAME>, a client id string <CLIENTID>, and a random 30 character string <RANDOM>, the following URL could be passed directly to the Roku media player to mix in demo Feed.fm music to an existing HLS stream:

https://<HOSTNAME>/prepare?url=<STREAM>&token=demo&secret=demo&key=<RANDOM>&client_id=<CLIENTID>

When retrieved, that URL will return a Multivariant Playlist that contains the original media playlists and additional audio media playlists that are flagged as the default audio for all video streams. This URL must only be used by a single user, or the behavior is undefined. Other streams must provide a different key value to distinguish the streams.
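
As an illustration, the GET /prepare URL could be assembled like this in Python. The helper name and the example hostnames are hypothetical, and the alphanumeric key alphabet is an assumption (the guide only asks for a random string). Note that the stream URL is percent-encoded when it is placed in the query string.

```python
import secrets
import string
from urllib.parse import urlencode


def build_prepare_url(hostname, stream_url, client_id,
                      token="demo", secret="demo", key=None):
    """Build a GET /prepare URL for one user's stream."""
    if key is None:
        # Assumption: a random 30 character alphanumeric key is acceptable.
        alphabet = string.ascii_letters + string.digits
        key = "".join(secrets.choice(alphabet) for _ in range(30))
    if not 10 <= len(key) <= 32:
        raise ValueError("key must be 10 to 32 characters long")
    query = urlencode({
        "url": stream_url,
        "token": token,
        "secret": secret,
        "key": key,
        "client_id": client_id,
    })
    return f"https://{hostname}/prepare?{query}"


example_url = build_prepare_url(
    "mixer.example.com",
    "https://cdn.example.com/master.m3u8",
    "lacx44qb:uw:frerxsr0mmg",
    key="abcdefghij0123456789abcdefghij",
)
```

Generating a fresh key for each playback keeps the URL tied to a single user, as required above.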

The client_id parameter should be persisted on the Roku device and used in all HLS mixer requests. This will allow Feed API servers to track music playback for this user and ensure playback restrictions are enforced. A client_id parameter may be generated in the first place by making an HTTP POST call to https://feed.fm/api/v2/session and retrieving session.client_id in the JSON response.
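
A minimal Python sketch of obtaining a client_id follows. It assumes only what is stated above: a POST to the session endpoint and a session.client_id field in the JSON response. Any authentication or request body the endpoint may require is not shown, the function names are hypothetical, and the sample response is a made-up payload that contains just that one field.

```python
import json
from urllib.request import Request

SESSION_ENDPOINT = "https://feed.fm/api/v2/session"


def session_request():
    """Build (but do not send) the POST request to the session endpoint.

    Assumption: credentials, if required, would be added by the caller.
    """
    return Request(SESSION_ENDPOINT, method="POST")


def extract_client_id(response_body):
    """Pull session.client_id out of the JSON response text."""
    payload = json.loads(response_body)
    return payload["session"]["client_id"]


# Hypothetical response body, reduced to the one field this guide mentions.
sample_body = '{"session": {"client_id": "lacx44qb:uw:frerxsr0mmg"}}'
client_id = extract_client_id(sample_body)
```

The returned value is what should be persisted on the device and reused for every later /prepare call.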

Request URL with POST /prepare

Alternatively, a POST call to https://<HOSTNAME>/prepare with parameters passed in via the request body will return with a JSON response that looks like the following:

{
"success": true,
"client_id": "lacx44qb:uw:frerxsr0mmg",
"url": "https://<HOSTNAME>/media/23cb8490af624fe0abbb31603ee7675b-1152973985/37e61902a21830865c5451e6655d63e92c7cb362dbb96d8e83b5b3450b3338e9/playlist.m3u8",
"metadata_url": "https://<HOSTNAME>/media/23cb8490af624fe0abbb31603ee7675b-1152973985/37e61902a21830865c5451e6655d63e92c7cb362dbb96d8e83b5b3450b3338e9/song"
}

The url response points to a new Multivariant Playlist that may be passed to the Roku media player. If a client_id wasn’t supplied as part of the POST request, then the returned client_id should be saved on the Roku device for future calls. If a client_id was passed as an input parameter, the response client_id will match it.
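
A server-side POST /prepare call might look like the following Python sketch. The form-encoded body is an assumption (this guide does not state the expected body encoding), the function names are hypothetical, and the request is built but not sent. The sample response reuses the JSON shown above with a placeholder hostname.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def prepare_request(hostname, params):
    """Build a POST /prepare request. Assumption: form-encoded body."""
    body = urlencode(params).encode("ascii")
    return Request(f"https://{hostname}/prepare", data=body, method="POST")


def parse_prepare_response(response_body):
    """Return (url, client_id, metadata_url) from a /prepare JSON response."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError(f"POST /prepare failed: {payload}")
    return payload["url"], payload["client_id"], payload.get("metadata_url")


sample_response = json.dumps({
    "success": True,
    "client_id": "lacx44qb:uw:frerxsr0mmg",
    "url": "https://mixer.example.com/media/abc/def/playlist.m3u8",
    "metadata_url": "https://mixer.example.com/media/abc/def/song",
})
playlist_url, returned_client_id, metadata_url = parse_prepare_response(sample_response)
```

The playlist_url value is what gets handed to the Roku media player, and returned_client_id is saved for future calls when none was supplied in the request.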

When to use GET /prepare vs POST /prepare

The advantage of the GET /prepare method is that it works with HLS mixer load balancer setups that implement “sticky sessions” using HTTP cookies. The Roku media player maintains its own set of HTTP cookies. If the first request to the HLS mixer cluster is made by the Roku media player, then the response will include any load balancer session cookies, and future requests from the media player will carry those cookies and be directed to the same HLS mixer instance.

The disadvantages of GET /prepare are that a separate call must be made to generate a client_id if one does not already exist, and that extra care must be taken to prevent Feed.fm authentication tokens from being exposed.

The advantage of the POST /prepare method is that it can be performed server side, so that request parameters are never exposed to clients for abuse. This method also returns a metadata_url endpoint that can be used to programmatically retrieve song metadata if that is desired over the use of subtitles.

The disadvantage of the POST /prepare method is that the returned URL must be routed to the exact same HLS mixer instance that generated it.

Parameters for GET /prepare and POST /prepare

Parameter | Required? | Description
url | yes | The URL of an .m3u8 Multivariant Playlist.
key | GET: yes; POST: no | A unique 10–32 character string, used to uniquely identify the generated stream.
token | yes | Your Feed.fm token.
secret | yes | Your Feed.fm secret. In order to prevent your master token and secret values from being exposed to clients and abused, you should consider using the POST https://feed.fm/api/v2/access_token endpoint to generate temporary credentials that expire after a period of time.
ip | certain requests | The IP address of the end user that will be streaming the mixed audio. This is only required when making a POST /prepare call from a device that is not the device that will stream the resulting HLS stream. This address is used to determine the user's physical location (via geo-ip mapping), which in turn determines what music the end user has access to. While not technically required, if this is not provided and the end user does not have rights to music playback in their geographic area, then the POST /prepare call will succeed and the user will receive errors when trying to play the returned playlist.
client_id | yes | A Feed-generated unique ID that represents the user who will stream the playlist. This value might have been retrieved from one of the Feed.fm SDKs (likely POST https://feed.fm/api/v2/session defined in the REST API Docs), or it can be the client_id value returned from this POST /prepare call. This value is required when users listen to multiple videos, so that DMCA playback requirements can be enforced.
station_name | no | The name of the station to pull music from. This station must be available for the given token/secret credentials. If the server is unable to find the matching station, no error will be returned and the default (first available) station will play.
fallback_station_name | no | The name of a station to pull music from if no music can be pulled from the station_name station. This is intended for use with First Play stations.
music_delay | no | The number of seconds to delay the start of Feed.fm audio that is mixed in with the original [instructional] audio. This can be used to delay music playback until after, say, intro credits have completed.
music_volume | no | A floating point number, from 0 to 1, that indicates the volume (gain) adjustment applied to music before it is mixed in with the original [instructional] audio. A value of 1 means the audio is mixed in with no adjustment, and a value of 0 means the audio would be fully muted before mixing. This field defaults to 0.4. This field can also contain comma separated volume/name pairs, in order to support multiple volume level audio tracks. For instance, "0.2,low,0.5,medium,1.0,high" would cause three audio tracks to be created, named 'low', 'medium', and 'high'. The first listed track will be set as the 'default' audio track, and the user can select between the other two tracks at any time.
max_seconds | no | The number of seconds past the last access of the stream for which the returned URL should be considered valid. Attempts to use the stream after this number of seconds will result in HTTP 404 errors from the service. The default value is 1200 (20 minutes) for live streams, and 3600 (60 minutes) for VOD streams.
rendition | no | When this is not specified, the audio that is mixed with Feed.fm music is pulled from the lowest bitrate video+audio stream over 1MB/second, or failing that, the highest bitrate video+audio stream. When this is specified, the server will look for an alternate audio rendition with the given name and mix music with that rendition instead.
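
The multi-track form of music_volume is easy to get wrong by hand, so a small helper can build it from (volume, name) pairs. This Python sketch is only a convenience on the client side; the function name is hypothetical, and rejecting commas in track names is an assumption based on the comma-separated format.

```python
def format_music_volume(tracks):
    """Build a music_volume value from (volume, name) pairs.

    The first pair becomes the default audio track.
    """
    parts = []
    for volume, name in tracks:
        if not 0.0 <= volume <= 1.0:
            raise ValueError(f"volume {volume} is outside the 0 to 1 range")
        if not name or "," in name:
            raise ValueError("track names must be non-empty and contain no commas")
        parts.extend([str(volume), name])
    return ",".join(parts)


music_volume = format_music_volume([(0.2, "low"), (0.5, "medium"), (1.0, "high")])
```

The resulting string can be passed as the music_volume parameter of either /prepare variant.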