Lesson 1.1: Remux

In this lesson, you will learn how to use Libav to remux a video file. You will learn about the AVFormatContext struct used for muxing and demuxing, how to open an output file and prepare it for storing video and audio streams, and how to read packets of data from an input and write them to an output. All the code for this tutorial can be found here.

To follow along with this tutorial, you’ll need to have the following packages installed:

  • gcc
  • ninja
  • meson
  • libavformat-dev
  • libavcodec-dev
  • libavutil-dev

The first thing you’ll need to do is copy the meson.build file and create a blank remux.c file. To generate a build configuration, open a terminal, cd into the Remux directory, and run the command meson setup build. From this point on, all you need to do to compile the code is cd into the build directory and run ninja. The commands to run the program can be found in the run.sh file. There are several video files provided in the videos/inputs folder, and the commands are designed to store the output files in videos/outputs, so you’ll need to create that folder.

Opening Input File & Initializing AVFormatContext

AVFormatContext *in_fmt_ctx = NULL, *out_fmt_ctx = NULL;

if ((ret = avformat_open_input(&in_fmt_ctx, in_filename,
  NULL, NULL)) < 0)
{
  fprintf(stderr, "Failed to open input video file: '%s'.\n",
    in_filename);
  goto end;
}

if ((ret = avformat_find_stream_info(in_fmt_ctx, NULL)) < 0) {
  fprintf(stderr, "Failed to retrieve input stream info.\n");
  goto end;
}

av_dump_format(in_fmt_ctx, 0, in_filename, 0);

if ((ret = avformat_alloc_output_context2(&out_fmt_ctx,
  NULL, NULL, out_filename)) < 0)
{
  fprintf(stderr, "Failed to allocate output format context.\n");
  goto end;
}

...

end:
  avformat_close_input(&in_fmt_ctx);
  avformat_free_context(out_fmt_ctx);

Once we’ve added our #include statements, declared all the necessary variables, and defined the usage print statement, the first thing we’ll do is open the input file that is passed on the command line. The AVFormatContext struct is the main struct used in Libav to represent a file. There are two main ways to allocate an AVFormatContext, depending on whether we are reading an existing file from disk or creating a new one. To open an existing file, we use avformat_open_input; to create a file, we use avformat_alloc_output_context2, which allocates a blank struct that we will populate manually. avformat_open_input is usually paired with avformat_find_stream_info, which scans the file to fill in any stream information the header did not provide; for this example, it is mostly needed if the input file is a webm or flv file. We can also call av_dump_format, which prints information about the input file, much like running ffprobe on it.

Both functions can take a format struct, either an AVInputFormat or an AVOutputFormat, that determines the format of the file. When opening a file, if the file does not match the specified format, this will cause an error; if a format is not provided, it will be auto detected. When creating a file, if a format is not specified, the name of the output file will be used to determine the output format, just like when using ffmpeg.

Adding Streams

if ((ret = v_stream_idx = av_find_best_stream(in_fmt_ctx,
  AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0)) < 0)
{
  fprintf(stderr, "Failed to find video stream in input file.\n");
  goto end;
}

if (!(out_stream = avformat_new_stream(out_fmt_ctx, NULL))) {
  fprintf(stderr, "Failed to allocate video output stream.\n");
  ret = AVERROR(ENOMEM);
  goto end;
}

if ((ret = avcodec_parameters_copy(out_stream->codecpar,
  in_fmt_ctx->streams[v_stream_idx]->codecpar)) < 0)
{
  fprintf(stderr, "failed to copy video codec parameters\n");
  goto end;
}

out_stream->codecpar->codec_tag = 0;

Now that we’ve created a blank AVFormatContext struct for our output file, we need to start populating it. The first thing we’ll do is create an AVStream for the video stream. To do this, we’ll use the avformat_new_stream function, and pass in out_fmt_ctx. The AVFormatContext struct contains a streams array which contains all the streams for the file. The avformat_new_stream function will allocate a new AVStream, add it to the streams array of out_fmt_ctx, and return a pointer to it, or NULL on error.

Once we have a new blank stream for the video, we’ll need to populate it with all the information about the stream we want to create. In this case, we are copying the stream from the input so we’ll copy all the same information. To do this, we’ll need to know which stream in the input is the video stream. We’ll use the av_find_best_stream function to get the index of the streams array from in_fmt_ctx that holds the video stream. We’ll pass in AVMEDIA_TYPE_VIDEO so the function will look for a video stream. It will generally return the first video stream it finds, as long as there is a decoder for that stream’s codec. This is similar to using ffmpeg without using the map option.

Once we know the index of the video stream, we’ll use the avcodec_parameters_copy function to copy the codecpar struct from the video stream in in_fmt_ctx to the new AVStream we just created. Although we are not decoding any data in this example, the output file will need to contain all this information so that it can be decoded properly for playback or further processing. The last thing we’ll do is make sure that codec_tag is set to 0. This is to make sure it is set properly later on. This value should be set by avformat_write_header, which will be called later, but the value will only be overwritten if it is initially set to 0.

The process for creating the audio stream is mostly the same as for the video except we’ll specify AVMEDIA_TYPE_AUDIO as the stream type to look for in the input file.

Pre Muxing

  if (!(pkt = av_packet_alloc())) {
    fprintf(stderr, "Failed to allocate AVPacket.\n");
    ret = AVERROR(ENOMEM);
    goto end;
  }

  if (!(out_fmt_ctx->oformat->flags & AVFMT_NOFILE)) {
    if ((ret = avio_open(&out_fmt_ctx->pb, out_filename,
      AVIO_FLAG_WRITE)) < 0)
    {
      fprintf(stderr, "Failed to open output file.\n");
      goto end;
    }
  }

  if ((ret = avformat_write_header(out_fmt_ctx, NULL)) < 0) {
    fprintf(stderr, "Failed to write header for output file.\n");
    goto end;
  }

  ...

  end:
    ...
    av_packet_free(&pkt);
    if (out_fmt_ctx && !(out_fmt_ctx->oformat->flags & AVFMT_NOFILE))
      avio_closep(&out_fmt_ctx->pb);
    ...

Now that we’ve prepared the output context, we have a few things to do before we start muxing. First we’ll allocate an AVPacket that will be used to store data read from the input file. Next we’ll create the output file that the data will be written to. Some muxers handle their own I/O and don’t need a file opened for them, so we first check whether the output format has the AVFMT_NOFILE flag set. If not, we call avio_open, which will create a file with the name passed on the command line, open it for writing, and store a reference to it in the pb field of out_fmt_ctx. Lastly, we’ll call avformat_write_header. This writes all the necessary metadata to the file, such as the container format, number of streams, and codecs used, as well as any other information needed to properly open and read the contents of the file. We will also need to add lines to the end procedure to free the AVPacket and close the file.

Muxing

  while ((ret = av_read_frame(in_fmt_ctx, pkt)) >= 0)
  {
    in_stream = in_fmt_ctx->streams[pkt->stream_index];

    if (pkt->stream_index == v_stream_idx) {
      pkt->stream_index = VIDEO_STREAM_IDX;
    }
    else if (pkt->stream_index == a_stream_idx) {
      pkt->stream_index = AUDIO_STREAM_IDX;
    }
    else {
      av_packet_unref(pkt);
      continue;
    }

    pkt->pos = -1;
    out_stream = out_fmt_ctx->streams[pkt->stream_index];
    av_packet_rescale_ts(pkt, in_stream->time_base,
      out_stream->time_base);

    if ((ret = av_interleaved_write_frame(out_fmt_ctx, pkt)) < 0)
    {
      fprintf(stderr, "Failed to write packet to file.\n");
      goto end;
    }

    av_packet_unref(pkt);
  }

  if ((ret = av_write_trailer(out_fmt_ctx)) < 0) {
    fprintf(stderr, "Failed to write trailer to file.\n");
    goto end;
  }

We’ll start by calling av_read_frame inside a loop that runs as long as the return value is not negative. A negative value signals the end of the file, or an error. The packet data read from the input will be stored in pkt. This struct has a stream_index field specifying which stream of the input it came from. This value is used to get a reference to that stream from in_fmt_ctx->streams. We check if the packet came from the video stream or the audio stream, and then overwrite the packet’s stream_index, because the muxer will use this value to determine which stream to place the packet into in the output file. If the packet does not belong to either the video or audio stream that was found earlier when we called av_find_best_stream, it is ignored.

Next we’ll set pkt->pos to -1 so that it will be set by the muxer. Initially this value indicates the position of the packet in the input file, but it needs to be recomputed as the position of the packet in the output file. After that, we’ll call av_packet_rescale_ts. This ensures that if the output file uses a different time base for its timestamps, the timestamps of each packet are converted accordingly.

Finally, we will call av_interleaved_write_frame to write the packet to the output file. Once the loop has finished, all data from the input should now be written to the new output file. The only thing left to do is call av_write_trailer, which writes any metadata that wasn’t known before the writing process, such as the byte locations of I frames used for seeking.

In the next lesson, we will build off the concepts in this lesson by learning how to access, transfer, and set user defined metadata values. You will be able to copy all metadata from the input to the output and pass in a title value on the command line that will be used to set the metadata title of the output. You will also learn how to transfer chapter information.

Go To Next Lesson - 1.2: Copy Metadata

Go To Previous Lesson - 0: Libav Tutorial Introduction

View All Lessons