Lesson 1.7: Clipping
In this lesson, you will learn how to make clips of videos. It will be similar to using the -ss and -t command line options with ffmpeg. All the code for this tutorial can be found here.
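For reference, the clipping this program performs is roughly what the following ffmpeg command does (the filenames and times here are just placeholders):
ffmpeg -ss 30 -i input.mp4 -t 10 -c copy clip.mp4
The -c copy flag matters: like our program, it copies packets without re-encoding, which is why the clip has to begin on a keyframe.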
Initial Setup
start_sec = strtod(argv[1], NULL);
in_filename = argv[2];
duration_sec = strtod(argv[3], NULL);
out_filename = argv[4];
if ((ret =
        avformat_open_input(&in_fmt_ctx, in_filename, NULL, NULL)) < 0)
{
    fprintf(stderr, "Failed to open input video file: '%s'.\n",
            in_filename);
    goto end;
}
if ((ret = avformat_find_stream_info(in_fmt_ctx, NULL)) < 0) {
    fprintf(stderr, "Failed to retrieve input stream info.\n");
    goto end;
}
if ((ret = avformat_alloc_output_context2(&out_fmt_ctx,
        NULL, NULL, out_filename)) < 0)
{
    fprintf(stderr, "Failed to allocate output format context.\n");
    goto end;
}
First we assign our command line arguments to our variables. The first argument is the time in seconds that we will seek to before we begin reading frames from the input file; it is like the -ss option in ffmpeg. The third argument is the duration of the clip in seconds, like the -t option. We use the strtod function to convert each of these arguments, which arrive as strings, into a double. Then, as usual, we call avformat_open_input on the input file to create an AVFormatContext for the input, and avformat_alloc_output_context2 with the output filename specified by the user to create an AVFormatContext for the output.
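The snippet above assumes all four arguments are present. A small guard like the following, which is not part of the original code, keeps the program from reading past the end of argv:
if (argc < 5) {
    fprintf(stderr,
        "Usage: %s <start_sec> <input_file> <duration_sec> <output_file>\n",
        argv[0]);
    return 1;
}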
Initialize Streams
for (int i = 0; i < in_fmt_ctx->nb_streams; i++) {
    if ((ret = initialize_stream(out_fmt_ctx,
            in_fmt_ctx->streams[i])) < 0)
    {
        fprintf(stderr, "Failed to initialize stream %d.\n", i);
        goto end;
    }
    if (out_fmt_ctx->streams[i]->codecpar->codec_type ==
        AVMEDIA_TYPE_VIDEO)
    {
        video_idx = i;
    }
}
if (video_idx == -1) {
    fprintf(stderr, "Failed to find video stream.\n");
    goto end;
}
Now we loop through all the streams in the input and pass each one to the initialize_stream function. When we find the video stream, we record its index in video_idx, because we will use a video frame to determine when to start copying frames from the input to the output. The output video, like any video, must start with a keyframe. If for some reason we haven’t found a video stream after looping through all the streams, we fail.
int initialize_stream(AVFormatContext *out_fmt_ctx,
                      AVStream *in_stream)
{
    AVStream *out_stream;
    int ret = 0;
    if (!(out_stream = avformat_new_stream(out_fmt_ctx, NULL))) {
        fprintf(stderr,
            "Failed to allocate output stream for input stream.\n");
        ret = AVERROR(ENOMEM);
        return ret;
    }
    if ((ret = avcodec_parameters_copy(out_stream->codecpar,
            in_stream->codecpar)) < 0)
    {
        fprintf(stderr,
            "Failed to copy codec parameters for input stream.\n");
        return ret;
    }
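    /* Clear the stale codec tag so the muxer can pick one that is
     * valid for the output container. */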
    out_stream->codecpar->codec_tag = 0;
    if ((ret = av_dict_copy(&out_stream->metadata,
            in_stream->metadata, AV_DICT_DONT_OVERWRITE)) < 0)
    {
        fprintf(stderr,
            "Failed to copy metadata for input stream.\n");
        return ret;
    }
    return ret;
}
The initialize_stream function is pretty standard. It takes the input stream whose parameters will be copied and the output context that the new stream will be added to. We use avformat_new_stream to create the new stream and avcodec_parameters_copy to copy the codec parameters from the input stream to the output stream. We set codec_tag to 0 so the muxer can pick a tag that is valid for the output container; a tag copied from, say, a Matroska input is not necessarily valid in an MP4 output. Finally, we use av_dict_copy to copy the stream metadata from the input stream to the output stream.
Seek Frame
start_ts =
    start_sec *
    in_fmt_ctx->streams[video_idx]->time_base.den /
    in_fmt_ctx->streams[video_idx]->time_base.num;
if ((ret =
        av_seek_frame(in_fmt_ctx, video_idx, start_ts,
            AVSEEK_FLAG_BACKWARD)) < 0)
{
    fprintf(stderr, "Failed to seek to start frame.\n");
    goto end;
}
Next, we convert the start time in seconds into a timestamp using the time base of the video stream. Then we call av_seek_frame with the AVSEEK_FLAG_BACKWARD flag to seek to the nearest keyframe at or before that timestamp.
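As an aside, the same seconds-to-timestamp conversion could be written with av_rescale_q, which also protects the intermediate multiplication from overflow. This is only an alternative sketch, not what the tutorial code does:
start_ts = av_rescale_q((int64_t)(start_sec * AV_TIME_BASE),
    AV_TIME_BASE_Q,
    in_fmt_ctx->streams[video_idx]->time_base);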
Make Clip
while ((ret = av_read_frame(in_fmt_ctx, pkt)) >= 0)
{
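    /* Ignore packets with negative timestamps. */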
    if (
        pkt->dts < 0 ||
        pkt->pts < 0
    ) {
        av_packet_unref(pkt);
        continue;
    }
    in_stream = in_fmt_ctx->streams[pkt->stream_index];
    out_stream = out_fmt_ctx->streams[pkt->stream_index];
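    /* Until we have seen the first video keyframe, drop every packet. */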
    if (first_dts_set == NOT_SET)
    {
        if (pkt->stream_index == video_idx)
        {
            if (!(pkt->flags & AV_PKT_FLAG_KEY)) {
                av_packet_unref(pkt);
                continue;
            }
            first_dts = pkt->dts;
            first_dts_set = SET;
            duration_ts = av_rescale_q(duration_sec * AV_TIME_BASE,
                AV_TIME_BASE_Q,
                in_fmt_ctx->streams[video_idx]->time_base);
            end_ts = first_dts + duration_ts;
        }
        else {
            av_packet_unref(pkt);
            continue;
        }
    }
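    /* Stop once we are past the end of the requested clip. */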
    if (pkt->dts > end_ts) {
        av_packet_unref(pkt);
        break;
    }
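    /* Shift timestamps by first_dts so the clip starts at 0, then
     * rescale them to the output time base. */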
    pkt->pts = av_rescale_q(pkt->pts - first_dts,
        in_stream->time_base, out_stream->time_base);
    pkt->dts = av_rescale_q(pkt->dts - first_dts,
        in_stream->time_base, out_stream->time_base);
    pkt->duration = av_rescale_q(pkt->duration,
        in_stream->time_base, out_stream->time_base);
    pkt->pos = -1;
    if ((ret = av_interleaved_write_frame(out_fmt_ctx, pkt)) < 0) {
        fprintf(stderr, "Failed to write packet to file.\n");
        goto end;
    }
    av_packet_unref(pkt);
}
Once we’ve allocated an AVPacket, opened the output file, and written the file header, we are ready to start reading packets from the input file. If we read any packets with negative timestamps, we ignore them. Then we check if first_dts_set is SET. This value tells us whether first_dts is set; first_dts is the decode timestamp of the frame that will be the first frame of the clip. av_seek_frame does not always land exactly on a keyframe, so we need to check this ourselves before we start writing frames to the output.
If first_dts is not set, we check whether the current packet is from the video stream, because we are looking for a video keyframe. Then we check whether the packet is a keyframe using the AV_PKT_FLAG_KEY flag. If not, we move on to the next packet. If it is a keyframe, then it is the first video keyframe we’ve read from the input file. We set first_dts to the dts of the current packet and set first_dts_set to SET so we know we’ve found the first_dts. Then we rescale duration_sec to the time base of the video stream. Finally, we calculate end_ts by adding duration_ts to first_dts. This is the timestamp that tells us when we’ve reached the desired length for the clip.
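To make the arithmetic concrete with hypothetical numbers: if duration_sec is 10 and the video stream’s time base is 1/90000, av_rescale_q converts 10 * AV_TIME_BASE microseconds into 10 * 90000 = 900000 ticks, so end_ts lands 900000 ticks after first_dts.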
Now that we have found the keyframe to start the clip from, and we know when to end the clip, we first check whether the current packet’s dts is past end_ts. If so, we stop the loop. If not, we rescale the packet’s timestamps to the output time base, just like usual, except that we first subtract first_dts from each timestamp so that every timestamp in the output is offset by the same amount and the output starts at 0.
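Continuing the hypothetical numbers from above: if first_dts is 450000 and the current packet’s dts is 540000 in a 1/90000 time base, the subtraction leaves 90000 ticks, placing the packet exactly one second into the clip; av_rescale_q then expresses that one second in the output stream’s time base.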
Then we set pkt->pos to -1 and write the packet to the output file. Once the loop finishes, we write the trailer and do our cleanup, and we’re done. The output file should now contain a clip of the original file that begins at the video keyframe closest to the timestamp specified by the user, with a length equal to the duration specified by the user.
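Assuming the program is built as clip (the binary name is just a placeholder), extracting ten seconds starting near the 30-second mark would look something like:
./clip 30 input.mp4 10 output.mp4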
The next lesson will be the start of Section 2 and we will begin learning about transcoding video and audio.