The idea for this series of posts arose from my frustration at not finding any organized documentation for learning FFmpeg. My aim in writing it is to help beginners get up and running with FFmpeg quickly.
FFmpeg is a command-line tool for *nix and Windows systems that, in its simplest form, decodes and encodes media files. With the proliferation of video on the Internet and in our daily lives, users need the ability to transcode (convert) audio and video files from one format to another. For example, a user might have downloaded a video from YouTube and need to convert it to a format playable on an iPod or other media device.
Besides this obvious use, FFmpeg is also capable of a few other fundamental manipulations of audio and video data. For audio, these include changing the sample rate and advancing or delaying the audio with respect to the video. For video, they include changing the frame rate, cropping, resizing, padding the picture with bars left and right and/or top and bottom when necessary, and changing the aspect ratio. FFmpeg can also reduce the size of a media file and import audio and video from different sources, such as a microphone.
The main components of FFmpeg are libavcodec, an audio/video codec library; libavformat, an audio/video container mux/demux library; and the ffmpeg command-line program, through which you pass the various transcoding options.
The FFmpeg project was started by Fabrice Bellard, and has been maintained by Michael Niedermayer since 2004. The name of the project comes from the MPEG video standards group, together with “FF” for “fast forward”. On March 13, 2011 a group of FFmpeg developers decided to fork the project under the name Libav (http://libav.org/) due to some project management related issues.
FFmpeg is used by many open source and proprietary projects, including ffmpeg2theora, VLC, MPlayer, HandBrake, Blender, Google Chrome, and various others.
FFmpeg is made of the following main components.
Programs
ffmpeg – a command line tool to convert multimedia files between formats.
ffserver – a multimedia streaming server for live broadcasts.
ffplay – a simple media player based on SDL and the FFmpeg libraries.
ffprobe – a simple multimedia stream analyzer.
Libraries
libavutil – a library containing functions for simplifying programming, including random number generators, data structures, mathematics routines, core multimedia utilities, and much more.
libavcodec – a library containing decoders and encoders for audio/video codecs.
libavformat – a library containing demuxers and muxers for multimedia container formats.
libavdevice – a library containing input and output devices for grabbing from and rendering to many common multimedia input/output software frameworks, including Video4Linux, Video4Linux2, VfW, and ALSA.
libavfilter – a library containing media filters.
libswscale – a library performing highly optimized image scaling and color space/pixel format conversion operations.
In these posts we will primarily focus on the ffmpeg program. The other programs, such as ffserver, which is used for video broadcasts, are outside the scope of this series. Among the libraries, the most notable parts of FFmpeg are libavcodec, an audio/video codec library, and libavformat, an audio/video container mux and demux library.
To be honest, trying to shoehorn the complete details of audio and video in a paragraph or two is plainly ridiculous, as the topic is rather complex. But since this is a beginner’s guide, a few basic overviews will be enough to get you started using ffmpeg properly.
If you work with audio and video, you are well aware that these files take up an inordinate amount of storage space, and you cannot easily work with them unless they are compressed. Assuming the NTSC standard video format, raw (uncompressed) video at 720×480 pixels, 30 frames per second and 24-bit RGB color takes 1,036,800 bytes (about 1 MB) per frame. That is about 31 MB per second, or over 200 GB for a 2-hour movie. And that is just the video; the audio stream takes additional storage. Something needs to be done so that the movie can be stored on a consumer-grade medium such as a DVD: the data needs to be compressed.
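If you want to check this arithmetic yourself, here is a minimal Python sketch. The helper name `raw_video_bytes` is my own invention for illustration, and the sizes are in decimal units (1 MB = 1,000,000 bytes), so treat it as a back-of-the-envelope estimate rather than anything FFmpeg itself reports.

```python
# Rough storage estimate for raw (uncompressed) video.
# raw_video_bytes is a hypothetical helper, not part of FFmpeg.

def raw_video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Return (bytes per frame, bytes per second, total bytes)."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    bytes_per_second = bytes_per_frame * fps
    total_bytes = bytes_per_second * seconds
    return bytes_per_frame, bytes_per_second, total_bytes

# 720x480 pixels, 24-bit RGB, 30 frames per second, 2-hour movie.
per_frame, per_second, total = raw_video_bytes(720, 480, 24, 30, 2 * 60 * 60)

print(per_frame)          # 1036800 bytes per frame
print(per_second / 1e6)   # 31.104 MB per second
print(total / 1e9)        # 223.9488 GB for the whole movie
```

Changing the resolution or frame rate in the call shows how quickly the raw size grows, which is the whole motivation for compression.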
Conventional lossless compression algorithms such as ZIP, which everyone uses on a regular basis, don’t reduce the size of the data enough, so we need to look to lossy compression for further size reduction. Lossy compression works by discarding some of the data in the media, which results in smaller file sizes. You might be wondering which data the compression algorithm discards. It does not discard data at random, which would be a disaster; it discards data only when it judges that data to be redundant or unlikely to be noticed. For example, in a movie often not much changes between successive frames; if the encoder stores only what changed rather than every frame in full, the viewer will hardly notice any difference, but the storage those frames would have required is saved.
Lossy compression is commonly used to compress multimedia data such as audio, video and still images. Its main drawback is that some data is removed during compression, which can reduce the fidelity of the output.
The algorithms that allow us to encode and decode the data, whether using lossy or lossless techniques, are called codecs. Many codecs are included in the libavcodec library supplied with FFmpeg, which enables you to work with a wide variety of video and audio formats.
Once the audio and video streams have been encoded by their respective codecs, this encoded data needs to be put together into a single file. This file is called the ‘container’. A graphic of the process is shown below.
A movie is made up of two main components, audio and video. Each component produces a separate stream of data that must be decoded by your DVD player or some program so that we can see and hear the movie properly.
The bitrate of a movie is the key to the quality of its audio and video. Bitrate is the number of bits transmitted over a set length of time; for a compressed file, it denotes the average number of bits that one second of audio or video data takes up in the compressed bit stream. Particular formats also specify the bitrate, or the maximum bitrate, to be used. The overall bitrate of your movie is the combination of the video and audio streams in the file, with the majority coming from the video stream.
A bit rate is usually measured in some multiple of bits per second, for example kilobits per second (Kbps), meaning thousands of bits per second.
Bitrate encoding comes in two modes: VBR (variable bit rate) and CBR (constant bit rate). VBR allows a higher bitrate (and therefore more storage space) to be allocated to the more complex segments of a media file, while less space is allocated to less complex segments. The average of these rates can be calculated to produce an average bitrate for the file. VBR allows you to set a maximum and a minimum bitrate. The compression algorithm then tries to compress the data efficiently, dropping toward the minimum bitrate when there is little or no motion on screen and rising toward the maximum when there is a lot of motion. This helps give you a smaller overall file size without compromising the quality of the video.
CBR is used when a predictable, flat bit rate is needed. The flat bitrate throughout the entire file comes at the price of efficiency for the codec, usually resulting in a larger file but smoother playback. CBR is useful for streaming multimedia content over limited-capacity channels, since there it is the maximum bit rate that matters, not the average, and CBR takes advantage of all of the available capacity. CBR is not the optimal choice for storage, as it does not allocate enough data to complex sections (resulting in degraded quality) while wasting data on simple sections.
Depending on your video, you might want to use VBR for streaming playback if the sudden spikes do not exceed your target user’s connection speed. For example, if there is only one high-motion scene in a video, you will waste considerable bandwidth using CBR throughout the entire file and may better serve your users by using VBR. Either way, try experimenting with the two settings to find what works best for your video.
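To make the difference between the two modes a little more concrete, here is a toy Python model. It is only a sketch under my own assumptions: the per-second "complexity" scores are made up, and real encoders use far more elaborate rate control than a simple proportional split. Both functions spend the same total number of bits so that only the distribution differs.

```python
# Toy model of bit allocation. This is NOT how a real encoder works;
# the function names and complexity scores are hypothetical.

def cbr_allocation(complexity, target_kbps):
    """Constant bitrate: every second gets the same number of kilobits."""
    return [target_kbps for _ in complexity]

def vbr_allocation(complexity, average_kbps):
    """Variable bitrate: split the same total budget in proportion to complexity."""
    budget = average_kbps * len(complexity)
    total = sum(complexity)
    return [budget * c / total for c in complexity]

# A calm six-second clip with one busy scene in the fourth second.
complexity = [1, 1, 1, 5, 1, 1]

cbr = cbr_allocation(complexity, 500)
vbr = vbr_allocation(complexity, 500)

print(cbr)       # the same rate for every second
print(vbr)       # low for the calm seconds, a spike for the busy one
print(max(vbr))  # the peak a streaming viewer's connection has to sustain
```

The last line is the number to compare against your target user's connection speed when deciding whether VBR is safe for streaming.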
Briefly, a bitrate specifies how many kilobits the file may use per second of audio. The following shows the quality for various standard audio bitrates.
| Bitrate | Quality |
|---|---|
| 64 Kbps | Audio encoded at 64 Kbps has roughly a 22:1 compression ratio relative to CD audio. This bitrate is not recommended for digital music but is acceptable for voice-only recordings. |
| 96 Kbps | Audio encoded at 96 Kbps has a 15:1 compression ratio. One minute of music takes about 700 KB of disk space. |
| 128 Kbps | Audio encoded at 128 Kbps has an 11:1 compression ratio. One minute of music takes around 1 MB of disk space. |
| 160 Kbps | Audio encoded at 160 Kbps has a 9:1 compression ratio. One minute of music takes about 1.2 MB of disk space. |
| 192 Kbps and above | MP3s encoded at this setting take up the most space but have CD-quality sound and can take up to 2 MB of space per 60 seconds of music. Online music stores and music download services generally offer at least this high a bitrate. |
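Figures of this kind can be estimated with a few lines of Python. This is a rough sketch: the helper names are my own, 1 KB is taken as 1,000 bytes, and the compression ratio is measured against uncompressed CD audio at 1411.2 Kbps.

```python
# Estimate disk usage and compression ratio for common audio bitrates.
# Helper names are hypothetical; 1 KB = 1000 bytes.

CD_BITRATE_KBPS = 1411.2  # 44100 Hz x 16 bits x 2 channels, uncompressed

def size_per_minute_kb(bitrate_kbps):
    """Kilobytes needed to store one minute of audio at this bitrate."""
    return bitrate_kbps * 1000 * 60 / 8 / 1000

def compression_ratio(bitrate_kbps):
    """How many times smaller than CD audio, rounded to a whole number."""
    return round(CD_BITRATE_KBPS / bitrate_kbps)

for kbps in (64, 96, 128, 160, 192):
    print(kbps, "Kbps:", size_per_minute_kb(kbps), "KB per minute,",
          str(compression_ratio(kbps)) + ":1")
```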
The audio sampling frequency is the number of times per second that audio is sampled and stored. CD audio is sampled at 44.1 kHz, which means that when the sound is converted from analog to digital, 44,100 samples of the audio signal are taken per second. The higher the sampling rate, the wider the frequency range the audio can represent: a given sampling rate can capture frequencies up to half that rate, so 44.1 kHz audio reaches about 22 kHz, roughly the upper limit of human hearing. In other words, a higher rate extends the highs and generally means better quality. For example, the following image shows an analog signal on the left converted to a digital representation using two different sampling rates. As you can see, the higher sampling rate leads to a more exact reproduction of the original signal.
The sample rate can be thought of as how often the sound is measured. CD-quality audio has 44,100 of these measurements a second, which is why it is called 44.1 kilohertz (kHz).
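As a small illustration of "how often the sound is measured", the Python sketch below samples one millisecond of a 1 kHz tone at two different rates. The function names are my own, and the half-the-sample-rate limit used in `nyquist_limit` is the standard sampling-theory result rather than anything specific to FFmpeg.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_ms):
    """Measure a sine wave of freq_hz at evenly spaced instants."""
    count = sample_rate_hz * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

def nyquist_limit(sample_rate_hz):
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2

low = sample_sine(1000, 8000, 1)    # one cycle at the telephone rate
high = sample_sine(1000, 44100, 1)  # the same cycle at the CD rate

print(len(low), len(high))    # measurements describing the same cycle
print(nyquist_limit(44100))   # 22050.0
```

The longer list describes the same millisecond of sound in finer steps, which is what the higher sampling rate buys you.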
So what is the relationship between bitrate and sampling frequency? Bitrate simply specifies the number of bits per second used to encode the audio stream. The uncompressed bitrate for CD audio is 16 bits x 44,100 samples x 2 channels = 1,411,200 bps, or approximately 1,411 kbps. When audio is stored in an uncompressed format, the bitrate is a linear function of the sample rate; that is, doubling the sample rate doubles the bitrate.
A 44.1 kHz 16-bit stereo signal therefore takes 1411.2 kbps, or approximately 10.6 MB per minute, to record. A 44.1 kHz 16-bit mono file would take half of this, as would a 44.1 kHz 8-bit stereo file or a 22.05 kHz 16-bit stereo file.
Formats like Ogg Vorbis and MP3, however, compress audio by making calculated guesses about the sounds humans aren’t likely to hear and then discarding that data. As part of this process, such formats let us decide how much to throw away or, to put it more simply, how much data to use to represent the original sound. So, using our 44.1 kHz stereo sample, we can choose to use as little as 48 kbps or as much as approximately 500 kbps to store this sound. At 500 kbps, more of the original sound fidelity is preserved than at 48 kbps.
Calculating values
An audio file’s bit rate can be easily calculated when given sufficient information.
Bit rate = (sampling rate) x (bit depth) x (number of channels)
e.g., a recording with a 44.1 kHz sampling rate, a 16 bit depth, and 2 channels:
44100 x 16 x 2 = 1411200 bits per second, or 1411.2 kbit/s
The file size of an audio recording can also be calculated using a similar formula:
File Size (Bytes) = (sampling rate) x (bit depth) x (total channels) x (seconds) / 8
e.g., a 70-minute CD-quality recording will take up about 740 MB:
44100 x 16 x 2 x 4200 / 8 = 740880000 Bytes
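The two formulas above translate directly into Python. This is just a sketch for checking the numbers; the function names are my own.

```python
# Uncompressed audio bit rate and file size, following the formulas above.
# Function names are hypothetical helpers.

def audio_bit_rate(sampling_rate_hz, bit_depth, channels):
    """Bits per second of an uncompressed audio stream."""
    return sampling_rate_hz * bit_depth * channels

def audio_file_size_bytes(sampling_rate_hz, bit_depth, channels, seconds):
    """Bytes needed to store an uncompressed recording of the given length."""
    return audio_bit_rate(sampling_rate_hz, bit_depth, channels) * seconds // 8

print(audio_bit_rate(44100, 16, 2))                  # 1411200
print(audio_file_size_bytes(44100, 16, 2, 70 * 60))  # 740880000
print(audio_file_size_bytes(44100, 16, 1, 70 * 60))  # mono takes half: 370440000
```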
Some standard sampling frequencies and their applications are given below.
| Sampling Rate | Use |
|---|---|
| 8,000 Hz | Telephone, walkie-talkie, wireless intercom and wireless microphone transmission; adequate for human speech. |
| 11,025 Hz | One quarter the sampling rate of audio CDs; used for lower-quality PCM and MPEG audio. |
| 22,050 Hz | One half the sampling rate of audio CDs; used for lower-quality PCM and MPEG audio. |
| 32,000 Hz | miniDV digital video camcorders, video tapes with extra channels of audio, DAT, high-quality digital wireless microphones, digitizing FM radio. |
| 44,100 Hz | Audio CD, also most commonly used with MPEG-1 audio (VCD, SVCD, MP3). Most professional audio equipment uses 44.1 kHz sampling and above. |
| 48,000 Hz | The standard audio sampling rate used by professional digital video equipment such as tape recorders, video servers, vision mixers and so on. Also used for sound with consumer video formats like DV, digital TV, DVD, and films. |
| 96,000 Hz | DVD-Audio, some LPCM DVD tracks, Blu-ray Disc audio tracks, HD DVD (High-Definition DVD) audio tracks. |
The frame rate is the number of unique consecutive images displayed per second in the video to give the illusion of movement; each image is called a ‘frame’. The human brain perceives smooth, continuous motion when shown around 24 frames per second. If the frame rate falls much below this number, you will see jerky motion rather than smooth motion. Many video creators use this frame rate.
This is not a fixed standard, of course; if your video is a screencast, you can go to frame rates as low as 5 fps. Among television standards, PAL (common in Europe and some parts of Asia) uses 25 fps, while NTSC (used in the US and Japan) uses 29.97 fps. Generally, you should never exceed the frame rate of the source video. The best results are achieved when the frame rate is kept the same as that of your original source.
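To get a feel for what these frame rates mean in practice, here is a short Python sketch that works out how long each frame stays on screen and how many frames a clip contains. The helper names are my own; the exact NTSC rate of 30000/1001 is the value that is usually rounded to 29.97.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # exact rate, usually written as 29.97
PAL_FPS = Fraction(25)

def frame_duration_ms(fps):
    """How long a single frame is shown, in milliseconds."""
    return float(1000 / Fraction(fps))

def frame_count(fps, seconds):
    """Number of whole frames in a clip of the given length."""
    return int(Fraction(fps) * seconds)

print(frame_duration_ms(PAL_FPS))   # 40.0
print(round(float(NTSC_FPS), 2))    # 29.97
print(frame_count(24, 60))          # 1440
print(frame_count(NTSC_FPS, 60))    # 1798
```

Using `Fraction` keeps the NTSC rate exact, which avoids small rounding drift when counting frames over long clips.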
A container file is used to identify and combine different data types. Simpler container formats can hold different types of audio formats, while more advanced container formats can support multiple audio and video streams, subtitles and metadata, along with the synchronization information needed to play back the various streams together. In most cases, the file header and most of the metadata are specified by the container format. Different containers suit different purposes; for example, container formats optimized for low-quality internet video streaming differ from those meeting high-quality DVD streaming requirements.
The video file formats we’re familiar with, such as QuickTime movies (.mov) and AVI (.avi), are media container formats. Some container formats hold only audio, like WAV files for Windows, MP3 music files or AIFF files for Macs. Others contain audio and video, such as ASF files for Windows, which contain audio compressed with the WMA codec and video compressed with the WMV codec. There are dozens of these container formats. If you’re uploading a video to an online site, check to see what formats the site supports. Sometimes this can be confusing because the list of accepted formats may include both compression formats like MPEG-4 and container formats like .mov.
FFmpeg is developed under GNU/Linux, but it can be compiled under most operating systems, including Mac OS X, Microsoft Windows and AmigaOS. On most Linux distros, you can install ffmpeg directly using the distro’s package manager. If you want the latest version or want to customize the installation, you may need to build it from source; as that is an involved and tricky procedure, I’m not discussing it here.
Installing FFmpeg on Ubuntu
Run the following command in the terminal to install FFmpeg.
$ sudo apt-get install ffmpeg
Installing FFmpeg on Fedora
FFmpeg can be directly installed from the repos using the following command.
$ su -c 'yum install ffmpeg'
Installing FFmpeg on CentOS
FFmpeg can be directly installed from the repos using the following command.
$ su -c 'yum install ffmpeg ffmpeg-devel'
Installing FFmpeg on Windows
By far the easiest way to start using FFmpeg is to get a precompiled binary. Zeranoe.com has pre-built binaries for Windows, which make it easy to install ffmpeg, so if you are using Windows you can get up and running with FFmpeg in no time. Go ahead and grab the binaries from the link below.
http://ffmpeg.zeranoe.com/builds/
Once installed, use the following command to get the ffmpeg version and the versions of its libraries.
C:\ffmpeg>ffmpeg -version
On my Windows machine it returns the following; of course this may be different on your system, depending on the version of FFmpeg installed:
ffmpeg version N-31100-g9251942, Copyright (c) 2000-2011 the FFmpeg developers
built on Jun 30 2011 21:17:59 with gcc 4.5.3
libavutil 51. 11. 0 / 51. 11. 0
libavcodec 53. 7. 0 / 53. 7. 0
libavformat 53. 4. 0 / 53. 4. 0
libavdevice 53. 2. 0 / 53. 2. 0
libavfilter 2. 24. 0 / 2. 24. 0
libswscale 2. 0. 0 / 2. 0. 0
libpostproc 51. 2. 0 / 51. 2. 0
ffmpeg N-31100-g9251942
libavutil 51. 11. 0 / 51. 11. 0
libavcodec 53. 7. 0 / 53. 7. 0
libavformat 53. 4. 0 / 53. 4. 0
libavdevice 53. 2. 0 / 53. 2. 0
libavfilter 2. 24. 0 / 2. 24. 0
libswscale 2. 0. 0 / 2. 0. 0
libpostproc 51. 2. 0 / 51. 2. 0
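If you would rather check for FFmpeg from a script than by hand, the following Python sketch does roughly the same thing. It assumes only that the executable is named `ffmpeg` and is on your PATH; the function name is my own, and it returns nothing useful if FFmpeg is not installed.

```python
import shutil
import subprocess

def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    executable = shutil.which("ffmpeg")
    if executable is None:
        return None
    result = subprocess.run([executable, "-version"],
                            capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None

version = ffmpeg_version_line()
print(version if version else "ffmpeg was not found on PATH")
```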
Adhering to the UNIX culture, FFmpeg relies on a plethora of command-line options to do its work. The generic syntax of an FFmpeg command is shown below.
ffmpeg [[infile options]['-i' infile]]...{[outfile options] outfile}...
Each section of the command is explained below.
ffmpeg – The name of the FFmpeg executable.
infile options – Options for your input video or audio file. FFmpeg applies any options given here to the input file before processing starts. This section is not as widely used as the ‘outfile options’.
-i infile – The actual video or audio file to process, including the path to the directory where it is located,
e.g. /home/george/media/myvideo.flv. You always need to include the `-i` option before your file name.
outfile options – The options you want applied to the video or audio file you are creating.
outfile – The name of the output file you want to create, including the directory path if it is not in the current working directory,
e.g. /home/george/media/out.flv
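The ordering of the sections matters, so here is a small Python sketch that assembles a command in that order without running it. The helper `build_ffmpeg_command` is hypothetical, and the two options are only examples I picked: `-ss` placed before `-i` seeks into the input, and `-r` placed before the output file sets the output frame rate.

```python
import shlex

def build_ffmpeg_command(infile, outfile, infile_options=(), outfile_options=()):
    """Arrange arguments as: ffmpeg [infile options] -i infile [outfile options] outfile."""
    return ["ffmpeg", *infile_options, "-i", infile, *outfile_options, outfile]

command = build_ffmpeg_command(
    "/home/george/media/myvideo.flv",
    "/home/george/media/out.flv",
    infile_options=["-ss", "10"],   # start reading 10 seconds into the input
    outfile_options=["-r", "25"],   # write the output at 25 frames per second
)

# Print the command as it would be typed in a shell; nothing is executed here.
print(shlex.join(command))
```

Keeping the command as a list makes it easy to hand to `subprocess.run` later without worrying about quoting file names that contain spaces.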
Now that we have FFmpeg installed, in the next post we will learn about audio processing.
This site is a digital habitat of Sameer Borate, a freelance web developer working in PHP, MySQL and WordPress. I also provide web scraping services, website design and development and integration of various Open Source API's. Contact me at metapix[at]gmail.com for any new project requirements and price quotes.