After you create a video, one of the most common things you want to do with it is grab thumbnails.
Thumbnails have many uses:
- Use them as the poster for your video player.
- Analyze them for content moderation, or use AI to generate tags to associate with the video.
- Render them in a list of videos for users to browse.
- Share them on social media with the og:image tag.
If you're looking for somewhere to host and stream your videos for you, Mux's Video API has everything you need to manage video for your application.
Extract a single thumbnail at a specific timestamp
ffmpeg -i input_video.mp4 \
-vf "select='gte(t,9)',scale=320:-1" \
-frames:v 1 thumbnail.png
Here's a breakdown of the command:
- -i input_video.mp4: Specifies the input video file.
- -vf "select='gte(t,9)',scale=320:-1": Applies the select filter and scales the output image to a width of 320 pixels while preserving the aspect ratio. Adjust the resolution as needed. gte(t,9) matches frames whose timestamp is at or after the 9-second mark, so the first matching frame becomes the thumbnail. For the one-minute mark you would use gte(t,60). You can also be more precise, for example 7.5 seconds: gte(t,7.5).
- -frames:v 1: Extracts only one frame (you can adjust this to extract more frames).
- thumbnail.png: The name of the output image file.
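The gte(t,N) expression takes seconds, while timestamps are often written as HH:MM:SS. Here is a minimal sketch of helpers that do the conversion and build the filter string. The function names hms_to_seconds, seek_filter, and extract_thumbnail_at are hypothetical, not part of FFmpeg:

```shell
# Convert an HH:MM:SS(.fraction) timestamp to seconds.
hms_to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Build the -vf argument for a thumbnail at $1 seconds, scaled to width $2.
seek_filter() {
  printf "select='gte(t,%s)',scale=%s:-1" "$1" "$2"
}

# Wrapper around the command above (defined here, not invoked).
# $1 = input file, $2 = HH:MM:SS timestamp, $3 = output image
extract_thumbnail_at() {
  ffmpeg -i "$1" \
    -vf "$(seek_filter "$(hms_to_seconds "$2")" 320)" \
    -frames:v 1 "$3"
}
```

For example, extract_thumbnail_at input_video.mp4 00:01:00 thumbnail.png would request the first frame at or after the one-minute mark.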
Extract multiple thumbnails at regular intervals
If you want to extract multiple thumbnails at regular intervals, you can modify the command to use a numbered output file name pattern and specify fps=1 to get 1 thumbnail per second:
ffmpeg -i input_video.mp4 \
-vf "fps=1,scale=320:-1" \
-vsync vfr \
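-q:v 2 thumb%04d.png
- -vsync vfr: Uses variable frame rate output, so FFmpeg writes only the frames the filter passes through instead of duplicating or dropping frames to hold a constant rate. Newer FFmpeg releases deprecate -vsync in favor of -fps_mode vfr.
- -vf "fps=1,scale=320:-1": Tells FFmpeg to produce 1 frame per second and to scale each thumbnail to 320 pixels wide.
- -q:v 2: Sets the quality for lossy image encoders such as JPEG (lower values mean higher quality). PNG output is lossless and is not affected by this flag, so switch the output name to thumb%04d.jpg if you want it to apply.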
- thumb%04d.png: Specifies the output file name pattern where %04d will be replaced by a sequence number (e.g., thumb0001.png, thumb0002.png, etc.).
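The fps filter also accepts fractions, so fps=1/10 yields one thumbnail every 10 seconds rather than one per second. Here is a rough sketch. The helpers interval_filter, thumbnail_count, and extract_every are hypothetical, and the count is only an estimate because the exact number depends on how the fps filter rounds at the start and end of the video:

```shell
# Build the -vf argument for one frame every $1 seconds, scaled to width $2.
interval_filter() {
  printf "fps=1/%s,scale=%s:-1" "$1" "$2"
}

# Estimate the number of thumbnails: duration / interval, rounded up.
# $1 = duration in whole seconds, $2 = interval in whole seconds
thumbnail_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Wrapper around the command above (defined here, not invoked).
# $1 = input file, $2 = interval in seconds
extract_every() {
  ffmpeg -i "$1" \
    -vf "$(interval_filter "$2" 320)" \
    -vsync vfr \
    "thumb%04d.png"
}
```

With a 95-second video and a 10-second interval, thumbnail_count 95 10 estimates 10 thumbnails.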
Extract 1 thumbnail per I-frame or keyframe
The most time- and resource-efficient way to extract multiple thumbnails from a video is to extract each I-frame (or keyframe) from the video.
ffmpeg -i input_video.mp4 \
-vf "select='eq(pict_type,I)',scale=320:-1" \
-vsync vfr \
-q:v 2 thumb%04d.png
Here's a breakdown of the command:
- select='eq(pict_type,I)': Selects only I-frames. eq(pict_type,I) checks whether the frame is an I-frame.
This will output an image for every single I-frame of the video, which might be a lot. You could modify this command to only select every 10th I-frame with another filter:
Extract 1 thumbnail for every 10th I-frame or keyframe
This command is nearly the same but adds select='not(mod(n,10))' which will take the previously selected I-frames and select every 10th one.
ffmpeg -i input_video.mp4 \
-vf "select='eq(pict_type,I)',select='not(mod(n,10))',scale=320:-1" \
-vsync vfr \
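-q:v 2 thumb%04d.png
In the select='not(mod(n,10))' expression:
- n is the sequential number of the frame arriving at this filter, starting from 0. Because the first select has already dropped every other frame, n counts I-frames only.
- mod(n,10) is the remainder when n is divided by 10.
- not(mod(n,10)) is true when the remainder is 0 (i.e. for every 10th frame, starting with the first).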
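To see which frames the expression keeps, the same arithmetic can be reproduced in the shell. This is only an illustration of the modulo logic. The selected_indices function is hypothetical and does not call FFmpeg:

```shell
# Print the values of n in 0..count-1 for which not(mod(n,step)) is true.
# $1 = count (number of frames reaching the filter), $2 = step
selected_indices() {
  count=$1
  step=$2
  n=0
  out=""
  while [ "$n" -lt "$count" ]; do
    if [ $(( n % step )) -eq 0 ]; then
      # Append n, separated by a single space
      out="${out:+$out }$n"
    fi
    n=$(( n + 1 ))
  done
  echo "$out"
}
```

selected_indices 25 10 prints 0 10 20: out of 25 I-frames, the 1st, 11th, and 21st are kept.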
Benefits and drawbacks of extracting I-frame thumbnails
If you take the I-frame approach, there are some benefits and drawbacks to be aware of. Whether you choose this direction depends on whether your particular use case is sensitive to its limitations.
Benefits
- Efficiency: I-frames are complete images that don't depend on any other frames, which means FFmpeg can grab them quickly without a lot of computation.
- Quality: I-frames typically have the best quality in a video stream because they are self-contained.
- Representativeness: I-frames often occur at scene changes or significant moments in the video, making them good candidates for a set of images that will represent the content contained in the video.
Drawbacks
- The spacing between I-frames can be irregular and unpredictable.
- Some videos, especially videos optimized for streaming, might have very few I-frames, which limits the number of thumbnails that you get.
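Before committing to I-frame extraction, it can help to check how the keyframes in a given file are actually spaced. The sketch below assumes ffprobe is installed alongside FFmpeg. The helper names keyframe_times and keyframe_gaps are hypothetical:

```shell
# List keyframe timestamps in seconds, one per line (defined here, not invoked).
# $1 = input file
keyframe_times() {
  ffprobe -v error -select_streams v:0 -skip_frame nokey \
    -show_entries frame=pts_time -of csv=p=0 "$1"
}

# Read timestamps on stdin and print the gap between consecutive ones.
keyframe_gaps() {
  awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}
```

Running keyframe_times input_video.mp4 | keyframe_gaps prints one gap per line. Large or widely varying gaps suggest the I-frame thumbnails will be sparse or unevenly spread.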
Extracting thumbnails with Mux
If you have videos hosted with the Mux Video API, you can extract thumbnails, GIFs, and storyboards on demand with the image API.
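As a rough illustration, a thumbnail request is an ordinary image URL. The sketch below assumes the image.mux.com URL pattern and the time and width query parameters as described in Mux's image API documentation, so verify both against the current docs before relying on them. The mux_thumbnail_url helper and the playback ID shown are hypothetical:

```shell
# Build a thumbnail URL for a Mux playback ID.
# $1 = playback ID, $2 = time in seconds, $3 = width in pixels
# Assumption: URL pattern and parameter names follow Mux's image API docs.
mux_thumbnail_url() {
  printf "https://image.mux.com/%s/thumbnail.png?time=%s&width=%s" "$1" "$2" "$3"
}
```

mux_thumbnail_url YOUR_PLAYBACK_ID 9 320 would produce a URL equivalent to the first FFmpeg command in this article: a 320-pixel-wide frame from the 9-second mark.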
Video thumbnail extraction FAQs
What's the difference between extracting thumbnails from I-frames vs. regular intervals?
I-frame extraction is more efficient because I-frames are complete images that don't require additional decoding, and they often occur at scene changes making them representative of content. However, I-frame spacing can be irregular and unpredictable. Extracting at regular intervals gives you consistent thumbnail spacing but requires more computational resources since FFmpeg needs to decode frames that aren't I-frames.
How do I choose the right thumbnail quality setting?
The -q:v flag in FFmpeg controls JPEG quality, with values ranging from 2 (highest quality, larger file size) to 31 (lowest quality, smallest file size). For most use cases, -q:v 2 provides excellent quality suitable for video posters and social sharing. If file size is a concern and thumbnails will be displayed small, you can use -q:v 5 for a good balance.
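Because -q:v only matters for lossy output such as JPEG, a script that writes both formats can add the flag conditionally. Here is a minimal sketch. The helpers quality_args and extract_with_quality are hypothetical:

```shell
# Print the quality arguments appropriate for an output file name.
# $1 = output file name, $2 = JPEG quality (2 = best, 31 = worst)
quality_args() {
  case "$1" in
    *.jpg|*.jpeg) printf '%s %s' "-q:v" "$2" ;;
    *) ;;  # PNG and other lossless outputs: no -q:v needed
  esac
}

# Wrapper that extracts 1 thumbnail per second (defined here, not invoked).
# $1 = input file, $2 = output pattern, $3 = JPEG quality
extract_with_quality() {
  # quality_args is left unquoted on purpose so it splits into two arguments
  ffmpeg -i "$1" -vf "fps=1,scale=320:-1" $(quality_args "$2" "$3") "$2"
}
```

For example, extract_with_quality input_video.mp4 thumb%04d.jpg 5 passes -q:v 5, while a .png output pattern passes no quality flag at all.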
Can I extract thumbnails while a video is still uploading or processing?
With FFmpeg, you need the complete video file available locally before extracting thumbnails. However, API-based solutions like Mux's image API can generate thumbnails on-demand once the video has been ingested, without requiring you to download the entire file or run FFmpeg yourself.
What image format should I use for video thumbnails?
PNG provides lossless quality and supports transparency, making it ideal when quality is paramount. JPEG offers smaller file sizes with acceptable quality for most video thumbnails, especially for web use. The choice depends on your use case: use PNG for video players where quality matters, and JPEG for thumbnail grids or social sharing where file size impacts load time.
How many thumbnails should I extract from a video?
This depends entirely on your use case. For a video player poster, you only need one representative thumbnail. For a preview or storyboard feature, extracting one thumbnail every 5-10 seconds provides good coverage. For content moderation or AI tagging, I-frame extraction ensures you capture scene changes without generating excessive images.
What's a storyboard and when should I use one instead of individual thumbnails?
A storyboard combines multiple thumbnails into a single sprite sheet image, with metadata indicating the position of each thumbnail. This approach is more efficient for video scrubbing features where you want to show preview thumbnails as users hover over the progress bar, because one HTTP request loads all thumbnails instead of dozens of individual requests.
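A sprite sheet like this can also be approximated locally with FFmpeg's tile filter, which lays consecutive frames out in a grid, row by row. The sketch below assumes 160x90 tiles in a 5x5 grid with one frame every 10 seconds. Those numbers and the helpers make_storyboard and tile_offset are illustrative, not a Mux format:

```shell
# Build a single 5x5 sprite sheet (defined here, not invoked).
# $1 = input file, $2 = output image
make_storyboard() {
  ffmpeg -i "$1" \
    -vf "fps=1/10,scale=160:90,tile=5x5" \
    -frames:v 1 "$2"
}

# Pixel offset of thumbnail number $1 (0-based) in a grid with $2 columns
# and tiles of $3 x $4 pixels. Prints "x y".
tile_offset() {
  echo "$(( ($1 % $2) * $3 )) $(( ($1 / $2) * $4 ))"
}
```

tile_offset 7 5 160 90 prints 320 90, meaning the 8th thumbnail sits in the third column of the second row. A player can use offsets like these as the position metadata for each preview.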