Media Files
Introduction
Several BIDS datatypes make use of media files — audio recordings, video recordings, combined audio-video recordings, and still images. This appendix defines the common file formats, metadata conventions, and codec identification schemes shared across all datatypes that use media files.
The following media suffixes are defined:
| Name | Suffix | Description |
|---|---|---|
| Audio file | audio | An audio data file containing one or more audio streams. Common formats include WAV (uncompressed), MP3, AAC, and Ogg Vorbis. |
| Audio-video file | audiovideo | A media file containing both audio and video streams. Common containers include MP4, MKV, AVI, and WebM. |
| Image file | image | A still image data file. Common formats include JPEG, PNG, SVG, WebP, and TIFF. |
| Video file | video | A video data file containing one or more video streams but no audio. Common containers include MP4, MKV, AVI, and WebM. |
Datatypes that incorporate media files (for example, behavioral recordings or stimuli) define their own file-naming rules, directory placement, and datatype-specific metadata. The conventions described here apply uniformly to all such datatypes.
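The suffix definitions above follow from which kinds of content a file holds. The sketch below is a hypothetical helper (`choose_media_suffix` is not part of any BIDS tool) that assumes the caller already knows whether the file has audio streams, video streams, or is a still image.

```python
def choose_media_suffix(has_audio: bool, has_video: bool, is_still_image: bool = False) -> str:
    """Pick a media suffix (audio, video, audiovideo, image) from a file's content.

    Hypothetical helper following the suffix table; not an official BIDS API.
    """
    if is_still_image:
        if has_audio or has_video:
            raise ValueError("a still image should not also carry audio or video streams")
        return "image"
    if has_audio and has_video:
        return "audiovideo"
    if has_video:
        return "video"  # video streams but no audio
    if has_audio:
        return "audio"
    raise ValueError("file contains no audio, video, or image content")
```

For example, a screen recording with a microphone track gets `audiovideo`, while the same recording with its audio track removed gets `video`.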
Relationship to the photo suffix
The media file definitions introduced here provide a general framework for all media content in BIDS.
The existing photo suffix (used for photographs of anatomical landmarks,
head localization coils, and tissue samples) predates this framework and covers
a narrower use case — still images in specific electrophysiology and microscopy datatypes.
The media suffixes (audio, video, audiovideo, image) are intended as the
general-purpose mechanism for all media content in BIDS.
In practice, a "photo" could equally be a video of an experimental setup with verbal
narration, an audio recording describing electrode placement, or a drawing rather than
a photograph.
New datatypes should adopt the media file framework,
and a future proposal may deprecate the photo suffix in favor of the broader image
suffix, with appropriate migration tooling
(see bids-utils).
Supported Formats
Audio formats
| Format | Extension | Description |
|---|---|---|
| Waveform Audio | .wav | A Waveform Audio File Format audio file, typically containing uncompressed PCM audio. |
| MP3 Audio | .mp3 | An MP3 audio file. |
| Advanced Audio Coding | .aac | An Advanced Audio Coding audio file. |
| Ogg Vorbis | .ogg | An Ogg audio file, typically containing Vorbis-encoded audio. |
Video container formats
| Format | Extension | Description |
|---|---|---|
| MPEG-4 Part 14 | .mp4 | An MPEG-4 Part 14 media container file. |
| Audio Video Interleave | .avi | An Audio Video Interleave media container file. |
| Matroska Video | .mkv | A Matroska media container file. |
| WebM | .webm | A WebM media container file, typically containing VP8/VP9 video and Vorbis/Opus audio. |
Image formats
| Format | Extension | Description |
|---|---|---|
| Joint Photographic Experts Group Format | .jpg | A JPEG image file. |
| Portable Network Graphics | .png | A Portable Network Graphics file. |
| Scalable Vector Graphics | .svg | A Scalable Vector Graphics image file. |
| WebP Image | .webp | A WebP image file. |
| Tag Image File Format | .tif | A Tag Image File Format file. |
| Tag Image File Format | .tiff | A Tag Image File Format image file. The .tiff extension is the long form of .tif. |
When choosing a format, consider the trade-off between file size and data fidelity. Uncompressed or lossless formats (WAV, PNG, TIFF) preserve full quality but produce larger files. Lossy formats (MP3, AAC, JPEG) significantly reduce file size at the cost of some data loss.
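The extension tables above can be turned into a simple lookup. This is a minimal sketch under one stated assumption: the four video container extensions are accepted for both the `video` and `audiovideo` suffixes, and each audio or image suffix accepts only the extensions in its own table. The function name `has_listed_extension` is hypothetical.

```python
from pathlib import Path

# Extensions from the Supported Formats tables, grouped by media kind.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".aac", ".ogg"}
CONTAINER_EXTENSIONS = {".mp4", ".avi", ".mkv", ".webm"}
IMAGE_EXTENSIONS = {".jpg", ".png", ".svg", ".webp", ".tif", ".tiff"}

# Assumed suffix-to-extension mapping: the video containers serve both
# the video and audiovideo suffixes.
ALLOWED_EXTENSIONS = {
    "audio": AUDIO_EXTENSIONS,
    "video": CONTAINER_EXTENSIONS,
    "audiovideo": CONTAINER_EXTENSIONS,
    "image": IMAGE_EXTENSIONS,
}


def has_listed_extension(filename: str) -> bool:
    """Return True if the file's extension is listed for its media suffix."""
    path = Path(filename)
    suffix = path.stem.rsplit("_", 1)[-1]  # last underscore-separated part of the name
    allowed = ALLOWED_EXTENSIONS.get(suffix)
    if allowed is None:
        raise ValueError(f"not a media suffix: {suffix!r}")
    return path.suffix in allowed  # compared case-sensitively
```

Comparing extensions case-sensitively is a deliberate choice in this sketch, so `.JPG` is reported as unlisted rather than silently accepted.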
Media Stream Metadata
Media files SHOULD be accompanied by a JSON sidecar file containing technical metadata about the media streams. The following metadata fields are defined for media files.
Duration
Applies to suffixes: audio, video, audiovideo.
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| RecordingDuration | RECOMMENDED | number | Length of the recording in seconds (for example, 3600). |
RecordingDuration reuses the existing BIDS metadata field already defined for
electrophysiology recordings (EEG, iEEG, MEG, and others).
Audio stream properties
Applies to suffixes: audio, audiovideo.
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| AudioCodec | RECOMMENDED | string | The audio codec used to encode the audio stream, expressed as an FFmpeg codec name (for example, "aac", "mp3", "opus", "flac", "pcm_s16le"). This value can be auto-extracted using ffprobe -v quiet -print_format json -show_streams. |
| AudioSampleRate | RECOMMENDED | number | Sampling frequency of the audio stream, in Hz (for example, 44100, 48000, 96000). Must be a number greater than 0. |
| AudioChannelCount | RECOMMENDED | integer | Number of audio channels in the audio or audio-video file (for example, 1 for mono, 2 for stereo). Must be a number greater than or equal to 1. |
| AudioCodecRFC6381 | OPTIONAL | string | The audio codec expressed as an RFC 6381 codec string (for example, "mp4a.40.2" for AAC-LC). This representation is useful for web and broadcast interoperability. |
Note: AudioSampleRate is used instead of the existing SamplingFrequency field
because audio-video files require distinguishing the audio sampling rate from the
video frame rate. The Audio prefix makes this unambiguous in multi-stream containers.
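For uncompressed WAV files, the duration and audio stream fields can be read without external tools. This sketch uses Python's standard `wave` module, which only reads integer PCM data; the mapping from sample width to FFmpeg codec name (`pcm_u8`, `pcm_s16le`, `pcm_s24le`, `pcm_s32le`) assumes little-endian integer PCM, the usual WAV layout. The function name `wav_sidecar_fields` is hypothetical.

```python
import wave

# FFmpeg codec names for integer PCM at each sample width in bytes.
# 8-bit WAV is unsigned; wider samples are signed little-endian.
PCM_CODEC_BY_WIDTH = {1: "pcm_u8", 2: "pcm_s16le", 3: "pcm_s24le", 4: "pcm_s32le"}


def wav_sidecar_fields(path: str) -> dict:
    """Derive RecordingDuration and audio stream fields from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        n_frames = wav.getnframes()  # one frame = one sample per channel
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
    return {
        "RecordingDuration": n_frames / rate,  # seconds
        "AudioCodec": PCM_CODEC_BY_WIDTH[width],
        "AudioSampleRate": rate,
        "AudioChannelCount": channels,
    }
```

Compressed formats (MP3, AAC, Ogg Vorbis) need a decoder or a probing tool such as ffprobe instead.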
Visual properties
Applies to suffixes: video, audiovideo, image.
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Width | RECOMMENDED | integer | Width of the video frame or image, in pixels. Must be a number greater than or equal to 1. |
| Height | RECOMMENDED | integer | Height of the video frame or image, in pixels. Must be a number greater than or equal to 1. |
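For images, Width and Height are often stored in a fixed header position. As one illustration, a PNG file starts with an 8-byte signature followed by the IHDR chunk, whose first 8 data bytes are the width and height as big-endian 32-bit integers. The sketch below reads those values from the file's bytes; `png_dimensions` is a hypothetical name and other formats need their own parsers.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (Width, Height) in pixels from the IHDR chunk of a PNG file's bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR: 4-byte length, 4-byte type, then its data.
    _length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Only the first 24 bytes are needed, so the whole image does not have to be read into memory.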
Video stream properties
Applies to suffixes: video, audiovideo.
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| VideoCodec | RECOMMENDED | string | The video codec used to encode the video stream, expressed as an FFmpeg codec name (for example, "h264", "hevc", "vp9", "av1"). This value can be auto-extracted using ffprobe -v quiet -print_format json -show_streams. |
| FrameRate | RECOMMENDED | number | The video frame rate of the video stream, in Hz (for example, 24, 25, 29.97, 30, 60). Must be a number greater than 0. |
| VideoCodecRFC6381 | OPTIONAL | string | The video codec expressed as an RFC 6381 codec string (for example, "avc1.640028" for H.264 High Profile Level 4.0). This representation is useful for web and broadcast interoperability. |
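The metadata tables above combine three kinds of rule: which suffixes a field applies to, its data type, and a numeric lower bound. The sketch below encodes them in one lookup table. Reporting a field used with a suffix it does not apply to as a problem is an interpretation made here for illustration, and `check_sidecar` is a hypothetical name, not a BIDS validator API.

```python
from numbers import Real

# key: (suffixes the field applies to, accepted type, optional range check)
FIELD_RULES = {
    "RecordingDuration": ({"audio", "video", "audiovideo"}, Real, None),
    "AudioCodec": ({"audio", "audiovideo"}, str, None),
    "AudioSampleRate": ({"audio", "audiovideo"}, Real, lambda v: v > 0),
    "AudioChannelCount": ({"audio", "audiovideo"}, int, lambda v: v >= 1),
    "AudioCodecRFC6381": ({"audio", "audiovideo"}, str, None),
    "Width": ({"video", "audiovideo", "image"}, int, lambda v: v >= 1),
    "Height": ({"video", "audiovideo", "image"}, int, lambda v: v >= 1),
    "VideoCodec": ({"video", "audiovideo"}, str, None),
    "FrameRate": ({"video", "audiovideo"}, Real, lambda v: v > 0),
    "VideoCodecRFC6381": ({"video", "audiovideo"}, str, None),
}


def check_sidecar(suffix: str, sidecar: dict) -> list[str]:
    """Return a list of problems found in the media fields of a sidecar dict."""
    problems = []
    for key, value in sidecar.items():
        rule = FIELD_RULES.get(key)
        if rule is None:
            continue  # not a media field; other BIDS rules may apply
        suffixes, expected_type, in_range = rule
        if suffix not in suffixes:
            problems.append(f"{key} does not apply to suffix {suffix!r}")
        # JSON true/false decode to bool, which Python treats as an int; reject it.
        elif isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"{key} has wrong type {type(value).__name__}")
        elif in_range is not None and not in_range(value):
            problems.append(f"{key} is out of range: {value!r}")
    return problems
```

For instance, `check_sidecar("audio", {"AudioSampleRate": 0})` reports the sample rate as out of range, because the table requires a value greater than 0.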
Codec Identification
Codec identification uses two complementary naming systems:
FFmpeg codec names (RECOMMENDED)
The AudioCodec and VideoCodec fields use
FFmpeg codec names as the RECOMMENDED
convention. These names are the de facto standard in scientific computing and can be
auto-extracted from media files using:
```shell
ffprobe -v quiet -print_format json -show_streams <file>
```
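The JSON printed by that ffprobe command contains a `streams` list whose entries carry `codec_type`, `codec_name`, `width`, `height`, `avg_frame_rate`, `sample_rate`, `channels`, and usually `duration`. The sketch below maps the first video stream and the first audio stream onto the sidecar fields. The function names are hypothetical, and two choices are assumptions: RecordingDuration is taken as the longest stream duration, and streams whose `duration` is missing are skipped.

```python
import json
import subprocess


def run_ffprobe(path: str) -> dict:
    """Run the ffprobe command above and parse its JSON output (ffprobe must be on PATH)."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_streams", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def sidecar_from_ffprobe(probe: dict) -> dict:
    """Map the first video and first audio stream onto media sidecar fields."""
    fields = {}
    durations = []
    for stream in probe.get("streams", []):
        if "duration" in stream:
            durations.append(float(stream["duration"]))  # ffprobe reports it as a string
        kind = stream.get("codec_type")
        if kind == "video" and "VideoCodec" not in fields:
            fields["VideoCodec"] = stream["codec_name"]
            fields["Width"] = stream["width"]
            fields["Height"] = stream["height"]
            # avg_frame_rate is a fraction string such as "30000/1001"; "0/0" means unknown.
            num, den = (int(x) for x in stream.get("avg_frame_rate", "0/0").split("/"))
            if num > 0 and den > 0:
                fields["FrameRate"] = num / den
        elif kind == "audio" and "AudioCodec" not in fields:
            fields["AudioCodec"] = stream["codec_name"]
            fields["AudioSampleRate"] = int(stream["sample_rate"])  # also a string
            fields["AudioChannelCount"] = stream["channels"]
    if durations:
        # Assumption: use the longest stream as the recording duration.
        fields["RecordingDuration"] = max(durations)
    return fields
```

Typical use would be `sidecar_from_ffprobe(run_ffprobe("recording.mp4"))`. The frame rate is kept unrounded here, so an NTSC source yields 29.97002997... rather than 29.97.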
RFC 6381 codec strings (OPTIONAL)
The AudioCodecRFC6381 and VideoCodecRFC6381 fields use
RFC 6381 codec strings.
These provide precise codec profile and level information useful for
web and broadcast interoperability.
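To show what "profile and level information" means in practice, an H.264 string of the form `avc1.PPCCLL` packs three hexadecimal bytes: `profile_idc`, the constraint flags, and `level_idc` (level times 10). Decoding `avc1.640028` gives profile_idc 100 (High) and level_idc 40 (level 4.0), matching the VideoCodecRFC6381 example. The sketch below is a hypothetical decoder that covers only `avc1` strings, names only a few common profiles, and ignores the special encoding of Level 1b.

```python
# profile_idc values for a few common H.264 profiles.
H264_PROFILES = {66: "Baseline", 77: "Main", 88: "Extended", 100: "High"}


def decode_avc1(codec: str) -> dict:
    """Split an 'avc1.PPCCLL' RFC 6381 string into profile, constraint flags, and level."""
    prefix, _, hex_part = codec.partition(".")
    if prefix != "avc1" or len(hex_part) != 6:
        raise ValueError(f"not an avc1 codec string: {codec!r}")
    profile_idc = int(hex_part[0:2], 16)
    constraint_flags = int(hex_part[2:4], 16)
    level_idc = int(hex_part[4:6], 16)
    return {
        "profile": H264_PROFILES.get(profile_idc, f"profile_idc {profile_idc}"),
        "constraint_flags": constraint_flags,
        "level": level_idc / 10,  # level_idc 40 means level 4.0
    }
```

Other codecs (hev1, vp09, av01) use different, dot-separated layouts and would need their own parsers.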
Common codec reference
| Codec | FFmpeg Name | RFC 6381 String | Notes |
|---|---|---|---|
| H.264 / AVC | h264 | avc1.640028 | Most widely supported |
| H.265 / HEVC | hevc | hev1.1.6.L93.B0 | High efficiency |
| VP9 | vp9 | vp09.00.10.08 | Open, royalty-free |
| AV1 | av1 | av01.0.01M.08 | Next-gen open codec |
| AAC-LC | aac | mp4a.40.2 | Default audio for MP4 |
| MP3 | mp3 | mp4a.6B | Legacy lossy audio |
| Opus | opus | Opus | Open, low-latency audio |
| FLAC | flac | fLaC | Open lossless audio |
| PCM 16-bit LE | pcm_s16le | — | Uncompressed (WAV) |
The FFmpeg name column shows the value to use for VideoCodec or AudioCodec.
The RFC 6381 column shows the value for VideoCodecRFC6381 or AudioCodecRFC6381.
RFC 6381 strings vary by profile and level;
the values shown are representative examples.
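Because the video strings in the reference table depend on profile and level, they cannot be filled in from the FFmpeg name alone. Some audio entries are more stable. The sketch below suggests an AudioCodecRFC6381 value only where the table's string does not vary with profile in the common case, and returns None otherwise. It assumes the AAC profile name comes from ffprobe's `profile` field (reported as "LC" for AAC-LC); `audio_rfc6381` is a hypothetical name.

```python
from typing import Optional

# Codec strings from the reference table that do not vary with profile or level
# in the common case. Assumption: mp4a.6B covers MPEG-1 Layer III; MP3 files at
# MPEG-2 sample rates may be labelled mp4a.69 instead.
FIXED_AUDIO_RFC6381 = {"mp3": "mp4a.6B", "opus": "Opus", "flac": "fLaC"}


def audio_rfc6381(ffmpeg_name: str, profile: Optional[str] = None) -> Optional[str]:
    """Suggest an AudioCodecRFC6381 value, or None when it cannot be derived safely."""
    if ffmpeg_name == "aac":
        # mp4a.40.2 identifies AAC-LC only; other AAC profiles use other object types.
        return "mp4a.40.2" if profile == "LC" else None
    return FIXED_AUDIO_RFC6381.get(ffmpeg_name)
```

Returning None rather than guessing keeps the OPTIONAL field absent when the value is uncertain, which is preferable to recording a misleading string.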
Privacy Considerations
Media files — particularly audio and video recordings — may contain personally identifiable information (PII), including but not limited to:
- Voices and speech content
- Facial features and other physical characteristics
- Background environments that could identify locations
- Metadata embedded in file headers (for example, GPS coordinates, device identifiers)
Researchers MUST ensure that sharing of media files complies with the informed consent obtained from participants and with applicable privacy regulations. De-identification techniques (for example, voice distortion, face blurring, metadata stripping) SHOULD be applied where appropriate before data sharing.
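Metadata stripping is the simplest of these techniques to automate. As one narrow illustration, a PNG file stores free text, timestamps, and EXIF data (which can hold GPS coordinates or device identifiers) in separate ancillary chunks that can be dropped without re-encoding the image. The sketch below removes `tEXt`, `zTXt`, `iTXt`, `eXIf`, and `tIME` chunks; the chunk list is an assumption about what counts as identifying, `strip_png_metadata` is a hypothetical name, and JPEG, audio, and video files need format-specific tools.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunks that hold free text, timestamps, or EXIF data (which can include GPS tags).
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"eXIf", b"tIME"}


def strip_png_metadata(data: bytes) -> bytes:
    """Return a copy of a PNG file's bytes with metadata chunks removed."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    kept = [PNG_SIGNATURE]
    pos = 8
    while pos < len(data):
        # Each chunk is: 4-byte length, 4-byte type, <length> data bytes, 4-byte CRC.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length
        if end > len(data):
            raise ValueError("truncated PNG chunk")
        if chunk_type not in METADATA_CHUNKS:
            kept.append(data[pos:end])  # copied unchanged, so its CRC stays valid
        pos = end
    return b"".join(kept)
```

Stripping headers does not address identifying content in the pixels themselves, such as faces, which still calls for techniques like blurring.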
Example
A complete sidecar JSON file for an audio-video recording:
```json
{
"RecordingDuration": 312.5,
"VideoCodec": "h264",
"VideoCodecRFC6381": "avc1.640028",
"FrameRate": 30,
"Width": 1920,
"Height": 1080,
"AudioCodec": "aac",
"AudioCodecRFC6381": "mp4a.40.2",
"AudioSampleRate": 48000,
"AudioChannelCount": 2
}
```
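A sidecar like this one sits next to its media file with the same name and a `.json` extension. The sketch below writes such a file from a dictionary of fields; `write_sidecar` is a hypothetical helper, and the four-space indentation is a stylistic choice.

```python
import json
from pathlib import Path


def write_sidecar(media_path: str, fields: dict) -> Path:
    """Write a JSON sidecar next to a media file, replacing its extension with .json."""
    sidecar = Path(media_path).with_suffix(".json")
    sidecar.write_text(json.dumps(fields, indent=4) + "\n", encoding="utf-8")
    return sidecar
```

For example, `write_sidecar("sub-01_task-rest_audiovideo.mp4", fields)` creates `sub-01_task-rest_audiovideo.json` in the same directory.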