Media Files

Introduction

Several BIDS datatypes make use of media files — audio recordings, video recordings, combined audio-video recordings, and still images. This appendix defines the common file formats, metadata conventions, and codec identification schemes shared across all datatypes that use media files.

The following media suffixes are defined:

| Name | Suffix | Description |
|---|---|---|
| Audio file | audio | An audio data file containing one or more audio streams. Common formats include WAV (uncompressed), MP3, AAC, and Ogg Vorbis. |
| Audio-video file | audiovideo | A media file containing both audio and video streams. Common containers include MP4, MKV, AVI, and WebM. |
| Image file | image | A still image data file. Common formats include JPEG, PNG, SVG, WebP, and TIFF. |
| Video file | video | A video data file containing one or more video streams but no audio. Common containers include MP4, MKV, AVI, and WebM. |

Datatypes that incorporate media files (for example, behavioral recordings or stimuli) define their own file-naming rules, directory placement, and datatype-specific metadata. The conventions described here apply uniformly to all such datatypes.

Relationship to the photo suffix

The media file definitions introduced here generalize how media content is handled in BIDS. The existing photo suffix (used for photographs of anatomical landmarks, head localization coils, and tissue samples) predates this framework and covers a narrower use case: still images in specific electrophysiology and microscopy datatypes.

The media suffixes (audio, video, audiovideo, image) are intended as the general-purpose mechanism for all media content in BIDS. The media file framework should be generally adopted for new datatypes, and a future proposal may deprecate the photo suffix in favor of the broader image suffix with appropriate migration tooling (see bids-utils).

Supported Formats

Audio formats

| Format | Extension | Description |
|---|---|---|
| Waveform Audio | .wav | A Waveform Audio File Format audio file, typically containing uncompressed PCM audio. |
| Free Lossless Audio Codec | .flac | A FLAC lossless audio file. |
| MP3 Audio | .mp3 | An MP3 audio file. |
| Advanced Audio Coding | .aac | An Advanced Audio Coding audio file. |
| Ogg Vorbis | .ogg | An Ogg audio file, typically containing Vorbis-encoded audio. |

Video container formats

| Format | Extension | Description |
|---|---|---|
| MPEG-4 Part 14 | .mp4 | An MPEG-4 Part 14 media container file. |
| Audio Video Interleave | .avi | An Audio Video Interleave media container file. |
| Matroska Video | .mkv | A Matroska media container file. |
| WebM | .webm | A WebM media container file, typically containing VP8/VP9 video and Vorbis/Opus audio. |

Image formats

| Format | Extension | Description |
|---|---|---|
| Joint Photographic Experts Group Format | .jpg | A JPEG image file. |
| Portable Network Graphics | .png | A Portable Network Graphics file. |
| Scalable Vector Graphics | .svg | A Scalable Vector Graphics image file. |
| WebP Image | .webp | A WebP image file. |
| Tag Image File Format | .tif | A Tag Image File Format file. |
| Tag Image File Format | .tiff | A Tag Image File Format image file. The .tiff extension is the long form of .tif. |

When choosing a format, weigh file size, data fidelity, openness, and how prevalent the format is in the domain of application. Uncompressed or lossless formats (WAV, FLAC, PNG, TIFF) preserve full quality but produce larger files. Lossy formats (MP3, AAC, JPEG) significantly reduce file size at the cost of some data loss.
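
As an illustration of how a tool might use the tables above, the following Python sketch checks whether a file's extension belongs to the format family of its suffix. The pairing of suffixes with format families (audio formats for audio, container formats for video and audiovideo, image formats for image) is an assumption made for this example, as are the function name and the sample filenames; the datatype that uses the media file defines the actual naming rules.

```python
from pathlib import PurePosixPath

# Extensions copied from the format tables in this section.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".aac", ".ogg"}
CONTAINER_EXTENSIONS = {".mp4", ".avi", ".mkv", ".webm"}
IMAGE_EXTENSIONS = {".jpg", ".png", ".svg", ".webp", ".tif", ".tiff"}

# Assumption for this sketch: each suffix accepts one format family.
ALLOWED_EXTENSIONS = {
    "audio": AUDIO_EXTENSIONS,
    "video": CONTAINER_EXTENSIONS,
    "audiovideo": CONTAINER_EXTENSIONS,
    "image": IMAGE_EXTENSIONS,
}


def extension_matches_suffix(filename: str) -> bool:
    """Return True if the extension is listed for the file's media suffix."""
    path = PurePosixPath(filename)
    # The suffix is the last underscore-separated part of the name
    # before the extension, for example "audio" in "sub-01_audio.wav".
    suffix = path.stem.rsplit("_", 1)[-1]
    allowed = ALLOWED_EXTENSIONS.get(suffix)
    return allowed is not None and path.suffix in allowed


# Hypothetical filenames, used only to exercise the function.
print(extension_matches_suffix("sub-01_task-rest_audio.wav"))
print(extension_matches_suffix("sub-01_task-rest_audio.mp4"))
```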

Media Stream Metadata

Media files SHOULD be accompanied by a JSON sidecar file containing technical metadata about the media streams. The following metadata fields are defined for media files.

Duration

Applies to suffixes: audio, video, audiovideo.

| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| RecordingDuration | RECOMMENDED | number | Length of the recording in seconds (for example, 3600). |

RecordingDuration reuses the existing BIDS metadata field already defined for electrophysiology recordings (EEG, iEEG, MEG, and others).

Audio stream properties

Applies to suffixes: audio, audiovideo.

| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| AudioCodec | RECOMMENDED | string | The audio codec used to encode the audio stream, expressed as an FFmpeg codec name (for example, "aac", "mp3", "opus", "flac", "pcm_s16le"). This value can be auto-extracted using ffprobe -v quiet -print_format json -show_streams. |
| AudioSampleRate | RECOMMENDED | number | Sampling frequency of the audio stream, in Hz (for example, 44100, 48000, 96000). Must be a number greater than 0. |
| AudioChannelCount | RECOMMENDED | integer | Number of audio channels in the audio or audio-video file (for example, 1 for mono, 2 for stereo). Must be a number greater than or equal to 1. |
| AudioBitDepth | OPTIONAL | integer | Number of bits per sample in the audio stream (for example, 16, 24, or 32). Typically reported for uncompressed or losslessly compressed audio. Must be a number greater than or equal to 1. |
| AudioCodecRFC6381 | OPTIONAL | string | The audio codec expressed as an RFC 6381 codec string (for example, "mp4a.40.2" for AAC-LC). This representation is useful for web and broadcast interoperability. |

Note: AudioSampleRate is used instead of the existing SamplingFrequency field because audio-video files require distinguishing the audio sampling rate from the video frame rate. The Audio prefix makes this unambiguous in multi-stream containers.
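
The audio fields above map closely onto what ffprobe reports per stream. The sketch below assumes the JSON produced by ffprobe -show_streams has already been parsed into a dictionary and picks the first audio stream; the function name is hypothetical, and taking the first audio stream is a simplification for this example rather than a rule stated in this appendix.

```python
def audio_sidecar_fields(probe: dict) -> dict:
    """Build audio sidecar fields from parsed `ffprobe -show_streams` JSON."""
    for stream in probe.get("streams", []):
        if stream.get("codec_type") != "audio":
            continue
        fields = {
            "AudioCodec": stream["codec_name"],
            # ffprobe reports sample_rate as a string such as "48000".
            "AudioSampleRate": int(stream["sample_rate"]),
            "AudioChannelCount": int(stream["channels"]),
        }
        # bits_per_sample is typically 0 for compressed codecs, so
        # AudioBitDepth (OPTIONAL) is only written when it is meaningful.
        bit_depth = int(stream.get("bits_per_sample", 0))
        if bit_depth >= 1:
            fields["AudioBitDepth"] = bit_depth
        return fields
    return {}  # no audio stream found


# Hand-written stand-in for ffprobe output, not a real recording.
sample_probe = {
    "streams": [
        {"codec_type": "video", "codec_name": "h264"},
        {
            "codec_type": "audio",
            "codec_name": "aac",
            "sample_rate": "48000",
            "channels": 2,
            "bits_per_sample": 0,
        },
    ]
}
print(audio_sidecar_fields(sample_probe))
```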

Image properties

Applies to suffixes: video, audiovideo, image.

| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| ImageWidth | RECOMMENDED | integer | Width of the video frame or image, in pixels. Corresponds to the number of columns in the stored pixel grid as captured, without applying any orientation correction that may be reported by container metadata (for example, the EXIF Orientation tag). Must be a number greater than or equal to 1. |
| ImageHeight | RECOMMENDED | integer | Height of the video frame or image, in pixels. Corresponds to the number of rows in the stored pixel grid as captured, without applying any orientation correction that may be reported by container metadata (for example, the EXIF Orientation tag). Must be a number greater than or equal to 1. |
| ImagePixelFormat | OPTIONAL | string | The pixel format of the video frame or image, as reported by FFmpeg's pix_fmt field (for example, "yuv420p", "yuv420p10le", "gray16le", "rgb24"). A single pix_fmt value encodes the color model, channel count, chroma subsampling, and bit depth, and can be extracted automatically with ffprobe. |
| ImageBitDepth | OPTIONAL | integer | Bit depth per channel of the stored pixel data of the video frame or image (for example, 8, 10, 12, 16). For multi-channel data this is the depth of each individual channel. When ImagePixelFormat is also provided, this field is redundant with the bit depth encoded in the FFmpeg pix_fmt value (for example, yuv420p10le -> 10) and the two MUST agree. ImageBitDepth is nonetheless useful as a more directly discoverable summary, and as the primary precision field for image-only sidecars whose producing tools do not naturally surface pix_fmt. Must be a number greater than or equal to 1. |
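
The requirement that ImageBitDepth and ImagePixelFormat agree can be checked mechanically. The sketch below uses a small lookup table covering only the four pix_fmt examples named above; both the table and the function name are illustrative assumptions, and a real tool would need a fuller mapping of pixel formats.

```python
from typing import Optional

# Per-channel bit depth for the pix_fmt examples given in this section only.
PIX_FMT_BIT_DEPTH = {
    "yuv420p": 8,
    "yuv420p10le": 10,
    "gray16le": 16,
    "rgb24": 8,  # 24 bits per pixel spread over 3 channels
}


def bit_depth_agrees(sidecar: dict) -> Optional[bool]:
    """Compare ImageBitDepth with the depth implied by ImagePixelFormat.

    Returns True or False when both fields are present and the pixel
    format is known to this sketch, and None when no comparison is possible.
    """
    pix_fmt = sidecar.get("ImagePixelFormat")
    bit_depth = sidecar.get("ImageBitDepth")
    if pix_fmt is None or bit_depth is None:
        return None
    expected = PIX_FMT_BIT_DEPTH.get(pix_fmt)
    if expected is None:
        return None
    return expected == bit_depth


print(bit_depth_agrees({"ImagePixelFormat": "yuv420p10le", "ImageBitDepth": 10}))
print(bit_depth_agrees({"ImagePixelFormat": "yuv420p10le", "ImageBitDepth": 8}))
```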

Video stream properties

Applies to suffixes: video, audiovideo.

| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| VideoCodec | RECOMMENDED | string | The video codec used to encode the video stream, expressed as an FFmpeg codec name (for example, "h264", "hevc", "vp9", "av1"). This value can be auto-extracted using ffprobe -v quiet -print_format json -show_streams. |
| VideoFrameRate | RECOMMENDED | number | The video frame rate of the video stream, in Hz (for example, 24, 25, 29.97, 30, 60). For variable frame rate (VFR) videos, this value should be the nominal frame rate. Must be a number greater than 0. |
| VideoFrameCount | RECOMMENDED | integer | Total number of frames in the video stream. For constant frame rate video this can be derived from VideoFrameRate and RecordingDuration, but for variable frame rate (VFR) video the derivation is undefined, so an explicit value is needed. Also useful as an integrity check to detect truncated or corrupted files. Must be a number greater than or equal to 1. |
| VideoCodecRFC6381 | OPTIONAL | string | The video codec expressed as an RFC 6381 codec string (for example, "avc1.640028" for H.264 High Profile Level 4.0). This representation is useful for web and broadcast interoperability. |
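
For constant frame rate video, the relationship between VideoFrameRate, RecordingDuration, and VideoFrameCount is a simple product. The following sketch computes the implied frame count with exact rational arithmetic, which avoids rounding surprises for rates such as 30000/1001; the function name is hypothetical, and the calculation does not apply to variable frame rate video.

```python
from fractions import Fraction


def expected_frame_count(frame_rate: str, duration_s: float) -> int:
    """Frame count implied by a constant frame rate and a duration in seconds.

    frame_rate may be written as "30", "29.97", or in the "num/den" form
    that ffprobe uses (for example "30000/1001").
    """
    rate = Fraction(frame_rate)
    # Going through str() keeps the decimal value as written (312.5 -> 625/2).
    duration = Fraction(str(duration_s))
    return round(rate * duration)


# 30 frames per second for 312.5 seconds.
print(expected_frame_count("30", 312.5))  # 9375
```

Comparing this value with the frame count actually stored in the file is one way to implement the integrity check mentioned for VideoFrameCount.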

Codec Identification

Codec identification uses two complementary naming systems:

FFmpeg codec names (RECOMMENDED)

The AudioCodec and VideoCodec fields use FFmpeg codec names as the RECOMMENDED convention. These names are the de facto standard in scientific computing and can be auto-extracted from media files using:

ffprobe -v quiet -print_format json -show_streams <file>
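
The same command can be driven from a script. The sketch below wraps it with Python's subprocess module and parses the JSON output; the function names are hypothetical, and it assumes ffprobe is installed and available on the PATH.

```python
import json
import subprocess


def ffprobe_command(path: str) -> list:
    """Return the ffprobe invocation shown above as an argument list."""
    return ["ffprobe", "-v", "quiet", "-print_format", "json",
            "-show_streams", path]


def probe_streams(path: str) -> dict:
    """Run ffprobe on a media file and return its parsed JSON output."""
    result = subprocess.run(
        ffprobe_command(path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ffprobe fails
    )
    return json.loads(result.stdout)
```

Each entry of the returned "streams" list carries a codec_name key, which holds the FFmpeg codec name to record in AudioCodec or VideoCodec.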

RFC 6381 codec strings (OPTIONAL)

The AudioCodecRFC6381 and VideoCodecRFC6381 fields use RFC 6381 codec strings. These provide precise codec profile and level information useful for web and broadcast interoperability.

Common codec reference

| Codec | FFmpeg Name | RFC 6381 String | Notes |
|---|---|---|---|
| H.264 / AVC | h264 | avc1.640028 | Most widely supported |
| H.265 / HEVC | hevc | hev1.1.6.L93.B0 | High efficiency |
| VP9 | vp9 | vp09.00.10.08 | Open, royalty-free |
| AV1 | av1 | av01.0.01M.08 | Next-gen open codec |
| AAC-LC | aac | mp4a.40.2 | Default audio for MP4 |
| MP3 | mp3 | mp4a.6B | Legacy lossy audio |
| Opus | opus | Opus | Open, low-latency audio |
| FLAC | flac | fLaC | Open lossless audio |
| PCM 16-bit LE | pcm_s16le | | Uncompressed (WAV) |

The FFmpeg name column shows the value to use for VideoCodec or AudioCodec. The RFC 6381 column shows the value for VideoCodecRFC6381 or AudioCodecRFC6381. RFC 6381 strings vary by profile and level; the values shown are representative examples.
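
To show how profile and level are packed into an RFC 6381 string, the sketch below decodes the H.264 form avc1.PPCCLL, where PP, CC, and LL are the hexadecimal profile indicator, constraint flags, and level indicator. The function name is hypothetical, the profile table lists only a few common profiles, and other codecs (HEVC, VP9, AV1) use different layouts that this example does not handle.

```python
# A few common H.264 profile_idc values; not an exhaustive list.
PROFILE_NAMES = {66: "Baseline", 77: "Main", 88: "Extended", 100: "High"}


def parse_avc1(codec: str) -> dict:
    """Decode an 'avc1.PPCCLL' codec string into profile and level."""
    prefix, _, hex_part = codec.partition(".")
    if prefix not in ("avc1", "avc3") or len(hex_part) != 6:
        raise ValueError(f"not an AVC codec string: {codec!r}")
    profile_idc = int(hex_part[0:2], 16)
    constraint_flags = int(hex_part[2:4], 16)
    level_idc = int(hex_part[4:6], 16)
    return {
        "profile_idc": profile_idc,
        "profile": PROFILE_NAMES.get(profile_idc, "unknown"),
        "constraint_flags": constraint_flags,
        # level_idc is ten times the level number (40 -> Level 4.0).
        "level": level_idc / 10,
    }


print(parse_avc1("avc1.640028"))
```

For avc1.640028 this yields profile 100 (High) and level 4.0, matching the description given for VideoCodecRFC6381.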

Privacy Considerations

Media files — particularly audio and video recordings — may contain personally identifiable information (PII), including but not limited to:

  • Voices and speech content
  • Facial features and other physical characteristics
  • Background environments that could identify locations
  • Metadata embedded in file headers (for example, GPS coordinates, device identifiers)

Researchers MUST ensure that sharing of media files complies with the informed consent obtained from participants and with applicable privacy regulations. De-identification techniques (for example, voice distortion, face blurring, metadata stripping) SHOULD be applied where appropriate before data sharing.

Example

A complete sidecar JSON file for an audio-video recording:

{
    "RecordingDuration": 312.5,
    "VideoCodec": "h264",
    "VideoCodecRFC6381": "avc1.640028",
    "VideoFrameRate": 30,
    "VideoFrameCount": 9375,
    "ImageWidth": 1920,
    "ImageHeight": 1080,
    "ImagePixelFormat": "yuv420p",
    "ImageBitDepth": 8,
    "AudioCodec": "aac",
    "AudioCodecRFC6381": "mp4a.40.2",
    "AudioSampleRate": 48000,
    "AudioChannelCount": 2
}
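
A sidecar like the one above can be checked against the numeric constraints listed in the metadata tables. The following sketch is one possible way to do so; the rule table and function name are assumptions made for this example and cover only the stated type and minimum-value constraints, not the full set of rules.

```python
# (key, accepted Python types, minimum, whether the minimum is exclusive)
NUMERIC_RULES = [
    ("AudioSampleRate", (int, float), 0, True),
    ("AudioChannelCount", (int,), 1, False),
    ("AudioBitDepth", (int,), 1, False),
    ("ImageWidth", (int,), 1, False),
    ("ImageHeight", (int,), 1, False),
    ("ImageBitDepth", (int,), 1, False),
    ("VideoFrameRate", (int, float), 0, True),
    ("VideoFrameCount", (int,), 1, False),
]


def check_sidecar(sidecar: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key, types, minimum, exclusive in NUMERIC_RULES:
        if key not in sidecar:
            continue  # all of these fields are RECOMMENDED or OPTIONAL
        value = sidecar[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            problems.append(f"{key}: unexpected data type")
        elif (value <= minimum) if exclusive else (value < minimum):
            bound = "greater than" if exclusive else "at least"
            problems.append(f"{key}: must be {bound} {minimum}")
    return problems


print(check_sidecar({"VideoFrameRate": 30, "ImageWidth": 1920}))
print(check_sidecar({"VideoFrameRate": 0, "ImageWidth": 1920}))
```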