Media Files

Introduction

Several BIDS datatypes make use of media files — audio recordings, video recordings, combined audio-video recordings, and still images. This appendix defines the common file formats, metadata conventions, and codec identification schemes shared across all datatypes that use media files.

The following media suffixes are defined:

Name suffix Description
Audio file audio An audio data file containing one or more audio streams. Common formats include WAV (uncompressed), MP3, AAC, and Ogg Vorbis.
Audio-video file audiovideo A media file containing both audio and video streams. Common containers include MP4, MKV, AVI, and WebM.
Image file image A still image data file. Common formats include JPEG, PNG, SVG, WebP, and TIFF.
Video file video A video data file containing one or more video streams but no audio. Common containers include MP4, MKV, AVI, and WebM.

Datatypes that incorporate media files (for example, behavioral recordings or stimuli) define their own file-naming rules, directory placement, and datatype-specific metadata. The conventions described here apply uniformly to all such datatypes.

Relationship to the photo suffix

The media file definitions introduced here generalize how media content is represented in BIDS. The existing photo suffix (used for photographs of anatomical landmarks, head localization coils, and tissue samples) predates this framework and covers a narrower use case: still images in specific electrophysiology and microscopy datatypes.

The media suffixes (audio, video, audiovideo, image) are intended as the general-purpose mechanism for all media content in BIDS. In practice, the information a "photo" conveys could equally be captured as a video of an experimental setup with verbal narration, an audio recording describing electrode placement, or a drawing rather than a photograph. New datatypes should therefore adopt the media file framework, and a future proposal may deprecate the photo suffix in favor of the broader image suffix with appropriate migration tooling (see bids-utils).

Supported Formats

Audio formats

Format Extension Description
Waveform Audio .wav A Waveform Audio File Format audio file, typically containing uncompressed PCM audio.
MP3 Audio .mp3 An MP3 audio file.
Advanced Audio Coding .aac An Advanced Audio Coding audio file.
Ogg Vorbis .ogg An Ogg audio file, typically containing Vorbis-encoded audio.

Video container formats

Format Extension Description
MPEG-4 Part 14 .mp4 An MPEG-4 Part 14 media container file.
Audio Video Interleave .avi An Audio Video Interleave media container file.
Matroska Video .mkv A Matroska media container file.
WebM .webm A WebM media container file, typically containing VP8/VP9 video and Vorbis/Opus audio.

Image formats

Format Extension Description
Joint Photographic Experts Group Format .jpg A JPEG image file.
Portable Network Graphics .png A Portable Network Graphics file.
Scalable Vector Graphics .svg A Scalable Vector Graphics image file.
WebP Image .webp A WebP image file.
Tag Image File Format .tif A Tag Image File Format image file.
Tag Image File Format .tiff A Tag Image File Format image file. The .tiff extension is the long form of .tif.

When choosing a format, consider the trade-off between file size and data fidelity. Uncompressed or lossless formats (WAV, PNG, TIFF) preserve full quality but produce larger files. Lossy formats (MP3, AAC, JPEG) significantly reduce file size at the cost of some data loss.
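As an illustration of the format tables above, the following minimal sketch maps a file extension to the table it appears in. The family labels ("audio", "container", "image"), the function name, and the example file names are assumptions made for this sketch and are not defined by this appendix. Matching is case-sensitive and covers only the extensions listed above.

```python
from pathlib import Path

# Extensions copied from the format tables above. The family labels are
# hypothetical names chosen for this sketch, not BIDS terms.
MEDIA_EXTENSIONS = {
    ".wav": "audio", ".mp3": "audio", ".aac": "audio", ".ogg": "audio",
    ".mp4": "container", ".avi": "container",
    ".mkv": "container", ".webm": "container",
    ".jpg": "image", ".png": "image", ".svg": "image",
    ".webp": "image", ".tif": "image", ".tiff": "image",
}


def media_family(filename):
    """Return the format family for a file name, or None if the extension is not listed."""
    return MEDIA_EXTENSIONS.get(Path(filename).suffix)
```

A container extension alone does not say whether a file holds video only or audio and video together, so a lookup like this cannot choose between the video and audiovideo suffixes; that requires inspecting the streams.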

Media Stream Metadata

Media files SHOULD be accompanied by a JSON sidecar file containing technical metadata about the media streams. The following metadata fields are defined for media files.

Duration

Applies to suffixes: audio, video, audiovideo.

Key name Requirement Level Data type Description
RecordingDuration RECOMMENDED number Length of the recording in seconds (for example, 3600).

RecordingDuration reuses the existing BIDS metadata field already defined for electrophysiology recordings (EEG, iEEG, MEG, and others).

Audio stream properties

Applies to suffixes: audio, audiovideo.

Key name Requirement Level Data type Description
AudioCodec RECOMMENDED string The audio codec used to encode the audio stream, expressed as an FFmpeg codec name (for example, "aac", "mp3", "opus", "flac", "pcm_s16le"). This value can be auto-extracted using ffprobe -v quiet -print_format json -show_streams.
AudioSampleRate RECOMMENDED number Sampling frequency of the audio stream, in Hz (for example, 44100, 48000, 96000). Must be a number greater than 0.
AudioChannelCount RECOMMENDED integer Number of audio channels in the audio or audio-video file (for example, 1 for mono, 2 for stereo). Must be a number greater than or equal to 1.
AudioCodecRFC6381 OPTIONAL string The audio codec expressed as an RFC 6381 codec string (for example, "mp4a.40.2" for AAC-LC). This representation is useful for web and broadcast interoperability.

Note: AudioSampleRate is used instead of the existing SamplingFrequency field because audio-video files require distinguishing the audio sampling rate from the video frame rate. The Audio prefix makes this unambiguous in multi-stream containers.

Visual properties

Applies to suffixes: video, audiovideo, image.

Key name Requirement Level Data type Description
Width RECOMMENDED integer Width of the video frame or image, in pixels. Must be a number greater than or equal to 1.
Height RECOMMENDED integer Height of the video frame or image, in pixels. Must be a number greater than or equal to 1.

Video stream properties

Applies to suffixes: video, audiovideo.

Key name Requirement Level Data type Description
VideoCodec RECOMMENDED string The video codec used to encode the video stream, expressed as an FFmpeg codec name (for example, "h264", "hevc", "vp9", "av1"). This value can be auto-extracted using ffprobe -v quiet -print_format json -show_streams.
FrameRate RECOMMENDED number The video frame rate of the video stream, in Hz (for example, 24, 25, 29.97, 30, 60). Must be a number greater than 0.
VideoCodecRFC6381 OPTIONAL string The video codec expressed as an RFC 6381 codec string (for example, "avc1.640028" for H.264 High Profile Level 4.0). This representation is useful for web and broadcast interoperability.
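
The data types and numeric constraints in the tables above can be checked mechanically. The following is a minimal sketch of such a check, not an official validator: the function name and the rule table layout are assumptions made for illustration, and only the constraints stated in this appendix are encoded.

```python
from numbers import Real

# (field, integer required, lower bound, bound is exclusive).
# A bound of None means only the data type is checked.
NUMERIC_RULES = [
    ("RecordingDuration", False, None, False),
    ("AudioSampleRate", False, 0, True),
    ("AudioChannelCount", True, 1, False),
    ("Width", True, 1, False),
    ("Height", True, 1, False),
    ("FrameRate", False, 0, True),
]
STRING_FIELDS = ("AudioCodec", "AudioCodecRFC6381",
                 "VideoCodec", "VideoCodecRFC6381")


def check_media_sidecar(sidecar):
    """Return a list of problems; an empty list means no violation was found."""
    problems = []
    for name, integer_only, bound, exclusive in NUMERIC_RULES:
        if name not in sidecar:
            continue  # every field is RECOMMENDED or OPTIONAL, so absence is allowed
        value = sidecar[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            problems.append(f"{name}: expected a number, got {type(value).__name__}")
        elif integer_only and not isinstance(value, int):
            problems.append(f"{name}: expected an integer, got {value!r}")
        elif bound is not None and (value <= bound if exclusive else value < bound):
            relation = "greater than" if exclusive else "greater than or equal to"
            problems.append(f"{name}: must be {relation} {bound}, got {value!r}")
    for name in STRING_FIELDS:
        if name in sidecar and not isinstance(sidecar[name], str):
            problems.append(f"{name}: expected a string")
    return problems
```

For example, `check_media_sidecar({"FrameRate": 0})` reports one problem because FrameRate must be greater than 0. The sketch does not check which fields apply to which suffix.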

Codec Identification

Codec identification uses two complementary naming systems:

FFmpeg codec names (RECOMMENDED)

The AudioCodec and VideoCodec fields use FFmpeg codec names as the RECOMMENDED convention. These names are the de facto standard in scientific computing and can be auto-extracted from media files using:

ffprobe -v quiet -print_format json -show_streams <file>
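
The JSON printed by the command above can be mapped onto the sidecar fields defined in this appendix. The sketch below is one possible way to do this and assumes ffprobe is installed and on the PATH. The function names are hypothetical, and only the first audio stream and the first video stream are considered. The stream keys it reads (codec_type, codec_name, sample_rate, channels, width, height, avg_frame_rate) are those ffprobe reports for each stream; ffprobe gives the frame rate as a fraction string such as "30000/1001".

```python
import json
import subprocess
from fractions import Fraction


def run_ffprobe(path):
    """Run the ffprobe command shown above and return its parsed JSON output."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def sidecar_from_probe(probe):
    """Map the first audio and first video stream to sidecar fields."""
    sidecar = {}
    for stream in probe.get("streams", []):
        kind = stream.get("codec_type")
        if kind == "audio" and "AudioCodec" not in sidecar:
            sidecar["AudioCodec"] = stream["codec_name"]
            # ffprobe reports the sample rate as a string, for example "48000".
            sidecar["AudioSampleRate"] = int(stream["sample_rate"])
            sidecar["AudioChannelCount"] = stream["channels"]
        elif kind == "video" and "VideoCodec" not in sidecar:
            sidecar["VideoCodec"] = stream["codec_name"]
            sidecar["Width"] = stream["width"]
            sidecar["Height"] = stream["height"]
            try:
                sidecar["FrameRate"] = float(Fraction(stream["avg_frame_rate"]))
            except (KeyError, ValueError, ZeroDivisionError):
                pass  # rate missing or reported as "0/0"; leave FrameRate unset
    return sidecar
```

Calling `sidecar_from_probe(run_ffprobe(path))` on a media file yields a dictionary that can be written out as the JSON sidecar. RecordingDuration and the RFC 6381 fields are not filled in by this sketch.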

RFC 6381 codec strings (OPTIONAL)

The AudioCodecRFC6381 and VideoCodecRFC6381 fields use RFC 6381 codec strings. These provide precise codec profile and level information useful for web and broadcast interoperability.
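
To show how an RFC 6381 string carries profile and level information, the following sketch decodes the H.264 form avc1.PPCCLL, in which the three hexadecimal byte pairs are the profile indicator, the constraint flags, and the level indicator (ten times the level number). The function name and the small profile-name table are assumptions made for illustration. Only the avc1 form is handled, and special cases such as level 1b are ignored.

```python
# A few common H.264 profile_idc values; this table is deliberately incomplete.
H264_PROFILE_NAMES = {66: "Baseline", 77: "Main", 100: "High"}


def parse_avc1(codec_string):
    """Decode an 'avc1.PPCCLL' string into profile, constraint flags, and level."""
    prefix, _, hex_part = codec_string.partition(".")
    if prefix != "avc1" or len(hex_part) != 6:
        raise ValueError(f"not an avc1.PPCCLL string: {codec_string!r}")
    profile_idc = int(hex_part[0:2], 16)
    constraint_flags = int(hex_part[2:4], 16)
    level_idc = int(hex_part[4:6], 16)
    return {
        "profile_idc": profile_idc,
        "profile_name": H264_PROFILE_NAMES.get(profile_idc, "unknown"),
        "constraint_flags": constraint_flags,
        "level": level_idc / 10,  # level_idc 40 corresponds to Level 4.0
    }
```

Applied to "avc1.640028", the sketch returns profile_idc 100 (High) and level 4.0, matching the "H.264 High Profile Level 4.0" example given for VideoCodecRFC6381.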

Common codec reference

Codec FFmpeg Name RFC 6381 String Notes
H.264 / AVC h264 avc1.640028 Most widely supported
H.265 / HEVC hevc hev1.1.6.L93.B0 High efficiency
VP9 vp9 vp09.00.10.08 Open, royalty-free
AV1 av1 av01.0.01M.08 Next-gen open codec
AAC-LC aac mp4a.40.2 Default audio for MP4
MP3 mp3 mp4a.6B Legacy lossy audio
Opus opus Opus Open, low-latency audio
FLAC flac fLaC Open lossless audio
PCM 16-bit LE pcm_s16le n/a Uncompressed (WAV)

The FFmpeg name column shows the value to use for VideoCodec or AudioCodec. The RFC 6381 column shows the value for VideoCodecRFC6381 or AudioCodecRFC6381. RFC 6381 strings vary by profile and level; the values shown are representative examples.
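
As a rough consistency check between the two naming systems, the sketch below compares the first dot-separated component of an RFC 6381 string with the one shown for the same codec in the table above. The function and dictionary names are hypothetical. Because the table lists representative examples only, other valid identifiers for the same codec (for example hvc1 for HEVC) are reported as mismatches by this simple check.

```python
# First dot-separated component of each RFC 6381 example in the table above,
# keyed by FFmpeg codec name. pcm_s16le has no RFC 6381 value in the table.
EXPECTED_PREFIX = {
    "h264": "avc1", "hevc": "hev1", "vp9": "vp09", "av1": "av01",
    "aac": "mp4a", "mp3": "mp4a", "opus": "Opus", "flac": "fLaC",
}


def codec_pair_matches_table(ffmpeg_name, rfc6381):
    """Return True or False for codecs in the table, None for unlisted codecs."""
    expected = EXPECTED_PREFIX.get(ffmpeg_name)
    if expected is None:
        return None
    return rfc6381.split(".", 1)[0] == expected
```

A result of False is a prompt to inspect the sidecar by hand and does not prove that the values are wrong.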

Privacy Considerations

Media files — particularly audio and video recordings — may contain personally identifiable information (PII), including but not limited to:

  • Voices and speech content
  • Facial features and other physical characteristics
  • Background environments that could identify locations
  • Metadata embedded in file headers (for example, GPS coordinates, device identifiers)

Researchers MUST ensure that sharing of media files complies with the informed consent obtained from participants and with applicable privacy regulations. De-identification techniques (for example, voice distortion, face blurring, metadata stripping) SHOULD be applied where appropriate before data sharing.

Example

A complete sidecar JSON file for an audio-video recording:

{
    "RecordingDuration": 312.5,
    "VideoCodec": "h264",
    "VideoCodecRFC6381": "avc1.640028",
    "FrameRate": 30,
    "Width": 1920,
    "Height": 1080,
    "AudioCodec": "aac",
    "AudioCodecRFC6381": "mp4a.40.2",
    "AudioSampleRate": 48000,
    "AudioChannelCount": 2
}
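
A sidecar like the one above can be written programmatically. The sketch below assumes the usual BIDS convention that a sidecar shares the data file's name with a .json extension. The function name and the file name in the usage note are hypothetical, since file naming is defined by each datatype and not by this appendix.

```python
import json
from pathlib import Path


def write_sidecar(media_path, metadata):
    """Write metadata as JSON next to the media file and return the sidecar path."""
    sidecar_path = Path(media_path).with_suffix(".json")
    # indent=4 matches the layout of the example above.
    sidecar_path.write_text(json.dumps(metadata, indent=4) + "\n", encoding="utf-8")
    return sidecar_path
```

For instance, `write_sidecar("sub-01_task-example_audiovideo.mp4", {"FrameRate": 30})` would create sub-01_task-example_audiovideo.json in the same directory.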