RAWcooked

Data is never RAW.
Data is always cooked in one way or another.

A current reality of many archives

Many archives preserve their single-image-based files as TIFF or DPX, grouped by entire film or single reel and stored inside folders or in TAR or ZIP archives. Yet a single file format suitable for the preservation of both film and video has been a dream of many audio-visual archives, especially smaller ones, since the beginning of digital moving images. We present RAWcooked, the missing piece of software that implements exactly this capability in a free and open-source environment.

Additional context

The missing piece of software

The solution we propose is RAWcooked, a piece of software that encodes from and decodes to various so-called «raw» audio-visual file formats in a completely transparent way. The image content and/or sound content, including all the related metadata and sidecar files, is fully preserved bit-by-bit.

Encoding RAW into a Matroska file

RAWcooked encodes RAW audio-visual data into the Matroska container (MKV), using the FFV1 video codec for the image and the FLAC audio codec for the sound. The metadata accompanying the RAW data are of course preserved, and sidecar files, such as MD5, LUT or XML, can be added to the Matroska container as attachments. This makes it possible to manage these audio-visual file formats in an effective and transparent way, while typically saving between one and two thirds of the storage needed and speeding up the back-up to LTO cartridges.
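
To put the storage saving into numbers, here is a rough back-of-the-envelope sketch. The frame size (2048 × 1556 pixels), the frame rate (24 fps), the 90-minute duration and the packing of three 10-bit samples into one 32-bit word are illustrative assumptions, not figures from this project. File headers and padding are ignored.

```python
from fractions import Fraction


def dpx_10bit_frame_bytes(width: int, height: int) -> int:
    """Image payload of one RGB 10-bit frame.

    Assumption: the three 10-bit samples of a pixel are packed into
    one 32-bit word, i.e. 4 bytes per pixel.
    """
    return width * height * 4


def sequence_bytes(width: int, height: int, fps: int, minutes: int) -> int:
    """Payload of a whole image sequence of the given duration."""
    return fps * 60 * minutes * dpx_10bit_frame_bytes(width, height)


def size_after_saving(raw_bytes: int, saved: Fraction) -> int:
    """Size that remains once the given fraction of storage is saved."""
    return int(raw_bytes * (1 - saved))


# Assumed example: a 90-minute film scanned at 2048 x 1556, 24 fps.
raw = sequence_bytes(2048, 1556, fps=24, minutes=90)
print(f"uncompressed: {raw / 1e12:.2f} TB")

# The two ends of the "one to two thirds" range quoted above.
for saved in (Fraction(1, 3), Fraction(2, 3)):
    remaining = size_after_saving(raw, saved)
    print(f"saving {saved}: {remaining / 1e12:.2f} TB remain")
```

The actual saving depends on the image content (grain, noise, resolution), so the two fractions only bracket the range quoted above.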

The following image formats are currently planned for the first release:

  • TIFF, RGB 16-bit per colour channel (rgb48)
  • DPX, RGB 16-bit per colour channel (rgb48)
  • DPX, RGB 12-bit per colour channel
  • DPX, RGB 10-bit per colour channel

The following sound formats are currently planned for the first release:

  • WAVE, PCM signed 24-bit (s24), any number of audio channels
  • WAVE, PCM signed 16-bit (s16), any number of audio channels
  • BWF, PCM signed 24-bit (s24), any number of audio channels
  • BWF, PCM signed 16-bit (s16), any number of audio channels
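
As an illustration of the sound flavours listed above, the following sketch checks whether a WAVE file uses 16-bit or 24-bit PCM. It relies only on Python's standard `wave` module. The helper name `is_planned_wave` is hypothetical and is not part of RAWcooked, and the check says nothing about how RAWcooked itself inspects files.

```python
import wave

# Sample widths in bytes of the planned PCM flavours: 16-bit and 24-bit.
PLANNED_SAMPLE_WIDTHS = {2, 3}


def is_planned_wave(path: str) -> bool:
    """Return True if the file is PCM WAVE with 16 or 24 bits per sample.

    The number of channels is deliberately not checked, because any
    number of audio channels is planned.
    """
    try:
        with wave.open(path, "rb") as wav:
            return wav.getsampwidth() in PLANNED_SAMPLE_WIDTHS
    except (wave.Error, EOFError):
        # Not a WAVE file, or a flavour the standard-library reader rejects.
        return False
```

A 32-bit PCM file would return False here, which matches its place among the formats foreseen for a later version.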

For instance, an unplayable ZIP archive could be replaced by a playable file from which the original package content can be fully restored.

The first release of RAWcooked is designed and programmed so that support for one or more of the following formats can easily be added in a later version:

  • Bayer-based formats (e.g. bayer_rggb16)
  • OpenEXR or generally HDR
  • RGB 24-bit per colour channel (rgb72)
  • BWF and WAVE, PCM signed 32-bit (s32), any number of audio channels

Of course, other developments can be sponsored by any interested body or individual (e.g. other file formats or flavours). Please feel free to contact us, support this project and get involved!

Decoding the Matroska file back to the original RAW

Whenever needed, RAWcooked decodes the encoded Matroska file back to the original RAW files, including all the original metadata and sidecar files. It is important to stress that the encoded files can be decoded and that the resulting files are bit-by-bit identical to the original ones. Not only is the image and/or sound content fully preserved, but also all the enclosed metadata and all the file’s characteristics. Therefore, an encoded and decoded RAW file cannot be distinguished from its original.
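
The bit-by-bit claim can be checked independently of RAWcooked by comparing checksums of the original folder and the decoded folder. The following is a minimal sketch under that assumption. The function names and the choice of SHA-256 are illustrative, not something this project prescribes.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of one file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to `root` to its checksum."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def is_bit_identical(original: Path, decoded: Path) -> bool:
    """True if both trees hold the same file names with the same content."""
    return tree_digests(original) == tree_digests(decoded)
```

Because file names are part of the comparison, a missing, extra or renamed sidecar file is reported as a mismatch just like a changed image.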

A step-by-step journey

Philosophy

In the genuine spirit of sharing, the principles governing this project are:

  • open-source code freely available on GitHub
  • cross-platform: runs on Linux, macOS and Windows
  • installers for various OS provided by MediaArea.net
  • programmed in C

The software is released under the MIT licence and the documentation under the CC BY licence. The copyright is held jointly by AV Preservation by reto.ch, Switzerland, and MediaArea.net, France. This way, any interested body or individual can not only use and distribute RAWcooked free of charge, but also correct, modify and improve it.

Deliverables

A command-line interface (CLI) for encoding and decoding single files, as well as for batch processing the encoding and decoding of multiple files. It consists of:

  • rawcooked, the CLI command to encode and decode files, and
  • the related man page, a short in-line user manual
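
Batch processing could be scripted around the command, for example with one call per reel folder. The sketch below assumes that `rawcooked` is on the PATH and accepts the path to process as its only argument. This page does not specify the invocation, so check the man page before relying on it. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(parent: Path) -> list[list[str]]:
    """One command per immediate sub-folder of `parent`, e.g. one per reel.

    Assumption: the CLI is called as `rawcooked <path>`.
    """
    return [
        ["rawcooked", str(folder)]
        for folder in sorted(parent.iterdir())
        if folder.is_dir()
    ]


def run_batch(parent: Path, dry_run: bool = True) -> None:
    """Print the commands, or execute them one after the other."""
    for command in build_commands(parent):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the batch at the first failing call.
            subprocess.run(command, check=True)
```

The dry run is the default so that the list of folders can be reviewed before anything is processed.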

In addition to the CLI, the Matroska files which have been encoded by RAWcooked can be used with a wide range of popular software supporting Matroska/FFV1 and FLAC playback, including:

  • VLC media player for playback
  • FFmpeg tools can transcode, play back and probe those files
  • QCTools can perform in-depth analysis
  • MediaInfo can display technical information
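
As an example of probing with the FFmpeg tools, the following sketch asks `ffprobe` for the codec of each stream and checks that video is FFV1 and audio is FLAC. It assumes `ffprobe` is installed and on the PATH. The helper names are hypothetical, and attachments or other stream types are simply ignored.

```python
import json
import subprocess


def av_codecs(ffprobe_json: str) -> list[tuple[str, str]]:
    """Extract (codec_type, codec_name) for video and audio streams only."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [
        (s.get("codec_type"), s.get("codec_name"))
        for s in streams
        if s.get("codec_type") in ("video", "audio")
    ]


def is_ffv1_flac(pairs: list[tuple[str, str]]) -> bool:
    """True if there is at least one stream, all video FFV1, all audio FLAC."""
    expected = {"video": "ffv1", "audio": "flac"}
    return bool(pairs) and all(expected[kind] == name for kind, name in pairs)


def probe(path: str) -> list[tuple[str, str]]:
    """Run ffprobe on a file and return its video and audio codecs."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "stream=codec_type,codec_name",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return av_codecs(result.stdout)
```

Calling `is_ffv1_flac(probe("film.mkv"))` on a file, where `film.mkv` stands for any encoded Matroska file, then gives a quick yes or no before the file enters a cataloguing or access workflow.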

This way, an archive with limited resources can process the files encoded by RAWcooked on any current computer (e.g. for cataloguing or access purposes).

Presentations

  • Jérôme Martinez: RAWcooked, «No Time to Wait! Open Media, Open Formats, Open Archives» at Österreichisches Filmmuseum in Vienna, Austria, on 9–10 November 2017

Schedule

November 2017: Alpha Release
An alpha release is presented as proof of concept during the AMIA Conference (29 November – 2 December 2017 in New Orleans, LA).
February 2018: Beta Release
Beta releases are available for testing purposes by the community.
April 2018: First Release
The first release is officially launched during the FIAF conference (22–27 April 2018 in Prague, Czechia).

Funding plan

MediaArea.net implements the basic module at an anticipated cost of EUR 10 000, which is guaranteed by AV Preservation by reto.ch.

The archival community and the industry are invited to buy the flavours (endianness, padding, packing, strips, etc.) of the formats they actually need. An example file of each flavour of a format will be tested by MediaArea.net and the exact specification implemented into the software. This way the contributing bodies and individuals can be sure that their files can be processed by the software. The price is EUR 1 000 for the first flavour of a format and EUR 500 for each subsequent one.
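
The pricing rule can be written down as a small calculation. This sketch assumes that flavours are counted separately for each format, which is how the sentence above reads. The function name is hypothetical.

```python
FIRST_FLAVOUR_EUR = 1000
FOLLOWING_FLAVOUR_EUR = 500


def format_cost_eur(flavours: int) -> int:
    """Price in EUR for sponsoring `flavours` flavours of a single format."""
    if flavours < 0:
        raise ValueError("the number of flavours cannot be negative")
    if flavours == 0:
        return 0
    # The first flavour costs more, each further one adds a fixed amount.
    return FIRST_FLAVOUR_EUR + FOLLOWING_FLAVOUR_EUR * (flavours - 1)


# Example: three DPX flavours and one TIFF flavour.
total = format_cost_eur(3) + format_cost_eur(1)
print(f"EUR {total}")
```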


2017–11–11