Datafiles format information.



Contents


Introduction

An Allegro datafile is a bit like a zip file in that it consists of lots of different pieces of data stuck together one after another, and optionally compressed. This means that your game doesn't have to clutter up the disk with hundreds of tiny files, and it makes programming easier because you can load everything with a single function call at program startup. Another benefit is that the LZSS file compression algorithm works much better with one large file than with many small ones.

Datafiles have the extension .dat, and can be created and edited with the graphical grabber program or the command line dat utility. They can be stored as separate files and loaded into memory by the load_datafile() function, or you can use dat2s to convert them into asm code which can then be linked directly into your executable.

Objects

Each datafile contains a number of objects, of varying types. Object types are represented by 32 bit integer ID's, which are interpreted as four character ASCII strings. These ID's can be constructed with the DAT_ID() macro, for example a DATA object is represented by DAT_ID('D','A','T','A'), or you can use the predefined DAT_* constants for the standard data types:

Properties

Each object can have any number of properties attached to it. These are ASCII strings describing attributes of the object, such as its name and where it came from. Like the objects themselves, properties are identified by 32 bit integer ID's which are constructed from four character strings by the DAT_ID() macro. Allegro defines the standard properties:

You can use whatever other ID's you like to store custom information about your objects (the grabber internally use some other properties stored in a hidden DAT_INFO object, so they won't conflict with yours).


File format specification

In case anyone wants to do some serious hackery, and for my own future reference, here are some details of the innards of the datafile format.

Note that this is different to the datafile format used by Allegro versions 2.1 and earlier. Allegro can still load files from the old format, but it was much less flexible and didn't support nice things like object properties, so you should load any old files into the grabber and save them out again to convert to the new format.

Nb. if all you want to do is write a utility that manipulates datafiles in some way, the easiest approach is probably to use the helper functions in datedit.c, which are currently shared by the dat, dat2s, and grabber programs. These functions handle loading, saving, inserting and deleting objects, and modifying the contents of datafiles in various ways, but life is too short for me to bother documenting them all here. Look at the source...

Anyway. All numbers are stored in big-endian (Motorola) format. All text is stored in UTF-8 encoding. A datafile begins with one of the 32 bit values F_PACK_MAGIC or F_NOPACK_MAGIC, which are defined in allegro.h. If it starts with F_PACK_MAGIC the rest of the file is compressed with the LZSS algorithm, otherwise it is uncompressed. This magic number and optional decompression can be handled automatically by using the packfile functions and opening the file in F_READ_PACKED mode. After this comes the 32 bit value DAT_MAGIC, followed by the number of objects in the root datafile (not including objects nested inside child datafiles), followed by each of those objects in turn.

Each object is in the format:

   OBJECT =
      var    - <property list>      - any properties relating to the object
      32 bit - <type ID>            - object type ID
      32 bit - <compressed size>    - size of the raw data in the file
      32 bit - <uncompressed size>  - see below
      var    - <data>               - the contents of the object

The property list can contain zero or more object properties, in the form:

   PROPERTY =
      32 bit - <magic>              - "prop"
      32 bit - <type ID>            - property type ID
      32 bit - <size>               - size of the property string, in bytes
      var    - <data>               - property string, _not_ null-terminated

If the uncompressed size field in an object is positive, the contents of the object are not compressed (ie. the raw and compressed sizes should be the same). If the uncompressed size is negative, the object is LZSS compressed, and will expand into -<uncompressed size> bytes of data. The easiest way to handle this is to use the pack_fopen_chunk() function to read both the raw and compressed sizes and the contents of the object.

The contents of an object vary depending on the type. Allegro defines the standard types:

   DAT_FILE =
      32 bit - <object count>       - number of objects in the sub-file
      var    - <object list>        - objects in the same format as above

   DAT_FONT =
      16 bit - <font size>          - 8, 16, -1, or 0

      if font size == 8 {           - obsolete as of version 3.9.x!
         unsigned char[95][8]       - 8x8 bit-packed font data
      }

      if font size == 16 {          - obsolete as of version 3.9.x!
         unsigned char[95][16]      - 8x16 bit-packed font data
      }

      if font size == -1 {          - obsolete as of version 3.9.x!
         95x {
            16 bit - <width>        - character width
            16 bit - <height>       - character height
            var    - <data>         - character data (8 bit pixels)
         }
      }

      if font size == 0 {           - new format introduced in version 3.9.x
         16 bit - <ranges>          - number of character ranges
         for each range {
            8 bit  - <mono>         - 1 or 8 bit format flag
            32 bit - <start>        - first character in range
            32 bit - <end>          - last character in range (inclusive)
            for each character {
               16 bit - <width>     - character width
               16 bit - <height>    - character height
               var    - <data>      - character data
            }
         } 
      }

   DAT_SAMP =
      16 bit - <bits>               - sample bits (negative for stereo)
      16 bit - <freq>               - sample frequency
      32 bit - <length>             - sample length
      var    - <data>               - sample data

   DAT_MIDI =
      16 bit - <divisions>          - MIDI beat divisions
      32x {
         32 bit - <length>          - track length, in bytes
         var    - <data>            - MIDI track data
      }

   DAT_FLI =
      var - <data>                  - FLI or FLC animation, standard format

   DAT_BITMAP =
   DAT_C_SPRITE =
   DAT_XC_SPRITE =
      16 bit - <bits>               - bitmap color depth
      16 bit - <width>              - bitmap width
      16 bit - <height>             - bitmap height
      var    - <data>               - bitmap data

      Valid color depths are 8, 15, 16, 24, 32, and -32. Both 15 and 16 bit 
      images are stored in 5.6.5 RGB format, and 24 and 32 bit images as 
      8.8.8 RGB. The special -32 flag indicates that the data is in true 32 
      bit RGBA format.

   DAT_RLE_SPRITE =
      16 bit - <bits>               - sprite color depth
      16 bit - <width>              - sprite width
      16 bit - <height>             - sprite height
      32 bit - <size>               - data size, in bytes
      var    - <data>               - RLE compressed sprite data

      Valid color depths are 8, 15, 16, 24, 32. and -32. Both 15 and 16 bit 
      images are stored in 5.6.5 RGB format with 16 bit skip counts and EOL 
      markers, and 24 and 32 bit images as 8.8.8 RGB. with 32 bit skip 
      counts and markers. The special -32 flag indicates that the data is in 
      true 32 bit RGBA format.

   DAT_PALETTE =
      256 x {
         8 bit - <red>              - red component, 0-63
         8 bit - <green>            - green component, 0-63
         8 bit - <blue>             - blue component, 0-63
         8 bit - <pad>              - alignment padding
      }

I think that covers everything.