Wednesday, June 26, 2013

Implementing a generic single value container in c++

In my previous post I explained the format of the script system for ZEngine. Each Puzzle has a Results section which essentially stores function names and their arguments:
results {
    action:assign(5985, 0)
    background:timer:7336(60)
    event:change_location(C,B,C0,1073)
    background:music:5252(1 a000h1tc.raw 1)    
}
I wanted to be able to store each action inside a struct, and then have a linked list of all the structs. However, the problem is that both the number of arguments and the size of the arguments are variable. Marisa Chan's solution was to store all the arguments in a space delimited char array. IE:
char arguments[25] = "1 a00h1tc.raw 1";

Simple, but not without it's problems.
  1. Since the char array is in a struct, the size is fixed. In order to make sure we never overflow, we have to allocate a fairly large array. That said, in this particular case, each 'large' array in this case would only be ~30 bytes per struct.
  2. By storing everything as strings, we put off parsing till the action function is actually called. At first glace, this doesn't seem too bad, since the data will have to be parsed anyway. However, this method forces it to be parsed at every call to that action function.

Another option was to have everything stored in a linked list of void pointers. However, I don't think I need to convince anyone that void pointers are just gross and using them would be just asking for problems.

What I really wanted was a typed way to store a variably typed (and sized) value. Therefore I created what I'm calling the "Object" class. (I'm up for suggestions for a better name)

The heart of the class is a union that stores a variety of pointers to different types and an enum that defines what type is being stored:
class Object {
public:
    enum ObjectType : byte {
        BOOL,
        BYTE,
        INT16,
        UINT16,
        INT32,
        UINT32,
        FLOAT,
        DOUBLE,
        STRING,
    };

private:
    ObjectType _objectType;

    union {
        bool *boolVal;
        byte *byteVal;
        int16 *int16Val;
        uint16 *uint16Val;
        int32 *int32Val;
        uint32 *uint32Val;
        float *floatVal;
        double *doubleVal;
        Common::String *stringVal;
    } _value;
}
_objectType keeps track of what type of data the object is storing and _value points to the actual data. If _value were instead to hold the actual data value, the union would be forced to sizeof(Common::String), which is quite large (~34 bytes), due to internal caching. Then we're back to the argument of storing things in containers much larger than what they need. By putting the data on the heap and only storing pointers to the data, we save the wasted space, but at the CPU cost of heap allocation.

Now that the data is stored, how do we get it back? My original idea was to have implicit cast operators:
operator bool();
operator byte();
operator int16();
      .
      .
      .
However, LordHoto, one of the GSoC mentors and ScummVM developers, brought my attention to the problems that can arise when using implicit casting. For example, a user could try to cast the data to a type that wasn't stored in the Object and the cast would work, but the data would be completely corrupted. Also, from a user point of view, it wasn't intuitive.

Therefore, I removed the cast operators and created accessor methods:
bool getBoolValue(bool *returnValue) const;
bool getByteValue(byte *returnValue) const;
bool getInt16Value(int16 *returnValue) const;
           .
           .
           .

bool Object::getBoolValue(bool *returnValue) const {
    if (_objectType !=  BOOL) {
        warning("'Object' not of type bool.");
        return false;
    }

    *returnValue = *_value.boolVal;
    return true;
}
This adds a layer of type semi-protection to the class.

Lastly, I added assigment operators to the class, but rather than making this post even longer, I'll just link the full source here and here.


Advantages of 'Object' class
  • Can store relatively 'any' type of data. (Any type not currently supported could be trivially added)
  • Only uses as much space as needed.
  • Transforms dynamically typed data into a statically typed 'box' that can be stored in arrays, linked lists, hashmaps, etc. and can be iterated upon
Disadvantages of 'Object' class
  • Adds a small memory overhead per object. ( 1 byte + sizeof(Operating System pointer) )
  • Adds one heap memory allocation per object


So is it better than Marisa Chan's implementation? It really depends on what you define as better. While it does save memory, only requires data to be parsed once, and, in my opinion, adds a great deal of elegance to handling the Results arguments, it does so at the cost of heap storage. Not only the cost of the initial allocation, but the cost of potential defragmentation runs. But then again, is the cost of heap storage really that big, especially since the data should have a relatively long life? (On average, the time an end user spends in a room in the game) That I don't know, since it all depends on the memory allocator implementation.

In the end, I believe both methods perform well, and as such I choose the eloquence of using the 'Object' class. I am very much open to your thoughts on both the class as a whole or on your take of the problem. Also, if I misspoke about something please, please, please let me know.

Thanks for reading and have fun coding,
-RichieSams


Edit: Upon further inspection I noticed that by using Common::String I'm not only negating any memory size benefits from using 'Object', but potentially even using more memory, since Common::String has such a huge size.
Marisa Chan:
char arguments[25] = "1 a00h1tc.raw 1";
size = 25;

Object:
Object arg1 = 1;
Object arg2 = "a00h1tc.raw";
Object arg3 = 1;

size = (3 *sizeof(Object)) + sizeof(byte) + sizeof(Common::String) + sizeof(byte);
size = 15 + 1 + 34 + 1;
size = 51;
I could instead store the data in a char array, but it would mean that the Object class would need a stringLength member, which adds another 1 - 4 bytes on every instance of the class. Even with this new insight, I think I will continue to use 'Object', again for the added eloquence it adds. The memory difference is still rather small.

Scripting!!!!!!

I just realized that I forgot to do a post last week! I was being so productive, time just flew by.

Last week and the beginning of this week I've been working on the script management system for ZEngine. Well, before I get into that, let me go back a little further. According to my original timeline, the next milestone was creating a skeleton engine that could do basic rendering, sounds, and events. So, last Monday, I started by cleaning up the main game loop and splitting everything into separate methods and classes. With that, the run loop looks like this:
Common::Error ZEngine::run() {
    initialize();
    
    // Main loop
    uint32 currentTime = _system->getMillis();
    uint32 lastTime = currentTime;
    const uint32 desiredFrameTime = 33; // ~30 fps
    
    while (!shouldQuit()) {
        processEvents();
        
        currentTime = _system->getMillis();
        uint32 deltaTime = currentTime - lastTime;
        lastTime = currentTime;
        
        updateScripts();
        updateAnimations(deltaTime);
        
        if (_needsScreenUpdate)
        {
            _system->updateScreen();
        }
        
        // Calculate the frame delay based off a desired frame rate
        int delay = desiredFrameTime - (currentTime - _system->getMillis());
        // Ensure non-negative
        delay = delay < 0 ? 0 : delay;
        _system->delayMillis(delay);
    }
    
    return Common::kNoError;
}
No bad, if I do say so myself. :)

That done, I started implementing the various method shells, such as processEvents(). It was about that time that I realized the the structure of the scripting system had a huge impact on the structure of the engine as a whole. For example, should the event system call methods directly, or should it just register key presses, etc. and let the script system handle the calls? I had a basic understanding of how it probably worked, knowing the history of adventure games, but it was clear I needed to understand the script system before I could go any further.

The .scr files themselves are rather simple; they're text-based if-then statements. Here's an example of a puzzle and a control:
puzzle:5251 {
    criteria { 
        [4188] = 1
        [4209] ! 5
        [7347] = 1
        [67] = 0
    }
    criteria { 
        [4209] > 1
        [7347] = 1
        [67] = 1
        [4188] = [6584]
    }
    results {
        action:assign(5985, 0)
        background:timer:7336(60)
        event:change_location(C,B,C0,1073)
        background:music:5252(1 a000h1tc.raw 1)    
    }
    flags {
        ONCE_PER_INST
    }
}

control:8454 push_toggle {
    flat_hotspot(0,265,511,54)
    cursor(backward)
}
Puzzles:
  • Criteria are a set of comparisons. If ANY of the criteria are satisfied, the results are called.
    • The number in square brackets is the key in a 'global' variable hashmap. (The hashmap isn't actually global in my implementation but rather a member variable in the ScriptManager class) 
    • Next is a simplified form of the standard comparison operators ( ==, !=, <, > ). 
    • The last number can either be a constant or a key to another global variable. 
  • Results are what happens when one of the criteria is met. The first part defines a function, and the remaining parts are the arguments.
  • I haven't fully figured out flags, but from what I can see it's a bitwise OR of when results can be called. For example, only once per room.
For those of you that understand code better than words:
if (criteriaOne || criteriaTwo) {
    assign(5985, 0);
    timer(7336, 60);
    change_location('C', 'B', "C0", 1073);
    music(5252, 1, "a000h1tc.raw", 1);
}

Controls:
  • I haven't done much work on controls yet, but from what I have done, they look to be similar to results and are just called whenever interacted with. For example, a lever being toggled.

The majority of the week was spent working on the best way to store this information so all the conditions could be readily tested and actions fired. The best way I've come up with so far, is to have a Criteria struct and a Results struct as follows:
/** Criteria for a Puzzle result to be fired */
struct Criteria {
    /** The id of a global state */
    uint32 id;
    /**  
     * What we're comparing the value of the global state against
     * This can either be a pure value or it can be the id of another global state
     */
    uint32 argument;
    /** How to do the comparison */
    CriteriaOperator criteriaOperator;
    /** Is 'argument' the id of a global state or a pure value */
    bool argumentIsAnId;
};

/** What happens when Puzzle criteria are met */
struct Result {
    ResultAction action;
    Common::List<Object> arguments;
};

CriteriaOperator is an enum of the operators and ResultAction is an enum of all the possible actions. The other variables are pretty self explanatory.

Using the Criteria and Result structs, the Puzzle struct is:
struct Puzzle {
    uint32 id;
    Common::List<criteria> criteriaList;
    Common::List<result> resultList;
    byte flags;
};

Thus, the process is: read a script file, parse the puzzles into structs and load the structs into a linked list representing all the currently active puzzles. Elegant and exceedingly fast to iterate for criteria comparison checking. Now, some of you may have noticed the 'Object' class and are probably thinking to yourselves, "I thought this was c++, not c# or <insert terrible coffee-named language here>." It is, but that is a whole post to itself, which I will be writing after this one.

So, a couple hundred words in, what have I said? Well, over this past week I discovered how the script system determines what events to fire. This has helped me not only to design the script system code, but also has given me insight into how to design the other systems in the engine. For example, I now know that mouse and keyboard events will just translate to setting global state variables.

What I have left to do in the ScriptManager:
  • Figure out what CriteriaFlags are used for
  • Create shell methods for all the Result 'actions'
  • Write the parser and storage for control and figure out how they are called

Well that's about it for this post, so until next time,
-RichieSams

Wednesday, June 12, 2013

ZFS File format

Over the years I've reverse engineered quite a few file formats, but I've never really sat down and picked apart why a format was designed the way it was. With that said, I wanted to show the ZFS archive file format and highlight some of the peculiarities I saw and perhaps you guys can answer some of my questions.

For some context, Z-engine was created around 1995 and was used on Macintosh, MS-DOS, and Windows 95.

Format
The main file header is defined as:
struct ZfsHeader {
    uint32 magic;
    uint32 unknown1;
    uint32 maxNameLength;
    uint32 filesPerBlock;
    uint32 fileCount;
    byte xorKey[4];
    uint32 fileSectionOffset;
};
  • magic and unknown1 are self explanatory
  • maxNameLength refers to the length of the block that stores a file's name. Any extra spaces are null.
  • The archive is split into 'pages' or 'blocks'. Each 'page' contains, at max, filesPerBlock files
  • fileCount is total number of files the archive contains
  • xorKey is the XOR cipher used for encryption of the files
  • fileSectionOffset is the offset of the main data section, aka fileLength - mainHeaderLength

The file entry header is defined as:
struct ZfsEntryHeader {
    char name[16];
    uint32 offset;
    uint32 id;
    uint32 size;
    uint32 time;
    uint32 unknown;
};
  • name is the file name right-padded with null characters
  • offset is the offset to the actual file data
  • id is a the numeric id of the file. The id's increment from 0 to fileCount
  • size is the length of the file
  • unknown is self explanatory

Therefore, the entire file structure is as follows:
[Main Header]
 
[uint32 offsetToPage2]
[Page 1 File Entry Headers]
[Page 1 File Data]
 
[uint32 offsetToPage3]
[Page 2 File Entry Headers]
[Page 2 File Data]
 
etc.


Questions and Observations

maxNameLength
Why have a fixed size name block vs. null terminated or [size][string]? Was that just the popular thing to do back then so the entire header to could be cast directly to a struct?

filesPerBlock
What is the benefit to pagination? The only explanation I can see atm is that it was some artifact of their asset compiler max memory. Maybe I'm missing something since I've never programmed for that type of hardware.

fileSectionOffset
I've seen things like this a lot in my reverse engineering; they give the offset to a section that's literally just after the header. Even if they were doing straight casting instead of incremental reading, a simple sizeof(mainHeader) would give them the offset to the next section. Again, if I'm missing something, please let me know.


Well that's it for now,
-RichieSams

Git is hard, but ScummVM Common is awesome

This week I started working on Z-engine proper... And immediately ran face-first into the complexity of git. Well, let me restate that. Git isn't hard, per-se, but has so many features and facets that it can very easily go over your head. Anybody with a brain can mindlessly commit and push things to a git repo. However, if you really want structured and concise commit flow, it takes not only knowing the tools, but actually sitting back and thinking about what changes should be put in what commits and which branches.

So that said, I'll go over the things I really like about git or just distributed source control in general.

Branchy development is absolutely a must. It's really really helpful to separate different parts of a project or even different parts of the same section of a project. It makes identifying and diff-ing changes really easy. Also, I found it's really helpful to have a local "work-in-progess" version of the branch I'm working on. That allows me to commit really often and not really have to worry about commit message formatting or general structure. Then when I'm ready to do a push to the repo, I rebase my commits in my WIP branch to fit all my needs, then rebase them to the main branch before pushing.

On that note, rebase is AMAZING!!! It's like the "Jesus" answer in Sunday school, or "Hydrogen bonding" in chemistry class. However, "With great power comes great responsibility". So I try my hardest to only use rebase on my local repo.


On to details about Z-engine work!!

My first milestone for Z-engine was to get a file manager fully working, seeing how pretty much every other part of the engine relies on files. When I was writing my proposal for GSoC, I thought I was going to have to write my own file manager, but Common::SearchManager to the rescue!

By default, the SearchManager will register every file within the game's directory. So any calls to
Common::File.open(Common::String filePath);
will search the game's directory for the filePath and open that file if found.
Well that was easy. Done before lunch.... Well, not quite. Z-engine games store their script files in archive files. The format is really really simple, but I'll save that for a post of itself. Ideally, I wanted to be able to do:
Common::File.open("fileInsideArchive.scr");
After some searching and asking about irc, I found that I can do exactly that by implementing Common::Archive:
class ZfsArchive : public Common::Archive {
public:
    ZfsArchive(const Common::String &fileName);
    ZfsArchive(const Common::String &fileName, Common::SeekableReadStream *stream);
    ~ZfsArchive();
    
    /**
     * Check if a member with the given name is present in the Archive.
     * Patterns are not allowed, as this is meant to be a quick File::exists()
     * replacement.
     */
    bool hasFile(const Common::String &fileName) const;
    
    /**
     * Add all members of the Archive to list.
     * Must only append to list, and not remove elements from it.
     *
     * @return the number of names added to list
     */
    int listMembers(Common::ArchiveMemberList &list) const;
    
    /**
     * Returns a ArchiveMember representation of the given file.
     */
    const Common::ArchiveMemberPtr getMember(const Common::String &name) const;
    
    /**
     * Create a stream bound to a member with the specified name in the
     * archive. If no member with this name exists, 0 is returned.
     * @return the newly created input stream
     */
    Common::SeekableReadStream *createReadStreamForMember(const Common::String &name) const;
}
and then registering each archive with the SearchManager like so:
// Search for .zfs archive files
Common::ArchiveMemberList list;
SearchMan.listMatchingMembers(list, "*.zfs");
 
// Register the files within the zfs archive files with the SearchMan
for (Common::ArchiveMemberList::iterator iter = list.begin(); iter != list.end(); ++iter) {
    Common::String name = (*iter)->getName();
    ZfsArchive *archive = new ZfsArchive(name, (*iter)->createReadStream());
    
    SearchMan.add(name, archive);
}

In summary, git can be complicated, but it has a wealth of potential and is extremely powereful. Also, the ScummVM Common classes are absolutely fantastic and make the lives of engine developers sooooo much easier. A toast to the wonderful people who developed them. Well, that's all for now.

So until next time, happy coding. :)
-RichieSams