RichieSam's Adventures in Code-ville: March 2014

Friday, March 14, 2014

Loading more interesting scenes - Part 1

When I finished the Deferred Shading Demo, I started looking at how Forward and Deferred differed in terms of frame-time and frame quality. I couldn't see any noticeable differences in frame quality, which is good. But, forward shading was upwards of 2ms cheaper than deferred (depending on the camera position)!! I thought deferred shading was supposed to be better than forward?!?!?

At first I thought my implementation was wrong, or that there was a bug in my code. But then it slowly dawned on me what the problem was. If you recall, the whole point of deferred shading is reducing the number of pixels that are shaded. Most of that saving comes from not shading pixels that are occluded (they fail the z-test). However, with my simple geometry, there are very few camera positions in which ANY geometry is occluded. Thus deferred shading just adds overhead.

So with this in mind I started looking for more complex scenes to test my code against. After a bit of searching I found Morgan McGuire's amazing Computer Graphics Data database. He has a good 20 some-odd scenes that he's personally maintained, since most of them are no longer even available in their original form. Huge props to him and any others involved in the project.

Anyway, I downloaded the popular Crytek Sponza Scene in obj form. Awesome. Now what? Well, now I needed to load the obj into Vertex and Index buffers. I looked around to see if there was any library to do it for me (why re-invent the wheel?), but I only found a smattering of thrown together code. Well, and assimp. But assimp seemed a bit large for a temporary obj loader. More on that later. So with that said, I used the code here as a starting point, and created my own obj loader.

First off, obj files are HARD to parse. Mostly because they're so flexible. And being a text-based format, parsing text is just not fun. Ok, so maybe they're not HARD, but they're not easy either. The first major roadblock is that obj allows you to specify all the parts of a vertex separately.

Separate vertex definitions

For example:

v  476.1832 128.5883 224.4587
vn -0.9176 -0.3941 -0.0529
vt 0.1674 0.8760 0.0000

The 'v' represents the vertex position, the 'vn' represents the vertex normal, and the 'vt' represents the vertex texture coordinates. Indices can then choose whichever grouping of position, normal, and texture coordinates they need. Like this:

f 140/45/140 139/18/139 1740/17/1852

This is especially handy if large portions of your scene have the same surface normals, like square buildings. Then you only have to store a single 'vn' for all the vertices sharing the same normal.

HOWEVER, while this is great for storage, DirectX expects a vertex to be a singular unit, AKA, position, normal, AND texture coordinate all together. (Yes, you can store them in separate Vertex Buffers, but then you run into cache misses and data incoherence) I chose to work around it like this:

I use these data structures to hold the data:

std::vector<Vertex> vertices;
std::vector<uint> indices;

typedef std::tuple<uint, uint, uint> TupleUInt3;
std::unordered_map<TupleUInt3, uint> vertexMap;

std::vector<DirectX::XMFLOAT3> vertPos;
std::vector<DirectX::XMFLOAT3> vertNorm;
std::vector<DirectX::XMFLOAT2> vertTexCoord;

When reading vertex data ('v', 'vn', 'vt'), the data is read into its corresponding vector.
Then, when reading indices, the code creates a true Vertex and adds it to vertices. I use the unordered_map to check if the vertex already exists before creating a new one:

TupleUInt3 vertexTuple{posIndex, texCoordIndex, normalIndex};

auto iter = vertexMap.find(vertexTuple);
if (iter != vertexMap.end()) {
    // We found a match
    indices.push_back(iter->second);
} else {
    // No match. Make a new one
    uint index = static_cast<uint>(vertices.size());
    vertexMap[vertexTuple] = index;

    DirectX::XMFLOAT3 position = posIndex == 0 ? DirectX::XMFLOAT3(0.0f, 0.0f, 0.0f) : vertPos[posIndex - 1];
    DirectX::XMFLOAT3 normal = normalIndex == 0 ? DirectX::XMFLOAT3(0.0f, 0.0f, 0.0f) : vertNorm[normalIndex - 1];
    DirectX::XMFLOAT2 texCoord = texCoordIndex == 0 ? DirectX::XMFLOAT2(0.0f, 0.0f) : vertTexCoord[texCoordIndex - 1];

    vertices.push_back(Vertex(position, normal, texCoord));
    indices.push_back(index);
}
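
One thing the snippet above doesn't show: the standard library has no std::hash specialization for std::tuple, so an unordered_map keyed on TupleUInt3 needs a hasher supplied from somewhere. Here is a minimal, self-contained sketch of one way to do it. The names TupleUInt3Hash, VertexMap, and LookupOrAssign are made up for illustration (they are not from the loader), and the mixing step is the well-known hash_combine recipe, not necessarily what the loader uses.

```cpp
#include <cstddef>
#include <functional>
#include <tuple>
#include <unordered_map>

typedef unsigned int uint;
typedef std::tuple<uint, uint, uint> TupleUInt3;

// Hypothetical hasher: std::tuple has no std::hash specialization,
// so the map needs one supplied explicitly.
struct TupleUInt3Hash {
    std::size_t operator()(const TupleUInt3 &key) const {
        std::size_t seed = 0;
        Combine(seed, std::get<0>(key));
        Combine(seed, std::get<1>(key));
        Combine(seed, std::get<2>(key));
        return seed;
    }

private:
    // Mixes one more value into the running hash (hash_combine style).
    static void Combine(std::size_t &seed, uint value) {
        seed ^= std::hash<uint>()(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
};

typedef std::unordered_map<TupleUInt3, uint, TupleUInt3Hash> VertexMap;

// Returns the vertex index for a (position, texCoord, normal) triple.
// The first time a triple is seen it is assigned nextIndex, which stands in
// for vertices.size() in the loader and is incremented afterwards.
uint LookupOrAssign(VertexMap &vertexMap, const TupleUInt3 &key, uint &nextIndex) {
    auto iter = vertexMap.find(key);
    if (iter != vertexMap.end()) {
        return iter->second;  // Already created; reuse it
    }
    uint index = nextIndex++;
    vertexMap[key] = index;
    return index;
}
```

Feeding it the triples 140/45/140, 139/18/139, and 140/45/140 again hands back indices 0, 1, and 0, so the repeated corner reuses the existing vertex instead of creating a duplicate.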

Success! On to the next roadblock!

N-gons

Obj supports all polygons; you just add more indices to the face definition:

f 140/45/140 139/18/139 1740/17/1852 1784/25/429 1741/35/141

Again, this is extremely handy for reducing storage space. For example, if two triangles are co-planar, you can combine them into a quad, etc. HOWEVER, DirectX only supports triangles. Therefore, we have to triangulate any faces that have more than 3 vertices. Triangulation can be quite complicated, depending on what assumptions you choose to make. However, I chose to assume that all polygons are convex, which makes life significantly easier. Following the algorithm in Braynzar Soft's code, you can triangulate by building a fan: each triangle uses the first vertex, the previous vertex, and the current vertex. For example, let's choose this pentagon:

We would then form triangles like so:

So the triangles are:

0 1 2
0 2 3
0 3 4

The code can be found here. One note before I move on: This way of triangulating is definitely not optimal for high N-gons; it will create long skinny triangles, which is bad for rasterizers. However, it serves its purpose for now, so it will stay.
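
For reference, the fan idea fits in a few lines. This is only a sketch under the same convexity assumption; TriangulateFan is a name invented for this example, and the input is simply the list of already-resolved vertex indices for one face.

```cpp
#include <cstddef>
#include <vector>

// Fan-triangulates one convex face given as a list of vertex indices.
// Every triangle is (first vertex, previous vertex, current vertex).
// Faces with fewer than 3 corners produce no triangles.
std::vector<unsigned int> TriangulateFan(const std::vector<unsigned int> &face) {
    std::vector<unsigned int> triangles;
    if (face.size() < 3) {
        return triangles;
    }

    // An N-gon always yields N - 2 triangles
    triangles.reserve((face.size() - 2) * 3);
    for (std::size_t i = 2; i < face.size(); ++i) {
        triangles.push_back(face[0]);
        triangles.push_back(face[i - 1]);
        triangles.push_back(face[i]);
    }
    return triangles;
}
```

Passing in the pentagon {0, 1, 2, 3, 4} gives back the three triangles listed above: 0 1 2, 0 2 3, 0 3 4.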

Normals

It's perfectly legal for a face in obj to not use normals:

f 1270/3828 1261/3831 1245/3829

Similarly, you can have a face that doesn't use texture coordinates:

f -486096//-489779 -482906//-486570 -482907//-486571

(You'll also notice that you can use negative indices, which are relative to the end of the list read so far: -1 refers to the most recently defined element, -2 to the one before it, and so on. But that's an easy thing to work around.) The problem is normals. My shader code assumes that a vertex has a valid normal. If it doesn't, the default initialization to (0.0f, 0.0f, 0.0f) makes the whole object black. Granted, I could add some checks in the shader, where if the normal is all zero, just use the material color, but this just adds dynamic branching, and in reality there shouldn't be any faces that don't have normals.
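
As a rough sketch of the index handling: as I understand the obj format, positive indices are 1-based and negative indices count backwards from the most recently defined element. ResolveObjIndex below is a hypothetical helper (not from the loader) that turns either kind into a 0-based array index and returns -1 for a missing or out-of-range index.

```cpp
#include <cstddef>

// Converts an obj index into a 0-based array index.
//   objIndex > 0 : 1-based absolute index
//   objIndex < 0 : relative to the elements read so far (-1 is the latest)
//   objIndex == 0: component not present on this face
// Returns -1 when the component is absent or the index is out of range.
long ResolveObjIndex(long objIndex, std::size_t elementCount) {
    const long count = static_cast<long>(elementCount);

    long resolved;
    if (objIndex > 0) {
        resolved = objIndex - 1;
    } else if (objIndex < 0) {
        resolved = count + objIndex;
    } else {
        return -1;
    }

    return (resolved >= 0 && resolved < count) ? resolved : -1;
}
```

With 10 positions read so far, an index of 1 maps to slot 0 and an index of -1 maps to slot 9. Note that elementCount has to be the count at the time the face line is parsed, since more 'v' lines may follow.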

So the first thing I tried is 'manually' calculating the vertex normals using this approach. The approach uses the cross product of two sides of a triangle to get the face normal, then averages all the face normals for faces sharing the same vertex. Simple, but it takes FOREVER. The first time I tried it, it ran for 10 minutes.... Granted, it is O(N₁² + N₂), where N₁ is the number of vertices and N₂ is the number of faces. The Sponza scene has 184,330 vertices and 262,267 faces. Therefore, I resolved to do the normal calculations once, and then re-create the obj with those normals. I'll get to that in a bit.
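
For comparison, here is a sketch of a different way to get the same kind of averaged normals in a single pass over the faces. To be clear, this is not the approach linked above. Instead of searching for the faces that touch each vertex, every face adds its normal to its own three corners, and everything is normalized at the end. Vec3 and ComputeVertexNormals are stand-in names for this example, and because the cross product is left unnormalized, larger triangles weigh more in the average.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x, y, z;
};

static Vec3 Subtract(const Vec3 &a, const Vec3 &b) {
    return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
}

static Vec3 Cross(const Vec3 &a, const Vec3 &b) {
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}

// positions: one entry per vertex. indices: three entries per triangle.
// Runs in O(vertices + faces).
std::vector<Vec3> ComputeVertexNormals(const std::vector<Vec3> &positions,
                                       const std::vector<unsigned int> &indices) {
    std::vector<Vec3> normals(positions.size(), Vec3{0.0f, 0.0f, 0.0f});

    // Accumulate each face normal onto the three vertices of that face
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        const unsigned int corner[3] = {indices[i], indices[i + 1], indices[i + 2]};
        const Vec3 faceNormal = Cross(Subtract(positions[corner[1]], positions[corner[0]]),
                                      Subtract(positions[corner[2]], positions[corner[0]]));
        for (unsigned int c : corner) {
            normals[c].x += faceNormal.x;
            normals[c].y += faceNormal.y;
            normals[c].z += faceNormal.z;
        }
    }

    // Normalize the sums. Vertices with no faces keep a zero normal.
    for (Vec3 &n : normals) {
        const float length = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
        if (length > 0.0f) {
            n.x /= length;
            n.y /= length;
            n.z /= length;
        }
    }
    return normals;
}
```

One caveat for an obj loader: the accumulation should be keyed on the position index, since vertices that share a position but differ in texture coordinates would otherwise not share their normals.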

Vectors

After creating the basic obj loader I did some crude profiling and found some interesting behavior. When compiled for "Release", the obj loader ran 1 - 2 orders of magnitude faster. After much searching, I found out that in "Release", the VC++ compiler turns off a bunch of run-time checks on vectors. These checks are really good, in that they give improved iterator checks and various useful debug checks. However, they're really really slow. You can turn them off with preprocessor defines, but I wouldn't. It's just something to be aware of.

So that's obj's. With that all done, I can now load interesting models! Yay!!

But even in "Release", the scene still takes ~4 seconds to load on my beefy computer. Hmmm.... Well the first thing I did was to put the obj parsing into a separate thread so the main window was still interactive.

I also sleep the main thread in 50ms intervals to give the background thread as many cycles as it can. I need to do some further testing to see if sleeping the main thread affects child threads. This is using std::thread. I wouldn't think it would, but it doesn't hurt to test. Let me know your thoughts.
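
To make the threading setup concrete, here is a minimal sketch of the pattern using only the standard library. RunWithBackgroundLoad, loadFn, and tickFn are names invented for this example; in the demo the tick would be the window's message pump rather than a plain callback.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Runs loadFn on a background thread while the calling thread keeps
// "ticking" (e.g. pumping window messages) and sleeping in 50ms intervals.
// Returns the number of ticks that happened while loading.
template <typename LoadFn, typename TickFn>
int RunWithBackgroundLoad(LoadFn loadFn, TickFn tickFn) {
    std::atomic<bool> done(false);

    std::thread worker([&]() {
        loadFn();
        done.store(true, std::memory_order_release);
    });

    int ticks = 0;
    while (!done.load(std::memory_order_acquire)) {
        tickFn();
        ++ticks;
        // Only the calling thread is blocked here
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }

    // Joining makes everything the worker wrote visible to this thread
    worker.join();
    return ticks;
}
```

For what it's worth, std::this_thread::sleep_for blocks only the thread that calls it, so the sleep itself should not stall the worker. How the OS scheduler shares cores between the two threads is a separate matter and is still worth measuring.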

Well, that's it for now. I'll cover some of the specifics of what changed in the renderer from Deferred Shading Demo to Obj Loader Demo in the next post, but this post is getting to be a bit long. As always, feel free to comment or leave suggestions.

-RichieSams

Monday, March 10, 2014

Introducing the Halfling Project

Hello everyone!

It's been entirely too long since I've posted about what I've been working on. Granted, I did make a post a couple weeks ago about Git, but that was mostly for my class. So here goes!

We last left off with me wrapping up GSoC with ScummVM. I have since joined the ScummVM dev team (Yay!) and my current progress on the ZVision engine was merged into the master branch. Unfortunately, due to school keeping me quite busy and another project, I haven't had much time to work more on the engine. That said, it's not abandoned! I'm planning on working more on it after I graduate in August.

I have always been quite fascinated by computer graphics, especially in the algorithms that make real-time graphics possible. Wanting to get into the field, I started teaching myself DirectX 11 last December using Frank Luna's wonderful book, An Introduction to 3D Game Programming with DirectX 11. However, rather than just using his base code, I chose to create my own rendering framework, and thus The Halfling Project was born.

"Why re-invent the wheel?", you ask? Because it forces me to fully understand the graphics concepts, rather than just copy-pasting cookie-cutter code. Also, no matter how recent a tutorial is, there is bound to be some code that is out of date. For example, Frank Luna's code uses .fx files and the D3DX library. Effect files can still be used, but Microsoft discourages it. And the D3DX library doesn't exist anymore. Granted it has a replacement (DirectXMath), but it has a slightly different API. Thus, even if I were to 'copy-paste', I would still have to change the code to fit the new standards.

That said, I didn't come up with everything from scratch. The Halfling Project is heavily influenced by Luna's code, MJP's sample framework, and Glenn Fiedler's blog posts. Overall, The Halfling Project is just a collection of demos that happen to use the same base framework. So, with that in mind, let me describe some of the demos and what I plan for the future.

(If you would like to try out the demos for yourself, there are compiled binaries in my Git repo. You will need a DirectX11 capable graphics card or integrated graphics and will need to install the VS C++ 120 redistributable, which is included with the demos.)

Crate Demo:

My "Hello World" of DirectX 11! Ha ha! So much code for a colored box.... I can't tell you how happy I was when it worked though!

Me: "Look! Look what I made!"
My roommate: "What? It's a box."
Me: "But.... it was hard..."

I guess he had a point though. On to more interesting things!

Wave Simulation Demo:

So the next thing to change was to make the geometry a bit more interesting. I borrowed a wave simulation algorithm from Frank Luna's code and created this demo. Each update, it applies the wave equation to each vertex and updates the Vertex Buffer.

Lighting Demo:

So now we had some interesting geometry; it was time for some lights! Well, one light...

I actually didn't use the wave simulation geometry because it required a dynamic vertex buffer. (Yes I know you could do it with a static buffer and transformations, but baby steps) Instead, I borrowed another function from Frank Luna's code that used sin/cos to create hills. The lighting is a forward renderer using Lambert diffuse lighting and Blinn-Phong specular lighting. Rather than bore you with my own re-hash of what's already written, I will point you to Google.
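
For anyone who wants the short version, the two terms look roughly like this. It's a CPU-side sketch in C++ instead of the actual shader code, with made-up names (Vec3, LightingTerms, ShadeBlinnPhong), and it assumes all input vectors are already unit length.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

static float Dot(const Vec3 &a, const Vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

struct LightingTerms {
    float diffuse;   // Lambert term: max(N . L, 0)
    float specular;  // Blinn-Phong term: max(N . H, 0) ^ specularPower
};

// normal:  unit surface normal
// toLight: unit vector from the surface point towards the light
// toEye:   unit vector from the surface point towards the camera
LightingTerms ShadeBlinnPhong(const Vec3 &normal, const Vec3 &toLight,
                              const Vec3 &toEye, float specularPower) {
    LightingTerms terms = {std::max(Dot(normal, toLight), 0.0f), 0.0f};

    // Surfaces facing away from the light get no specular highlight
    if (terms.diffuse > 0.0f) {
        // Half vector between the light and eye directions
        Vec3 half = {toLight.x + toEye.x, toLight.y + toEye.y, toLight.z + toEye.z};
        const float length = std::sqrt(Dot(half, half));
        if (length > 0.0f) {
            half = Vec3{half.x / length, half.y / length, half.z / length};
            terms.specular = std::pow(std::max(Dot(normal, half), 0.0f), specularPower);
        }
    }
    return terms;
}
```

The final color would then be something along the lines of (diffuse color x diffuse term + specular color x specular term) x light color, with the exact combination depending on the material model.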

Deferred Shading Demo:

This is where I diverged from Frank Luna's book and started off on my own. I like to read graphics white papers and talks on my bus ride to and from school. One that I really liked was Andrew Lauritzen's talk about Tiled Shading. In my head, deferred shading was the next logical step after traditional forward shading, so I launched in, skipping right to tiled deferred shading. However, it wasn't long before I was in way over my head. I guess I should have seen that coming, but hindsight is 20-20. Therefore I resolved to first implement naïve deferred shading, and THEN think about tiled (and perhaps clustered).

So how is deferred shading different than forward shading?

Traditional Forward:

The application submits all the triangles it wants rendered to the GPU.
The hardware rasterizer turns the triangles into pixels and sends them off to the pixel shader
The pixel shader applies any lighting equations you have

Assuming no light culling, this means the lighting equation is invoked
((# pixels from submitted triangles) x (# lights)) times

The output merger rejects pixels that fail the depth test and does pixel blending if blending is enabled

Traditional Deferred:

GBuffer Pass:

The application submits all the triangles it wants rendered to the GPU.
The hardware rasterizer turns the triangles into pixels and sends them off to the pixel shader
The pixel shader stores the pixel data in a series of texture buffers called Geometry Buffers or GBuffers for short

GBuffer contents vary by implementation, mostly depending on your lighting equation in the second pass
Common data is World Position, Surface Normal, Diffuse Color, Specular Color, and Specular Power

The output merger rejects pixels that fail the depth test. Blending is NOT allowed.

Lighting Pass:

The application renders a fullscreen quad, guaranteeing a pixel shader thread for every pixel on the screen
The pixel shader samples the GBuffers for the data it needs to light the pixel
Then applies the lighting equation and returns the final color

Assuming no light culling, this means the lighting equation is invoked
((# pixels on screen) x (# lights)) times

The output merger is pretty much a pass-through, as we don't use a depth buffer for this pass.

So what's the difference? Why go through all that extra work?

Deferred Shading invokes the lighting equation fewer times (generally)

In the past 10 years, there has been a push to make real-time graphics more and more realistic. A massive part of realism is lighting. But, lighting is usually THE most expensive calculation for a scene. In forward shading, you calculate lighting for each and every pixel that the rasterizer creates. However, depending on your scene, a large number of these pixels will be rejected by the depth test. Thus, a large number of calculations were *wasted* in a sense. Granted there are ways around this, but they aren't perfect and I'll leave that for future exploration. Thus, deferred shading effectively separates scene complexity and lighting complexity.
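
To put rough numbers on the two invocation counts from the pass descriptions, here is a back-of-the-envelope sketch. EstimateLightingCost is a name made up for this example, and the overdraw factor (the average number of rasterized fragments per screen pixel) is an assumed input, not something measured from the demos. Like the formulas above, it assumes no light culling and no early depth rejection.

```cpp
#include <cstdint>

struct LightingCost {
    std::uint64_t forward;   // (# pixels from submitted triangles) x (# lights)
    std::uint64_t deferred;  // (# pixels on screen) x (# lights)
};

// overdraw: assumed average number of rasterized fragments per screen pixel
LightingCost EstimateLightingCost(std::uint64_t screenPixels,
                                  std::uint64_t overdraw,
                                  std::uint64_t lightCount) {
    const std::uint64_t rasterizedPixels = screenPixels * overdraw;
    return LightingCost{rasterizedPixels * lightCount, screenPixels * lightCount};
}
```

For a 1280 x 720 target (921,600 pixels), 1,000 lights, and an assumed overdraw of 3, this gives 2,764,800,000 lighting invocations for forward versus 921,600,000 for deferred. With an overdraw of 1 the two counts are identical, which is the situation where deferred only adds the GBuffer overhead.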

This all said, deferred shading isn't the cure-all for everything; it does have some significant drawbacks:

It requires a large* amount of bandwidth and memory to store the GBuffers

Large is a relative term. It ultimately depends on what platform you're targeting

It requires hardware that allows multiple render targets

Somewhat of a moot point with today's hardware, but still something to watch for

No hardware anti-aliasing.
No transparent geometry / blending

So how is my deferred shading demo implemented?

GBuffers:

Albedo-MaterialIndex	DXGI_FORMAT_R8G8B8A8_UNORM
Normal	DXGI_FORMAT_R16G16_FLOAT
Depth	DXGI_FORMAT_R32_FLOAT

8 bits	8 bits	8 bits	8 bits
Albedo Red	Albedo Green	Albedo Blue	Material Index
Normal Phi (16 bits)		Normal Theta (16 bits)
Depth (32 bits)

Albedo	Stores the RGB diffuse color read from texture mapping
MaterialIndex	An offset index to a global material array in the shader
Normal	The fragment surface unit normal stored in spherical coordinates. (We don't store radius since we know it's 1 for a unit normal)
Depth	The hardware depth buffer. It stores (1 - z/w). By swapping the depth planes, we spread the depth precision out more evenly.
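
As a quick sanity check on the "large amount of memory" point, the layout above works out to 12 bytes per pixel. The sketch below just does that arithmetic. GBufferBytes and the constant names are made up for this example, and it ignores MSAA as well as any padding or alignment the driver may add.

```cpp
#include <cstddef>

// Bytes per pixel for each render target in the layout above
const std::size_t kAlbedoMaterialBytes = 4;  // DXGI_FORMAT_R8G8B8A8_UNORM: 4 x 8 bits
const std::size_t kNormalBytes = 4;          // DXGI_FORMAT_R16G16_FLOAT:   2 x 16 bits
const std::size_t kDepthBytes = 4;           // DXGI_FORMAT_R32_FLOAT:      1 x 32 bits

// Total GBuffer footprint in bytes for a render target of the given size
std::size_t GBufferBytes(std::size_t width, std::size_t height) {
    return width * height * (kAlbedoMaterialBytes + kNormalBytes + kDepthBytes);
}
```

At an assumed resolution of 1920 x 1080 that is 24,883,200 bytes, or a bit under 24 MiB, all of which is written once in the GBuffer pass and read back in the lighting pass.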

Converting the normal to/from spherical coordinates is just some trig, but here is the code I use. Note: My code assumes that the GBuffer can store unnormalized data (AKA, values potentially outside the range [0, 1]).
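
In case the trig isn't obvious, here is one possible version written as plain C++ instead of shader code. The axis convention (theta measured from +z, phi measured in the xy-plane) is an assumption made for this sketch, as are the names SphericalNormal, EncodeNormal, and DecodeNormal.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

struct SphericalNormal {
    float phi;    // Angle in the xy-plane, in [-pi, pi]
    float theta;  // Angle from the +z axis, in [0, pi]
};

// n is assumed to be unit length, so the radius never needs to be stored
SphericalNormal EncodeNormal(const Vec3 &n) {
    // Clamp guards acos against tiny floating-point overshoot
    const float z = std::min(1.0f, std::max(-1.0f, n.z));
    return SphericalNormal{std::atan2(n.y, n.x), std::acos(z)};
}

Vec3 DecodeNormal(const SphericalNormal &s) {
    const float sinTheta = std::sin(s.theta);
    return Vec3{sinTheta * std::cos(s.phi),
                sinTheta * std::sin(s.phi),
                std::cos(s.theta)};
}
```

Both angles can fall outside [0, 1] (phi can even be negative), which is why a float format such as R16G16_FLOAT is a natural fit for them.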

I use the depth buffer to calculate the world position of the pixel. The basic principle is that since we know the position of the pixel on the screen, using that, the depth, and the inverse ViewProjection matrix, we can calculate the world position. I'll point you here and here for more information.
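
To give a feel for the math, here is a simplified sketch of the same idea. It deliberately differs from the description above in two ways: it assumes a conventional (non-swapped) left-handed D3D-style perspective projection with depth = z/w in [0, 1], and it inverts the projection analytically to recover the view-space position instead of multiplying by an inverse ViewProjection matrix. Getting to world space would take one more multiply by the inverse view matrix. All names here are hypothetical.

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
};

// The few terms of a perspective projection matrix that matter here
struct Projection {
    float xScale, yScale, nearZ, farZ;
};

Projection MakeProjection(float fovY, float aspect, float nearZ, float farZ) {
    const float yScale = 1.0f / std::tan(fovY * 0.5f);
    return Projection{yScale / aspect, yScale, nearZ, farZ};
}

// Forward direction: view-space position -> (ndcX, ndcY, depth)
Vec3 ProjectToNdc(const Projection &p, const Vec3 &view) {
    const float depth = p.farZ * (view.z - p.nearZ) / ((p.farZ - p.nearZ) * view.z);
    return Vec3{view.x * p.xScale / view.z, view.y * p.yScale / view.z, depth};
}

// Inverse direction: screen position (in NDC) plus stored depth -> view-space position
Vec3 ReconstructViewPosition(const Projection &p, float ndcX, float ndcY, float depth) {
    // Solve depth = far * (z - near) / ((far - near) * z) for z
    const float viewZ = p.farZ * p.nearZ / (p.farZ - depth * (p.farZ - p.nearZ));
    // Undo the perspective divide for x and y
    return Vec3{ndcX * viewZ / p.xScale, ndcY * viewZ / p.yScale, viewZ};
}
```

Projecting a view-space point and feeding the result back through ReconstructViewPosition should return the original point, up to floating-point error. That error grows quickly as far/near gets large, which is part of the motivation for tricks like swapping the depth planes.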

So you managed to get through all that, let me reward you with a video and some screenshots. :)

With 500 point lights and 500 spot lights

Visualizing the GBuffers

And one last one to show you that the depth buffer does actually have data in it:

Well that's it for now! I have another demo I'm working on right now, but I'll leave that for another post. If you want a sneak peek, there is a build of it in my repo.

As always, feel free to ask questions and leave comments or suggestions.

-RichieSams