One of those days by Loonies [web]

     __._        _.____   _.____   _.____   _.__    ____._    ____._
   _/  |/________\|_   \__\|_   \__\|_   \__\|__)__/  _ |/___/  __|/____
  _)   _/      /   /    /   /    /   /    /      (_   __|  (_____      (_
  \    \      /        /        /   /     \       /         /   /       /
 - -diP--------------------------------------------------------------uP!- -

                           One Of Those Days
                        Winner at Revision 2018
                           8 kb Intro compo

Music: Booster
Synth, editing, additional size optimization: Blueberry
Code, editing, etc: Psycho (psycho@loonies.dk)

Crinkler for compression
Rocket for scripting
Oidos synth for music
Internal shader compactor

Windows 10 Creators Update (reading from typed UAVs)
DirectX 11, feature level 11_0

Watching One of those days 3 by Candide Thovex two years ago, with its completely over-the-top stunts and editing,
gave us the idea for a similar demo, maybe an even crazier one. It could also be fun to try an
action-camera style video instead of the usual cinematic look.
And since landscapes, especially snow landscapes (i.e. without too much vegetation), are relatively easy to make
look good and realistic, it was also a good candidate for a sort of photorealistic 8 kb.
However, we were already far into another big 8k project (nexus 8), so this one had to wait for the next year, since it
was an obvious Revision candidate. 2017 came, but issues with the realism, especially of the trees, postponed it in
favor of the 4k (which did the trees and stuff..). We finally got around to doing it this year, even though it got
way too late: the music arrived on Thursday, and we had to do all the real scripting etc. at the party.

The base landscape is generated in 3 compute passes into a 2k*16k texture: value noise plus an overall valley
shape in the first pass, selective smoothing in the second and gradient calculation in the third.
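
A rough CPU sketch of the three passes, to make the data flow concrete. This is not the demo's compute
shader: the hash, the octave count, the parabolic valley profile and the "smooth only below a threshold"
rule are all assumptions picked for illustration, and the map is tiny compared to 2k*16k.

```python
import math

def hash2(ix, iy):
    """Hypothetical integer hash: pseudo-random value in [0, 1) per lattice point."""
    h = (ix * 374761393 + iy * 668265263) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    return ((h ^ (h >> 16)) & 0xFFFFFF) / float(1 << 24)

def value_noise(x, y):
    """Value noise: blend the four surrounding lattice values with a smoothstep fade."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    ux, uy = fx * fx * (3 - 2 * fx), fy * fy * (3 - 2 * fy)
    top = hash2(ix, iy) + (hash2(ix + 1, iy) - hash2(ix, iy)) * ux
    bot = hash2(ix, iy + 1) + (hash2(ix + 1, iy + 1) - hash2(ix, iy + 1)) * ux
    return top + (bot - top) * uy

def pass1_height(w, h, octaves=4):
    """Pass 1: a few octaves of value noise plus an assumed parabolic valley profile."""
    hm = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            n, amp, freq = 0.0, 0.5, 1.0 / 16
            for _ in range(octaves):
                n += amp * value_noise(x * freq, y * freq)
                amp, freq = amp * 0.5, freq * 2
            valley = (x / (w - 1) * 2 - 1) ** 2  # 0 in the middle, 1 at the edges
            hm[y][x] = n + valley
    return hm

def pass2_smooth(hm, threshold=0.5):
    """Pass 2: selective smoothing, here (assumption) only for texels below a height."""
    h, w = len(hm), len(hm[0])
    out = [row[:] for row in hm]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if hm[y][x] < threshold:
                out[y][x] = (hm[y][x] + hm[y - 1][x] + hm[y + 1][x]
                             + hm[y][x - 1] + hm[y][x + 1]) / 5
    return out

def pass3_gradient(hm):
    """Pass 3: finite-difference gradient (one-sided at the borders)."""
    h, w = len(hm), len(hm[0])
    grad = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            grad[y][x] = ((hm[y][x1] - hm[y][x0]) / (x1 - x0),
                          (hm[y1][x] - hm[y0][x]) / (y1 - y0))
    return grad

terrain = pass2_smooth(pass1_height(32, 8))
normals_input = pass3_gradient(terrain)
```

Storing the gradient next to the height means the raymarcher can get a terrain normal from a single
texture lookup instead of several height samples.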

Two pass accelerated tile based raymarching (2-3x speed up in these scenes).
Part of the distance function modifies the landscape and part adds additional handwritten objects. Even with the tile
optimization there's a good deal of bounding object / LOD code, skipping of noise texture lookups unless we are close etc.
Single directional light source, except for the tunnel (lit by the lamp part of the distance function).

Simple post processing pass with "artistic fake-dof" / radial blur and some sort of fake subsurface scattering for 
the snow (blur snow to other nearby snow, add new noise afterwards).

Scripting is done with the Rocket system. We have a tiny replayer that loops through the whole script every frame
and fills a constant buffer array for the shader. We should probably have done some custom interpolation to avoid the
linear movement (fine for cinematic cameras but not here...) and to reduce the number of rotation adjustments needed. Camera
height is mostly physics based though.
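
A minimal sketch of what such a replayer does, assuming each track is just a sorted list of (row, value)
keys. The names row_at, sample_track and fill_constants are made up for this example and the key layout
is not the demo's packed format. The smooth flag shows one possible "custom interpolation" (a smoothstep
ease); it is only an illustration, not something the intro does.

```python
def row_at(time_s, bpm, rows_per_beat):
    """Convert playback time to a fractional tracker row (parameters are assumptions)."""
    return time_s * bpm / 60.0 * rows_per_beat

def sample_track(keys, row, smooth=False):
    """Interpolate one track at a fractional row; keys is a sorted list of (row, value)."""
    if row <= keys[0][0]:
        return keys[0][1]
    for (r0, v0), (r1, v1) in zip(keys, keys[1:]):
        if r0 <= row < r1:
            t = (row - r0) / (r1 - r0)
            if smooth:
                t = t * t * (3 - 2 * t)  # ease in/out instead of a straight line
            return v0 + (v1 - v0) * t
    return keys[-1][1]  # past the last key: hold its value

def fill_constants(tracks, row):
    """Walk every track each frame and produce the array handed to the shader."""
    return [sample_track(keys, row) for keys in tracks]

camera_x = [(0, 0.0), (8, 4.0), (16, 0.0)]
camera_roll = [(0, 7.0)]
constants = fill_constants([camera_x, camera_roll], row_at(0.25, 120, 8))
```

Scanning every key of every track per frame is wasteful in general, but with around 700 keys it costs
nothing and keeps the replayer code tiny.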

Size breakdown (bytes)
setup: 400
shader: 4000
sync replayer: 140
sync data: 1400 (way too much, got surprised here, but 700 keys total)
music: 900
oidos: 650

(New and improved) two pass tile based raymarching
The prepass renders 4*int per 8x8 pixel tile (8x8 is a good threadgroup size to keep main pass consistent) and outputs per tile:
-Minimum depth  (starting point for primary rays per pixel)
-Maximum shadow occluder distance (end point for shadow rays per pixel)
-Bitmask of which parts of the scene are visible  (sdf is full of 'if (mask&2)' etc)
-Bitmask of which parts of the scene cast shadows (with a special value telling that the whole tile is shadowed)

I made an illustrative shadertoy example: https://www.shadertoy.com/view/XdycWy

For each bit in the mask:
-cone-raymarch and note the distance t1 where the sdf becomes less than the cone radius. This is the normal coarse-depth prepass (+masking)
-continue marching for some time (I go to t1*1.3), searching for a distance t2 where sdf is less than cone radius
-clear the mask if the new t2 is less than the minimum t1 so far (since the tested scene bit occludes the whole tile at this distance!)
-take the combined minimum of t1 and t2 over all mask bits
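
The loop above can be sketched on the CPU roughly like this. Treat it as one reading of the steps, not
the actual prepass shader: the scene is a list of per-bit distance functions, k is the cone radius per
unit distance, and I assume t2 is the point where the sdf drops below MINUS the cone radius (the cone is
completely inside the object), since that is what makes the "occludes whole tile" argument work. MIN_STEP,
the step rules and capping later bits at the nearest t2 are also assumptions of this sketch.

```python
import math

MIN_STEP = 1e-3  # assumed floor on the step size so marching always advances

def at(origin, direction, t):
    return tuple(o + d * t for o, d in zip(origin, direction))

def sphere(centre, radius):
    """Distance function of a sphere, used here as a stand-in scene part."""
    return lambda p: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, centre))) - radius

def march_bit(sdf, origin, direction, k, t_max, max_steps=256):
    """Return (t1, t2) for one scene part; either can be None."""
    t, t1, t2 = MIN_STEP, None, None
    for _ in range(max_steps):               # phase 1: surface enters the cone
        if t >= t_max:
            break
        d = sdf(at(origin, direction, t))
        if d < k * t:
            t1 = t
            break
        t += max((d - k * t) / (1 + k), MIN_STEP)
    if t1 is None:
        return None, None
    t_end = min(t1 * 1.3, t_max)
    for _ in range(max_steps):               # phase 2: cone completely inside
        if t > t_end:
            break
        d = sdf(at(origin, direction, t))
        if d < -k * t:
            t2 = t
            break
        t += max(abs(d), k * t, MIN_STEP)
    return t1, t2

def prepass_tile(sdfs, origin, direction, k, t_max):
    """Return (visibility mask, minimum t1, minimum t2) for one tile."""
    mask, t_near, t_occ = 0, math.inf, math.inf
    for bit, sdf in enumerate(sdfs):
        # Anything starting behind a full occluder is hidden, so stop there.
        t1, t2 = march_bit(sdf, origin, direction, k, min(t_max, t_occ))
        if t1 is None:
            continue
        if t2 is not None and t2 < t_near:
            mask = 0                         # this part hides everything found so far
        mask |= 1 << bit
        t_near = min(t_near, t1)
        if t2 is not None:
            t_occ = min(t_occ, t2)
    return mask, t_near, t_occ

near, far = sphere((0, 0, 5), 2), sphere((0, 0, 20), 1)
mask, t_near, t_occ = prepass_tile([far, near], (0, 0, 0), (0, 0, 1), 0.01, 100.0)
```

In this toy scene the near sphere covers the whole cone, so the far sphere is dropped from the mask no
matter which order the bits are tested in.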

Now we have a good mask (for non-edge tiles only one object), and a good starting distance.
!But we also have a bounding sphere for the tile!
(maybe - for some edge tiles (t2>t1*1.3) or thin stuff - the sphere is too big to make sense)

So let's go on optimizing shadows:
-Beam-raymarch again for each bit from lightsource towards the tile bounding sphere.
-Again find both t1 and t2 (t2 stepping is slightly different, doing longer steps inside)
-If a t2 is found we know the whole tile is in shadow and there is no need to march shadow rays in the mainpass!
-Otherwise save the shadow mask and the maximum t1 (lightsource distance minus t1)
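
A small sketch of the shadow part under simplifying assumptions: the light is directional, so the beam is
a cylinder with the radius of the tile bounding sphere, marched from a point light_dist away towards the
sphere. FULLY_SHADOWED, the quarter-radius minimum step and the function names are invented for the
example, and the "longer steps inside" trick is left out.

```python
import math

FULLY_SHADOWED = -1  # hypothetical "special value" for the shadow mask

def sphere(centre, radius):
    """Distance function of a sphere, used here as a stand-in occluder."""
    return lambda p: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, centre))) - radius

def beam_march(sdf, start, direction, radius, t_end, max_steps=256):
    """Return (t1, blocked): first beam contact, and whether the beam is fully covered."""
    t, t1 = 0.0, None
    for _ in range(max_steps):
        if t >= t_end:
            break
        p = tuple(s + v * t for s, v in zip(start, direction))
        d = sdf(p)
        if d < -radius:
            return t, True                   # whole beam cross-section is inside the part
        if d < radius and t1 is None:
            t1 = t                           # part touches the beam: possible occluder
        t += max(d - radius, 0.25 * radius)
    return t1, False

def shadow_prepass(sdfs, centre, radius, to_light, light_dist):
    """Return (shadow mask, maximum occluder distance) for one tile bounding sphere."""
    start = tuple(c + l * light_dist for c, l in zip(centre, to_light))
    direction = tuple(-l for l in to_light)
    mask, t1_min = 0, math.inf
    for bit, sdf in enumerate(sdfs):
        t1, blocked = beam_march(sdf, start, direction, radius, light_dist - radius)
        if blocked:
            return FULLY_SHADOWED, 0.0       # no shadow rays needed in the main pass
        if t1 is not None:
            mask |= 1 << bit
            t1_min = min(t1_min, t1)
    # The occluder nearest the light gives the longest shadow ray needed from the tile.
    return mask, (light_dist - t1_min if mask else 0.0)

big, small = sphere((0, 10, 0), 3), sphere((0, 10, 0), 0.2)
result_big = shadow_prepass([big], (0, 0, 0), 0.5, (0, 1, 0), 50.0)
result_small = shadow_prepass([small], (0, 0, 0), 0.5, (0, 1, 0), 50.0)
```

Missing a t2 because of the coarse stepping is harmless here: the tile just falls back to per-pixel
shadow rays limited by the mask and the occluder distance.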

Current prepass shader function is around 270 bytes compressed.

So, in the main per-pixel pass we have:
-A good starting position for each ray
-For most tiles we only have one object in our sdf (rest is masked out)
-For many all-shadowed tiles we don't have to trace shadow rays at all
-For the all-lit tiles the number of shadow steps is usually very low (rays break out of the tile bounding sphere)
-Usually only a few objects in our sdf for the shadow rays.
-Fewer objects in the sdf mean not only faster evaluation but also longer steps.
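
The last point is easy to see in a toy example. Below is a plain sphere tracer over a masked sdf in the
'if (mask&2)' style; the two spheres, FAR and the trace function are invented for illustration. The ray
hits the far sphere but passes close to the near one, so with both bits set the steps shrink while the
ray squeezes past, and with only the visible bit set it gets there almost immediately.

```python
import math

FAR = 1e9  # assumed "nothing here" distance when every part is masked out

def sphere_dist(p, centre, radius):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, centre))) - radius

def scene_sdf(p, mask):
    """Only evaluate the scene parts whose bit is set in the tile mask."""
    d = FAR
    if mask & 1:
        d = min(d, sphere_dist(p, (1.2, 0.0, 5.0), 1.0))   # near part, ray only passes by
    if mask & 2:
        d = min(d, sphere_dist(p, (0.0, 0.0, 20.0), 1.0))  # far part, ray hits it
    return d

def trace(origin, direction, mask, t_start=0.0, t_max=100.0, eps=1e-3):
    """Sphere-trace and return (hit distance or None, number of sdf evaluations)."""
    t, steps = t_start, 0
    while t < t_max and steps < 256:
        p = tuple(o + v * t for o, v in zip(origin, direction))
        d = scene_sdf(p, mask)
        steps += 1
        if d < eps:
            return t, steps
        t += d
    return None, steps

t_all, steps_all = trace((0, 0, 0), (0, 0, 1), 0b11)
t_one, steps_one = trace((0, 0, 0), (0, 0, 1), 0b10)
```

Both calls find the same surface, but steps_one is much smaller than steps_all. Passing the tile's
minimum depth as t_start cuts the step count further.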

Prepass is not exactly free (10% or so), so it may make more sense to go to a slightly larger tile size.