This is my MSc industry project. The goal is to take low-quality depth maps (256×256, 8-bit) from FaceLift's 3DGS face reconstruction and restore them to 1024×1024, 16-bit: basically ×4 spatial super-resolution (SR) plus bit-depth recovery.
I built the full pipeline from scratch: data collection, FaceLift inference, postprocessing, training, and evaluation. Everything runs on a single RTX 4070 Laptop (8 GB VRAM).
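To give a feel for the bit-depth half of the task, here is a minimal sketch of how much precision an 8-bit depth map has lost relative to a 16-bit one. The helper names `to_8bit` and `to_16bit` are hypothetical, and the plain linear mapping is an assumption; the normalization actually used in the pipeline may differ.

```python
def to_8bit(v16: int) -> int:
    """Quantize a 16-bit depth value (0..65535) to 8 bits, rounding to nearest."""
    return (v16 + 128) // 257


def to_16bit(v8: int) -> int:
    """Expand an 8-bit value back to 16 bits; 255 maps exactly to 65535."""
    return v8 * 257


# Worst-case round-trip error over the whole 16-bit range (assumed linear mapping).
max_error = max(abs(v - to_16bit(to_8bit(v))) for v in range(65536))

# Spatial upscaling factor implied by 256x256 -> 1024x1024.
scale = 1024 // 256

print(max_error)  # 128
print(scale)      # 4
```

Under this assumption, each 8-bit level covers 257 of the 16-bit levels, so a naive expansion can be off by up to 128 levels (about 0.2% of full range) before any spatial upsampling is considered.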
11-minute voiceover walkthrough of the implementation
The main finding is that 3DGS rendering degradation is different from the standard bicubic degradation that existing SR benchmarks assume. A 5×5 cross-degradation test shows up to 5.7 dB PSNR drop when you train on one type and test on the other. DORNet (CVPR 2025 SOTA) trained on NYU barely beats plain bicubic on our data.
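For reference, a rough sketch of how the cross-degradation drop above could be computed. `psnr` follows the standard definition; `cross_degradation_drop` and the `scores[train][test]` layout are hypothetical and are not taken from the repo's evaluation scripts. The toy numbers are made up for illustration only and are not results from this project.

```python
import math
from typing import Dict, Sequence


def psnr(pred: Sequence[float], target: Sequence[float], peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized value lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def cross_degradation_drop(scores: Dict[str, Dict[str, float]]) -> float:
    """Largest PSNR loss of a mismatched model relative to the matched one.

    scores[train][test] is the PSNR of a model trained on degradation
    `train` and evaluated on degradation `test` (assumed layout).
    """
    worst = 0.0
    for train, row in scores.items():
        for test, value in row.items():
            if train != test:
                worst = max(worst, scores[test][test] - value)
    return worst


# Made-up 2x2 example; the real test in this project is 5x5.
toy_scores = {
    "a": {"a": 40.0, "b": 35.0},
    "b": {"a": 38.0, "b": 41.0},
}
print(cross_degradation_drop(toy_scores))  # 6.0
```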
The other interesting result: pixel metrics (PSNR) don't tell the whole story. PixelShuffle-based methods (EDSR, SRResNet, SGNet) get higher PSNR than our UNet, but when you convert the depth to a 3D mesh, our UNet has an F-score of 0.999 vs their ~0.65. That's a 35 percentage-point gap that's completely invisible in PSNR.
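As a point of reference, the F-score at a distance threshold is commonly defined as the harmonic mean of precision (predicted points close to the ground truth) and recall (ground-truth points close to the prediction). The sketch below uses that common definition on small point lists with a brute-force nearest-neighbour search; it is an assumption about the metric, not a copy of `eval_mesh_quality.py`, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _fraction_within(src: Sequence[Point], dst: Sequence[Point], tau: float) -> float:
    """Fraction of points in `src` whose nearest neighbour in `dst` is within tau."""
    hits = sum(1 for p in src if min(math.dist(p, q) for q in dst) <= tau)
    return hits / len(src)


def f_score(pred: Sequence[Point], gt: Sequence[Point], tau: float = 1e-3) -> float:
    """Harmonic mean of precision and recall at distance threshold tau."""
    if not pred or not gt:
        return 0.0
    precision = _fraction_within(pred, gt, tau)
    recall = _fraction_within(gt, pred, tau)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_points = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]  # one point far off the surface
print(f_score(pred_points, gt_points))  # 0.5
```

The brute-force search is quadratic in the number of points, so a real evaluation on dense meshes would normally use a spatial index such as a k-d tree.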
I also found that 3DGS-rendered normals can't be used as training supervision (per-splat aliasing makes them noisy). Using DSINE pseudo-GT normals instead works much better.
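To make the idea of normal supervision concrete, here is a minimal sketch of one common way to compare a depth prediction against reference normals: derive a normal from the depth map by central differences, then penalize the angle to the reference normal with a cosine loss. Both functions are hypothetical illustrations with unit pixel spacing assumed; the loss actually used with the DSINE normals in this project may be different.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def depth_to_normal(depth: List[List[float]], y: int, x: int) -> Vec3:
    """Unit surface normal at interior pixel (y, x) from central differences."""
    dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
    dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
    n = (-dzdx, -dzdy, 1.0)
    length = math.sqrt(sum(c * c for c in n))
    return (n[0] / length, n[1] / length, n[2] / length)


def cosine_normal_loss(n_pred: Sequence[float], n_ref: Sequence[float]) -> float:
    """1 - cos(angle) between two unit normals; 0 when they agree exactly."""
    return 1.0 - sum(a * b for a, b in zip(n_pred, n_ref))


# A flat depth patch and a ramp that rises by 1 per pixel along x.
flat = [[1.0] * 3 for _ in range(3)]
ramp = [[float(x) for x in range(3)] for _ in range(3)]

n_flat = depth_to_normal(flat, 1, 1)  # points straight at the camera
n_ramp = depth_to_normal(ramp, 1, 1)  # tilted 45 degrees about the y axis
loss = cosine_normal_loss(n_flat, n_ramp)
```

Because the normal depends on depth differences between neighbouring pixels, noisy reference normals translate directly into noisy gradients, which is consistent with the observation above that aliased 3DGS normals make poor supervision.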

    pip install -r requirements.txt
    # Train the main model (~2h on RTX 4070)
    python scripts/train_depth_upres.py --batch_size 2 --grad_accum 4 --epochs 100 --amp

The pipeline scripts in scripts/ are meant to be run in order. See run_pipeline.py for the full sequence. There's a manual step (FaceLift rendering in FaceLift/01_render_improve.ipynb) between steps 4 and 7.
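A note on the `--batch_size 2 --grad_accum 4` flags in the training command: assuming they have the usual meaning (micro-batches of 2, with gradients accumulated over 4 steps before each optimizer update), the effective batch size is 8. The toy sketch below, which uses a one-parameter linear model and hypothetical helper names rather than the actual training loop, shows why averaging micro-batch gradients matches the full-batch gradient for a mean loss.

```python
from typing import Sequence


def grad_mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: Sequence[float], ys: Sequence[float],
                     micro_batch: int, accum_steps: int) -> float:
    """Sum of micro-batch gradients, each scaled by 1/accum_steps."""
    total = 0.0
    for step in range(accum_steps):
        lo = step * micro_batch
        hi = lo + micro_batch
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total


xs = [float(i) for i in range(1, 9)]  # 8 samples = 2 per micro-batch x 4 steps
ys = [2.0 * x + 1.0 for x in xs]
w = 0.5

full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch=2, accum_steps=4)
effective_batch = 2 * 4
```

Only one micro-batch has to be held in memory at a time, which is presumably why this setup fits on an 8 GB GPU.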

    scripts/          All pipeline + training + eval scripts
    notebooks/        Jupyter notebooks (training, baselines, ablations, figures)
    FaceLift/         FaceLift 3DGS code (upstream, not modified by me)
    external/SGNet/   SGNet baseline (AAAI 2024)
    eval/             CSV results for all tables in the report
    configs/          Pipeline config
    paper/            LaTeX source
    figures/          Generated paper figures

Key scripts:
- `train_depth_upres.py`: main model training (DepthUpResUNet, 7.77M params)
- `train_sgnet.py`: SGNet baseline training
- `eval_mesh_quality.py`: F-score and Hausdorff evaluation
- `postprocess_maps.py`: depth normalization, hole filling, normal smoothing

| Method | Params | PSNR (dB) | F-score @1e-3 |
|---|---|---|---|
| Bicubic | — | 42.5 | 0.998 |
| EDSR | 1.52M | 46.8 | 0.637 |
| SRResNet | 1.53M | 47.3 | 0.647 |
| SGNet | 9.22M | 48.9 | 0.654 |
| UNet (ours) | 7.77M | 47.0 | 0.999 |
| UNet+DSINE | 7.78M | 47.2 | 0.999 |
SwinIR-tiny (0.23M) collapsed entirely (PSNR 11.95 dB) — it can't learn this task without a bicubic residual prior.
data/ and checkpoints/ are not included (too large). They're generated by running the pipeline scripts from step 1.
All experiments ran on a single RTX 4070 Laptop 8 GB. AMP FP16 is required — without it the model doesn't fit in VRAM. 13 complete training runs, ~62 GPU-hours total.