CTNet
Note: CTNet is an experimental proof of concept. Results are preliminary — all experiments use ImageNette2-320 on a single consumer GPU.
CTNet (Cosine Transform Network) compresses neural networks by reparameterizing convolutional layers into the Discrete Cosine Transform (DCT) domain and using H.265/HEVC video encoding as the compression backend. Instead of traditional pruning or fixed-bitwidth quantization, CTNet trains weights directly as DCT coefficients, regularized by a differentiable proxy of the H.265 bitrate cost. At export time, DCT coefficient maps are tiled into 2D frames and encoded as H.265 video streams.
Co-authored with Stylianos Iordanis.
CTNet-18 (ResNet-18 based) achieves 92.31% Top-1 on ImageNette2-320 with a total compressed model size of 4.5 MB — a 10.2x total compression from the original 44.6 MB, with DCT layers alone compressing 25-37x. Accuracy is only ~4% below the uncompressed baseline.
Key ideas
- DCT-domain parameterization: convolutional weights are learned directly as DCT coefficients, with spatial weights materialized via IDCT at each forward pass
- Differentiable H.265 rate proxy: a training-time regularizer that approximates H.265 bitrate cost, modeling significance maps, level coding, and zig-zag scan order — the same components HEVC uses internally
- Video codec export: DCT coefficient maps are reshaped into 2D images, normalized per-frame, and encoded as H.265 video streams — decoding is a single ffmpeg call
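The DCT-domain parameterization in the first bullet can be illustrated with a small, dependency-free sketch. It assumes a square kernel and an orthonormal DCT-II basis, neither of which this README confirms for CTNet itself; the helper names (`dct_matrix`, `idct2`, `dct2`) are hypothetical and are not part of the CTNet code. In a real training loop the same transform would presumably run inside the framework's autograd so that gradients reach the coefficients.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis C (n x n): C[k][i] = a_k * cos(pi * (2i + 1) * k / (2n))."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def idct2(coeffs):
    """2D inverse DCT: W = C^T X C (valid because C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def dct2(weights):
    """2D forward DCT: X = C W C^T."""
    c = dct_matrix(len(weights))
    return matmul(matmul(c, weights), transpose(c))

# Hypothetical learned parameters for one 3x3 kernel: only the DC coefficient is non-zero.
coeffs = [[3.0, 0.0, 0.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]

# Spatial weights materialized for the forward pass: a constant (box) kernel.
kernel = idct2(coeffs)

# Round trip back to the DCT domain recovers the coefficients up to float error.
recovered = dct2(kernel)
```

A kernel whose energy sits in a few low-frequency coefficients, as here, is the kind of sparse coefficient map that a rate penalty would favor and that a video codec can encode cheaply.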
No custom entropy coder is needed: CTNet reuses decades of video codec optimization, and hardware-accelerated H.265 decoding is available on virtually every modern device.
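The export path (tile, normalize per frame, hand off to ffmpeg) might look roughly like the sketch below. The tiling layout, the 8-bit grayscale pixel format, the min/max normalization, and all function names are assumptions made for illustration; this README does not specify the settings CTNet actually uses. The command is only constructed, not executed, and relies on standard ffmpeg options (`-f rawvideo`, `-pix_fmt`, `-s`, `-c:v libx265`).

```python
def tile_blocks(blocks, blocks_per_row):
    """Tile equally sized square blocks row-major into one 2D frame, zero-padding the tail."""
    n = len(blocks[0])
    block_rows = -(-len(blocks) // blocks_per_row)  # ceiling division
    frame = [[0.0] * (blocks_per_row * n) for _ in range(block_rows * n)]
    for idx, block in enumerate(blocks):
        r0 = (idx // blocks_per_row) * n
        c0 = (idx % blocks_per_row) * n
        for r in range(n):
            for c in range(n):
                frame[r0 + r][c0 + c] = block[r][c]
    return frame

def normalize_frame(frame):
    """Map a frame to 0..255 integers; return the pixels and the (lo, hi) range needed to invert."""
    flat = [v for row in frame for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant frame
    pixels = [[round(255 * (v - lo) / span) for v in row] for row in frame]
    return pixels, (lo, hi)

def ffmpeg_encode_cmd(width, height, raw_path, out_path):
    """Build (but do not run) an ffmpeg command that encodes raw gray frames with libx265."""
    return ["ffmpeg", "-f", "rawvideo", "-pix_fmt", "gray",
            "-s", f"{width}x{height}", "-i", raw_path,
            "-c:v", "libx265", out_path]

# Three hypothetical 2x2 DCT coefficient blocks, tiled two per row into a 4x4 frame.
blocks = [[[1, 0], [0, 0]],
          [[-1, 0], [0, 0]],
          [[0, 2], [0, 0]]]
frame = tile_blocks(blocks, blocks_per_row=2)
pixels, value_range = normalize_frame(frame)
raw_bytes = bytes(v for row in pixels for v in row)  # payload for the raw input file
cmd = ffmpeg_encode_cmd(len(frame[0]), len(frame), "layer.raw", "layer.mp4")
```

The per-frame `(lo, hi)` range has to be stored alongside the video stream so the decoder can map pixels back to coefficient values; how CTNet stores this side information is not described here.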