A relaxed group-wise splitting method (RGSM) is developed and evaluated for channel pruning of deep neural networks. Experiments with VGG-16 and ResNet-18 architectures on CIFAR-10/100 image data show that RGSM achieves much higher channel sparsity than the group Lasso method while maintaining comparable accuracy.
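A minimal sketch of the group-wise splitting step is below, assuming the group-L1 variant of RGSM and treating each output channel of a convolutional layer as one group; the hyperparameters lam and beta and this exact training loop are illustrative assumptions, not the paper's precise recipe.

```python
import torch

def rgsm_split_step(w: torch.Tensor, lam: float, beta: float) -> torch.Tensor:
    """One relaxed splitting update: given conv weights w of shape
    (out_channels, in_channels, k, k), return the auxiliary variable u
    minimizing lam * sum_g ||u_g||_2 + (beta / 2) * ||w - u||_2^2,
    i.e. group-wise soft-thresholding with each output channel as a group.
    (Illustrative group-L1 variant; lam and beta are assumed weights.)
    """
    groups = w.flatten(start_dim=1)                      # one row per channel/group
    norms = groups.norm(dim=1, keepdim=True).clamp_min(1e-12)
    scale = (1.0 - lam / (beta * norms)).clamp_min(0.0)  # shrink; small groups go to zero
    return (groups * scale).view_as(w)

# During training, w takes gradient steps on loss + (beta/2)||w - u||^2,
# while u is periodically refreshed by rgsm_split_step; channels whose
# u_g is exactly zero are the ones pruned at the end.
```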
Multi-resolution paths and multi-scale feature representation are key elements of semantic segmentation networks. We develop two techniques for efficient networks based on the recent FasterSeg network architecture. The first uses a state-of-the-art high-resolution network (e.g. HRNet) as a teacher to distill a lightweight student network. Because the teacher and student networks have dissimilar structures, distillation carried out directly in the standard way is ineffective. To solve this problem, we introduce a tutor network with an added high-resolution path, which helps distill a student that improves on the FasterSeg student while maintaining its parameter and FLOPs counts. The second technique replaces the standard bilinear interpolation in the upscaling module of the FasterSeg student network with a depth-wise separable convolution and a PixelShuffle module, which yields 1.9% (1.4%) mIoU improvement at low (high) input image sizes without increasing model size; a sketch follows.
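A sketch of the modified upscaling module, assuming 2x upscaling; the channel counts and the placement of this block inside FasterSeg are illustrative assumptions.

```python
import torch.nn as nn

class DWSPixelShuffleUp(nn.Module):
    """Replace bilinear interpolation with a depth-wise separable convolution
    followed by PixelShuffle (sub-pixel convolution). Illustrative sketch:
    channel counts and the 2x scale factor are assumptions."""
    def __init__(self, in_ch: int, out_ch: int, scale: int = 2):
        super().__init__()
        # Depth-wise 3x3 conv: one filter per input channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch, bias=False)
        # Point-wise conv expands channels by scale^2 so PixelShuffle can
        # rearrange them into a (scale x scale) larger spatial grid.
        self.pointwise = nn.Conv2d(in_ch, out_ch * scale * scale,
                                   kernel_size=1, bias=False)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.pointwise(self.depthwise(x)))
```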
A Fast Feature Affinity loss is developed for knowledge distillation on intermediate features. It incurs lower computational and storage costs than the original Feature Affinity loss. Experiments with modified EfficientNet architectures on CIFAR-100 data show that both the Feature Affinity loss and the Fast Feature Affinity loss improve network accuracy, with closely matched performance.
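A sketch contrasting the two losses, assuming the Feature Affinity loss compares pairwise-cosine affinity matrices of teacher and student features. The fast variant shown here exploits the trace identity ||F_t F_t^T - F_s F_s^T||_F^2 = ||F_t^T F_t||_F^2 + ||F_s^T F_s||_F^2 - 2||F_t^T F_s||_F^2, replacing the HW x HW affinity matrices with C x C Gram matrices; whether this matches the paper's exact formulation is an assumption.

```python
import torch
import torch.nn.functional as F

def fa_loss(ft: torch.Tensor, fs: torch.Tensor) -> torch.Tensor:
    """Feature Affinity loss: L2-normalize per-pixel features, then compare
    the (HW x HW) pairwise affinity matrices of teacher and student."""
    ft = F.normalize(ft.flatten(2).transpose(1, 2), dim=2)  # (B, HW, C)
    fs = F.normalize(fs.flatten(2).transpose(1, 2), dim=2)
    at = ft @ ft.transpose(1, 2)                            # (B, HW, HW)
    a_s = fs @ fs.transpose(1, 2)
    return (at - a_s).pow(2).mean()

def fast_fa_loss(ft: torch.Tensor, fs: torch.Tensor) -> torch.Tensor:
    """Same quantity via the trace identity: only (C x C) Gram matrices are
    formed, cutting compute and memory when HW >> C. (Assumed variant.)"""
    ft = F.normalize(ft.flatten(2).transpose(1, 2), dim=2)  # (B, HW, C)
    fs = F.normalize(fs.flatten(2).transpose(1, 2), dim=2)
    gtt = ft.transpose(1, 2) @ ft                           # (B, C, C)
    gss = fs.transpose(1, 2) @ fs
    gts = ft.transpose(1, 2) @ fs
    n = ft.shape[0] * ft.shape[1] ** 2                      # B * HW * HW entries
    return (gtt.pow(2).sum() + gss.pow(2).sum()
            - 2 * gts.pow(2).sum()) / n
```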
A compact DETR-based architecture is proposed for human-only detection. We first obtain a baseline model by replacing the backbone of DETR with MobileNet-V3 and shrinking the decoder. We then replace the transformer encoder with a convolutional encoder. Experiments show that convolution-based encoders achieve better performance with fewer FLOPs and parameters.
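A sketch of swapping DETR's transformer encoder for a convolutional one, assuming a simple stack of 3x3 conv blocks over the backbone feature map; the block count and hidden width are illustrative assumptions.

```python
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Drop-in replacement for DETR's transformer encoder: a stack of 3x3
    convolutional blocks over the (B, C, H, W) backbone feature map.
    Depth and width here are illustrative assumptions."""
    def __init__(self, dim: int = 256, num_blocks: int = 3):
        super().__init__()
        self.blocks = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(dim),
                nn.ReLU(inplace=True),
            ) for _ in range(num_blocks)
        ])

    def forward(self, x):
        # Keeps the (B, C, H, W) layout; flatten to a sequence before
        # feeding the (shrunken) transformer decoder, as in DETR.
        return self.blocks(x)
```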