Thesis
Advanced Deep Learning Methods for Image Pixel-Wise Prediction
Washington State University
Master of Science (MS), Washington State University
2018
Handle: https://hdl.handle.net/2376/102855
Abstract
Deep learning methods have been highly successful in image pixel-wise prediction tasks. One of the most popular approaches employs an encoder-decoder network with deconvolutional layers (DCL) for up-sampling feature maps. However, a key limitation of DCL is the checkerboard artifact problem, which decreases prediction accuracy. This artifact is caused by the independence among adjacent pixels on the output feature maps. Previous work has addressed the checkerboard artifact of DCL only in 2D space. Because the number of intermediate feature maps needed to build such a layer grows exponentially with the number of dimensions, solving the checkerboard artifact problem is much more challenging in higher-dimensional spaces. Furthermore, current deep learning methods make image pixel-wise predictions by applying a model to regular patches centered on each pixel; as a result, they are limited because patch shapes and sizes are determined by the network architecture rather than learned from the data.

To address these limitations, I proposed the voxel deconvolutional layer (VoxelDCL), which solves the checkerboard artifact problem of DCL in 3D space, and developed an efficient approach to implementing it. To demonstrate the effectiveness of VoxelDCL, four variations of voxel deconvolutional networks (VoxelDCNs) were built on the U-Net architecture, with VoxelDCL in place of DCL, and applied to volumetric brain image labeling tasks on the ADNI and LONI LPBA40 datasets. The experimental results show that one variation, iVoxelDCNa, achieved the best performance across all experiments, reaching dice ratios of 83.34% on the ADNI dataset and 79.12% on the LONI LPBA40 dataset, increases of 1.39% and 2.21%, respectively, over the classic U-Net baseline. Moreover, all VoxelDCN variations outperformed the baseline method on both datasets, demonstrating their effectiveness.

To relax the constraint of patches with fixed shapes and sizes, I also proposed dense transformer networks (DTNs), which dynamically learn the shapes and sizes of patches from the input data. Experimental studies applying DTNs to natural and biological image pixel-wise prediction tasks likewise demonstrated superior performance over the baseline method.
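The core idea behind VoxelDCL, as described in the abstract, is to generate the intermediate feature maps so that adjacent output voxels are no longer computed independently, and then interleave those maps into the upsampled volume. Below is a minimal sketch of that idea, assuming PyTorch; the class name VoxelDCLSketch, the channel sizes, and the exact dependency pattern among the eight intermediate maps are illustrative assumptions, not the thesis implementation.

```python
# A minimal sketch of the VoxelDCL idea, assuming PyTorch. The dependency
# pattern among intermediate maps is a simplification for illustration.
import torch
import torch.nn as nn


class VoxelDCLSketch(nn.Module):
    """Upsample a 3D feature map by 2x in depth, height, and width.

    A standard transposed convolution computes its 8 output sub-grids
    independently, which is what produces checkerboard artifacts. Here the
    8 intermediate maps are generated sequentially, each conditioned on the
    previously generated ones, and then interleaved into the output volume.
    """

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # First intermediate map comes from the input alone.
        self.conv_first = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        # Each later map sees the input plus all earlier intermediate maps,
        # introducing dependencies among adjacent output voxels.
        self.convs = nn.ModuleList([
            nn.Conv3d(in_ch + k * out_ch, out_ch, kernel_size=3, padding=1)
            for k in range(1, 8)
        ])

    def forward(self, x):
        maps = [self.conv_first(x)]
        for conv in self.convs:
            maps.append(conv(torch.cat([x] + maps, dim=1)))

        # Interleave the 8 maps into a (2D, 2H, 2W) volume: map index k with
        # binary digits (kd, kh, kw) fills the output sub-grid whose
        # depth/height/width parities are (kd, kh, kw).
        n, c, d, h, w = maps[0].shape
        out = x.new_zeros(n, c, 2 * d, 2 * h, 2 * w)
        for k, m in enumerate(maps):
            kd, kh, kw = (k >> 2) & 1, (k >> 1) & 1, k & 1
            out[:, :, kd::2, kh::2, kw::2] = m
        return out


if __name__ == "__main__":
    # Example: upsample a 32-channel 8x8x8 volume to 16x16x16.
    layer = VoxelDCLSketch(in_ch=32, out_ch=32)
    x = torch.randn(1, 32, 8, 8, 8)
    print(layer(x).shape)  # torch.Size([1, 32, 16, 16, 16])
```

Because the later intermediate maps are convolved from the earlier ones before interleaving, neighboring voxels in the upsampled output are statistically coupled, which is the property the abstract credits with removing the checkerboard artifact. Note that 2x upsampling in 3D already requires 2^3 = 8 intermediate maps, illustrating the exponential growth with dimensionality mentioned above.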
Details
- Title
- Advanced Deep Learning Methods for Image Pixel-Wise Prediction
- Creators
- Yongjun Chen
- Contributors
- Shuiwang Ji (Degree Supervisor)
- Awarding Institution
- Washington State University
- Academic Unit
- School of Electrical Engineering and Computer Science
- Collection
- Theses and Dissertations
- Degree
- Master of Science (MS), Washington State University
- Publisher
- Washington State University, Pullman, Washington
- Identifiers
- 99900525175901842
- Language
- English
- Resource Type
- Thesis