Non-Local DWI Image Super-Resolution with Joint Information Based on GPU Implementation

2019-12-19YanfenGuoZheCuiZhipengYangXiWuandShaahinMadani

Computers Materials&Continua 2019年12期

Yanfen Guo ,Zhe Cui ,Zhipeng Yang,Xi Wu and Shaahin Madani

Abstract: Since the spatial resolution of diffusion weighted magnetic resonance imaging (DWI) is subject to scanning time and other constraints,its spatial resolution is relatively limited.In view of this,a new non-local DWI image super-resolution with joint information method was proposed to improve the spatial resolution.Based on the non-local strategy,we use the joint information of adjacent scan directions to implement a new weighting scheme.The quantitative and qualitative comparison of the datasets of synthesized DWI and real DWI show that this method can significantly improve the resolution of DWI.However,the algorithm ran slowly because of the joint information.In order to apply the algorithm to the actual scene,we compare the proposed algorithm on CPU and GPU respectively.It is found that the processing time on GPU is much less than on CPU,and that the highest speedup ratio to the traditional algorithm is more than 26 times.It raises the possibility of applying reconstruction algorithms in actual workplaces.

Keywords: Super-resolution,non-local means,parallel computing,GPU.

1 Introduction

Diffusion Weighted Imaging (DWI) is a special imaging method based on Magnetic Resonance Imaging (MRI) by modifying the main orientation of the MRI magnetic field to acquire multiple diffusion orientation images.DWI utilizes multiple 3-dimensional diffusion-weighted images to probe the water diffusivity along various directions to provide important structural information or functional features about the underlying tissue.[Hasan,Walimuni,Abid et al.(2011);Johansen-Berg and Behrens (2013);Yao and Troupis (2016)].Its importance has been proven in clinical settings for investigating several brain disorders and tumors,such as Alzheimer's disease,schizophrenia,breast tumor,liver and neck cancer [Minosse,Marzi,Piludu et al.(2017);Aribal,Asadov,Ramazan et al.(2016);Ciccarone,Gulino,Esposito et al.(2016);Hejduk,Jedrzejewska,Billewicz-Bobek et al.(2011)].

In spite of all DWI advantages,the use of this technique is still very limited because an important characteristic of DWI is the low spatial resolution and the low signal-to-noise ratio.On the other hand,the acquisition of a set of high-resolution images is constrained by scanning costs,scanning time,scanner availability and patient comfort.In the clinical practice,DWI is usually acquired at a low-resolution.It can provide a better sensitivity for the analysis of brain structure and clinical disease by the improvement of DWI spatial resolution and high SNR [Mori and van Zijl (2002);Zeineh,Holdsworth,Skare et al.(2012);Coupé,Manjón,Chamberland et al.(2013)].Among the post-processing processes that improve the resolution of images,the Super-Resolution (SR) appears as an efficient tool to enhance the information resolution [Vemulapalli,Nguyen and Zhou et al.(2015)].With the development of SR,its methods can be explored from two stages:constrained reconstruction method and learning based method [Yue,Shen,Li et al.(2016)].The former is based on sampling theory for image registered multimodal pairs,and the latter focuses on leaning the end-to-end mapping function of LR/HR image pairs.In the pioneer work,example-based SR methods was widely used.These methods focus on learning a compact dictionary or manifold space to relate LR/HR image pairs and presume the high-frequency (HF) details of LR images [Wang,Qiao,Li et al.(2014)].Trinh et al.[Trinh,Luong and Dibos et al.(2014)] utilized a non-negative sparse linear representation of the input patch over the LR patches to estimate an HR image.Roy et al.[Roy,Carass,Prince (2013)] proposed an image restoration technique called MR image example-based contrast synthesis (MIMECS),which relied on sparse reconstruction from image patches.

This type of non-local approach has high complexity and long computation time due to the patch-based image traversal,and it cannot be used in actual scenes.In recent years,the functions of GPU have been gradually extended from specialized graphics processing core to high-performance computing in computer systems.In the operation of SR algorithm,there is no interdependent relationship between data computation and,therefore,it has good parallelism and is suitable for processing on GPU [Eklund,Dufort,Forsberg et al.(2013)].Medical image processing technology on GPU makes it possible to apply more advanced algorithms and perform computationally intensive tasks quickly in a clinical environment.Hence,medical image processing on GPU has become a very popular technology [Pratx and Xing (2011)].

In this paper,we propose a super-resolution method based on Non-Local Mean (NLM) using collaborative joint information to solve the SR problem in DWI dataset.Simultaneously,in order to improve the computational efficiency,the algorithm proposed in this paper is transplanted from CPU to GPU,which is implementation through CUDA language.

2 Related work

2.1 Super reconstruction with non-local mean

Image SR can be regarded as an ill-posed inverse problem,which requires to build an appropriate model for the low-resolution (LR) image Y and the high-resolution (HR) image X.The general model can be expressed as follows:

where D is the decimation operator,H is the degradation operator,and η is acquisition noise.The SR reconstruction problem is to estimate the underlying HR version X from Y as follows:

Based on this model,the SR image can be estimated by minimizing a least square cost function as follows:

The non-local mean method can be used into an SR reconstruction,the NLM expression was defined as

where Xpand Xqare the voxels at location p and q,w is a searching window,w weighs the similarity between patch S(Xp) and S(Xq) below:

In which h is the decay parameter,Zpis a normalization factor and is defined as the sum of all the weights below:

As shown in Manjón et al.[Manjón,Coupé,Buades et al.(2010);Zhou,Qiu,Li et al.(2018)],after computing this regularization term,the fidelity term is then applied for subsampling consistency [Banerjee and Jawahar (2008)]:

Finally,this non-local interpolation framework is implemented iteratively until convergence,and the two-step iteration can be defined as:

where NN is the nearest neighbor interpolation,and t is the iteration number.Eqs.(7) and (8) correspond to the non-local reconstruction and mean correction,respectively.

This non-local SR method was proposed primarily for 3D MRI,and was then adapted for DWI application,which uses b0 information as the HR reference to guide the reconstruction [Coupé,Manjón,Chamberland et al.(2013)].

On the other hand,joint information,as indicated before,gathers the information from all correlated gradient images,providing extra redundancy,which is beneficial for SR reconstruction.Let XpNand XqNdenote the DWI patches;and N corresponds to the Nth gradient direction.As shown in Manjón et al.[Manjón,Coupé,Buades et al.(2012);Chang,El-Araby,Dang et al.(2014)],a more efficient non-local estimation can be acquired through a more accurate weighting scheme.Intuitively,it is possible for joint information from correlation gradient directions to improve weighting accuracy,which leads to better SR reconstruction:

It should be noted that,the similarity measure in Eq.(10) is not rotationally invariant.As pointed out in Manjón et al.[Manjón,Coupé,Buades et al.(2012)],the rotationally invariant measure can be applied to the proposed SR method for further improvement of the high-resolution image reconstruction.Manjon et al.[Manjón,Coupé,Buades et al.(2012)] proposed a simple but effective rotationally invariant measure,which is based on voxel intensity and the corresponding patch means.In this work,we adapted this rotationally invariant measure into our non-local SR method with joint information of DWI and,therefore,the similarity measure in Eq.(10) can be rewritten as:

where μ is the mean of the patches around voxel XpN(or XqN) and its corresponding n nearby gradient directions.

As shown in (8),the mean correction step in order to ensure that the reconstructed high-resolution image will be consistent with the original low-resolution image can be represented as:

where NN is the nearest neighbor interpolation,H is the degradation operator,D is the decimation operator andis the reconstructed high-resolution image.

2.2 Implementation based on GPU acceleration

With the rapid development of Graphics Processing Unit (GPU),the computational performance of the algorithm has been greatly improved.GPU is very suitable for big data and parallel computing tasks.Compute Unified Device Architecture (CUDA) is a parallel programming development model running on GPU,which is an extension of the C language.In the CUDA paradigm,a parallel task is executed by launching a multi-threaded program called a kernel.The kernel in the host is executed on the hardware according to the thread grid,which contains multiple thread blocks,which in turn contain multiple threads.When the kernel function starts on the host,its execution migrates to the device.The computation of a kernel is distributed to many threads,which are grouped into a grid of blocks (Fig.1).GPU acceleration of medical image reconstruction algorithms has become much more common during recent years.Many research teams are interested in using GPU to visualize medical datasets in real-time,taking advantage of GPU's inherent graphics capabilities.GPU computing is also used in medical image reconstruction.Optical imaging and microscopy have also started to take advantage of GPUs,mainly for acceleration of demanding reconstruction algorithms [Chang,El-Araby,Dang et al.(2014)].The SR algorithm has a large amount of computing tasks and requires repeated calculation of a large amount of data.In addition,the various data computation tasks necessary to be performed during the execution of the algorithm do not depend on each other,which makes the algorithm suitable for transplanting to GPU platforms.

Figure 1:It is a two-layer thread hierarchy consisting of thread blocks and thread block grids.All threads in the same grid share the same global memory space.Thread collaboration within the same thread block can be achieved through synchronization and shared memory

It can be seen from the above formula that the SRNLM algorithm has a huge amount of calculation and a high time complexity.Considering the ability of multithreading parallel computing from GPU,the repetitive computing needs to be mapped to the GPU platform for parallel processing.The flow (Fig.2) of parallelization using CUDA for SRNLM in this article is as follows.

On CPU,it needs to initialize and load low-resolution image.Afterwards,when GPU allocates memory space,it copies the data into its display memory space.

On the device,it firstly gets texture memory and allocates array memory in CUDA.It defines the kernel function (SRNLM Kernel) on the device and calls it to map the image to each thread in pixels.According to the ability of graphics card computing,blocks and grids are set up so that GPU can carry out the SRNLM algorithm in parallel on several pixel points simultaneously according to the thread block.It needs to loop through the kernel function until all image reconstruction is done.

At last,it returns the reconstruction result from display memory to the CPU.The result is integrated and normalized on the CPU,and then the final reconstruction image is obtained.

Figure 2:The Serial code is executed on CPU of the host,and the parallel code is executed on GPU of the device

3 Experiments and results

3.1 Experiments based on SRNLM

A high field in vivo DWI dataset is selected as the experimental data.In addition,B-spline interpolation,which was introduced for DWI resolution enhancement in the literature [Raffelt,Tournier,Rose et al.(2012)] is used.

The in vivo DWI dataset was acquired by a 7T Philips Achieva whole body scanner (Philips Healthcare,Cleveland,OH) with a volume head coil for transmission and 32-channels.A DW dual spin-echo,SENSE accelerated msh-EPI was used to acquire the DWI data (b-value:700 s/mm2;15 diffusion directions);FOV=210×30×21 mm3;matrix size=300×300 with 15 slices and a spatial resolution of 0.7×0.7×2 mm3.A gold standard image was constructed based on this in vivo HR DWI dataset to quantitatively and qualitatively validate the proposed approach.To do that,we averaged 10 acquisitions of high-resolution DW images in the image space (0.7×0.7×2 mm3).Then the LR images used for the experiment were simulated by down-sampling our gold standard by a factor of 2 using the nearest-neighbor interpolation along each axis,which resulted in simulated LR images of 1.4×1.4×4 mm3.Both HR and LR data were filtered using the UNLM3D filter to remove random noise before high-resolution reconstruction.Two objective measure matrixes,namely the Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM) were used to quantitatively evaluate the super-resolved DWI dataset.The PSNR measures the differences between each images and image quality and SSIM measures the structure and perceptual similarities between the original and reconstructed images.

where μxand μyare the mean value of images x and y,σxand σyare standard deviation of x and y,respectively,σxyis the covariance between them,and constants c1and c2are chosen as suggested in Wang et al.[Wang,Bovik,Sheikh et al.(2004)].

For the in vivo dataset,the DWI reconstruction results were compared quantitatively and qualitatively in Fig.3 and Fig.4.For our proposed method,the super-resolution using joint information enhanced the reconstructed image quantitatively for every DWI image with different directions.In addition,both PSNR and SSIM improved with the increasing involvement of nearby DWI images.Fig.4 shows the visual comparison of the reconstructed DWI images.Reconstructed image (Fig.4(b)) has the blurriest result of all;the proposed method using seven nearby gradient images (Fig.4(e)) achieved the image most similar to the golden standard (Fig.4(a)).The crack in the enlarged region is still clear in the image using the proposed method (Fig.4(e)) compared with other method without any involvement of joint information.

Figure 3:PSNR and SSIM estimated between the gold standard and the images reconstructed from the simulated LR image.(a) Plots show the PNSR for the methods compared.(b) Plots show the SSIM for the methods compared

Figure 4:Tests of diffusion weighted image reconstruction obtained for different methods.Top:(a) The gold standard.(b-e) Result of B-spline reconstruction,Non-Local upsampling,Proposed-5 grad and Proposed-RI-5 grad.Bottom:(f-j) The enlarged details of the gold standard,B-spline reconstruction,Non-Local upsampling,Proposed-5 grad and Proposed-RI-5 grad.The red ROIs indicate the detailed reconstruction

Therefore,the impact of misalignments on the quality of results using the proposed method was studied.First,the displacements between b0 and DW images were obtained with an FSL eddy current correction [Smith,Jenkinson and Woolrich (2004)] and the mean displacements estimated from the reconstruction results are proposed in Fig.5(a).Next,Fig.5(b) demonstrates the correlation between image quality in terms of PSNR and the estimated mean displacements.It can be seen that there is no significant linear relationship between the results from the proposed method and the estimated mean displacements,which demonstrates the robustness of the proposed method to the limited misalignment.

Figure 5:Test of misalignments on the quality of results (a) Estimated mean displacement in mm using FSL eddy current correction.(b) PSNR obtained with proposed-RI-5 grad according to estimated mean displacement.No significant linear relation was found (p-value=0.51)

3.2 Experiments based on GPU

In this section,the device parameters of CPU and GPU are respectively as follows.The CPU is Intel core i7-4600 U with 8 computing cores,and the main frequency is 2.9 GHz.The GPU is NVIDIA Tesla M40 based on Maxwell architecture,which has 3,072 computing cores and 12 GB storage capacity.The SRNLM algorithm was respectively run on CPU and GPU,which should reconstruct five times of super resolution.

The experimental results show that the parallel SRNLM algorithm can get a speed ratio of 26 times as fast as the serial algorithm on the basis of maintaining the algorithm performance.The results of different image processing for a matrix size of 128×128,60 slices and a single direction are shown in Tab.1.

Table 1:The calculating time for a single direction from CPU and GPU

where the speedup ratio is defined as

In which TCis the calculating time from CPU,and TPis the calculating time from GPU.And then,we did experiments for the same matrix with 32 directions.The results are presented in Tab.2.

Table 2:The Calculating Time for 32 Directions from CPU and GPU

It can be seen from Tab.1 and Tab.2 that the processing time on GPU is significantly lower than on CPU,and the highest speedup ratio to the traditional algorithm is more than 26 times.It raises the possibility of applying reconstruction algorithms in actual workplaces.

4 Discussion and conclusion

Firstly,compared with the traditional interpolation algorithm,the proposed framework introduced joint information to improve the weighting scheme yet with a better image reconstruction.Meanwhile,the reconstruction of high-resolution image was further improved by introducing rotationally invariant similarity measure,which ensures a more accurate regularization procedure exists in SR while effectively reducing the computational burden.Secondly,GPU has the advantage of fast computing speed.Considering the high computational complexity of SRNLM reconstruction algorithm,it is transplanted from CPU to GPU.The experimental results show that the parallel SRNLM algorithm can achieve a speed of 19 times faster than the serial algorithm.

Finally,our next goal is to conduct research on multi-GPU parallel computing to improve the processing efficiency of massive data with the popularization of general GPU and the appearance of massive data.

Acknowledgement:This work was supported by the Youth Foundation of Education Department in Sichuan (Grant No.2017JQ0030).

References

Aribal,E.;Asadov,R.;Ramazan,A.;Kaya,H.(2016):Multiparametric breast MRI with 3T:effectivity of combination of contrast enhanced MRI,DWI and 1H single voxel spectroscopy in differentiation of breast tumors.European Journal of Radiology,vol.85,no.5,pp.979-986.

Banerjee,J.;Jawahar,C.V.(2008):Super-resolution of text images using edge-directed tangent field.Proceedings of the 8th IAPR International Workshop on Document Analysis Systems,pp.76-83.

Chang,L.C.;El-Araby,E.;Dang,V.Q.;Dao,L.H.(2014):GPU acceleration of nonlinear diffusion tensor estimation using CUDA and MPI.Neurocomputing,vol.135,pp.328-338.

Ciccarone,A.;Gulino,P.;Esposito,M.;Defilippi,C.(2016):Fibrosing liver diseases in paediatric age:MRI investigation by diagnostic maps of slow diffusion and fast diffusion generated from a multiple b values DWI.Physical Medical,vol.32,no.1,pp.126-134.

Coupé,P.;Manjón,J.V.;Chamberland,M.;Descoteaux,M.;Hiba,B.(2013):Collaborative patch-based super-resolution for diffusion-weighted images.NeuroImage,vol.83,pp.245-261.

Eklund,A.;Dufort,P.;Forsberg,D.;LaConte,S.M.(2013):Medical image processing on the GPU-past,present and future.Medical Image Analysis,vol.17,pp.1073-1094.

Hasan,K.M.;Walimuni,I.S.;Abid,H.;Klaus,R.(2011):A review of diffusion tensor magnetic resonance imaging computational methods and software tools.Computers in Biology and Medicine,vol.41,no.12,pp.1062-1072.

Hejduk,B.;Jedrzejewska,M.;Billewicz-Bobek,B.;Rutkowski,T.;Skladowski,K.(2011):DWI ADC as a predictive factor of the response to radiochemotherapy in head and neck cancer.Radiotherapy and Oncology,vol.98,no.1,pp.45-52.

Johansen-Berg,H.;Behrens,T.(2013):Diffusion MRI.Academic Press,USA.

Manjón,J.V.;Coupé,P.;Buades,A.;Collins,D.L.;Robles,M.(2012):New methods for MRI denoising based on sparseness and self-similarity.Medical Image Analysis,vol.16,no.1,pp.18-27.

Manjón,J.V.;Coupé,P.;Buades,A.;Fonov,V.;Collins,D.L.et al.(2010):Non-local MRI upsampling.Medical Image Analysis,vol.14,no.6,pp.784-792.

Minosse,S.;Marzi,S.;Piludu,F.;Vidiri,A.(2017):Correlation study between DKI and conventional DWI in brain and head and neck tumors.Magnetic Resonance Imaging,vol.42,pp.114-122.

Mori,S.;van Zijl,P.C.M.(2002):Fiber tracking:principles and strategies-a technical review.NMR in Biomedicine,vol.15,no.7,pp.468-480.

Pratx,G.;Xing,L.(2011):GPU computing in medical physics:a review.Medical Analysis,vol.38,no.5,pp.2685-2697.

Raffelt,D.;Tournier,J.D.;Rose,S.;Ridgway,G.R.;Henderson,R.et al.(2012):Apparent fibre density:a novel measure for the analysis of diffusion-weighted magnetic resonance images.NeuroImage,vol.59,no.4,pp.3976-3994.

Roy,S.;Carass,A.;Prince,J.L.(2013):Magnetic resonance image example-based contrast synthesis.IEEE Transactions on Medical Imaging,vol.32,no.12,pp.2348-2363.

Smith,S.M.;Jenkinson,M.;Woolrich,M.W.;Beckmann,C.F.;Behrens,T.E.et al.(2004):Advances in functional and structural MR image analysis and implementation as FSL.NeuroImage,vol.23,no.1,pp.208-219.

Trinh,D.H.;Luong,M.;Dibos,F.;Rocchisani,J.M.;Pham,C.D.et al.(2014):Novel example-based method for super-resolution and denoising of medical images.IEEE Transaction on Image Process,vol.23,no.4,pp.1882-1895.

Vemulapalli,R.;Nguyen,H.V.;Zhou,S.K.(2015):Unsupervised cross-modal synthesis of subject-specific scans.Proceedings of the IEEE International Conference on Computer Vision,pp.630-638.

Wang,Y.H.;Qiao,J.;Li,J.B.;Fu,P.;Chu,S.et al.(2014):Sparse representation-based MRI super-resolution reconstruction.Measurement,vol.47,pp.946-953.

Wang,Z.;Bovik,A.C.;Sheikh,H.R.;Simoncelli,E.P.(2004):Image quality assessment:from error visibility to structural similarity.IEEE Transaction on Image Process,vol.23,no.4,pp.600-612.

Yao,K.;Troupis,J.M.(2016):Diffusion-weighted imaging and the skeletal system:a literature review.Clinical Radiology,vol.71,no.11,pp.1071-1082.

Yue,L.;Shen,H.;Li,J.;Yuan,Q.;Zhang,H.et al.(2016):Image super-resolution:the techniques,applications,and future.Signal Processing,vol.128,pp.389-408.

Zeineh,M.M.;Holdsworth,S.;Skare,S.;Atlas,S.W.;Bammer,R.(2012):Ultra-high resolution diffusion tensor imaging of the microscopic pathways of the medial temporal lobe.NeuroImage,vol.62,no.3,pp.2065-2082.

Zhou,Q.L.;Qiu,Y.B.;Li,L.;Lu,J.F.;Yuan,W.Q.et al.(2018):Steganography using reversible texture synthesis based on seeded region growing and LSB.Computers,Materials ＆ Continua,vol.55,no.1,pp.151-163.

Computers Materials&Continua

2019年12期