The two-dimensional discrete wavelet transform (DWT) lies at the heart of many image-processing algorithms. To date, several studies have compared the performance of this transform on parallel architectures, for example on graphics processing units (GPUs). All of these studies, however, considered only separable calculation schedules.