Saturday 8 January 2011

Performance comparison of Java2D image operations

Abstract
This post captures notes from some spikes I made while working out which approach to manipulating photographs performs best in Java2D. The test case is reducing the brightness of a picture, a simple operation that only requires the colour values of each pixel to be divided by two. More complicated algorithms can be built on top of this once we understand which approach to reading and writing pixels performs best.

The Test Setup

The test picture

Picture of Tipsy
An 11.7 MB PNG file, measuring 3296 pixels in width and 2472 pixels in height.

The test machines

Mac Airbook
Intel dual-core 2.1 GHz, 2 GB RAM, Nvidia GeForce 9400M
Win Desktop
AMD dual-core 2.6 GHz, 2 GB RAM, Nvidia Quadro FX1500
Both machines are running Java 1.6.

The Code Spikes

Eight different approaches to accessing pixels in Java2D are described below, with the results of running each on two machines: one Windows based and the other OS X, both running Java 1.6. The JVM settings also affect the performance of each of these algorithms; to keep this article focused I have left all JVM options at their defaults.
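The fragments below refer to inImage, outImage, imageWidth and imageHeight without showing where they come from. Here is a minimal sketch of that setup, assuming the picture is loaded with ImageIO and the output image shares the input's colour model and raster layout. The class and method names (SpikeSetup, load, createCompatibleOutput) are hypothetical and not taken from the original spikes.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of the original spikes.
final class SpikeSetup {
    /** Loads the picture to benchmark; ImageIO.read returns null for unsupported formats. */
    static BufferedImage load(File file) throws IOException {
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            throw new IOException("Unsupported image format: " + file);
        }
        return image;
    }

    /** Creates an empty output image with the same size, colour model and raster layout as the input. */
    static BufferedImage createCompatibleOutput(BufferedImage inImage) {
        WritableRaster raster = inImage.getRaster()
                .createCompatibleWritableRaster(inImage.getWidth(), inImage.getHeight());
        return new BufferedImage(inImage.getColorModel(), raster,
                inImage.isAlphaPremultiplied(), null);
    }
}
```

With these in place, imageWidth and imageHeight would simply be inImage.getWidth() and inImage.getHeight().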

Approach 1

Read a single pixel in as a series of integers from WritableRaster.getPixel, and write them out using setPixel. An instance of WritableRaster can be retrieved from BufferedImage.getRaster().
final WritableRaster inRaster  = inImage.getRaster();
final WritableRaster outRaster = outImage.getRaster();
 
int[] pixel = new int[3];
for ( int x=0; x<imageWidth; x++ ) {
    for ( int y=0; y<imageHeight; y++ ) {
        pixel = inRaster.getPixel( x, y, pixel );
 
        pixel[0] = pixel[0]/2;
        pixel[1] = pixel[1]/2;
        pixel[2] = pixel[2]/2;
 
        outRaster.setPixel( x, y, pixel );
    }
}
 
MacOSX (airbook): 718 ms
Windows XP: 985 ms

Approach 2

Read each pixel in as an encoded integer from BufferedImage.getRGB(x,y), and write them out using setRGB.


for ( int x=0; x<imageWidth; x++ ) {
    for ( int y=0; y<imageHeight; y++ ) {
        int rgb = inImage.getRGB( x, y );
 
        int alpha = ((rgb >> 24) & 0xff);
        int red = ((rgb >> 16) & 0xff);
        int green = ((rgb >> 8) & 0xff);
        int blue = ((rgb ) & 0xff);
 
        int rgb2 = (alpha << 24) | ((red/2) << 16) | ((green/2) << 8) | (blue/2);
        outImage.setRGB(x, y, rgb2);
    }
}

MacOSX (airbook): 1495 ms
Windows XP: 2219 ms
Using getRGB takes roughly twice as long as getPixel and provides no particular benefit. Let's avoid this approach and see if we can optimise getPixel at all.
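The bit shifting in Approach 2 is easier to follow in isolation. The sketch below pulls it into a small helper, assuming the packed 0xAARRGGBB layout that getRGB returns; ArgbHalver is a hypothetical name used only for illustration.

```java
// Hypothetical helper illustrating the packed-pixel arithmetic used with getRGB/setRGB.
final class ArgbHalver {
    /** Halves the red, green and blue channels of a packed 0xAARRGGBB pixel, leaving alpha untouched. */
    static int halve(int argb) {
        int alpha = (argb >>> 24) & 0xff;
        int red   = (argb >> 16) & 0xff;
        int green = (argb >> 8) & 0xff;
        int blue  = argb & 0xff;
        return (alpha << 24) | ((red / 2) << 16) | ((green / 2) << 8) | (blue / 2);
    }
}
```

For example, ArgbHalver.halve(0xFF804020) gives 0xFF402010: each colour channel is halved independently and the alpha byte is carried across unchanged.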

Approach 3


As Approach 1, except that rather than reading a pixel as a set of three integers, it reads it as three floats.

final WritableRaster inRaster   = inImage.getRaster();
final WritableRaster outRaster = outImage.getRaster();

 
float[] pixel = new float[3];                 
for ( int x=0; x<imageWidth; x++ ) {
    for ( int y=0; y<imageHeight; y++ ) {     
        pixel = inRaster.getPixel( x, y, pixel );   
 
        pixel[0] = pixel[0]/2;                
        pixel[1] = pixel[1]/2;                
        pixel[2] = pixel[2]/2;                
 
        outRaster.setPixel( x, y, pixel );          
    }                                         
}
MacOSX (airbook): 901 ms
Windows XP: 1203 ms
A little slower than reading the values as integers. This surprised me; either the noise of the test has hidden an improvement or the float version carries some overhead. Further trials are needed to tell these two possibilities apart. But before we explore this further, let's try the approach provided by Java2D itself: ConvolveOp.

Approach 4

Java2D provides a class specifically designed for convolution operations, ConvolveOp in java.awt.image, which can also be used to reduce the brightness of a picture. Convolution is the term given to a group of image processing algorithms that combine a weighted group of neighbouring pixels to create a new value for a single pixel. The following code uses a 1x1 matrix, so each new value depends only on the pixel that it replaces.
float[] DARKEN = {0.5f};  // a weight of 0.5 halves each colour value; 1.0f would leave the picture unchanged
 
Kernel kernel = new Kernel(1, 1, DARKEN);
ConvolveOp cop = new ConvolveOp(kernel,ConvolveOp.EDGE_NO_OP, null);


cop.filter(inImage, outImage); 
MacOSX (airbook): 184 ms
Windows XP: 172 ms
The convolution implementation provided by Java2D was significantly faster than the per-pixel baseline that was tried first. This was only to be expected, given that the Sun engineers would have spent time tuning the code for precisely this type of use.
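A 1x1 kernel is the degenerate case of convolution. To show what combining a group of pixels looks like with the same API, here is a hedged sketch of a 3x3 box blur, in which every interior pixel becomes the mean of itself and its eight neighbours. BoxBlur is a hypothetical name, and this is an illustration of the kernel mechanics rather than one of the timed spikes.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.util.Arrays;

// Hypothetical example, not one of the measured spikes.
final class BoxBlur {
    /** Applies a 3x3 box blur: nine equal weights of 1/9 average each pixel with its neighbours. */
    static BufferedImage blur(BufferedImage src) {
        float[] weights = new float[9];
        Arrays.fill(weights, 1.0f / 9.0f);
        Kernel kernel = new Kernel(3, 3, weights);
        // EDGE_NO_OP copies the one-pixel border from the source unchanged.
        ConvolveOp op = new ConvolveOp(kernel, ConvolveOp.EDGE_NO_OP, null);
        // Passing null lets ConvolveOp allocate a compatible destination image.
        return op.filter(src, null);
    }
}
```

A single red pixel of value 180 on a black background should come out as a 3x3 patch of roughly 20 each, give or take rounding in the underlying implementation.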

Approach 5

To see if the per-pixel approach can be improved on, I took the faster of the two approaches tried (the one using integers) and switched to the getPixels method, which can read multiple pixels at a time. This spike shows whether there is much overhead in accessing the pixels one at a time via getPixel versus a bulk fetch and set of multiple pixels. The batch size has been set to match the width of the picture.
final WritableRaster inRaster   = inImage.getRaster();
final WritableRaster outRaster = outImage.getRaster();
 
int[] pixels = new int[3*imageWidth];                                                                                             
for ( int y=0; y<imageHeight; y++ ) {
    pixels = inRaster.getPixels( 0, y, imageWidth, 1, pixels );        
 
    for ( int x=0; x<imageWidth; x++ ) {                         
        int m = x*3;                                             
        pixels[m+0] = pixels[m+0]/2;                             
        pixels[m+1] = pixels[m+1]/2;                             
        pixels[m+2] = pixels[m+2]/2;                             
    }                                                            
 
    outRaster.setPixels( 0, y, imageWidth, 1, pixels );                
}
MacOSX (airbook): 534 ms
Windows XP: 578 ms
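The 3*imageWidth buffer in Approach 5 assumes a raster with exactly three bands. As a hedged variation, the sketch below sizes the row buffer from getNumBands() and halves only the first three bands, on the assumption that those are the colour bands and that any further band (such as alpha) should be copied through. It also assumes both images share the same band layout. RowHalver is a hypothetical name.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

// Hypothetical variation on Approach 5 that does not hard-code three bands.
final class RowHalver {
    static void halve(BufferedImage inImage, BufferedImage outImage) {
        WritableRaster inRaster = inImage.getRaster();
        WritableRaster outRaster = outImage.getRaster();
        int width = inRaster.getWidth();
        int height = inRaster.getHeight();
        int bands = inRaster.getNumBands();
        // Assumption: the first three bands are colour, anything after is alpha.
        int colourBands = Math.min(bands, 3);

        int[] row = new int[bands * width];
        for (int y = 0; y < height; y++) {
            inRaster.getPixels(0, y, width, 1, row);    // bulk read of one row
            for (int x = 0; x < width; x++) {
                int m = x * bands;
                for (int b = 0; b < colourBands; b++) {
                    row[m + b] /= 2;
                }
            }
            outRaster.setPixels(0, y, width, 1, row);   // bulk write of one row
        }
    }
}
```

The inner loop over bands adds a little work per pixel, so this is about generality rather than speed.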

Approach 6

Reading an entire row of pixels at a time was faster than accessing a single pixel at a time. However, it is still not approaching the performance of ConvolveOp. Perhaps fetching two rows at a time will be faster still?
final WritableRaster inRaster   = inImage.getRaster();
final WritableRaster outRaster = outImage.getRaster();

 
int[] pixels = new int[3*imageWidth*2];                                                                             
for ( int y=0; y<imageHeight; y+=2 ) {   // assumes imageHeight is even
    pixels = inRaster.getPixels( 0, y, imageWidth, 2, pixels ); 
 
    for ( int x=0; x<imageWidth; x++ ) {                  
        int m = x*3;                                      
        pixels[m+0] = pixels[m+0]/2;                      
        pixels[m+1] = pixels[m+1]/2;                      
        pixels[m+2] = pixels[m+2]/2;                      
 
        int n = m+imageWidth*3;                           
        pixels[n+0] = pixels[n+0]/2;                      
        pixels[n+1] = pixels[n+1]/2;                      
        pixels[n+2] = pixels[n+2]/2;                      
    }                                                     
 
    outRaster.setPixels( 0, y, imageWidth, 2, pixels );         
}
MacOSX (airbook): 429 ms
Windows XP: 453 ms

Approach 7

Reading in two rows of pixels at a time was for the most part faster than reading a row at a time. As with all of the timings taken, the results vary greatly from run to run, so out of curiosity I wanted to know how much slower processing half a row at a time would be.
final WritableRaster inRaster   = inImage.getRaster();
final WritableRaster outRaster = outImage.getRaster();
 
int    halfWidth = imageWidth/2;                                    
int[] pixels       = new int[3*halfWidth];                             
 
for ( int y=0; y<imageHeight; y++ ) {
    pixels = inRaster.getPixels( 0, y, halfWidth, 1, pixels );         
 
    for ( int x=0; x<halfWidth; x++ ) {                          
        int m = x*3;                                             
        pixels[m+0] = pixels[m+0]/2;                             
        pixels[m+1] = pixels[m+1]/2;                             
        pixels[m+2] = pixels[m+2]/2;                             
    }                                                            
 
    outRaster.setPixels( 0, y, halfWidth, 1, pixels );                 
 
    pixels = inRaster.getPixels( halfWidth, y, halfWidth, 1, pixels ); 
 
    for ( int x=0; x<halfWidth; x++ ) {                          
        int m = x*3;                                             
        pixels[m+0] = pixels[m+0]/2;                             
        pixels[m+1] = pixels[m+1]/2;                             
        pixels[m+2] = pixels[m+2]/2;                             
    }                                                            
 
    outRaster.setPixels( halfWidth, y, halfWidth, 1, pixels );         
}
MacOSX (airbook): 418 ms
Windows XP: 453 ms
This time the spike was faster than processing two rows of pixels at a time, so it would appear that two rows is faster than one row at a time and half a row is faster still. Huh? What is going on here? It would appear that the noise in the timing of the operations is greater than the performance improvement gained by varying the number of pixels processed at a time. Clearly reading multiple pixels is preferable to reading one at a time, but beyond that the batch size makes little difference, and none of these variations comes close to ConvolveOp, which is still king.

Approach 8

Investigating the getPixel methods was interesting, but if we are to see a significant improvement in performance a totally different approach is going to be needed. For the last spike I tried to bypass as many of Java2D's layers as possible and access the picture data directly via the DataBuffer class. It is still a long way from the hardware, as is common in Java; however, it will give us an idea of how much overhead the Java2D classes BufferedImage and WritableRaster add.
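// For this approach to work it is important that both DataBuffers use the same picture encoding under the hood as each other,
// otherwise the picture will corrupt. Dividing each element by two also assumes one colour sample per element (as in a
// byte-per-channel image); a packed-int encoding would need each channel halved separately.
final WritableRaster inRaster  = inImage.getRaster();
final WritableRaster outRaster = inRaster.createCompatibleWritableRaster(imageWidth, imageHeight);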
 
DataBuffer in   =  inRaster.getDataBuffer();                                                 
DataBuffer out = outRaster.getDataBuffer();                                                
 
 
int size = in.getSize();                                                                   
for ( int i=0; i<size ; i++ ) {                                                             
    out.setElem( 0, i, in.getElem(0, i)/2 );                                               
}                                                                                          
 
BufferedImage outImage = new BufferedImage(inImage.getColorModel(), outRaster, inImage.isAlphaPremultiplied(), null);

MacOSX (airbook): 71 ms
Windows XP: 63 ms
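The warning about picture encoding in Approach 8 matters because dividing a raw buffer element by two is only meaningful when each element holds a single colour sample. In a packed-int image, halving 0x010000 (red = 1) gives 0x008000, which reads back as green = 128. The sketch below guards against that by accepting only TYPE_3BYTE_BGR and then working on the backing byte array directly. It is a variation rather than the code that was timed: it uses DataBufferByte.getData() instead of getElem/setElem, works in place for brevity, and ByteBufferHalver is a hypothetical name.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;

// Hypothetical variation on Approach 8 with an explicit check on the image encoding.
final class ByteBufferHalver {
    /** Halves every sample in place; only valid for one-byte-per-sample layouts such as TYPE_3BYTE_BGR. */
    static void halveInPlace(BufferedImage image) {
        if (image.getType() != BufferedImage.TYPE_3BYTE_BGR) {
            throw new IllegalArgumentException("Expected TYPE_3BYTE_BGR, got type " + image.getType());
        }
        byte[] data = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
        for (int i = 0; i < data.length; i++) {
            // Java bytes are signed, so mask to 0..255 before dividing.
            data[i] = (byte) ((data[i] & 0xff) / 2);
        }
    }
}
```

Because blue, green and red are all halved the same way, the BGR byte order does not matter for this particular operation; it would for anything that treats the channels differently.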

Conclusions

Comparing the performance of different image processing approaches in Java has been a challenge. Java is unable to give anything close to constant time for processing the same image; during the course of running the above code fragments I saw variations ranging from 700 ms to 7000 ms. To help smooth the results I placed a call to System.gc() between each spike and reran the tests many times to bed the system in, then took an average ignoring any really wild values. This behaviour makes Java very unsuitable for any type of real-time image processing application.

The variance of Java's performance being said, there are some very clear trends in the results. If you are implementing a convolution algorithm and performance is not your key concern, then ConvolveOp is excellent. If performance is vital, then with some extra effort in understanding the encoding of the DataBuffer used by your image you can get a further boost by accessing the DataBuffer directly; in these tests it was roughly 2.5x faster than ConvolveOp (71 ms against 184 ms on the Mac, 63 ms against 172 ms on Windows). If the algorithm that you are working with does not boil down to a convolution, then you will do okay reading in a line of pixels at a time using getPixels (avoid getRGB); however, the clear winner was the DataBuffer.
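The smoothing procedure described above (warm-up runs, a System.gc() call between measurements, and discarding wild values) can be sketched roughly as follows. This is not the harness that produced the numbers in this article. It swaps the "average ignoring wild values" for a median, which is a simple way of getting the same robustness to outliers, and SpikeTimer is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical timing harness; uses a median in place of a hand-trimmed average.
final class SpikeTimer {
    /** Runs the task a few times to warm up, then returns the median of `runs` timed runs in milliseconds. */
    static long medianMillis(Runnable task, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();                      // give the JIT a chance to compile the hot loops
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            System.gc();                     // only a hint; the JVM may ignore it
            long start = System.nanoTime();
            task.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return median(samples);
    }

    /** Median of the samples; a single wild value such as 7000 ms does not drag it upwards. */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }
}
```

For samples of 700, 7000 and 720 ms the median is 720 ms, whereas a plain mean would be dominated by the 7000 ms outlier.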

Afterthoughts

I wrote this article out of curiosity about Java's image processing capabilities; it is widely recognised that there are better languages for the job. I have found it possible to do reasonable image processing effects for Java applications in pure Java, but I would not consider it for anything that needed real-time response. Staying in the Java realm for the moment, it would be interesting to also compare SWT with the approaches tested in this article, as well as using JNI to access hardware such as graphics cards. However, accessing the hardware in that way removes the main reason why I considered Java in the first place: platform independence. Perhaps I will try C# or good ol' C next.
