Monday, 31 January 2011

Advice on learning Spring

I was recently asked for advice on how to go about learning Spring, specifically materials that are in addition to the main Spring documentation. This got me thinking as to why some people pick up Spring in a matter of hours while others have to work at it. As with most material there are two parts to Spring, firstly the nuts and bolts of the framework itself and secondly the patterns and practices behind it. This split is so common in computing (and beyond) that it formed the basis of which University that I went to, I found a long time ago that focusing on the patterns, practices and theory gave me skills that transferred easily between different technologies. Spring is no different.

Given this line of thought I broke my reply into two parts, for the nut and bolts of Spring read the first chapter on the Spring documentation 'Core Technologies - The IoC container'. I have not found anything better than this. However do not expect to fully 'get' it on the first pass, or depending on your experiences even the third or forth. Despite Spring's reputed simplicity, it is obscured behind both its own size and as hinted previously 'patterns, practices and theory'. No real prizes for guessing that the main practice behind Spring is Dependency Injection (DI), and Inversion of Control (IOC); they are both referred to frequently within the Spring documentation and dependency injection particularly is a technique that dates back long before Spring and its ilk.

So how does one really learn Dependency Injection and Inversion of Control? The academically inclined may go back to studying Object Orientated (OO) techniques, such as abstraction, encapsulation, polymorphism and things like object cohesion and coupling. The more practically minded person will benefit from doing, and for this I strongly recommend Test Driven Design (TDD), and mocking. Every test that you write following TDD principles is practicing the core design skills needed for efficient and effective use of Spring. When you have mastered the ability to break a design cleanly along its lines of abstraction, then the tests will become focused, simple and robust. At this point the objects being tested will slide into Spring or any other IOC container very easily.

These skills are easy to describe, but difficult to master. Don't be scared to try once, then throw it away and to redo. Most people are scared of the time that this represents, especially if they are working on a project with tight deadlines. However I prefer to view it as an investment, and I make sure that I build time into any estimates that I provide. One can never expect to become quick at this if they always stick to their first design. A form of deadlock happens when we stick to our first designs, we never experiment to find better solutions and eventually we enter a vicious cycle of fire fighting spaghetti code. Once that happens a project will become paralysed. So do yourself a favour, come up with three designs to every problem; and then go with the forth. Lastly I recommend that when you have had a few goes at this, find somebody who is more practiced than yourself and you respect, ask them for feedback. Rome was not built by one man alone.

Other material

Logging best practices

This post is a brain dump of the practices that have proved useful to me over time while maintaining production Java systems. Please share your experiences, especially if you have come across any great time savers not already mentioned here.

The Practices

The scale of the system that you are working on does affect how valuable each of the individual practices below would be to you, some of the practices that I mention were introduced due to experiences working with server farms in excess of 150 JVMs running in a very 'chatty' SOA architecture and so those practices may not be as valuable in, say a single process Swing application. However I have come to the conclusion that once these practices are wrapped into a reusable set of classes and bedded into the mind set of the developers then they cost next to nothing to use; and so I recommend putting in place at least a placeholder for each practice at the start of a project as retrofitting large code basis is often more expensive than putting in a good placeholder early.

Easy log file access

The more effort and time involved in gaining access to a systems log file(s), then the natural consequence is that the log file will be used less frequently. I have worked in environments where developers had to request a copy of the production log files in order to begin investigation into a problem. Such requests would take between 1 hour and 3 days to be actioned. This barrier to entry blocks all pre-emptive problem solving from happening, as well as gaining timely feedback from experiments used to reproduce and track down a problem.

My ideal is to have real time search access to all production logs, tools suck as Splunk make this very straight forward. Especially as it is capable of combining logs from multiple servers. If a licensed solution is not viable then roll up your sleeves and write a few scripts to achieve the same results.

Self contained messages

Each individual message needs to be complete in its own right. That is, a call to the LOG class should be one per key event being reported on. All of the
information for that event must be contained in that call, such as stack traces and values of key data that describe the uniqueness of the event that has just occurred. Do not place stack traces in a separate call to the log than the message describing its context.

If the information or data is spread across multiple calls to the LOG class then it becomes likely
in a multithreaded environment that the messages will get interlaced making them more difficult to decipher. Naturally this interlacing does not occur often during development cycle as a developer working by him/her self tends to only send one request into the system at a time.

Logs telling the Story of what happened

Context is King

A single line in the log file describes a single event, a series of events tells a story, and in order to understand the meaning of the story it needs to flow; the relationships between each event needs to be clear and concise.

To achieve this with Log4J I use the MDC class to push context information that is important to the story being described, such as: the name of the user who made the request, a unique request id that was generated when the request started, the thread id at time of the message being generated, and the security principle that the request was running as at that time. In environments consisting of multiple co-operating servers you may also find the following values useful in the MDC: the name of the SOA service that generated the logged event, the machine name where the logged event was generated. Which pieces of data you place on the MDC will depend on your context, remember to make sure that each item pays for itself. It is not a free journey, and we don't want to give away any free tickets.

Context across machine boundaries

In multi machine environments being able to tie the story together becomes harder, and yet all the more important. Consider passing some of the values that you you push onto he MDC between machine boundaries as part of the remote calls, I make it a matter of point to always pass the user name and request id across machine boundaries. This means that if an error is reported to the user, they can then be given an incident number to report to the support staff. This incident number is the request id mentioned previously and will allow a developer to reconstruct the full story of what the user was doing as the error occurred from the log files.

Zero tolerance to errors

On projects that I have worked on that have had no compile time warnings, it has always been easy to spot when one has introduced a warning and not wanting to be the one to be caught 'peeing in the proverbial pool' I tend to fix new warnings fairly promptly. If on the other hand there are lots of warnings already in place then my brain shuts off and I no longer notice them, I won't carry on the previous comparison but I do find that after a build has more than a few warnings they quickly start to multiply with out even being noticed. The same phenomena happens with errors in production logs, it is very easy to spot and respond to them when they are rare however when they have become frequent then they become the norm and in my experience it is a significant sign that the project is struggling with a vicious circle that needs to be broken. If you are lucky to be working on a green field project, then it is best to start this practice early.

Logging as part of QA

The log files are part of the interface to a system, think of it as an API in its own right. They may not be used by the end customer, but in circumstances when they are used to support the end customer then they do have an audience. To ensure quality of any interface to a system it must be both tested and used frequently. Failure to perform these checks results in the rapid build up of dust, decay and entropy. No body enjoys house cleaning, so automating these tests can greatly reduce the burden. The usual suspects apply here, unit testing, mocking of the Log interface and TDD.

Logging levels

Tools such as log4j support multiple coarse grains of categorising the messages logged. It makes a significant difference to the quality of the log files if developers agree when to use each of the common log levels early on in the project, and adhere to them. This consistency helps to reduce spam in the log files and helps turn off the noise when going into production. Personally I start off using DEBUG for all of my messages and then increase its level when I can describe the benefit for doing so. The following table captures the type of explanations that I look for with each logging level. This table is intended to help get thoughts flowing, and is by no means a summary of the only rules that a team could use.

Level	Usage	Expected action	Example
FATAL	Reporting problems that until fixed will cost the company money and will carry on costing the company money until resolved. Once a problem has been reported do not immediately repeat the same message.	Fix ASAP. Even if that means waking people up in the middle of the night. As such false alarms and frequent alerts will not be taken to kindly.	The database is down and all user requests are being rejected.
ERROR	A problem has occurred that was not automatically recovered, but it is not critical to the revenue stream of the company.	In low to medium volumes ERROR messages will trigger investigation in a timely manner. High volumes may be escalated to FATAL. NB: A user entering 'foo' into a number only field should never be reported as an error at this level, it should be recovered automatically at the GUI level.	Updating a users details failed due to an unexpected database error.
WARN	A problem has occurred that was both unexpected and automatically recovered.	Actively monitor with the goal of pre-empting problems that could escalate. Frequent reoccurring problems will need to be fed back to development for resolution.	Connection to the exchange rate server has gone down but it is not causing an immediate problem due to caches or disk space is down to the last 10% etc.
INFO	Report key system state changes and business level events.	Used to answer infrequent queries about the behaviour of the system and its users.	Jim deposited 53 pounds sterling.
DEBUG	Detailed information about each request.	Used to investigate the causes of a problem. Will usually be disabled in production by default as it will output a lot of contextual information.	Value of field x is 92.4, or 'starting process Y' and 'finished process Y in 95ms'

Minimising Logging Overhead

Logging is not free. It costs the developer time to add it to a system, it costs the system time to execute the log statements and it costs the poor soul who has to investigate/support the system time to understand the logs. As a consequence of this think twice before logging a new event, is it really needed. If it is then how expensive is it to produce the event and how often will it appear in the logs?

don't log unless the event adds to the story being told, what you don't know what the story is? go back to context is king and do not collect two hundred pounds
if the event is very common, bulk them together; don't spam
if the message to be logged is a constant string use LOG.level("msg")
if the message requires much string comparisons and runtime performance is important us isXXXEnabled
to improve readability and performance I sometimes use an interface to reduce the number of ifs; also useful if the logs are to be translatable

Machine parsable logs

To make processing logs easier write to the logs in a consistent format. When embedding data into the logs provide consistent names for the data etc.

For example, one approach that I have worked with is to output log messages like this:

[user=Jim] has logged on.

Runtime adjustable levels

As discussed under 'Minimising logging overheads' we want to avoid spamming the log files. However when diagnosing a problem we need data, when reproducing problems we sometimes need a lot of data. For this reason it is very useful to have a mechanism that can increase the log levels on a machine, or group of machines without having to reboot the machine. The most flexible approach is to be able to do this per user; remember to not give untrusted access to such a mechanism.

Archive the logs

You do not want to run out of disk space in production. Monitor the amount of data on the disk drives, and archive off logs for future reference.

Useful Tools

Log4J
Custom shell scripts written in grep/sed/awk, perl, ruby, etc
Splunk

Saturday, 8 January 2011

Performance comparison of Java2D image operations

Abstract

The motivation for this blog item is to capture notes during some spikes that I made while figuring out which approach to manipulating photographs performs the best in Java2D. For the purposes of this article I am reducing the brightness of a picture as a test case, a simple operation that merely requires the colour values of each pixel to be divided by two. More complicated algorithms can be built up upon the back of this article once we have an understanding of which approach to reading and writing pixels performs the best.

The Test Setup

The test picture

A 11.7Mb png file, measuring 3296 pixels in width and 2472 pixels in height.

The test machines

Mac Airbook: Intel Dualcore 2.1Ghz 2Gb ramNv8d8an GeForce 9400M
Win Desktop: AMD DualCore 2.6Ghz 2Gb ram Nvidia Quadro FX1500

MacOSX (airbook): 718 ms
Windows XP: 985 ms

Approach 2

Read each pixel in as an encoded integer from BufferedImage.getRGB(x,y), and write them out using setRGB.

for ( int x=0; x<imagewidth ; x++ ) {
    for ( int y=0; y<imageHeight; y++ ) {
        int rgb = inImage.getRGB( x, y );
 
        int alpha = ((rgb >> 24) & 0xff);
        int red = ((rgb >> 16) & 0xff);
        int green = ((rgb >> 8) & 0xff);
        int blue = ((rgb ) & 0xff);
 
        int rgb2 = (alpha < < 24) | ((red/2) << 16) | ((green/2) << 8) | (blue/2); 
        outImage.setRGB(x, y, rgb2);
    }
}

MacOSX (airbook): 1495 ms
Windows XP: 2219 ms

Using getRGB is twice as slow as getPixel, and provides no particular benefits. Lets avoid this approach and see if we can optimise getPixel any.

Approach 3

As Approach 1 however rather than reading a pixel as a set of three integers, it reads them in as three floats.

final WritableRaster inRaster   = inImage.getRaster();
final WritableRaster outRaster = outImage.getRaster();

final WritableRaster inRaster   = inImage.getRaster();
final WritableRaster outRaster = outImage.getRaster();
 
float[] pixel = new float[3];                 
for ( int x=0; x<imagewidth ; x++ ) {          
    for ( int y=0; y<imageHeight; y++ ) {     
        pixel = inRaster.getPixel( x, y, pixel );   
 
        pixel[0] = pixel[0]/2;                
        pixel[1] = pixel[1]/2;                
        pixel[2] = pixel[2]/2;                
 
        outRaster.setPixel( x, y, pixel );          
    }                                         
}
MacOSX (airbook)
901 ms
Windows XP
1203 ms
 
A little slower than the reading the values per component. This surprised me, either the noise of the test has hidden the improvement or there is some overhead here that is not good. Further trials are needed to differentiate these two possibilities. But before we explore this further lets try the approach provided for by Java2D, the ConvolutionOp.  Approach 4
Swing provides a class specifically designed for convolution operations, such as reducing the brightness of a picture. Convolution is the term given to a group of image processing algorithms that average a group of pixels together to create a new value for a single pixel.    The following code uses a 1x1 matrix to take an average of only the pixel that will be replaced.    
float[] DARKEN = {1.0f};
 
Kernel kernel = new Kernel(1, 1, DARKEN);
ConvolveOp cop = new ConvolveOp(kernel,ConvolveOp.EDGE_NO_OP, null);


cop.filter(inImage, outImage); 
MacOSX (airbook)
184 ms
Windows XP
172 ms
 
The convolution implementation provided by Swing was significantly faster than the per pixel baseline that was tried first. This was only to be expected given that the Sun Engineers would have spent time tuning the code for precisely this type of use.  Approach 5
To see if the per pixel approach can be improved on I took the faster of the two approaches tried which used integers and used the getPixels method that is capable of reading multiple pixels at a time.    This spike will show whether there is much overhead in accessing the pixels one at a time via getPixel verses a bulk fetch and set of multiple pixels. The batch size has been set to match the width of the picture.      final WritableRaster inRaster   = inImage.getRaster();
final WritableRaster outRaster = outImage.getRaster();
 
int[] pixels = new int[3*imageWidth];                                                                                             
for ( int y=0; y<imageheight ; y++ ) {                            
    pixels = inRaster.getPixels( 0, y, imageWidth, 1, pixels );        
 
    for ( int x=0; x<imageWidth; x++ ) {                         
        int m = x*3;                                             
        pixels[m+0] = pixels[m+0]/2;                             
        pixels[m+1] = pixels[m+1]/2;                             
        pixels[m+2] = pixels[m+2]/2;                             
    }                                                            
 
    outRaster.setPixels( 0, y, imageWidth, 1, pixels );                
}
MacOSX (airbook)
534 ms
Windows XP
578 ms
 
Approach 6
Reading an entire row of pixels in at a time was faster than accessing a single pixel at a time. However it is still not approaching the performance of the ConvolutionOp. Perhaps fetching two rows at a time will be faster still?  final WritableRaster inRaster   = inImage.getRaster();
final WritableRaster outRaster = outImage.getRaster();

final WritableRaster inRaster   = inImage.getRaster();
final WritableRaster outRaster = outImage.getRaster();
 
int[] pixels = new int[3*imageWidth*2];                                                                             
for ( int y=0; y<imageheight ; y+=2 ) {                    
    pixels = inRaster.getPixels( 0, y, imageWidth, 2, pixels ); 
 
    for ( int x=0; x<imageWidth; x++ ) {                  
        int m = x*3;                                      
        pixels[m+0] = pixels[m+0]/2;                      
        pixels[m+1] = pixels[m+1]/2;                      
        pixels[m+2] = pixels[m+2]/2;                      
 
        int n = m+imageWidth*3;                           
        pixels[n+0] = pixels[n+0]/2;                      
        pixels[n+1] = pixels[n+1]/2;                      
        pixels[n+2] = pixels[n+2]/2;                      
    }                                                     
 
    outRaster.setPixels( 0, y, imageWidth, 2, pixels );         
}
MacOSX (airbook)
429 ms
Windows XP
453 ms
 
Approach 7
Reading in two rows of pixels at a time was for the most part faster than reading a row at a time. As with all of the timings taken Java varies greatly each time, so out of curiosity I wanted to know how much slower processing half a row at a time would be.      final WritableRaster inRaster   = inImage.getRaster();
final WritableRaster outRaster = outImage.getRaster();
 
int    halfWidth = imageWidth/2;                                    
int[] pixels       = new int[3*halfWidth];                             
 
for ( int y=0; y<imageheight ; y++ ) {                            
    pixels = inRaster.getPixels( 0, y, halfWidth, 1, pixels );         
 
    for ( int x=0; x<halfWidth; x++ ) {                          
        int m = x*3;                                             
        pixels[m+0] = pixels[m+0]/2;                             
        pixels[m+1] = pixels[m+1]/2;                             
        pixels[m+2] = pixels[m+2]/2;                             
    }                                                            
 
    outRaster.setPixels( 0, y, halfWidth, 1, pixels );                 
 
    pixels = inRaster.getPixels( halfWidth, y, halfWidth, 1, pixels ); 
 
    for ( int x=0; x<halfWidth; x++ ) {                          
        int m = x*3;                                             
        pixels[m+0] = pixels[m+0]/2;                             
        pixels[m+1] = pixels[m+1]/2;                             
        pixels[m+2] = pixels[m+2]/2;                             
    }                                                            
 
    outRaster.setPixels( halfWidth, y, halfWidth, 1, pixels );         
}
MacOSX (airbook)
418 ms
Windows XP
453 ms
 
This time the spike was faster than processing two rows of pixels at a time, so it would appear that two rows is faster than one row at a time and half a row is faster still. Huh? What is going on here? It would appear that the noise in the timing of the operations is greater than the performance improvement seen by  varying the number of pixels processed at a time. Clearly reading multiple is preferable to one at a time, but after that it is not competing significantly with the ConvolveOp which is still King.  Approach 8
Investigating the getPixel methods was interesting but if we are to see a significant improvement in performance a totally different approach is going to be needed. For the last spike I tried to bypass as many of Java2Ds layers as possible and access the picture data directly via the DataBuffer class. It is still a long way from the hardware, as is common in Java however it will give us an idea how much overhead the Java2d classes BufferedImage and WritableRaster add.        // For this approach to work it is important that both DataBuffers use the same picture encoding under the hood as each other,
// otherwise the picture will corrupt
final WritableRaster outRaster = inImage.getRaster().createCompatibleWritableRaster(imageWidth, imageHeight);
 
DataBuffer in   =  inRaster.getDataBuffer();                                                 
DataBuffer out = outRaster.getDataBuffer();                                                
 
 
int size = in.getSize();                                                                   
for ( int i=0; i<size ; i++ ) {                                                             
    out.setElem( 0, i, in.getElem(0, i)/2 );                                               
}                                                                                          
 
BufferedImage outImage = new BufferedImage(inImage.getColorModel(), outRaster, true, null);

MacOSX (airbook)
71 ms
Windows XP
63 ms
 
Conclusions
Comparing performance of different image processing approaches in Java has been a challenge. Java is unable to give anything close to constant time for processing the same image, during the coarse of running the above code fragments I saw variations ranging from 700ms to 7000ms. To help smooth the results I placed a call to System.gc() between each spike and reran the tests many times to bed the system in and to take an average ignoring any really wild values. This behaviour makes Java very unsuitable for any type of real time image processing applications.    The variance of Java's performance being said, there are some very clear trends in the results. Specifically if you are implementing a convolution algorithm and performance is not your key concern then the ConvolveOp is excellent, however if performance is vital then with some extra effort in understanding the encoding of the DataBuffer used by your image then you can get a 2-5x performance boost over ConvolveOp by accessing the DataBuffer directly. If the algorithm that you are working with does not boil down to a convolution then you will do okay to read in a line of pixels at a time using getPixels (avoid getRGB) however the clear winner was the DataBuffer.  After thoughts
I wrote this article out of curiosity about Java's image processing capabilities, it is widely recognised that there are better languages for the job. I have found it possible to do reasonable image processing effects for Java applications in pure Java but I would not consider it for anything that needed real time response times. Staying in the Java realm for the moment it would be interesting to also compare SWT with the approaches tested in this article, as well as using JNDI access to hardware such as graphics cards. However at this point of accessing hardware it removes the main reason why I considered Java; platform independence. Perhaps I will try C# or good ol' C next.  Appendix
Wikipedia definition of Convolution
Compiled spikes jar file
Source code for all of the spikes

CK's Musings