JAW Speak

Jonathan Andrew Wolter

Archive for the ‘code’ Category

Improving developers’ enthusiasm for unit tests, using bubble charts

with 9 comments

Reading time: 4 – 6 minutes

The Idea
Visualize your team’s code to get people enthused about and more responsible for unit testing. See how commits contribute to or degrade testability. See massive, untested checkins from some developers, and encourage them to change their behavior toward frequent, well-tested commits.

Several years ago I worked with Miško Hevery, who came up with an unreleased prototype to visualize commits on a bubble chart. The prototype didn’t make it as an open source project, and Misko’s been so busy I don’t know what happened to it. I was talking to him recently, and we think there are lots of good ideas in it that other people may want to use. We’d like to share these ideas, in hopes it interests someone in the open source community.
testability-bubbles-overview
This is a standard Flex bubble chart: a rich Flash widget with filtering, sorting, and other UI controls.

  • Each bubble is a commit
  • The X axis is a timeline
  • The size of bubbles is the size of each checkin
  • The colors of bubbles are unique per developer
  • The Y axis represents the ratio of (test code / total code) changed in that checkin.

Size is measured in lines of code, an imperfect measure, but one that proved useful in his internal use.
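For anyone rebuilding a chart like this, the Y-axis value is straightforward to derive from version control metadata. Below is a minimal Java sketch, assuming `git log --numstat` output (one "added&lt;TAB&gt;deleted&lt;TAB&gt;path" line per file) and a naive path heuristic for what counts as test code; both are assumptions to adapt to your own repository and conventions:

```java
import java.util.List;

// Sketch: compute the Y-axis value (test LOC / total LOC) for one commit,
// given lines in `git log --numstat` format: "added<TAB>deleted<TAB>path".
class CommitTestRatio {

    // Naive heuristic, an assumption -- adapt to your project's layout.
    static boolean isTestFile(String path) {
        return path.contains("/test/") || path.endsWith("Test.java");
    }

    static double testRatio(List<String> numstatLines) {
        long testLoc = 0, totalLoc = 0;
        for (String line : numstatLines) {
            String[] parts = line.split("\t");
            // git prints "-" for binary files; skip those and malformed lines
            if (parts.length < 3 || parts[0].equals("-")) continue;
            long changed = Long.parseLong(parts[0]) + Long.parseLong(parts[1]);
            totalLoc += changed;
            if (isTestFile(parts[2])) testLoc += changed;
        }
        return totalLoc == 0 ? 0.0 : (double) testLoc / totalLoc;
    }
}
```

Feeding each commit’s numstat lines through testRatio() gives the bubble’s Y position; the summed changed lines give its size.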

Interpreting It
We found that showing this to a few people generally generated a lot of interest. All the developers wanted to see their bubbles. People would remember certain commits they had made, and relive or dispute with each other why certain ones had no test code committed.

testability-bubbles-big commits

Best of all, though, when used regularly, it can encourage better behavior, and expose bad behavior.

testability-bubbles-frequency

Using It
Developers often leave patterns. When reading code, I’ve often thought “Ahh, this code looks like so-and-so wrote it.” I look it up, and sure enough I’ve recognized their style. When you have many people checking in code, you often have subtly different styles, and some add technical debt to the overall codebase. This tool is a hard-to-dispute visual aid to encourage better style (developer testing).

Many retrospectives I lead start with a timeline onto which people place sticky notes of positive, negative, and puzzling events. This bubble chart can be projected on the wall, and milestones can be annotated. It is even more effective if you also annotate merges, refactorings, and when stories were completed.

If you add filtering per user story and/or cyclomatic complexity, this can be a client-friendly way to justify to business people a big refactoring, why they need to pay down tech debt, or why a story was so costly.

While you must be careful with a tool like this, you can give it to managers for a mission-control style view of what is going on. But beware of letting anyone fall into the trap of thinking people with more commits are doing more work. Frequent commits are probably a good sign, but with a tool like this, developers must continue to communicate so there are no misunderstandings or erroneous conclusions drawn by non-technical managers.

Some developers may claim a big code change with no test coverage was a refactoring; however, even refactorings change tests in passing (Eclipse/IntelliJ takes care of the updates even if the developer is not applying scrutiny). Thus the real cause of a large commit with no tests deserves investigation.

Enhancements
Many other features existed in the real version, and a new implementation could add additional innovations to explore and communicate what is happening on your project.
testability-bubbles-enhancements

  • Filter by user
  • Filter by size or ratio to highlight the most suspicious commits
  • Filter by path in the commit tree
  • Show each bubble as a pie chart of the file types changed, and each of their respective tests
  • Display trend lines per team or per area of code
  • Use complexity or test coverage metrics instead of lines of code
  • Add merge and refactoring commit visualizations
  • Color-code commits to stories, and add sorting and filters to identify stories with high suspected tech debt
  • Tie in bug fixes and trace them back to the original commits’ bubbles

We hope some of you also have created similar visualizations, or can add ideas to where you would use something like this. Thanks also to Paul Hammant for inspiration and suggestions in this article.

Written by Jonathan

July 16th, 2011 at 7:35 pm

Web Caching Design: Cache keys should be deterministic

without comments

Reading time: 2 – 2 minutes

Following up on my previous post on Session State, there are a few conceptual ways to think about caches that I want to cover.

Cached items can be placed in session. That may be the easiest option, but it will soon expose limitations. For instance, session may be serialized as one big blob. If so, you won’t be able to have multiple concurrent threads populating the same cache. Imagine that at login you want a background asynchronous thread to look up and populate data.

The key point with a cache is that cached entries need to be independently addressable. A session may be okay to store as one serialized object graph, but cache keys could be primed by multiple concurrent threads, so a single graph would involve locking or overwriting parts of session. If cache entries are deep in a graph, you’re almost certain to have collisions and overwritten data.

IMHO, the most important thing is: Cache keys should be deterministic. For instance, if I want to cache all of a user’s future trips (and that is a slow call), I want to be able to look in the cache without first looking up the key to the cache in some other store. I want to say “Hey, given user 12345, look in a known place in the cache (such as “U12345:futureTrips”) to see if some other thread already populated the cache value.” This does mean you need to account for more uniquely addressable cache locations in your application logic, but the extra accounting is well worth the flexibility it gives you. For instance, it allows background threads to populate that cache item without overwriting anything else.
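To make that concrete, here is a minimal in-process sketch of the pattern. The ConcurrentHashMap stands in for whatever shared cache store you actually use (memcached or similar), and the names (TripCache, futureTrips) are illustrative:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Sketch of a deterministic cache key: the key is computable from the
// user id alone, so no other store has to be consulted to find the slot.
class TripCache {
    private final ConcurrentMap<String, List<String>> cache = new ConcurrentHashMap<>();

    // Deterministic: same user, same key, every time.
    static String futureTripsKey(long userId) {
        return "U" + userId + ":futureTrips";
    }

    // Any thread -- including a background one primed at login -- can
    // populate this slot; later readers recompute the same key and hit it.
    List<String> futureTrips(long userId, Supplier<List<String>> slowLookup) {
        return cache.computeIfAbsent(futureTripsKey(userId), k -> slowLookup.get());
    }
}
```

Because the key is recomputable from the user id alone, a background thread primed at login and a request thread arriving later both address exactly the same slot, and the slow lookup runs at most once.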

Written by Jonathan

December 6th, 2010 at 4:20 am

Posted in architecture, code

Spring Slow Autowiring by Type getBeanNamesForType fix 10x Speed Boost >3600ms to <100ms

with 8 comments

Reading time: < 1 minute

We’re using Spring MVC, configured mostly by annotations, a custom scope for FactoryBeans so they don’t get created once per request, and autowiring by type. This makes for simple code, where the configuration lives right alongside what it configures; however, after a certain number of autowired beans, performance was abysmal.

Loading the main front page in Firefox, with firebug showing slowness:
baseline-firebug

Click on to read the full post.

Written by Jonathan

November 28th, 2010 at 4:25 am

Posted in code, java


Spring Bean Autowiring into Handler/Controller Method Arguments

with 2 comments

Reading time: 2 – 4 minutes

We request-scope controllers in our Spring MVC web application, and then Spring injects in collaborators. Each request builds up an object graph of mostly request-scoped components; they service the request and are then garbage collected. Examples of collaborators are Services, request-scoped objects like User, or objects from session. Declare these injectables as dependencies of your controller, and the IoC framework will resolve those instances for you. Using Spring’s FactoryBean, you can have custom logic around retrieving from a DB or service call. (Also, if you request-scope everything, you will probably want to make your FactoryBeans caching, or create a custom scope, so Spring doesn’t recreate each FactoryBean, possibly making a remote call, on every object injection.)

Instantiate a graph of request-scoped objects per request. Is that crazy? Not really, because garbage collection and object instantiation are very fast on modern JVMs. I argue that it helps you have cleaner and more maintainable code. Designs have better object orientation, tests don’t require as many mocks, and it’s easier to obey the single responsibility principle.

None of this is new, though. Here’s where it gets interesting for us. We have @RequestMapping methods on a controller, but only one of them needs the injected collaborator. If it is slow to retrieve the collaborator (such as a CustomerPreferences we get from a remote call), we don’t want to fetch it every time. Sometimes this means you need two controllers; other times you want to let any Spring bean be injected into a handler method.

We extended Spring to inject any Spring bean into a controller/handler’s @RequestMapping or @ModelAttribute annotated methods. You benefit by injecting the bean only into the handler method that needs it, possibly avoiding remote calls to look up said dependency if it were a field on a controller with multiple @RequestMappings.

Here’s a sample controller’s handler method. Before:

	@RequestMapping(value = Paths.ACCOUNT, method = GET)
	public String showAccount(Map model) {
		model.put("prefs", customerService.getCustomerPrefs(session.getCustomerId()));
		return "account";
	}

There is an extension point, WebArgumentResolver, for you to add your own logic to resolve method parameters (such as autowiring them as Spring-resolvable beans). After:

	@RequestMapping(value = Paths.ACCOUNT, method = GET)
	public String showAccount(Map model,
			@AutowiredHandler CustomerPreferences customerPreferences) {
		// the CustomerPreferences has a CustomerPreferencesFactoryBean which
		// knows how to make a remote service call and retrieve the prefs.
		model.put("prefs", customerPreferences); // easier to test
		return "account";
	}

This lets you inject any spring bean into fields/methods annotated with @ModelAttribute or @RequestMapping.

Another very helpful use of this is to automatically inject User or other request-specific objects.

References for further discussion/examples on the topic:
http://karthikg.wordpress.com/2009/10/12/athandlerinterceptor-for-spring-mvc/
http://karthikg.wordpress.com/2010/02/03/taking-spring-mvc-controller-method-injection-a-step-further/

Written by Jonathan

November 20th, 2010 at 4:13 am

Posted in code, java


Some Problems with Branch Based Development, And Recommendations

without comments

Reading time: 3 – 4 minutes

As I have previously written, I dislike branch based development (especially when it involves subversion and long lived feature branches). Sometimes projects have multiple teams concurrently working in the same codebase, each on different release schedules. For example, team “Delta Force” will go live in a year, with some super top secret and amazing functionality. (Do not get me started on how bad an idea this usually is. I believe in short, frequent releases and rapid customer feedback). All the while, team “Alpha Squadron” is working on periodic releases into production every few weeks or months. Nothing from Delta can go live with Alpha’s releases. How do you enable these teams to cooperate? Some may suggest a long lived feature branch for Delta Force’s code. And then “just” merge the changes from Alpha down to Delta. I believe this is a Bad Idea.

But, if you’re forced into this scenario, at least explain to people the importance of frequently merging (i.e. on a daily basis). The longer you wait between merges, the greater the danger of increasing complexity and merge difficulties.
why-branch-and-merge-is-bad-1

Rather than using a long lived branch, I suggest the following solution: Trunk Based Development. This means you do not create actual branches. Everyone is on the same trunk. As my colleague Paul Hammant phrases it, you Branch By Abstraction. My other colleague Martin Fowler calls this Feature Toggles. First, this enables refactoring, because everyone is working in one codebase. Second, as feature toggles become permanently turned on, you can remove them, and the conceptual divergence between the two “branches” drops. This is a Good Thing.

why-branch-and-merge-is-bad-2

Once again, if you have a long lived branch, and even if you frequently merge, the divergence will grow as the not-reintegrated changes accumulate in the long lived branch. This divergence is RISK. It inhibits refactoring and encourages technical debt.

Prefer Trunk Based Development. It is not perfect, as there still is additional complexity in the codebase. But you can mitigate this with polymorphism instead of if conditionals, and have multiple continuous integration pipelines for all deployable feature toggle combinations.
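A minimal sketch of what “polymorphism instead of if conditionals” can look like; all the names here are illustrative, not from any real codebase:

```java
// Both implementations live on trunk; the toggle picks one at wiring time.
// When the new flow is permanently on, delete the old class and the toggle.
interface CheckoutFlow {
    String start();
}

class ClassicCheckout implements CheckoutFlow {
    public String start() { return "classic"; }
}

class OneClickCheckout implements CheckoutFlow {  // the new, not-yet-live feature
    public String start() { return "one-click"; }
}

class CheckoutFlows {
    // One toggle check at the composition root, instead of if statements
    // scattered through the code. A DI container profile can do the same job.
    static CheckoutFlow select(boolean oneClickEnabled) {
        return oneClickEnabled ? new OneClickCheckout() : new ClassicCheckout();
    }
}
```

Each deployable toggle combination (here, just on and off) gets its own continuous integration pipeline, so both “branches” stay green on one trunk.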

why-branch-and-merge-is-bad-3
By using Trunk based development, we make it easier to do the right thing. This is an example of the Boy Scout Rule. The Boy Scouts of America have a simple rule that we can apply to our profession. “Leave the campground cleaner than you found it.”

If we all checked-in our code a little cleaner than when we checked it out, the code simply could not rot. The cleanup doesn’t have to be something big. Change one variable name for the better, break up one function that’s a little too large, eliminate one small bit of duplication, clean up one if statement.

Long lived feature branches make this difficult, because any changes you make to one branch need to be replicated, or you may have merge difficulties. This is especially a problem for a “long lived receiving” branch, which by its nature does not reintegrate its changes into the mainline. (Thus that branch is limited in what it can refactor.)

Written by Jonathan

November 7th, 2010 at 8:00 am

Subversion Branch Merging and Tree Conflicts

with 2 comments

Reading time: 2 – 4 minutes

I’ve written several times about how I recommend avoiding long lived feature branches. But, if you have to use these, you might run into a problem with subversion identifying unmergeable tree conflicts. I think this has been fixed in current versions of subversion, but let me walk through a conceptual scenario of what was happening when you get this error message:

Error message:
svn: Attempt to add tree conflict that already exists
svn: Error reading spooled REPORT request response

(Using svn, version 1.6.4 (r38063) compiled Aug 7 2009, 03:47:20)

It starts when you have a branch and a cherry-picked merge from the branch to trunk.
svn-merge-tree-conflicts-1

Next, you want to merge all of the branch, to trunk. And you’re using subversion merge tracking.
svn-merge-tree-conflicts-2

Subversion really does two merges: one up to the cherry pick, one after the cherry pick.
svn-merge-tree-conflicts-3

But, if there were tree conflicts in both the merge before and after the cherry pick, subversion (as of an older version we used last year) would die. I hope this is out of date, and you can avoid the problem with newer versions.
svn-merge-tree-conflicts-4

I had to recompile subversion from source, hacking in some special logging and then forcing it to continue after the merge conflict. Read more on the mailing list here and search for jawspeak or paul_hammant. With my patch, it would log a special message, continue, and not die.

svn-merge-tree-conflicts-5

Resolve it by manually fixing the tree conflicts (some of these scripts might help you), and mark the conflicts as resolved.
svn-merge-tree-conflicts-6

Moral of the story? Use Trunk Based Development, and not feature branches.

Written by Jonathan

November 6th, 2010 at 3:58 am

Posted in code


Subversion Parallel Multi-Branch Development And Merging

with 3 comments

Reading time: 2 – 2 minutes

As discussed in my previous post, I dislike merge-based development, preferring Trunk Based Development instead. But sometimes you’re stuck with a long-lived development branch, and you need to merge changes (subversion tree conflicts and all). At the end of the post, I have several scripts I used to make this easier. They’re not the prettiest, but they saved a lot of pain when we had major refactorings in trunk and needed to locate and merge the changes to those files in a long-lived (read: horrible) dev branch.

Imagine this scenario: Multiple streams of development, with a long-lived “3.0 dev” branch that has never reintegrated with the trunk. (Because 3.0 has new features that won’t go into production for many months).

branch-and-merge-problems-1

There are substantial dangers in this approach. This diagram only touches on the surface of the areas of risk in which a merge could fail. Solution? Trunk based development / branch by abstraction.

branch-and-merge-problems-2

Given this required scenario, I developed a few best practices and scripts for merging. The best practices involved having multiple branches checked out into different directories. And then we would find equivalent files that have moved and merge the tree-conflicts.

Scripts to assist in Subversion 3-way merging.

Custom diff3-cmd configuration setting in svn:

Written by Jonathan

November 3rd, 2010 at 4:21 am

Session State – it’s not complicated, but options are limited

with 2 comments

Reading time: 3 – 4 minutes

Web apps have numerous choices for storing stateful data between requests. For example, when a user goes through a multi-step wizard, he or she might make choices on page one that need to be propagated to page three. That data could be stored in the HTTP session. (It could also be stored in some backend persistence, skipping session altogether.)

So where does session get stored? There are basically four choices here.

  1. Store state on one app server, e.g. for Java in HttpSession. Subsequent requests need to be pinned to that particular app server via your load balancer. If you hit another app server, you will not have that data in session.
  2. Store state in session, and then replicate that session to all or many other app servers. This means you don’t need a load balancer to anchor a user to one app server: requests can hit any app server (full replication), or you can use the load balancer to pin users to a particular cluster (whose members replicate sessions among themselves, giving higher availability). Replication can be smart so only the deltas of binary data are multicast to the other servers.
  3. Store the state on the client side, via cookies, hidden form fields, query strings, or client side storage in flash or html 5. Rails has an option to store it in cookies automatically. Consider encrypting the session data. However, some of these options can involve a lot of data going back and forth on every request, especially if it’s in a cookie and images/scripts are served from the same domain.
  4. Store no state on app servers, instead write everything in between requests down to backend persistence. Do not necessarily use the concept of http session. Use id’s to look up those entities. Persistence could be a relational database, distributed/replicated key/value storage, etc. Your session data is serialized in one big object graph, or as multiple specific entries.
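As an aside on option 3, “consider encrypting the session data” starts with making the cookie tamper-evident. Here is a minimal sketch of an HMAC-signed cookie value (signing only, so the client can still read the payload; encrypt separately if the contents are sensitive, and all names here are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of tamper-evident client-side session state: the cookie value is
// "base64(payload).signature", and the server rejects modified cookies.
class SignedCookie {
    private final SecretKeySpec key;

    SignedCookie(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    private String hmac(String data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(mac.doFinal(data.getBytes(StandardCharsets.UTF_8)));
    }

    String seal(String payload) throws Exception {
        String encoded = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(payload.getBytes(StandardCharsets.UTF_8));
        return encoded + "." + hmac(encoded);
    }

    // Returns the payload, or null if the cookie was tampered with.
    String open(String cookieValue) throws Exception {
        int dot = cookieValue.lastIndexOf('.');
        if (dot < 0) return null;
        String encoded = cookieValue.substring(0, dot);
        // MessageDigest.isEqual gives a constant-time comparison.
        byte[] expected = hmac(encoded).getBytes(StandardCharsets.UTF_8);
        byte[] actual = cookieValue.substring(dot + 1).getBytes(StandardCharsets.UTF_8);
        if (!MessageDigest.isEqual(expected, actual)) return null;
        return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

Rails’ cookie session store works on this principle; the trade-off from option 3 still applies, since the whole payload rides along on every request.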

What you choose is up to many factors, however a few guidelines help:

  1. Try to keep what is in session small.
  2. If possible, keep session state on the client side.
  3. Prefer key/value storage replicated among a small cluster over replicating all session state among all app servers.
  4. Genuinely consider sticky sessions, which home users to a particular app server. Many benefits abound, including what Paul Hammant talks about w.r.t Servlet spec 7.7.2 Appengine’s Blind Spot.
  5. If you serialize an object graph, recognize that when you do a deployment it will probably mean existing sessions are now unable to be deserialized by the newly deployed app. Avoid this by using your load balancer to swing new traffic into the new deployments, monitor errors, and then let the old sessions expire before switching all users over to the new deployment. Bleed them over.
  6. Session is not for caching. It may be tempting to store data in session for caching purposes, but soon you will need a real cache.
  7. Store in session what is absolutely necessary for that user, but not more. See caches, above.

Update: see also http://www.ibm.com/developerworks/websphere/library/bestpractices/httpsession_performance_serialization.html

Written by Jonathan

July 30th, 2010 at 3:39 pm

Posted in architecture, code, java


Spring Request Scoped FactoryBean returning null gets cached forever

without comments

Reading time: 2 – 4 minutes

We are using FactoryBeans extensively to own the responsibility of interesting work in constructing our object graph. These are request scoped, because they may depend on other objects that were themselves created by FactoryBeans specific to this request. For example, we have a Customer which needs an Account, so CustomerFactoryBean depends on an Account field being injected. (We use field or setter injection because otherwise a factory bean that depends on another factory-bean-created object may cause a Spring exception about circular dependencies.)

package com.example.dotcom;

import com.example.common.Account;
import com.example.dotcom.session.CustomSession;
import org.springframework.beans.factory.FactoryBean;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Scope;

// the scope is crucial, this controls the scope of the factory bean
@Scope("request")
public class AccountFactoryBean implements FactoryBean {

	// this is itself created from another factory bean
	@Autowired
	CustomSession session;

	@Override
	public Object getObject() throws Exception {
		// Sometimes this is going to return null, other times an instance.
		// The fix is to always return a non-null instance (despite what the
		// javadoc says). Use the Null Object pattern.
		return session.get(Account.class);

		// Note: you will need to implement caching in here or in
		// a custom scope (which we did). Or every time you need
		// to inject the Account it will call getObject(), which
		// could be problematic if it was a slow operation.
	}

	@Override
	public Class getObjectType() {
		return Account.class;
	}

	@Override
	public boolean isSingleton() {
		return false;
	}
}

We are using autowiring by type, but we encountered a problem when the result of a factory bean’s getObject() is sometimes null. If the first call to that request-scoped factory bean returned null, Spring would never call getObject() again in future requests. Upon investigation, you can see there is a subtle caching of null return values from factory beans (even if they are request scoped). It might be a bug, I don’t know. But the workaround is probably a good idea to implement anyway: use Null Objects instead of passing around null.

Here is the problematic code from Spring that caches the null return value: AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement. There are fields cached and cachedFieldValue. These are set even if the resolved value is null, and on subsequent calls they return the cached null without calling getObject() on the factory bean.

I recommend not returning null from factory beans; otherwise it may be cached by Spring, and your factory bean will not be called again when that object is needed.
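To make the Null Object workaround concrete, here is a minimal sketch; Account and NullAccount are illustrative stand-ins, not the real com.example.common.Account:

```java
// Null Object pattern: the factory returns a do-nothing Account instead of
// null, so Spring never sees (and never caches) a null from getObject().
interface Account {
    boolean isKnown();
    String id();
}

class NullAccount implements Account {
    static final Account INSTANCE = new NullAccount();
    public boolean isKnown() { return false; }
    public String id() { return ""; }
}

class Accounts {
    // What getObject() would return: the session value, or the null object.
    static Account orNullObject(Account fromSession) {
        return fromSession != null ? fromSession : NullAccount.INSTANCE;
    }
}
```

Callers then ask isKnown() instead of comparing against null, and the factory bean is invoked again on the next request as expected.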

Written by Jonathan

July 27th, 2010 at 11:18 pm

Posted in code, java


Less Hate with Maven Part 2: The Wrapper Script

without comments

Reading time: < 1 minute

I previously wrote about useful debugging techniques with Maven. Our Maven builds have become complex, with Branch By Abstraction and about 40 devs working simultaneously on the codebase across 2 continents. We have at least 3 profiles for each of the branch abstractions currently running in the codebase. I’m one of the tech leads, and to keep the team’s build consistent and easy to remember, we have a wrapper script (thanks to Cosmin).

Here it is:

Written by Jonathan

May 28th, 2010 at 9:03 am

Posted in automation, code, thoughtworks

Tagged with