In this post, I want to explore a problem I am facing at my day job - when to generate a heavy dynamic lookup structure the app needs to process requests. We will look at the different options, reason through the pros and cons, and then also look at implementations of the different options in an example project.
Naturally, the problem is simplified and stripped down to the bare essentials so readers can follow along.
The problem
The problem we are facing is that in our application we have one core component that is used in almost every request to process it.
The component needs a heavy lookup structure, but the problem is that the lookup structure is not static.
It changes depending on some factors, like the environment of the stage, configuration the users do, etc.
In some cases, it can take minutes, and in theory, there is no upper limit on how long this can take.
Now the problem is how or rather when to initialize this structure.
The requirements for the problem are:
- We want to provide a good user experience meaning we want users’ requests to be completed as fast as possible.
- We want to keep the structure up to date, so there should be a way to reinitialize it
The options
There seem to be two fundamental options available:
- Create the lookup structure on startup
- Create the lookup structure on request
Let’s have a look at each separately.
Create the lookup structure on startup
With this option, we create the lookup structure when the application is starting up. We then store it and use it for every subsequent request coming in.
Pros:
- every user request has the structure it needs and is therefore fast
Cons:
- the startup takes as long as the structure creation needs
- we need to restart the application when the structure changes
Create the lookup structure on request
With this option, we create the lookup structure when the request comes in. There are a couple of levels here going from worst to best:
- We create the lookup structure on every request
- We create the lookup structure if it is not there
- We create the lookup structure if it is not there, but use synchronization to do it only once
The first level is horrible because every request would take however long the structure creation takes plus anything we do in the request on top of that.
The second level means that every request that comes in after a restart but before the structure is created goes into creating the structure. So all the requests coming in before the structure is initialized will take a long time, and after the requests will be fast.
The third level means that every request that comes in after a restart but before the structure is created will try to create the structure, but will wait if another thread is already creating it. Here the first request will take a long time, and every request after will take a shorter amount of time because they will reuse the result from the first request. And all the requests after will be fast.
Because the third level is best we will analyze the pros and cons of that:
Pros:
- the startup of the application is quick
- we can invalidate the structure while the app is running
Cons:
- all the requests coming in before the structure is created will take a long time
Solution
Now that we have looked at the options, which approach is the solution to our problem?
Well… as always in software engineering - it depends :)
Given our problem, neither option is perfect for everything we would like to have - fast startup, fast requests, structure reinitialization. So we have to make some compromises.
Option 1 is great from the user experience point of view because we don’t have periods of requests taking a long time. However, this comes at the cost of a slower startup time for the application. But once the application is running the requests are all processed fast. Another drawback is the reinitialization of the structure. We have a way to do it by restarting the application but it is not great. And on top of that, we need to monitor the lookup data source and restart when the data changes.
Option 2 is better from the operational point of view, as we can code the logic to update the structure when the data changes. It also gives us a fast startup time, but we have periods where requests take a long time to process, which is not the greatest user experience.
The final decision comes down to what is more important.
In my day job, we know exactly when the structure needs to be reinitialized and the structure is so fundamental to the application that option 1 seems like the better approach for us.
Example project
For our example project we will look at an NBA game service, that will return all the games for the current season for a given team. The heavy lookup structure in this case is a map of all the games in the season per team. Our lookup structure will just be an example, so we will be simulating it taking a long time with our good friend Thread.sleep().
The application is a very simple Spring Boot web application that has one REST controller exposing a single endpoint:
@RestController
public class GameScheduleController {
private final GameScheduleService gameScheduleService;
public GameScheduleController(GameScheduleService gameScheduleService) {
this.gameScheduleService = gameScheduleService;
}
@GetMapping("/getGamesForTeam")
@ResponseStatus(HttpStatus.OK)
@CrossOrigin(origins = "*")
public List<NBAGame> getGamesForTeamVersion1(@RequestParam("team") String team) {
long startTime = System.currentTimeMillis();
NBATeam nbaTeam = NBATeam.fromString(team);
List<NBAGame> games = gameScheduleService.getRegularSeasonGamesForTeam(nbaTeam);
long endTime = System.currentTimeMillis();
System.out.println(
String.format("Request for team '%s' took '%s' milliseconds", team, endTime - startTime)
);
return games;
}
}
The controller connects to a service, which will be the main component in our case. This is the component that needs the lookup structure to serve requests coming in. Let’s have a look at both options:
Creating the lookup structure on startup:
@Service
public class GameScheduleService {
private Map<NBATeam, List<NBAGame>> gameLookupStructure;
public GameScheduleService() {
initializeLookup();
}
private void initializeLookup() {
List<NBAGame> games = parseGameScheduleFile();
Map<NBATeam, List<NBAGame>> lookupStructure = new HashMap<>();
for (NBAGame game : games) {
// add game to home team
lookupStructure.putIfAbsent(game.getHomeTeam(), new ArrayList<>());
lookupStructure.get(game.getHomeTeam()).add(game);
// add game to away team
lookupStructure.putIfAbsent(game.getAwayTeam(), new ArrayList<>());
lookupStructure.get(game.getAwayTeam()).add(game);
}
gameLookupStructure = lookupStructure;
}
public List<NBAGame> getRegularSeasonGamesForTeam(NBATeam team) {
return gameLookupStructure.get(team);
}
// Parsing method
...
}
Creating the lookup structure on request:
@Service
public class GameScheduleService {
private volatile Map<NBATeam, List<NBAGame>> gameLookupStructure;
public List<NBAGame> getRegularSeasonGamesForTeam(NBATeam team) {
initLookupStructureIfNeeded();
return gameLookupStructure.get(team);
}
private void initLookupStructureIfNeeded() {
if (gameLookupStructure == null) {
createLookupStructure();
}
}
private synchronized void createLookupStructure() {
if (gameLookupStructure != null) {
return;
}
List<NBAGame> games = parseGameScheduleFile();
Map<NBATeam, List<NBAGame>> lookupStructure = new HashMap<>();
for (NBAGame game : games) {
// add game to home team
lookupStructure.putIfAbsent(game.getHomeTeam(), new ArrayList<>());
lookupStructure.get(game.getHomeTeam()).add(game);
// add game to away team
lookupStructure.putIfAbsent(game.getAwayTeam(), new ArrayList<>());
lookupStructure.get(game.getAwayTeam()).add(game);
}
gameLookupStructure = lookupStructure;
}
// Parsing method
...
}
On request service has an initLookupStructureIfNeeded method that gets called for every request that loads the structure if needed. On startup service has an initializeLookupStructure in the constructor, so the lookup structure gets created when the bean gets created.
If we start each version and hit the application with a couple of requests we can notice the difference in behaviours.
Application startup time for version 1 (on request):
Started LookupStructureExampleApplication in 2.013 seconds (process running for 2.382)
Application startup time for version 2 (on startup):
Started LookupStructureExampleApplication in 62.369 seconds (process running for 62.807)
Request times for version 1 (on request):
Request for team 'Lakers' took '60091' milliseconds
Request for team 'Mavericks' took '46883' milliseconds
Request for team 'Nuggets' took '40834' milliseconds
Request for team 'Celtics' took '0' milliseconds
Request times for version 2 (on startup):
Request for team 'Lakers' took '0' milliseconds
Request for team 'Mavericks' took '0' milliseconds
Request for team 'Nuggets' took '0' milliseconds
So we can see the difference between the startup times and the first couple of requests. And remember, that usually everything is a tradeoff in software engineering, so revisit your requirements and choose the option that best fits either most of your constraints or the most important ones.
You can browse and get the full example repository on GitHub.
Conclusion
In this post, we looked at a real-life problem that you might encounter. We looked at options on how/when to initialize a lookup structure that your application might need to serve requests and we discussed the tradeoffs. And we rounded things out by having a look at an implementation of the two options in an example application.
This is it for the post about when to initialize a heavy lookup structure in your application.
Thank you for reading through, I hope you found this article useful.