Over six months ago I started my MSc thesis – and now the project's finished. This is a brief but technical article in which I share some of the results with you. First, a quick recap:
There are many ways of controlling a greenhouse's climate, but most greenhouses these days use a combination of rule-based control and proportional control. Rule-based control is literally a bunch of 'if-then' statements. Many greenhouses have hundreds of these rules – and all of them put together can be seen as a decision tree. Proportional control is, as the name suggests, control where the response is proportional to the deviation between what's measured and its setpoint. For example, a window might open in proportion to how far the indoor temperature exceeds its setpoint – the higher the temperature, the further the window opens, up to a limit.
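To make the two styles concrete, here is a minimal Python sketch. The function names, the gain, the 150 W/m² radiation threshold and the lighting period between 06:00 and 22:00 are hypothetical values chosen for illustration, not settings from any real greenhouse controller.

```python
def proportional_window_opening(temp_c, setpoint_c, gain, max_opening=100.0):
    """Window opening (%) proportional to how far the temperature exceeds its setpoint."""
    error = temp_c - setpoint_c                  # positive when the greenhouse is too warm
    opening = gain * error                       # proportional response
    return max(0.0, min(max_opening, opening))   # clamp between fully closed and the limit


def lights_rule(radiation_w_m2, hour):
    """A tiny rule set; nested if-then statements like these form a decision tree."""
    if hour < 6 or hour >= 22:
        return False    # hypothetical dark period: lamps off
    if radiation_w_m2 < 150.0:
        return True     # hypothetical threshold: too dark outside, supplement with lamps
    return False


print(proportional_window_opening(25.0, 20.0, gain=10.0))  # 50.0
print(lights_rule(100.0, hour=12))                          # True
```

Real controllers chain hundreds of such rules, and the proportional parts usually have tunable setpoints and gains, which is exactly the kind of parameter a genetic algorithm can optimise.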
Genetic algorithms are a type of optimisation technique. They mimic natural selection: starting with a 'population' of candidate solutions, they repeatedly select the fittest, cross them and mutate the offspring, generation after generation, until a good enough solution emerges. In my case, the so-called 'fitness function' was the extent to which the realised climate differed from the desired climate: the smaller the deviation, the better. The realised climate was simulated with a crop/climate model in MATLAB.
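The loop below is a minimal sketch of a genetic algorithm, assuming tournament selection, uniform crossover, Gaussian mutation and elitism (common textbook choices, not necessarily the ones used in the thesis). Because the real crop/climate model lives in MATLAB, `simulate` here is a hypothetical stand-in that simply treats the parameter vector as the realised hourly temperature, and `DESIRED` is a made-up target profile.

```python
import random

# Hypothetical desired temperature profile (°C); a stand-in for the real target climate.
DESIRED = [18.0, 19.0, 21.0, 23.0, 22.0, 20.0]


def simulate(params):
    """Hypothetical stand-in for the crop/climate model: the 'realised climate'
    is just the parameter vector itself."""
    return params


def fitness(params):
    """Mean squared deviation between realised and desired climate (lower is better)."""
    realised = simulate(params)
    return sum((r - d) ** 2 for r, d in zip(realised, DESIRED)) / len(DESIRED)


def evolve(pop_size=30, generations=100, mutation_rate=0.2, mutation_sigma=0.5, seed=0):
    rng = random.Random(seed)
    n_genes = len(DESIRED)
    population = [[rng.uniform(10.0, 30.0) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[:2]  # elitism: the two best survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            # Tournament selection: each parent is the fittest of 3 random individuals.
            mum = min(rng.sample(population, 3), key=fitness)
            dad = min(rng.sample(population, 3), key=fitness)
            # Uniform crossover: each gene is copied from one parent at random.
            child = [m if rng.random() < 0.5 else d for m, d in zip(mum, dad)]
            # Gaussian mutation, applied to each gene with probability mutation_rate.
            child = [g + rng.gauss(0.0, mutation_sigma) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = elite + children
    return min(population, key=fitness)


best = evolve()
print([round(g, 1) for g in best], round(fitness(best), 3))
```

Elitism guarantees the best fitness never gets worse from one generation to the next; in the real setting, every `fitness` call is a full climate simulation, which is why runs get expensive.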
My project was about applying genetic algorithms to greenhouse climate control in two ways:
- Optimising the 14 parameters in an existing controller
- Evolving new controllers by growing new decision trees (i.e. adding new rules)
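One way to represent an evolvable decision tree is sketched below in Python, under assumptions of my own: each internal node is a single 'if measurement > threshold' rule, each leaf holds actuator settings, and the growing mutation replaces a randomly reached leaf with a new rule. The class names, the `window` actuator and the variable ranges are hypothetical, not the encoding used in the thesis.

```python
import random
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    action: dict  # actuator settings, e.g. {"window": 30.0} (hypothetical names)


@dataclass
class Rule:
    variable: str       # measured quantity this rule tests
    threshold: float
    if_above: "Node"    # branch taken when measurement > threshold
    otherwise: "Node"   # branch taken otherwise


Node = Union[Leaf, Rule]


def decide(node: Node, measurements: dict) -> dict:
    """Walk the tree; every Rule on the way is one 'if-then' statement."""
    while isinstance(node, Rule):
        above = measurements[node.variable] > node.threshold
        node = node.if_above if above else node.otherwise
    return node.action


def count_rules(node: Node) -> int:
    if isinstance(node, Leaf):
        return 0
    return 1 + count_rules(node.if_above) + count_rules(node.otherwise)


def grow(node: Node, rng: random.Random, ranges: dict) -> Node:
    """Return a new tree with exactly one extra rule; the input tree is left unchanged."""
    if isinstance(node, Leaf):
        variable = rng.choice(sorted(ranges))
        low, high = ranges[variable]
        # The new branch gets a perturbed copy of the old action, clamped to 0-100 %.
        tweaked = {k: min(100.0, max(0.0, v + rng.gauss(0.0, 10.0)))
                   for k, v in node.action.items()}
        return Rule(variable, rng.uniform(low, high), Leaf(tweaked), node)
    # Descend into a random branch until a leaf is reached.
    if rng.random() < 0.5:
        return Rule(node.variable, node.threshold, grow(node.if_above, rng, ranges), node.otherwise)
    return Rule(node.variable, node.threshold, node.if_above, grow(node.otherwise, rng, ranges))


tree = Rule("temperature", 25.0, Leaf({"window": 80.0}), Leaf({"window": 10.0}))
bigger = grow(tree, random.Random(1), {"temperature": (15.0, 35.0), "co2": (400.0, 1200.0)})
print(count_rules(tree), count_rules(bigger))  # one rule before, two after
```

Keeping the old leaf as the 'otherwise' branch means the grown tree behaves exactly like the old one wherever the new rule does not fire, so a new rule only changes behaviour in the situations it targets.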
What’s a good climate?
Of course, this depends on who you ask and what you want to grow. In my case, though, I wanted the controllers to track an arbitrary target climate as accurately as possible.
However, climate consists of many factors. The main ones are temperature, CO2 concentration and humidity. If all three of these are where we want them, great. But you can imagine there could be a trade-off between these factors, especially temperature and CO2.
Imagine a greenhouse in the middle of a hot summer. If you were a grower wanting to control temperature, you would probably have the windows fully open. However, this would make it harder to control CO2 levels. Likewise, if you shut the windows to keep CO2 in, temperature would skyrocket.
Because of this, I ran my genetic algorithm on not one, but three fitness functions – one for temperature, one for CO2, and one that combined temperature, CO2 and humidity. This was to give an idea of the trade-off between temperature and CO2.
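As a sketch of what three such fitness functions could look like: the root-mean-square error (RMSE) measure and the per-variable tolerances used to normalise the combined score (1 °C, 100 ppm, 5 % RH) are my own assumptions for illustration, not necessarily what the thesis used. Some normalisation is needed because °C, ppm and % RH are on very different scales.

```python
import math


def rmse(realised, desired):
    """Root-mean-square deviation between two equally long time series."""
    return math.sqrt(sum((r - d) ** 2 for r, d in zip(realised, desired)) / len(desired))


def temperature_fitness(sim, target):
    return rmse(sim["temp"], target["temp"])


def co2_fitness(sim, target):
    return rmse(sim["co2"], target["co2"])


# Hypothetical tolerances used to make the three deviations comparable.
TOLERANCES = {"temp": 1.0, "co2": 100.0, "rh": 5.0}


def combined_fitness(sim, target, tolerances=TOLERANCES):
    """Sum of normalised deviations for temperature, CO2 and humidity (lower is better)."""
    return sum(rmse(sim[k], target[k]) / tol for k, tol in tolerances.items())


sim = {"temp": [20.0, 22.0], "co2": [800.0, 700.0], "rh": [80.0, 80.0]}
target = {"temp": [21.0, 21.0], "co2": [800.0, 800.0], "rh": [85.0, 85.0]}
print(temperature_fitness(sim, target), co2_fitness(sim, target), combined_fitness(sim, target))
```

The choice of tolerances effectively decides how much temperature accuracy is worth sacrificing for CO2 accuracy, which is the trade-off the separate fitness functions are meant to expose.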
How good is a good controller really?
The idea was to evolve the best controllers possible. But it's no good evolving an amazing controller for one week if it doesn't work at all when you apply it to another week. In machine learning, this is known as overfitting: the model becomes specialised on the data it was trained on, but can't generalise to new situations. It is like a student who memorises information rather than understanding the general idea.
To avoid overfitting, I applied a technique inspired by bagging, a method used in machine learning. Bagging is short for bootstrap aggregation. Essentially, you draw several random subsets of your data (bootstrap samples, drawn with replacement), train a model on each subset, and then combine the resulting models, for example by averaging or voting, into one more generalisable model.
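Here is a minimal sketch of textbook bagging in Python, using a straight-line fit as the model purely for illustration (the function names are hypothetical). Each model sees a bootstrap sample, drawn with replacement and as large as the original data, and the ensemble prediction is the average. In the thesis, the subsets were separate weeks rather than random resamples, which is why that approach is only inspired by bagging.

```python
import random
from statistics import mean


def fit_line(points):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample (all x equal): fall back to a flat line
        return y_bar, 0.0
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return y_bar - slope * x_bar, slope


def bagged_line(points, n_models=10, seed=0):
    """Train one model per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = [fit_line(rng.choices(points, k=len(points))) for _ in range(n_models)]

    def predict(x):
        return mean(a + b * x for a, b in models)

    return predict


data = [(x, 2.0 * x + 1.0) for x in range(10)]
predict = bagged_line(data)
print(predict(5.0))
```

On noise-free data every bootstrap model recovers the same line; the benefit of bagging shows up on noisy data, where averaging smooths out the quirks each individual model picked up from its own sample.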
In my case, this meant evolving three decision trees, one for each of three training weeks. I then combined these into a single controller that averaged the outputs of the three trees, a bit like a majority-voting system. For example, if two of the three decision trees said turning on the lights would be good, the lights were turned on.
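Combining the three trees could be sketched like this, assuming each tree is a function from measurements to a dict of actuator settings (the actuator names are hypothetical): on/off actions such as lighting follow a majority vote, while continuous settings such as window opening are averaged. The exact combination rule used in the thesis may differ.

```python
from statistics import mean


def bagged_controller(controllers):
    """Combine several controllers into one by voting on or averaging their outputs."""
    def control(measurements):
        outputs = [c(measurements) for c in controllers]
        combined = {}
        for key in outputs[0]:
            values = [out[key] for out in outputs]
            if isinstance(values[0], bool):
                # On/off actuator: switch on if more than half of the trees say so.
                combined[key] = sum(values) > len(values) / 2
            else:
                # Continuous actuator: take the mean set-point.
                combined[key] = mean(values)
        return combined
    return control


trees = [
    lambda m: {"lights": True, "window": 30.0},
    lambda m: {"lights": True, "window": 60.0},
    lambda m: {"lights": False, "window": 0.0},
]
controller = bagged_controller(trees)
print(controller({"temperature": 22.0}))  # two of three vote for lights on
```

With an odd number of trees, a binary vote can never tie, which is one practical reason to combine three rather than two.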
I then tested the bagged controllers on a fourth and a fifth (unseen) week. Would these be more generalisable?
Here's an overview of the results on the training weeks, in summer and winter. The main thing to notice is that, most of the time, simply optimising the 14 parameters of the existing controller led to a better result than adding fancy new rules. For simplicity, I left the third (combined) optimisation criterion out, but the message is: keep it simple, stupid!
But that’s only part of the story. Remember the temperature-CO2 trade-off? In winter, there isn’t much of a trade-off, but in summer, the trade-off is huge. This makes sense, since keeping the greenhouse cool means losing CO2, as explained above. See below for an example of the difference between summer and winter:
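In multi-objective optimisation, the Pareto front is the set of solutions for which no objective can be improved without making another one worse. Only solutions on the front are worth considering: for any other solution, there is one on the front that is at least as good on every objective and better on at least one. With two objectives, as here, the front is the line along the edge of the trade-off plot where neither objective can be improved without sacrificing the other.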
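Finding the Pareto front among a set of evaluated controllers takes only a few lines. In this Python sketch, each point is a hypothetical (temperature deviation, CO2 deviation) pair for one controller, both to be minimised; a point is kept unless another point is at least as good on both objectives and different from it, i.e. strictly better on at least one.

```python
def dominates(q, p):
    """True if q is at least as good as p on both objectives and better on one (minimisation)."""
    return q[0] <= p[0] and q[1] <= p[1] and q != p


def pareto_front(points):
    """Return the non-dominated points, sorted by the first objective."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))


# Hypothetical (temperature deviation, CO2 deviation) scores of five controllers.
scores = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
print(pareto_front(scores))  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

This pairwise check is quadratic in the number of points, which is fine for the handful of controllers a genetic algorithm run produces.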
Even though the simple method outperformed the complicated one, adding new rules to the controller was able to push the Pareto front further, reaching temperature-CO2 trade-offs that the simple method could not.
Now, the next part. We have seen how climate control can be improved if we know the weather and run the genetic algorithm on a specific week. But how do these evolved controllers perform on an unseen week?
Controllers from the first method (just optimising the 14 parameters) did quite badly, worse than the default controller. So it looks like these controllers don't generalise to new situations. Controllers from the second method, however, did perform better than the default controller. Even though method 1 produces a better training result, method 2 produces a better testing result. That said, method 2 is not always reliable: sometimes it still performs worse than the default controller.
What about bagging, where we combined these optimal decision trees? It turns out that bagging resulted in the best performance of all! Well, in most cases. In summer, for the combined objective of CO2 and temperature, it performed far worse. I have my suspicions about why that is, but I'll leave that outside the scope of this article (and perhaps within the scope of the comments section).
What's next?
For me, my internship at Priva. As for the idea itself, it would be interesting to see whether it works for other optimisation objectives. For example, why not optimise for yield or efficiency instead of climate control accuracy? After all, that's our main goal. And on that note, other systems, such as indoor poultry farming, could be optimised using this technique as well.
The genetic algorithm itself also needs refining. It has many settings that could be tuned, such as population size, number of generations, and crossover and mutation rates, so experimenting with those would be interesting (disclaimer: it would require a lot of computing time – one optimisation takes longer than a day to run…).
Many thanks to everyone who helped me out along the way – in particular, my three supervisors: Simon van Mourik, Anna Petropoulou and Ilias Tsafaras, who were very patient at the beginning when the approach of this project was unclear. Also thanks to Bert van ’t Ooster and David Katzin for helping me speed up the model, and Sam Blaauw for giving me remote access to the university’s computers as I finished the last simulations from home. Lastly, thanks to the others doing their thesis for countless games of Shithead (and Stiften).
Let me know if you’re interested in reading the whole thing!