About a year ago, I was finishing off my BSc thesis, about minimising nutrient emissions from greenhouses through modelling. Less than a year on, I've just finalised the proposal for my MSc thesis. How time flies!
This one will be about climate control, again in greenhouses. I was interested in doing something with machine learning, rather than mechanistic modelling based on mathematical equations.
Modern greenhouses are full of actuators – lighting, shading, heating, opening and closing windows, adding CO2; the list goes on. Controlling these is tricky work.
In the past, this was done by growers. Growers would use their knowledge and green thumb to adjust settings throughout the day, based on conditions. A lot of this has been computerised. After all, growers respond to conditions that can be measured, and use actuators that can be automatically adjusted.
Most greenhouse climate control systems literally mimic grower knowledge, with hundreds of ‘if-then’ statements*. If temperature is above 25 ºC, open the windows; if CO2 concentration is below 200 ppm, then add more CO2 – you get the idea. This is known as rule-based control. Thermostats work the same way.
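To make that concrete, here's a toy sketch in Python (I'll actually be working in MATLAB, and the thresholds and actuator names here are made up, not from any real climate computer):

```python
def rule_based_control(temperature, co2_ppm):
    """Toy rule-based greenhouse controller: each rule is an
    if-then statement comparing a sensor reading to a threshold."""
    actions = []
    if temperature > 25.0:   # ºC: too warm -> ventilate
        actions.append("open_windows")
    if co2_ppm < 200.0:      # ppm: too little CO2 -> dose more
        actions.append("add_co2")
    return actions

print(rule_based_control(temperature=27.0, co2_ppm=150.0))
# -> ['open_windows', 'add_co2']
```

A real climate computer chains hundreds of these rules together, but the principle is the same.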
Put together, the hundreds of rules in a rule-based controller can be seen as one big decision tree. The threshold values and structure of this tree are unlikely to be optimal. In fact, they aren't optimal: growers with these systems still have to intervene regularly.
This is what I would like to tackle in my MSc thesis. To what extent can we optimise these decision trees, and how useful would that be?
I’m not the first student to look at optimising rule-based control. Last year an MSc student called Max van den Hemel looked at optimising the threshold values for the if-then statements in rule-based climate control.
Max’s results looked promising, and improved greenhouse performance when tested on a model. However, Max highlighted two main limitations in his approach and suggested rule-based control could be further improved:
- Max only looked at threshold values, and not tree structure/hierarchy.
- The optimisation approach Max used was a local one, not a global one. This means he might not have found optimal threshold values either (if that doesn’t make sense, it should in a minute).
Optimising decision trees is difficult, because there are so many parameters to play with. The threshold values alone are a lot, but once you start changing tree structure as well, there are even more possibilities. Every extra parameter multiplies the number of combinations, so the search space grows by orders of magnitude and optimisation becomes very hard. This is known as the curse of dimensionality.
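To get a feel for the numbers (these figures are made up, purely for illustration): if each threshold can take, say, 100 candidate values, the combinations explode as rules are added:

```python
# Toy illustration of the curse of dimensionality: the number of
# possibilities grows exponentially with the number of parameters.
n_values_per_threshold = 100  # candidate values per threshold (made up)
for n_rules in (2, 5, 10):
    combos = n_values_per_threshold ** n_rules
    print(f"{n_rules} rules -> {combos:.1e} threshold combinations")
# 10 rules already give 1.0e+20 combinations, before touching structure.
```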
Local optimisation, which Max used, works by taking the gradient of a function and finding where this gradient is zero. If the gradient is zero, we are at a maximum or a minimum. The problem is that you have to know what the derivative of that function is to find the gradient, which is hard when the function is complicated (e.g. yield as a function of the temperature threshold). Also, local optimisation is, well, only local. It gets fooled by small hills and misses the mountains.
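Here's a toy demonstration of that 'small hills' problem (the function is invented for illustration, nothing to do with greenhouses): plain gradient descent on a function with two valleys settles in whichever valley it starts near, even if the other one is deeper.

```python
def f(x):
    # A made-up function with two valleys: a shallow one near x = +1
    # and a deeper (global) one near x = -1.
    return (x**2 - 1)**2 + 0.3 * x

def grad(x):
    # Analytical derivative of f -- for a greenhouse model, we
    # wouldn't know this, which is part of the problem.
    return 4 * x * (x**2 - 1) + 0.3

def gradient_descent(x, lr=0.05, steps=200):
    # Follow the downhill slope until the gradient is (near) zero.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_shallow = gradient_descent(0.8)   # starts near the shallow valley
x_deep = gradient_descent(-0.8)     # starts near the deep valley
print(f(x_shallow) > f(x_deep))     # the shallow valley really is worse
```

Both runs end at a point where the gradient is zero, but only one of them is the global minimum.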
An alternative to gradient-based optimisation is enumerative optimisation. This works by trying as many combinations as possible, finding an optimum by brute force. This is OK (and kind of what I used in my BSc thesis) but gets painfully slow for complicated problems.
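A brute-force search over two thresholds might look like this (the score function here is a made-up stand-in for 'run the greenhouse model and return the yield'):

```python
import itertools

def score(temp_threshold, co2_threshold):
    # Made-up stand-in for 'run the greenhouse model, return yield'.
    return -(temp_threshold - 24) ** 2 - (co2_threshold - 400) ** 2 / 100

# Enumerative (brute-force) optimisation: just try every combination.
temps = range(18, 31)        # candidate temperature thresholds (ºC)
co2s = range(200, 1001, 50)  # candidate CO2 thresholds (ppm)
best = max(itertools.product(temps, co2s), key=lambda p: score(*p))
print(best)  # -> (24, 400)
```

With two parameters this is a few hundred model runs; with ten, it's astronomically many, which is exactly the curse of dimensionality again.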
So what to do about this?
Genetic algorithms are an optimisation technique that mimic Darwinian natural selection to find better solutions.
There’s a lot of chaos and complexity in genetic algorithms – but in all that randomness, better-performing individuals get selected more often, which leads to outstanding solutions. GAs have been around for about 50 years and have been used in design (as well as fun games like BoxCar2D, where you can see a GA in action).
How does it work? You create a population of individuals, each with its own 'genetic makeup'. Each individual's genetic makeup, or chromosome, is encoded as a string of ones and zeros. This string is the DNA of the individual and translates into, in our case, a decision tree.
Each individual is tested against a fitness function. The fittest are then allowed to reproduce. During reproduction, their genes cross over, creating new combinations and new chromosomes. Random mutations also get added to the offspring’s genetic makeup.
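The whole loop fits in a few dozen lines. Here's a toy Python sketch: the fitness function just counts the ones in the chromosome (the classic 'OneMax' toy problem), standing in for 'decode the bitstring into a decision tree and run the greenhouse model':

```python
import random

random.seed(42)

def fitness(chromosome):
    # Toy fitness: count the ones. In the thesis, this would be
    # 'decode the bitstring into a decision tree, run the model'.
    return sum(chromosome)

def tournament_select(pop, k=3):
    # Fitter individuals get picked more often.
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    # Genes of two parents cross over at a random point.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(chromosome, rate=0.01):
    # Each bit has a small chance of flipping.
    return [bit ^ (random.random() < rate) for bit in chromosome]

n_bits, pop_size = 20, 30
pop = [[random.randint(0, 1) for _ in range(n_bits)]
       for _ in range(pop_size)]

for generation in range(50):
    pop = [mutate(crossover(tournament_select(pop),
                            tournament_select(pop)))
           for _ in range(pop_size)]

best = max(pop, key=fitness)
print(fitness(best))  # close to the maximum of 20
```

Selection, crossover, mutation, repeat: out of all that randomness, the population drifts towards all-ones.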
That’s genetic algorithms in a nutshell. There are some great articles on this online, as well as a book from 1989 by David Goldberg I’ve been using, which gives a helpful overview of the technical details of GAs.
My MSc thesis will be about applying genetic algorithms to improve decision trees in greenhouse climate control, using a greenhouse climate and yield model for a fitness function. The algorithm will improve not only the threshold values of the tree, but also tree hierarchy/structure.
I will first look at using genetic algorithms to optimise decision trees such that the error between the desired climate and realised climate is minimised. Once that works, I hope to expand this and look at optimising something like yield or efficiency.
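One simple way to score a controller on that first objective (names and numbers invented, just to show the idea) is the sum of squared differences between the desired and realised climate over a simulation:

```python
def tracking_error(desired, realised):
    # Sum of squared differences between desired and realised
    # climate trajectories (e.g. hourly temperatures).
    return sum((d - r) ** 2 for d, r in zip(desired, realised))

# Lower error = fitter controller; the GA would minimise this.
print(tracking_error([20, 21, 22], [20.5, 21.0, 23.0]))  # -> 1.25
```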
Then, time allowing, I will look into these questions more in-depth. Do the trees suggested by the genetic algorithm work in different situations? Are they sensitive to inaccurate sensors? Do they improve performance in greenhouses with different specs or does the GA need to be run all over again for every new greenhouse built?
That’s it for now. My proposal is as good as done, so now it’s time to start looking at models and setting things up in MATLAB. On the 5th of November (remember, remember) I will be giving my start seminar to outline the project.
If any of you reading this have questions or tips on how I could approach this, or are interested in reading the full proposal, I’d be happy to hear from you!
*Rule-based control mainly applies to on/off actuators. If an actuator has many possible values, like the temperature of a heating pipe, rule-based control is probably not the best thing to use. You could use some form of proportional control, depending on the system. Mark and Jeremy explain the difference between on/off control and proportional control in this video.
Excellent post Alex! I wish you all the best in mutating those populations 🙂
P.S.: Love the Darwin photo. I wonder what he’d think about GA.
Cheers, Luka 🙂
Yeah, I wonder whether he ever thought his theory would be used for non-biological and quite abstract things! Similar to how it wasn’t obvious to people in the days of Charles Babbage that computers could be used for art/music rather than just numbers and analysis.
Hope your presentation went well.
My comment may come too late – but did you consider Machine Learning as well? I am not sure what pros/cons you may find between one or the other option. But ML is quite active these days and there are a lot of tools and stuff you can leverage.
One example of this is the open-source platform from NVIDIA called RAPIDS for ML. You can even go to Deep Learning, where the number of options grows even further.
My two cents
Thanks, Carlos! Genetic algorithms can be seen as a form of machine learning, but that depends on who you ask so I’m being a bit pedantic here 😉
I did get similar feedback to yours though, especially for the decision tree part. Decision trees used for classification or regression can often overfit to the data, and the same could apply in my case (they could overfit to the model or to the specific weather for that year). So I plan to apply bagging to the trees.
Thanks for sharing NVIDIA’s RAPIDS – GPU-accelerated processing could be very helpful! Right now the model’s speed is definitely the limiting factor. Simulating a week takes 10 seconds, which is a lot of time since the GA will be doing hundreds of runs, if not thousands.
Deep learning would be awesome – I hope to learn more about this in a future project, maybe my internship. Won’t switch to deep learning for my thesis but would be interesting to explore!