Back in 1947, computer science pioneer Alan Turing foresaw machines that could learn from experience. 73 years later, everyone seems to be yapping on about ‘big data’ and ‘machine learning’, as I have on this blog as well.
Where conventional programming starts with data and a programme to produce an output, machine learning starts with data and an output to produce a programme.
LetsGrow is a worldwide control room for growers – or put more concretely, an online platform for sharing data and knowledge. Even that description is far from concrete, though.
LetsGrow’s platform does three main things:
- Collect data – from sensors, climate computers, and so on. LetsGrow’s API connects growers to third parties.
- Add value – through models, artificial intelligence, calculations, and expert knowledge.
- Supply information – whether it be decision support, consulting, or optimisation.
The first bottleneck in using data is taking measurements. Seemingly trivial changes to hardware can make a big difference. Thanks to LoRaWAN technology, sensors can be placed in greenhouses wirelessly. This allows for far more sensors, on the order of 10 per hectare.
Another area where this bottleneck proves a challenge is in the standardisation of data. Through LetsGrow’s Data App, crop registration is done consistently and in the right format.
One thing in Peter’s presentation wasn’t new, but it was a well-put summary of what to do with all this data and knowledge. This is the approach used at LetsGrow:
- Graphs – what happened?
- Data/models – why did it happen?
- Predictive modelling – what will happen?
- Optimisation – what is the best that could happen?
All of this is still very conceptual. Let’s go through a specific example presented by Peter:
An example with tomatoes
It’s not an accident that in Dutch greenhouse horticulture, all the innovative ideas are tried on tomatoes first. It’s a big market both in the Netherlands and in the rest of the world. Using data and knowledge, LetsGrow has been developing models to predict yield in tomatoes.
Why is predicting yield important? For one, it allows growers to have more certainty about their expected yields. This helps them achieve better contracts and better prices.
LetsGrow’s main procedure for developing models is as follows:
- Acquire data
- Clean and prepare data
- Train model
- Test model on new data
- Improve the model, or change its inputs
LetsGrow’s tool of choice is Microsoft Azure, because of its ease of use and the fact that it has plenty of models that can already be used off-the-shelf. Out of these five steps, cleaning and preparation is the most important. As they say, ‘garbage in, garbage out’.
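The five steps above can be sketched in a few lines of plain Python. This is only a toy under loud assumptions: the numbers are invented, the single input (a weekly radiation sum) and the straight-line model are my own choices, and "accuracy" is taken to mean 100% minus the mean absolute percentage error, which may not be how LetsGrow defines it. It does not use Azure and is not LetsGrow's actual pipeline.

```python
# All numbers are invented for illustration; this is not LetsGrow data.
# Step 1, acquire: (weekly radiation sum in arbitrary units, yield in kg/m2).
raw = [
    (1200.0, 1.0), (1500.0, 1.3), (None, 1.2), (1800.0, 1.5),
    (2100.0, 1.9), (2000.0, -5.0), (2400.0, 2.1), (2700.0, 2.4),
]


def clean(rows):
    # Step 2, clean: drop rows with missing values or non-positive yields.
    return [(x, y) for x, y in rows
            if x is not None and y is not None and y > 0]


def train(rows):
    # Step 3, train: fit yield = slope * radiation + intercept
    # by ordinary least squares (an assumed, deliberately simple model).
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def accuracy(model, rows):
    # Step 4, test: 100% minus the mean absolute percentage error
    # (an assumed definition of "accuracy").
    slope, intercept = model
    errors = [abs(slope * x + intercept - y) / y for x, y in rows]
    return 100.0 * (1.0 - sum(errors) / len(errors))


rows = clean(raw)
train_rows, test_rows = rows[:-2], rows[-2:]  # hold out the two latest weeks
model = train(train_rows)
score = accuracy(model, test_rows)
print(f"accuracy on unseen weeks: {score:.1f}%")
```

Step 5 is then a loop around this: if the score on unseen weeks is too low, change the inputs or the model and run it again. Note how much of the code is about cleaning and splitting rather than the model itself.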
The first model was used to predict yield, and is very simple – so simple it doesn’t use solar radiation as an input (and as the old adage says, 1% more light is 1% more production, so it’s not a trivial factor). That said, this model gets 75-85% accuracy. The problem is that this isn’t high enough. Also, calibrating this model takes too long.
The second model predicts the size of the fruit set and helps growers plan their yields accordingly, based on five main factors: temperature, radiation, current production, current crop fruit load, and the current crop’s fruit set. Surprisingly, carbon dioxide concentration is not used in this model – Peter said that although it is an important factor, it wasn’t as important as these other factors for this model (or maybe in this dataset). This model is able to achieve about 90% accuracy on average. It also looks 1 and 4 weeks ahead. Of course, the 4-week projections are less certain than the 1-week projections, but this should be helpful.
Despite this impressive performance, Peter mentioned three big challenges at the end of his presentation:
The first challenge is achieving a high data resolution. Most data is recorded every 5 minutes, but for some things this is simply not doable, like plant registration. Plant registration is done once a week. Because of this, they tend to average their high-resolution data to fit this weekly measurement.
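To show what averaging 5-minute data down to a weekly value looks like, here is a minimal sketch in plain Python. The function name `weekly_means`, the grouping by ISO week, and the temperature readings are all assumptions for illustration; the presentation did not say how LetsGrow aligns its data with the weekly plant registration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekly_means(readings):
    """Average (timestamp, value) readings per ISO (year, week)."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        iso = timestamp.isocalendar()  # (ISO year, ISO week, weekday)
        buckets[(iso[0], iso[1])].append(value)
    return {week: sum(vals) / len(vals) for week, vals in buckets.items()}


# Hypothetical 5-minute temperature readings covering two full weeks:
# 20.0 degrees throughout the first week, 22.0 throughout the second.
start = datetime(2020, 1, 6)  # a Monday
per_week = 7 * 24 * 12        # number of 5-minute readings in one week
readings = [
    (start + timedelta(minutes=5 * i), 20.0 if i < per_week else 22.0)
    for i in range(2 * per_week)
]

weekly = weekly_means(readings)
```

Here 4,032 readings collapse into two numbers, which makes the cost of the weekly bottleneck quite visible: anything that happened within a week is gone.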
The second challenge is data quality. When data has a lot of noise, the model is bound to overfit, responding too much to irrelevant noise rather than the general patterns. This is why domain knowledge remains crucial.
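A toy example makes the overfitting point a bit more tangible. Everything in it is made up: the "true" relation is output = 2 × input, the noise values are fixed by hand, and the two models (one that memorises every training point, one that fits a single slope) are stand-ins I picked for "too flexible" and "simple", not anything LetsGrow uses.

```python
# Made-up training data: the true relation is y = 2 * x, plus sensor noise.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise = [0.8, -0.9, 0.7, -0.8, 0.9, -0.7]
train_y = [2.0 * x + e for x, e in zip(train_x, noise)]

# New, noise-free points the models have not seen.
test_x = [1.2, 2.2, 3.2, 4.2, 5.2]
test_y = [2.0 * x for x in test_x]


def memoriser(x):
    # Overfitted model: return the stored output of the nearest training
    # input. It reproduces the training data, noise included, perfectly.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Simple model: one slope through the origin, fitted by least squares.
slope = (sum(x * y for x, y in zip(train_x, train_y))
         / sum(x * x for x in train_x))


def simple(x):
    return slope * x


def mean_abs_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


memo_train = mean_abs_error(memoriser, train_x, train_y)
memo_test = mean_abs_error(memoriser, test_x, test_y)
simple_test = mean_abs_error(simple, test_x, test_y)
```

The memoriser scores a perfect zero error on the data it was trained on, yet does clearly worse than the one-slope model on the new points. Knowing beforehand that the relation should be roughly a straight line is exactly the kind of domain knowledge that keeps a model from chasing noise.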
The third challenge is disturbances. This is sort of similar to data quality, but instead of sensor errors and overfitting, the cause of this challenge is external circumstances. For example, if the tomato price goes up, the grower may harvest more than usual. This is fine, but growers should update LetsGrow on this so they can take it into account!
What’s next? To overcome the three challenges above, companies are trying to automate plant monitoring. My friend Felipe’s MSc thesis is about this topic, estimating tomato yield using images.
Peter mentioned Gearbox, a company making hardware to automate plant registration. Not only is this faster and less labour-intensive, it also improves consistency. You can tell when the head grower took a measurement – they always stand out.
Another development presented by Peter was HortiKey’s Plantalyzer. HortiKey is a company that also does data-driven consulting, similar to LetsGrow. Plantalyzer gathers data at night (to avoid bumping into employees) and takes a new picture every 10 cm.
These projects should lead to big improvements as the data becomes more high-resolution. After all, garbage in, garbage out. Even Alan Turing would have been able to tell you that, I’m sure. It’ll be exciting to see what can be achieved.