4 December 2024
In this article we pursue an ambitious goal: Create an entire, non-trivial optimization model using Artificial Intelligence (AI).
Our approach mimics a user who is familiar with the situation, and has some experience with formulating optimization models, but is not familiar with implementing an optimization model in Python code. We want the AI to do all the coding for us.
It's not surprising that others precede us in pursuing this goal. For example:
We provide an overview of a system which uses artificial intelligence and database techniques to help a knowledgeable user formulate large linear programs. The system automates many of the tedious processes associated with large-scale modeling […]
Initially, the system will be most suitable for expert users; eventually we hope that it will become intelligent enough to help managers or students with a minimal exposure to linear programming techniques.
"An intelligent system for formulating linear programs", Murphy & Stohr
What may be surprising is that Murphy & Stohr's article was published in 1985, almost 40 years ago. The AI tool they used, a Prolog rules-based expert system, is very different to the Large Language Model (LLM) AI tools currently in vogue. Murphy & Stohr report some success in using their AI system, though their approach did not become commonly used. Nonetheless, their goal then was much the same as it is for us now.
We report our experience of using Copilot to write a model for us. Rather than presenting a verbatim transcript, which is very long, we focus on what went well and what didn't go well as the model evolves. We also summarize general lessons from the process.
Download the models
The model described in this article is built in Python, using the Pyomo library.
The files are available on GitHub.
Choice of an AI
In a previous article, we used the Claude AI to help us write part of an optimization model in Pyomo. Our conclusion was that "Overall, our experience with this model suggests that coding using an AI can be useful, but only as an assistant for small and specific tasks, rather than for writing substantial pieces of code. We liken the process of using Claude AI to working with a knowledgeable but over-confident junior analyst – useful, but Claude can't be trusted to work independently. Yet."
In this article, our goal is more ambitious: we want an AI to write all the code.
The free versions of both ChatGPT and Claude allow only a few questions to be asked before a stand-down time of several hours. We expect the development of this model will require many questions, so ChatGPT and Claude are not suitable for our purpose.
Conversely, the Copilot AI available through Windows 11 allows many questions in a conversation. Therefore, we choose the Copilot AI for developing this model.
Just for fun, we ask Copilot to draw Claude AI, ChatGPT, and Copilot using a "clay" style. The result, shown in Figure 1, is a remarkable image.
Situation
We choose a situation that isn't a standard textbook model, but for which there does exist an academic literature: Optimal crop rotation. Therefore, we expect that the AI will have some existing material to guide it. We want the model to be written using the Python Pyomo library. Python is a widely used programming language, so a lot of Python code has been used in training LLM AIs. Similarly, there is plenty of Pyomo content available to an AI.
We ask Copilot to "Design a crop rotation optimization model". Copilot responds with a comprehensive specification for this situation. Copilot's envisioning of the situation, shown in Figure 2, is an odd mixture of cliché and bizarre exaggeration.
Copilot provides a title, "Optimal Crop Rotation Planning", and defines the objective function as "Maximize the overall yield and profit from a set of agricultural fields over multiple seasons". Five constraints are suggested:
- Each field can only be planted with one type of crop per season.
- Certain crops should not be planted consecutively on the same field to prevent soil depletion and pest buildup.
- The total area planted with each crop should not exceed the available land area.
- The crop rotation plan must meet the demand for each type of crop.
- The cost of planting, maintaining, and harvesting crops should be within the budget.
Copilot also shows the mathematical formulation, defines sample data, and suggests a variety of solution methods.
Before starting this conversation, we had been discussing with Copilot a variety of unrelated non-linear optimization models. Copilot's suggested solution methods for the crop rotation model all relate to non-linear models, though its crop rotation formulation is linear. Therefore, the solution methods it suggests are not well-suited to the proposed model. It seems that we need to be careful to consider the conversation history when asking Copilot questions.
Even so, Copilot's model specification is a good starting point, so we proceed.
Developing the crop rotation model in Pyomo
First attempt at coding a model
We ask Copilot to "Make a model in Pyomo". Like most of our prompts, we keep the questions and instructions concise. This approach usually works well, but occasionally the lack of precise detail leads us astray.
Figure 3 shows Copilot's first attempt at coding a model for crop rotation. Despite our brief prompt, the response has all the parts we expect, including: importing the Pyomo library, sample data, an objective function, constraints, call to a solver, and printing the solution.
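We don't reproduce Figure 3's code here, but the constraint logic it needs to encode can be sketched in plain Python. This is a hypothetical illustration with made-up names and data, not Copilot's actual code: with a plan that maps each (field, season) pair to a crop, the one-crop-per-field-per-season rule holds by construction, so the interesting checks are the rotation and demand constraints.

```python
# Hypothetical sketch of the constraint logic (not Copilot's code).
# A plan maps (field, season) -> crop; a dict gives exactly one crop
# per field per season by construction.

fields = ['field1', 'field2', 'field3']
seasons = [1, 2]

plan = {
    ('field1', 1): 'wheat', ('field1', 2): 'corn',
    ('field2', 1): 'corn',  ('field2', 2): 'soy',
    ('field3', 1): 'soy',   ('field3', 2): 'wheat',
}

def rotation_ok(plan, fields, seasons):
    """No crop planted in the same field in consecutive seasons."""
    return all(plan[f, s] != plan[f, s + 1]
               for f in fields for s in seasons[:-1])

def demand_met(plan, production, demand):
    """Total production of each crop covers its minimum demand."""
    return all(production[c] >= demand[c] for c in demand)

production = {'wheat': 10, 'corn': 12, 'soy': 8}   # tonnes (made up)
demand = {'wheat': 5, 'corn': 5, 'soy': 5}         # tonnes (made up)

print(rotation_ok(plan, fields, seasons))   # True
print(demand_met(plan, production, demand)) # True
```

In the Pyomo model these checks become binary allocation variables and linear constraints, but the underlying logic is the same.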
Copilot doesn't know how our modelling environment is set up
The first issue we encounter is that Copilot assumes we're using the CPLEX solver. We do not have CPLEX installed, so the program in Figure 3 doesn't run in our modelling environment.
We ask Copilot to change to the HiGHS solver, which is installed. But that doesn't work because Copilot simply changes `'cplex'` to `'highs'`. We need to be more specific, telling Copilot to use `'appsi_highs'` instead.
This issue isn't Copilot's fault – it can't be expected to know how our modelling environment is configured. However, this does highlight an issue with using a general-purpose AI that has little or no situational awareness.
Creating dummy data is difficult
Now that we have an appropriate solver, we attempt to solve the model for the first time. But the model is infeasible!
That's because the data Copilot made up is inconsistent with the constraints. To be fair, creating dummy data is sometimes difficult.
In addition to being infeasible, Copilot's initial data lacks variety, so it may not encompass the range of behaviour we want to see. To add more variety, we ask Copilot to expand the data to have 5 years, include more demand variety across the seasons, and have lower overall minimum demand (to make the model feasible).
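One simple sanity check we could have applied to the dummy data, sketched here with our own made-up numbers rather than Copilot's actual data, is to confirm that even the best-case production of each crop can cover its minimum demand. This is only a necessary condition for feasibility, not a sufficient one, but it catches obviously inconsistent data before the solver does:

```python
# Hypothetical feasibility sanity check for dummy data. The upper bound
# assumes every field plants the same crop in every season, so if even
# that falls short of demand, the model must be infeasible.

area = {'field1': 10, 'field2': 8, 'field3': 12}        # hectares (made up)
yield_per_ha = {'wheat': 3.0, 'corn': 4.5, 'soy': 2.5}  # tonnes/ha (made up)
num_seasons = 4
demand = {'wheat': 400, 'corn': 100, 'soy': 50}         # tonnes (made up)

def infeasible_crops(area, yield_per_ha, demand, num_seasons):
    """Crops whose demand exceeds the best-case total production."""
    total_area = sum(area.values())
    return [c for c in demand
            if yield_per_ha[c] * total_area * num_seasons < demand[c]]

print(infeasible_crops(area, yield_per_ha, demand, num_seasons))  # → ['wheat']
```

Here the wheat demand of 400 tonnes exceeds the best-case production of 3.0 × 30 × 4 = 360 tonnes, so no constraint combination can be feasible.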
First solution
After getting Copilot to expand and adjust the sample data, which took a couple of attempts, we finally get a solution from the model. The planting schedule is shown in Figure 4.
What does constraint 3 do?
At this point we notice that Constraint 3, `area_constraint_rule`, is redundant. Therefore, we ask Copilot to remove that constraint.
Perhaps a constraint on the planting area would be useful in a different situation, but not given the current set of constraints. Copilot's inclusion of this constraint hints at a bigger issue: Copilot doesn't understand the situation in the sense that we would normally attribute to intelligence. Instead, it just writes things that are consistent with its training data.
Removing this constraint leads the solver to return an alternative optimum.
Plot the solution too, please
The output list of the crop to plant in each field is useful, but difficult to visualize. So, we ask Copilot to write code that plots the solution. The result is shown in Figure 5.
For an inexplicable reason, Copilot replaced the printed solution list code with the plot code, so we had to ask for the printing code to be restored – though it wrote entirely different code, rather than restoring the previous code. It seems that "undo" is a difficult concept for an AI.
We could then see a problem: The plot does not match the printed solution.
Our approach is to act as a non-programmer, so we didn't look at why the plot was wrong; instead, we simply told Copilot "The plot does not match the printed solution". Copilot changed the plot code to produce Figure 6, which is correct (though the solution differs from Figure 4 because it is an alternative optimum).
Interestingly, the colours chosen by Copilot for each crop seem appropriate – e.g., yellow for corn – though it is unclear if that is a coincidence or not.
Note that the format of Figure 6 differs slightly from Figure 5 – there are no gaps between seasons. This subtle change alerts us to a general issue that occurs several more times during the development: When asked to make a code change, Copilot may do things we didn't ask for. This "feature" seems to be inherent in how LLMs work. That is, given a slightly different prompt, or a prompt in a different context (such as relating to existing code), the AI might produce a partially or entirely different response. In this case, Copilot corrected the error (which we asked for) and introduced a superficial formatting change (which we didn't ask for). In other cases it might introduce bugs. Beware.
Time to expand the model
We have a working model that prints the optimal solution and shows the crop rotation plan on a plot. Now it is time to expand the model, to see how well Copilot performs at adding features and revising existing code.
To add more variety, we ask Copilot to expand the data to have 8 fields and add a constraint that limits the number of fields in which a crop is planted each season (to ensure variety in the crops we take to market).
Previously, the fields had been defined as `fields = ['field1', 'field2', 'field3']`. With more fields, Copilot changes the definition to `fields = list(range(1, 9))`, though we didn't explicitly tell it to do that. It also updates usage of the variable throughout the model. This data structure matches the definition of the seasons, which is reasonable.
We also ask Copilot to express field areas in hectares, crop yield in tonnes per hectare, and profit in dollars per season (including the planting cost). Copilot's code changes all work smoothly. We're impressed by Copilot's ability to add constraints and make consistent changes to the units used in multiple parts of the code.
Since we expanded the timeline, we ask Copilot to "Calculate the total profit as an NPV as a discount rate of 5% per annum". Despite our poor grammar, this works surprisingly well. The revised objective function is shown in Figure 7. It wasn't obvious earlier, but Copilot assumes that a year consists of two seasons. This can be seen in the `/2` part of the discount rate calculation and in the fact that the results show 10 seasons for 5 years. We hadn't specified a particular seasonal pattern, but Copilot's assumption is reasonable, so we continue as is.
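The discounting logic amounts to the following sketch. This is our own illustration of the two-seasons-per-year assumption, not Copilot's exact code from Figure 7:

```python
# Sketch of the NPV calculation with two seasons per year: season s
# (1-based) occurs at year (s - 1) / 2, so its discount factor is
# 1 / (1 + rate) ** ((s - 1) / 2).

DISCOUNT_RATE = 0.05

def npv(profit_by_season, rate=DISCOUNT_RATE):
    """Net present value of a per-season profit stream."""
    return sum(p / (1 + rate) ** ((s - 1) / 2)
               for s, p in enumerate(profit_by_season, start=1))

profits = [100.0] * 10   # 10 seasons = 5 years of nominal profit (made up)
print(round(npv(profits), 2))  # → 898.23
```

With a flat 100 per season, the NPV of 898.23 is below the nominal total of 1000, as expected from discounting.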
After these changes, the crop rotation plan solution is as shown in Figure 8.
Things go a bit awry
While expanding the model, things go a bit awry when Copilot introduces an odd error.
Copilot originally defined the season set as `model.S = pyo.Set(initialize=seasons)`, where `seasons = list(range(1, 11))`. That works fine.
For no obvious reason, and seemingly unrelated to the changes we ask for, Copilot changes that line to `model.S = pyo.Set(initialize(seasons)`, which is not valid and produces a syntax error. When Copilot is advised about the error it corrects the code, which seems OK.
But in every subsequent model version Copilot reintroduces the same error, or a variation where it adds another closing parenthesis, even when we repeatedly tell it not to. Worse, a few model versions later, Copilot starts making this error for both the seasons and fields sets. Perplexing.
This repeated error becomes annoying, so we eventually stop correcting Copilot and just fix the code ourselves each time Copilot creates a new model version. Having to fix the code ourselves isn't entirely in the spirit of our goal to get Copilot to write all the code, but it is less frustrating.
We make a mistake. Copilot happily complies
After all the changes to expand the model, Figure 8 above shows that some seasons have adjacent fields that are planted with the same crop. To reduce the spread of disease and pest migration from one field to another, it is good practice to have adjacent fields grow different crops.
So, we ask Copilot to "Plant a different crop in each field each season". This is a poorly worded prompt that is ambiguous and doesn't clearly articulate what we want. That's our mistake.
As it almost always does, Copilot happily responds to our prompt with a change to the model. The constraint it adds is identical to the existing `planting_constraint_rule`, but with a different name. This duplicate constraint doesn't do any significant harm to the model, but it also doesn't do what we intend.
If we weren't carefully checking Copilot's changes at each step, then Copilot's compliance with our mistaken prompt might have caused problems later. There are two lessons here: firstly, we need to take care when writing prompts; and secondly, we need to be careful to check that Copilot does what we intend rather than what we ask.
Down a rabbit hole of repeated failure
Having identified our mistake, we attempt a more carefully worded prompt: "For each season, require adjacent fields to have different crops". This is a tricky request, as our model has no information about how the fields are physically arranged. Nonetheless, Copilot makes a sensible suggestion, saying "We'll assume that the fields are arranged in a linear sequence for simplicity". A more sophisticated approach would have a matrix that defines adjacency of fields. But that's an extension for another time, so we accept Copilot's surprisingly astute suggestion.
However, Copilot's implementation of this constraint is not sensible at all. It writes the constraint as shown in Figure 9.
When we run the revised model, the HiGHS solver reports a rather unhelpful error message "pyomo.contrib.appsi.solvers.highs.DegreeError: Highs interface does not support expressions of degree None", which we tell Copilot.
Copilot's response is "It looks like the appsi_highs solver does not support certain types of expressions, which might be causing the issue. Let's switch to a different solver that can handle these expressions, such as glpk." Switching to use the glpk solver is a bit random and, despite Copilot's assertion, glpk doesn't handle this type of constraint either.
One issue is that the constraint multiplies allocation variables for adjacent fields, which makes the constraint non-linear. Copilot's suggested revision doesn't address the problem. We tell Copilot to revert to using HiGHS and make the constraint linear. But that request sends us down a rabbit hole of repeated failure. Copilot introduces variations of the constraint that become increasingly complex and confused.
Eventually, after failing multiple times to get a working constraint, we tell Copilot the answer: "The adjacent_fields_different_crops_rule constraint does not work. Make it like the rotation_constraint_rule constraint except for fields". This prompts Copilot to make a sensible pair of almost identical constraints for adjacent fields across seasons and within a season, shown in Figure 10. These constraints work as required.
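The heart of the fix is the linearization: for binary variables x[f, c, s], the product constraint x[f, c, s] × x[f+1, c, s] = 0 is non-linear, while the equivalent sum x[f, c, s] + x[f+1, c, s] ≤ 1 is linear. Our own plain-Python sketch of the resulting logic (not the actual code in Figure 10) checks a candidate plan against it:

```python
# Sketch of the linearized adjacency logic: assuming fields form a
# linear sequence, no two adjacent fields grow the same crop in the
# same season. In the MIP this is the linear constraint
#   x[f, c, s] + x[f + 1, c, s] <= 1   for all f, c, s,
# replacing the non-linear product x[f, c, s] * x[f + 1, c, s] == 0.

def adjacency_ok(plan, fields, seasons):
    """True if adjacent fields never share a crop within a season."""
    return all(plan[f, s] != plan[f + 1, s]
               for f in fields[:-1] for s in seasons)

fields = [1, 2, 3]
seasons = [1, 2]
plan = {(1, 1): 'wheat', (2, 1): 'corn', (3, 1): 'wheat',
        (1, 2): 'corn',  (2, 2): 'soy',  (3, 2): 'corn'}
print(adjacency_ok(plan, fields, seasons))  # True
```

Note that fields 1 and 3 may share a crop, because only consecutive fields count as adjacent under the linear-sequence assumption.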
This diversion took quite some time. It would have been a lot faster for us to write the code for this constraint. But that is contrary to our goal of writing no code for this model, so we persist much longer than is reasonable with trying to coax Copilot into writing a working constraint.
Improve the output and revise some features
Since we expanded the number of fields, the printed results are difficult to read because the lists are quite long. So, we ask Copilot to print the results as two dimensional tables, rather than lists. We also ask for more output tables, including the "NPV Profit from each field for each season (with row and column totals)", "Nominal Profit from each field for each season", and "Surplus of production in excess of demand (Crop by Season)".
Copilot adds these tables but, as shown in Figure 11, the table format is messy.
After several failed attempts to get Copilot to make nicely formatted tables, Copilot suggests using the `tabulate` library. The result is shown in Figure 12. We don't especially like this format, but it does what we asked for, so we can't really complain.
Copilot automatically changes the other output tables to use the same format, so we now have a comprehensive set of results.
Such a table format might be OK for display on screen, but it isn't useful for further analysis. So, we ask Copilot to also output a csv file of the results, which it does flawlessly.
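CSV export of the plan needs only the standard library. The following is our own sketch of the idea, with hypothetical data, rather than Copilot's actual code:

```python
import csv
import io

# Sketch of writing the crop rotation plan as a CSV table: one row per
# field, one column per season (our illustration, not Copilot's code).

fields = [1, 2, 3]
seasons = [1, 2, 3]
plan = {(1, 1): 'wheat', (1, 2): 'corn',  (1, 3): 'soy',
        (2, 1): 'corn',  (2, 2): 'soy',   (2, 3): 'wheat',
        (3, 1): 'soy',   (3, 2): 'wheat', (3, 3): 'corn'}

buffer = io.StringIO()   # swap in open('plan.csv', 'w', newline='') for a file
writer = csv.writer(buffer)
writer.writerow(['Field'] + [f'Season {s}' for s in seasons])
for f in fields:
    writer.writerow([f] + [plan[f, s] for s in seasons])

print(buffer.getvalue())
```

The same loop structure works for the profit and surplus tables, with numbers in place of crop names.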
While we're making changes, we also ask Copilot to reverse the order of the fields in the plot, to match the order in the tables (i.e., with 1 at the top, rather than at the bottom). The plot legend also overlaps the fields, so we get the legend moved outside the plot area. Copilot makes those changes OK.
We hit a limit: Copilot gives up responding in full
By this stage, the code is getting quite long. This is a problem because Copilot starts providing partial responses. That is, while responding with revised code, Copilot simply stops part way through. It must be prompted to show the rest of the code. This happens repeatedly. It seems that we reached a limit in the length of reply that Copilot will print.
Copilot introduces a bug
One of the extra tables we ask Copilot to create is intended to show the excess production over the minimum demand for each crop. The initial version is shown in Figure 13. The problem with this table is that some of the surplus values are negative, even though the model has a constraint that production must be greater than or equal to demand, so the surplus cannot be negative.
The issue is shown in Figure 14, which lists Copilot's code for calculating several results. When calculating the surplus (on the last line), Copilot's code deducts the demand each iteration.
But the demand should be deducted just once. When advised about this error, Copilot calculates the surplus as `surplus[c][s] = production[c][s] - demand[c][s]`, as shown in Figure 15, which is correct.
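The bug reduces to a few lines. This is our reconstruction of the pattern with hypothetical numbers, not Copilot's code from Figure 14:

```python
# Reconstruction of the surplus bug (hypothetical data): demand must be
# deducted once from total production, not once per field in the loop.

production_by_field = {'field1': 30, 'field2': 25, 'field3': 20}  # tonnes
demand = 40                                                       # tonnes

# Buggy pattern: demand is deducted on every loop iteration,
# i.e. three times here, driving the surplus spuriously negative.
surplus_buggy = 0
for p in production_by_field.values():
    surplus_buggy += p - demand

# Correct pattern: total production minus demand, deducted once.
surplus = sum(production_by_field.values()) - demand

print(surplus_buggy, surplus)  # → -45 35
```

The buggy version reports a negative surplus even though production (75) comfortably exceeds demand (40), which is exactly the symptom visible in Figure 13.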
Make the crop rotation plan repeat
Since this model is intended to be a crop rotation plan, we ask Copilot to "Add constraints to make the last two seasons the same as the first two seasons". This seems like a simple request, but leads to a series of problems and much frustration as Copilot's code either contains errors or simply doesn't do what we want.
Eventually, by writing a series of very specific and detailed instructions, we get constraints that work as intended. We suspect that the code is getting too long and complex for Copilot to modify accurately. It would have been much easier to write the code ourselves rather than continue to prompt Copilot until it provides working code.
In any case, the result is shown in Figure 16. The additional constraints ensure that the crops planted in seasons 1 and 9 are the same, and in seasons 2 and 10 are the same, while complying with the other constraints and maximizing the NPV of profit over the planning horizon.
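The repeat requirement amounts to equality constraints pairing the first and last seasons: plan[f, 9] = plan[f, 1] and plan[f, 10] = plan[f, 2] for every field f. A plain-Python check of that logic (our own sketch with synthetic data, not the constraints Copilot eventually produced):

```python
# Sketch of the rotation-repeat logic: the crops in the last two
# seasons must equal those in the first two, so the 10-season plan
# can repeat indefinitely. In the MIP these are equality constraints
# on the allocation variables.

def plan_repeats(plan, fields, first=(1, 2), last=(9, 10)):
    """True if each field's last seasons mirror its first seasons."""
    return all(plan[f, l] == plan[f, fi]
               for f in fields for fi, l in zip(first, last))

fields = [1, 2]
# Synthetic alternating plan: crop depends only on (field + season) parity.
plan = {(f, s): ['wheat', 'corn'][(f + s) % 2]
        for f in fields for s in range(1, 11)}

print(plan_repeats(plan, fields))  # True
```

The alternating pattern repeats with period 2, so seasons 9 and 10 automatically mirror seasons 1 and 2.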
Note that the y-axis labels are now like "Field 1", rather than "1" or "field 1" as they were in previous plots. This is another change that Copilot snuck in without being prompted.
Just one more constraint: Require fields to be fallow
Good crop rotation practice includes leaving a field fallow (left unplanted during a growing season) from time to time. Therefore, we ask for just one more constraint "Add a constraint that requires each field to be fallow at least once across all seasons".
Adding this constraint turns out to be problematic for Copilot to implement. We need to provide a series of detailed instructions, focussing on specific parts of the code separately: first adding correct constraints, then printing the results, and finally modifying the plot to reflect the new type of result. As with the previous constraint, it appears that the program is getting too complex for Copilot to make all the necessary changes at the first attempt.
But after a few more rounds of prompts, we have a working model including fallow fields, with the solution as shown in Figure 17.
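Logically, the fallow requirement says: for each field, the number of fallow seasons across the horizon is at least one. Treating "fallow" as one of the planting choices, our own plain-Python sketch of the check (not Copilot's implementation) is:

```python
# Sketch of the fallow requirement: treating 'fallow' as one of the
# choices, each field must be left fallow in at least one season.
# In the MIP this is  sum over s of x[f, 'fallow', s] >= 1  per field.

def fallow_ok(plan, fields, seasons):
    """True if every field is fallow in at least one season."""
    return all(any(plan[f, s] == 'fallow' for s in seasons)
               for f in fields)

fields = [1, 2]
seasons = [1, 2, 3]
plan = {(1, 1): 'wheat',  (1, 2): 'fallow', (1, 3): 'corn',
        (2, 1): 'fallow', (2, 2): 'soy',    (2, 3): 'wheat'}
print(fallow_ok(plan, fields, seasons))  # True
```

A fallow season earns no profit, so the optimizer will schedule exactly as much fallow time as the constraint forces, which is what Figure 17 shows.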
Refactor the code to use good practices
To make the model easier to read and maintain, we ask Copilot to make the code more modular. It does a partial job, in some cases failing to pass all parameters to the new functions, and it doesn't properly account for the change in scope that results from putting some code into functions. It takes multiple attempts to get Copilot to correct the code. Again, it would have been easier and faster to make some code adjustments manually.
Similarly, we ask Copilot to extract hard-coded values (e.g., `0.05`) and make them global constants (e.g., `DISCOUNT_RATE`). Copilot makes several inconsistent and partial changes to the code, so it takes several iterations to get working code.
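The intended refactoring is straightforward: each magic number becomes a named module-level constant, defined once and referenced everywhere. A minimal sketch (the constant names here are our own illustration):

```python
# Sketch of the refactoring: hard-coded literals become named
# module-level constants, defined once (names are our illustration).

DISCOUNT_RATE = 0.05       # was a literal 0.05 inside the objective
SEASONS_PER_YEAR = 2       # was a literal /2 in the discount exponent

def discount_factor(season, rate=DISCOUNT_RATE,
                    seasons_per_year=SEASONS_PER_YEAR):
    """Discount factor for a 1-based season index."""
    return 1 / (1 + rate) ** ((season - 1) / seasons_per_year)

print(discount_factor(1))  # → 1.0
```

With the constants in one place, changing the discount rate or seasonal pattern is a one-line edit, which is the whole point of the exercise.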
Final version of the model
After many changes, a few bugs, and an occasional rabbit hole, we finally have a complete model.
There is more that we could do with this model. For example, the tables start at season 1 while the plot starts with season 0. Also, the code has minimal commenting, and some of the comments, like "Add all possible labels to the legend", reflect a change that Copilot made rather than being useful comments. But we've done enough.
Copilot wrote all 269 lines of code, though we had to edit a few lines (some repeatedly) to get everything working properly. The final program is more than four times longer than the first version (which was 63 lines), with twice as many constraints, more complex objective, several additional formatted results, a plot, csv output, and other features. The full code is available on GitHub.
Images in this article
All the images in this article were either drawn by Copilot or plotted using code written by Copilot.
Copilot's envisioning of its fellow AIs (Figure 1), was created with the prompt "Make an image with the logos for Claude AI, ChatGPT, and Copilot in a row", then selecting the "Claymation" option. It is a work of art – and a bit creepy.
When prompted to "Draw an image of Copilot writing a Python crop rotation model in a field", one of the images that Copilot drew is shown at the top of this article and again in Figure 18. This is an odd image, being both literal and surreal. The copilot, who looks like an astronaut, is literally in a field, writing code on the screen with a pencil, while surrounded by strange objects.
In many ways, the images created by Copilot are a good visual representation of working with an AI.
Summary of our experience with Copilot
Our goal was to "Create an entire, non-trivial optimization model using Artificial Intelligence (AI)".
Did we succeed? Yes, with the caveat that we did a small amount of editing. However, it would have been faster and easier to do much of the coding ourselves.
What went well:
- Copilot's initial model specification is a good, comprehensive starting point for model development.
- When an issue occurs, Copilot usually corrects it on the first attempt.
- We're impressed by Copilot's ability to add constraints and sometimes make consistent changes in multiple parts of the code. For example, adding yield and discounting to the objective function and output.
- Even when our prompts contain spelling or grammatical errors, Copilot usually makes a sensible interpretation.
- Given a tricky request to require adjacent fields to have different crops, Copilot makes a sensible suggestion even though the model has no previous concept of adjacency.
- Adding some new features, like writing the results to a csv file, is surprisingly smooth. The code works at the first attempt.
- Even when given a minimal prompt, Copilot's knack for producing a sensible response is remarkable. But a note of caution: given a slightly different prompt, or a prompt in a different context (such as relating to existing code), Copilot might produce a partially or entirely different response.
- In most cases, Copilot includes a helpful description of the changes it made.
- Copilot can write complex code that, in some cases, saves a significant amount of model development time.
What didn't go well:
- Copilot spontaneously, and then repeatedly, introduced a syntax error that we had to manually correct each time.
- The model specification suggested solution methods that are not ideal for this model. Presumably the methods were influenced by a previous, unrelated conversation about non-linear models.
- We shouldn't expect Copilot to know how our modelling environment is set up, so it may write code that doesn't work for us.
- The AI is often very literal. For example, we asked it to change the solver from CPLEX to HiGHS, which it did. But it missed the fact that the more usual way to specify the HiGHS solver is via the name `'appsi_highs'`. Sometimes we need to write very precise prompts to get the response we want.
- The initial dummy data made the model infeasible. It took a couple of iterations to get Copilot to create feasible data.
- One of the constraints in the initial specification sounds reasonable but was redundant in the implementation.
- When asked to make a plot of the solution, Copilot made a plot but it did not correctly represent the solution.
- There is a tendency for Copilot to make stealth changes that are unrelated to the prompt. For example, in the plot code, it changed the y-axis labels multiple times and changed the gap between seasons.
- Copilot sometimes removes or replaces existing code without mentioning the change. For example, it replaced the solution printing code with the plot code. When asked to restore the printing, it did so but wrote different code rather than undoing the change.
- In a couple of instances, Copilot went down a rabbit hole of complexity and confusion to produce unhelpful, incorrect, and even nonsense code changes. In this situation, we had to tell Copilot a solution via specific and detailed prompts – not quite writing the code ourselves, but close.
- Given an ambiguous and poorly worded prompt, Copilot happily complied. It would be much better for Copilot to be more sceptical and ask if that's really what we want.
- Copilot casually introduces a non-linear constraint that our solver can't handle. When advised of this issue, Copilot unhelpfully suggests an alternative solver that also can't handle it.
- Once the code gets quite long, Copilot gives up responding in full. It seems that we reached a limit in the length of reply.
- Introduction of bugs is relatively common. For example, deducting the demand multiple times when calculating the surplus. The code runs, but it produces an incorrect result.
- Prompts that require multiple changes throughout the code, such as refactoring the code to use functions, usually take several attempts to get right as Copilot makes inconsistent/incomplete revisions. This is especially the case towards the end of the development process, when the code is quite long and complex.
As a general observation, we need to remember that Copilot doesn't understand the situation in the sense that we would normally attribute to intelligence. The term "artificial intelligence" is, in many ways, a misnomer.
Overall, Copilot is a useful, though occasionally frustrating, programming assistant. It can write significant pieces of code with little guidance, provided the code isn't too complex and the change doesn't involve too many separate parts of the code. This is both a blessing and a curse, because Copilot cannot be trusted to always write correct code. We need to be careful and precise in writing prompts. Then we must verify what the AI does at every step, to ensure that the code works as intended.
Conclusion
In this article, we describe the result of using Copilot to create an entire, non-trivial optimization model, with the AI doing all the programming.
Overall, we achieve our goal. Some parts of the process are smooth, with Copilot being a very helpful and useful programming assistant. However, other parts of the process are frustrating and occasionally nonsensical. In total, the process takes longer than it would have if we wrote the model without AI – primarily due to the time taken to dig Copilot out of the rabbit holes that it goes down.
The abilities of LLM AI have improved substantially and rapidly. As a programming assistant, AI can certainly be useful for both general programming and writing an optimization model. When it works, it works well. But there are cases where the AI doesn't work well and attempts to coerce it into doing what we want sometimes go awry. In such cases, sidelining the AI and doing the coding ourselves would be more effective and efficient.
Despite frequent recent predictions that computer programming will soon be an obsolete skill, such as "AI could make coders obsolete in two years, AWS Chief predicts", our experience with Copilot suggests that time is not close. While we're undoubtedly closer than we were in 1985, AI is not there yet. Even if or when AI tools become advanced enough to handle most or all programming, optimization model development is much more than just programming. As modellers, we don't expect to be replaced by an AI any time soon.
If you would like to know more about this model, or you want help with your own models, then please contact us.
References
Murphy, F. & Stohr, E. A. (1985). "An intelligent system for formulating linear programs". NYU Working Paper No. IS-85-40, August.