Pilot to Copilot: Where is my typo

After my first experience with the GitHub CoPilot I wanted to go beyond simple ad hoc code generation. My idea was (and still is) to have the CoPilot implement a feature with as little additional human action as possible. The process would look like this:

Write a BDD style feature file
Use the CoPilot to generate the Python code for the feature
Generate the step definitions for testing the feature

In the ideal case only minor adjustments on my side (like putting code into the correct files) would be needed.

The Setup

I wrote a very simple Gherkin scenario for a Python function that could add and subtract numbers. The scenario, which can be found here, looked like this:

Feature: A simple calculator

Scenario Outline: Basic calculation
  Given I create a calculator instance
  When I call the method calculate with the paramneters <number_a>, <number_b> and the <sign>
  Then Then the method should return be <result>
  Examples:
  | number_a | number_b | sign | result |
  |        1 |        1 |    + |      2 |
  |        1 |        1 |    - |      0 |

So really no rocket science. The full source code of this example can be found here.

Using the new chat function to generate code

Using the new chat feature, I selected the scenario and prompted the CoPilot to generate a Python class for it. The prompt looked like this:

Creata a python class that fulfills this specification

Talking about being precise when prompting: the generated code did not only do addition and subtraction but also multiplication and division. So I got a little more specific in the next prompt:

Creata a python class that only fulfills this specification

Note the word only in the prompt. The generated code, which can be found here, seemed much better this time:

class Calculator:
    def calculate(self, number_a, number_b, sign):
        if sign == '+':
            return number_a + number_b
        elif sign == '-':
            return number_a - number_b
        else:
            raise ValueError("Invalid sign")

For the time being I was happy with the result and put the code in a file called calculator.py. Next I needed step definitions for the scenario.
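Before wiring the class into behave, it could be checked by hand against the two rows of the Examples table. The following is only a minimal sketch: the class is repeated so that the snippet runs on its own, and the examples list is simply a transcription of the table, not part of the original project.

```python
class Calculator:
    def calculate(self, number_a, number_b, sign):
        if sign == '+':
            return number_a + number_b
        elif sign == '-':
            return number_a - number_b
        else:
            raise ValueError("Invalid sign")


# Rows of the Examples table: (number_a, number_b, sign, expected result)
examples = [(1, 1, '+', 2), (1, 1, '-', 0)]

calculator = Calculator()
for number_a, number_b, sign, expected in examples:
    assert calculator.calculate(number_a, number_b, sign) == expected

# Any sign other than '+' or '-' is rejected, which is what the "only" prompt asked for
try:
    calculator.calculate(1, 1, '*')
except ValueError as error:
    print(error)  # prints "Invalid sign"
```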

Generating step definitions

So back to the prompt: highlight the scenario again and start a code chat with the following prompt:

Create python step implemenations for the scenario using python's behave framework and the class Calculator

This time it seemed like the CoPilot’s code was spot on. The generated code, which can be found here, looked like this:

from behave import given, when, then
from calculator import Calculator

@given('I create a calculator instance')
def step_impl(context):
  context.calculator = Calculator()

@when('I call the method calculate with the parameters {number_a:d}, {number_b:d} and the {sign}')
def step_impl(context, number_a, number_b, sign):
  context.result = context.calculator.calculate(number_a, number_b, sign)

@then('the method should return {result:d}')
def step_impl(context, result):
  assert context.result == result

After putting the step definitions into the file steps/steps.py I was ready to run the scenario.

Running the scenario

I used the behave framework to run the scenario, but the run failed with the following output:

2 steps passed, 0 failed, 0 skipped, 4 undefined
Took 0m0.000s

You can implement step definitions for undefined steps with these snippets:

@when(u'I call the method calculate with the paramneters 1, 1 and the +')
def step_impl(context):
    raise NotImplementedError(u'STEP: When I call the method calculate with the paramneters 1, 1 and the +')

@then(u'Then the method should return be 0')
def step_impl(context):
    raise NotImplementedError(u'STEP: Then Then the method should return be 0')

The expressions for the step implementations generated by the CoPilot obviously did not match the Gherkin specification. So what was the problem?

Automatic spell checking?

First I thought the problem might be related to the parameter types in the expression, and I tried to change them from integer (:d) to string. But if you look closely at the generated code, you will see that the CoPilot generated the following expression:

@when('I call the method calculate with the parameters {number_a:d}, {number_b:d} and the {sign}')

while the actual Gherkin sentence in the scenario was:

When I call the method calculate with the paramneters <number_a>, <number_b> and the <sign>

Notice the typo in the word paramneters? The CoPilot generated the word parameters instead of paramneters, which was a typo I made right at the start when writing the scenario.
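Why a single wrong letter leads to an undefined step can be illustrated with a small sketch. behave's real matching is done by the parse library and is far more capable; the helper pattern_to_regex below is a hypothetical, simplified stand-in that only handles the two placeholder forms used here ({name} and {name:d}) and does no type conversion.

```python
import re


def pattern_to_regex(pattern):
    """Hypothetical, simplified stand-in for behave's step matching."""
    regex = ""
    # Split the pattern into literal text and {name} / {name:d} placeholders
    for part in re.split(r"(\{\w+(?::d)?\})", pattern):
        placeholder = re.fullmatch(r"\{(\w+)(:d)?\}", part)
        if placeholder is None:
            regex += re.escape(part)  # literal text must match exactly
        elif placeholder.group(2):
            regex += rf"(?P<{placeholder.group(1)}>-?\d+)"  # integer field
        else:
            regex += rf"(?P<{placeholder.group(1)}>.+?)"  # free-text field
    return re.compile(regex)


step_pattern = ("I call the method calculate with the parameters "
                "{number_a:d}, {number_b:d} and the {sign}")
matcher = pattern_to_regex(step_pattern)

step_with_typo = "I call the method calculate with the paramneters 1, 1 and the +"
step_fixed = "I call the method calculate with the parameters 1, 1 and the +"

# The literal text differs by one letter, so there is no match at all
print(matcher.fullmatch(step_with_typo))  # None
# With the typo fixed, the placeholders capture the values (as strings here)
print(matcher.fullmatch(step_fixed).groupdict())
```

The point of the sketch is that everything outside the placeholders is compared literally, so the step text and the pattern have to agree character by character.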

This kind of autocorrection was also the cause of the second error message. The CoPilot generated the following expression:

@then('the method should return {result:d}')

Again, the Gherkin statement in the scenario looked different:

Then Then the method should return be <result>

Notice the duplicated word Then and the stray word be, both of which the CoPilot silently dropped. After fixing the typo and removing the superfluous words, the scenario ran fine.
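Differences like these are easy to miss by eye. As a rough review aid, the step texts from the feature file could be compared word by word with the generated patterns, for example with Python's difflib. This is only a sketch: the helper names normalise and differing_words are made up for this illustration, and placeholders of both notations are replaced by a common token so that only the literal words are compared.

```python
import difflib
import re


def normalise(text):
    """Replace <name> and {name} / {name:type} placeholders with a common token."""
    return re.sub(r"<\w+>|\{\w+(?::\w+)?\}", "{}", text)


def differing_words(step_text, pattern_text):
    """Return (words found only in the step, words found only in the pattern)."""
    step_words = normalise(step_text).split()
    pattern_words = normalise(pattern_text).split()
    only_in_step, only_in_pattern = [], []
    matcher = difflib.SequenceMatcher(a=step_words, b=pattern_words, autojunk=False)
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            only_in_step.extend(step_words[a_start:a_end])
            only_in_pattern.extend(pattern_words[b_start:b_end])
    return only_in_step, only_in_pattern


# Step texts from the feature file (leading keyword removed) and the generated patterns
print(differing_words(
    "I call the method calculate with the paramneters <number_a>, <number_b> and the <sign>",
    "I call the method calculate with the parameters {number_a:d}, {number_b:d} and the {sign}",
))  # (['paramneters'], ['parameters'])
print(differing_words(
    "Then the method should return be <result>",
    "the method should return {result:d}",
))  # (['Then', 'be'], [])
```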

Conclusion

Well, the most obvious conclusion is that I make far too many typos ;-)

But seriously, I think the CoPilot did a good job generating the code. This example also shows, however, that the CoPilot does not really understand the context or the meaning of the code it generates; it just treats it as another language.

This again emphasizes that code generated by the CoPilot needs to be reviewed carefully before it is used.

Pilot to Copilot: Where is my typo - Sun, Oct 8, 2023