A few weeks ago I was invited to the technical preview for GitHub Copilot. For those not familiar with the concept, Copilot uses OpenAI’s Codex model (a descendant of GPT-3) trained on a huge dataset of code, primarily drawn from GitHub’s own repositories, to provide a suggestion engine for code as you’re writing it. In a nutshell, Copilot looks at your codebase as a whole, as well as whatever you’re currently writing, and suggests code you might want to write next. This can include function calls based on variable names, snippets of code drawn from elsewhere in your project, or even entire functions written from a single comment describing the desired functionality. Where it’s particularly useful is in its understanding of APIs – its knowledge of code extends beyond basic language structures to cover what seems like most APIs under the sun, including functions and APIs that only exist within your own internal codebase!
Now that I’ve familiarised myself with Copilot, I have some thoughts on my experience of the platform and where these sorts of tools might go in the future.
My experience with Copilot
For the last few weeks I’ve been using Copilot on-and-off for any code I’ve needed to write, particularly during my first real foray into machine learning: using TensorFlow for my work on satellite authentication (more on that to come, watch this space). My past experience with TensorFlow (admittedly several years ago) had been a bit of a nightmare: drawing API calls from tutorials which turned out to be long deprecated, not fully understanding why certain calls happened the way they did, and general confusion at the state of the library.
I was determined that this time would be different – I had several more years of experience under my belt, a better understanding of machine learning (despite my general avoidance of the field if I can possibly help it), and colleagues’ code which I knew already worked and could use for inspiration. However, I was surprised to discover that one of the things that helped me the most was Copilot’s assistance. For a number of tasks such as structuring models, all I needed to do was describe my function and provide a type signature and Copilot would do the rest, cutting out a substantial amount of documentation searching. Almost all of the following function was generated simply by prompting Copilot:
```python
# Assumes the usual Keras imports earlier in the file:
# from tensorflow.keras import layers, models

# Create a tensorflow model with an input layer, 3 convolutional layers, and a dense layer.
def create_model(input_len, feature_count, output_features, group_size=1):
    input = layers.Input((input_len * group_size, feature_count))
    conv1 = layers.Conv1D(32, 3, padding='same', activation='relu')(input)
    conv2 = layers.Conv1D(64, 3, padding='same', activation='relu')(conv1)
    conv3 = layers.Conv1D(16, 5, padding='same', activation='relu')(conv2)
    pool = layers.MaxPooling1D(3, strides=1, padding='same')(conv3)
    flat = layers.Flatten()(pool)
    dense = layers.Dense(64, activation='relu')(flat)
    # One hot output
    output = layers.Dense(output_features, activation='softmax')(dense)
    model = models.Model(inputs=input, outputs=output)
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    return model
```
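For readers less familiar with these layer parameters: with `padding='same'` and a stride of 1, each `Conv1D` and the pooling layer preserve the sequence length, so only the channel count changes as data flows through the model. A rough sketch of that shape bookkeeping (plain Python, no TensorFlow needed, using the same hyperparameters as above):

```python
# Trace the (length, channels) shape of the tensor through the model above.
# With padding='same' and stride 1, convolutions and pooling preserve the
# sequence length, so only the channel count changes until Flatten.
def trace_shapes(input_len, feature_count, output_features, group_size=1):
    shapes = []
    length, channels = input_len * group_size, feature_count
    shapes.append(('input', (length, channels)))
    for name, filters in [('conv1', 32), ('conv2', 64), ('conv3', 16)]:
        channels = filters  # 'same' padding, stride 1: length unchanged
        shapes.append((name, (length, channels)))
    shapes.append(('pool', (length, channels)))   # stride-1 'same' pooling: unchanged
    shapes.append(('flatten', (length * channels,)))
    shapes.append(('dense', (64,)))
    shapes.append(('output', (output_features,)))
    return shapes

for name, shape in trace_shapes(input_len=100, feature_count=8, output_features=4):
    print(f'{name:8s} {shape}')
```

With an input length of 100 and 8 features, for instance, the tensor stays at length 100 throughout the convolutional stack, and `Flatten` produces a vector of 100 × 16 = 1600 values before the dense layers.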
Even in the areas where I considered myself more experienced, such as handling datasets, Copilot was able to help out. For instance, I needed to convert my output dataset to a one-hot representation, a task which would ordinarily send me in the direction of Google and Stack Overflow. This time, all I had to do was write a comment describing the functionality I wanted and Copilot took care of the rest:
```python
# Convert input to onehot with shape (None, num_features)
def convert_to_onehot(y, num_features):
    return np.eye(num_features)[y.reshape(-1)]
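As a quick sanity check of the generated helper (repeated here with its import so the snippet runs standalone), the trick is that indexing an identity matrix with an array of labels picks out one row per label:

```python
import numpy as np

# Helper from above, repeated so this snippet is self-contained.
def convert_to_onehot(y, num_features):
    return np.eye(num_features)[y.reshape(-1)]

labels = np.array([0, 2, 1])
print(convert_to_onehot(labels, 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```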
I come across tasks like these all the time, where I need to do something that likely only takes a single function call, but I don’t need to do frequently enough for it to be worth memorising the relevant functions. It seems that Copilot shines in cases like these – it can easily pull these API calls from its knowledge banks, saving me a few clicks. These small time savings do add up, but more importantly it allows me to get on with work without leaving the context of my editor or breaking my current chain of thought.
I’ve noticed only a few areas where Copilot doesn’t seem to work so well. The first of these is that it has a tendency to suggest entire functions which already exist elsewhere in the code, if your code preceding it is too similar. This can be fixed without too much hassle by providing additional comments giving context to the new code. The second issue is that it often suggests only a single solution (or several functionally identical options). This is usually not a huge issue, but the workflow of alternating between writing more detailed prompts and waiting for new suggestions can feel a bit tedious. Finally, and this is very much a personal preference, it often feels like the inline code suggestions appear too quickly. It can feel like I need to rush my typing if I want to finish a thought before Copilot jumps in with its own suggestions. This could be remedied by either selectively toggling Copilot as I work, or disabling inline suggestions and only using the dedicated suggestion panel, but it feels like an option to increase delay (or manually ask for inline suggestions rather than having them always provided) would make my experience a bit smoother.
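For anyone who shares this preference and is running Copilot in VS Code (the editor I’m assuming here; setting names may differ elsewhere or change between extension releases), something like the following in settings.json keeps Copilot available on demand without it jumping in automatically:

```json
// settings.json – a sketch, not a definitive config:
// disables automatic inline suggestions, so Copilot only responds
// when invoked manually (e.g. via its completions panel).
{
    "editor.inlineSuggest.enabled": false
}
```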
What does our future hold?
It’s easy to look at a tool like Copilot and wonder if this will be the thing to do away with programmers once and for all. After all, if an AI can write code, what do we need humans for?
It’s important to remember that this isn’t the first time something like this has happened – whenever a new system, tool, process, or idea significantly changes how a task is done, the work is transformed but skilled workers remain essential throughout. We saw this when higher-level programming languages did away with the need to write assembly instructions (or C!) for the majority of tasks, we see it in the growing automation of software tasks in recent years, and we even see it looking back at the industrial revolution, which replaced huge amounts of what was previously manual labour with machine-automated work. In each of these cases, the state of the art shifted dramatically but the underlying tasks remained, and those who understood the original systems in detail were well-placed to transfer to the new norms. It seems the future of programming will involve “AI pair programming”, in which tools like Copilot aid the construction of new projects with a human directing operations at a higher level.
In this way, AI is used as a sort of “metaprogramming language”, turning a higher-level description of a system into actual code that implements the system’s functionality. This is not too dissimilar from the higher-level programming languages we have today, which compile or interpret more human-readable code into fast machine-readable instructions. The main difference is that the functionality of a programming language is explicitly defined and well-documented, whereas with AI we have to rely on a more opaque model and hope that it gives us the desired results. These results will get better over time, but in its current state it’s vital to have a good amount of human oversight to make sure the generated code is free from bugs and security holes, and actually implements the requested functionality.
Of course, there is also the narrower question of precisely what it is that Copilot is replacing, and what niche it fills. In their current state, tools like these can’t engineer entire systems, but instead act as an extension to the minds of experienced engineers, allowing them to build projects more seamlessly. Copilot doesn’t replace the actual engineering which goes into programming projects [1], but it certainly seems as though it can do away with a lot of now-unnecessary googling and checking Stack Overflow for answers to simple API problems.
This raises another question: is this necessarily something we want to do away with? Documentation and Stack Overflow act as sources of truth for programming and engineering problems, providing answers to millions of questions which have been asked before. If we instead start relying on AI models as our source of truth, it’s possible that less effort will be put into maintaining more easily-indexable knowledge bases, making it harder to find specific answers. While it is possible to get answers out of a model like Copilot through directed prompts, it can’t match an indexed database of answers submitted and voted on by experts, particularly when people are just setting out in the field and need guidance beyond anonymous lines of code suggested by a model. It’s unlikely Stack Overflow is going to go away, but it is important that we don’t allow concrete documentation to fall by the wayside as AI tools become more powerful.
Although Copilot is currently occupying a unique niche in an otherwise-uncontested market for AI programming assistants, it is all but guaranteed that competitors will arrive in the not-too-distant future. Thanks to its access to GitHub’s codebase and OpenAI’s models, Copilot has been given a head-start as the first tool of its kind with enough training to be able to actually be useful, but this gap will be bridged by other organisations before too long.
There is also the question of copyright and ownership of code to be considered. GitHub grants ownership of any code generated by Copilot to the user, which does away with arguments of ownership with GitHub, but it is possible that the model could generate code identical to something that has already been written. What happens if your codebase includes Copilot-generated code which matches lines found in software licensed under copyleft licenses such as the GNU GPL (requiring derivative projects be similarly copyleft-licensed)? As a person with very little experience in law, I’m woefully under-equipped to answer these questions. Thankfully, this is already a topic under intense discussion by experts who know these topics like the back of their hand. I imagine we’ll see further interesting debate about this unfold as tools like Copilot become more commonplace.
For now, I’ll continue using Copilot on and off where it seems useful [2] and keep an eye on debate within the free software community as well as future advancements in this field. We’re in the middle of a really exciting time for these technologies, and it seems like things are just getting started – I can’t wait to see what comes next.
[1] Once again, it seems that Dijkstra has beaten all of us to the point with his On the foolishness of “natural language programming”, an excellent short piece on the importance of formal, well-defined languages in computing.
[2] I tried to write parts of this blog post using Copilot, but it didn’t turn out well – it turns out it’s not so good at generating prose, and I found that it got in my way more often than it helped me.