The OpenAI Codex Challenge as Seen by the Participants

On the 12th of August, OpenAI hosted a hackathon for anyone interested in trying out Codex. Codex is a descendant of their GPT-3 language model that can translate plain-English commands into code.

We at Serokell thought it would be interesting to try it out: at the moment, the beta is open only to a small group of people. One of our teammates got access after being on the waiting list for over a year.

What was the format?  

The challenge consisted of five small tasks, the same for every participant, designed to test the system. To be fair, they were quite simple, perhaps because Codex can’t yet solve complex problems. For example, one task was to use pandas to calculate the number of days between two dates stored as strings. There was also a simple algorithmic task: restoring the original message from a binary tree.
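
For readers who haven’t done this in pandas, here is a minimal sketch of what a correct solution to that kind of task might look like. The exact task statement isn’t reproduced here, so the sketch assumes, hypothetically, a DataFrame whose `start` and `end` columns hold ISO-formatted date strings. `pd.to_datetime` converts the strings, and `.dt.days` turns the resulting time differences into whole days.

```python
import pandas as pd

# Hypothetical input: dates stored as plain strings, as often happens after reading a CSV
df = pd.DataFrame({
    "start": ["2021-08-12", "2021-01-01"],
    "end": ["2021-09-01", "2021-12-31"],
})

# Convert both string columns to datetime64 before doing any arithmetic
start = pd.to_datetime(df["start"])
end = pd.to_datetime(df["end"])

# Subtracting datetime Series gives a timedelta64 Series; .dt.days extracts whole days
df["days"] = (end - start).dt.days
print(df)
```

The conversion step is the one that matters: skipping it leaves pandas with two columns of text, which cannot be subtracted.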

Our main motivation was to see what Codex can do, how well it understands tasks, and to follow the logic of its decisions. Spoiler alert: not everything went as smoothly as in the OpenAI demo!

What were the problems?

The first problem was server lag: perhaps the company wasn’t ready for such a large number of participants (a couple of thousand). Because of that, we wasted a lot of time trying to reconnect. The leaderboard also followed an odd logic: solutions were ranked by the clock time at which they were completed, not by how long it took to solve the problem. So people who joined the challenge late were a priori low on the scoreboard.
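
To see why this ranking rule penalizes late starters, consider two hypothetical participants (the names and times below are invented for illustration, not taken from the real leaderboard): one starts at 10:00 and finishes in 60 minutes, the other starts at 10:30 and finishes in 40 minutes. Sorting by completion time rewards the early starter, while sorting by elapsed time rewards the faster solver.

```python
from datetime import datetime

# Hypothetical participants: (name, time they started, time they completed the tasks)
entries = [
    ("early_starter", datetime(2021, 8, 12, 10, 0), datetime(2021, 8, 12, 11, 0)),
    ("late_starter", datetime(2021, 8, 12, 10, 30), datetime(2021, 8, 12, 11, 10)),
]

# Ranking by completion timestamp, the rule described above
by_completion = [e[0] for e in sorted(entries, key=lambda e: e[2])]

# Ranking by time actually spent solving (completion minus start)
by_elapsed = [e[0] for e in sorted(entries, key=lambda e: e[2] - e[1])]

print(by_completion)  # ['early_starter', 'late_starter']
print(by_elapsed)     # ['late_starter', 'early_starter']
```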

To us, Codex didn’t seem to be a very smart coder. First of all, it made quite a few syntax mistakes: it could easily forget a closing bracket or insert extra colons, which made the code invalid. Catching these errors really takes time and effort!
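
One cheap safeguard, which was not part of the challenge itself, is to parse generated Python with the standard library’s `ast` module before running it: `ast.parse` raises `SyntaxError` for problems such as a missing closing bracket without executing anything. This is a minimal sketch; the helper name `syntax_error_in` and the sample snippets are hypothetical, and the exact error message varies between Python versions.

```python
import ast
from typing import Optional

def syntax_error_in(source: str) -> Optional[str]:
    """Return 'line N: message' for the first syntax error in `source`, or None if it parses."""
    try:
        ast.parse(source)  # only parses; nothing in `source` is executed
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None

# Hypothetical generated snippets: the first one is missing a closing bracket
snippets = {
    "missing bracket": "total = sum([1, 2, 3]\nprint(total)\n",
    "valid": "total = sum([1, 2, 3])\nprint(total)\n",
}

for name, code in snippets.items():
    print(f"{name}: {syntax_error_in(code) or 'OK'}")
```

This only catches syntax errors; code that parses can still be wrong, as the data-type issues show.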

Secondly, Codex doesn’t seem to know how to work with data types. As a programmer, you have to be very careful, or the model will mess things up.

For instance, in the task mentioned above, which simply required counting the days between two dates, Codex got the sequence of steps wrong: it forgot to convert the string to a date and tried to do date arithmetic on the raw string.
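
A minimal sketch of this failure mode and its fix, assuming ISO-formatted date strings (the helper `days_between` and the sample dates are hypothetical). It uses the standard library’s `datetime.date` rather than pandas to stay self-contained; in pandas, the equivalent conversion step is `pd.to_datetime`.

```python
from datetime import date

def days_between(start: str, end: str) -> int:
    """Hypothetical helper: whole days from `start` to `end`, both 'YYYY-MM-DD' strings."""
    # Parse first: strings have no notion of dates, so arithmetic must come after conversion
    return (date.fromisoformat(end) - date.fromisoformat(start)).days

start, end = "2021-08-12", "2021-09-01"

# The kind of mistake described above: operating on the raw strings
try:
    end - start
except TypeError as err:
    print("TypeError:", err)  # str does not support subtraction

print(days_between(start, end))  # 20
```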

Finally, the solutions that Codex proposes are not optimal. A huge part of being a good programmer is understanding the task, breaking it down into manageable pieces, and implementing the most efficient solution in terms of execution time and memory. Codex does come up with solutions, but they’re far from the most efficient ones. For example, when working with the tree, it wrote a while loop instead of a for loop and added extra conditions that weren’t in the original task. Writing a while loop where a for loop would do is generally considered a no-no.
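
The tree task’s exact statement isn’t given here, so the sketch below assumes, purely for illustration, a Huffman-style task: walk a binary tree bit by bit and emit a character at each leaf. The `Node` class, both functions, and the sample code are hypothetical and are not Codex’s actual output. Both versions return the same result; the difference is that the while loop needs an index that is initialised, tested, and advanced by hand, which is exactly the bookkeeping a for loop removes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    char: Optional[str] = None       # set only on leaves
    left: Optional["Node"] = None    # followed on bit '0'
    right: Optional["Node"] = None   # followed on bit '1'

def decode_while(root: Node, bits: str) -> str:
    # while-loop version: the index i is managed by hand
    out, node, i = [], root, 0
    while i < len(bits):
        node = node.left if bits[i] == "0" else node.right
        if node.char is not None:
            out.append(node.char)
            node = root
        i += 1
    return "".join(out)

def decode_for(root: Node, bits: str) -> str:
    # for-loop version: iteration over the bits is handled by the language
    out, node = [], root
    for bit in bits:
        node = node.left if bit == "0" else node.right
        if node.char is not None:  # reached a leaf: emit its symbol, restart at the root
            out.append(node.char)
            node = root
    return "".join(out)

# Hypothetical code: 'a' = 0, 'b' = 10, 'c' = 11
root = Node(left=Node("a"), right=Node(left=Node("b"), right=Node("c")))
print(decode_for(root, "010110"))  # abca
```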

Conclusion

All that said, Codex can’t be used as a no-code alternative to real programming, and it’s unclear whom OpenAI is targeting with this tool. Non-programmers can’t use it, for the reasons mentioned above, and programmers would rather write code from scratch than sit and fix brackets in Codex’s output.

Earlier it was announced that Codex would power Copilot, a project developed together with GitHub. But, firstly, Codex as we used it doesn’t work as an autocomplete tool like the one in PyCharm. Secondly, most of our team doesn’t write code in GitHub and uses it simply for project management. So it’s unclear what OpenAI is going to do with Codex.

Anyhow, it’s an interesting initiative with the potential to improve greatly as more people use it. Perhaps in the future it will become a truly user-friendly alternative to no-code solutions for non-programmers.

Source: Prolead brokers usa