

5 AI Tools That Can Generate Code To Help Programmers

This article is more than 2 years old.

One of the most recent advancements in natural language processing (NLP) is the emergence of large language models (LLMs), which are trained on vast datasets. Several LLMs are available, such as Google's BERT and OpenAI's GPT-2 and GPT-3. With these models, it is possible to generate everything from simple essays to actual financial models.

AI startups, including OpenAI, Hugging Face, Cohere, and AI21 Labs, are pushing the boundaries of LLMs by training models with billions of parameters.

Here are five AI-based code generators built on large language models that can produce high-quality code:

1. OpenAI Codex

OpenAI Codex is a GPT-3-based model that powers GitHub Copilot, a tool from GitHub that generates code within mainstream development environments, including VS Code, Neovim, JetBrains IDEs, and even the cloud via GitHub Codespaces. It claims to write code in at least a dozen languages, including JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, and even Bash. The model is trained on billions of lines of publicly available code, such as code in GitHub repositories.

OpenAI made the model available through a private beta to developers and platform companies to build tools and integrations.
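To make the workflow concrete, here is a hand-written illustration of the comment-to-code pattern that Codex-powered tools such as GitHub Copilot follow. The completion below shows the kind of output such a tool typically produces when a developer supplies only a comment and a function signature; it is not actual model output.

```python
# A developer types only this much:
#     # Return the n-th Fibonacci number (0-indexed).
#     def fib(n):
#
# A Codex-style completion then fills in the body, e.g.:
def fib(n: int) -> int:
    """Return the n-th Fibonacci number (0-indexed)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(10))  # 55
```

The developer's role shifts from typing the implementation to reviewing and accepting (or rejecting) the suggested body.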

2. Tabnine

While Tabnine is not an end-to-end code generator, it puts the auto-completion feature of the integrated development environment (IDE) on steroids. Developed in Rust by Jacob Jackson when he was a student at the University of Waterloo, Tabnine has evolved into a fully-fledged, AI-based code completion tool.

Tabnine supports over 20 languages and 15 editors, including popular IDEs like VS Code, IntelliJ, Android Studio, and even Vim. It is priced at $432 per year for a team of three developers.

3. CodeT5

CodeT5 is an open source programming language model built by researchers at Salesforce. It is based on Google's T5 (Text-to-Text Transfer Transformer) framework. To train CodeT5, the team sourced over 8.35 million instances of code, including user comments, from publicly accessible GitHub repositories. Most of this data was derived from the CodeSearchNet dataset, which spans Ruby, JavaScript, Go, Python, Java, and PHP, in addition to two C and C# datasets collected from BigQuery.

CodeT5 can potentially bring three capabilities to software programming:

  • Text-to-code generation: generate code from a natural language description
  • Code autocompletion: complete a whole function given only the target function name
  • Code summarization: generate a natural-language summary of a function
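The second capability is the most striking: given nothing but a function signature, the model is expected to emit a complete, working body. The sketch below is hand-written to show what that input and output look like in practice; it is an illustration of the task shape, not actual CodeT5 output.

```python
# Input the model sees (only the signature):
#     def binary_search(arr, target):
#
# A whole-function completion a CodeT5-style model is trained to
# produce might look like this:
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))  # 3
```

The function name alone carries enough signal ("binary", "search") for the model to infer both the algorithm and the expected argument semantics.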

4. PolyCoder

PolyCoder is an open source alternative to OpenAI's Codex. Developed by researchers at Carnegie Mellon University, the model is based on OpenAI's GPT-2 architecture and is trained on a 249 GB codebase written in 12 programming languages. According to PolyCoder's authors, the model is capable of writing C with greater accuracy than any other model, including Codex.

While most code generators are closed source, PolyCoder is one of the first open source code generation models.

5. Cogram

Cogram, a Y Combinator-backed startup based in Berlin, is a code generation tool aimed at data scientists and Python programmers working with SQL queries and Jupyter Notebooks. Data scientists can write queries in plain English that the tool translates into complex SQL queries with joins and grouping. It supports SQLite, PostgreSQL, MySQL, and Amazon Redshift.

Python and Julia developers can integrate Cogram with Jupyter Notebooks to auto-generate code. The tool can generate contextual code for a specific task based on the comments. Data scientists can even generate visualizations based on mainstream Python modules such as Matplotlib, Plotly, or Seaborn.
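A minimal sketch of the English-to-SQL workflow such a tool automates, runnable against an in-memory SQLite database. The SQL string is hand-written here to show the kind of query (with a join and grouping) a Cogram-style translator emits for the English prompt; Cogram's actual output and API are not shown.

```python
import sqlite3

# English prompt a data scientist might type:
#     "total revenue per customer, highest first"
#
# Hand-written example of the SQL a Cogram-style tool would generate:
SQL = """
SELECT c.name, SUM(o.amount) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY revenue DESC;
"""

# Toy schema and data to run the generated query against.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (1, 100.0), (1, 50.0), (2, 120.0);
""")
rows = conn.execute(SQL).fetchall()
print(rows)  # [('Ada', 150.0), ('Grace', 120.0)]
```

The value of the tool is that the analyst reviews a query like this instead of writing the join and GROUP BY clauses by hand.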
