Here’s why GPT-4 outperforms GPT3.5, LLMs in code debugging

The rise in artificial intelligence (AI) popularity has likely led many to wonder if this is just the next tech craze that will be over in six months.

However, a recent benchmarking test conducted by CatId revealed just how far GPT-4 has come — suggesting that it could be a game-changer for the web3 ecosystem.

AI code debugging test

The data below shows results for several available open-source large language models (LLMs) alongside OpenAI’s GPT-3.5 and GPT-4. CatId ran the same sample of C++ code through each model and recorded the number of false alarms raised on correct code and the number of bugs identified.

LLaMa 65B (4-bit GPTQ) model: 1 false alarm in 15 good examples. Detects 0 of 13 bugs.
Baize 30B (8-bit) model: 0 false alarms in 15 good examples. Detects 1 of 13 bugs.
Galpaca 30B (8-bit) model: 0 false alarms in 15 good examples. Detects 1 of 13 bugs.
Koala 13B (8-bit) model: 0 false alarms in 15 good examples. Detects 0 of 13 bugs.
Vicuna 13B (8-bit) model: 2 false alarms in 15 good examples. Detects 1 of 13 bugs.
Vicuna 7B (FP16) model: 1 false alarm in 15 good examples. Detects 0 of 13 bugs.

GPT-3.5: 0 false alarms in 15 good examples. Detects 7 of 13 bugs.
GPT-4: 0 false alarms in 15 good examples. Detects 13 of 13 bugs.

Across the six open-source LLMs, only 3 bug detections were recorded in total (out of 78 opportunities, at 13 bugs per model), alongside four false alarms. Meanwhile, GPT-3.5 caught 7 of the 13 bugs, and OpenAI’s latest offering, GPT-4, detected all 13 with no false alarms.
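The totals quoted above can be checked with a few lines of arithmetic. The sketch below simply re-enters the figures from the list; the variable and function names are illustrative and are not part of CatId's test.

```python
# Figures copied from the results list: (false alarms, bugs detected).
# Every model was shown 15 good examples and 13 buggy ones.
BUGS_PER_MODEL = 13

results = {
    "LLaMa 65B (4-bit GPTQ)": (1, 0),
    "Baize 30B (8-bit)": (0, 1),
    "Galpaca 30B (8-bit)": (0, 1),
    "Koala 13B (8-bit)": (0, 0),
    "Vicuna 13B (8-bit)": (2, 1),
    "Vicuna 7B (FP16)": (1, 0),
    "GPT-3.5": (0, 7),
    "GPT-4": (0, 13),
}

# Everything that is not an OpenAI model counts as open source here.
open_source = [name for name in results if not name.startswith("GPT")]


def detection_rate(name):
    """Share of the 13 bugs that a given model detected."""
    return results[name][1] / BUGS_PER_MODEL


open_false_alarms = sum(results[name][0] for name in open_source)
open_detected = sum(results[name][1] for name in open_source)
open_chances = len(open_source) * BUGS_PER_MODEL

print(f"Open-source models: {open_detected} of {open_chances} detections, "
      f"{open_false_alarms} false alarms")
for name in ("GPT-3.5", "GPT-4"):
    print(f"{name}: {detection_rate(name):.0%} of bugs detected")
```

Taken together, the six open-source models found 3 bugs in 78 attempts, roughly 4%, compared with about 54% for GPT-3.5 and 100% for GPT-4.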

The leap forward in bug detection could be game-changing for smart contract deployment in web3, in addition to the many web2 sectors that stand to benefit. Web3 connects digital activity and property with financial instruments, earning it the moniker ‘the Internet of Value.’ It is therefore vitally important that the code executed by the smart contracts powering web3 is free from bugs and vulnerabilities. A single point of entry for a bad actor can lead to billions of dollars being lost in moments.

GPT-4 and AutoGPT

The impressive results from GPT-4 demonstrate that the current hype is warranted. Furthermore, the ability of AI to aid in ensuring the security and stability of the evolving web3 ecosystem is within reach.

Applications such as AutoGPT have sprung up, allowing GPT-4 to create other AI agents and delegate work tasks to them. AutoGPT also uses Pinecone for vector indexing to gain access to both long- and short-term memory storage, helping it work around GPT-4’s token limitations. Several times last week, the app trended globally on Twitter as people around the world spun up their own armies of AI agents.
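The idea behind vector-indexed memory can be illustrated with a toy example. The sketch below is not Pinecone's API or AutoGPT's code: it is a hypothetical in-memory store, with hand-written three-number vectors standing in for real embeddings, that recalls the stored note closest to a query by cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorMemory:
    """Hypothetical stand-in for a vector index; keeps everything in a list."""

    def __init__(self):
        self._items = []  # (vector, text) pairs

    def add(self, vector, text):
        self._items.append((list(vector), text))

    def query(self, vector, top_k=1):
        # Rank stored notes by similarity to the query, most similar first.
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:top_k]]


memory = ToyVectorMemory()
# Made-up vectors; a real system would get these from an embedding model.
memory.add([1.0, 0.0, 0.0], "Audit notes for the token contract")
memory.add([0.0, 1.0, 0.0], "Deployment checklist")
memory.add([0.0, 0.0, 1.0], "Meeting minutes")

recalled = memory.query([0.9, 0.1, 0.0], top_k=1)
print(recalled)
```

Because only the most relevant notes are pulled back into the prompt, an agent can keep far more history than would fit in a single request.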

Using AutoGPT as a model, it may be possible to develop a similar or forked application that continuously monitors upgradeable smart contracts, detects bugs, and suggests fixes to their code. These edits could be manually approved by developers or even by a DAO, ensuring that there is a ‘human in the loop’ to authorize code deployment.

A similar workflow could also be created for deploying smart contracts through bug review and simulated transactions.
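The ‘human in the loop’ step described above could be sketched roughly as follows. Every name here (ProposedFix, approve, try_deploy, the reviewers, and the contract) is hypothetical, and the "deployment" is only a flag being set; no real smart contract tooling or DAO voting mechanism is involved.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedFix:
    """An AI-suggested change awaiting human sign-off (hypothetical)."""
    contract: str
    description: str
    approvals: set = field(default_factory=set)
    deployed: bool = False


def approve(fix, reviewer):
    # A set ignores duplicates, so one reviewer cannot approve twice.
    fix.approvals.add(reviewer)


def try_deploy(fix, required_approvals):
    """Mark the fix as deployed only once enough distinct humans agree."""
    if len(fix.approvals) >= required_approvals:
        fix.deployed = True
    return fix.deployed


fix = ProposedFix("Vault", "Add a reentrancy guard to withdraw()")

blocked = try_deploy(fix, required_approvals=2)   # no approvals yet
approve(fix, "alice")
approve(fix, "alice")                             # duplicate, ignored
still_blocked = try_deploy(fix, required_approvals=2)
approve(fix, "bob")
released = try_deploy(fix, required_approvals=2)

print(blocked, still_blocked, released)
```

The approval threshold is the design choice that matters: set to one, it models a single developer's review; set higher, it approximates a vote among DAO members.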

Reality check?

However, technical limitations would need to be resolved before AI-managed smart contracts can be deployed to production environments. CatId’s results also show that the test’s scope is limited: it focuses on a short piece of code, a setting in which GPT-4 excels.

In the real world, applications contain multiple files of complex code with countless dependencies, which would quickly exceed GPT-4’s token limits. Unfortunately, this means that GPT-4’s performance in practical situations may not be as impressive as the test suggests.
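A back-of-the-envelope check shows how quickly a multi-file project outgrows a single prompt. The sketch below rests on two assumptions: a rough heuristic of four characters per token (real tokenizers vary, especially on code) and a context window of 8,192 tokens. The file names and sizes are invented for illustration.

```python
CHARS_PER_TOKEN = 4    # assumption: rough average, not a real tokenizer
CONTEXT_TOKENS = 8192  # assumption: size of the model's context window


def estimate_tokens(text):
    """Approximate token count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(files, limit=CONTEXT_TOKENS):
    """Return (estimated total tokens, whether they fit in one prompt)."""
    total = sum(estimate_tokens(source) for source in files.values())
    return total, total <= limit


# Invented examples: one short contract versus a ten-file project.
small_project = {"token.sol": "x" * 2000}
large_project = {f"module_{i}.sol": "x" * 6000 for i in range(10)}

small_total, small_fits = fits_in_context(small_project)
large_total, large_fits = fits_in_context(large_project)

print(small_total, small_fits)
print(large_total, large_fits)
```

Under these assumptions, a single 2,000-character file needs about 500 tokens, while ten 6,000-character files need about 15,000, well beyond what one request can hold.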

Yet the question is arguably no longer whether a flawless AI code writer and debugger is feasible, but what ethical, regulatory, and agency concerns arise from one. Furthermore, applications like AutoGPT are already reasonably close to being able to autonomously manage a codebase through the use of vector storage and additional AI agents. The limitations lie mainly in the robustness and scalability of the application, which can get stuck in loops.

The game is changing

GPT-4 has been out for only a month, and already there is an abundance of new public AI projects, such as AutoGPT and Elon Musk’s X.AI, reshaping the conversation about the future of tech.

The crypto industry seems primed to leverage the power of models like GPT-4, as smart contracts offer an ideal use case for creating genuinely autonomous and decentralized financial products.

How long will it take to see the first truly autonomous DAO with no humans in the loop?

The post Here’s why GPT-4 outperforms GPT3.5, LLMs in code debugging appeared first on CryptoSlate.

Source: https://cryptoslate.com/heres-why-gpt-4-outperforms-gpt3-5-llms-in-code-debugging/