DeepSeek, the AI model that surprised the world with its efficiency, is now under scrutiny for storing unprotected data in publicly accessible databases. The data leak raises questions about all the AI agents that quickly adopted the language model.
Just days after its surge in popularity, DeepSeek was found to be leaking data logs into a publicly accessible, unprotected ClickHouse database. The exposed data contained chat logs and sensitive user information, and could have exposed accounts and passwords. ClickHouse, an open-source analytics database originally developed at Russian IT company Yandex, was storing the logs without any authentication in place.
The leaked chat data could contain passwords and local files, though the researchers at cybersecurity firm Wiz did not run queries against potentially sensitive information. It is possible the database also held API secret keys. In total, the researchers found more than a million lines of log data after tracking all the connections opened during a DeepSeek session.
From this point – validating the exposure was pretty straight forward, looking at ClickHouse API – we were able to access the HTTP API which allows straight forward querying the MySQL database
There was a significant exposure of data, especially from the log_stream table, over… pic.twitter.com/NhS2gyfBpJ
— Nagli (@galnagli) January 29, 2025
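ClickHouse's standard HTTP interface listens on port 8123 by default and executes SQL passed in a query parameter, which is how an unauthenticated instance can be probed. The sketch below illustrates that kind of validation in Python; the host name is a placeholder rather than the actual exposed address, and the log_stream table name is taken from the post above.

```python
import requests

# Hypothetical host standing in for an exposed ClickHouse instance;
# the real addresses from the Wiz report are not reproduced here.
CLICKHOUSE_URL = "http://example-exposed-host:8123/"

def run_query(sql: str) -> str:
    """Send a SQL statement to ClickHouse's HTTP interface.

    An unauthenticated instance will execute it and return rows;
    a properly secured one should answer with an authentication error.
    """
    response = requests.get(CLICKHOUSE_URL, params={"query": sql}, timeout=10)
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    # Enumerate what is visible first...
    print(run_query("SHOW DATABASES"))
    # ...then sample a table such as log_stream, the one named in the post above.
    print(run_query("SELECT * FROM log_stream LIMIT 5"))
```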
The leak was discovered while mapping DeepSeek's main chat interface, which turned up around 30 subdomains, most of them handling harmless technical tasks for the AI tool. Since the vulnerability testing was completed, the most sensitive databases have been secured and are no longer accessible.
The researchers contacted DeepSeek’s team after the testing, and none of the discoveries were made public before the security hole was fixed. However, hours before news of the database leak broke, an anonymous X user had already pointed to an issue with unsecured databases, a claim dismissed at the time as bait or a scam attempt.
At this point, it is unknown whether other threat actors found anything of value in the data. The gathering of AI chat data also underscores the potential loss of privacy when using the tool. There are multiple ways to engage with DeepSeek, from its official site and app to local hosting or any other wrapper built around the LLM and its reasoning model.
Will the DeepSeek leak affect AI agents?
Language models can be used with more privacy by running them locally, and DeepSeek is well-suited to that. Each new AI agent personality has its own tools for wrapping the language model and presenting it to users.
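As an illustration, a model served locally through an OpenAI-compatible endpoint can be queried without the prompt ever leaving the machine. The sketch below assumes a typical local runner (for example, Ollama exposing its default port 11434) and a distilled DeepSeek-R1 model tag; the URL and model name are assumptions about one common setup, not the only way to do it.

```python
import requests

# Assumed local OpenAI-compatible endpoint (Ollama's default port);
# adjust the URL and model tag to match your own setup.
LOCAL_API = "http://localhost:11434/v1/chat/completions"
MODEL = "deepseek-r1:7b"  # a distilled DeepSeek-R1 variant, as an example

def ask_locally(prompt: str) -> str:
    """Send a chat completion request to a model running on this machine.

    Neither the prompt nor the response crosses the network boundary,
    which is the privacy advantage of local hosting discussed above.
    """
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    response = requests.post(LOCAL_API, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_locally("Summarize why local hosting limits data exposure."))
```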
Venice.AI, one of the most prominent AI agents, claims to offer maximum confidentiality. However, users have discovered that Venice.AI also sends plain-text data to its central servers, though at least without relying on additional public-facing tools.
Contrary to their claim there is no Private Ai in https://t.co/bR5IfvDdvI
All Inference requests go to their Central Server in plain text, zero encryption or privacy.
All the received buffers are in plain text as well.
Verify it yourself by checking the Network Tab in chrome. pic.twitter.com/WDRxog6E5O
— Smit (@0xSmit) January 27, 2025
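Checking the browser's Network tab, as the post suggests, is the quickest way to see what a front end transmits. For a more systematic check, an intercepting proxy can log every outbound request body, making it obvious whether prompt text leaves the machine in readable form. The sketch below is one way to do that with mitmproxy, assuming it is installed and the browser routes its traffic through it; the watched domains are illustrative placeholders.

```python
"""A minimal mitmproxy addon that prints outbound request bodies.

Run with:  mitmdump -s log_prompts.py
then point the browser at the proxy to see what an AI front end transmits.
"""
from mitmproxy import http

# Illustrative filter; replace with the domains of the service being inspected.
WATCHED_SUBSTRINGS = ("venice", "deepseek")

def request(flow: http.HTTPFlow) -> None:
    if any(s in flow.request.pretty_host for s in WATCHED_SUBSTRINGS):
        body = flow.request.get_text(strict=False) or ""
        print(f"\n--> {flow.request.method} {flow.request.pretty_url}")
        # If the prompt shows up here verbatim, the service receives it in
        # readable form rather than with any end-to-end encryption.
        print(body[:500])
```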
Venice.AI's approach is still relatively more confidential than the exposure revealed by the DeepSeek data leak.
Additionally, Venice.AI reportedly answers some queries without the censorship usually imposed on the centralized DeepSeek site. Although it is still in its early stages, it intends to become a hub for building additional AI agents by providing the language model and resources.
The rush to create more agents using DeepSeek may be a vector that spreads other unknown risks from the language model and reasoning engine. Almost hourly, new agents are announced, claiming to use DeepSeek’s capabilities for better content at a lower cost.
DeepSeek clashes with ‘Made in USA’ crypto ethos
Building and tokenizing AI agents on top of DeepSeek is seen as inherently risky because it relies on a relatively new and untested language model. DeepSeek became the most downloaded app over the past few days, but the crypto community has called for caution when using this LLM to build products.
The most extreme view sees the DeepSeek model as inherently risky, even when used as a self-hosted LLM.
Self hosting deepseek or using it on US servers like @perplexity will not fully protect you…
Eliminating their moderation checks (censorship) that happen after generation, or elimating data collection (by self hosting it on US servers) does not protect you from other LLM… pic.twitter.com/dxSdSQTSAA
— Ryan !! (@ryan_trat) January 30, 2025
Tokens linked to AI agents are still considered risky and closer to meme hype than to serious assets, and are currently not counted as part of the ‘Made in USA’ crypto trend.
On top of that, using DeepSeek may disqualify projects and raise skepticism about their data-gathering practices and potential for carrying malware. Agents built with DeepSeek may also post flawed information or behave erratically.
An X user posted:
“Agents built with DeepSeek are the perfect chaos agents: loops into infinity, feeds you junk data, and takes you (and their X account) down instead.”
Other crypto supporters have warned against engaging with DeepSeek at this stage, especially through the official app. Even days before the data leak, cybersecurity analysts warned of possible spying features, as DeepSeek is linked to the Chinese Communist Party through its founder, Liang Wenfeng.
Following the US ban on TikTok over data-collection fears, DeepSeek's rapid adoption has been viewed as a similar threat to data security.
Source: https://www.cryptopolitan.com/deepseek-springs-leak-ai-agent-chats-exposed/