Can AI Chatbots Infer User Information Based On The Content Of Conversations?

In the digital age, we interact with AI more and more frequently, and it is applied in many fields, from smart homes to mobile phone applications. Among these applications, chatbots have been welcomed by the public for their convenience and efficiency, and have become a focus of attention.

Whether it is OpenAI's ChatGPT or Google's Bard, these chatbots are trained on vast amounts of data from the Internet and draw on that training when they interact with users.

However, have you ever considered that while you are happily and easily communicating with a chatbot, it may be quietly spying on your secrets?

How do AI chatbots infer user information?

A recent study by computer scientists at ETH Zurich found that a chatbot can infer personal information about a user, such as where they live, their race, and their gender, based on the content of their conversations. Although the study has not been peer-reviewed, it raises new concerns about privacy on the Internet.

The ability of large language models (LLMs) to make such judgments comes from the complexity of their training data and algorithms. The models are trained on a large amount of publicly available data from the Internet, including text, pictures, audio, and more. During training, they learn to extract key features from the data and to classify and predict new text based on those features.

The research team, led by Martin Vechev at ETH Zurich, used text from Reddit posts to test whether LLMs could accurately infer where users lived or where they came from. The team found that these models have an unnerving ability to guess accurate information about users from contextual or linguistic cues alone. GPT-4, the model at the heart of the paid version of OpenAI's ChatGPT, was surprisingly accurate, correctly predicting a user's private information 85 to 95 percent of the time.

For example, when a user tells a chatbot "I am always waiting for a hook turn at an intersection", the chatbot could use this to infer where the user lives, because the hook turn is a traffic maneuver characteristic of Melbourne. Similarly, if a user mentions in a conversation that they live near a particular restaurant in New York City, the chatbot can analyze the demographics of the area and infer that the user is most likely Black.
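The hook turn example can be made concrete with a toy sketch. It is only an illustration of the idea of a linguistic cue pointing to a location: an LLM picks up such associations statistically from its training data and does not consult a hand-written table. The names REGIONAL_CUES and guess_locations are hypothetical, and the single entry is taken from the example above.

```python
# Toy illustration only: a hand-written table of regional cue phrases.
# A real LLM learns associations like this statistically from training
# data; it does not use an explicit lookup table.
REGIONAL_CUES = {
    "hook turn": "Melbourne",  # cue taken from the article's example
}


def guess_locations(message: str) -> list[str]:
    """Return the locations whose cue phrases appear in the message."""
    text = message.lower()
    return [place for cue, place in REGIONAL_CUES.items() if cue in text]


if __name__ == "__main__":
    print(guess_locations("I am always waiting for a hook turn at an intersection"))
```

Even this crude lookup shows why an offhand remark can be revealing: the user never states a city, yet the phrase alone narrows down where they probably live.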

However, this inference is not always accurate, as each user's language and behavior are unique. But it is enough to show that AI models, trained on big data, can already act like detectives, deducing key information from some seemingly insignificant clues.

Although many experts advise social media users to pay attention to information security and not to share too much identifying information online, ordinary users are often unaware that their casual, everyday remarks and actions may reveal private information.

What should be done?

While AI chatbots bring us convenience, they also make privacy protection more complicated. We need to respond on multiple levels:

First, developers should prioritize privacy protection and fully consider user privacy rights when designing and developing chatbots. For example, they can limit the scope of collection and use of user data, adopt encryption and anonymization techniques to protect that data, and introduce privacy-protecting algorithms that limit chatbots' ability to infer user information.
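As a minimal sketch of what one anonymization step might look like, the snippet below masks email addresses and phone numbers in a message before it is stored. The function name redact, the placeholder tokens, and the two regular expressions are assumptions made for illustration; the patterns are deliberately simple and will miss many real-world formats.

```python
import re

# Deliberately simple, illustrative patterns (assumptions, not a standard):
# they catch common email and phone formats and will miss many others.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(message: str) -> str:
    """Replace explicit identifiers with placeholder tokens."""
    message = EMAIL.sub("[EMAIL]", message)
    message = PHONE.sub("[PHONE]", message)
    return message


if __name__ == "__main__":
    print(redact("Mail me at jane.doe@example.com or call +1 (555) 123-4567."))
```

Masking of this kind only removes explicit identifiers. It does nothing about contextual cues such as the hook turn remark, which is exactly the kind of inference the ETH Zurich study is concerned with.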

Second, governments and regulators should strengthen their supervision of chatbot privacy policies. They should ensure that companies comply with relevant laws and regulations when collecting, using and sharing user data, and that companies provide users with a transparent, understandable and accessible privacy policy.

Finally, as users, we need to raise our awareness of privacy protection. When using a chatbot, be careful not to reveal too much personal information in the conversation.

The development of AI chatbots has brought us convenience and fun, but it has also brought new privacy threats. If all parties work together, both technically and ethically, we can maximise the benefits of AI while minimising its potential risks.

Perhaps after this "shock", we will approach this era of AI, with all its changes and opportunities, more cautiously, so that science and technology can truly serve humanity.