Introducing VishShield - Using AI to combat Voice Scams

Using modern-day advances in conversational AI to tackle the menace of voice phishing and scams.

Nov 04, 2025

Voice Phishing (Vishing)

Voice scams are on the rise and pose a serious threat today. According to one estimate, global losses from voice phishing (vishing) reached about US $800 million in 2024 [1].

There are now many ways to make a voice call: regular phone calls (PSTN / GSM / cellular network), VoIP (Voice over Internet Protocol), app-based internet calls (e.g., WhatsApp, Telegram, Signal, FaceTime, Messenger, Viber), web-based calls (WebRTC, browser-based), enterprise calling systems (e.g., Zoom, Microsoft Teams, Google Meet, Slack Huddles), and SIP-based calls (Session Initiation Protocol, used by many business telephony systems). With so many channels, and with AI now able to mimic real voices, vishing will only grow and become a major menace. Blocking a scammer's phone number no longer works. We need smarter, more effective defenses.


Backstory

I know two people, an ex-intern of mine and a relative, who have a rather unusual hobby: they love talking to scammers for hours. Just for fun, they play along without ever revealing any personal details. They keep the conversation going, circling around the topic, and making the scammer believe that a little more effort will get them the prize. It's strange, hilarious, and oddly fascinating to watch.

My ex-intern once convinced a scammer that there were flaws in his system and approach, and that if the scammer was willing to pay, he would help fix the gaps. The scammer agreed!

A while back, I had a wild idea: what if we could use conversational AI to do exactly what they do? Imagine an AI that can engage scammers endlessly, wasting their time and energy. If thousands of such AI agents did this at scale, scammers would burn through their time, effort, and resources, making it much harder for them to reach real victims. This is a strong deterrent, and it is what led to VishShield.

Requirements

For the solution to work, it must ensure:

1. The scammer should not be able to distinguish whether the victim is a real human or a voice agent.
2. The voice agent must never give out personal or confidential information.
3. Despite (2), the voice agent must keep the scammer highly engaged: it must keep both the immediate and the long-range context, and it must sound gullible enough for the scammer to genuinely believe that a little more effort will get them the confidential information.
4. Despite (2) and (3), the voice agent must make the conversation go in circles, beating around the bush for a long time, without ever giving out personal or confidential information.

Demo Calls

Below are two sample calls. In each, the victim (receiver) is played by VishShield.

For the first phase, we have this in English and Hindi.

Scammer - male, Victim - male, Language - English (audio recording, 8:38)
Scammer - female, Victim - male, Language - Hindi (audio recording, 12:03)

Solutioning

It is well known that most voice agents and chatbots built in industry are modeled as Goal Oriented Dialogue Systems (GODS) [2]. This is because they are designed to fulfill an end objective (the goal): the purpose of the bot is to carry out a conversation that achieves this objective.

In most industrial use cases, conversations follow a more or less fixed pattern in order to fulfill the goal. To automate the conversation (text or voice), the flow is represented by a flow chart (a directed graph). The nodes of the graph represent states, and the arrows show how these states are connected, modelling the flow. The states help the system remember what information it has already collected from the end user and what more it must collect in order to fulfill the goal. Read [2] for a deeper technical discussion on this.
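To make the role of state concrete, here is a minimal Python sketch of a tracker that records what has been collected and what is still missing. It is only an illustration: the class name, the slot names, and the booking-cancellation flow are hypothetical and are not taken from VishShield.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Tracks what a goal-oriented bot has collected so far (hypothetical names)."""
    required_slots: tuple
    filled: dict = field(default_factory=dict)

    def fill(self, slot, value):
        # Reject anything the flow does not ask for.
        if slot not in self.required_slots:
            raise KeyError(f"unknown slot: {slot}")
        self.filled[slot] = value

    def missing(self):
        # Slots still to be collected, in the order the flow asks for them.
        return [s for s in self.required_slots if s not in self.filled]

    def goal_reached(self):
        # The goal is fulfilled once nothing is missing.
        return not self.missing()


state = DialogueState(required_slots=("intent", "booking_reference"))
state.fill("intent", "cancel_booking")
print(state.missing())       # ['booking_reference']
state.fill("booking_reference", "ABC123")
print(state.goal_reached())  # True
```

The bot can consult `missing()` after every user turn to decide which question to ask next.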

Any guesses what makes VishShield slightly tricky? Think a little before reading further. In most flow graphs, one state is connected to another state via an arrow. This captures the fact that once the bot has executed the subtask in state 1, it should move on to the subtask in state 2. For example, an assistant on a travel portal might have a state such as "Understand the intent of the user": make a new booking, modify an existing booking, cancel an existing booking, and so on. To cancel an existing booking, the agent requires the booking reference number. The graph for this flow would have two nodes and one edge, as shown below:

Green nodes represent the positive path that must be executed

Here, green nodes represent the path that is OK to execute, and an arrow with no weight has the default probability of 1.0 (it must be executed).
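The two-node flow described above can be sketched as a small directed graph in Python. This is a minimal sketch under stated assumptions: the state names, the `FLOW` dictionary, and the `walk` helper are hypothetical and only illustrate the idea of unweighted edges carrying a default probability of 1.0.

```python
# Each edge is (target_state, probability). An edge drawn without a weight
# in the flow chart is stored with the default probability 1.0.
DEFAULT_PROBABILITY = 1.0

FLOW = {
    "understand_intent": [("ask_booking_reference", DEFAULT_PROBABILITY)],
    "ask_booking_reference": [],  # terminal state in this two-node example
}


def walk(flow, start):
    """Follow the single must-execute edge out of each state until none remain."""
    path = [start]
    current = start
    while flow[current]:
        target, probability = flow[current][0]
        if probability != DEFAULT_PROBABILITY:
            raise ValueError("this sketch only handles must-execute edges")
        path.append(target)
        current = target
    return path


print(walk(FLOW, "understand_intent"))
# ['understand_intent', 'ask_booking_reference']
```

Because every edge has probability 1.0, the traversal is fully deterministic: the same start state always yields the same path.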

Now, on to implementing VishShield. One of the hard constraints is "The system must never give PII/sensitive information". How should one implement this? The graph and the states in it tell the bot what it must do. Here, we need to tell the system what it must never do. This is where we had to find a workaround.

Secondly, in this case the edge out of node 1 should be redirected probabilistically, for example to node 3 instead of the red node. Why probabilistically? We do not want the system to give out confidential information, so it must circumvent the 'red' node and talk about something else. To sound natural, this 'something else' cannot be fixed (deterministic); it is best chosen probabilistically from a number of alternatives.

Circumventing the Red node (what must not be done)

Here, the red node represents the node that should never be executed. Instead, the flow goes to node 1 with probability 0.5, node 2 with probability 0.1, node 3 with probability 0.3, and node 4 with probability 0.2.
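One way to picture this redirection is the Python sketch below. It is an illustration under stated assumptions, not the VishShield implementation: the node names, `FORBIDDEN`, and `next_state` are hypothetical. The four values quoted above add up to 1.1 rather than 1.0, so the sketch treats them as relative weights, which `random.choices` accepts without requiring them to sum to 1.

```python
import random

# Relative weights for the alternatives to the forbidden (red) node, as quoted
# in the text. They are treated as relative weights, not exact probabilities.
REDIRECT_WEIGHTS = {"node_1": 0.5, "node_2": 0.1, "node_3": 0.3, "node_4": 0.2}
FORBIDDEN = {"red_node"}  # states that must never be executed (hypothetical name)


def next_state(proposed, rng=random):
    """Return the proposed state unless it is forbidden; then sample an alternative."""
    if proposed not in FORBIDDEN:
        return proposed
    targets = list(REDIRECT_WEIGHTS)
    weights = list(REDIRECT_WEIGHTS.values())
    return rng.choices(targets, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make the demo repeatable
samples = [next_state("red_node", rng) for _ in range(1000)]
print({node: samples.count(node) for node in REDIRECT_WEIGHTS})
```

Because the alternative is sampled on every turn, the agent does not dodge the sensitive question in the same way each time, which is the property the text asks for.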

We will soon be releasing more call recordings and a research paper on this.

Anuj Gupta & Rachit Jindal

References

[1] Rami Almatarneh, Mohammad Aljaidi, Ayoub Alsarhan, Sami Aziz Alshammari, and Nayef H. Alshammari, "The rising tide of social engineering: Trends, impacts, and multi-layered mitigation strategies," International Journal of Innovative Research and Scientific Studies, Innovative Research Publishing, vol. 8(3), pages 115-129, 2025.

[2] Sowmya Vajjala, Anuj Gupta, Bodhisattwa Majumder, and Harshit Surana, "ChatBots," Chapter 6 in Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems, O'Reilly Media, Inc., 2020.

A guest post by

Rachit Jindal
