MINERVA: A Multi-Agent LLM System for Digital Scam Protection
Digital scams have a devastating impact on society. According to the FBI IC3, $37.4 billion was lost in the United States alone over the past five years due to Internet scams [1]. Beyond these direct financial losses, the hidden costs of processing nearly 4 million associated complaints have overwhelmed institutional and enterprise resources [1]. Victims also face psychological disruptions and diminished trust in emerging technologies.
To address this challenge, we present MINERVA, a multi-agent system powered by a Large Language Model (gpt-4o-mini) for protecting users from digital scams. Our system implements a cooperative team of seven specialized agents built on the AutoGen framework (v0.4.0) [2]. Each agent combines advanced model capabilities with specific tools to handle distinct aspects of scam detection: optical character recognition, link verification, content analysis, decision making, summarization, linguistic translation, and archiving.
To evaluate our system's performance, we developed a hierarchical scam taxonomy as the foundation for creating a diverse synthetic dataset. Our multi-agent system achieves higher accuracy compared to a prompt baseline (88.3% vs. 69.5%) while providing more comprehensive functionality. Beyond improving scam detection accuracy and reducing administrative overhead, MINERVA aims to deliver broader societal benefits in digital literacy, trust in AI-powered safety tools, and insights for law enforcement and cybersecurity professionals.
[This work was developed for RDI Berkeley, LLM-Agents Course, CS294/194-196. By Diego Carpintero]
Introduction
We define an Agent as a system that leverages a language model to interpret natural language instructions, perform tasks, interact with its environment, and cooperate with other agents (and a human) toward reaching a given goal.
Recent advancements in AI have enabled the development of sophisticated agents capable of reasoning and tool usage, with multi-agent collaboration emerging as a promising paradigm for complex tasks. Our intuition is that scam detection represents an ideal application domain for such multi-agent systems due to its inherent complexity and need for diverse analytical perspectives. Digital scams typically combine multiple deceptive elements — from social engineering tactics to technical manipulation — requiring different types of analysis. A multi-agent approach allows specialized agents to focus on distinct aspects while cooperating through structured dialogue to form comprehensive assessments.
System Architecture
Minerva: Multi-Agent LLM System Architecture
Our architecture consists of seven specialized agents coordinated by a group manager that orchestrates interactions in a round-robin fashion. Operating on a publish-subscribe pattern, agents share a common message thread where they publish their findings, from OCR analysis and link verification to content examination, decision making, summarization, linguistic translation, and archiving. The system processes each multimodal message using Python methods, APIs, and database tools, with users submitting the input as a screenshot of a digital communication.
Agents
Our specialized agents are built on AutoGen v0.4.0:
- OCR Agent: Extracts text from an image using pytesseract or the LLM's vision capabilities.
- Link Checker: Verifies the legitimacy of URLs using the Google SafeBrowsing API.
- Content Analyst: Analyzes the extracted text for scam patterns.
- Decision Maker: Synthesizes the analyses and makes the final determination.
- Summarizer: Generates a summary of the final determination.
- Language Translation Specialist: Translates the summary into the user's language.
- Data Storage Agent: Stores the extracted text, assessment summary, determination (1, 0), and confidence level (1-5) in a SQLite database.
The system prompts are defined in our project repository under ./config/agents.yaml [3].
In this example, we show how to create the Link Checker, Language Translation, and Data Storage agents:
link_checker_agent:
  assignment: >
    You are an Internet link checker. Your role is to:
    1. Check the extracted text for any URLs
    2. Verify the legitimacy of the URLs using your registered function
language_translation_agent:
  assignment: >
    You are a language translation specialist. Your role is to:
    1. Infer the language of the text extracted from the image, this is the user language
    2. If the user language is different than English, translate your summary into the user language
data_storage_agent:
  assignment: >
    You are a data storage specialist.
    Your role is to store using the registered function the following fields:
    - extracted text
    - summary
    - final determination ('1' for scam, '0' for not_scam)
    - confidence level on your final determination in a scale from 1-5 [...]
    All the fields are mandatory
agents = []

# Tool-using agent: verifies URLs through the registered URL checker tool
agents.append(ToolUseAssistantAgent(
    name="Link_Checker",
    description="Checks if a Link is safe",
    system_message=self.config['link_checker_agent']['assignment'],
    model_client=self.model,
    registered_tools=[url_checker_tool]
))

# Tool-using agent: persists the results through the registered database tool
agents.append(ToolUseAssistantAgent(
    name="DataStorage_Agent",
    description="Store the extracted text, summary, determination (1, 0) and confidence level (1-5) in a database",
    system_message=self.config['data_storage_agent']['assignment'],
    model_client=self.model,
    registered_tools=[db_connector_tool]
))

# Plain assistant agent: relies only on the LLM, no tools registered
agents.append(AssistantAgent(
    name="Language_Translation_Agent",
    description="Translate the summary to the user language",
    system_message=self.config['language_translation_agent']['assignment'],
    model_client=self.model
))
Note that Link Checker and Data Storage are defined as ToolUseAssistantAgent because they have access to the tools set in the registered_tools parameter, whereas Language Translation is an AssistantAgent that relies on the LLM's capabilities to carry out its task.
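A registered tool such as the URL checker can be an ordinary Python function. The following is a minimal sketch and not the project's url_checker.py: the function names (extract_urls, build_lookup_payload, check_urls), the URL regex, and the clientId value are assumptions made for illustration. It targets the Safe Browsing v4 Lookup endpoint (threatMatches:find) and would need a valid API key to perform a real lookup.

```python
import json
import re
import urllib.request

# Threat types that the project's url_checker.py is described as checking.
THREAT_TYPES = [
    "MALWARE",
    "SOCIAL_ENGINEERING",
    "UNWANTED_SOFTWARE",
    "POTENTIALLY_HARMFUL_APPLICATION",
]

# Safe Browsing v4 Lookup API endpoint.
LOOKUP_ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"

# Assumption: a deliberately simple pattern, good enough for a sketch.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+', re.IGNORECASE)


def extract_urls(text: str) -> list[str]:
    """Return the distinct http(s) URLs found in text, in order of appearance."""
    urls: list[str] = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls


def build_lookup_payload(urls: list[str]) -> dict:
    """Build the JSON body for a threatMatches:find request."""
    return {
        "client": {"clientId": "minerva-sketch", "clientVersion": "0.1"},  # hypothetical values
        "threatInfo": {
            "threatTypes": THREAT_TYPES,
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": url} for url in urls],
        },
    }


def check_urls(text: str, api_key: str) -> dict:
    """Map each URL in text to the threat types reported for it (empty list = no match)."""
    urls = extract_urls(text)
    if not urls:
        return {}  # nothing to verify, so no network call is made
    request = urllib.request.Request(
        f"{LOOKUP_ENDPOINT}?key={api_key}",
        data=json.dumps(build_lookup_payload(urls)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    result = {url: [] for url in urls}
    for match in body.get("matches", []):
        result.setdefault(match["threat"]["url"], []).append(match["threatType"])
    return result
```

Returning a plain dictionary keeps the tool output easy for the agent to quote in the shared message thread; how the actual project shapes this output is not shown in the article.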
Tools
We have decoupled the implementation of our tools into a dedicated ./tools module:
- db_connector.py: Manages storage and retrieval of analysis results in a SQLite database.
- image_ocr.py: Handles OCR (Optical Character Recognition).
- url_checker.py: Implements "MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE", and "POTENTIALLY_HARMFUL_APPLICATION" checks for extracted URLs using Google's SafeBrowsing API.
- formatter.py: A non-agentic tool that formats TaskResult and Response messages as strings for streaming the workflow steps to the user interface.
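To make the storage step concrete, here is a minimal sketch of what a SQLite-backed connector could look like. It is not the project's db_connector.py: the table name, column names, and function names are assumptions, chosen to mirror the four mandatory fields from the Data Storage agent's prompt (extracted text, summary, determination, confidence).

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) assessments table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS assessments (
            id             INTEGER PRIMARY KEY AUTOINCREMENT,
            extracted_text TEXT    NOT NULL,
            summary        TEXT    NOT NULL,
            is_scam        INTEGER NOT NULL CHECK (is_scam IN (0, 1)),
            confidence     INTEGER NOT NULL CHECK (confidence BETWEEN 1 AND 5),
            created_at     TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    conn.commit()


def store_assessment(
    conn: sqlite3.Connection,
    extracted_text: str,
    summary: str,
    is_scam: int,
    confidence: int,
) -> int:
    """Insert one assessment and return its row id.

    Raises sqlite3.IntegrityError if a field is missing or out of range.
    """
    cursor = conn.execute(
        "INSERT INTO assessments (extracted_text, summary, is_scam, confidence) "
        "VALUES (?, ?, ?, ?)",
        (extracted_text, summary, is_scam, confidence),
    )
    conn.commit()
    return cursor.lastrowid


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory database for the demo
    init_db(connection)
    store_assessment(connection, "You won a prize, click here", "Likely reward scam", 1, 5)
    print(connection.execute("SELECT is_scam, confidence FROM assessments").fetchall())
```

Enforcing the value ranges with CHECK constraints means a malformed tool call from the agent fails loudly instead of silently storing an invalid record.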
Orchestration
Our agentic team is created as a RoundRobinGroupChat:
def create_team(self) -> RoundRobinGroupChat:
    """Create a team of agents that work together in Round Robin fashion"""
    termination = TextMentionTermination("NO_TEXT_FOUND")

    return RoundRobinGroupChat(
        self.agents,
        max_turns=7,
        termination_condition=termination
    )

async def reset(self):
    """Reset team state"""
    await self.team.reset()
This defines the following workflow:
- agents cooperate in a round-robin sequence, with each agent performing its specialized task
- the workflow terminates either when all agents complete their tasks or when the OCR specialist cannot extract text from the provided image
- the team state resets after each cycle to ensure independence between detection tasks
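The workflow above can be imitated in a few lines of plain Python. This is a simplified sketch of the turn-taking logic only, not AutoGen's implementation: run_round_robin, the (name, callable) agent representation, and the stand-in agents are all hypothetical. Each agent reads the shared thread, appends its message, and the loop stops after max_turns or as soon as a message contains the stop phrase.

```python
from typing import Callable, List, Tuple

# A stand-in agent: a name plus a function that reads the thread and replies.
Agent = Tuple[str, Callable[[List[str]], str]]


def run_round_robin(
    agents: List[Agent],
    task: str,
    max_turns: int = 7,
    stop_phrase: str = "NO_TEXT_FOUND",
) -> List[str]:
    """Run agents in fixed order over a shared thread and return the thread."""
    thread = [task]
    for turn in range(max_turns):
        name, respond = agents[turn % len(agents)]  # fixed cyclic order
        message = respond(thread)
        thread.append(f"{name}: {message}")
        if stop_phrase in message:  # early exit, e.g. OCR found no text
            break
    return thread


if __name__ == "__main__":
    demo_agents: List[Agent] = [
        ("OCR", lambda thread: "Extracted: 'Claim your prize now'"),
        ("Content_Analyst", lambda thread: "Urgency and reward bait detected"),
        ("Decision_Maker", lambda thread: "Determination: 1, confidence: 4"),
    ]
    for line in run_round_robin(demo_agents, "Analyze screenshot", max_turns=3):
        print(line)
```

Calling the function afresh for each input starts from a new thread, which plays the role of the team reset between detection tasks.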
Web Interface
The MINERVA scam detection system can be tested through a web interface at https://huggingface.co/spaces/dcarpintero/minerva. This demo supports messages in multiple languages and includes prepared examples for evaluation:
Minerva: Analysis with Multi-Lingual Messages
Evaluation
Dataset Creation
The evaluation is based on a synthetic dataset developed in two phases. First, we prompted Claude 3.5 Sonnet to generate a hierarchical taxonomy of scam categories, informed by academic research provided in context. The taxonomy comprises five main categories: financial, commerce, authority impersonation, rewards and opportunities, and customer service deception. Second, using this framework, we generated a balanced dataset with two deceptive messages and one legitimate (control) message for each topic.
Results
Our initial analysis compared MINERVA against a prompt baseline (temperature=0.0, seed=42) using the same model version (gpt-4o-mini). Results show that our multi-agent approach achieves higher accuracy (88.3% vs. 69.5%) while requiring additional processing time (20.7 vs. 0.5 seconds). Agentic performance varies across categories, with particularly strong accuracy in service-related scams (100%) and commerce (93%). A category-level analysis further revealed that while both approaches report similar confidence levels (4.62 vs. 4.68 on a 1-5 scale) across all categories, our system's missed predictions primarily occur in the control group of specific categories. This might suggest that the model tends to express a more defensive assessment in certain contexts such as opportunity/rewards and authority impersonation.
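For readers who want to reproduce this kind of breakdown on their own data, the sketch below shows one way to compute overall and per-category accuracy. It is an assumption about the bookkeeping, not the project's evaluation script: the function name and the (category, label, prediction) record layout are hypothetical, and the records in the demo are made-up toy values, not results from the article.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One evaluated message: (category, true label, predicted label), labels are 1 = scam, 0 = not scam.
Record = Tuple[str, int, int]


def accuracy_by_category(records: Iterable[Record]) -> Tuple[float, Dict[str, float]]:
    """Return (overall accuracy, accuracy per category)."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for category, label, prediction in records:
        totals[category] += 1
        hits[category] += int(label == prediction)
    if not totals:
        raise ValueError("no records to evaluate")
    per_category = {category: hits[category] / totals[category] for category in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_category


if __name__ == "__main__":
    # Toy records for illustration only.
    toy_records = [
        ("commerce", 1, 1),
        ("commerce", 0, 1),  # a control message flagged as a scam
        ("financial", 1, 1),
        ("financial", 0, 0),
    ]
    overall, per_category = accuracy_by_category(toy_records)
    print(overall, per_category)
```

Keeping the true label in each record also makes it easy to check whether errors concentrate in the control (label 0) messages, as described above.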
Societal Impact
Beyond improving scam detection accuracy and reducing administrative overhead, this project aims to deliver broader societal benefits. By providing users with detailed, well-reasoned explanations of detected scams, MINERVA enhances digital literacy and builds trust in AI-powered safety tools. Additionally, the archiving task of this agentic workflow enables the creation of an open, anonymized dataset of categorized scam patterns. This dataset would serve at least two purposes: (i) enabling fine-tuning of models with current scam patterns, and (ii) providing insights for law enforcement and cybersecurity professionals to understand emerging threats and attack vectors.
References
- [1] FBI's Internet Crime Complaint Center (IC3), 2023 Internet Crime Report
- [2] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
- [3] Project Repo
Author
Diego Carpintero (https://github.com/dcarpintero)