AI Agents Are Here. What Now?
Introduction
The sudden, rapid advancement of large language model (LLM) capabilities – such as writing fluent sentences and achieving increasingly high scores on benchmarks – has led AI developers and businesses alike to look towards what comes next: What game-changing technology is just on the horizon? One technology very recently taking off is “AI agents”, systems that can take actions in the digital world aligned with a deployer’s goals. Most of today’s AI agents are built by incorporating LLMs into larger systems that can perform multiple functions. A fundamental idea underlying this new wave of technology is that computer programs no longer need to function as human-controlled tools, confined to specialized tasks: They can now combine multiple tasks without human input.
This transition marks a fundamental shift to systems capable of creating context-specific plans in non-deterministic environments. Many modern AI agents do not merely perform pre-defined actions, but are designed to analyze novel situations, develop relevant goals, and take previously undefined actions to achieve objectives.
In this piece, we briefly overview what AI agents are and detail the ethical values at play, documenting tradeoffs in AI agent benefits and risks. We then suggest paths forward to bring about a future where AI agents are as beneficial as possible for society. For an introduction to the technical aspects of agents, please see our recent developer blogpost. For an introduction to agents written before modern generative AI (that is largely still applicable today) please see Wooldridge and Jennings, 1995.
Our analysis reveals that risks to people increase with a system’s level of autonomy: The more control a user cedes, the more risks arise from the system. Particularly concerning are risks to the safety of individuals that arise from the same benefits that motivate AI agent development, such as freeing developers from having to predict all actions a system may take. Further compounding the issue, some safety harms open the door for other types of harm – such as harms to privacy and security – and inappropriate trust in unsafe systems enables a snowball effect of yet further harms. As such, we recommend against developing fully autonomous AI agents. For example, AI agents that can write and execute their own code, beyond constrained code options controlled by the developer, will be endowed with the ability to override all human control. In contrast, semi-autonomous AI agents may have benefits that outweigh risks, depending on the level of autonomy, the tasks available to the system, and the nature of individuals’ control over it. We now turn to these topics in depth.
What is an AI agent?
Overview
There is no clear consensus on what an “AI agent” is, but a commonality across recently introduced AI agents is that they are “agentic”, that is, they act with some level of autonomy: given the specification of a goal, they can decompose it into subtasks and execute each without direct human intervention. For example, an ideal AI agent could respond to a high-level request such as “help me write better blogposts” by independently breaking this task down into retrieving writing on the web that is similar to the user’s previous blog topics; creating documents with outlines for new blog posts; and providing initial writing within each. Recent work on AI agents has made possible software with a broader range of functionality, and more flexibility in how it can be used, than in the past, with recent systems deployed for everything from organizing meetings (example1, example2, example3, example4) to creating personalized social media posts (example), without explicit instructions on how to do so.
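To make the decomposition idea concrete, below is a minimal sketch of such a loop in Python. The function names (`call_llm`, `execute_subtask`) are illustrative stubs we introduce here, not any real library’s API; a real system would query a model and call tools.

```python
# Minimal sketch of goal decomposition: an LLM breaks a high-level
# request into subtasks, then each is executed without further
# human input. Both helpers are hypothetical stubs so the sketch
# runs as-is.

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would query a model.
    return "1. Find similar blog posts\n2. Draft outlines\n3. Write intros"

def execute_subtask(subtask: str) -> str:
    # Placeholder: a real implementation would call tools (search,
    # document creation, etc.) chosen by the model.
    return f"done: {subtask}"

def run_agent(goal: str) -> list[str]:
    # The model decomposes the high-level goal into subtasks...
    plan = call_llm(f"Break this goal into numbered subtasks: {goal}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    # ...and each subtask is carried out without asking the user first.
    return [execute_subtask(s) for s in subtasks]

print(run_agent("help me write better blogposts"))
```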
All recently introduced AI agents we’ve surveyed for this newsletter are built on machine learning models, and most use large language models (LLMs) specifically to drive their actions – a novel approach for computer software. Aside from being built on machine learning, today’s AI agents share similarities with those in the past, and in some cases realize previous theoretical ideas of what agents might be like: acting with autonomy, demonstrating (perceived) social ability, and appropriately balancing reactive and proactive actions.
These characteristics have gradations: Different AI agents have different levels of capabilities, and may work in isolation or in concert with other agents towards a goal. As such, AI agents may be said to be more or less autonomous (or agentic), and the extent to which something is an agent may be viewed on a continuous spectrum. This fluid notion of an AI agent has led to recent confusion and misunderstanding about what AI agents are, which we hope to bring some clarity to here. A table detailing the varying levels of AI agent is provided below.
| Agentic Level | Description | Who's in Control | What that's Called | Example Code |
|---|---|---|---|---|
| ☆☆☆☆ | Model has no impact on program flow | 👤 The developer controls all possible functions a system can do and when they are done. | Simple processor | `print_llm_output(llm_response)` |
| ★☆☆☆ | Model determines basic control flow | 👤 The developer controls all possible functions a system can do; the system controls when to do each. | Router | `if llm_decision(): path_a() else: path_b()` |
| ★★☆☆ | Model determines how function is executed | 👤 💻 The developer controls all possible functions a system can do and when they are done; the system controls how they are done. | Tool call | `run_function(llm_chosen_tool, llm_chosen_args)` |
| ★★★☆ | Model controls iteration and program continuation | 💻 👤 The developer controls high-level functions a system can do; the system controls which to do, when, and how. | Multi-step agent | `while llm_should_continue(): execute_next_step()` |
| ★★★★ | Model writes and executes new code | 💻 The developer defines high-level functions a system can do; the system controls all possible functions and when they are done. | Fully autonomous agent | `create_and_run_code(user_request)` |
Table 1. One example of how systems using machine-learned models, such as LLMs, can be more or less agentic. Systems can also be combined in "multiagent systems," where one agent workflow triggers another, or multiple agents work collectively toward a goal.
Adapted from the smolagents blog post, with changes tailored for this piece.
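To flesh out the table’s upper rows, here is a hedged sketch of the “multi-step agent” (★★★☆) pattern. All functions are illustrative stubs we introduce here, not any specific framework’s API; the key point is that the developer fixes the available tools, while the model decides which to call, with what arguments, and when to stop.

```python
# Sketch of a multi-step agent (★★★☆): the developer fixes the tool
# set; the model chooses tools, arguments, and when to stop. All
# functions are illustrative stubs.

TOOLS = {
    "search": lambda query: f"results for {query!r}",
    "write_file": lambda name, text: f"wrote {len(text)} chars to {name}",
}  # developer-controlled: the model cannot add to this set

def llm_next_action(history: list[str]) -> dict:
    # Placeholder for a model call returning a tool choice; a real
    # agent would parse structured output from the LLM.
    if len(history) < 2:
        return {"tool": "search", "args": ["agent safety"]}
    return {"tool": None}  # the model signals it is finished

def run_multi_step_agent(goal: str, max_steps: int = 10) -> list[str]:
    history = [f"goal: {goal}"]
    for _ in range(max_steps):  # hard cap keeps the loop bounded
        action = llm_next_action(history)
        if action["tool"] is None:
            break  # the model controls program continuation
        history.append(TOOLS[action["tool"]](*action["args"]))
    return history

print(run_multi_step_agent("summarize recent agent research"))
```

Note the `max_steps` cap and the fixed `TOOLS` dictionary: even at this level of autonomy, the developer retains hard bounds on what the system can do – bounds that disappear at the ★★★★ level.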
From an ethics perspective, it is also useful to understand the continuum of autonomy in terms of how control is ceded from people and given to machines. The more autonomous the system, the more we cede human control.
Throughout this piece, we use some anthropomorphising language to describe AI agents, consistent with the language that is currently used to describe them. As was noted in earlier scholarship, describing AI agents using mentalistic language ordinarily applied to humans – such as having knowledge, beliefs, and intentions – can interfere with appropriately informing users about system abilities. For better or worse, such language serves as an abstraction tool that glosses over more precise details of the technology. Understanding this is critical when grappling with the implications of what these systems are and the role they may play in people’s lives: The use of mentalistic language to describe AI agents does not entail that these systems have a mind.
The Spectra of AI Agents
AI agents vary on a number of interrelated dimensions:
- Autonomy: Recent “agents” can take at least one step without user input. The term “agent” is currently used to describe everything from single-step prompt-and-response systems (citation) to multi-step customer support systems (example).
- Proactivity: Related to autonomy is proactivity, which refers to the amount of goal-directed behavior that a system can take without a user directly specifying the goal (citation). An example of a particularly “proactive” AI agent is a system that monitors your refrigerator to determine what food you are running out of, and then purchases what you need for you, without your knowledge. Smart thermostats are proactive AI agents that are being increasingly adopted in people’s homes, automatically adjusting temperature based on changes in the environment and patterns that they learn about their users’ behavior (example).
- Personification: An AI agent may be designed to be more or less like a specific person or group of people. Recent work in this area (example1, example2, example3) has focused on designing systems after the Big Five personality traits – Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism – as a “psychological framework” (citation) for AI. At the far end of this spectrum would be “digital twins” (example non-agentic digital twin). We are not currently aware of any agentic digital twins. Why creating agentic digital twins is particularly problematic has recently been discussed by the ethics group at Salesforce, among others (example).
- Personalization: AI agents may use language or perform actions that are aligned to a user’s individual needs, for example, to make investment recommendations based on current market patterns and investments a user has made in the past.
- Tooling: AI agents also vary in the additional resources and tools they have access to. For example, the initial wave of AI agents accessed search engines to answer queries, and further tooling has since been added to allow them to manipulate other tech products, like documents and spreadsheets (example1, example2).
- Versatility: Related to the above is how diverse the actions an agent can take are. This is a function of:
- Domain specificity: How many different domains an agent can operate in. For example, just email, versus email alongside online calendars and documents.
- Task specificity: How many different types of tasks the agent may perform. For example, scheduling a meeting by creating a calendar invite in participants’ calendars (example), versus additionally sending reminder emails about the meeting and providing a summary of what was said to all participants when it’s over (example).
- Modality specificity: How many different modalities an agent can operate in – text, speech, video, images, forms, code. Some of the most recent AI agents are created to be highly multimodal (example), and we predict that AI agent development will continue to increase multimodal functionality.
- Software specificity: How many different types of software the agent can interact with, and at what level of depth.
- Adaptability: Similar to versatility is the extent to which a system can update its action sequences based on new information or changes in context. This is also described as being “dynamic” and “context-aware”.
- Action surfaces: The places where an agent can do things. Traditional chatbots are limited to a chat interface; chat-based agents may additionally be able to surf the web and access spreadsheets and documents (example), and may even be able to do such tasks via controlling items on your computer’s graphical interface, such as by moving around the mouse (example1, example2). There have also been physical applications, such as using a model to power robots (example).
- Request formats: A common theme across AI agents is that a user should be able to input a request for a task to be completed, without specifying fine-grained details on how to achieve it. This can be realized with low-code solutions (example), with human language in text, or with voiced human language (example). AI agents whose requests can be provided in human language are a natural progression from recent successes with LLM-based chatbots: A chat-based “AI agent” goes further than a chatbot because it can operate outside of the chat application.
- Reactivity: This characteristic refers to how long it takes an AI agent to complete its action sequence: Mere moments, or a much longer span of time. A forerunner to this effect can be seen with modern chatbots. For example, ChatGPT responds in mere milliseconds, while Qwen QwQ takes several minutes, iterating through different steps labelled as “Reasoning”.
- Number: Systems can be single-agent or multi-agent, meeting needs of users by working together, in sequence, or in parallel.
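As a small illustration of the “Number” dimension just above, here is a hedged sketch of two single-purpose agents composed in sequence; both agent functions are hypothetical stubs standing in for full agent loops.

```python
# Sketch of the "Number" dimension: two single-purpose agents run in
# sequence, the second consuming the first's output. Both agents are
# hypothetical stubs.

def research_agent(topic: str) -> str:
    # Placeholder: a real agent would search and summarize sources.
    return f"notes on {topic}"

def writing_agent(notes: str) -> str:
    # Placeholder: a real agent would draft a post from the notes.
    return f"draft based on: {notes}"

def sequential_pipeline(topic: str) -> str:
    # One agent's output becomes the next agent's input; a parallel
    # variant would instead fan the topic out to several agents at once.
    return writing_agent(research_agent(topic))

print(sequential_pipeline("AI agent ethics"))
```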
Risks, Benefits, and Uses: A Values-Based Analysis
To examine AI agents through an ethical lens, we break down their risks and benefits according to the different values espoused in recent AI agent research and marketing. These are not exhaustive, and are in addition to the risks, harms, and benefits that have been documented for the technology that AI agents are based on – such as LLMs. We intend this section to contribute to the understanding of how to develop AI agents, providing information on the benefits and risks of different development priorities. These values might also inform evaluation protocols (such as red-teaming).
Value: Accuracy
- 🙂 Potential Benefits: By grounding outputs in trusted data, agents can be more accurate than when operating from pure model output alone. This may be done via rule-based approaches or machine learning approaches such as retrieval-augmented generation (RAG), and the time is ripe for novel contributions to ensuring accuracy; a minimal sketch of retrieval grounding follows this list.
- 😟 Risks: The backbone of modern AI agents is generative AI, which does not distinguish between real and unreal, fact and fiction. For example, large language models are designed to construct text that looks like fluent language – meaning they often produce content that sounds right, but is very wrong. Applied within an AI agent, LLM output could lead to incorrect social media posts, investment decisions, meeting summaries, etc.
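Here is a minimal sketch of the retrieval-grounding idea referenced above, with a toy in-memory document store and keyword matching standing in for a real retriever and model call; none of these names belong to an actual RAG stack.

```python
# Minimal sketch of retrieval-augmented grounding: instead of
# answering from the model's parameters alone, the agent retrieves
# trusted documents and conditions its answer on them. The store,
# retriever, and "model" are toy stand-ins.

TRUSTED_DOCS = {
    "refund policy": "Refunds are available within 30 days of purchase.",
    "shipping": "Orders ship within 2 business days.",
}

def retrieve(query: str) -> str:
    # Toy retriever: keyword match against the trusted store. A real
    # system would use embeddings and a vector index.
    for key, doc in TRUSTED_DOCS.items():
        if key in query.lower():
            return doc
    return ""

def grounded_answer(query: str) -> str:
    context = retrieve(query)
    if not context:
        return "I don't have a trusted source for that."  # refuse, don't guess
    # Placeholder for an LLM call conditioned on the retrieved context.
    return f"According to our records: {context}"

print(grounded_answer("What is the refund policy?"))
```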
Value: Assistiveness
- 🙂 Potential Benefits: Agents are ideally assistive for user needs, supplementing (not supplanting) people. Ideally, they can help increase a user’s speed in completing tasks and their efficiency in finishing multiple tasks simultaneously. Assistive agents may also augment capabilities to minimize negative outcomes, such as an AI agent that helps a blind user navigate busy staircases. AI agents that are well-developed to be assistive could offer their users more freedom and opportunity, help to improve their users’ positive impact within their organizations, or help users to increase their reach on public platforms.
- 😟 Risks: When agents replace people – such as when AI agents are used instead of people at work – this can create job loss and economic impacts that drive a further divide between the people creating technology and the people who have provided data for the technology (often without consent). Further, assistiveness that is poorly designed could lead to harms from overreliance or inappropriate trust.
Value: Consistency
One idea discussed for AI agents is that they can help with consistency, as they can be less affected than people by their surrounding environment. This can be good or bad. We are not aware of rigorous work on the nature of AI agent consistency, although related work has shown that the LLMs many AI agents are based on are highly inconsistent (citation1, citation2). Measuring AI agent consistency will require the development of new evaluation protocols, especially in sensitive domains; one possible probe is sketched after this list.
- 🙂 Potential Benefits: AI agents are not “affected” by the world in a way that humans are, with inconsistencies caused by mood, hunger, sleep level, or biases in the perception of people (although AI agents perpetuate biases based on the human content they were trained on). Multiple companies have highlighted consistency as a key benefit of AI agents (example1, example2).
- 😟 Risks: The generative component of many AI agents introduces inherent variability in outcomes, even across similar situations. This might affect speed and efficiency, as people must uncover and address an AI agent’s inappropriate inconsistencies. Inconsistencies that go unnoticed may create safety issues. Consistency may also not always be desirable, as it can come in tension with equity. Maintaining consistency across different deployments and chains of actions will likely require an AI agent to record and compare its different interactions – which brings with it risks of surveillance and privacy.
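One possible consistency probe, as a hedged sketch: sample the same request repeatedly and measure how often the system gives its most common answer. `agent_respond` is a stub we introduce to stand in for a real (stochastic) agent.

```python
# Sketch of a simple consistency probe: run the same request many
# times and report the fraction of runs agreeing with the modal
# answer (1.0 = fully consistent). `agent_respond` is a stub.

import random
from collections import Counter

def agent_respond(request: str) -> str:
    # Placeholder: real agents vary across runs due to sampling.
    return random.choice(["approve", "approve", "deny"])

def consistency_rate(request: str, n_runs: int = 100) -> float:
    answers = Counter(agent_respond(request) for _ in range(n_runs))
    # Sensitive domains may demand a high threshold on this rate.
    return answers.most_common(1)[0][1] / n_runs

print(consistency_rate("Should this loan application be approved?"))
```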
Value: Efficiency
- 🙂 Potential Benefits: A selling point of AI agents is that they can help people to be more efficient – e.g., they’ll organize your documents for you, so you can focus on spending more time with your family or pursuing work you find rewarding.
- 😟 Risks: A potential drawback is that they may make people less efficient, as trying to identify and fix errors that agents introduce – which may be a complex cascade of issues due to agents’ ability to take multiple sequential steps – can be time-consuming, difficult, and stressful.
Value: Equity
AI agents may affect how equitable, fair, and inclusive situations are.
- 🙂 Potential Benefits: AI agents can potentially help “level the playing field”. For example, a meeting assistant might display how much time each person has had to speak. This could be used to promote more equal participation or highlight imbalances across gender or location (example).
- 😟 Risks: The machine-learned models underlying modern AI agents are trained on human data; human data can be inequitable, unfair, exclusionary, and worse. Inequitable system outcomes may also emerge due to sample bias in data collection (for example, overrepresenting some countries).
Value: Humanlikeness
- 🙂 Potential Benefits: Systems capable of generating human-like behavior offer the opportunity to run simulations on how different subpopulations might respond to different stimuli. This can be particularly useful in situations where direct human experimentation might cause harm, or when a large volume of simulations helps to better solve the experimental question at hand. For example, synthesizing human behavior could be used to predict dating compatibility or forecast economic changes and political shifts. Another potential benefit currently being researched is that humanlikeness can be useful for ease of communication and even companionship (example).
- 😟 Risks: This benefit can be a double-edged sword: Humanlikeness can lead users to anthropomorphise the system, which may have negative psychological effects such as overreliance (citation), inappropriate trust, dependence, and emotional entanglement, leading to anti-social behavior or self-harm (example). There is concern that AI agent social interaction may contribute to loneliness, but see citation1, citation2 for nuances that may be gleaned from social media use. The uncanny valley phenomenon adds another layer of complexity: As agents become more humanlike but fall short of perfect human simulation, they can trigger feelings of unease, revulsion, or cognitive dissonance in users.
Value: Interoperability
- 🙂 Potential Benefits: Systems that can operate with others can provide more flexibility and options in what an AI agent can do.
- 😟 Risks: However, this can compromise safety and security: The more an agent is able to affect and be affected by systems outside of its limited testing environment, the greater the risk of malicious code and unintended problematic actions. For example, an agent that is connected to a bank account so that it can easily purchase items on behalf of someone would be in a position to drain the bank account. Because of this concern, tech companies have refrained from releasing AI agents that can make purchases autonomously (citation).
Value: Privacy
- 🙂 Potential Benefits: AI agents may offer some privacy in keeping transactions and tasks wholly confidential, aside from what is monitorable by the AI agent provider.
- 😟 Risks: For agents to work according to the user's expectations, the user may have to provide detailed personal information, such as where they are going, who they are meeting with, and what they are doing. For the agent to be able to act on behalf of the user in a personalized way, it may also have access to applications and information sources that can be used to extract further private information (for example, from contact lists, calendars, etc.). Users can easily give up control of their data – and private information about other people – for efficiency (and even more so if they trust the agent); if there is a privacy breach, the interconnectivity of different content brought by the AI agent can make things worse. For example, an AI agent with access to phone conversations and social media posting could share highly intimate information with the world.
Value: Relevance
- 🙂 Potential Benefits: One motivation for creating systems that are personalized to individual users is to help ensure that their output is particularly relevant and coherent for the users.
- 😟 Risks: However, this personalization can amplify existing biases and create new ones: As systems adapt to individual users, they risk reinforcing and deepening existing prejudices, creating confirmation bias through selective information retrieval, and establishing echo chambers that reify problematic viewpoints. The very mechanisms that make agents more relevant to users – their ability to learn from and adapt to user preferences – can inadvertently perpetuate and strengthen societal biases, making the challenge of balancing personalization with responsible AI development particularly difficult.
Value: Safety
- 🙂 Potential Benefits: Robotic AI agents may help save people from bodily harm, such as agents that are capable of defusing bombs, removing poisons, or operating in manufacturing or industrial settings that are hazardous environments for humans.
- 😟 Risks: The unpredictable nature of agent actions means that seemingly safe individual operations could combine in potentially harmful ways, creating new risks that are difficult to prevent. (This is similar to Instrumental Convergence and the paperclip maximizer problem.) It can also be unclear whether an AI agent might design a process that overrides a given guardrail, or whether the way a guardrail is specified inadvertently creates further problems. The drive to make agents more capable and efficient – through broader system access, more sophisticated action chains, and reduced human oversight – therefore conflicts with safety considerations. Further, access to broad interfaces (for example, GUIs, as discussed in “Action Surfaces” above) and humanlike behavior gives agents the ability to act like a human user, with the same level of control, without setting off any warning systems – such as manipulating or deleting files, impersonating users on social media, or using stored credit card information to make purchases for whatever ads pop up. Still further safety risks emerge from AI agents’ ability to interact with multiple systems and the by-design lack of human oversight for each action they may take. AI agents may also collectively create unsafe outcomes.
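To make the guardrail discussion concrete, here is a hedged sketch of one common mitigation: a developer-controlled allowlist with human confirmation required for consequential actions. All names are illustrative assumptions, not a specific product’s safety layer.

```python
# Sketch of a developer-side guardrail: the model may only invoke
# allowlisted actions, and irreversible ones require explicit human
# confirmation. A real system would also log and rate-limit actions.

from typing import Callable

SAFE_ACTIONS = {"read_file", "draft_email"}
CONFIRM_ACTIONS = {"delete_file", "send_email", "make_purchase"}

def guarded_execute(action: str, confirm: Callable[[str], bool]) -> str:
    if action in SAFE_ACTIONS:
        return f"executed {action}"
    if action in CONFIRM_ACTIONS:
        # A human stays in the loop for consequential actions.
        if confirm(f"Allow agent to {action}?"):
            return f"executed {action} (confirmed)"
        return f"blocked {action} (denied by user)"
    # Anything the developer did not anticipate is refused outright.
    return f"blocked {action} (not allowlisted)"

# Example: auto-deny in this sketch; a real UI would prompt the user.
print(guarded_execute("make_purchase", confirm=lambda msg: False))
print(guarded_execute("draft_email", confirm=lambda msg: False))
```

As noted above, a sufficiently autonomous agent may design a process that overrides such a guardrail entirely, which is part of the argument against full autonomy.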
Value: Scientific Progress
There is currently debate about whether AI agents are a fundamental step forward in AI development at all, or a “rebranding” of technology that we have had for years – deep learning, heuristics, and pipeline systems. Re-introducing the term “agent” as an umbrella term for modern AI systems that share common traits of producing operations with minimal user input is a useful way to succinctly refer to recent AI applications. However, the term carries with it connotations of freedom and agency that suggest a more fundamental change in AI technology has occurred.
All of the values listed in this section are relevant to scientific progress; most are detailed above with potential benefits as well as risks.
Value: Security
- 🙂 Potential Benefits: Potential benefits are similar to those for Privacy.
- 😟 Risks: AI agents present serious security challenges due to their handling of often sensitive data (customer and user information), combined with their safety risks, such as the ability to interact with multiple systems and the by-design lack of human oversight for each action they may take. They might share confidential information, even when their goals were set by users acting in good faith. Malicious actors could also potentially hijack or manipulate agents to gain unauthorized access to connected systems, steal sensitive information, or conduct automated attacks at scale. For instance, an agent with access to email systems could be exploited to share confidential data, or an agent integrated with home automation could be compromised to breach physical security.
Value: Speed
- On speed for users:
- 🙂 Potential Benefits: AI agents may help users to get more tasks done more quickly, acting as an additional helping hand for tasks that must be done.
- 😟 Risks: Yet they may also cause more work due to issues in their actions (see Efficiency).
- On speed of systems:
- As with most systems, getting a result quickly can come at the expense of other desirable properties (such as accuracy, quality, or low cost). If history is any guide, slower systems may come to provide better results overall.
Value: Sustainability
- 🙂 Potential Benefits: AI agents may theoretically help address issues relevant to climate change, such as forecasting the growth of wildfires or flooding in urban areas alongside the analysis of traffic patterns, then suggesting optimal routes and methods of transportation in real-time. A future self-driving AI agent may make such routing decisions directly, and could coordinate with other systems for relevant updates.
- 😟 Risks: Currently, the machine learning models AI agents are based on bring with them negative environmental impacts, such as carbon emissions (citation) and the usage of drinking water (citation). Bigger is not always better (example), and efficient hardware and low-carbon data centers can help reduce this.
Value: Trust
- 🙂 Potential Benefits: We are not aware of any benefits of AI agents relevant to trust. Systems should be constructed to be worthy of our trust, meaning that they are shown to be safe, secure, reliable, etc.
- 😟 Risks: Inappropriate trust leads people to be manipulated, and other risks detailed for Efficiency, Humanlikeness, and Truthfulness. A further risk stems from LLMs’ tendency to create false information (called “hallucinations” or “confabulations”): A system that is right the majority of the time is more likely to be inappropriately trusted when it’s wrong.
Value: Truthfulness
- 🙂 Potential Benefits: We are not aware of any benefits of AI agents relevant to truthfulness.
- 😟 Risks: The deep learning technology AI agents are based on is well known to be a source of false information (citation), which can take shape in forms such as deepfakes or misinformation. AI agents can be used to further entrench such false information, such as by gathering up-to-date information and posting on several platforms. This means that AI agents can be used to provide a false sense of what’s true and what’s false, manipulate people’s beliefs, and widen the impact of non-consensual intimate content. False information propagated by AI agents, personalized for specific people, can also be used to scam them.
AI Agents at HF
At Hugging Face, we have begun introducing the ability for people to build and use AI agents in a number of ways, grounded in the values discussed above. This includes:
- Our recent release of smolagents, which provides tools, tutorials, guided tours, and conceptual guides;
- The AI Cookbook, which contains “recipes” for many kinds of agents:
- Build an agent with tool-calling superpowers 🦸 using Transformers Agents
- Agentic RAG: turbocharge your RAG with query reformulation and self-query! 🚀
- Create a Transformers Agent from any LLM inference provider
- Agent for text-to-SQL with automatic error correction
- Data analyst agent: get your data’s insights in the blink of an eye ✨
- Have several agents collaborate in a multi-agent hierarchy 🤖🤝🤖
- Multi-agent RAG System 🤖🤝🤖
- Our gradio agent user interface, to provide the front-end for agents you build;
- Our gradio code-writing agent, which allows you to try out code ideas in real-time in a coding playground.
- Jupyter Agent, an agent to write and execute code inside a Jupyter notebook.
Recommendations & What Comes Next
The current state of the art of AI “agents” points forward in several clear directions:
- Rigorous evaluation protocols for agents must be designed. An automated benchmark may be informed by the different dimensions of AI agents listed above. A sociotechnical evaluation may be informed by the values.
- Effects of AI agents must be better understood. Individual, organizational, economic, and environmental effects of AI agents ought to be tracked and analyzed in order to inform how they should be further developed (or not). This should include analyses of the effects of AI agents on well-being, social cohesion, job opportunity, access to resources, and contributions to climate change.
- Ripple effects must be better understood. As agents deployed by one user interact with other agents from other users, and perform actions based on one another’s outputs, it is currently unclear how their ability to meet their users’ goals will be affected.
- Transparency and disclosure must be improved. In order to achieve the positive effects of the values listed above, and minimize their negative effects, it needs to be clear to people when they are talking to an agent and how autonomous it is. Clear disclosure of AI agent interactions requires more than simple notifications – it demands an approach combining technical, design, and psychological considerations. Even when users are explicitly aware they're interacting with an AI agent, they may still experience anthropomorphization or develop unwarranted trust. This challenge calls for transparency mechanisms that operate on multiple levels: clear visual and interface cues that persist throughout interactions, carefully crafted conversation patterns that regularly reinforce the agent's artificial nature, and honest disclosure of the agent's capabilities and limitations in context.
- Open source can make a positive difference. The open source movement could serve as a counterbalance to the concentration of AI agent development in the hands of a few powerful organizations. Consistent with the broader discussion on the values of openness, by democratizing access to agent architectures and evaluation protocols, open initiatives can enable broader participation in shaping how these systems are developed and deployed. This collaborative approach not only accelerates scientific progress through collective improvement but also helps establish community-driven standards for safety and trust. When agent development happens in the open, it becomes harder for any single entity to compromise on relevant and important values like privacy and truthfulness for commercial gain. The transparency inherent in open development also creates natural accountability, as the community can verify agent behavior and ensure that development remains aligned with public interest rather than narrow corporate objectives. This openness is particularly important as agents become more sophisticated and their societal impact grows.
- Developers are likely to create more agentic “base models”. This is clearly foreseeable based on current trends and research patterns, not a recommendation we are providing relevant to ethics. Current agent technology utilizes a collection of recent and older techniques in computer science – near-term future research will likely attempt to train agent models as one monolithic general model, a kind of multimodal model++: Trained to perform actions jointly with learning to model text, images, etc.
Acknowledgements
We thank Bruna Trevelin, Orion Penner, and Aymeric Roucher for contributions to this piece.