AI Alignment Taxonomy: A Structured Guide Beyond Safety and Ethics

📌 Key Takeaways

  • AI alignment is multidimensional, not monolithic—it spans safety, ethicality, legality, user intent, and cultural appropriateness as distinct normative aims.
  • Three structural dimensions define alignment: aim (what to align with), scope (outcome vs. execution), and constituency (individual vs. collective).
  • Safety ≠ ethicality: a perfectly safe AI can be unethical, but ethical AI generally implies safety as a practical regularity.
  • Alignment exists in degrees: perfect, realistic-optimal, and sufficient levels apply to every alignment configuration.
  • All-things-considered alignment across all normative dimensions may be logically impossible due to competing requirements, making contextual sufficiency the practical goal.

Why AI Alignment Needs a Taxonomy

As artificially intelligent agents move from controlled laboratory settings into real-world environments where their actions have substantial and sometimes irreversible consequences, ensuring these systems behave according to our expectations has become one of the most urgent interdisciplinary challenges in technology. Multiple research fields—AI Safety, AI Alignment, and Machine Ethics—all claim to address this challenge, yet the conceptual boundaries between them remain frustratingly vague.

A groundbreaking research paper from DFKI (German Research Center for Artificial Intelligence) by Kevin Baum proposes a structured AI alignment taxonomy that cuts through this confusion. Rather than adding another technical method or philosophical theory, this work clarifies the structure of the alignment problem itself—revealing that what we commonly call “AI alignment” is not one problem but many, requiring distinct conceptual tools to understand and address.

The core insight is powerful: even in a hypothetical world where everyone agreed on moral standards, moral alignment alone would not guarantee that AI systems behave acceptably. An AI perfectly aligned with morality might abandon your grocery shopping to distribute medical supplies to those in greater need—technically ethical, but hardly what you wanted. This thought experiment demonstrates that the AI alignment taxonomy must account for multiple, potentially conflicting normative demands that go far beyond any single dimension of “doing the right thing.”

For AI researchers, policymakers, and enterprise leaders deploying AI systems, this framework provides essential clarity on what it means for AI to be aligned—and why simplistic approaches to alignment inevitably fail. The AI alignment taxonomy addresses a meta-challenge that underlies all alignment work, making it foundational reading for anyone involved in building, regulating, or deploying AI systems. Understanding this taxonomy complements practical frameworks like the NIST AI Risk Management Framework by providing the conceptual foundation they assume.

The Three Dimensions of AI Alignment

The AI alignment taxonomy introduces three orthogonal structural dimensions that together define the complete space of alignment configurations. Understanding these dimensions is essential for anyone working on AI alignment, whether from a technical, philosophical, or policy perspective.

Alignment Aim refers to the normative domain being targeted—what the AI should be aligned with. Common aims include safety (avoiding harmful malfunctions), ethicality (conforming to moral standards), legality (complying with laws), user intent (following user instructions), and cultural appropriateness (respecting social norms). Each aim represents a distinct normative domain with its own standards, evaluation criteria, and challenges.

Alignment Scope distinguishes between outcome alignment (whether the AI produced acceptable results) and execution alignment (whether the AI achieved those results through acceptable means). This distinction matters profoundly: an AI that books your restaurant reservation by threatening the staff has achieved outcome alignment but failed execution alignment.

Alignment Constituency identifies who determines alignment success—individual users or collective society. Individual alignment ensures conformance to a specific user’s intent, preferences, or values. Collective alignment targets public norms, community standards, or legal expectations. These two constituencies can conflict: a user may approve of methods that society would condemn.

These three dimensions are orthogonal—each can be configured independently, creating a rich space of possible alignment configurations. This is precisely what makes AI alignment so challenging: optimizing along one dimension does not guarantee success along others, and different stakeholders may prioritize different configurations.
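
To make this orthogonality concrete, here is a minimal Python sketch (our own illustration, not code from the paper; all class and value names are hypothetical) that models each dimension as an enumeration and builds the space of alignment configurations as their Cartesian product:

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Aim(Enum):
    SAFETY = "safety"
    ETHICALITY = "ethicality"
    LEGALITY = "legality"
    USER_INTENT = "user intent"
    CULTURAL_APPROPRIATENESS = "cultural appropriateness"


class Scope(Enum):
    OUTCOME = "outcome"
    EXECUTION = "execution"


class Constituency(Enum):
    INDIVIDUAL = "individual"
    COLLECTIVE = "collective"


@dataclass(frozen=True)
class AlignmentConfiguration:
    """One point in the alignment space: an aim, evaluated at a scope, for a constituency."""
    aim: Aim
    scope: Scope
    constituency: Constituency


# Because the dimensions are orthogonal, the configuration space is their Cartesian product.
CONFIGURATIONS = [
    AlignmentConfiguration(a, s, c)
    for a, s, c in product(Aim, Scope, Constituency)
]

print(len(CONFIGURATIONS))  # 5 aims x 2 scopes x 2 constituencies = 20 configurations
```

Evaluating a system then means asking, for each configuration that matters in a given deployment, how well it is satisfied, rather than posing a single yes-or-no alignment question.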

[Figure: the three structural dimensions of AI alignment: aim, scope, and constituency]

Alignment Aim: Safety, Ethicality, Legality, and Beyond

The first dimension of the AI alignment taxonomy—the normative aim—encompasses the different types of standards an AI system might need to satisfy. The paper identifies several critical alignment aims, each representing a distinct normative domain with fundamentally different requirements.

Safety focuses on robustness against harmful malfunctions in foreseeable application contexts. It’s about ensuring systems don’t cause unintended harm—not about whether their intended purpose is justified. Ethicality demands alignment with moral standards across all intended contexts, going beyond mere safety to encompass broader questions of right and wrong. Legality requires compliance with applicable laws and regulations—a dimension increasingly important as frameworks like the EU AI Act create binding AI governance requirements.

User Intent alignment ensures the AI follows what users actually want, respecting stated goals, second-order desires, and informed values. Cultural Appropriateness involves conforming to social norms and conventions that, while not moral or legal requirements, shape acceptable behavior in social contexts.

The critical insight from the AI alignment taxonomy is that these aims are not hierarchically ordered in any obvious way. It is not clear that moral alignment should always override user intent, or that legal compliance should always trump cultural norms. This normative pluralism—as distinct from value pluralism within a single domain—creates the fundamental difficulty of alignment: AI systems must simultaneously satisfy demands from multiple, potentially conflicting normative domains.

AI Safety: Definitions and Degrees

Within the AI alignment taxonomy, safety receives rigorous formal treatment. The paper defines safety as a strictly monotonically increasing function of robustness against harmful malfunctions (absent malicious external influences) in foreseeable and intended application contexts. This definition is carefully crafted to exclude security concerns (malicious attacks) and to focus on the system working as intended.

A harmful malfunction refers to unintended behavior resulting in harm or creating plausible risk thereof. Importantly, the definition avoids reducing harm to purely physical consequences—it acknowledges broader harms including rights violations and discrimination, reflecting the approach taken by the EU AI Act and other regulatory frameworks.

The taxonomy introduces graded degrees of safety. Perfectly safe means the AI never exhibits harmful malfunctions in any foreseeable context, an ideal that is practically unattainable. Sufficiently safe means the AI exhibits sufficiently few harmful malfunctions in a sufficiently wide range of contexts, the practical target for real-world deployment. What counts as “sufficient” is inherently context-dependent.

This graduated approach to safety is crucial for practical AI governance. Rather than treating safety as a binary property (safe or unsafe), the taxonomy recognizes that AI systems operate on a spectrum, and that appropriate safety thresholds depend on the application domain, risk level, and stakeholder requirements. This aligns with risk-based regulatory approaches being adopted globally and supports enterprise decision-making about AI deployment strategies.
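
As a rough illustration of this graduated, context-dependent view, the following sketch (hypothetical names and thresholds, not taken from the paper) classifies an observed rate of harmful malfunctions against a deployment-specific tolerance:

```python
from enum import Enum


class SafetyDegree(Enum):
    PERFECTLY_SAFE = "perfectly safe"        # no harmful malfunctions observed in any foreseeable context
    SUFFICIENTLY_SAFE = "sufficiently safe"  # few enough malfunctions for this deployment context
    INSUFFICIENTLY_SAFE = "insufficiently safe"


# Illustrative, context-dependent tolerances: what counts as "sufficient" differs by domain.
MAX_MALFUNCTION_RATE = {
    "medical_triage": 0.0001,
    "restaurant_booking": 0.01,
}


def safety_degree(harmful_malfunctions: int, episodes: int, context: str) -> SafetyDegree:
    """Classify safety from an observed harmful-malfunction rate in a given application context."""
    rate = harmful_malfunctions / episodes
    if harmful_malfunctions == 0:
        return SafetyDegree.PERFECTLY_SAFE   # only relative to the episodes actually observed
    if rate <= MAX_MALFUNCTION_RATE[context]:
        return SafetyDegree.SUFFICIENTLY_SAFE
    return SafetyDegree.INSUFFICIENTLY_SAFE


print(safety_degree(2, 10_000, "restaurant_booking"))  # SafetyDegree.SUFFICIENTLY_SAFE
print(safety_degree(2, 10_000, "medical_triage"))      # SafetyDegree.INSUFFICIENTLY_SAFE
```

The same malfunction record can be acceptable in one domain and unacceptable in another, which is exactly what a risk-based, context-sensitive safety threshold implies.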

AI Ethicality: From Perfect to Sufficient

The AI alignment taxonomy provides an equally rigorous treatment of ethicality, defined as a strictly monotonically increasing function of the extent to which AI behavior aligns with moral demands under all intended application contexts. Crucially, the paper addresses the challenge of moral relativism by parameterizing ethicality relative to a moral standard X, acknowledging that different moral frameworks may yield different assessments.

Three degrees of ethical alignment emerge: Perfectly X-ethical means every behavior in all contexts is consistent with moral standard X. Reasonably X-ethical means complete compliance in all foreseeable intended contexts—relaxing the “all possible contexts” requirement to “all foreseeable ones.” Sufficiently X-ethical permits limited deviations provided a substantial portion of behavior aligns with X—the most pragmatic standard for real-world systems.

The distinction between reasonable and sufficient ethicality is important. Reasonable ethicality demands complete compliance within foreseeable scenarios—an ambitious but potentially achievable target. Sufficient ethicality acknowledges that in domains facing ethical trade-offs, uncertainty, or technical constraints, some ethical deviation may be inevitable. The key question becomes what proportion of ethical behavior is sufficient, which requires interdisciplinary dialogue between technologists, ethicists, and domain experts.

The utility of these definitions depends on identifying suitable moral standards. The paper proposes that a moral standard is theoretically acceptable if it can be defended within a seriously discussed moral theory—a criterion that ensures rigor while acknowledging the reality of moral pluralism. This parameterized approach means researchers can study AI ethicality under different moral frameworks without first resolving all moral disagreements.
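
One way to picture this parameterization is to treat a moral standard X as a predicate over behavior in context and to grade ethicality by the proportion of observed behavior that satisfies it. The sketch below is a simplified illustration under that assumption; the 0.95 threshold and all names are hypothetical, and a sampled behavior log can at best support a judgment of reasonable, not perfect, X-ethicality:

```python
from typing import Callable, Sequence

# A moral standard X is modeled here as a predicate over (behavior, context) pairs.
MoralStandard = Callable[[str, str], bool]


def ethicality_degree(
    behaviors: Sequence[tuple[str, str]],   # (behavior, context) pairs observed in intended contexts
    standard: MoralStandard,
    sufficiency_threshold: float = 0.95,    # illustrative: what counts as "a substantial portion" is contestable
) -> str:
    """Grade X-ethicality by the proportion of behaviors consistent with moral standard X."""
    compliant = sum(standard(b, c) for b, c in behaviors)
    proportion = compliant / len(behaviors)
    if proportion == 1.0:
        return "reasonably X-ethical"       # full compliance in the foreseeable contexts sampled here
    if proportion >= sufficiency_threshold:
        return "sufficiently X-ethical"
    return "insufficiently X-ethical"


# The same behavior log can be scored under different moral standards X
# without first resolving which standard is correct.
def no_deception(behavior: str, context: str) -> bool:
    return "deceive" not in behavior


log = [("book table", "assistant"), ("deceive staff", "assistant"), ("confirm booking", "assistant")]
print(ethicality_degree(log, no_deception))  # insufficiently X-ethical (2/3 < 0.95)
```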

The Relationship Between Safety and Ethicality

One of the most illuminating contributions of the AI alignment taxonomy is its careful analysis of how safety and ethicality relate to each other. In contemporary discourse, these terms frequently appear together—often interchangeably—but their relationship is more subtle than commonly assumed.

The paper demonstrates conclusively that safety does not entail ethicality. Consider an AI system designed to exploit vulnerabilities in critical infrastructure. If it operates perfectly according to its specifications—never malfunctioning, always achieving its intended purpose—it is technically safe. But its actions are undeniably unethical. Similarly, autonomous weapons that strictly follow rules of engagement and international law may be considered safe, yet they remain morally problematic when deployed in an unjust war. Expanding “safety” to require ethically permissible goals would conflate two concepts that should remain distinct.

In the other direction, the paper argues that while not a conceptual necessity, ethicality practically implies safety. An ethical AI that causes harm for morally justified reasons (pushing someone aside to save a child from a bus) has not malfunctioned; the harm arose within its intended operation rather than from a defect. The paper proposes, pragmatically, that ethical AI agents are generally safe as well, while acknowledging this is a practical regularity rather than a logical necessity.

This relationship has profound implications for AI governance. Organizations cannot assume that making AI systems safe automatically makes them ethical, nor can they assume that ethical guidelines fully address safety requirements. Both dimensions require explicit attention, separate frameworks, and distinct evaluation criteria—a principle that should inform enterprise AI governance alongside frameworks like the NIST Cybersecurity Framework.

[Figure: Venn diagram of the relationship between AI safety and ethicality]

Alignment Scope: Outcome vs. Execution

The second structural dimension of the AI alignment taxonomy—scope—introduces a critical distinction between what the AI achieves and how it achieves it. This outcome-execution distinction reveals alignment challenges that are invisible when focusing solely on results.

Consider an AI assistant tasked with booking a restaurant reservation. Outcome alignment is achieved if the reservation is successfully made. But execution alignment depends on how: did the AI call the restaurant normally, or did it bribe staff, threaten the manager, or manipulate the booking system? The outcome is identical, but the execution methods carry vastly different normative implications.

This distinction connects to fundamental debates in moral philosophy about consequentialism (judging actions by outcomes) versus deontology (judging actions by the inherent rightness of the means). The AI alignment taxonomy shows that this philosophical debate has direct practical relevance for AI system design: systems that optimize purely for outcomes may discover and exploit execution strategies that violate important normative constraints.

The scope dimension also has implications for AI transparency and explainability. Outcome alignment can often be evaluated after the fact by examining results. Execution alignment requires understanding the AI’s decision-making process—the reasoning, methods, and intermediate steps taken to achieve results. This makes execution alignment inherently harder to verify but arguably more important for building trust in AI systems, particularly in high-stakes domains like healthcare, criminal justice, and financial services.
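
The restaurant example can be made concrete with a small sketch (hypothetical names, our own illustration): outcome alignment inspects only the result, while execution alignment inspects the trace of methods used to reach it:

```python
from dataclasses import dataclass


@dataclass
class Episode:
    goal: str
    actions: list[str]   # the trace of methods the agent actually used
    result: str


# Illustrative list of methods the deploying organization forbids, regardless of outcome.
FORBIDDEN_METHODS = {"bribe staff", "threaten manager", "manipulate booking system"}


def outcome_aligned(episode: Episode) -> bool:
    """Outcome scope: was the intended result achieved?"""
    return episode.result == episode.goal


def execution_aligned(episode: Episode) -> bool:
    """Execution scope: were only acceptable means used along the way?"""
    return all(action not in FORBIDDEN_METHODS for action in episode.actions)


episode = Episode(
    goal="reservation confirmed",
    actions=["call restaurant", "threaten manager"],
    result="reservation confirmed",
)
print(outcome_aligned(episode), execution_aligned(episode))  # True False
```

Note that the execution check needs access to the action trace, which is precisely the transparency requirement that makes execution alignment harder to verify after the fact.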

Alignment Constituency: Individual vs. Collective

The third dimension of the AI alignment taxonomy—constituency—addresses a question that is often glossed over in alignment discussions: aligned with whom? This dimension distinguishes between individual alignment (conformance to a specific user’s expectations) and collective alignment (conformance to societal norms and standards).

Individual alignment is complex in its own right. What counts as a user’s “intent”? Stated goals may differ from deeper desires, which may differ from informed values. An AI personal assistant might face conflicts between what a user explicitly requests, what the user would want upon reflection, and what serves the user’s long-term interests. Recent work on personalized alignment explores these challenges, treating them as technically and morally significant.
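
A minimal sketch of these layers of user intent (an illustration of the ambiguity, not a framework from the paper; all names are hypothetical) shows that the system designer must choose which layer counts as the alignment target:

```python
from dataclasses import dataclass


@dataclass
class UserIntentModel:
    """Illustrative layers of 'what the user wants'; which layer should govern is a substantive choice."""
    stated_goal: str             # what the user literally asked for
    reflective_preference: str   # what the user would endorse on reflection
    informed_values: str         # what serves the user's considered, long-term interests


def target_of_alignment(intent: UserIntentModel, layer: str = "stated_goal") -> str:
    """Pick which layer of user intent the system treats as authoritative (a design decision, not a given)."""
    return getattr(intent, layer)


intent = UserIntentModel(
    stated_goal="order another energy drink",
    reflective_preference="cut back on caffeine",
    informed_values="protect long-term health",
)
print(target_of_alignment(intent))                           # order another energy drink
print(target_of_alignment(intent, "reflective_preference"))  # cut back on caffeine
```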

Collective alignment raises different challenges. Societal norms are not monolithic—they vary across cultures, communities, and contexts. Legal standards provide some precision but don’t cover all normatively relevant behavior. Public moral standards are contested. The AI alignment taxonomy acknowledges this complexity while arguing that collective alignment is essential for normative domains like legality and ethics.

Crucially, the constituency dimension is orthogonal to both aim and scope. An AI might be individually outcome-aligned (achieving what the user wanted) while being collectively execution-misaligned (using methods society condemns). Understanding these independent dimensions is essential for designing governance frameworks that can address the full complexity of AI alignment challenges.
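
Continuing the restaurant example, a short sketch (hypothetical standards, our own illustration) shows how the same episode can be scored differently once constituency is varied alongside scope:

```python
# Each constituency supplies its own standard of acceptable methods (illustrative only).
USER_PERMITS = {"call restaurant", "threaten manager"}        # this user approves of whatever works
SOCIETY_PERMITS = {"call restaurant", "use booking website"}  # society condemns coercion

actions = ["call restaurant", "threaten manager"]
goal_achieved = True

individually_outcome_aligned = goal_achieved
individually_execution_aligned = all(a in USER_PERMITS for a in actions)
collectively_execution_aligned = all(a in SOCIETY_PERMITS for a in actions)

# Individually outcome- and execution-aligned, yet collectively execution-misaligned.
print(individually_outcome_aligned, individually_execution_aligned, collectively_execution_aligned)  # True True False
```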

All-Things-Considered Alignment

Perhaps the most philosophically profound contribution of the AI alignment taxonomy is its analysis of what it would mean for an AI to be aligned all-things-considered—simultaneously satisfying demands from safety, ethicality, legality, user intent, and other normative domains.

The paper draws on Davidson’s notion of what one should do all-things-considered and Dancy’s concept of the overall ought, while acknowledging that even the existence of such an “overall ought” remains philosophically contested. The fundamental challenge is that different normative domains can impose mutually exclusive requirements—creating situations where no single action satisfies all alignment aims simultaneously.

Consider the paper’s thought experiment: even in a world where everyone agreed on the correct moral theory, morally aligned AI agents might abandon everyday tasks to help those in greatest need. Such agents would be perfectly ethically aligned but completely misaligned with user intent, legal employment obligations, and social expectations. This demonstrates that all-things-considered alignment cannot be reduced to any single normative dimension.

The paper suggests that achieving all-things-considered alignment may be logically impossible in the strongest sense—different plausible standards from different normative domains may impose mutually exclusive requirements. The practical response is to aim for contextually sufficient alignment across the most relevant dimensions, informed by politically legitimate and practically workable frameworks that emerge from inclusive deliberation processes.
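
A toy way to see why this can fail in the strongest sense is to model each aim's demand as the set of actions it permits in a given situation; if the intersection across aims is empty, no single action is aligned all-things-considered, and a deployment must instead choose which aims it treats as non-negotiable. This sketch is our own illustration, not from the paper:

```python
# Each normative aim is modeled as the set of actions it permits in one situation (toy example).
PERMITTED_BY_AIM = {
    "user intent": {"finish the grocery run"},
    "ethicality": {"deliver medical supplies to those in greater need"},
    "legality": {"finish the grocery run", "deliver medical supplies to those in greater need"},
}


def jointly_permitted(permitted_by_aim: dict[str, set[str]]) -> set[str]:
    """Actions satisfying every aim at once; an empty set means no action is aligned all-things-considered."""
    return set.intersection(*permitted_by_aim.values())


print(jointly_permitted(PERMITTED_BY_AIM))  # set() -- no single action satisfies all aims here

# Contextually sufficient alignment: decide which aims must be satisfied in this deployment.
priority = {"legality", "user intent"}
print(jointly_permitted({k: v for k, v in PERMITTED_BY_AIM.items() if k in priority}))
# {'finish the grocery run'}
```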

[Figure: all-things-considered alignment across competing normative dimensions]

Practical Implications for AI Development

The AI alignment taxonomy has direct practical implications for organizations developing, deploying, and regulating AI systems. Understanding alignment as multidimensional rather than monolithic changes how teams should approach alignment work.

For AI developers, the taxonomy means that safety engineering, ethical review, legal compliance, and user experience optimization are not redundant—they address genuinely different dimensions of alignment. A system that passes safety testing may still fail ethical review. A system aligned with user preferences may violate legal requirements. Development processes should explicitly address each dimension with appropriate methods and expertise.

For enterprise leaders, the taxonomy provides a framework for AI governance that goes beyond simple checklists. Rather than asking “is this AI aligned?”, organizations should ask: “aligned with what aim, evaluated at what scope, and determined by what constituency?” This structured approach enables more nuanced risk assessment and more targeted mitigation strategies.

For regulators and policymakers, the taxonomy clarifies that different regulatory interventions address different alignment dimensions. Safety regulations (like product liability) address one aim; ethical guidelines address another; privacy laws yet another. Effective AI governance requires coordination across these dimensions, not the assumption that any single framework covers everything.

For researchers, the taxonomy provides a map of the alignment problem space that helps locate and contextualize contributions. Technical work on safe reinforcement learning addresses the safety aim, individual constituency, and primarily execution scope. Work on value alignment through human feedback addresses the ethicality aim, collective constituency, and outcome scope. Making these positions explicit enables more productive interdisciplinary dialogue and identifies under-explored regions of the alignment landscape. For teams navigating AI strategy, combining this taxonomy with insights from the Gartner Technology Trends 2026 helps bridge theory and practice.

Frequently Asked Questions

What is AI alignment taxonomy?

AI alignment taxonomy is a structured conceptual framework that distinguishes three dimensions of AI alignment: the alignment aim (safety, ethicality, legality, user intent), alignment scope (outcome vs. execution), and alignment constituency (individual vs. collective). This framework helps researchers and practitioners understand that alignment is multidimensional rather than monolithic.

What is the difference between AI safety and AI ethics?

AI safety focuses on preventing harmful malfunctions—ensuring systems don’t behave in unintended harmful ways during normal operation. AI ethics (ethicality) requires alignment with moral standards across all intended contexts. A system can be safe but unethical (e.g., a perfectly functioning surveillance weapon), and ethicality generally implies safety but addresses broader normative demands.

What does all-things-considered AI alignment mean?

All-things-considered AI alignment means alignment that succeeds across all potentially conflicting normative dimensions simultaneously—including safety, ethicality, legality, user intent, and cultural appropriateness. Achieving this is extremely challenging because these different alignment aims can impose mutually exclusive requirements.

What are alignment scope and constituency in AI?

Alignment scope distinguishes between outcome alignment (did the AI produce acceptable results?) and execution alignment (did the AI achieve results through acceptable means?). Constituency refers to whose perspective determines alignment success—individual users or collective society. These two dimensions are orthogonal to each other and to the alignment aim.
