Training AI Models on Core Values Improves Their Ethical Behavior, Study Finds
Research from the Anthropic Fellows Program finds that language models align better with intended ethical values when they are trained on explanations of those values before being trained on specific behaviors, and that this ordering improves reliability even in novel scenarios.
