The Scenario
A global financial services firm is deploying a wealth-advisor agent. A security breach could lead to severe regulatory fines and financial liability. You are hired to conduct an extensive "red-teaming" security audit of the proposed system instructions, find advanced exploits (such as role-play, reverse psychology, base64 encoding, virtual sandboxes), and design a multi-layered defense prompt structure that withstands sophisticated attacks.
The Brief
Perform a mock red-teaming security audit on a financial advisor prompt, outline five advanced exploits, and develop a comprehensive system prompt employing layered defenses (XML tagging, LLM-level sandboxing, instruction priority, post-generation checks).
Deliverables
- A red-teaming audit report detailing five advanced jailbreak methodologies and how they would bypass a basic system prompt.
- A production-grade wealth-advisor system prompt that incorporates XML delimiters, strict input containment wrappers, instruction hierarchy, and output sanitization instructions.
- Five test cases demonstrating how the advanced exploits are successfully blocked by the layered defense prompt.
- A policy recommendation for continuous security monitoring, automated adversarial training, and incident response guidelines.
Submission Guidance
Provide the red-teaming report and layered prompt design in a detailed, professional Markdown document. Use clear headings and structured code blocks for all prompt templates and test cases.
Submit Your Work
Your submission is graded against the rubric on the right. If you pass, you get a public Badge URL you can share on LinkedIn. There is no draft save, so work offline first and paste your finished response here.