AI & PromptingAdvanced 5 hours

Enterprise Red-Teaming & Layered Defense System Prompt

Perform a comprehensive red-teaming audit and design a production-grade layered defense system prompt.

The Scenario

A global financial services firm is deploying a wealth-advisor agent. A security breach could lead to severe regulatory fines and financial liability. You are hired to conduct an extensive "red-teaming" security audit of the proposed system instructions, find advanced exploits (such as role-play, reverse psychology, base64 encoding, virtual sandboxes), and design a multi-layered defense prompt structure that withstands sophisticated attacks.

The Brief

Perform a mock red-teaming security audit on a financial advisor prompt, outline five advanced exploits, and develop a comprehensive system prompt employing layered defenses (XML tagging, LLM-level sandboxing, instruction priority, post-generation checks).

Deliverables

  • A red-teaming audit report detailing five advanced jailbreak methodologies and how they would bypass a basic system prompt.
  • A production-grade wealth-advisor system prompt that incorporates XML delimiters, strict input containment wrappers, instruction hierarchy, and output sanitization instructions.
  • Five test cases demonstrating how the advanced exploits are successfully blocked by the layered defense prompt.
  • A policy recommendation for continuous security monitoring, automated adversarial training, and incident response guidelines.

Submission Guidance

Provide the red-teaming report and layered prompt design in a detailed, professional Markdown document. Use clear headings and structured code blocks for all prompt templates and test cases.

Submit Your Work

Your submission is graded against the rubric on the right. If you pass, you get a public Badge URL you can share on LinkedIn. There is no draft save, so work offline first and paste your finished response here.

This appears on your public Badge.

0/20000 charactersMarkdown supported

One per line or comma separated. Up to 5 links.

Loading security check...

By submitting, you agree your submission text, name, and evaluation will appear on a public Badge URL.