Optimize and Operationalize a Generative AI Solution

This section of the Microsoft AI-102: Designing and Implementing a Microsoft Azure AI Solution exam covers how to optimize, monitor, and operationalize generative AI solutions in Azure AI Foundry. Below are study notes for each sub-topic, with links to Microsoft documentation, exam tips, and key facts.


Configure Parameters to Control Generative Behavior

📖 Docs: Control completions with parameters

Overview

  • Parameters influence model outputs, creativity, and response style
  • Common parameters (illustrated in the sketch after this list):
    • Temperature: randomness (0 = deterministic, 1 = creative)
    • Top_p: nucleus sampling to control probability mass
    • Max_tokens: maximum length of output
    • Frequency_penalty: discourages repetition
    • Presence_penalty: encourages introducing new topics
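
A minimal sketch of setting these parameters, assuming the Python openai SDK (v1+) against an Azure OpenAI resource; the endpoint, API key, and deployment name below are placeholders:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4o",                 # your deployment name (placeholder)
    messages=[{"role": "user", "content": "Summarize the AI-102 exam in one sentence."}],
    temperature=0.2,        # low = more deterministic, factual answers
    top_p=0.95,             # nucleus sampling: keep tokens within the top 95% probability mass
    max_tokens=200,         # cap the length of the completion
    frequency_penalty=0.5,  # discourage repeating the same tokens
    presence_penalty=0.0,   # >0 nudges the model toward new topics
)
print(response.choices[0].message.content)
```

Microsoft's general guidance is to adjust temperature or top_p for a given request, but not both at once.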

Key Points

  • Low temperature = consistent answers
  • High temperature = creative, diverse answers
  • Token limits vary by model (e.g., GPT-4 Turbo = 128K context)

Exam Tip

Expect parameter tuning scenarios, e.g., "make responses more factual and less creative" (lower the temperature)


Configure Model Monitoring and Diagnostic Settings

📖 Docs: Monitor models with Azure Monitor

Overview

  • Monitoring ensures performance and reliability
  • Tools:
    • Azure Monitor
    • Application Insights
    • Diagnostic settings for logging

Key Points

  • Track metrics: latency, request counts, error rates, token consumption
  • Alerts can trigger on quota limits or performance drops
  • Logs help identify prompt injection or misuse
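
A minimal sketch of sending application telemetry to Application Insights and logging per-request token consumption, assuming the azure-monitor-opentelemetry package and a placeholder connection string; the logger name and log format are illustrative:

```python
import logging
from azure.monitor.opentelemetry import configure_azure_monitor

# Route logs and traces from this app to Application Insights (placeholder connection string).
configure_azure_monitor(connection_string="InstrumentationKey=<placeholder>")
logger = logging.getLogger("genai.monitoring")

def log_usage(response) -> None:
    """Record token consumption from a chat completion so it can be charted and alerted on."""
    usage = response.usage
    logger.info(
        "completion_usage prompt=%d completion=%d total=%d",
        usage.prompt_tokens, usage.completion_tokens, usage.total_tokens,
    )
```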

Exam Tip

Monitoring includes both service health and content safety events


Optimize and Manage Resources for Deployment

📖 Docs: Manage Azure AI deployments

Overview

  • Optimize deployments by scaling resources and updating models
  • Options:
    • Scaling: autoscale for high-traffic apps
    • Model updates: migrate to new foundation model versions as they are released
    • Batch endpoints: efficient for bulk processing

Key Points

  • Keep track of model deprecation schedules
  • Scale horizontally for concurrency, vertically for performance
  • Cost optimization includes reducing context length and caching results
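
As a small illustration of the caching point above, a minimal in-memory cache keyed on deployment, prompt, and temperature avoids paying twice for identical requests; the client and deployment name are assumed from the parameter example earlier, and a production system would more likely use a shared store such as Azure Cache for Redis:

```python
completion_cache: dict[tuple, str] = {}

def cached_completion(client, deployment: str, prompt: str, temperature: float = 0.0) -> str:
    """Return a cached answer for repeated prompts; call the model only on a cache miss."""
    key = (deployment, prompt, temperature)
    if key not in completion_cache:
        response = client.chat.completions.create(
            model=deployment,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        completion_cache[key] = response.choices[0].message.content
    return completion_cache[key]
```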

Enable Tracing and Collect Feedback

📖 Docs: Prompt flow evaluation

Overview

  • Tracing helps analyze execution paths of prompt flows
  • Feedback collection ensures continuous improvement
  • Supported via Azure Monitor, Application Insights, and Prompt flow tracing

Key Points

  • Collect human-in-the-loop feedback
  • Use structured evaluations (groundedness, relevance, coherence)
  • Store traces for debugging multi-step flows
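
A minimal sketch, assuming the promptflow-tracing package for local trace collection; the feedback file format and function names are purely illustrative:

```python
import json
from promptflow.tracing import start_trace, trace

start_trace()  # begin collecting traces for this application

@trace
def answer_question(question: str) -> str:
    # ... call your model or RAG pipeline here; each decorated step is recorded ...
    return "example answer"

def record_feedback(question: str, answer: str, rating: int, comment: str = "") -> None:
    """Append human-in-the-loop feedback for later evaluation (e.g., groundedness review)."""
    with open("feedback.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps({"question": question, "answer": answer,
                            "rating": rating, "comment": comment}) + "\n")
```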

Best Practices

Always collect feedback before scaling to production


Implement Model Reflection

📖 Docs: Model self-reflection

Overview

  • Model reflection = model critiques its own responses and improves output
  • Typically implemented using chained prompts
  • Supports safety checks and accuracy validation
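
A minimal sketch of such a chained-prompt reflection loop, reusing the client and deployment placeholders from the parameter example above; the critique prompt wording is illustrative:

```python
def reflect_and_answer(client, deployment: str, question: str) -> str:
    """Two-pass reflection: draft an answer, then ask the model to critique and revise it."""
    draft = client.chat.completions.create(
        model=deployment,
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    critique_prompt = (
        "Review the draft answer below for factual errors, unsupported claims, "
        "and missing details, then return an improved answer only.\n\n"
        f"Question: {question}\n\nDraft answer: {draft}"
    )
    revised = client.chat.completions.create(
        model=deployment,
        messages=[{"role": "user", "content": critique_prompt}],
    ).choices[0].message.content
    return revised
```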

Key Points

  • Improves groundedness and reduces hallucinations
  • Works well with RAG pipelines
  • May increase latency and cost

Exam Tip

If asked how to make a model critique and refine its answers, the answer is model reflection


Deploy Containers for Use on Local and Edge Devices

📖 Docs: Deploy AI services in containers

Overview

  • Many Azure AI services support Docker containers
  • Enables offline, hybrid, and edge deployment scenarios

Key Points

  • Most containers require periodic connectivity to Azure for billing (disconnected containers are available for some services)
  • Useful for data sovereignty and low-latency requirements
  • Can run on local Docker hosts, in AKS, on Azure IoT Edge, or in other Kubernetes clusters
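
A minimal sketch of checking a locally running container from Python, assuming it was started with valid Eula, Billing, and ApiKey settings and exposes the default port 5000; the standard Azure AI containers provide /ready and /status probe endpoints:

```python
import requests

BASE = "http://localhost:5000"  # assumed local container address

def container_is_ready() -> bool:
    """Return True when the local container is up and able to accept queries."""
    return requests.get(f"{BASE}/ready", timeout=5).status_code == 200

def container_status() -> str:
    """Check whether the container can validate its key against the billing endpoint."""
    return requests.get(f"{BASE}/status", timeout=5).text
```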

Limits

Not every Azure AI service or model can run in a container; check the list of supported containers


Implement Orchestration of Multiple Generative AI Models

📖 Docs: Orchestrate agent behavior with generative AI

Overview

  • Orchestration combines multiple models or services into workflows
  • Examples:
    • GPT + Embeddings for RAG
    • Vision model + GPT for multimodal tasks
    • Multiple LLMs for specialization

Key Points

  • Tools: Prompt flow, Semantic Kernel, AutoGen
  • Helps distribute tasks across specialized models
  • Supports failover and redundancy

Use Case

Workflow that uses GPT for text, DALL·E for images, and embeddings for retrieval
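
A minimal sketch of the retrieval-plus-generation part of such a workflow, assuming an embeddings deployment and a chat deployment on the same Azure OpenAI client; deployment names are placeholders and the in-memory document list stands in for a real vector index:

```python
import numpy as np

def embed(client, text: str) -> np.ndarray:
    """Get an embedding vector from the embeddings deployment (placeholder name)."""
    emb = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(emb.data[0].embedding)

def answer_with_rag(client, question: str, documents: list[str]) -> str:
    # 1. Retrieve: rank documents by cosine similarity to the question.
    q = embed(client, question)
    def score(doc: str) -> float:
        d = embed(client, doc)
        return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
    context = max(documents, key=score)

    # 2. Generate: ground the chat model's answer in the retrieved context.
    response = client.chat.completions.create(
        model="gpt-4o",  # chat deployment name (placeholder)
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```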


Apply Prompt Engineering Techniques to Improve Responses

📖 Docs: Prompt engineering techniques

Overview

  • Prompt engineering refines queries to maximize model performance
  • Techniques (combined in the sketch after this list):
    • Role assignment ("You are a helpful assistant")
    • Few-shot learning (examples in prompt)
    • Chain-of-thought prompting
    • Output formatting instructions
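
A minimal sketch combining several of these techniques in one request, reusing the client and deployment placeholders from the earlier examples; the sentiment-classification task is illustrative:

```python
messages = [
    # Role assignment
    {"role": "system", "content": "You are a helpful assistant that answers in valid JSON."},
    # Few-shot example: show the model the expected input/output shape
    {"role": "user", "content": "Classify the sentiment: 'The service was excellent.'"},
    {"role": "assistant", "content": '{"sentiment": "positive", "confidence": "high"}'},
    # Actual request, with explicit output-formatting instructions
    {"role": "user", "content": "Classify the sentiment: 'The response time was terrible.' "
                                "Reply with JSON containing 'sentiment' and 'confidence'."},
]

response = client.chat.completions.create(model="gpt-4o", messages=messages, temperature=0)
print(response.choices[0].message.content)
```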

Key Points

  • Use templates for consistency
  • Prevent prompt injection by sanitizing inputs
  • Test prompts iteratively

Exam Tip

Know prompt engineering techniques and their use cases


Fine-Tune a Generative Model

📖 Docs: Customize a model with fine-tuning

Overview

  • Fine-tuning customizes base models for specific domains
  • Requires chat-formatted training data in JSONL format (see the sketch after this list)
  • Used when:
    • RAG is not sufficient
    • Domain-specific vocabulary or style is required
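
A minimal sketch of preparing chat-formatted JSONL and submitting a fine-tuning job, assuming the same Python openai SDK client as earlier; the base model name, example data, and regional availability are placeholders to verify against current documentation:

```python
import json

# Each JSONL line is one training example: a {"messages": [...]} conversation.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a contract-law assistant."},
        {"role": "user", "content": "What is force majeure?"},
        {"role": "assistant", "content": "A clause excusing performance after extraordinary events..."},
    ]},
]
with open("training.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the file, then start the fine-tuning job against a fine-tunable base model.
training_file = client.files.create(file=open("training.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-35-turbo-0613",  # placeholder: use a model/version enabled for fine-tuning in your region
)
print(job.id)
```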

Key Points

  • Training requires large, clean datasets
  • Fine-tuned models incur additional cost
  • Fine-tuning is supported only for specific models and versions (GPT-3.5 Turbo is the most common target), and availability varies by region

Limits

GPT-4 fine-tuning may have limited availability


Quick-fire revision sheet

  • 📌 Parameters: temperature, top_p, max_tokens, penalties control output
  • 📌 Monitoring = requests, latency, errors, tokens, safety events
  • 📌 Optimize deployments via scaling, batching, model updates
  • 📌 Tracing + feedback collection ensure quality
  • 📌 Model reflection = self-critique for improved groundedness
  • 📌 Containers = edge, hybrid, offline scenarios
  • 📌 Orchestration = multiple models combined in workflows
  • 📌 Prompt engineering = role, examples, structure, safety
  • 📌 Fine-tuning = domain-specific customization, requires clean data