Amazon Practice Questions, Discussions & Exam Topics by our Authors
A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data ...
Let's analyze each option carefully based on the key requirements:
Key Requirements:
Data contains meaningful ordered features that should not be discarded.
Sensitive data must be masked (not just removed or randomized in a way that destroys order or meaning).
Masking should be done before another team starts model building.
The solution should be practical and manageable within AWS services.
---
Option A: Use Amazon Macie to categorize the sensitive data.
What it does: Amazon Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS.
Pros: It helps identify and categorize sensitive data.
Cons: Macie does not mask or transform data itself. It only discovers and classifies sensitive data.
Use case: Good for identifying sensitive data but not for masking or modifying it.
Verdict: This option does not meet the requirement to mask the data.
---
Option B: Prepare the data by using AWS Glue DataBrew.
What it does: AWS Glue DataBrew is a visual data preparation tool that enables cleaning and normalizing data without writing code.
Pros: DataBrew supports masking and obfuscation transformations on tabular data while preserving data order and types.
Can handle sensitive data with custom transformations and masking recipes.
Easy to use for non-coders or ETL pipelines.
Use case: Best for data cleaning, masking, and transforming tabular data interactively and efficiently.
Verdict: This option meets the requirements to mask sensitive data while preserving meaningful features.
---
Option C: Run an AWS Batch j...
Author: Amira · Last updated May 7, 2026
An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must ...
Let's analyze each option carefully based on the key requirements:
Requirements Recap:
Asynchronous inference on large datasets.
Scheduled monitoring of data quality.
Alerts when data quality changes occur.
---
Option A:
Deploy models using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor data quality and send alerts.
AWS Glue is primarily an ETL (Extract, Transform, Load) service for data preparation and integration.
While Glue can run scheduled jobs, it’s not primarily designed for deploying ML models or running inference.
CloudWatch alarms can monitor metrics, but they are not specialized for detailed data quality monitoring.
This approach may require heavy custom coding to perform inference and data quality checks.
Best for: ETL pipelines, data integration, but not ideal for model deployment & built-in data quality monitoring.
---
Option B:
Deploy models using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor data quality and send alerts.
AWS Batch can run batch jobs asynchronously on large datasets, which fits the inference requirement.
However, AWS CloudTrail is for logging API calls and user activity, not data quality monitoring.
CloudTrail cannot send alerts on data quality issues; it’s not a monitoring or alerting tool for dataset changes.
This option would require custom data quality monitoring and alerting mechanisms outside of CloudTrail.
Best for: Batch processing workloads but not for automated data quality monitoring or alerting.
---
Option C:
Deploy models using Amazon ECS on AWS Fargate. Use Amazon EventBridge to monitor data quality and send alerts.
ECS Fargate is great for running containerized applications without managing servers.
It can handle asynchronous inference if properly architected.
EventBridge is an event bus service that can trigger actions based on events.
However, EventBridge itself doesn’t monitor data quality directly; it reacts to events.
Data quality monitoring would require custom events generated from monitoring jobs.
So ...
Author: Zara1234 · Last updated May 7, 2026
An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize the production inference data in the same way as the training data before passing the produc...
Let's analyze each option carefully with respect to the requirement: The production inference data must be normalized in the same way as the training data, specifically using min-max normalization.
---
Key Factors to Consider:
Consistency: The model expects input normalized the same way as during training.
Reproducibility: The normalization statistics (min and max values) used during training must be applied to production.
Data distribution: Production data distribution might differ, but changing normalization stats on production data alone breaks consistency.
Real-time or batch inference: Whether inference is real-time or batch can affect whether per-sample or batch normalization is feasible.
---
Option A:
"Apply statistics from a well-known dataset to normalize the production samples."
Problem: Using a completely different dataset's statistics will very likely result in mismatched scaling, leading to poor or invalid model predictions.
Scenario where it might be used: Only if the production data is unavailable or if training data statistics are lost—but this is suboptimal and generally discouraged.
Reject because: It breaks consistency between training and inference data normalization.
---
Option B:
"Keep the min-max normalization statistics from the training set. Use these values to normalize the production samples."
This is the ideal approach.
The model was trained on data scaled with specific min and max values; to maintain input consistency, production data must be scaled with the same min and max.
This ensures that the model sees production data on the same scale as the training data.
Scenario: This approach is standard practice in ML pipelines for preprocessing.
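The idea in Option B can be sketched in a few lines of plain Python. This is a minimal illustration, not DataBrew's own implementation: the function names and the sample values are hypothetical, and the only point is that the minimum and maximum come from the training set and are reused unchanged at inference time.

```python
def fit_min_max(train_values):
    """Return the (min, max) statistics computed on the training set only."""
    return min(train_values), max(train_values)


def apply_min_max(value, train_min, train_max):
    """Scale one production value with the stored training statistics."""
    if train_max == train_min:
        return 0.0  # constant feature in training; avoid division by zero
    return (value - train_min) / (train_max - train_min)


# Hypothetical training column and production samples.
train_min, train_max = fit_min_max([10.0, 20.0, 30.0, 50.0])
scaled = [apply_min_max(v, train_min, train_max) for v in [10.0, 30.0, 60.0]]
print(scaled)
```

A production value outside the training range (60.0 here) maps outside [0, 1]. That is expected when the training statistics are reused, and it can be a useful signal that the production data has drifted.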
---
Option C:
"Calculate a new set of min-max normalization statistics ...
Author: Lucas · Last updated May 7, 2026
A company is planning to use Amazon SageMaker to make classification ratings that are based on images. The company has 6 TB of training data that is stored on an Amazon FSx for NetApp ONTAP system virtual machine (SVM). The SVM is in the same VPC as SageMaker.
An ML engineer must make...
Let's analyze each option carefully based on the scenario and requirements:
---
Scenario Summary:
Training data (6 TB) stored on Amazon FSx for NetApp ONTAP SVM.
FSx ONTAP SVM is in the same VPC as SageMaker.
Need to make the training data accessible for ML models running in SageMaker.
Data is image data used for classification.
Goal: Access the training data efficiently for SageMaker training jobs.
---
Option A:
Mount the FSx for ONTAP file system as a volume to the SageMaker Instance.
Feasibility: SageMaker training instances (or notebooks) can mount NFS shares or FSx volumes if network access allows.
Since the FSx ONTAP SVM is in the same VPC, network connectivity exists.
You can mount the file system via NFS or SMB directly on the SageMaker instance.
This approach allows SageMaker training jobs direct, high-throughput, low-latency access to the 6 TB of image data.
Key factor: FSx ONTAP supports NFS/SMB protocols that can be mounted as file system storage, enabling SageMaker jobs to access data as if local files.
This avoids data copying, reduces delays, and keeps data consistent.
Verdict: This is a straightforward, performant, and recommended approach when FSx is in the same VPC.
---
Option B:
Create an Amazon S3 bucket. Use Mountpoint for Amazon S3 to link the S3 bucket to the FSx for ONTAP file system.
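Mountpoint for Amazon S3 is an open-source file client that mounts an S3 bucket as a local file system on a compute instance; it does not link a bucket to an FSx for ONTAP file system.
Here, the option works in the wrong direction: it exposes data that is in S3, not data that is on FSx.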
The requirement is to make FSx data accessible to SageMaker.
This option suggests linking S3 bucket to FSx rather than the other way around.
SageMaker natively works very well with S3 for data access.
However, data is currently on FSx, and using Mountpoint does not directly expose FSx data to SageMaker via S3.
It would require migrating or syncing data from FSx to S3, which can be time-consuming or inefficient for large data sets.
This option adds complexity and latency.
Verdict: Not ideal here because it complicates access and does not directly solve the access problem from SageMaker to FSx.
---
Option C:
Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.
SageMaker Data Wrangler provides GUI and tools for data preparation and catalog management.
A catalog connection typically integrates with databases, data lakes, or catalogs like AWS Glue Data Catalog.
FSx ONTAP is a file system, not a data catalog or database.
Data Wrangler can connect...
Author: Stella · Last updated May 7, 2026
A company regularly receives new training data from the vendor of an ML model. The vendor delivers cleaned and prepared data to the company's Amazon S3 bucket every 3-4 days.
The company has an Amazon SageMaker pipeline to retrain the model. An ML engineer needs to implement a solution to run the pi...
Let's analyze each option carefully, focusing on operational effort, scalability, simplicity, and best practice for triggering workflows on new data uploads in S3.
---
Option A:
Create an S3 Lifecycle rule to transfer the data to the SageMaker training instance and to initiate training.
Lifecycle rules in S3 are primarily designed for data management tasks like transitioning storage classes or deleting old data.
Lifecycle rules cannot trigger training jobs or workflows directly. They manage data retention and movement but are not designed to initiate processing tasks.
Using lifecycle rules to "initiate training" is not supported.
Rejected because lifecycle rules don’t support triggering compute workflows or pipelines.
---
Option B:
Create an AWS Lambda function that scans the S3 bucket. Program the Lambda function to initiate the pipeline when new data is uploaded.
Lambda can be triggered directly by S3 events on object creation (upload). So Lambda can be invoked immediately when new data arrives.
The Lambda function can initiate the SageMaker pipeline programmatically via the SDK.
Requires writing and maintaining code to scan/check for new files and trigger the pipeline.
May involve some operational overhead maintaining the Lambda, handling errors, retries, and scaling.
However, Lambda is a serverless, low-maintenance compute service, so operational effort is low but not zero.
This is a common and flexible approach for event-driven workflows triggered by S3.
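A rough sketch of the Lambda approach in Option B, assuming the function is invoked by an S3 object-created notification. The pipeline name is hypothetical, and the optional `sagemaker_client` argument is an addition made here only so the handler can be exercised without AWS credentials.

```python
PIPELINE_NAME = "example-retraining-pipeline"  # hypothetical pipeline name


def lambda_handler(event, context, sagemaker_client=None):
    """Start one pipeline execution per S3 object-created record."""
    if sagemaker_client is None:
        import boto3  # imported lazily; available in the Lambda Python runtime

        sagemaker_client = boto3.client("sagemaker")

    started = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        response = sagemaker_client.start_pipeline_execution(
            PipelineName=PIPELINE_NAME,
            PipelineExecutionDescription=f"Triggered by s3://{bucket}/{key}",
        )
        started.append(response["PipelineExecutionArn"])
    return {"started": started}
```

Even this small handler is code that has to be deployed, given IAM permissions, and monitored, which is the operational overhead the analysis refers to.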
---
Option C:
Create an Amazon EventBridge rule that has an event pattern that matches the S3 upload. Configure the pipeline as the target of the rule.
EventBridge can receive S3 events (such as `Object Created`) directly, without needing Lambda as a bridge, once EventBridge notifications are enabled on the bucket.
EventBridge supports SageMaker pipeline as a direct target, so it can trigger the pipeline automatically and immediately on new uploads.
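For illustration, an event pattern for such a rule could look like the structure built below. The bucket name and key prefix are hypothetical, and the sketch only constructs the pattern as JSON; creating the rule and attaching the pipeline as its target are not shown.

```python
import json


def s3_upload_pattern(bucket_name, key_prefix=""):
    """Build an EventBridge event pattern that matches S3 'Object Created' events."""
    detail = {"bucket": {"name": [bucket_name]}}
    if key_prefix:
        # Optional: only match objects whose key starts with the given prefix.
        detail["object"] = {"key": [{"prefix": key_prefix}]}
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": detail,
    }


# Hypothetical bucket and prefix for the vendor's deliveries.
pattern = s3_upload_pattern("example-vendor-data", "prepared/")
print(json.dumps(pattern, indent=2))
```

Because the rule and its target are configuration rather than code, there is no function to maintain, which is why this option tends to carry less operational effort than a custom Lambda function.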
This ...
Author: Ahmed · Last updated May 7, 2026
An ML engineer is developing a fraud detection model by using the Amazon SageMaker XGBoost algorithm. The model classifies transactions as either fraudulent or legitimate.
During testing, the model excels at identifying fraud in the training dataset. However, the model is inefficient at identif...
Let's analyze the problem and each option carefully:
---
Problem Summary:
The model performs very well on training data (detecting fraud in training transactions).
The model performs poorly on new/unseen data (fails to detect fraud on test/real-world transactions).
This suggests the model is overfitting — it learned training data patterns too well, including noise or specifics that don't generalize.
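The symptom described above can be expressed as a simple check on the gap between training and validation scores. The threshold and the example scores below are hypothetical and only illustrate the reasoning; a real project would pick a metric and tolerance that suit the data.

```python
def generalization_gap(train_score, validation_score):
    """Difference between the score on training data and on held-out data."""
    return train_score - validation_score


def looks_overfit(train_score, validation_score, max_gap=0.10):
    """Flag a model whose training score exceeds its validation score by more than max_gap."""
    return generalization_gap(train_score, validation_score) > max_gap


# Hypothetical scores: strong on training data, weak on unseen data.
print(looks_overfit(0.99, 0.71))
# Hypothetical scores: similar performance on both sets.
print(looks_overfit(0.90, 0.88))
```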
---
Option A: Increase the learning rate
In XGBoost, the learning rate (eta) scales how much each new tree contributes to the model during training.
Increasing learning rate can speed up convergence but might cause instability or worse performance if too high.
It won't help with overfitting; a higher learning rate makes each boosting step more aggressive, which tends to make overfitting worse.
Typically, when overfitting, you'd consider lowering the learning rate or regularizing more, rather than increasing it.
Use case: Increasing learning rate is used to speed training or escape shallow local minima, not to address overfitting.
Conclusion: Not a good option here.
---
Option B: Remove some irrelevant features from the training dataset
Removing irrelevant or noisy features can reduce overfitting because the model has fewer chances to fit noise.
Feature selection improves generalization by keeping only meaningful features.
However, this requires knowing which features are irrelevant or doing feature importance analysis.
This option can be good but may not be the most direct or simplest first step.
Use case:...
Author: Charlotte · Last updated May 7, 2026
A company has a binary classification model in production. An ML engineer needs to develop a new version of the model.
The new model version must maximize correct predictions of positive labels and negative labels. The ML engineer must use a metric to recalib...
Let's analyze the problem carefully:
Problem Statement Recap:
Binary classification model.
Need to maximize correct predictions of both positive and negative labels.
Metric must be used to recalibrate the model to meet these requirements.
---
Understanding the options:
A) Accuracy
Definition: (TP + TN) / (Total samples) — the proportion of all correct predictions (both positive and negative) out of total samples.
Use case: Accuracy measures overall correctness. It balances true positives and true negatives, reflecting the model’s ability to classify both classes correctly.
Limitations: Can be misleading in imbalanced datasets because a model predicting mostly the majority class can still have high accuracy.
Relevance here: Since the goal is to maximize correct predictions for both positive and negative classes, accuracy directly measures that combined correctness. It is a straightforward metric to recalibrate the threshold or model to optimize overall correctness.
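A short sketch of the accuracy definition above, using hypothetical confusion-matrix counts. The second example shows the limitation already noted: on an imbalanced dataset, a model that never predicts the positive class can still score highly.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / total samples."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical balanced case: 50 positives and 50 negatives.
balanced = accuracy(tp=45, tn=45, fp=5, fn=5)

# Hypothetical imbalanced case: the model predicts "negative" for everything.
majority_only = accuracy(tp=0, tn=95, fp=0, fn=5)

print(balanced, majority_only)
```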
---
B) Precision
Definition: TP / (TP + FP) — among predicted positives, how many are truly positive.
Use case: Precision is important when the cost of false positives is high (e.g., spam detection, fraud alerts).
Limitations: Does not consider false negatives or true negatives. Maximizing precision alone can reduce the number of positive predictions, hurting recall.
Relevance here: Focuses only on the positive predictions, ignoring negative class performance, so it doesn’t align with maximizing correct predictions on both classes.
---
C) Recall (Sensitivity, True Positive Ra...
Author: Ishaan · Last updated May 7, 2026
A company is using Amazon SageMaker to create ML models. The company's data scientists need fine-grained control of the ML workflows that they orchestrate. The data scientists also need the ability to visualize SageMaker jobs and workflows as a directed acyclic graph (DAG). The data scientists must keep a running history of model di...
Let's analyze the requirements and options carefully.
---
Requirements:
1. Fine-grained control of ML workflows — data scientists want to orchestrate workflows with detailed control.
2. Ability to visualize SageMaker jobs and workflows as a DAG — clear visualization of dependencies and steps.
3. Running history of model discovery experiments — track experiments over time.
4. Model governance for auditing and compliance — need lineage tracking and auditability.
---
Understanding the options:
AWS CodePipeline:
A general CI/CD service designed mainly for software deployment workflows. While it can integrate with SageMaker, it’s not purpose-built for ML workflows.
It does not inherently support DAG visualization specific to ML workflows or detailed experiment tracking.
SageMaker Pipelines:
A purpose-built ML workflow orchestration service.
Provides native support for defining workflows as directed acyclic graphs (DAGs).
Has tight integration with SageMaker Studio and experiment tracking.
Designed to give data scientists fine-grained control over each step in the ML lifecycle.
Supports ML lineage tracking and experiment management.
SageMaker Experiments:
Helps organize, track, compare, and evaluate ML experiments and model versions.
Mainly focused on experiment tracking and history, but not on orchestrating or visualizing workflows as DAGs.
SageMaker ML Lineage Tracking:
Tracks data and artifacts flow for audit and governance purposes, to see the full lineage of models, datasets, and training jobs.
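To make the DAG idea concrete, the sketch below models a workflow as steps with dependencies and derives a valid execution order using Python's standard library. It does not use the SageMaker SDK, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key is a step; each value is the set of steps it depends on.
steps = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "explain": {"train"},
    "register": {"evaluate", "explain"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(steps).static_order())
print(order)
```

A cycle in the dependencies would raise `graphlib.CycleError`, which is the property that makes the graph acyclic and the execution order well defined.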
---
Option-by-option analysis:
Option A: Use AWS CodePipeline + SageMaker Studio + SageMaker ML Lineage Tracking
AWS CodePipeline is not optimized for ML workflows or DAG visualization.
SageMaker ML Lineage Tracking is great for governance and audit but CodePipeline limits fine-grained workflow control and visualization.
Rejected due to poor fit for ML workflow orchestration and DAG visualization.
Option B: Use AWS CodePipeline + SageMaker Experiments
CodePipeline still lacks ML workflow orchestration and DAG visualization.
SageMaker Experiments tracks experiments well but does not orchestrate or visualize workflows as DAGs.
Rejected for the same reasons as A: lack of DAG visualization and fine control.
Option C: Use SageMaker Pipelines + SageMaker Studio + SageMaker ML Lineage Tracking
SageMaker Pipelines supports DAG visualization and fine-grained workflow control.
SageMaker...
Author: Ava · Last updated May 7, 2026
A company wants to reduce the cost of its containerized ML applications. The applications use ML models that run on Amazon EC2 instances, AWS Lambda functions, and an Amazon Elastic Container Service (Amazon ECS) cluster. The EC2 workloads and ECS workloads use Amazon Elastic Block Store (Amazon EBS) volumes to save predictions and artifacts.
An ML engineer must identify resources that are being used ineffic...
Let's analyze each option carefully in the context of the requirement:
---
Problem Summary:
Reduce cost of containerized ML applications.
Applications run on EC2, AWS Lambda, and ECS.
EC2 and ECS use EBS volumes.
Need to identify inefficient resource usage.
Generate recommendations to reduce cost.
Minimize development effort.
---
Option A: Create code to evaluate each instance's memory and compute usage
Pros: Custom, can be tailored to exact needs.
Cons: High development effort required to build monitoring, analyze metrics, and generate recommendations.
Also, EC2, ECS, and Lambda require different monitoring approaches.
Would require setting up metric collection, thresholds, and logic.
Not an out-of-the-box solution, not minimal effort.
Use case: When very customized or domain-specific analysis is required and automation isn't available.
Rejected because: It requires significant development effort, against the requirement of "LEAST development effort."
---
Option B: Add cost allocation tags to the resources. Activate the tags in AWS Billing and Cost Management
Pros: Tags help allocate cost per team, application, or environment.
Helps identify which resources cost more.
Easy to implement, relatively low effort.
Cons:
Tags alone don’t identify inefficiency or recommend cost savings.
They help with cost visibility and allocation, but not with usage or right-sizing recommendations.
Tagging requires some governance but no automation for recommendations.
Use case: When you want to allocate and track costs by business unit or project but not for efficiency analysis or optimization.
Rejected because: Does not meet the requirement to identify inefficient usage or generate cost-saving recommendations.
---
Option C: Check AWS CloudTrail event history for the creation of the resources
Pros: Provides audit trail of resource creation and changes.
Helps understand who or what created resources.
Useful for governance, security, or troubleshooting.
Cons:
Does not provide usage metrics or cost insights.
No information about resource efficienc...
Author: FlamePhoenix2025 · Last updated May 7, 2026
A company needs to create a central catalog for all the company's ML models. The models are in AWS accounts where the company developed the models initially. The models are hosted in Amazon Elastic Cont...
Let's analyze each option carefully against the key requirements:
Requirements Recap:
Central catalog for all company ML models
Models reside in multiple AWS accounts where they were developed
Models are stored in Amazon ECR repositories
Central catalog must allow discovery and visibility across accounts
---
Option A:
Configure ECR cross-account replication for each existing ECR repository. Ensure that each model is visible in each AWS account.
Pros:
Cross-account replication can automatically sync images across accounts.
Each account can see the models if replication is set up correctly.
Cons:
Does not create a central catalog; it creates multiple copies of the same images across all accounts.
Managing replication rules for many repos and accounts can be complex and error-prone.
Not scalable for many models/accounts.
Replication means duplication of storage and management overhead.
Best scenario for:
When a few accounts and repos are involved, and you want to ensure image availability in all accounts without a single catalog.
---
Option B:
Create a new AWS account with a new ECR repository as the central catalog. Configure ECR cross-account replication between the initial ECR repositories and the central catalog.
Pros:
Centralizes model images into one ECR repo in a dedicated account, fulfilling the “central catalog” need.
Replication keeps the central catalog updated with models from multiple accounts.
Central account can provide consolidated access management and auditing.
Cons:
Replication involves duplication of images, storage costs increase.
Sync delays possible, meaning central catalog might not be real-time.
Best scenario for:
Organizations that want a physical centralized ECR repository as a source of truth and can accept duplication.
When a “single pane of glass” in a dedicated account is a hard requirement.
---
Option C:
Use the Amazon SageMaker Model Registry to create a model group for models hosted in Amazon ECR. Create a new AWS account. In the new account, use the SageMaker Model Registry as the central ca...
Author: MoonlitPantherX · Last updated May 7, 2026
A company has developed a new ML model. The company requires online model validation on 10% of the traffic before the company fully releases the model in production. The company uses an Amazon SageMaker endpoint behind an Application Load Balancer (ALB) to s...
Let's analyze each option carefully based on the requirements and key factors:
Requirements:
Online validation of the new ML model on 10% of the traffic before full production release.
The company uses Amazon SageMaker endpoint behind an Application Load Balancer (ALB).
Least operational overhead is desired.
Need to monitor model invocations.
---
Option A:
Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 0.1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.
How it works:
SageMaker endpoints can host multiple production variants (models). You can assign traffic weights to each variant, and SageMaker routes inference requests in proportion to each variant's weight relative to the sum of all weights. Setting the variant weight to 0.1 routes 10% of traffic to the new model when the weights sum to 1 (for example, 0.9 for the existing variant).
Operational overhead:
Low, because you are using a single SageMaker endpoint. No need to manage multiple endpoints or ALB routing rules.
Monitoring:
CloudWatch provides built-in metrics for each production variant's invocations and latency.
Best fit:
Ideal for online validation with traffic splitting and minimal operational complexity.
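The traffic split can be checked with a little arithmetic. The sketch below assumes variant definitions shaped like the `ProductionVariants` entries of an endpoint configuration; the variant names are hypothetical, and the function only computes shares rather than calling any AWS API.

```python
def traffic_shares(variants):
    """Return each variant's share of traffic: its weight divided by the sum of weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Weights that sum to 1: the candidate receives 10% of requests.
shares = traffic_shares([
    {"VariantName": "current-model", "InitialVariantWeight": 0.9},
    {"VariantName": "candidate-model", "InitialVariantWeight": 0.1},
])

# If the existing variant keeps a weight of 1, the candidate gets about 9.1% instead.
uneven = traffic_shares([
    {"VariantName": "current-model", "InitialVariantWeight": 1.0},
    {"VariantName": "candidate-model", "InitialVariantWeight": 0.1},
])

print(shares, uneven)
```

The second case is worth keeping in mind: the weight is relative, so the existing variant's weight has to be set alongside the new one to get an exact 10% split.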
---
Option B:
Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.
How it works:
Under the same convention (weights that sum to 1), a weight of 1 routes 100% of the traffic to the new model immediately.
Issue:
Does not meet the requirement of validating on only 10% of the traffic; it's a full switch.
When to use:
When fully deploying a new model, not for gradual validation.
Rejection reason:
Does not meet the 10% traffic validation requirement.
---
Option C:
Create a new SageMaker endpoint. Use production variants to add the new model to the new endpoint. Monitor the number of invocations by using Amazon CloudWatch.
How it works:
A separate SageMaker endpoint is created to host the new ...
Author: ThunderBear · Last updated May 7, 2026
A company needs to develop an ML model. The model must identify an item in an image and must provide the location of the item.
Whic...
Let's analyze the requirements and the options one by one:
Requirements:
Identify an item in an image
Provide the location of the item in the image
---
Option A: Image Classification
What it does: Assigns a label or category to the whole image.
Key factor: It tells what is in the image but does not provide the location of the item.
When to use: When you only need to identify the class or category of the image as a whole (e.g., "dog" or "cat").
Rejected because: It cannot localize or output the bounding box coordinates of the item in the image.
---
Option B: XGBoost
What it does: Gradient-boosted decision trees for tabular data, classification, or regression tasks.
Key factor: Works with structured, tabular data, not images directly.
When to use: Predicting outcomes from tabular data, not images.
Rejected because: It is not designed for image processing or localization tasks.
---
Option C: Object Detection
What it does: Identifies and classifies objects in an image and also provides their locations as bounding boxes.
Key factor: This algorithm outputs both the class and location (bounding box coordinates) of each detected item.
When to use: When you need to detect multiple objects and their positions in...
Author: Ishaan · Last updated May 7, 2026
A company has an Amazon S3 bucket that contains 1 TB of files from different sources. The S3 bucket contains the following file types in the same S3 folder: CSV, JSON, XLSX, and Apache Parquet.
An ML engineer must implement a solution that uses AWS Glue DataBrew to process the data. The ML engineer also must...
Let's analyze the problem carefully and evaluate each option based on the key requirements and constraints.
---
Problem Summary:
S3 bucket has 1 TB of files from different sources.
Files of different types (CSV, JSON, XLSX, Apache Parquet) are all in the same S3 folder.
Must use AWS Glue DataBrew to process the data.
Final output must be stored back in S3.
Output must be consumable by AWS Glue in the future.
---
Key Factors to Consider:
1. Data Format Handling by DataBrew:
DataBrew can read and process multiple file formats (CSV, JSON, XLSX, Parquet).
However, DataBrew does not support mixing file types in the same dataset/job. It expects uniform input format or separated datasets per format.
2. Folder Structure & File Separation:
Having multiple file types in the same folder complicates ingestion.
Best practice: separate files by format to create reliable DataBrew datasets.
3. Output Format Compatibility:
AWS Glue supports Apache Parquet format natively.
There is no such thing as “AWS Glue Parquet format” — Apache Parquet is a standard format, used by Glue.
Output should be stored in Apache Parquet for efficient querying and compatibility.
4. Processing scale & maintainability:
1 TB of data is large; processing needs to be efficient.
Processing data by file type individually improves reliability and scalability.
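The "separate files by format" step can be sketched as a simple grouping of object keys by extension. The keys and the format labels are hypothetical, and the sketch only plans the grouping; listing the bucket and moving objects are not shown.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Map file extensions to a format label (one DataBrew dataset per label).
SUPPORTED = {".csv": "csv", ".json": "json", ".xlsx": "xlsx", ".parquet": "parquet"}


def group_keys_by_format(keys):
    """Group S3 object keys by file format, based on the key's extension."""
    groups = defaultdict(list)
    for key in keys:
        suffix = PurePosixPath(key).suffix.lower()
        groups[SUPPORTED.get(suffix, "other")].append(key)
    return dict(groups)


# Hypothetical keys from the mixed folder.
groups = group_keys_by_format([
    "incoming/orders.csv",
    "incoming/events.json",
    "incoming/budget.XLSX",
    "incoming/metrics.parquet",
    "incoming/readme.txt",
])
print(groups)
```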
---
Option-by-Option Analysis:
---
A) Use DataBrew to process the existing S3 folder. Store the output in Apache Parquet format.
Problem: All file types are mixed in the same S3 folder.
DataBrew does not support multiple formats simultaneously in one dataset.
This option would cause ingestion failure or incorrect processing.
Output in Apache Parquet is correct, but input handli...
Author: Isabella · Last updated May 7, 2026
A manufacturing company uses an ML model to determine whether products meet a standard for quality. The model produces an output of "Passed" or "Failed." Robots separate the products into the two categories by using the model to analyze photos on th...
Let's analyze the options for evaluating the ML model that classifies products as "Passed" or "Failed" for quality control:
---
Context:
Problem type: Binary classification (Passed vs. Failed).
Goal: Assess model performance in separating products into two categories.
Key factors:
False positives (products incorrectly marked as Passed)
False negatives (products incorrectly marked as Failed)
It's critical to understand how well the model detects defective products without letting poor-quality products pass or rejecting good products mistakenly.
---
Option A: Precision and Recall
Precision measures: Of all products predicted as Passed, how many actually Passed?
Recall measures: Of all products that actually Passed, how many were correctly predicted?
Use case: Excellent for classification tasks, especially where the cost of false positives and false negatives matters.
Relevance here: Important to balance because:
High precision means fewer bad products slipping through (false positives).
High recall means catching most good products without rejecting them (false negatives).
Verdict: Good choice.
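As a small worked example of the two definitions above, the sketch below treats "Passed" as the positive class and computes both metrics from hypothetical labels.

```python
def precision_recall(actual, predicted, positive="Passed"):
    """Compute precision and recall for the given positive label."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical inspection results for six products.
actual = ["Passed", "Passed", "Passed", "Failed", "Failed", "Passed"]
predicted = ["Passed", "Passed", "Failed", "Passed", "Failed", "Passed"]

precision, recall = precision_recall(actual, predicted)
print(precision, recall)
```

Here one defective product was marked "Passed" (a false positive, lowering precision) and one good product was marked "Failed" (a false negative, lowering recall).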
---
Option B: Root mean square error (RMSE) and mean absolute percentage error (MAPE)
These are regression metrics measuring the error between predicted and actual numeric values.
Not applicable here because the...
Author: MoonlitPantherX · Last updated May 7, 2026
An ML engineer needs to encrypt all data in transit when an ML training job runs. The ML engineer must ensure that encryption in transit is applied to processes that Amazon SageMaker...
Let’s break down each option carefully, focusing on the key requirement: encrypt all data in transit during an Amazon SageMaker training job.
---
Key factors to consider:
Data in transit encryption means protecting data as it moves between components or nodes, i.e., network-level encryption.
SageMaker training jobs involve distributed training (potentially multiple nodes), communication between SageMaker processes, and potentially data moving between S3 and the training instances.
AWS offers encryption in transit via TLS by default for communication between SageMaker and other AWS services.
When encryption of data at rest is required, KMS keys come into play.
Batch processing and training clusters differ in their architecture and communication patterns.
KMS keys are mainly used for encryption at rest or for signing, not directly for encrypting network traffic in transit.
---
Option A: Encrypt communication between nodes for batch processing
Batch processing typically refers to SageMaker Batch Transform jobs, not training jobs.
Encrypting communication between nodes for batch processing is not applicable here because batch transform jobs don’t involve multiple nodes communicating; they are more about processing input data in batches independently.
This option is irrelevant for a training job scenario, which often requires distributed communication.
Scenario: Useful if you are running batch inference jobs that require secure communication between distributed components (rare).
Reject — Not relevant for training jobs and in-transit encryption during training.
---
Option B: Encrypt communication between nodes in a training cluster
This option directly targets in-transit encryption between nodes during a distributed training job.
Distributed training jobs communicate over the network, exchanging gradients, model parameters, and control information.
SageMaker supports ...
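A minimal sketch of where this setting lives, assuming the low-level `CreateTrainingJob` request shape. `EnableInterContainerTrafficEncryption` is the request field that turns on encryption between training nodes; the helper name, job name, instance type, and volume size are placeholders invented for illustration, and the other required fields (algorithm, role, data channels) are left out.

```python
def training_job_encryption_settings(job_name: str, instance_count: int) -> dict:
    """Return the part of a CreateTrainingJob request that concerns node-to-node encryption."""
    return {
        "TrainingJobName": job_name,
        "ResourceConfig": {
            "InstanceType": "ml.p3.2xlarge",  # placeholder instance type
            "InstanceCount": instance_count,   # more than one node means inter-node traffic
            "VolumeSizeInGB": 50,              # placeholder volume size
        },
        # Encrypts traffic exchanged between the containers of a distributed job.
        "EnableInterContainerTrafficEncryption": True,
    }


request = training_job_encryption_settings("demo-training-job", 2)
```

The resulting dictionary would be merged into the full request that is passed to the SageMaker client. Turning this on can lengthen training time for communication-heavy distributed jobs.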
Author: Arjun · Last updated May 7, 2026
An ML engineer needs to use metrics to assess the quality of a time-series forecasting model.
Whic...
Let's analyze the options carefully to determine which metrics apply to assessing the quality of a time-series forecasting model.
---
A) Recall
What it measures: Recall is a classification metric that measures the proportion of actual positives correctly identified by the model.
Why reject: Time-series forecasting typically deals with continuous numeric predictions over time, not classification into categories. Recall does not apply to regression or forecasting tasks.
Scenario: Useful in classification tasks, such as detecting fraudulent transactions, where identifying all positive cases is critical.
---
B) LogLoss
What it measures: Logarithmic loss measures the performance of a classification model where the output is a probability between 0 and 1.
Why reject: Again, LogLoss is a classification metric, not applicable to continuous value forecasting in time series.
Scenario: Used in binary or multi-class classification problems where probabilistic outputs are evaluated.
---
C) Root Mean Square Error (RMSE)
What it measures: RMSE is a common regression metric that quantifies the average magnitude of errors between predicted and actual continuous values, emphasizing larger errors due to squaring.
Why select: Time-series forecasting models output numeric values over time, and RMSE is highly appropriate to measure the model’s prediction accuracy on continuous data.
Scenario: Used when large errors are particularly undesirable and the model predict...
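As a small illustration of the RMSE definition above, the sketch below computes it by hand for a made-up forecast. The numbers are invented for the example and do not come from the question.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(actual, predicted):
    """Mean absolute error, shown only for comparison with RMSE."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical weekly demand values and forecasts; the last forecast misses by 10.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 121.0, 140.0]

print(rmse(actual, predicted), mae(actual, predicted))
```

Because the errors are squared before averaging, the single large miss pulls RMSE above the mean absolute error for the same data.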
Author: CrimsonViperX · Last updated May 7, 2026
A company runs Amazon SageMaker ML models that use accelerated instances. The models require real-time responses. Each model has different scaling requirements. The company must not al...
Let's analyze each option carefully against the requirements:
Requirements Recap:
Use Amazon SageMaker ML models with accelerated instances (likely GPU-based).
Models require real-time responses (low latency).
Each model has different scaling requirements.
No cold start allowed (meaning the models must be always ready to serve without startup delay).
---
Option A:
Create a SageMaker Serverless Inference endpoint for each model. Use provisioned concurrency for the endpoints.
Serverless inference is designed for on-demand, event-driven workloads and auto-scales automatically.
While provisioned concurrency can reduce cold start latency, Serverless inference doesn't currently support accelerated instances like GPUs.
Also, serverless is not ideal for consistent real-time low-latency needs under heavy load, especially with GPU requirements.
Therefore, this option is rejected due to lack of support for accelerated instances and uncertain latency guarantees.
---
Option B:
Create a SageMaker Asynchronous Inference endpoint for each model. Create an auto scaling policy for each endpoint.
Asynchronous inference queues incoming requests and is intended for large payloads or long-running inference tasks that don’t require immediate responses.
It is not designed for real-time, low-latency inference.
The question specifically requires real-time responses, so asynchronous inference is not suitable.
Therefore, rejected because it does not support real-time inference requirements.
---
Option C:
Create a SageMaker endpoint. Create an inference component for each model. In the inference component settings, specify the newly created endpoint. Create an auto scaling policy for each inference component. Set the parameter for the minimum number of copies to at least 1.
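This describes a single SageMaker endpoint that hosts several models as inference components (one component per model). This is a different feature from multi-model endpoints.
You can set the minimum number of copies of each inference component to at least 1, so a copy of every model stays loaded and cold starts are avoided.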
You can enable auto-scaling individually per inference component to meet different scaling needs.
SageMaker multi-model endpoints support accelerated instances and r...
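A rough sketch of the per-component scaling setup in option C, assuming Application Auto Scaling is used and that its `register_scalable_target` call takes the parameters below. The component names and maximum copy counts are hypothetical.

```python
def scalable_target_for_component(component_name: str, min_copies: int, max_copies: int) -> dict:
    """Build register_scalable_target parameters for one inference component."""
    if min_copies < 1:
        # Keeping at least one copy loaded is what avoids cold starts.
        raise ValueError("min_copies must be at least 1 to avoid cold starts")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"inference-component/{component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "MinCapacity": min_copies,
        "MaxCapacity": max_copies,
    }


# Hypothetical models with different scaling ceilings on the same endpoint.
targets = [
    scalable_target_for_component("model-a", 1, 2),
    scalable_target_for_component("model-b", 1, 8),
]
```

Each dictionary would be passed to the Application Auto Scaling client, followed by a scaling policy per component, which is how each model gets its own scaling behavior.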
Author: Rahul · Last updated May 7, 2026
A company uses Amazon SageMaker for its ML process. A compliance audit discovers that an Amazon S3 bucket for training data uses server-side encryption with S3 managed keys (SSE-S3).
The company requires customer managed keys. An ML engineer changes the S3 bucket to use server-side encryption with AWS KMS keys (SSE-KMS). The ML engineer makes no other configuration changes.
Af...
Let's analyze the problem and each option carefully:
---
Problem Summary:
The training data S3 bucket was initially encrypted using SSE-S3 (S3 managed keys).
The company requires customer managed keys (CMKs), so the bucket is changed to use SSE-KMS (server-side encryption with AWS KMS keys).
After this change, SageMaker training jobs fail with AccessDenied errors.
The ML engineer did not make any other configuration changes.
---
Understanding the problem:
When using SSE-KMS, access to the data requires permissions not only on the S3 bucket/object but also on the KMS key used for encryption/decryption.
The SageMaker training job runs with an execution role that needs:
S3 permissions to read the training data.
KMS permissions to decrypt (and sometimes encrypt) the data using the customer managed key.
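The KMS part of those permissions can be sketched as an IAM policy statement for the execution role. This is an illustration under assumptions: the key ARN is a placeholder, and the exact set of KMS actions a given job needs may differ. The key policy on the KMS key must also allow the role to use it.

```python
import json


def kms_access_policy(key_arn: str) -> dict:
    """Build an IAM policy document that lets a role use one KMS key with S3 data."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "UseTrainingDataKey",
                "Effect": "Allow",
                "Action": [
                    "kms:Decrypt",          # read SSE-KMS encrypted objects
                    "kms:GenerateDataKey",  # write objects encrypted with the key
                ],
                "Resource": key_arn,
            }
        ],
    }


# Placeholder ARN, not a real key.
policy = kms_access_policy("arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID")
policy_json = json.dumps(policy, indent=2)
```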
---
Analyze options:
---
Option A: Update the IAM policy attached to the execution role for the training jobs. Include the `s3:ListBucket` and `s3:GetObject` permissions.
These are basic S3 read permissions.
If these were missing, jobs would fail even before changing encryption.
The job was working before the encryption change, so the execution role likely already has these permissions.
The error is specifically AccessDenied after changing to SSE-KMS, which indicates a KMS permission problem, not an S3 permission problem.
Verdict: These permissions are necessary in general, but they are almost certainly already in place, so adding them does not address the root cause.
---
Option B: Update the S3 bucket policy attached to the S3 bucket. Set the value of the `aws:SecureTransport` condition key to True.
This forces requests to use SSL (HTTPS).
This is a good security practice but unrelated to KMS key access.
The error is about AccessDenied after changing encryption; network security conditions are unrelated.
A bucket policy that denies requests where `aws:SecureTransport` is false only affects non-HTTPS requests, and nothing in the scenario suggests the training jobs use plain HTTP.
This option does not grant KMS permissions or resolve the encryption issue.
Verdict: Irrelevant to KMS access errors.
---
Option C: Up...
Author: Arjun · Last updated May 7, 2026
A company runs training jobs on Amazon SageMaker by using a compute optimized instance. Demand for training runs will remain constant for the next 55 weeks. The instance needs to run for 35 hours each week. The comp...
Let's analyze each option carefully based on the requirements and key factors:
---
Requirements and Key Factors:
Training jobs run on a compute-optimized instance (likely a `C` family instance such as `ml.c5`).
Demand is constant for 55 weeks.
Instance needs to run 35 hours per week.
Goal: reduce training costs.
SageMaker training jobs, not inference.
Training duration per week is relatively predictable and steady.
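The usage figures in the question can be turned into totals with a few lines of arithmetic. The sketch uses only the numbers stated above and assumes no prices.

```python
WEEKS = 55            # stated duration of constant demand
HOURS_PER_WEEK = 35   # stated weekly run time of the instance
HOURS_IN_A_WEEK = 7 * 24

total_training_hours = WEEKS * HOURS_PER_WEEK   # 1925 hours over the period
utilization = HOURS_PER_WEEK / HOURS_IN_A_WEEK  # about 0.208 of each week

print(total_training_hours)
print(round(utilization, 3))
```

The commitment spans just over a year, and the instance is busy for roughly a fifth of each week. Both numbers matter when comparing commitment-based pricing against pay-as-you-go.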
---
Option A: Use a serverless endpoint with provisioned concurrency of 35 hours per week; run training on the endpoint.
Serverless endpoints are designed for inference, not training.
They don't support training jobs directly.
Provisioned concurrency applies to Lambda and serverless inference, not to training hours.
Running training on an endpoint is conceptually wrong; training is done using training jobs, not endpoints.
Rejected: Not suitable for training, no cost benefit for training jobs.
---
Option B: Use SageMaker Edge Manager for training, specifying instance requirements in edge device configuration.
SageMaker Edge Manager is for optimizing, monitoring, and deploying models on edge devices.
It is not designed to run or reduce costs for training jobs on SageMaker instances.
Edge Manager helps with model monitoring and deployment at the edge, not training cost optimization.
Rejected: Not applicable to training jobs or cost savings for training.
---
Option C: Use heterogeneous cluster feature of SageMaker Training by configuring `instance_type`, `instance_count`, and `instance_groups`.
This allows splitting training across different types of instances.
Potential cost optimization if some instances are cheaper but still performant.
However, the question states the company uses a compute-optimized instance alr...
Author: Manish · Last updated May 7, 2026
A company deployed an ML model that uses the XGBoost algorithm to predict product failures. The model is hosted on an Amazon SageMaker endpoint and is trained on normal operating data. An AWS Lambda function provides the predictions to the company's application.
An ML engineer must implement a so...
Let’s break down the options by considering key factors for detecting model accuracy degradation over time, specifically focusing on model drift detection and the best tools for this in the AWS ecosystem.
---
Key Factors:
Model Accuracy Monitoring requires comparison of live inference data with training data or expected data distribution.
Drift Detection often means detecting data distribution changes (data drift or concept drift).
Automated monitoring tools that can compare live data to a baseline are preferred.
Alerting should be part of the solution to notify when drift occurs.
SageMaker built-in tools are specialized for this purpose, supporting ease of integration and robustness.
---
Option A: Use Amazon CloudWatch to create a dashboard that monitors real-time inference data and model predictions. Use the dashboard to detect drift.
Reasoning: CloudWatch is great for monitoring logs, metrics, and creating dashboards for visual insights.
Limitations: It does not have built-in capabilities to analyze data distributions or detect drift between live data and training baselines automatically.
Manual setup and analysis would be required, which is error-prone and inefficient for real-time drift detection.
Use case: Best for monitoring system metrics, errors, latency, not for statistical model drift detection.
Reject because: No built-in data drift analysis, requires manual interpretation.
---
Option B: Modify the Lambda function to calculate model drift by using real-time inference data and model predictions. Program the Lambda function to send alerts.
Reasoning: Lambda can process data and perform custom logic, so theoretically, it can calculate drift metrics.
Limitations:
Drift detection involves statistical analysis comparing live data distributions with baseline data—complex and requires significant coding.
Lambda functions have time and resource limitations which may not be suited for heavy statistical calculations or storing historical data.
This approach duplicates effort, is less maintainable, and lacks standardization compared to built-in AWS tools.
Use case: Suitable for lightweight custom alerting or processing, but not for robust model drift d...
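To make "comparing live data distributions with baseline data" concrete, here is a toy sketch that uses the two-sample Kolmogorov-Smirnov statistic on a single numeric feature. The choice of statistic, the alert threshold, and the sample values are all assumptions made for illustration. They are not a description of what any AWS service computes internally.

```python
from bisect import bisect_right


def ks_statistic(baseline, live):
    """Largest gap between the empirical CDFs of two numeric samples (0.0 to 1.0)."""
    if not baseline or not live:
        raise ValueError("both samples must be non-empty")
    b, l = sorted(baseline), sorted(live)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in b + l:
        cdf_b = bisect_right(b, x) / len(b)
        cdf_l = bisect_right(l, x) / len(l)
        max_gap = max(max_gap, abs(cdf_b - cdf_l))
    return max_gap


DRIFT_THRESHOLD = 0.2  # hypothetical alert threshold

baseline_temps = [70, 71, 72, 73, 74, 75]   # made-up training-time sensor readings
live_temps = [78, 79, 80, 81, 82, 83]       # made-up production readings

drift_detected = ks_statistic(baseline_temps, live_temps) > DRIFT_THRESHOLD
```

Even this small version needs stored baseline data, a window of recent live data, and a per-feature threshold, which is the maintenance burden the limitations above describe.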
Author: Maya · Last updated May 7, 2026
A company has an ML model that uses historical transaction data to predict customer behavior. An ML engineer is optimizing the model in Amazon SageMaker to enhance the model's predictive accuracy. The ML engineer must examine the input data and the resulting predictions to identify trends ...
Let's analyze each option based on the requirement:
Requirement:
Examine input data and resulting predictions
Identify trends that could skew model performance across different demographics
Enhance model predictive accuracy by understanding bias or skew in data or model outputs
---
A) Use Amazon CloudWatch to monitor network metrics and CPU metrics for resource optimization during model training.
What it does: CloudWatch is mainly for infrastructure monitoring — CPU usage, memory, network I/O, disk, etc.
Relevance: This option focuses on resource utilization and operational health, not on analyzing input data or model predictions.
Use case: Useful for optimizing resource allocation or detecting training infrastructure issues, not for data or bias analysis.
Conclusion: Rejected because it does not address data or prediction analysis or bias detection.
---
B) Create AWS Glue DataBrew recipes to correct the data based on statistics from the model output.
What it does: DataBrew is a visual data preparation tool to clean and normalize data, create transformations, and generate recipes for consistent processing.
Relevance: While DataBrew can correct and prepare data, it doesn’t natively analyze model outputs or detect biases or trends related to demographics.
Use case: Best when data cleaning or transformation is needed before training, not directly for model output analysis.
Conclusion: Rejected because it doesn’t analyze predictions or model bias; it's mainly for data cleaning.
---
C) Use SageMaker Clarify to evaluate the model and training data f...
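One of the bias measures SageMaker Clarify reports is the difference in positive proportions in predicted labels (DPPL) between two groups. The sketch below computes it by hand on invented predictions to show what a skew across demographics looks like numerically. It does not call Clarify, and the group names and data are hypothetical.

```python
def positive_rate(predictions):
    """Share of predictions equal to 1 in a list of 0/1 predicted labels."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    return sum(predictions) / len(predictions)


def dppl(preds_group_a, preds_group_d):
    """Difference in predicted positive rates between group a and group d."""
    return positive_rate(preds_group_a) - positive_rate(preds_group_d)


# Made-up model outputs for two demographic groups.
group_a_predictions = [1, 1, 0, 1]
group_d_predictions = [1, 0, 0, 0]

gap = dppl(group_a_predictions, group_d_predictions)
```

A value near 0 means both groups receive positive predictions at similar rates. A large positive or negative value is the kind of trend the requirement asks the engineer to find.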
Author: Daniel · Last updated May 7, 2026
A company uses 10 Reserved Instances of accelerated instance types to serve the current version of an ML model. An ML engineer needs to deploy a new version of the model to an Amazon SageMaker real-time inference endpoint.
The solution must use the original 10 instances to serve both versions of the model. The solution also must include one additional Reserved Instance that is available...
Let's analyze the problem carefully:
---
Problem Breakdown:
Current setup: 10 Reserved Instances serving version 1 of the ML model.
Need to deploy version 2 using the same 10 instances for both versions.
There is 1 additional Reserved Instance available for the deployment process.
No downtime or service interruptions during the transition.
The new version must be deployed smoothly, leveraging reserved capacity efficiently.
---
Options analysis:
A) Blue/green deployment with all-at-once traffic shifting
Blue/green deployment creates a separate environment (green) with the new model.
All-at-once traffic shifting means switching 100% of traffic immediately to the new environment.
Problem: You need to have enough capacity to run both old and new environments in parallel.
The company has only 1 extra instance, but would need the full 10 new instances to run the new model while keeping the old model active.
Hence, not feasible because the new environment requires additional capacity equal to the current environment (10 instances), which is not available.
Rejected due to insufficient capacity to support a full green environment for 10 instances.
---
B) Blue/green deployment with canary traffic shifting and a size of 10%
Canary traffic shifting means sending a small portion (10%) of traffic to the new model in the green environment, gradually increasing.
This allows testing with limited traffic before full cutover.
Still requires enough capacity to run the full green environment. In a SageMaker blue/green deployment, the green fleet is provisioned at the same size as the blue fleet, and the canary size only controls how much traffic is shifted first.
With 10 instances running the old model, the green fleet would need 10 more instances, but only 1 extra Reserved Instance is available.
Canary shifting does support no downtime and limits risk by exposing only 10% of traffic at first, but it does not reduce the capacity that the green fleet needs.
Rejected for the same capacity reason as option A.
---
C) Shadow test with a traffic sampling percentage of 10%
Shadow testing sends a copy of the live traffic to the new version without affecting user response.
New version processes 10% of the traffic, but does not serve actual user responses.
Useful for testing, but the new ...
Author: Michael · Last updated May 7, 2026
An IoT company uses Amazon SageMaker to train and test an XGBoost model for object detection. ML engineers need to monitor performance metrics when they train the model with variants in hyperparameters. The ML engineers also need to send Short Mess...
Let's break down the requirements and analyze each option carefully.
Requirements:
1. Monitor performance metrics during training with different hyperparameters
This involves tracking and visualizing model training metrics (e.g., accuracy, loss) over time or across different training jobs.
2. Send SMS text messages after training completion
SMS messages must be delivered as notifications after the training job finishes.
---
Key factors to consider:
Monitoring performance metrics for ML training jobs:
Amazon SageMaker integrates with Amazon CloudWatch to push metrics and logs. CloudWatch is the standard AWS service to collect, monitor, and visualize metrics in near real-time.
AWS CloudTrail records API calls and changes to AWS resources — it is primarily for auditing and compliance, not for monitoring runtime performance metrics like model accuracy or loss.
Sending SMS notifications:
For sending SMS, Amazon SNS (Simple Notification Service) is the service designed to deliver notifications including SMS, email, and push messages.
Amazon SQS (Simple Queue Service) is a message queue designed for decoupling and buffering messages between distributed systems. It does not send SMS or email directly; it only stores messages for other consumers.
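The notification half of the requirement can be sketched as follows. The SNS subscription uses the `sms` protocol with a phone number as the endpoint. Wiring the trigger through an EventBridge rule on training job state changes is an assumption about one common setup, and the topic ARN and phone number are placeholders.

```python
def training_completed_event_pattern() -> dict:
    """Event pattern that matches SageMaker training jobs reaching Completed."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Training Job State Change"],
        "detail": {"TrainingJobStatus": ["Completed"]},
    }


def sms_subscription_params(topic_arn: str, phone_number: str) -> dict:
    """Parameters for subscribing a phone number to an SNS topic over SMS."""
    if not phone_number.startswith("+"):
        raise ValueError("phone number should be in E.164 format, for example +15555550100")
    return {"TopicArn": topic_arn, "Protocol": "sms", "Endpoint": phone_number}


pattern = training_completed_event_pattern()
subscription = sms_subscription_params(
    "arn:aws:sns:us-east-1:111122223333:training-done",  # placeholder topic ARN
    "+15555550100",                                        # placeholder phone number
)
```

An SQS queue has no equivalent of the `sms` protocol, which is why it cannot fill this role on its own.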
---
Option analysis:
A) Use Amazon CloudWatch to monitor performance metrics. Use Amazon SQS for message delivery.
Monitoring: CloudWatch is appropriate here.
SMS delivery: SQS is not designed to send SMS. It’s a queue service; someone would have to poll SQS and then send SMS using another service.
Verdict: Not ideal because SQS does not directly send SMS.
B) Use Amazon CloudWatch to monitor performance m...
Author: Sam · Last updated May 7, 2026
A company is working on an ML project that will include Amazon SageMaker notebook instances. An ML engineer must ensure that the SageMaker notebook instances do not allow root access.
Which...
Let's analyze each option carefully in the context of preventing SageMaker notebook instances that allow root access from being deployed.
---
Key context:
Goal: Prevent deployment of SageMaker notebook instances that allow root access (not just detect or delete them after deployment).
Important factor: Root access setting is a property of the notebook instance (e.g., `RootAccess` property in SageMaker Notebook instance configuration).
---
Option A: Use IAM condition keys to stop deployments of SageMaker notebook instances that allow root access.
Explanation:
IAM policies can use condition keys to restrict API actions based on certain request parameters.
Amazon SageMaker supports some condition keys that can be used in IAM policies to restrict the creation of resources based on their configuration.
Relevance:
You can create an IAM policy that denies the `CreateNotebookInstance` or `UpdateNotebookInstance` API calls if the `RootAccess` parameter is set to "Enabled".
This prevents the user from creating notebook instances with root access at the API level, effectively blocking deployment before it happens.
When to use:
When you want to enforce a policy at the permission level, ensuring that non-compliant resource creation attempts are rejected outright.
Conclusion:
This is a direct and preventive approach aligned with the requirement.
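A policy of the kind described for option A might look like the sketch below, built as a Python dictionary and serialized to JSON. It assumes the `sagemaker:RootAccess` condition key; the statement ID is made up, and a real policy would usually be scoped and attached according to the organization's own structure.

```python
import json


def deny_root_access_policy() -> dict:
    """IAM policy that denies creating or updating notebooks with root access enabled."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyNotebooksWithRootAccess",  # hypothetical statement ID
                "Effect": "Deny",
                "Action": [
                    "sagemaker:CreateNotebookInstance",
                    "sagemaker:UpdateNotebookInstance",
                ],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"sagemaker:RootAccess": "Enabled"}
                },
            }
        ],
    }


policy_json = json.dumps(deny_root_access_policy(), indent=2)
```

Because the statement is an explicit deny, a request with root access enabled is rejected before any notebook instance exists.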
---
Option B: Use AWS Key Management Service (AWS KMS) keys to stop deployments of SageMaker notebook instances that allow root access.
Explanation:
KMS keys control encryption and access to encrypted data.
KMS does not natively control or restrict resource creation based on resource configurations like root access flags.
Relevance:
This is unrelated to preventing root access configuration in notebook instances.
When to use:
Primarily used for controlling access to encrypted data or enforcing encryption-related policies, not resource deployment restrictions.
Conclusion:
Not applicable here, so rejected.
---
Option C: Monitor resource creation by using Amazon EventBridge events. Create an AWS Lambda function that deletes all deployed SageMaker notebook instances that allow root access.
Explanation:
EventBridge can capture events like SageMaker notebook instance creation, and a Lambda function can react by deleting instances that violate the root access policy.
Relevance:
This is a reactive approach — the notebook instance is created first, then detected and deleted.
...
Author: FlamePhoenix2025 · Last updated May 7, 2026
A company is using Amazon SageMaker to develop ML models. The company stores sensitive training data in an Amazon S3 bucket. The model training must have network iso...
Let's analyze each option carefully based on the key requirement: SageMaker training must have network isolation from the internet while accessing sensitive training data stored in S3.
---
A) Run the SageMaker training jobs in private subnets. Create a NAT gateway. Route traffic for training through the NAT gateway.
Explanation:
Running jobs in private subnets is good for isolation. However, a NAT gateway gives those subnets an outbound route to the internet, which breaks the network isolation requirement. The NAT gateway hides the internal IP addresses, but the training jobs can still reach the internet.
Verdict:
Rejected because NAT gateway allows internet access, violating strict network isolation.
---
B) Run the SageMaker training jobs in private subnets. Create an S3 gateway VPC endpoint. Route traffic for training through the S3 gateway VPC endpoint.
Explanation:
Running in private subnets ensures no direct internet access. Using an S3 gateway VPC endpoint means all S3 traffic remains inside the AWS network and does not traverse the internet, maintaining network isolation. This meets the requirement of accessing S3 securely and privately.
Verdict:
Selected because it maintains network isolation, and securely accesses S3 without internet access.
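The two pieces of option B can be sketched as request parameters: a gateway endpoint for S3 attached to the private subnets' route tables, and a VPC configuration for the training job. All IDs and the Region are placeholders, and the helper names are invented for this example.

```python
def s3_gateway_endpoint_params(vpc_id: str, region: str, route_table_ids) -> dict:
    """Parameters for creating an S3 gateway VPC endpoint."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Route tables of the private subnets that host the training jobs.
        "RouteTableIds": list(route_table_ids),
    }


def training_vpc_config(subnet_ids, security_group_ids) -> dict:
    """VpcConfig block of a training job request, pointing at private subnets."""
    return {"Subnets": list(subnet_ids), "SecurityGroupIds": list(security_group_ids)}


endpoint = s3_gateway_endpoint_params("vpc-0example", "us-east-1", ["rtb-0example"])
vpc_config = training_vpc_config(["subnet-0private"], ["sg-0example"])
```

No NAT gateway or internet gateway route appears anywhere in this setup, so S3 is the only destination reachable from the subnets.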
---
C) Run the SageMaker training jobs in public subnets that have an attached security group. In the security group, use inbound rules to limit traffic from the internet. Encrypt SageMaker instanc...
Author: Andrew · Last updated May 7, 2026
A company needs an AWS solution that will automatically create versions of ML models as the models are create...
Let's analyze the requirement and each option carefully:
Requirement:
Automatically create versions of ML models as models are created.
This means the solution must track model versions and manage the lifecycle of these models.
---
Option A: Amazon Elastic Container Registry (Amazon ECR)
ECR is a managed container image registry service.
It is designed to store, manage, and deploy Docker container images.
While it can store containers used for serving models, it does NOT inherently provide model versioning or lifecycle management for ML models themselves.
Use case: Best when you want to manage container images for deployment, not for model versioning.
Reject: Does not handle ML model versioning or lifecycle tracking directly.
---
Option B: Model packages from Amazon SageMaker Marketplace
Model packages are pre-built models or algorithms available for purchase or use on SageMaker from the marketplace.
This is more about acquiring third-party models, not managing or versioning your own model development lifecycle.
Use case: When you want to buy or share models in a marketplace, not for internal versioning.
Reject: Not designed for automatic versioning of your own created models.
---
Option C: Amazon SageMaker ML Lineage Tracking
SageMaker Lineage Tracking captures metadata about ML workflows, including datasets, training jobs, models, and endpoints.
It provides visibility into t...
Author: Sofia · Last updated May 7, 2026
A company needs to use Retrieval Augmented Generation (RAG) to supplement an open source large language model (LLM) that runs on Amazon Bedrock. The company's data for RAG is a set of documents in an Amazon S3 bucket. The documents consist of .csv...
To determine which option meets the requirements with the least operational overhead, let's break down each solution:
A) Create a pipeline in Amazon SageMaker Pipelines to generate a new model. Call the new model from Amazon Bedrock to perform RAG queries.
Analysis:
This approach involves creating a new model pipeline in Amazon SageMaker, which is a complex and high-maintenance process. You’d have to manage the model creation, training, and then the integration with Amazon Bedrock.
Operational overhead is significant because it requires building, maintaining, and monitoring the pipeline, which involves managing training data, tuning hyperparameters, and model updates.
Why Rejected: This solution is overkill for the given requirements, as it involves a full model creation pipeline, which is unnecessary if you just need to perform RAG queries using the existing model.
B) Convert the data into vectors. Store the data in an Amazon Neptune database. Connect the database to Amazon Bedrock. Call the Amazon Bedrock API to perform RAG queries.
Analysis:
This involves several steps: converting your documents into vector representations (embedding), storing those vectors in Amazon Neptune, and setting up the integration with Amazon Bedrock. While Amazon Neptune is optimized for graph data, it does introduce additional complexity in terms of database management and the embedding generation process.
Operational overhead includes:
Managing a Neptune instance
Regularly updating vector embeddings
Ensuring smooth interaction between Neptune and Bedrock
Why Rejected: This option introduces a heavy operational burden with database management and the need to convert documents into embeddings. It’s suitable if you're working with highly structured relationships or need a graph database, but it’s unnecessarily complicated for this scenario.
C) Fine-tune an existing LLM by using an AutoML job in Amazon SageMaker. Configure the S3 bucket as a dat...
Author: NightmareDragon2025 · Last updated May 7, 2026
A company plans to deploy an ML model for production inference on an Amazon SageMaker endpoint. The average inference payload size will vary from 100 MB to 300 MB. Inference requests must be processed i...
To determine the best Amazon SageMaker inference option for this use case, let's analyze the key factors such as payload size, processing time, and the specific nature of the inference request (real-time vs. batch processing). Here's a breakdown of each option:
A) Serverless Inference
Use Case: Serverless inference is designed for workloads with unpredictable or low-volume inference requests, where users do not want to manage the underlying infrastructure. It allows for auto-scaling and is ideal for smaller, variable loads.
Payload Size Limitation: Serverless inference has a payload size limit of 4 MB, which is far too small for the requirement of 100 MB to 300 MB payloads.
Processing Time: Serverless inference is optimized for fast, low-latency responses but may not be suitable for large payloads requiring significant time for processing.
Rejection Reason: The payload size exceeds the limit for serverless inference, making this option inappropriate.
B) Asynchronous Inference
Use Case: Asynchronous inference is well-suited for long-running inference requests, where the processing time might vary or exceed typical real-time response requirements. The results are delivered after the inference is completed.
Payload Size: It supports large payloads, including the 100 MB to 300 MB range, making it a good fit for handling large input data.
Processing Time: This option is suitable for scenarios where the inference time is not strictly bound by real-time requirements. Requests are queued, and each one can take up to one hour to process.
Assessment: Asynchronous inference accepts payloads of up to 1 GB and processing times of up to 60 minutes, so it fits both the 100 MB to 300 MB payloads and the 60-minute processing window in this scenario. It is a candidate rather than a rejection.
C) Real-time Inference
Use Case: Real-time inference is designed for low-latency, interactive applications where quick responses are cri...
Author: Ava · Last updated May 7, 2026
An ML engineer notices class imbalance in an image classification training job.
What should the M...
Class imbalance in an image classification training job refers to the situation where one class (or multiple classes) in the dataset has significantly fewer examples than others, which can lead to the model learning biased patterns that favor the majority class. To address this issue, the machine learning engineer can consider various approaches.
Option A: Reduce the size of the dataset
Explanation: Reducing the size of the dataset discards data and leaves the model fewer examples to learn from, which can hurt performance. Shrinking the dataset as a whole also does not change the class proportions, so the imbalance remains.
Reason for Rejection: This option is not appropriate for class imbalance as it would reduce the overall diversity of the data, potentially hurting model performance.
Option B: Transform some of the images in the dataset
Explanation: Transforming images (through augmentation) is a common technique to increase the diversity of training examples. Augmentation methods like rotation, flipping, scaling, and cropping can help generate more examples of underrepresented classes, effectively balancing the dataset.
Reason for Rejection: While this is an effective solution to address class imbalance, it’s not the most direct one in terms of resolution. It can be more time-consuming to design appropriate augmentations compared to other methods (like oversampling). Moreover, it’s not always guaranteed that augmentations will fully solve the imbalance problem, as it depends on how much the augmentations resemble real-world variations.
Option C: Apply random oversampling on the dataset
Explanation: Random oversampling involves duplicating instances from the minority class so that the number of examples in each class becomes more balanced....
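A minimal sketch of random oversampling as described above, using only the standard library. The file names and labels are invented, and a real image pipeline would more likely oversample file paths or indices than in-memory images.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement from the class's own samples to fill the gap.
        extras = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extras:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical dataset: four "cat" images and one "dog" image.
images = ["cat1.jpg", "cat2.jpg", "cat3.jpg", "cat4.jpg", "dog1.jpg"]
classes = ["cat", "cat", "cat", "cat", "dog"]

balanced_images, balanced_classes = random_oversample(images, classes)
class_counts = Counter(balanced_classes)
```

Oversampling should be applied to the training split only, so duplicated images do not leak into validation or test data.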
Author: Emma · Last updated May 7, 2026
A company receives daily .csv files about customer interactions with its ML model. The company stores the files in Amazon S3 and uses the files to retrain the model. An ML engineer needs to implement a solution to mask credit card numbers in the files b...
To solve this problem, the main goal is to mask credit card numbers in the incoming .csv files stored in Amazon S3 with minimal development effort. Let's evaluate each option based on key factors such as development complexity, automation capabilities, scalability, and cost-effectiveness.
Option A: Create a discovery job in Amazon Macie. Configure the job to find and mask sensitive data.
Pros:
Minimal Development: Amazon Macie is a managed service designed to automatically discover and protect sensitive data (like credit card numbers). It uses machine learning to identify sensitive information in your data with little configuration.
Automation: Macie can be set up to run on a schedule and automatically detect sensitive data in new files without needing custom code.
Scalability: It is fully managed by AWS and can scale with your data without additional infrastructure setup.
Cons:
Cost: Amazon Macie has associated costs, especially as the volume of data increases. Depending on the number of files processed, it might become more expensive than some of the other options.
Masking Limitation: While Macie can discover sensitive data, it is not primarily designed to perform complex data transformations, like masking sensitive data in real time. Additional steps would be required for actually masking the data (perhaps integration with Lambda functions or another process).
Conclusion: While Macie can detect sensitive data effectively, it is not designed for directly masking or modifying data. It is better suited for discovering and classifying data rather than performing operations like masking.
Option B: Create Apache Spark code to run on an AWS Glue job. Use the Sensitive Data Detection functionality in AWS Glue to find and mask sensitive data.
Pros:
Automated Processing: AWS Glue is a fully managed ETL (Extract, Transform, Load) service, and it can run Spark jobs that process large datasets in S3.
Sensitive Data Detection: AWS Glue offers built-in sensitive data detection, which can automatically detect sensitive information, such as credit card numbers, in files.
Scalability: AWS Glue can scale automatically, which is important for handling large datasets.
Integration: AWS Glue integrates well with S3, making it easy to set up automated jobs that trigger when new files are uploaded.
Cons:
Complexity: While this option provides a more automated approach than Option A, setting up AWS Glue with sensitive data detection might require some configuration and learning.
Cost: AWS Glue can incur additional costs, especially for large-scale data processing, although these costs may still be manageable.
Conclusion: AWS Glue is a strong candidate because i...
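To make the masking step concrete, here is a minimal plain-Python sketch of what "find and mask credit card numbers in a .csv file" involves. It is not the AWS Glue Sensitive Data Detection API. The regular expression, the Luhn check, and the keep-last-four masking format are all assumptions chosen for illustration.

```python
import csv
import io
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def mask_card_numbers(text):
    """Replace Luhn-valid card numbers with asterisks, keeping the last four digits."""
    def _mask(match):
        digits = re.sub(r"\D", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # leave other long digit runs untouched
        return "*" * (len(digits) - 4) + digits[-4:]
    return CARD_PATTERN.sub(_mask, text)


def mask_csv(content):
    """Mask card numbers in every cell of a CSV document held in memory."""
    reader = csv.reader(io.StringIO(content))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([mask_card_numbers(cell) for cell in row])
    return out.getvalue()
```

A managed detection feature removes the need to maintain patterns like these by hand, which is the development effort the question asks to minimize.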
Author: Ella · Last updated May 7, 2026
A medical company is using AWS to build a tool to recommend treatments for patients. The company has obtained health records and self-reported textual information in English from patients. The company needs to use this information to gain ins...
To address this requirement of extracting insights from patients' health records and self-reported textual information with the least development effort, let’s analyze each option:
Option A: Use Amazon SageMaker to build a recurrent neural network (RNN) to summarize the data.
Pros: Customizable for complex tasks like summarizing unstructured data. RNNs are suitable for text-based tasks.
Cons: Building, training, and deploying an RNN model requires significant development effort, data labeling, and expertise in machine learning and deep learning techniques. You would need to handle all stages, including data preprocessing, training, hyperparameter tuning, and model evaluation.
Scenario: This approach is suitable if you need a completely custom model tailored to the specific dataset or the use case, but it’s overkill for this scenario.
Option B: Use Amazon Comprehend Medical to summarize the data.
Pros: Amazon Comprehend Medical is specifically designed to process and extract insights from medical text, including clinical notes, patient records, and other health-related information. It identifies entities such as medical conditions, treatments, medications, and relationships between them. It offers pre-trained models, which means minimal development work is required. It’s also easy to use and integrates well with AWS services.
Cons: It’s not specifically tailored to generate "summaries," but more for extracting key medical entities, relationships, and insights.
Scenario: Ideal when the goal is to extract medical entities, conditions, and relationships rather than generating summaries in natural language.
Option C: Use Amazon Kendra to create a quick-search tool to query the data.
Pros: Amazon Kendra is a powerful search service that allows you to search through unstructured data and retrieve ...
Author: Madison · Last updated May 7, 2026
A company needs to extract entities from a PDF document to build a classifier model.
Which solution will extract ...
When selecting the most efficient solution for extracting entities from a PDF document and storing them, we need to focus on speed, accuracy, and ease of integration. Here's a breakdown of each option:
Option A: Use Amazon Comprehend to extract the entities. Store the output in Amazon S3.
Advantages:
Amazon Comprehend is a powerful service designed specifically for natural language processing (NLP), and it can extract entities from text-based data such as PDFs that have been converted to text.
The workflow is straightforward and automated.
Disadvantages:
If the PDF is scanned or image-based, OCR must be applied first; this workflow assumes the PDF text is already extractable.
Option B: Use an open-source AI OCR tool on Amazon SageMaker to extract the entities. Store the output in Amazon S3.
Advantages:
SageMaker allows the flexibility of using custom machine learning models and open-source OCR tools.
If you need high customization for complex documents or specific use cases, this could be an advantage.
Disadvantages:
This option involves building an OCR model or integrating an existing one, which can be time-consuming and may require advanced skills.
The overhead of setting up and maintaining the solution could lead to delays and complexity.
Option C: Use Amazon Textract to extract the entities. Use Amazon Comprehend to convert the entities to text. Store the output in Amazon S3.
Advantages:
Amazon Textract is specifically designed to extract text and structured data (such as tables and form key-value pairs) from scanned documents, including PDFs.
Amazon Comprehend can be used afterward to identify and classify entities from the extracted text, making it a powerful combination for comprehensive analysis.
Disadvantages:
In this workflow, the document is first processed by Textract, then passed to Comprehend. This introduces an additional step, whi...
Author: Evelyn · Last updated May 7, 2026
A company shares Amazon SageMaker Studio notebooks that are accessible through a VPN. The company must enforce access controls to prevent malicious actors from exploiting presigned...
To determine the best solution for enforcing access controls to prevent malicious actors from exploiting presigned URLs in the scenario provided, let’s analyze each option and its relevance to the problem:
1. Option A: Set up Studio client IP validation by using the `aws:sourceIp` IAM policy condition
Explanation: The `aws:sourceIp` condition allows restricting access based on the source IP address from which the request is originating. This is useful if you want to enforce that only requests coming from specific, trusted IPs (such as within the company's VPN or internal network) are allowed to access the notebooks.
Pros: This can ensure that only users within the VPN or on specific trusted networks can access the notebooks. It’s a strong way to limit access to known sources.
Cons: IP validation can be bypassed if a malicious actor gets access to a trusted IP address (e.g., by exploiting vulnerabilities in the VPN). It doesn’t provide deep-level control over other aspects like user identity or roles.
When to use: This is best when access needs to be restricted by location (network or VPN), but it isn't sufficient on its own for a full security posture.
2. Option B: Set up Studio client VPC validation by using the `aws:sourceVpc` IAM policy condition
Explanation: The `aws:sourceVpc` condition allows you to restrict access based on the VPC from which the request is made. This can be useful if the notebooks should only be accessed from specific VPCs that have been designated as trusted.
Pros: VPC-based controls can be effective in a cloud environment where all access is expected to come from specific networks within the company's infrastructure.
Cons: This can be restrictive, and it may not be granular enough for all use cases. VPC validation doesn’t directly tie to the individual user or their role.
When to use: This is useful if all users should only access the notebooks from a specific, controlled VPC and you need to enforce network-level security.
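The network conditions described for Options A and B can be sketched as IAM policy documents. This is an illustrative sketch, not a verified production policy: it assumes the presigned URLs are issued through the `sagemaker:CreatePresignedDomainUrl` action, and the CIDR range and VPC ID in the usage lines are placeholders.

```python
import json

# Studio action that issues presigned URLs (assumed from the SageMaker API name).
PRESIGNED_URL_ACTION = "sagemaker:CreatePresignedDomainUrl"


def allow_from_ips(cidrs):
    """Policy that allows presigned URL creation only from the given source CIDR ranges."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": PRESIGNED_URL_ACTION,
            "Resource": "*",
            "Condition": {"IpAddress": {"aws:SourceIp": list(cidrs)}},
        }],
    }


def allow_from_vpcs(vpc_ids):
    """Policy that allows presigned URL creation only for requests from the given VPCs."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": PRESIGNED_URL_ACTION,
            "Resource": "*",
            "Condition": {"StringEquals": {"aws:SourceVpc": list(vpc_ids)}},
        }],
    }


if __name__ == "__main__":
    # Placeholder values: a documentation CIDR range and a made-up VPC ID.
    print(json.dumps(allow_from_ips(["203.0.113.0/24"]), indent=2))
    print(json.dumps(allow_from_vpcs(["vpc-0123456789abcdef0"]), indent=2))
```

The two conditions are kept in separate policies because `aws:SourceIp` applies to requests that arrive over the public internet, while `aws:SourceVpc` applies to requests that arrive through a VPC endpoint.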
3. Option C: Set up Studio client role endpoint validation by using the `aws:PrimaryTag` IAM policy condition
Explanation: `aws:PrimaryTag` does not appear among the documented AWS global condition keys. Tag-based access control normally relies on keys such as `aws:PrincipalTag` (tags on the calling principal) and `aws:ResourceTag` (tags on the resource).
Pros: Tag-based conditions in general provide flexibility if your access control policies are tied to roles with specific tags, such as tags indicating trusted personnel or teams.
Cons: Tag-based conditions do not address network access concerns directly. If the tags aren’t implemented consistently, it could lead to misconfigurations.
When to use: This would be useful if you have a sophisticated tagging strategy ...
Author: James · Last updated May 7, 2026
An ML engineer needs to merge and transform data from two sources to retrain an existing ML model. One data source consists of .csv files that are stored in an Amazon S3 bucket. Each .csv file consists of millions of records. The other data source is an Amazon Aurora DB cluster.
The result of the merge process must be written to a second S3 bucket. The ...
To meet the requirements of merging and transforming data from two sources (CSV files in Amazon S3 and an Amazon Aurora DB cluster) on a weekly basis with minimal operational overhead, let’s assess the available options based on key factors such as scalability, ease of maintenance, cost, and time to implementation.
Option A: Create a transient Amazon EMR cluster every week. Use the cluster to run an Apache Spark job to merge and transform the data.
Pros:
EMR is a powerful, scalable solution for processing large datasets (like the millions of records in the CSV files).
Apache Spark on EMR can efficiently handle complex transformations and merges.
Good for large-scale data processing.
Cons:
Operational overhead: Setting up and tearing down the cluster every week adds operational overhead. Managing EMR clusters can be complex, especially if you need to deal with configuration, security, and job orchestration.
Cost: Starting a transient EMR cluster for weekly use might incur significant costs, particularly when considering the computational resources needed for large datasets.
Maintenance: Managing the cluster’s performance, debugging, and troubleshooting adds complexity.
Conclusion for Option A: While powerful, this option would add more operational overhead and higher costs for the task, especially if the process needs to be automated weekly.
---
Option B: Create a weekly AWS Glue job that uses the Apache Spark engine. Use DynamicFrame native operations to merge and transform the data.
Pros:
Low operational overhead: AWS Glue is a fully managed ETL service that abstracts away the infrastructure management. It automatically scales resources based on the size of the data.
Cost-effective: Glue charges based on usage (per DPU-hour), and it’s ideal for processing and transforming large data sets on a regular basis.
Ease of use: AWS Glue provides a graphical interface and built-in support for common ETL tasks. Glue's DynamicFrame operations, which run on the Spark engine, can join the CSV records with the Aurora data and apply the transformations efficiently.
Integration with S3 and Aurora: AWS Glue integrates natively with both Amazon S3 and Aurora, making it easy to connect to these data sources.
Built-in scheduler: You can schedule the Glue job weekly, reducing manual effort.
Cons:
Potential complexity in transformations: Depending on the complexity of the transformations, there may be a slight learning curve to leverage DynamicFrames and Glue’s native operations.
Conclusion for Option B: This is a fully managed service with low operational overhead, integrates seamlessly with the required data sources, and offers scalability. It’s ideal for periodic, complex ETL tasks like this one.
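The weekly schedule mentioned above can be expressed as a scheduled Glue trigger. The sketch below only builds the request parameters as a dictionary; the job name, trigger name, and the Monday 06:00 UTC schedule are hypothetical values. The dictionary is intended to be passed to the boto3 Glue client's `create_trigger` call.

```python
def weekly_trigger_request(job_name,
                           trigger_name="weekly-merge-trigger",
                           schedule="cron(0 6 ? * MON *)"):
    """Build parameters for a scheduled Glue trigger that starts one job.

    The default cron expression means 06:00 UTC every Monday; both the
    trigger name and the schedule are illustrative assumptions.
    """
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": schedule,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


if __name__ == "__main__":
    params = weekly_trigger_request("merge-csv-with-aurora")  # hypothetical job name
    print(params)
    # With AWS credentials configured, the request could be sent with:
    #   import boto3
    #   boto3.client("glue").create_trigger(**params)
```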
---
Option C: Create an AWS Lambda function that runs Apa...
Author: FlamePhoenix2025 · Last updated May 7, 2026
An ML engineer has deployed an Amazon SageMaker model to a serverless endpoint in production. The model is invoked by the InvokeEndpoint API operation.
The model's latency in production is higher than the baseline latency in the test environment. The ML engineer thinks that the i...
To confirm or deny whether the increased latency is due to model startup time, the ML engineer needs to monitor the model setup or loading time that happens when the endpoint is first invoked or when there is a cold start.
Let's evaluate each option in detail:
A) Schedule a SageMaker Model Monitor job. Observe metrics about model quality.
This option would help the ML engineer monitor the quality of the model's predictions over time (e.g., bias, data drift, accuracy). However, it does not specifically help measure startup time or latency. This job focuses on monitoring the performance of the model post-invocation, rather than the initial startup time. Therefore, this option is not directly relevant to confirming latency due to model startup.
B) Schedule a SageMaker Model Monitor job with Amazon CloudWatch metrics enabled.
While this option brings in CloudWatch metrics, it still focuses on monitoring model performance, which is not focused on startup latency. The combination of Model Monitor and CloudWatch could be useful for quality metrics but would not provide the granularity needed for model startup metrics. Thus, this option is not the best fit for diagnosing increased latency from startup.
C) Enable Amazon CloudWatch metrics. Observe the ModelSetupTime metric in the SageMaker namespace.
...
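As a rough illustration of observing `ModelSetupTime`, the sketch below builds the parameters for a CloudWatch `get_metric_statistics` request. It assumes the metric is published under the `AWS/SageMaker` namespace with `EndpointName` and `VariantName` dimensions. The endpoint name, the `AllTraffic` variant name, and the 24-hour window are placeholders.

```python
from datetime import datetime, timedelta, timezone


def model_setup_time_query(endpoint_name, variant_name="AllTraffic",
                           hours=24, now=None):
    """Build get_metric_statistics parameters for a serverless endpoint's ModelSetupTime."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",  # assumed namespace for endpoint metrics
        "MetricName": "ModelSetupTime",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # five-minute buckets
        "Statistics": ["Average", "Maximum"],
    }


if __name__ == "__main__":
    params = model_setup_time_query("my-serverless-endpoint")  # hypothetical name
    print(params)
    # With AWS credentials configured, the request could be sent with:
    #   import boto3
    #   boto3.client("cloudwatch").get_metric_statistics(**params)
```

If the setup time reported here accounts for most of the extra latency, that supports the startup-time explanation; if it stays small, the cause lies elsewhere.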
Author: Aditya · Last updated May 7, 2026
An ML engineer needs to ensure that a dataset complies with regulations for personally identifiable information (PII). The ML engineer will use the data to train an ML model on Amazon SageMaker instances. SageMaker must not use any...
To determine which solution meets the requirements in the most operationally efficient way, let’s evaluate each option based on key factors like data processing efficiency, ease of integration with SageMaker, data security, and scalability.
Option A: Use the Amazon Comprehend DetectPiiEntities API call to redact the PII from the data. Store the data in an Amazon S3 bucket. Access the S3 bucket from the SageMaker instances for model training.
Data Processing: Amazon Comprehend is well suited to detecting PII in text. The DetectPiiEntities call returns the type and character offsets of each PII entity, and the caller uses those offsets to redact the text.
Data Storage: S3 is scalable, secure, and widely supported by SageMaker. It offers easy integration for training models.
Efficiency: Redacting the data using Amazon Comprehend and then storing it in S3 is a straightforward, automated process. Accessing data from S3 in SageMaker is native and optimized.
Assessment: This option is very operationally efficient. It does not have significant drawbacks compared to the others.
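The redaction step in Option A can be sketched as follows. The function works on a list shaped like the `Entities` field of a DetectPiiEntities response (`Type`, `BeginOffset`, `EndOffset`). The sample text, the offsets, and the `[TYPE]` replacement format are illustrative assumptions, and no AWS call is made here.

```python
def redact_pii(text, entities):
    """Replace each detected PII span with a [TYPE] placeholder.

    `entities` is assumed to look like the "Entities" list of a
    DetectPiiEntities response: dicts with Type, BeginOffset, EndOffset.
    """
    redacted = text
    # Work from the end of the string so earlier offsets remain valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = entity["BeginOffset"], entity["EndOffset"]
        redacted = redacted[:start] + "[" + entity["Type"] + "]" + redacted[end:]
    return redacted


if __name__ == "__main__":
    sample = "Jane Doe lives in Seattle."
    # Hand-written entities standing in for a real API response.
    detected = [
        {"Type": "NAME", "BeginOffset": 0, "EndOffset": 8},
        {"Type": "ADDRESS", "BeginOffset": 18, "EndOffset": 25},
    ]
    print(redact_pii(sample, detected))
```

The redacted text would then be written to the S3 bucket that the SageMaker training job reads from.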
Option B: Use the Amazon Comprehend DetectPiiEntities API call to redact the PII from the data. Store the data in an Amazon Elastic File System (Amazon EFS) file system. Mount the EFS file system to the SageMaker instances for model training.
Data Processing: Using Amazon Comprehend to redact the data is a good choice for text-based PII.
Data Storage: While EFS can be mounted to SageMaker, it is often more suited for file-based workloads requiring a shared file system. EFS can introduce additional operational overhead due to the complexity of managing the mount and access controls.
Efficiency: Although EFS works, it’s more complex than S3. EFS is typically used in scenarios requiring shared file systems for distributed workloads, not necessarily for isolated, simple training setups.
Why Rejected: S3 is a more efficient and simpler solution for storing data in most machine learning workflows compared to EFS.
Option C: Use AWS Glue DataBrew to cleanse the dataset of PII. Store the data in an Amazon Elastic File System (Amazon EFS) file system. Mount the EFS file system to the SageMaker instances for model training.
Data Processing: AWS Glue DataBrew is a powerful data preparation tool, but it may be an overkill for simply redacting PII. It is...
Author: Rahul · Last updated May 7, 2026
A company must install a custom script on any newly created Amazon SageMaker notebook instances.
Which solution will meet...
To meet the requirement of installing a custom script on newly created Amazon SageMaker notebook instances with the least operational overhead, let's examine each option in terms of factors like ease of implementation, maintainability, and scalability.
Option A: Create a lifecycle configuration script to install the custom script when a new SageMaker notebook is created.
Advantages:
SageMaker lifecycle configurations are specifically designed for this kind of task (custom script installation). This is a native feature of SageMaker and directly integrates with notebook instances.
Once set up, no additional management is needed.
The solution is highly automated and doesn’t require external services.
The lifecycle configuration runs automatically every time a notebook is created, making it very efficient.
Disadvantages:
Requires some upfront configuration to create the lifecycle script and attach it to each notebook instance. However, this is minimal overhead compared to other options.
This is a straightforward and effective solution for the task. Since lifecycle configurations are tailored for custom installations when creating notebook instances, this option is the least complex.
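A lifecycle configuration is essentially a shell script supplied as base64-encoded text. The sketch below builds the request for the boto3 SageMaker client's `create_notebook_instance_lifecycle_config` call. The configuration name and the installer path inside the script are hypothetical.

```python
import base64

# Hypothetical on-create script; the installer path is a placeholder.
ON_CREATE_SCRIPT = """#!/bin/bash
set -e
bash /home/ec2-user/SageMaker/custom-install.sh
"""


def lifecycle_config_request(name, script):
    """Build parameters for creating a notebook instance lifecycle configuration."""
    encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return {
        "NotebookInstanceLifecycleConfigName": name,
        # OnCreate runs once, when the notebook instance is first created.
        "OnCreate": [{"Content": encoded}],
    }


if __name__ == "__main__":
    params = lifecycle_config_request("install-custom-script", ON_CREATE_SCRIPT)
    print(params["NotebookInstanceLifecycleConfigName"])
    # With AWS credentials configured, the request could be sent with:
    #   import boto3
    #   boto3.client("sagemaker").create_notebook_instance_lifecycle_config(**params)
```

The configuration still has to be referenced when each notebook instance is created, which is the upfront step noted in the disadvantages above.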
Option B: Create a custom Amazon Elastic Container Registry (Amazon ECR) image that contains the custom script.
Advantages:
This option allows you to create custom images that can be reused across different notebooks, providing flexibility.
Works well if you need a consistent and controlled environment.
Disadvantages:
Involves more overhead because you need to manage custom ECR images and ensure that they are regularly updated and maintained.
This approach also requires extra configuration when creating SageMaker Studio domains and selecting kernels, adding more complexity.
Not as seamless as lifecycle configurations when you only need to install a script on new notebooks.
This is a more complex solution a...
Author: Sophia · Last updated May 7, 2026
A company is building a real-time data processing pipeline for an ecommerce application. The application generates a high volume of clickstream data that must be ingested, processed, and visualized in near real time. The company needs a solution that supports SQL f...
When evaluating the best option for a real-time data processing pipeline with SQL support for data processing and Jupyter notebooks for interactive analysis, we need to consider the following key factors:
1. Real-time Data Ingestion: The system needs to handle a high volume of clickstream data in near real-time, so the ingestion mechanism should be robust and capable of handling streaming data efficiently.
2. Data Processing: The data needs to be processed with SQL support, meaning it should be possible to write SQL queries for transformation and analysis. Also, support for Jupyter notebooks for interactive analysis is a key requirement.
3. Visualization: Since the data needs to be visualized in near real-time, the solution should integrate easily with a visualization tool that provides real-time updates and interactive capabilities.
4. Ease of Use: The solution should be user-friendly and should allow for smooth integration between ingestion, processing, and visualization.
Let's analyze the options:
Option A: Use Amazon Data Firehose to ingest the data. Create an AWS Lambda function to process the data. Store the processed data in Amazon S3. Use Amazon QuickSight to visualize the data.
Data Ingestion: Amazon Data Firehose is a fully managed service for ingesting streaming data, which works well for high-volume data. However, it is not ideal for handling complex stream processing.
Data Processing: AWS Lambda is an excellent choice for lightweight, event-driven processing, but it lacks SQL support for complex transformations, which is a key requirement in this scenario. Lambda functions are generally more suited to simple processing tasks.
Visualization: Amazon QuickSight can be used for visualization, but it doesn't provide as much interactivity or real-time analytics as required for a dynamic, live data pipeline.
Drawback: The lack of SQL support for data processing and the limited interactivity in QuickSight makes this solution less optimal for real-time, SQL-based analysis and interactive work.
Option B: Use Amazon Kinesis Data Streams to ingest the data. Use Amazon Data Firehose to transform the data. Use Amazon Athena to process the data. Use Amazon QuickSight to visualize the data.
Data Ingestion: Amazon Kinesis Data Streams can handle the real-time ingestion of large volumes of clickstream data, making it a good choice for this scenario.
Data Processing: Amazon Athena supports SQL-based querying over data stored in Amazon S3. However, Athena is designed for batch processing rather than real-time processing. While it can handle SQL queries efficiently, it may not be the best option for real-time processing of streaming data.
Visualization: QuickSight can be used for visualization, but similar to Option A, it is not as suited for interactive, real-time analysis compared to other solutions.
Drawback: While Athena supports SQL, its batch-oriented nature is not ideal for near real-tim...
Author: Nathan · Last updated May 7, 2026
A medical company needs to store clinical data. The data includes personally identifiable information (PII) and protected health information (PHI).
An ML engineer needs to implement a solution to ensure that the...
Key Considerations for Selecting the Solution:
Data Privacy: PII and PHI must be protected and not used during model training.
Automation: The solution should ensure data masking or encryption is done automatically without requiring manual intervention.
Integration with ML Models: The solution should integrate seamlessly with ML model training workflows.
Scalability: The solution must scale as the volume of data increases.
Compliance: The solution must comply with regulations like HIPAA, which govern the use of PII and PHI in healthcare.
---
Option A: Store the clinical data in Amazon S3 buckets. Use AWS Glue DataBrew to mask the PII and PHI before the data is used for model training.
Advantages:
AWS Glue DataBrew is a fully managed service for data preparation, making it easy to clean and transform data before use.
DataBrew supports masking of PII and PHI, ensuring that these sensitive data types do not get exposed in the model training process.
Amazon S3 is scalable and can store large amounts of data.
Drawbacks:
While AWS Glue DataBrew can mask PII and PHI, it is an additional layer of data preparation and might introduce complexity in terms of workflow management.
It could require manual intervention in the setup process to configure the data transformation and masking logic.
Scenario:
This solution is appropriate when there is a need for comprehensive, scalable data storage and masking of both PII and PHI.
Option B: Upload the clinical data to an Amazon Redshift database. Use built-in SQL stored procedures to automatically classify and mask the PII and PHI before the data is used for model training.
Advantages:
Amazon Redshift is a highly scalable data warehouse solution, suitable for storing large datasets.
SQL stored procedures can be used to automatically classify and mask PII/PHI before training.
Drawbacks:
Redshift is more suited for structured data rather than the unstructured data typically found in clinical data.
Masking sensitive data directly within SQL stored procedures can be complex and error-prone, leading to potential issues with compliance and privacy.
It doesn't provide as granular control as some other options like DataBrew or Comprehend when dealing with text-heavy datasets.
Scenario:
This solution is ideal if the data is already structured and stored within Redshift, and there's a need to manage large-scale data warehouses.
Option C: Use Amazon Comprehend to detect a...
Author: Daniel · Last updated May 7, 2026
A company wants to ingest customer payment data into the company's data lake in Amazon S3. The company receives payment data every minute on average. The company wants to analyze the payment data in real time. Then the company wants to ingest the da...
To determine the most operationally efficient solution, let’s evaluate each option based on key factors like data ingestion speed, scalability, ease of integration with Amazon S3 (the data lake), real-time analysis, and simplicity of operation.
A) Use Amazon Kinesis Data Streams to ingest data. Use AWS Lambda to analyze the data in real time.
- Data Ingestion: Amazon Kinesis Data Streams can handle real-time streaming data with low-latency ingestion, making it suitable for continuously ingesting payment data every minute.
- Real-time Analysis: AWS Lambda can trigger functions on each event in the stream, allowing you to process and analyze the data in real time.
- Operational Efficiency: This approach requires you to manage both the stream and the Lambda functions, which can lead to operational complexity, especially as the amount of data grows and the number of Lambda invocations increases.
- Use Case: This is a good solution for real-time analytics, but it requires more effort in managing Lambda functions and stream scaling.
B) Use AWS Glue to ingest data. Use Amazon Kinesis Data Analytics to analyze the data in real time.
- Data Ingestion: AWS Glue is primarily used for batch processing and ETL (extract, transform, load) jobs. It’s not the best fit for real-time ingestion, as it typically operates on scheduled runs rather than continuous streaming data.
- Real-time Analysis: Amazon Kinesis Data Analytics is designed for real-time stream processing, but it expects data to come from streaming sources (like Kinesis Data Streams or Kinesis Data Firehose). Using AWS Glue for ingestion makes it less suitable for real-time analytics.
- Operational Efficiency: While AWS Glue is powerful for batch data processing, it’s not ideal for real-time ingestion, and using it in this scenario introduces unnecessary complexity and delays.
- Use Case: This solution is not a good fit for real-time data ingestion and analysis.
C) Use Amazon Kinesis Data Firehose to ingest data. Use Amazon Kinesis Data Analytics to analyze the data in real time.
- Data Ingestion: Kinesis Data Firehose is designed for easy, scalable, real-time data ingestion and can automatically deliver data to Amazon S3. This matches the requirement of ingesting data into the data lake (S3) and is highly efficient.
- Real-time Analysis: Kinesis Data Analytics allows real-time analysis of streaming data. It integrates directly with Kinesis Data Firehose, providing a seamless ...
Author: Joseph · Last updated Apr 16, 2026
A company runs a website that uses a content management system (CMS) on Amazon EC2. The CMS runs on a single EC2 instance and uses an Amazon Aurora MySQL Multi-AZ DB instance for the data tier. Website images are stored on an Amazon Elastic Block Store (Amazon EBS) volume that is mounted inside the EC2 instance.
...
Key Factors for Improving Performance and Resilience:
1. Scalability: To ensure the website can handle varying traffic loads and scale automatically as needed.
2. Availability and Fault Tolerance: To minimize downtime and ensure the website remains available even in the event of instance or infrastructure failures.
3. Performance Optimization: To ensure fast access to content, such as images, and improve load times.
4. Operational Efficiency: To reduce management overhead and improve the overall resilience of the application.
A) Move the website images into an Amazon S3 bucket that is mounted on every EC2 instance.
- Scalability and Performance: Amazon S3 provides highly scalable, durable storage for large amounts of data. Storing images in S3 can significantly improve performance by offloading the static content from the EC2 instance. S3 is designed for high availability, which can improve the resilience of the website.
- Mounting to EC2 Instances: However, S3 is an object storage service and is not directly "mounted" like a file system. Although you can use tools like S3FS to mount it, it’s not ideal for high-performance file system operations. Directly serving static content via S3 with CloudFront is generally more efficient than mounting S3 on EC2.
- Rejected Reasoning: Mounting S3 is not a standard or optimal approach for high-performance applications that need low-latency file access.
B) Share the website images by using an NFS share from the primary EC2 instance. Mount this share on the other EC2 instances.
- Scalability and Availability: NFS (Network File System) relies on a single instance for the file share, creating a potential single point of failure. In case the primary EC2 instance goes down, all other instances that rely on the NFS share will lose access to the images.
- Performance Issues: NFS can introduce bottlenecks and performance issues, especially as the number of instances increases. It does not scale well compared to solutions like Amazon EFS or S3.
- Rejected Reasoning: This option introduces a potential single point of failure and is not as scalable or resilient as other solutions, especially in distributed environments.
C) Move the website images onto an Amazon Elastic File System (Amazon EFS) file system that is mounted on every EC2 instance.
- Scalability and Availability: Amazon EFS provides scalable, shared file storage that can be mounted on multiple EC2 instances. It is highly available, supports concurrent access from multiple EC2 instances, and automatically scales as the application grows.
- Performance: EFS is designed for use cases that require shared storage with low-latency access. It integrates well with EC2 and provides a highly durable, scalable solution for static content storage like images.
- Resilience: EFS file systems that use Regional storage store data redundantly across multiple Availability Zones, which makes them more resilient than an NFS share served from a single EC2 instance.
- Selected Reasoning: This option provides shared storage with ex...
Author: Kunal · Last updated Apr 16, 2026
A company runs an infrastructure monitoring service. The company is building a new feature that will enable the service to monitor data in customer AWS accounts. The new feature will call AWS APIs in customer accounts to describe Amazon EC2 instances and read Amaz...
Key Factors for Secure Access:
1. Least Privilege: Ensure the company only has the necessary permissions to access specific resources in the customer accounts (i.e., EC2 instances and CloudWatch metrics).
2. Temporary Credentials: It's important to minimize the security risk by using temporary credentials instead of permanent access keys.
3. Ease of Integration: The solution should be easy to integrate into the company’s service without requiring excessive management overhead.
4. Scalability and Maintenance: The solution should scale easily as the company adds more customers and needs to access more AWS accounts.
A) Ensure that the customers create an IAM role in their account with read-only EC2 and CloudWatch permissions and a trust policy to the company’s account.
- IAM Role and Trust Policy: This option suggests creating a role in the customer's account with appropriate permissions (read-only access to EC2 and CloudWatch). The trust policy allows the company’s AWS account to assume this role.
- Security: The trust policy restricts who can assume the role to principals in the company’s account, and the read-only permissions keep access to the least privilege that the feature needs. Assuming the role yields temporary credentials, which minimizes the risk of long-term access key exposure.
- Ease of Integration: This solution is easy to implement, as it involves setting up an IAM role in the customer's account and allowing the company’s account to assume it. The company would call the `AssumeRole` API to retrieve temporary credentials.
- Scalability and Maintenance: This approach scales well because each new customer only needs to create one role. It is also secure because the temporary credentials expire automatically, which limits the impact of a leaked credential.
Why this is selected: This option offers a secure, scalable, and manageable way to access customer accounts by using IAM roles with temporary credentials. It aligns well with AWS best practices for cross-account access.
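The role that each customer creates can be sketched as a trust policy document. The account ID below is a placeholder. The optional `sts:ExternalId` condition is not mentioned in the analysis above; it is included here as an assumption, because it is a common safeguard when a third party assumes roles in customer accounts.

```python
import json


def trust_policy(monitoring_account_id, external_id=None):
    """Build a trust policy that lets the monitoring company's account assume the role."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::" + monitoring_account_id + ":root"},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        # Optional extra check, added here as an assumption (not in the original text).
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID.
    print(json.dumps(trust_policy("111122223333", external_id="customer-42"), indent=2))
```

The company's service would then call `AssumeRole` with the customer's role ARN and use the returned temporary credentials for the read-only EC2 and CloudWatch calls.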
B) Create a serverless API that implements a token vending machine to provide temporary AWS credentials for a role with read-only EC2 and CloudWatch permissions.
- Token Vending Machine: While providing temporary credentials via a token vending machine can work, this solution adds complexity. You would need to manage the vending machine service, ensure it securely generates credentials, and implement appropriate expiration and validation for those credentials.
- Operational Overhead: Managing an API for issuing temporary credentials adds additional operational complexity and may not be as simple as using AWS-native features like IAM roles with trust policies.
- Security Risks: If the token vending machine is not implemented securely, it could introduce vulnerabilities, especially with regard to how tokens are managed and transmitted.
- Rejected Reasoning: While this...
Author: Ella · Last updated Apr 16, 2026
A company needs to connect several VPCs in the us-east-1 Region that span hundreds of AWS accounts. The company's networking team has its own AWS account to manage the cloud netwo...
Key Factors for Operational Efficiency:
1. Scalability: The solution should easily scale to accommodate hundreds of VPCs across multiple accounts without requiring significant manual configuration.
2. Centralized Management: The solution should allow for centralized management of the network, reducing the complexity of managing connections individually.
3. Automation and Maintenance: The solution should minimize the ongoing operational effort, especially as new VPCs are added.
4. Cost and Performance: The solution should be cost-effective while maintaining good performance for inter-VPC communication.
A) Set up VPC peering connections between each VPC. Update each associated subnet's route table.
- Scalability Issues: VPC peering requires a separate connection between each pair of VPCs. With hundreds of VPCs, this leads to a significant number of peering connections and route table updates. Specifically, the number of peering connections grows quadratically (O(n^2)), making it highly impractical for a large-scale environment.
- Operational Overhead: Managing hundreds of peering connections and route tables would become difficult and error-prone. Each new VPC added would require multiple manual configurations.
- Rejected Reasoning: While this approach might work for a small number of VPCs, it is not operationally efficient or scalable for a large number of VPCs across multiple accounts.
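The quadratic growth mentioned for option A can be made concrete with a little arithmetic. The sketch below counts the connections a full mesh needs, n(n-1)/2, and compares that with one attachment per VPC for a central hub. The function names and the sample VPC counts are illustrative assumptions.

```python
def full_mesh_peerings(vpc_count: int) -> int:
    """Number of peering connections needed so every VPC pairs with every other."""
    return vpc_count * (vpc_count - 1) // 2


def hub_attachments(vpc_count: int) -> int:
    """Number of attachments when each VPC connects once to a central hub."""
    return vpc_count


# Sample sizes are arbitrary; the question only says "hundreds of accounts".
for n in (10, 100, 300):
    print(f"{n} VPCs: {full_mesh_peerings(n)} peerings vs {hub_attachments(n)} hub attachments")
```

At 100 VPCs the full mesh already needs 4,950 peering connections, each with its own route table entries on both sides.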
B) Configure a NAT gateway and an internet gateway in each VPC to connect each VPC through the internet.
- Inefficiency and Complexity: Using NAT gateways and internet gateways for inter-VPC communication is not a recommended approach because it would route traffic through the internet, increasing latency and operational complexity. Managing multiple NAT gateways and internet gateways is inefficient, especially when dealing with cross-account traffic.
- Security Risks: Exposing VPC traffic to the internet increases the security risks as the VPCs would be communicating over public networks.
- Rejected Reasoning: This approach is not secure, efficient, or scalable for inter-VPC communication, especially in a multi-account, large-scale environment.
C) Create an AWS Transit Gateway in the networking team’s AWS account. Configure static routes from each VPC.
- Scalability and Centralized Management: AWS Transit Gateway (TGW) is specifically designed for scenarios like this, where multiple VPCs nee...
Author: Maya · Last updated Apr 16, 2026
A company has Amazon EC2 instances that run nightly batch jobs to process data. The EC2 instances run in an Auto Scaling group that uses On-Demand billing. If a job fails on one instance, another instance will reprocess the job. The batch jobs run between 12:00 AM and 06:00 ...
Key Factors in Choosing the Most Cost-Effective Solution:
1. Cost-Effectiveness: The solution should minimize costs while still meeting the requirements of the batch job, including reliability and scalability.
2. Instance Availability: The batch jobs run during specific times (12:00 AM to 06:00 AM), so the solution should ensure that instances are available during this window, with minimal interruptions.
3. Scalability and Reliability: The solution must ensure that enough instances are available to process the batch jobs even if a job fails, which might require additional capacity.
4. Flexibility in Scaling: The solution should be able to automatically adjust the number of EC2 instances needed based on the job requirements.
A) Purchase a 1-year Savings Plan for Amazon EC2 that covers the instance family of the Auto Scaling group that the batch job uses.
- Cost Savings: A Savings Plan offers significant savings (compared to On-Demand) in exchange for a commitment to a certain amount of usage over a 1- or 3-year period.
- Flexibility: Savings Plans come in two main types. A Compute Savings Plan applies across instance families, sizes, and Regions. The EC2 Instance Savings Plan described in this option is tied to one instance family in one Region, although it still applies across instance sizes within that family.
- Usage Pattern: A Savings Plan commits the company to a fixed hourly spend for every hour of the term. Because the batch jobs run only from 12:00 AM to 06:00 AM, a commitment sized for the batch fleet would go largely unused for the other 18 hours of each day.
- Rejected Reasoning: Although it provides savings, it may not be as flexible or optimized for this type of intermittent workload (daily batch jobs) because the company only needs instances during a specific window.
B) Purchase a 1-year Reserved Instance for the specific instance type and operating system of the instances in the Auto Scaling group that the batch job uses.
- Cost Savings: Reserved Instances provide cost savings in exchange for committing to a specific instance type for a 1-year or 3-year term.
- Limitations on Flexibility: Reserved Instances are not as flexible as Savings Plans and are tied to specific instance types, sizes, and operating systems, which may be less optimal if the company needs to scale out dynamically based on demand.
- Usage Pattern: The batch job runs only during a specific time window each day, which means the company would still be paying for instance capacity that isn't always in use.
- Rejected Reasoning: Reserved Instances are best for predictable, long-term workloads. The batch job’s intermittent usage pattern doesn't fully align with the commitment required for Reserved Instances, making this option less cost-efficient.
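The usage-pattern argument against options A and B can be checked with a short calculation. This is a simplified sketch: it assumes the commitment is billed for all 24 hours, that nothing else consumes it outside the batch window, and that On-Demand instances are billed only while they run. The function name is hypothetical and no real prices are used.

```python
def break_even_discount(hours_used_per_day: float) -> float:
    """Discount a 24x7 commitment needs to match paying On-Demand only while running.

    Committed cost per day:  rate * (1 - discount) * 24
    On-Demand cost per day:  rate * hours_used_per_day
    Setting them equal gives discount = 1 - hours_used_per_day / 24.
    """
    if not 0 < hours_used_per_day <= 24:
        raise ValueError("hours_used_per_day must be in the range (0, 24]")
    return 1 - hours_used_per_day / 24


# The batch window in the question is 12:00 AM to 06:00 AM, i.e. 6 hours.
required = break_even_discount(6)
print(f"Commitment must be at least {required:.0%} cheaper than On-Demand to break even")
```

Under these assumptions a commitment has to be 75% cheaper than On-Demand just to break even, which is why a 24x7 commitment is a poor match for a 6-hour nightly job.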
C) Create...
Author: Ethan Smith · Last updated Apr 16, 2026
A social media company is building a feature for its website. The feature will give users the ability to upload photos. The company expects significant increases in demand during large events and must ensure that the website can handle...
To determine the most scalable solution for the social media company’s photo upload feature, we need to assess each option based on scalability, handling large amounts of traffic, and minimizing operational overhead. Let's evaluate each option:
A) Upload files from the user's browser to the application servers. Transfer the files to an Amazon S3 bucket.
- This approach involves the user first uploading files to the application servers, which then forward the files to Amazon S3.
- Scalability concerns: The application servers become a bottleneck because they must handle both the user traffic and the file uploads, requiring significant compute and network resources. Scaling the servers to handle high traffic during large events can be costly and complex.
- Operational overhead: The application servers need to process each upload, which requires maintaining and scaling those servers under heavy load.
- Rejection reason: This is not the most scalable option because it adds complexity and a central point of failure (the application servers).
B) Provision an AWS Storage Gateway file gateway. Upload files directly from the user's browser to the file gateway.
- AWS Storage Gateway is typically used to integrate on-premises environments with AWS cloud storage. The File Gateway component allows file-based access to Amazon S3 and is more suited for hybrid cloud scenarios.
- Scalability concerns: File Gateway isn’t designed to handle high levels of direct user traffic in web applications; it serves as a bridge between on-premises environments and S3, not a cloud-native solution for file uploads from a website.
- Operational overhead: This solution is more complex and not ideal for web-scale uploads.
- Rejection reason: It introduces unnecessary complexity and is not built for web-scale file uploads.
C) Generate Amazon S3 presigned URLs in the application. Upload files directly from the user's browser into an S3 bucket.
- Presigned URLs are a powerful AWS feature where t...
Author: Ethan Smith · Last updated Apr 16, 2026
A company has a web application for travel ticketing. The application is based on a database that runs in a single data center in North America. The company wants to expand the application to serve a global user base. The company needs to deploy the application to multiple AWS Regions. Average latency must be less than 1 second on updates to the reservation database.
The company wants to have separate deployments of its web platform across multiple...
To determine the best solution for the company's web application that needs to be deployed across multiple AWS Regions with a globally consistent primary reservation database and low latency, we need to evaluate the different options.
A) Convert the application to use Amazon DynamoDB. Use a global table for the center reservation table. Use the correct Regional endpoint in each Regional deployment.
- Scalability & Global Consistency: Amazon DynamoDB is a fully managed NoSQL database that supports global tables for multi-region, fully replicated tables. It provides fast reads and writes and handles the consistency across regions, which ensures that data is available across multiple AWS regions with low latency.
- Latency: DynamoDB Global Tables can be set up to synchronize changes in less than 1 second, making it suitable for the company’s latency requirement.
- Rejection reason: The company already has a relational database for the application (assumed from the context of "reservation database"), and converting to a NoSQL solution may require significant application changes. It may not be the most suitable option if the company relies heavily on relational features such as complex joins and multi-table SQL transactions, which DynamoDB does not support in the same way as a relational database.
B) Migrate the database to an Amazon Aurora MySQL database. Deploy Aurora Read Replicas in each Region. Use the correct Regional endpoint in each Regional deployment for access to the database.
- Scalability & Global Consistency: Aurora provides high performance and scalability. Cross-Region replication in Aurora is delivered through Aurora Global Database, which is designed to replicate data to other Regions with minimal lag (under 1 second). All writes go to a single primary cluster, so Aurora acts as the single source of truth and the primary database stays globally consistent.
- Latency: Aurora Global Databases are designed for low-latency cross-region replication, making them a good fit for the requirement of updates to the reservation database being completed in less than 1 second.
- Reason for selection: This solution is ideal because Aurora Global Databases provide a globally consistent database while supporting low-latency replication. Each region can have a read replica, and writes are centralized in the primary region, meeting the company's requirement for a single primary database with low latency.
C) Migrate the database to an Amazon RDS for MySQL database. Deploy MySQL read replica...
Author: RadiantJaguar56 · Last updated Apr 16, 2026
A company has migrated multiple Microsoft Windows Server workloads to Amazon EC2 instances that run in the us-west-1 Region. The company manually backs up the workloads to create an image as needed.
In the event of a natural disaster in the us-west-1 Region, the company wants to recover workloads quickly in the us-west-2 Region. The company wants no more than 24 hours of data loss on the EC2 insta...
The company's goal is to automate backups of EC2 instances, ensuring minimal data loss (no more than 24 hours) and enabling a quick recovery in the event of a disaster in the us-west-1 Region, while also minimizing administrative effort. Let's evaluate each of the proposed solutions based on these criteria:
A) Create an Amazon EC2-backed Amazon Machine Image (AMI) lifecycle policy to create a backup based on tags. Schedule the backup to run twice daily. Copy the image on demand.
- Scalability & Automation: An AMI lifecycle policy helps automate the creation of backups based on tags, but it requires manual intervention to copy the AMI to another region (i.e., "copy the image on demand"). This introduces manual overhead and doesn't fully automate the backup process, especially for the disaster recovery aspect.
- Recovery and Data Loss: Because the images are copied only on demand, a recent copy may not exist in us-west-2 when a disaster strikes us-west-1. Recovery would then depend on whatever image was last copied manually, which can exceed the 24-hour data loss limit and slows recovery.
- Rejection reason: It requires manual intervention for copying the image to another region and does not meet the requirement for minimizing data loss.
B) Create an Amazon EC2-backed Amazon Machine Image (AMI) lifecycle policy to create a backup based on tags. Schedule the backup to run twice daily. Configure the copy to the us-west-2 Region.
- Scalability & Automation: This solution allows AMI backups to be automated, scheduled twice daily, and automatically copied to us-west-2. This meets the requirement for disaster recovery in another region, and the backup is automated without manual intervention.
- Recovery and Data Loss: By automatically copying the AMI to us-west-2, this solution reduces the risk of data loss and allows for a faster recovery in the event of a disaster in us-west-1. The backup frequency of twice daily is within the 24-hour data loss limit.
- Reason for selection: This option provides the least administrative effort because it fully automates the backup and cross-region copy process, ensuring minimal data loss and quick recovery.
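The 24-hour data loss check for option B can be sketched as follows. The `schedule` dictionary is an illustrative structure loosely modeled on an Amazon Data Lifecycle Manager schedule; its field names are assumptions for this example and should be verified against the service's API reference before use. The worst case is taken to be one full backup interval, ignoring the time the cross-Region copy itself takes.

```python
# Illustrative schedule: field names are assumptions, not a verified API payload.
schedule = {
    "Name": "twice-daily-ami",
    "CreateRule": {"Interval": 12, "IntervalUnit": "HOURS"},
    "CrossRegionCopyRules": [{"TargetRegion": "us-west-2"}],
}


def worst_case_data_loss_hours(sched: dict) -> int:
    """Worst case is a failure just before the next backup: one full interval."""
    rule = sched["CreateRule"]
    if rule["IntervalUnit"] != "HOURS":
        raise ValueError("this sketch only handles hourly intervals")
    return rule["Interval"]


def copies_to_region(sched: dict, region: str) -> bool:
    """True if the schedule includes an automatic copy to the given Region."""
    return any(r["TargetRegion"] == region for r in sched.get("CrossRegionCopyRules", []))


max_allowed_loss_hours = 24  # requirement stated in the question
print(worst_case_data_loss_hours(schedule) <= max_allowed_loss_hours)
print(copies_to_region(schedule, "us-west-2"))
```

A twice-daily schedule gives a 12-hour worst case, which leaves margin inside the 24-hour limit even after allowing time for the copy to finish.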
C) Create backup vaults in us-west-1 and in us-west-2 by using AWS Backup. Create a backup plan for the EC2 instances based on tag values. Create an AWS Lambda function to run as a scheduled job to copy the backup data to us-west-2.
- Scalability & Automation: While AWS Backup can automate the backup process, this solution requires the creation of a custom Lambda function to handle the copy of backup data between regions. This adds significant administrative overhead and complexity, as custom Lambda functions would need to be written, tested, and maintained.
- Recovery and Data Lo...
Author: Ava · Last updated Apr 16, 2026
A company operates a two-tier application for image processing. The application uses two Availability Zones, each with one public subnet and one private subnet. An Application Load Balancer (ALB) for the web tier uses the public subnets. Amazon EC2 instances for the application tier use the private subnets.
Users report that the application is running more slowly than expected. A security audit of the web server log files shows that the application is receiving millions of illegitimate requests from a small numbe...
The company is experiencing performance issues due to millions of illegitimate requests from specific IP addresses. The goal is to quickly mitigate the impact of these malicious requests while further investigation into a more permanent solution is underway. Let's evaluate the different options based on their effectiveness in resolving this issue.
A) Modify the inbound security group for the web tier. Add a deny rule for the IP addresses that are consuming resources.
- Effectiveness: Security groups are stateful and control traffic to EC2 instances at the instance level. However, security groups do not support explicit deny rules; they only allow you to define allowed traffic. This means you cannot directly deny access based on specific IP addresses in a security group.
- Rejection reason: Since security groups only allow specifying "allow" rules and not "deny" rules, this option will not be effective for blocking the illegitimate IPs.
B) Modify the network ACL for the web tier subnets. Add an inbound deny rule for the IP addresses that are consuming resources.
- Effectiveness: Network ACLs are stateless and operate at the subnet level, controlling traffic entering or leaving a subnet. Unlike security groups, network ACLs support both "allow" and "deny" rules, and they work at a broader level (subnet). By adding a deny rule for the offending IPs, traffic from these IPs can be blocked before it even reaches the web servers.
- Advantage: This is an effective and quick solution because it will prevent the malicious requests from reaching the ALB, improving the overall performance of the web tier.
- Reason for selection: This option is effective in immediately blocking unwanted traffic from the specific IP addr...
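To illustrate how a network ACL deny rule takes effect, the sketch below models inbound rule evaluation in plain Python: rules are checked in ascending rule-number order and the first match decides. It is a simplified model that ignores ports and protocols. The class and function names are hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    number: int   # lower numbers are evaluated first
    cidr: str     # source range the rule applies to
    action: str   # "allow" or "deny"


def evaluate_inbound(rules: list, source_ip: str) -> str:
    """Return the action of the lowest-numbered rule matching the source IP."""
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if address in ipaddress.ip_network(rule.cidr):
            return rule.action
    return "deny"  # nothing matched: the implicit final rule denies


rules = [
    AclRule(100, "0.0.0.0/0", "allow"),       # existing rule that admits web traffic
    AclRule(50, "203.0.113.7/32", "deny"),    # offending address, numbered below 100
    AclRule(60, "198.51.100.0/24", "deny"),   # offending range, numbered below 100
]

print(evaluate_inbound(rules, "203.0.113.7"))  # deny
print(evaluate_inbound(rules, "192.0.2.10"))   # allow
```

The deny rules only work because their numbers are lower than the broad allow rule; a deny rule numbered above 100 would never be reached.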
Author: Noah · Last updated Apr 16, 2026
A global marketing company has applications that run in the ap-southeast-2 Region and the eu-west-1 Region. Applications that run in a VPC in eu-west-1 need to communicate securely with databases that...
In this case, the company needs to allow secure communication between applications running in the eu-west-1 Region and databases in the ap-southeast-2 Region. Let's evaluate the different options and determine which one best meets the requirements for secure and efficient communication across Regions.
A) Create a VPC peering connection between the eu-west-1 VPC and the ap-southeast-2 VPC. Create an inbound rule in the eu-west-1 application security group that allows traffic from the database server IP addresses in the ap-southeast-2 security group.
- Effectiveness: VPC peering allows direct communication between the VPCs. For inter-Region peering, security group rules must use IP addresses or CIDR blocks, because a security group in a peer VPC in another Region cannot be referenced, so the IP-based rule is not the problem here. The problem is where the rule is placed: the applications initiate the connections, so the inbound rule belongs on the database security group in ap-southeast-2, not on the application security group in eu-west-1. The option also does not mention updating the subnet route tables, which peering requires.
- Rejection reason: The inbound rule is on the wrong security group and the route tables are not updated, so the application servers still cannot reach the databases.
B) Configure a VPC peering connection between the ap-southeast-2 VPC and the eu-west-1 VPC. Update the subnet route tables. Create an inbound rule in the ap-southeast-2 database security group that references the security group ID of the application servers in eu-west-1.
- Effectiveness: This solution creates the peering connection, updates the subnet route tables, and places the inbound rule on the database security group, which is the side that receives the traffic. One caveat: AWS does not support referencing a security group in a peer VPC that is in a different Region, so a rule that names the eu-west-1 security group ID cannot be created as written. For inter-Region peering, the rule has to specify the application servers' IP addresses or the peer VPC CIDR block instead.
- Reason for selection: Referencing security group IDs is a recognized practice within a single Region, as it provides an additional layer of security by tightly controlling which entities can communicate across th...
Author: StarlightBear · Last updated Apr 16, 2026
A company is developing software that uses a PostgreSQL database schema. The company needs to configure multiple development environments and databases for the company's developers. On average, each development environment is used f...
To determine the most cost-effective solution, let's analyze each of the options based on key factors such as resource utilization, cost-efficiency, and use-case relevance. The main goal is to ensure the most cost-effective use of resources for the development environments and PostgreSQL databases.
Option A: Configure each development environment with its own Amazon Aurora PostgreSQL database
- Overview: Amazon Aurora is a fully managed database service that is compatible with PostgreSQL. Aurora is designed for high availability, scalability, and performance. However, it is typically more expensive than other database solutions because of its performance optimizations.
- Cost Analysis: Aurora is generally priced higher than RDS for PostgreSQL and does not offer the same level of cost efficiency for smaller, less intensive development environments. Given that the development environments are not likely to be used full-time, this option might lead to unnecessary overprovisioning of resources.
- Conclusion: While Aurora offers great performance and scalability, it's more suited for production workloads requiring high availability, which is not necessary for most development environments.
Option B: Configure each development environment with its own Amazon RDS for PostgreSQL Single-AZ DB instances
- Overview: Amazon RDS for PostgreSQL is a fully managed database service with the option for Single-AZ (Availability Zone) deployments. A Single-AZ instance is generally cheaper than Amazon Aurora because it runs without a standby in a second AZ and therefore has no automatic failover, which is usually unnecessary for development environments.
- Cost Analysis: For a development environment that is only used for half of the day, Single-AZ instances can be cost-effective, especially when combined with options like instance size scaling, reserved instances, or stopping instances during non-working hours. This solution gives developers a dedicated database with better cost efficiency compared to Aurora.
- Conclusion: This option is a solid choice for development environments as it provides enough resources while being more affordable than Aurora.
Option C...