Microsoft Practice Questions, Discussions & Exam Topics by our Authors
HOTSPOT -
You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL pool.
You create a table by using the Transact-SQL statement shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each st...
Author: Benjamin · Last updated May 27, 2026
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.
FactPurchase will have 1 million rows of data added daily and will contain three years of data.
Transact-SQL queries similar to the following query will be executed daily.
SELECT
SupplierKey, StockItemKe...
To minimize query times for FactPurchase, the choice of table distribution is the key decision. The sample query filters on DateKey and aggregates by SupplierKey and StockItemKey, so how rows are spread across distributions determines how much of the work runs in parallel and how much data must move between distributions.
Let's evaluate each distribution option:
A) Replicated
- Reason for rejection: Replicated tables suit small tables. Microsoft's guidance is to consider replication for tables smaller than about 2 GB compressed, because a full copy is kept on every Compute node. FactPurchase, by contrast, receives 1 million rows per day for three years, roughly 1.1 billion rows (1,000,000 × 365 × 3 ≈ 1.095 billion). Replicating a table of that size would multiply its storage, and because a replicated table must be rebuilt after it is modified, every daily load would trigger a costly rebuild. Replication does not scale for large fact tables.
B) Hash-distributed on PurchaseKey
- Assessment: The query does not filter or group on PurchaseKey, but that alone does not disqualify it. A good distribution column has many unique values and spreads rows evenly; PurchaseKey, which by its name identifies each purchase, likely does both. Rows would then be balanced across the 60 distributions, and every distribution would apply the filter (`WHERE DateKey >= 20210101 AND DateKey <= 20210131`) in parallel. Scanning all distributions is how a dedicated SQL pool parallelizes work; it is not a source of data shuffling. By contrast, distributing on DateKey would place a month's rows in at most 31 distributions, leaving the rest idle for this query. The aggregation on SupplierKey and StockItemKey may still require data movement, but that is true of any distribution column other than those keys.
C) Round-robin
- Reason for rejection: Round-robin distribu...
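To make the distribution trade-off concrete, the following Python sketch hashes one month of hypothetical rows into 60 buckets, once on a unique per-row key (standing in for PurchaseKey) and once on DateKey. It assumes sequential integer PurchaseKey values and 1,000 rows per day, and it uses MD5 only as a stand-in for the pool's internal hash function, so exact bucket counts will differ from a real pool.

```python
import hashlib
from collections import Counter
from datetime import date, timedelta

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads every table over 60 distributions
TOTAL_ROWS = 1_000_000 * 365 * 3  # rows after three years at 1 million rows per day


def distribution_for(value) -> int:
    """Map a column value to a distribution.

    MD5 is a stand-in here; Synapse uses its own internal hash function.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


def january_rows(rows_per_day: int):
    """Yield hypothetical (PurchaseKey, DateKey) pairs for January 2021."""
    purchase_key = 0
    day = date(2021, 1, 1)
    while day.month == 1:
        date_key = int(day.strftime("%Y%m%d"))  # e.g. 20210101
        for _ in range(rows_per_day):
            purchase_key += 1
            yield purchase_key, date_key
        day += timedelta(days=1)


rows = list(january_rows(rows_per_day=1_000))
by_purchase_key = Counter(distribution_for(pk) for pk, _ in rows)
by_date_key = Counter(distribution_for(dk) for _, dk in rows)

print(f"Total rows after three years: {TOTAL_ROWS:,}")
print("Distributions holding January rows (PurchaseKey):", len(by_purchase_key))
print("Distributions holding January rows (DateKey):    ", len(by_date_key))
print("Largest distribution (PurchaseKey):", max(by_purchase_key.values()))
print("Largest distribution (DateKey):    ", max(by_date_key.values()))
```

Hashing on DateKey can use at most 31 distributions for a January query, each holding whole days of rows, while the unique key spreads the same rows across all 60.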
Author: Sofia · Last updated May 27, 2026
You are implementing a batch dataset in the Parquet format.
Data files will be produced by using Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an Azure Synapse Analytics s...
To minimize storage costs for the batch dataset stored in Parquet format and consumed by Azure Synapse Analytics serverless SQL pool, it's important to optimize both the storage efficiency and query performance. Let's evaluate the available options based on storage optimization.
A) Use Snappy compression for the files
- Reason for selection: Snappy compression is commonly used with Parquet files because it offers a good balance between compression ratio and decompression speed. It significantly reduces the storage space required while maintaining reasonable performance during data access. Parquet files inherently support columnar storage, and compressing them with Snappy can further reduce storage costs. Since Snappy is a widely adopted compression method for Parquet and balances both compression and query performance, this is the best option for minimizing storage costs.
B) Use OPENROWSET to query the Parquet files
- Reason for rejection: OPENROWSET is a method used to query external data in Azure Synapse Analytics but doesn't directly influence storage costs. It is simply a mechanism for querying external datasets, not a storage optimization technique. While OPENROWSET can be useful for querying Parquet files, it won't reduce the storage size of those files. Therefore, it doesn't address the need to minimize storage...
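As a rough illustration of why compressing columnar data saves storage, the Python sketch below compresses a single hypothetical low-cardinality column stored contiguously, the way a columnar format lays it out. Snappy is not in the Python standard library, so zlib stands in for it. Snappy generally favors speed over compression ratio, so real Snappy-compressed Parquet files, and real data, will show much smaller savings than this deliberately repetitive example.

```python
import zlib

# One hypothetical column with only two distinct values, stored contiguously
# the way a columnar format keeps all values of a column together.
category_column = ("Electronics\n" * 50_000 + "Grocery\n" * 50_000).encode("utf-8")

# zlib is a stand-in codec; Snappy itself is not part of the standard library.
compressed = zlib.compress(category_column)

ratio = len(category_column) / len(compressed)
print(f"{len(category_column):,} bytes -> {len(compressed):,} bytes ({ratio:,.0f}x smaller)")
```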
Author: Zara · Last updated May 27, 2026
DRAG DROP -
You need to build a solution to ensure that users can query specific files in an Azure Data Lake Storage Gen2 account from an Azure Synapse Analytics serverless SQL pool.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the corre...
Author: Aditya · Last updated May 27, 2026
You are designing a data mart for the human resources (HR) department at your company. The data mart will contain employee information and employee transactions.
From a source system, you have a flat extract that has the following fields:
* EmployeeID
* FirstName
* LastName
* Recipient
* GrossAmount
* TransactionID
* GovernmentID
* NetAmountPaid
* TransactionDate
You need to design a star schema data model in an Azure...
When designing a star schema data model for a data mart in Azure Synapse Analytics for the Human Resources (HR) department, you need to structure the data to optimize for both query performance and data organization. The goal of a star schema is to have fact tables containing numerical measures and dimension tables containing descriptive attributes that are used for filtering and grouping in queries.
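Applied to this extract, that division can be sketched in Python: employee attributes become a deduplicated Employee dimension, and each transaction becomes a fact row. The rows and values are invented, and the placement of some fields is an assumption (for example, Recipient is treated here as a transaction attribute); a production model would also add surrogate keys rather than reuse EmployeeID directly.

```python
# Hypothetical rows from the flat HR extract (values invented for illustration).
FLAT_EXTRACT = [
    {"EmployeeID": 1, "FirstName": "Ana", "LastName": "Lee", "GovernmentID": "G-001",
     "TransactionID": 100, "Recipient": "Ana Lee", "TransactionDate": "2024-01-31",
     "GrossAmount": 5000.0, "NetAmountPaid": 3900.0},
    {"EmployeeID": 1, "FirstName": "Ana", "LastName": "Lee", "GovernmentID": "G-001",
     "TransactionID": 101, "Recipient": "Ana Lee", "TransactionDate": "2024-02-29",
     "GrossAmount": 5000.0, "NetAmountPaid": 3900.0},
    {"EmployeeID": 2, "FirstName": "Raj", "LastName": "Patel", "GovernmentID": "G-002",
     "TransactionID": 102, "Recipient": "Raj Patel", "TransactionDate": "2024-01-31",
     "GrossAmount": 4200.0, "NetAmountPaid": 3300.0},
]

# Descriptive attributes of the employee go to the dimension table.
EMPLOYEE_COLUMNS = ("EmployeeID", "FirstName", "LastName", "GovernmentID")
# Measures plus the keys that identify the event go to the fact table.
TRANSACTION_COLUMNS = ("TransactionID", "EmployeeID", "Recipient", "TransactionDate",
                       "GrossAmount", "NetAmountPaid")


def split_into_star(rows):
    """Return (employee dimension rows, transaction fact rows)."""
    dim_employee = {}
    fact_transaction = []
    for row in rows:
        # Keep one dimension row per employee, keyed by the business key.
        dim_employee.setdefault(row["EmployeeID"], {c: row[c] for c in EMPLOYEE_COLUMNS})
        # Keep one fact row per transaction; EmployeeID links it to the dimension.
        fact_transaction.append({c: row[c] for c in TRANSACTION_COLUMNS})
    return list(dim_employee.values()), fact_transaction


dim_employee, fact_transaction = split_into_star(FLAT_EXTRACT)
print(len(dim_employee), "employee rows;", len(fact_transaction), "transaction rows")
```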
Evaluating the options:
A) A dimension table for Transaction
- Reason for rejection: The transaction fields in the flat extract are GrossAmount and NetAmountPaid, which are numeric measures, plus TransactionID and TransactionDate, which identify which payment occurred and when. Together they describe events, so they belong in a fact table, not a dimension table. A dimension table holds descriptive attributes about entities such as employees, products, or dates.
B) A dimension table for EmployeeTransaction
- Reason for rejection: EmployeeTransaction is not a natural entity for a dimension table. A dimension table describes an entity that is used to slice the data (for example, employees or dates), whereas EmployeeTransaction represents the relationship between an employee and a transaction. The fact table already captures that relationship by holding an employee key on each transaction row, so this does not fit as a dimension.
C) A dimension table for Employee
- Reason for selection: The Employee dimension table would contain descriptive attributes about employees, such as Emp...
Author: Leah · Last updated May 27, 2026
You are designing a dimension table for a data warehouse. The table will track the value of the dimension attributes over time and preserve the history of the data by adding new rows as ...
When designing a dimension table in a data warehouse that will track the value of the dimension attributes over time and preserve the history of the data by adding new rows as the data changes, the goal is to ensure that changes in the dimension attributes are tracked in a way that allows for historical analysis.
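A minimal Python sketch of how Types 0, 1, and 2 react to the same attribute change may help. The dimension, its columns (City, StartDate, EndDate, IsCurrent), and the values are hypothetical; a real warehouse would run this logic in SQL rather than in Python.

```python
from datetime import date


def new_dimension():
    """A hypothetical one-row employee dimension."""
    return [{"EmployeeID": 7, "City": "Seattle", "StartDate": date(2020, 1, 1),
             "EndDate": None, "IsCurrent": True}]


def apply_change(table, scd_type, key, new_city, change_date):
    """Apply one change to the City attribute under SCD Type 0, 1, or 2."""
    current = next(r for r in table if r["EmployeeID"] == key and r["IsCurrent"])
    if scd_type == 0:
        return  # Type 0: keep the original value; the change is ignored
    if scd_type == 1:
        current["City"] = new_city  # Type 1: overwrite in place; the old value is lost
        return
    if scd_type == 2:
        # Type 2: close the current row and add a new row, preserving history
        current["EndDate"] = change_date
        current["IsCurrent"] = False
        table.append({"EmployeeID": key, "City": new_city, "StartDate": change_date,
                      "EndDate": None, "IsCurrent": True})
        return
    raise ValueError(f"unsupported SCD type: {scd_type}")


results = {}
for scd_type in (0, 1, 2):
    table = new_dimension()
    apply_change(table, scd_type, key=7, new_city="Denver", change_date=date(2024, 6, 1))
    results[scd_type] = [(r["City"], r["IsCurrent"]) for r in table]
    print(f"Type {scd_type}:", results[scd_type])
```

Only Type 2 ends with both the old and the new value in the table, which is what the requirement to add rows as the data changes calls for.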
Let's evaluate the types of Slowly Changing Dimensions (SCD):
A) Type 0
- Reason for rejection: Type 0 retains the attribute values originally loaded into the dimension table and ignores later changes from the source. Because neither the new values nor a history of changes is recorded, it does not meet the requirement to "preserve the history of the data" as stated in the question.
B) Type 1
- Reason for rejection: Type 1 involves overwriting the existing dimension data when it changes. This approach does not preserve the historical data since the previous values are lost when an update occurs. It is useful when you want to keep only the most recent data, but it doesn't fit the requirement to track the history of changes.
C) Type 2
- Reason for selection: Type 2 is the best choice when you need to track the history o...
Author: Charlotte · Last updated May 27, 2026
DRAG DROP -
You have data stored in thousands of CSV files in Azure Data Lake Storage Gen2. Each file has a header row followed by a properly formatted carriage return (\r) and line feed (\n).
You are implementing a pattern that batch loads the files daily into a dedicated SQL pool in Azure Synapse Analytics by using PolyBase.
You need to skip the header row when you import the files into the data warehouse. Before building the loading pattern, you need to prepare the required database objects in Azure Synapse Analytics.
Which thr...
Author: Liam · Last updated May 27, 2026
HOTSPOT -
You are building an Azure Synapse Analytics dedicated SQL pool that will contain a fact table for transactions from the first half of the year 2020.
You need to ensure that the table meets the following requirements:
* Minimizes the processing time to delete data that is older than 10 years
* Minimizes the I/O for queries that use year-to-date values
How should...
Author: Zara1234 · Last updated May 27, 2026
You are performing exploratory analysis of the bus fare data in an Azure Data Lake Storage Gen2 account by using an Azure Synapse Analytics serverless SQL pool.
You execute the Transact...
To answer this question, we need to carefully analyze the Transact-SQL query and the options provided.
Let's break down the analysis step by step:
Scenario Breakdown
1. Query Context: The query is executed in an Azure Synapse Analytics serverless SQL pool to perform exploratory analysis of bus fare data in an Azure Data Lake Storage Gen2 account.
2. Analysis Goal: The query intends to select CSV files that are located in a specific subfolder (e.g., `tripdata_2020`).
3. File Naming Pattern: The query most likely selects files through a wildcard in the OPENROWSET `BULK` path (or a `filename()`/`filepath()` predicate in the `WHERE` clause), which is crucial for determining the result.
Exploring the Options
Option A: Only CSV files in the `tripdata_2020` subfolder
- This option suggests that the query will only consider CSV files that reside specifically in the `tripdata_2020` subfolder.
- Reason for rejection: If the query selects files by a name pattern (e.g., files whose names start with "tripdata_2020"), it matches file names within the folder given in the path; it neither requires nor restricts results to a subfolder named `tripdata_2020`.
Option B: All files that have file names beginning with "tripdata_2020"
- This option would capture all files that start with the name "tripdata_2020" regardless of their file type or location.
- Reason for rejection: This is an overly broad option. If the query specifically targets CSV files, it would not include non-CSV files, so this option doesn't match the typical pattern expected from a query filtering by file type and name.
...
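Because the query text is not reproduced here, the following is only an approximation. It assumes the query reads a hypothetical `BULK` path such as `busfare/tripdata_2020*.csv`, treats `*` as matching within a file name only (so files in subfolders are not selected), and uses invented file names.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical paths in the container; the real query and folder layout are not shown.
PATHS = [
    "busfare/tripdata_2020_01.csv",
    "busfare/tripdata_2020_02.csv",
    "busfare/tripdata_2020_01.parquet",
    "busfare/tripdata_2019_12.csv",
    "busfare/old_tripdata_2020.csv",
    "busfare/tripdata_2020/extra.csv",
]


def matches(path: str, bulk_pattern: str) -> bool:
    """Approximate a single-folder wildcard: the folder must match exactly,
    and '*' is applied to the file name only."""
    folder, name = posixpath.split(path)
    pattern_folder, pattern_name = posixpath.split(bulk_pattern)
    return folder == pattern_folder and fnmatchcase(name, pattern_name)


selected = [p for p in PATHS if matches(p, "busfare/tripdata_2020*.csv")]
print(selected)
```

Under these assumptions, only CSV files whose names begin with "tripdata_2020" in the named folder are selected; other extensions, other prefixes, and files inside a `tripdata_2020` subfolder are not.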
Author: Siddharth · Last updated May 27, 2026
DRAG DROP -
You use PySpark in Azure Databricks to parse the following JSON input.
You need to output the data in the following tabular format.
How should you complete the PySpark code? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You m...
Author: Isabella · Last updated May 27, 2026
HOTSPOT -
You are designing an application that will store petabytes of medical imaging data.
When the data is first created, the data will be accessed frequently during the first week. After one month, the data must be accessible within 30 seconds, but files will be accessed infrequently. After one year, the data will be accessed infrequently but must be accessible within five minutes.
You need to select a storage strategy for the data. The solution m...
Author: Ahmed97 · Last updated May 27, 2026
You have an Azure Synapse Analytics Apache Spark pool named Pool1.
You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file.
You need to loa...
To address the scenario, let's evaluate each of the options and determine the best approach for loading JSON files from an Azure Data Lake Storage Gen2 container into tables in the Azure Synapse Analytics Apache Spark pool Pool1, while maintaining the source data types.
Key Factors for the Solution
- Data types: The solution must maintain the original data types from the JSON files when loading them into the tables.
- File structure: The structure and data types vary by file, which suggests that flexibility in handling different data structures is required.
Option A: Use a Conditional Split transformation in an Azure Synapse data flow
- Explanation: The Conditional Split transformation in Azure Synapse data flow allows you to split data into multiple streams based on certain conditions. While it is useful for transforming data before writing it to a destination, it does not directly load data into Spark tables nor does it handle complex schema inference or type preservation from raw JSON files.
- Reason for rejection: This option is more suited for ETL operations where you need to filter or transform data after it's loaded. It doesn’t offer a way to load the data with the required structure or type preservation directly from a JSON file.
Option B: Use a Get Metadata activity in Azure Data Factory
- Explanation: The Get Metadata activity in Azure Data Factory is primarily used to retrieve metadata (e.g., file properties, column names) of the files stored in Azure Data Lake Storage. This can be helpful for dynamically identifying files or directories and their metadata but does not load the data or preserve the data types.
- Reason for rejection: While useful for identifying file properties, this option doesn't help in loading the data or maintaining data types. It’s more of a precursor to other data flow operations.
Option C: Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool
- Explanation:...
Author: Victoria · Last updated May 27, 2026
You have an Azure Databricks workspace named workspace1 in the Standard pricing tier. Workspace1 contains an all-purpose cluster named cluster1.
You need to reduce the time it takes for cluster1...
To address the scenario, let's break down each option and evaluate which one is best suited to reduce the time it takes for cluster1 to start and scale up while minimizing costs.
Key Factors for the Solution:
- Minimize start and scale-up time: The goal is to reduce the time it takes for cluster1 to start and scale up.
- Minimize costs: The solution must help reduce time without incurring unnecessary costs.
- Cluster configuration: The solution should focus on making cluster start and scale-up more efficient.
Option A: Configure a global init script for workspace1
- Explanation: Init scripts in Azure Databricks are used to customize the environment of a cluster when it starts. While these can be useful for configuring dependencies, environment variables, or other configurations when a cluster starts, they do not directly impact the scaling time of the cluster.
- Reason for rejection: Init scripts are helpful for configuring clusters, but they add work at startup rather than removing it, so they do not reduce start or scale-up time. Start time is dominated by acquiring and preparing VMs from the cloud provider, which depends on the cluster's size and configuration and on whether the cluster can draw idle, ready-to-use instances from a pool.
Option B: Create a cluster policy in workspace1
- Explanation: Cluster policies in Azure Databricks allow you to enforce specific cluster configurations, ensuring that clusters are provisioned with certain characteristics like instance types, sizes, or maximum cost. However, cluster policies are used primarily for governance and controlling how clusters are created, not directly for performance improvements in terms of start or scale-up time.
- Reason for rejection: While cluster policies can be useful for standardizing and controlling cluster configurations, they don’t directly affect cluster start and scale-up times. The policy helps ensure the cluster meets specific criteria but doesn't speed up the scaling or startup process.
...
Author: Ethan · Last updated May 27, 2026
HOTSPOT -
You are building an Azure Stream Analytics job that queries reference data from a product catalog file. The file is updated daily.
The reference data input details for the file are shown in the Input exhibit. (Click the Input tab.)
The storage account container view is shown in the Refdata exhibit. (Click the Refdata tab.)
You need to configure the Stream Analytics job to pick up t...
Author: Chloe · Last updated May 27, 2026
HOTSPOT -
You have the following Azure Stream Analytics query.
For each of the following statements, select Yes if the statement is true. Otherwise, sel...
Author: Amira · Last updated May 27, 2026
HOTSPOT -
You are building a database in an Azure Synapse Analytics serverless SQL pool.
You have data stored in Parquet files in an Azure Data Lake Storage Gen2 container.
Records are structured as shown in the following sample.
{
"id": 123,
"address_housenumber": "19c",
"address_line": "Memory Lane",
"applicant1_name": "Jane",
"applicant2_name": "Dev"
}
The records contain two applicants at most.
You need to build a table that includes onl...
Author: MoonlitPantherX · Last updated May 27, 2026
HOTSPOT -
You have an Azure Synapse Analytics dedicated SQL pool named Pool1 and an Azure Data Lake Storage Gen2 account named Account1.
You plan to access the files in Account1 by using an external table.
You need to create a data source in Pool1 that you can reference when you create the external table.
How should you complete the ...
Author: Ethan Smith · Last updated May 27, 2026
You have an Azure subscription that contains an Azure Blob Storage account named storage1 and an Azure Synapse Analytics dedicated SQL pool named Pool1.
You need to store data in storage1. The data will be read by Pool1. The solution must meet the following requirements:
Enable Pool1 to skip columns and rows that are ...
To address the scenario, let's evaluate each of the options and determine which file format best meets the requirements outlined:
Key Requirements:
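- Enable Pool1 to skip unnecessary columns and rows in a query: The file format should use columnar storage, so that only the referenced columns are read, and should store statistics for blocks of rows (such as minimum and maximum values), so that blocks that cannot match a filter are skipped.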
- Automatically create column statistics: The file format should allow for the automatic generation of statistics at the column level to optimize query performance.
- Minimize the size of files: The format should support efficient compression to reduce file size.
Option A: JSON
- Explanation: JSON (JavaScript Object Notation) is a flexible, human-readable file format that is often used for semi-structured data. It does not have built-in support for columnar storage or optimizations like skipping unnecessary columns and rows.
- Reason for rejection: JSON is a row-based format, so a query cannot read individual columns without parsing whole records, and it does not store column statistics. It also has no built-in compression or encoding. JSON files can be compressed externally (for example, with gzip), but they still end up larger than columnar formats, which compress well because similar values are stored together.
- Scenario use: JSON could be useful for highly dynamic or flexible data structures but is not optimal for query performance or storage efficiency in this case.
Option B: Parquet
- Explanation: Parquet is a columnar file format designed to store large datasets efficiently, so only the columns a query needs are read. This aligns with the requirement to skip unnecessary columns. Parquet writers also record column statistics, such as minimum and maximum values for each row group, in the file metadata automatically, which lets readers skip row groups that cannot match a filter. Furthermore, Parquet applies column-level encoding and compression (for example, Snappy), which significantly reduces file sizes compared to row-based formats.
- Why selected: Parquet meets all the requirements:
- It supports columnar storage, enabling the skipping of unnecessary columns in queries.
- It allows for automatic column statistics.
- It is compressed, minimizing the size of files.
- Scenario use: Parquet is optimal for large-scale data processing, where you need both p...
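Parquet stores minimum and maximum values per column chunk in each row group's metadata. The Python sketch below models that idea in memory to show how a reader can skip whole row groups and unrequested columns. It is a simplification with hypothetical data, not the Parquet file layout or Synapse's reader.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Simplified stand-in for a Parquet row group and its column statistics."""
    rows: list   # the rows in this group, as dicts
    stats: dict  # column name -> (min, max), as a writer would record them


def make_row_group(rows):
    """Build a row group and record min/max statistics for every column."""
    stats = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in rows[0]}
    return RowGroup(rows, stats)


def scan(row_groups, column, low, high, wanted):
    """Return the `wanted` columns of rows where low <= column <= high.

    Row groups whose [min, max] range cannot overlap the filter are skipped
    without reading their rows; unrequested columns are never copied out.
    """
    groups_read = 0
    result = []
    for group in row_groups:
        group_min, group_max = group.stats[column]
        if group_max < low or group_min > high:
            continue  # statistics prove no row in this group can match
        groups_read += 1
        for row in group.rows:
            if low <= row[column] <= high:
                result.append({c: row[c] for c in wanted})
    return result, groups_read


def ten_rows_from(first_date_key):
    """Ten hypothetical rows with consecutive DateKey values."""
    return [{"DateKey": d, "Amount": d % 7, "Note": "n/a"}
            for d in range(first_date_key, first_date_key + 10)]


groups = [make_row_group(ten_rows_from(k)) for k in (20210101, 20210201, 20210301)]
rows, groups_read = scan(groups, "DateKey", 20210201, 20210205, wanted=("DateKey", "Amount"))
print(f"read {groups_read} of {len(groups)} row groups, returned {len(rows)} rows")
```

Skipping works best when the data is written sorted or clustered on the filter column, so that each row group covers a narrow range of values.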
Author: Ava · Last updated May 27, 2026
DRAG DROP -
You plan to create a table in an Azure Synapse Analytics dedicated SQL pool.
Data in the table will be retained for five years. Once a year, data that is older than five years will be deleted.
You need to ensure that the data is distributed evenly across partitions. The solution must minimize the amount of time required to delete old data.
How should you complete the Transact-SQL statement? To answer, drag the appropriate values to the correct targets. ...
Author: Noah Williams · Last updated May 27, 2026
HOTSPOT -
You have an Azure Data Lake Storage Gen2 service.
You need to design a data archiving solution that meets the following requirements:
* Data that is older than five years is accessed infrequently but must be available within one second when requested.
* Data that is older than seven years is NOT accessed.
* Costs must be minimized while maintaining the required av...
Author: StarryEagle42 · Last updated May 27, 2026
HOTSPOT -
You plan to create an Azure Data Lake Storage Gen2 account.
You need to recommend a storage solution that meets the following requirements:
* Provides the highest degree of data resiliency
* Ensures that content remains available for writes if a primary data center fails
What should you include in the ...
Author: MoonlitPantherX · Last updated May 27, 2026
You need to implement a Type 3 slowly changing dimension (SCD) for product category data in an Azure Synapse Analytics dedicated SQL pool.
You have a table that was created by using the following Transact-SQL statement.
Which two columns should you add to the table? Each cor...
Author: Matthew · Last updated May 27, 2026
DRAG DROP -
You have an Azure subscription.
You plan to build a data warehouse in an Azure Synapse Analytics dedicated SQL pool named pool1 that will contain staging tables and a dimensional model.
Pool1 will contain the following tables.
You need to design the table storage for pool1. The solution must meet the following requirements:
* Maximize the performance of data loading operations to Staging.WebSessions.
* Minimize query times for reporting queries against the dimensional model.
Which type of table distribution should you use for each table? To answer, drag the appropriate table di...
Author: FlamePhoenix2025 · Last updated May 27, 2026
HOTSPOT -
You have an Azure Synapse Analytics dedicated SQL pool.
You need to create a table named FactInternetSales that will be a large fact table in a dimensional model. FactInternetSales will contain 100 million rows and two columns named SalesAmount and OrderQuantity. Queries executed on FactInternetSales will aggregate the values in SalesAmount and OrderQuantity from the last year for a specific product. The solution must minimize the d...
Author: Emma · Last updated May 27, 2026
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. Table1 contains the following:
* One billion rows
* A clustered columnstore index
* A hash-distributed column named Product Key
* A column named Sales Date that is of the date data type and cannot be null
Thirty million rows will be added to Table1 each month.
You n...
To determine the optimal partitioning strategy for Table1 in Azure Synapse Analytics, let's consider the following key factors:
Key Factors:
1. Data Characteristics:
- The table has one billion rows, and 30 million rows will be added each month.
- The partitioning should focus on optimizing query performance and data loading.
2. Partitioning Strategy:
- The table already has a clustered columnstore index (CCI) and is hash-distributed on Product Key, so the remaining decision is how finely to partition it.
- The goal is to partition the table based on the Sales Date column (which is of the date data type) to optimize queries, especially those that filter based on this date, and to handle monthly data loading efficiently.
Analyzing the Partitioning Frequency Options:
Option A: Once per month
- Explanation: With 30 million new rows per month, a monthly partition would hold about 30 million rows. Because a dedicated SQL pool splits every table across 60 distributions, that is only about 500,000 rows per distribution in each partition.
- Reason for rejection: Clustered columnstore indexes compress best when rowgroups reach about 1 million rows, and Microsoft's guidance is to have at least 1 million rows per distribution per partition, which means at least 60 million rows per partition. Monthly partitions fall well below that, leaving rowgroups undersized and reducing both compression and query performance.
Option B: Once per year
- Explanation: A yearly partition would hold about 360 million rows (12 × 30 million), or about 6 million rows per distribution, which comfortably meets the 1-million-row guideline. Queries that filter on Sales Date still benefit from partition elimination at the year level, and columnstore rowgroup elimination narrows the scan further within a year.
- Why selected: Yearly partitioning keeps each partition's rowgroups fully populated while still allowing partition elimination by year, which optimizes both query performance and the compression of newly loaded data.
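Microsoft's guidance for clustered columnstore tables in a dedicated SQL pool is at least 1 million rows per distribution per partition, and every table spans 60 distributions. The Python sketch below applies that guideline to the 30 million rows added each month; the daily and weekly figures are averages (12 months of rows spread over 365 days or 52 weeks).

```python
NUM_DISTRIBUTIONS = 60                   # every dedicated SQL pool table spans 60 distributions
MIN_ROWS_PER_DIST_PARTITION = 1_000_000  # columnstore guideline per distribution per partition
ROWS_PER_MONTH = 30_000_000              # from the scenario
ROWS_PER_YEAR = ROWS_PER_MONTH * 12

rows_per_partition = {
    "day": ROWS_PER_YEAR / 365,
    "week": ROWS_PER_YEAR / 52,
    "month": ROWS_PER_MONTH,
    "year": ROWS_PER_YEAR,
}

per_distribution = {grain: rows / NUM_DISTRIBUTIONS for grain, rows in rows_per_partition.items()}

for grain, rows in per_distribution.items():
    verdict = "meets" if rows >= MIN_ROWS_PER_DIST_PARTITION else "below"
    print(f"{grain:>5}: {rows:>12,.0f} rows per distribution per partition ({verdict} guideline)")
```

Only the yearly grain clears the threshold: monthly partitions reach 500,000 rows per distribution, and weekly or daily partitions far fewer.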
Author: Olivia Johnson · Last updated May 27, 2026
You have an Azure Databricks workspace that contains a Delta Lake dimension table named Table1.
Table1 is a Type 2 slowly changing dimension (SCD) table.
You need to apply updates f...
In the context of Delta Lake in Azure Databricks, the requirement is to apply updates from a source table to a Type 2 slowly changing dimension (SCD) table. Type 2 SCD manages historical data: a change in a dimension attribute (like a customer's address or status) inserts a new version of the record and marks the previous version as no longer current, rather than overwriting it, which preserves the history of changes.
Let's break down the available options:
Key Factors:
- Type 2 Slowly Changing Dimension (SCD): This method involves maintaining historical records. New records are inserted for changes, while existing records might be "expired" by marking them as outdated (e.g., with an `end_date` or `is_active` flag) and creating a new version of the record with the updated values.
- The solution must be able to handle this pattern of updating existing records and inserting new records efficiently.
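The row-level decisions this pattern requires can be sketched in Python. Delta Lake supports `MERGE INTO`, which is commonly used to express this pattern in a single atomic statement; the sketch below only mirrors the matching logic on in-memory data and is not Delta Lake code. The column names (`customer_id`, `city`, `start_date`, `end_date`, `is_active`) are hypothetical.

```python
from datetime import date


def scd2_upsert(dimension, updates, key, tracked, as_of):
    """Apply source rows to a Type 2 dimension stored as a list of dicts.

    - Key matches an active row and a tracked column changed: expire that row,
      then insert a new active version.
    - Key not present: insert a new active row.
    - Key matches and nothing changed: leave the dimension untouched.
    """
    active = {row[key]: row for row in dimension if row["is_active"]}
    for src in updates:
        existing = active.get(src[key])
        if existing is not None and all(existing[c] == src[c] for c in tracked):
            continue  # unchanged: no new version needed
        if existing is not None:
            existing["is_active"] = False  # expire the old version
            existing["end_date"] = as_of
        new_row = {key: src[key], **{c: src[c] for c in tracked},
                   "start_date": as_of, "end_date": None, "is_active": True}
        dimension.append(new_row)
        active[src[key]] = new_row
    return dimension


dim = [
    {"customer_id": 1, "city": "Oslo", "start_date": date(2023, 1, 1), "end_date": None, "is_active": True},
    {"customer_id": 2, "city": "Rome", "start_date": date(2023, 1, 1), "end_date": None, "is_active": True},
]
updates = [
    {"customer_id": 1, "city": "Bergen"},  # changed attribute
    {"customer_id": 2, "city": "Rome"},    # unchanged
    {"customer_id": 3, "city": "Lima"},    # new key
]
scd2_upsert(dim, updates, key="customer_id", tracked=("city",), as_of=date(2024, 5, 1))
print([(r["customer_id"], r["city"], r["is_active"]) for r in dim])
```

The expire-then-insert step is why neither a plain UPDATE nor a plain INSERT is enough on its own.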
Option A: CREATE
- Explanation: The CREATE statement in SQL is typically used to create new database objects (e.g., tables, views, or indexes). It is not used to modify data within a table.
- Reason for rejection: This option is not relevant for updating or merging data. It would only be useful when creating a table, not for applying updates or handling slowly changing dimensions.
Option B: UPDATE
- Explanation: The UPDATE statement in SQL modifies existing records in a table. It can expire the current version of a record in a Type 2 SCD, but Type 2 also requires inserting a new row for each change, which UPDATE cannot do on its own.
- Reason for rejection: UPDATE would only modify existing rows but would not handle the creation of new records or the management of historical data. It doesn't preserve the history of changes, which is essential for Type 2 SCD.
Option C...
Author: Layla · Last updated May 27, 2026
You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use in an analytical workload.
You need to recommend a format for the transformed files. The solution must meet the following requirements:
* Contain information about the data types of each column in the files.
* Support quer...
To address the scenario, let's evaluate each file format based on the given requirements:
Key Requirements:
1. Contain information about the data types of each column in the files: This implies that the file format should allow for schema information to be stored along with the data.
2. Support querying a subset of columns: The file format must support columnar storage, allowing for efficient querying of only the necessary columns.
3. Support read-heavy analytical workloads: The format should be optimized for fast, efficient read operations, especially for large datasets typically encountered in analytical workloads.
4. Minimize the file size: The format should support efficient data compression to reduce storage and improve performance.
Option A: JSON
- Explanation: JSON is a popular format for storing semi-structured data. However, it is row-based rather than columnar, and it carries no declared schema: each value has only a basic type (string, number, Boolean, null, object, or array), and nothing guarantees that a field has the same type in every record. It also has no built-in compression.
- Reason for rejection: JSON is not optimized for querying a subset of columns, and it does not efficiently support read-heavy analytical workloads. Its files also tend to be larger than those in binary formats such as Parquet or Avro. Because it lacks a declared schema and does not meet the performance and size requirements, it is not suitable for this scenario.
Option B: CSV
- Explanation: CSV is a simple, widely-used format for storing tabular data. It is human-readable and easy to use, but like JSON, it is row-based and does not store schema information alongside the data. CSV files also lack support for efficient columnar storage, making them inefficient for querying specific columns.
- Reason for rejection: CSV files do not contain data type information (schema), and they are not optimized for read-heavy analytical workloads. Moreover, CSV files tend to be larger because they lack advanced compression mechanisms available in formats like Parquet. Therefore, CSV does not meet the performance or storage efficiency requirements.
Option C: Apache Avro
- Explanation: Apache Avro...
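To illustrate the first requirement, the Python sketch below parses the same two hypothetical records from CSV and from JSON. CSV returns every value as text, while JSON keeps basic per-value types. Neither declares a schema for the file, which formats such as Avro (schema in the file header) and Parquet (schema in the file footer) do.

```python
import csv
import io
import json

# The same two hypothetical records, serialized as CSV and as JSON.
csv_text = "id,amount,is_refund\r\n1,19.99,false\r\n2,5.00,true\r\n"
json_text = ('[{"id": 1, "amount": 19.99, "is_refund": false},'
             ' {"id": 2, "amount": 5.00, "is_refund": true}]')

csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
json_rows = json.loads(json_text)

# CSV carries no type information: every field is read back as a string.
csv_types = {name: type(value).__name__ for name, value in csv_rows[0].items()}
# JSON keeps a basic type per value, but nothing enforces it across records.
json_types = {name: type(value).__name__ for name, value in json_rows[0].items()}

print("CSV: ", csv_types)
print("JSON:", json_types)
```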
Author: Lucas · Last updated May 27, 2026
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You pla...
Key factors to consider:
- The task is to copy data from an Azure Storage account to an enterprise data warehouse in Azure Synapse Analytics.
- You need to prepare the files for quick data copying to the warehouse.
Analyzing the proposed solution:
Solution: Modify the files to ensure that each row is less than 1 MB.
- The row size: The proposed solution reduces each row to less than 1 MB. This matters because PolyBase, a common high-throughput way to load files into a dedicated SQL pool, cannot load rows that contain more than 1,000,000 bytes of data. With 75% of the rows averaging 1.1 MB, most rows would otherwise exceed that limit.
- Azure Synapse Analytics dedicated SQL pools store tables as clustered columnstore indexes by default, regardless of the source file format, which is optimized for analytical workloads.
- Beyond row size, copy speed also depends on the source file format (e.g., Parquet, CSV), how the data is split into files, and the total size of the files.
Key Points:
1. Row size: While reducing the row size might improve the performance of reading individual rows, Azure Synapse Analytics often han...
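Microsoft's PolyBase loading guidance states that PolyBase cannot load rows containing more than 1,000,000 bytes of data. A quick pre-load check for such rows could look like the Python sketch below; the sample rows are hypothetical, and it assumes one row per line of UTF-8 text.

```python
# PolyBase limit on the data in a single row when loading from text files.
POLYBASE_MAX_ROW_BYTES = 1_000_000


def oversized_rows(lines):
    """Return (line_number, size_in_bytes) for every row above the limit."""
    result = []
    for number, line in enumerate(lines, start=1):
        size = len(line.encode("utf-8"))
        if size > POLYBASE_MAX_ROW_BYTES:
            result.append((number, size))
    return result


# Hypothetical rows: the second one mimics a 1.1 MB description field.
sample = ["1,short description", "2," + "x" * 1_100_000, "3,another short row"]
print(oversized_rows(sample))
```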
Author: Leah Davis · Last updated May 27, 2026
You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.
You need to create the table to meet the following requirements:
* Provide the fastest query tim...
When designing a dimension table in Azure Synapse Analytics, you need to ensure that you minimize data movement and provide the fastest query times. Below is an analysis of each option and why one might be selected over others:
A) Replicated Table
- Description: A replicated table keeps a full copy of the table on each Compute node in the dedicated SQL pool. Because every node has a local copy, joins against the table need no data movement.
- Best Use Case: Replicated tables are best for small dimension or lookup tables that are frequently used in joins. Microsoft's guidance suggests replicating tables smaller than about 2 GB compressed, so a table under 1 GB is a good fit.
- Why Selected: A table under 1 GB is cheap to copy to every Compute node, so the data is available locally wherever a query runs. This minimizes data movement and provides faster query times when joining with large fact tables.
B) Hash Distributed Table
- Description: A hash-distributed table is divided into distributions based on a hash function on a specific column (usually a key). This helps in balancing data across distributions for large tables.
- Best Use Case: Hash distribution works well for large fact tables or tables that need to be joined on specific columns (e.g., foreign keys). However, the overhead of hashing and distributing data across distributions can introduce delays for small tables.
- Why Rejected: For a small table (less than 1 GB), hash distribution could introduce unnecessary complexity and overhead. A smaller table is unlikely to benefit much from hash distribution, and it may even increase query latency due to data redistri...
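The sizing rule behind options A and B can be written down as a small decision helper. This is a simplified sketch of Microsoft's general table-distribution guidance (replicate small tables, roughly under 2 GB compressed; hash-distribute large fact tables; use round-robin for staging tables). The function name and the threshold constant are illustrative assumptions, and real designs also weigh how often a table is modified.

```python
from enum import Enum


class Distribution(Enum):
    REPLICATE = "REPLICATE"
    HASH = "HASH"
    ROUND_ROBIN = "ROUND_ROBIN"


# Rule of thumb (assumption for this sketch): replicate tables under ~2 GB compressed.
REPLICATE_MAX_GB = 2.0


def suggest_distribution(compressed_size_gb: float, is_staging: bool = False) -> Distribution:
    """Suggest a DISTRIBUTION option for a dedicated SQL pool table (simplified)."""
    if is_staging:
        # Staging tables are usually loaded fastest with round-robin distribution.
        return Distribution.ROUND_ROBIN
    if compressed_size_gb < REPLICATE_MAX_GB:
        # Small tables: keep a full copy on every Compute node to avoid data movement.
        return Distribution.REPLICATE
    # Large tables (typically facts): spread rows across distributions by hashing a column.
    return Distribution.HASH


if __name__ == "__main__":
    print(suggest_distribution(0.9))    # Distribution.REPLICATE
    print(suggest_distribution(250.0))  # Distribution.HASH
```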
Author: David · Last updated May 27, 2026
You are designing a dimension table in an Azure Synapse Analytics dedicated SQL pool.
You need to create a surrogate key for the table. The solution must provide th...
When designing a surrogate key for a dimension table in Azure Synapse Analytics, the goal is to select the most efficient option for generating the surrogate key, ensuring the fastest query performance while maintaining scalability and simplicity.
A) A GUID Column
- Description: A GUID (Globally Unique Identifier) is a 128-bit value used as a unique identifier. GUIDs can be generated either client-side or server-side.
- Best Use Case: GUIDs are typically used when unique identification is needed across systems or for distributed systems where keys must be unique without coordination.
- Why Rejected: GUIDs are unique but not sequential, and at 16 bytes they are larger than 4-byte `int` or 8-byte `bigint` keys. Random values scatter inserts across rowstore indexes, causing fragmentation, and larger keys cost more storage and slower comparisons in joins, which leads to higher I/O during inserts and poorer query performance.
B) A Sequence Object
- Description: A sequence object is a database object that generates a series of numeric values in a sequential manner, similar to an auto-incrementing counter.
- Best Use Case: In SQL Server, sequence objects are useful when you need sequential numbers with more control over how they are generated (e.g., one sequence shared across multiple tables). Dedicated SQL pools, however, do not support sequence objects.
- Why Rejected: While a sequence can provide sequential keys, it introduces overhead due to the need to manage the sequence s...
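The difference between GUID keys and the IDENTITY property, which Microsoft documents for creating surrogate keys in dedicated SQL pools, comes down to key order and key size. The sketch below compares random version 4 GUIDs with IDENTITY-style integers in plain Python; it models the key values only, not how a dedicated SQL pool stores them. In a dedicated SQL pool, IDENTITY values are unique but not guaranteed to be contiguous, which this sketch does not model.

```python
import uuid
from typing import List, Sequence


def is_strictly_increasing(keys: Sequence) -> bool:
    """True when every key is larger than the one before it."""
    return all(a < b for a, b in zip(keys, keys[1:]))


def identity_keys(n: int, seed: int = 1, increment: int = 1) -> List[int]:
    """IDENTITY(seed, increment)-style values in insertion order (no gaps modeled)."""
    return [seed + i * increment for i in range(n)]


def guid_keys(n: int) -> List[uuid.UUID]:
    """Random version 4 GUIDs, generated independently of insertion order."""
    return [uuid.uuid4() for _ in range(n)]


if __name__ == "__main__":
    ints, guids = identity_keys(1000), guid_keys(1000)
    print(is_strictly_increasing(ints))   # True
    print(is_strictly_increasing(guids))  # almost certainly False
    print(len(guids[0].bytes), "bytes per GUID vs 4 for int, 8 for bigint")
```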
Author: Henry · Last updated May 27, 2026
HOTSPOT
You have an Azure Data Lake Storage Gen2 account that contains a container named container1. You have an Azure Synapse Analytics serverless SQL pool that contains a native external table named dbo.Table1. The source data for dbo.Table1 is stored in container1. The folder structure of container1 is shown in the following exhibit.
The external data source is defined by using the fo...
Author: Ahmed97 · Last updated May 27, 2026
You have an Azure Synapse Analytics dedicated SQL pool.
You need to create a fact table named Table1 that will store sales data from the last three years. The solution must be optimized for the following query operations:
* Show order counts by week.
* Calculate sales totals by region.
*...
When optimizing a fact table in Azure Synapse Analytics, partitioning plays a crucial role in query performance. The goal is to partition the table in a way that maximizes query efficiency for the given operations. Let's analyze each option in the context of the query operations:
A) Product
- Description: Partitioning by product means that each product’s sales data would be stored in separate partitions.
- Best Use Case: This is ideal for queries that aggregate or filter based on products.
- Why Rejected: While partitioning by product could be useful for queries that specifically filter by product, the queries you mentioned (such as showing order counts by week or calculating sales totals by region) are more likely to benefit from partitioning based on time or geography. Product-based partitioning would not optimize queries that involve aggregations or filters by time or region.
B) Month
- Description: Partitioning by month means that each partition will contain data for one specific month.
- Best Use Case: This is useful for time-based queries, such as those that analyze data for specific months.
- Why Rejected: While partitioning by month would optimize queries for finding orders from a given month, it’s less effective for the other query types. For example, queries that aggregate sales by week, region, or product would require scanning multiple partitions, which could decrease query performance compared to a partitioning strategy based on week or region.
C) Week
- Description: Partitioning by week means that eac...
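The claim that partition granularity should match the query filters can be checked with a small calculation. Assuming ISO week numbering (the question does not define how weeks are numbered), the sketch counts how many monthly and weekly partitions a query for one calendar month would touch. The function name is illustrative.

```python
from datetime import date, timedelta
from typing import Set, Tuple


def partitions_touched(start: date, end: date) -> Tuple[Set[Tuple[int, int]], Set[Tuple[int, int]]]:
    """Return the (year, month) and ISO (year, week) partition keys covered by [start, end]."""
    months: Set[Tuple[int, int]] = set()
    weeks: Set[Tuple[int, int]] = set()
    day = start
    while day <= end:
        months.add((day.year, day.month))
        iso = day.isocalendar()
        weeks.add((iso[0], iso[1]))  # ISO year and ISO week number
        day += timedelta(days=1)
    return months, weeks


if __name__ == "__main__":
    months, weeks = partitions_touched(date(2021, 1, 1), date(2021, 1, 31))
    print(len(months), len(weeks))  # 1 5  (Jan 1-3, 2021 fall in ISO week 53 of 2020)
    print(3 * 12, "monthly partitions vs about", 3 * 52, "weekly partitions for three years")
```

Finer partitions also mean fewer rows per partition, which matters for clustered columnstore compression in a dedicated SQL pool.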
Author: MoonlitPantherX · Last updated May 27, 2026
You are designing the folder structure for an Azure Data Lake Storage Gen2 account.
You identify the following usage patterns:
* Users will query data by using Azure Synapse Analytics serverless SQL pools and Azure Synapse Analytics serverless Apache Spark pools.
* Most queries will include a filter on the current year or week.
* Data will be secured by data source.
You need to recommend a folder struc...
When designing a folder structure for Azure Data Lake Storage Gen2, the primary objectives are to support the usage patterns, simplify folder security, and minimize query times. Let's evaluate each folder structure option in detail based on the provided requirements:
Key Requirements:
- Usage patterns: Queries will frequently filter data by year or week.
- Security: Data will be secured by data source.
- Performance: Query times need to be minimized, especially for filtering by year or week.
Evaluation of Options:
A) `\DataSource\SubjectArea\YYYY\WW\FileData_YYYY_MM_DD.parquet`
- Structure: Data is organized first by data source, then by subject area, followed by year (`YYYY`), then week (`WW`), with the data files named by date (`YYYY_MM_DD`).
- Advantages:
  - This structure places data in a hierarchy that aligns with both year and week, enabling efficient filtering when querying by either of these dimensions.
  - Data is grouped by data source first, allowing for easier security management at the data source level.
  - Queries can directly filter on `YYYY` and `WW` to minimize the amount of data read.
- Why Selected: This structure is optimized for performance because it allows for quick filtering by year and week. Additionally, the data source is at the top of the hierarchy, simplifying access control and security. It also reduces the need for unnecessary scanning of data when filtering by `YYYY` and `WW`.
B) `\DataSource\SubjectArea\YYYY-WW\FileData_YYYY_MM_DD.parquet`
- Structure: Data is organized by data source, subject area, then by a combined year-week (`YYYY-WW`), with the file names containing the date.
- Advantages:
  - Combining `YYYY` and `WW` into a single folder reduces the number of folders, making the structure simpler.
  - This might be useful if your data is primarily queried by specific week combinations.
- Why Rejected: While the structure combines year and week, it doesn't provide the granularity to efficiently filter by just `YYYY` or `WW` independently. This could be less efficient if you frequently query by one or the other. Security is not the deciding factor here, because the data source is still the top-level folder, just as in option A.
C) `\DataSource\SubjectArea\WW\YYYY\FileData_YYYY_MM_DD.parquet`
- Structure: Data is organ...
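A writer that follows option A's layout needs a way to derive the year and week folders from a record's date. The sketch below is a hypothetical helper that uses `/` as the path separator, as ADLS Gen2 paths do, and assumes ISO year and week numbers so that a week never straddles two year folders; the question itself does not say how `WW` is computed.

```python
from datetime import date
from typing import Tuple


def year_week(day: date) -> Tuple[int, int]:
    """ISO (year, week) for a date; Jan 1, 2021 belongs to ISO week 53 of 2020."""
    iso = day.isocalendar()
    return iso[0], iso[1]


def week_folder(data_source: str, subject_area: str, day: date) -> str:
    """Folder that holds every file for the week containing `day`."""
    year, week = year_week(day)
    return f"/{data_source}/{subject_area}/{year:04d}/{week:02d}/"


def file_path(data_source: str, subject_area: str, day: date) -> str:
    """Full path following /DataSource/SubjectArea/YYYY/WW/FileData_YYYY_MM_DD.parquet."""
    return week_folder(data_source, subject_area, day) + f"FileData_{day:%Y_%m_%d}.parquet"


if __name__ == "__main__":
    print(file_path("Sales", "Orders", date(2021, 1, 13)))
    # /Sales/Orders/2021/02/FileData_2021_01_13.parquet
```

A query for the current week can then read only the folder returned by `week_folder`, and a query for the current year only the `/DataSource/SubjectArea/YYYY/` prefix above it, while access control lists can be set once per data source folder.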
Author: Sofia2021 · Last updated May 27, 2026
You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a table named table1.
You load 5 TB of data into table1.
You need to ensure that columnsto...
To maximize columnstore compression in an Azure Synapse Analytics dedicated SQL pool, it's crucial to use an operation that optimizes or rebuilds the columnstore index, as columnstore indexes are the key mechanism for compression in Synapse. Here’s the breakdown of each option:
A) DBCC INDEXDEFRAG (pool1, table1)
- Explanation: `DBCC INDEXDEFRAG` defragments rowstore indexes and is deprecated in SQL Server. It does not rebuild or recompress columnstore rowgroups, so it cannot maximize columnstore compression.
- Why Rejected: This operation is useful for traditional indexes, but columnstore compression is not impacted by defragmenting an index. It's not the right choice for optimizing compression.
B) DBCC DBREINDEX (table1)
- Explanation: `DBCC DBREINDEX` rebuilds one or more indexes for a table in a database. It is deprecated in SQL Server, is not available in dedicated SQL pools, and does not address columnstore index compression directly.
- Why Rejected: While rebuilding indexes is important for performance, `DBCC DBREINDEX` cannot be used in Azure Synapse Analytics dedicated SQL pools. Columnstore indexes are rebuilt with `ALTER INDEX` statements instead.
C) ALTER INDEX ALL on table1 REORGANIZE
- Explanation: `A...
Author: RadiantJaguar56 · Last updated May 27, 2026
You have an Azure Synapse Analytics dedicated SQL pool named pool1.
You plan to implement a star schema in pool1 and create a new table named DimCustomer by using the following code.
You need to ensure that DimCustomer has the necessary columns to support a Type 2 slowly changing dimension (SCD).
Wh...
To support a Type 2 Slowly Changing Dimension (SCD) in a star schema, you need to ensure that the table contains columns that track changes over time. A Type 2 SCD maintains historical data by adding new records when data changes, with a specific focus on the start and end dates of the change.
Key Elements of Type 2 SCD:
1. Effective Start Date: Indicates when the record version became active.
2. Effective End Date: Indicates when the record version became inactive or ended.
3. Current Flag (Optional): A flag to indicate whether a record is the current version, but this is not mentioned here.
4. Surrogate key (RowID): Uniquely identifies each version of a record, because the business key repeats in every version.
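The mechanics described above can be sketched in a few lines of Python. This is an in-memory model, not Synapse code; the column names (`row_id`, `customer_id`, `city`), the 9999-12-31 "open" end date, and the convention that the expired row's end date equals the new row's start date are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List

OPEN_END = date(9999, 12, 31)  # assumed sentinel for "still current" (end date is NOT NULL)


@dataclass(frozen=True)
class CustomerVersion:
    row_id: int            # surrogate key: unique for every version
    customer_id: int       # business key: repeats across versions
    city: str              # hypothetical tracked attribute
    effective_start: date  # when this version became active
    effective_end: date    # when this version stopped being active


def apply_type2_change(rows: List[CustomerVersion], customer_id: int,
                       new_city: str, change_date: date) -> List[CustomerVersion]:
    """Return a new table state after applying one Type 2 change."""
    current = next((r for r in rows
                    if r.customer_id == customer_id and r.effective_end == OPEN_END), None)
    if current is not None and current.city == new_city:
        return rows  # nothing changed, so no new version
    next_id = max((r.row_id for r in rows), default=0) + 1
    new_version = CustomerVersion(next_id, customer_id, new_city, change_date, OPEN_END)
    if current is None:
        return rows + [new_version]  # first version of a new customer
    # Expire the current version by closing its date range; its history is kept.
    expired = replace(current, effective_end=change_date)
    return [expired if r is current else r for r in rows] + [new_version]


if __name__ == "__main__":
    table = apply_type2_change([], 100, "Paris", date(2020, 1, 1))
    table = apply_type2_change(table, 100, "Lyon", date(2021, 6, 1))
    for row in table:
        print(row)
```

Only the old row's end date changes; the new attribute value arrives in a new row with its own surrogate key.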
Explanation of Each Option:
A) [HistoricalSalesPerson] [nvarchar] (256) NOT NULL
- Explanation: This column might track the salesperson associated with a customer at some point in time. However, it does not directly support the Type 2 SCD process, which requires tracking changes in the data and time periods of those changes.
- Why Rejected: This column doesn’t help track changes in customer data over time. It's more related to a business dimension, not necessary for the SCD mechanism.
B) [EffectiveEndDate] [datetime] NOT NULL
- Explanation: The `EffectiveEndDate` column tracks the date when a version of the record is no longer valid, which is critical for Type 2 SCD. It marks when the previous version of a customer record ended.
- Why Selected: This column is necessary for Type 2 SCD because it allows you to track the historical end of each record. It helps maintain accurate time periods for when customer data was valid.
C) [PreviousModifiedDate] [datetime] NOT NULL
- Explanation: This column could be used to store the...
Author: Madison · Last updated May 27, 2026
HOTSPOT
You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool.
You plan to deploy a solution that will analyze sales data and include the following:
* A table named Country that will contain 195 rows
* A table named Sales that will contain 100 million rows
* A query to identify total sales by country and customer from the past 30 days
You need to create the tables. The solution...
Author: Olivia · Last updated May 27, 2026
You have an Azure subscription that contains an Azure Data Lake Storage Gen2 account named account1 and an Azure Synapse Analytics workspace named workspace1.
You need to create an external table in a serverless SQL pool in workspace1. The external table will reference ...
To create an external table in a serverless SQL pool in Azure Synapse Analytics that references CSV files stored in an Azure Data Lake Storage Gen2 account, we must focus on maximizing performance and ensuring secure authentication. Here’s the breakdown of each option:
A) Use a native external table and authenticate by using a shared access signature (SAS)
- Explanation: A native external table is designed to work directly with data in a data lake or blob storage. SAS tokens are typically used for granular, time-limited access to resources, but they aren't always the most efficient or secure way to authenticate in production environments.
- Why Rejected: While SAS tokens can provide access to the files, they can lead to performance bottlenecks, especially when dealing with large datasets, due to their temporary nature and possible overhead. Additionally, SAS tokens can be a security risk if not handled properly.
B) Use a native external table and authenticate by using a storage account key
- Explanation: This option uses a native external table, which directly references data stored in Data Lake or Blob Storage. Using a storage account key for authentication is secure, and it can provide higher performance compared to SAS tokens because it avoids the overhead of token-based authentication.
- Why Selected: This is the most efficient and reliable choice for external tables in Synapse. The storage account key provides full access to the storage account and is faster for large-scale queries because there’s no need for the overhead associated with token management. It is more suited for high-performance scenarios, which is critical when dealing with large datasets in Synap...
Author: James · Last updated May 27, 2026
HOTSPOT
You have an Azure Synapse Analytics serverless SQL pool that contains a database named db1. The data model for db1 is shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each state...
Author: Ming88 · Last updated May 27, 2026
You have an Azure Databricks workspace and an Azure Data Lake Storage Gen2 account named storage1.
New files are uploaded daily to storage1.
You need to recommend a solution that configures storage1 as a structured streaming source. The solution must meet the following requirements:
* Incrementally process new files as they are uploaded to storage1.
* Minimize implementation and m...
To address the requirements, let's evaluate each of the options:
A) COPY INTO
- Description: COPY INTO is a SQL command. In Azure Databricks it loads files from cloud storage into Delta Lake tables in batches; Azure Synapse Analytics has its own COPY statement for loading dedicated SQL pool tables.
- Evaluation: Databricks COPY INTO is idempotent and skips files it has already loaded, so scheduled reruns pick up only new files. However, it is a batch command that must be triggered each time, not a Structured Streaming source, and it does not offer the same low-maintenance, continuous file ingestion.
- Why Rejected: It is batch-oriented and cannot act as a Structured Streaming source, which doesn't align with the need for continuous, incremental processing.
B) Azure Data Factory
- Description: Azure Data Factory (ADF) is a powerful ETL service that can orchestrate data movement and transformation from various data sources. It supports scheduled data pipeline runs.
- Evaluation: ADF can move data from Azure Data Lake Storage Gen2 to various destinations, and its mapping data flows can handle schema drift, but it does not natively support streaming. While ADF can load data incrementally, it runs as scheduled or triggered batch pipelines rather than as a Structured Streaming source.
- Why Rejected: ADF is better suited for batch-oriented processes and does not offer incremental stream processing or the level of ease and cost-effectiveness required for processing millions of small files continuously.
C) Auto Loader
- Description: Auto Loader is an Azure Databricks feature that efficiently ingests large amounts of data from cloud storage. It automatically detects new files in a source directory and processes them incrementally as they arrive, using Structured Streaming.
- Evaluation: Auto Loader is built spec...
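In Azure Databricks, Auto Loader is used through Structured Streaming with the `cloudFiles` source (`spark.readStream.format("cloudFiles")`), and it records which files it has already ingested in the stream's checkpoint location. The toy model below is plain Python, not Auto Loader code; it only illustrates incremental file discovery, where each run processes the files that were not seen in earlier runs. The function name and the in-memory set standing in for a checkpoint are assumptions.

```python
from typing import Iterable, List, Set


def discover_new_files(listing: Iterable[str], checkpoint: Set[str]) -> List[str]:
    """Return files in `listing` not yet recorded in `checkpoint`, then record them.

    `listing` stands in for a directory listing of storage1; `checkpoint`
    stands in for the persisted record of already-ingested files.
    """
    new_files = sorted(name for name in set(listing) if name not in checkpoint)
    checkpoint.update(new_files)
    return new_files


if __name__ == "__main__":
    seen: Set[str] = set()
    print(discover_new_files(["day1.json"], seen))               # ['day1.json']
    print(discover_new_files(["day1.json", "day2.json"], seen))  # ['day2.json']
    print(discover_new_files(["day1.json", "day2.json"], seen))  # []
```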
Author: Benjamin · Last updated May 27, 2026
You have an Azure subscription that contains the resources shown in the following table.
You need to read the TSV files by using ad-hoc queries and the OPENROWSET function. The solution must assign a name and overri...
To read TSV files by using ad-hoc queries and the `OPENROWSET` function in Azure, with the goal of assigning a name and overriding the inferred data type of each column, we must consider the correct approach to achieving this.
Let's analyze each option and its relevance to the task:
A) The WITH clause
- The `WITH` clause of the `OPENROWSET` function defines the schema of the returned rowset: the name and data type of each column and, optionally, the ordinal position of the source column it reads.
- In this case, it enables overriding the default inferred data types of the TSV columns by specifying explicit column definitions and names.
- This option is ideal because it allows you to both assign names and override inferred data types for the columns in the TSV file.
B) The ROWSET_OPTIONS bulk option
- `ROWSET_OPTIONS` takes a JSON string of extra read options, such as `{"READ_OPTIONS":["ALLOW_INCONSISTENT_READS"]}`, which lets a query ignore files that are modified while it runs.
- It does not provide a way to assign names to columns or override their inferred data types.
- This option is not suitable for this scena...
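What the WITH clause does can be shown by analogy in Python: instead of letting a reader infer column types, an explicit schema supplies each column's name and type by position. This is not OPENROWSET or any Synapse API; the helper and the sample columns are hypothetical. The `ZipCode` case shows why overriding inference matters: a value such as `007` inferred as a number would lose its leading zeros.

```python
import csv
import io
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Each schema entry is (column name, conversion function), like "name type" in a WITH clause.
Schema = Sequence[Tuple[str, Callable[[str], Any]]]


def read_tsv_with_schema(text: str, schema: Schema, header_row: bool = True) -> List[Dict[str, Any]]:
    """Parse TSV text, naming and typing columns by position from `schema`."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if header_row:
        rows = rows[1:]  # ignore the file's own column names
    result = []
    for raw in rows:
        if len(raw) != len(schema):
            raise ValueError(f"expected {len(schema)} columns, got {len(raw)}")
        result.append({name: convert(value) for (name, convert), value in zip(schema, raw)})
    return result


if __name__ == "__main__":
    tsv = "id\tzip\tamount\n1\t007\t19.50\n"
    print(read_tsv_with_schema(tsv, [("OrderId", int), ("ZipCode", str), ("Amount", float)]))
    # [{'OrderId': 1, 'ZipCode': '007', 'Amount': 19.5}]
```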
Author: Ravi Patel · Last updated May 27, 2026
You have an Azure Synapse Analytics dedicated SQL pool.
You plan to create a fact table named Table1 that will contain a clustered columnstore index.
You need to optimize data compression and query performance for Table...
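When designing a clustered columnstore index in Azure Synapse Analytics for a fact table like Table1, optimizing data compression and query performance is a critical factor. Partitioning can improve performance by breaking the data into more manageable units, but only when each partition holds enough rows: every table in a dedicated SQL pool is already divided across 60 distributions, and each partition is split across those same distributions.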
Let's analyze each option in terms of its relevance to this task:
A) 100,000
- 100,000 rows is far too few to partition a clustered columnstore table. Spread across the 60 distributions of a dedicated SQL pool, 100,000 rows leaves fewer than 2,000 rows per distribution, far below the roughly 1 million rows per distribution and partition that columnstore compression needs to build full rowgroups.
- This option is rejected because the data volume is too small to fully benefit from partitioning in a dedicated SQL pool with a clustered columnstore index.
B) 600,000
- 600,000 rows is still too small. Spread across 60 distributions, it is only 10,000 rows per distribution, so adding partitions would split the data into even smaller, poorly compressed rowgroups.
- This option is rejected because partitioning does not provide substantial benefits with fewer than 1 million rows.
C) 1 million
- 1 million rows is often considered the minimum thr...
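The rejection of the smaller options follows from a short calculation. Microsoft's partitioning guidance for clustered columnstore tables in dedicated SQL pools calls for at least 1 million rows per distribution and partition, and every table is already split into 60 distributions. The sketch below encodes that arithmetic; the function names are illustrative.

```python
DISTRIBUTIONS = 60  # every dedicated SQL pool table is spread across 60 distributions
MIN_ROWS_PER_DISTRIBUTION_PER_PARTITION = 1_000_000  # columnstore compression guidance


def min_rows_for_partitions(partitions: int) -> int:
    """Smallest table size that keeps every distribution/partition at the guideline."""
    return DISTRIBUTIONS * MIN_ROWS_PER_DISTRIBUTION_PER_PARTITION * partitions


def rows_per_distribution_per_partition(total_rows: int, partitions: int = 1) -> float:
    """Average rows that land in each distribution of each partition."""
    return total_rows / (DISTRIBUTIONS * partitions)


if __name__ == "__main__":
    print(min_rows_for_partitions(1))  # 60000000
    for rows in (100_000, 600_000, 1_000_000):
        print(rows, round(rows_per_distribution_per_partition(rows)))
```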
Author: Aarav2020 · Last updated May 27, 2026
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named DimSalesPerson. DimSalesPerson contains the following columns:
* RepSourceID
* SalesRepID
* FirstName
* LastName
* StartDate
* EndDate
* Region
You are developing an Azure Synapse Analytics pipeline that includes a mapping data flow named Dataflow1. Dataflow1 will read sales team data from an external source and use a Type 2 slowly changing dimension (SCD) when loading the data into DimSalesPerson.
...
In this scenario, you're using Type 2 Slowly Changing Dimension (SCD) in Azure Synapse Analytics to manage the DimSalesPerson table. Type 2 SCD is used to handle historical data by maintaining multiple records for the same entity (salesperson, in this case) with different versions, allowing you to track changes over time (e.g., updates to last names or other attributes).
Let’s break down each option to determine which actions are required when updating the last name of a salesperson in this scenario:
A) Update three columns of an existing row
- In Type 2 SCD, when a change occurs in a dimension (e.g., a salesperson's last name is updated), the current record is marked as "expired" by updating the `EndDate` and a new record is inserted with the updated value.
- However, updating three columns of the existing row (e.g., `LastName`, `EndDate`, and `StartDate`) is not typical. Overwriting `LastName` in place would lose the old value, which is Type 1 behavior; in Type 2 the new last name goes into the newly inserted row.
- This option is less likely because updating three columns is not necessary in a typical Type 2 SCD scenario unless other attributes are involved in the change.
B) Update two columns of an existing row
- In Type 2 SCD, when an update occurs (like changing the last name), the current active row typically has its `EndDate` updated (to indicate the record is no longer valid), and a new row is inserted with the updated value (`LastName`) and a new `StartDate`.
- This option is correct because you will likely need to update two columns...
Author: Sofia · Last updated May 27, 2026
HOTSPOT
You plan to use an Azure Data Lake Storage Gen2 account to implement a Data Lake development environment that meets the following requirements:
* Read and write access to data must be maintained if an availability zone becomes unavailable.
* Data that was last modified more than two years ago must be deleted automatically.
* Costs must be...
Author: Ahmed97 · Last updated May 27, 2026
HOTSPOT
You are designing an Azure Data Lake Storage Gen2 container to store data for the human resources (HR) department and the operations department at your company.
You have the following data access requirements:
* After initial processing, the HR department data will be retained for seven years and rarely accessed.
* The operations department data will be accessed frequently for the first six months, and then accessed once per month.
You need to design a data retention solution to meet the access requiremen...
Author: Liam · Last updated May 27, 2026
HOTSPOT
You are developing an Azure Synapse Analytics pipeline that will include a mapping data flow named Dataflow1. Dataflow1 will read customer data from an external source and use a Type 1 slowly changing dimension (SCD) when loading the data into a table named DimCustomer in an Azure Synapse Analytics dedicated SQL pool.
You need to ensure that Dataflow1 can perform the following tasks:
* Detect whether the data of a given customer has changed in the DimCustomer table.
* Perfo...
Author: Samuel · Last updated May 27, 2026
DRAG DROP
You have an Azure Synapse Analytics serverless SQL pool.
You have an Azure Data Lake Storage account named adls1 that contains a public container named container1. The container1 container contains a folder named folder1.
You need to query the top 100 rows of all the CSV files in folder1.
How should you complete the query? To answer, drag the appropriate values to the correct targets. Each v...
Author: IceDragon2023 · Last updated May 27, 2026
You have an Azure Synapse Analytics workspace named WS1 that contains an Apache Spark pool named Pool1.
You plan to create a database named DB1 in Pool1.
You need to ensure that when tables are created in DB1, the tables are available automatically...
To ensure that the tables created in DB1 within your Apache Spark pool (Pool1) are automatically available as external tables to the built-in serverless SQL pool in Azure Synapse Analytics, you need to consider how external tables are defined and how the two pools (Apache Spark and serverless SQL pool) can communicate with each other.
Let's analyze each option:
A) Parquet
- Parquet is a columnar data format that is highly optimized for analytics workloads. It is natively supported in both Apache Spark and serverless SQL pools in Azure Synapse Analytics.
- Parquet files are commonly used for external tables because they support efficient compression and are easily queried using both Apache Spark and SQL.
- This option is likely to be the best choice because Parquet is a widely used format that supports integration between Spark and serverless SQL pools.
B) ORC
- ORC (Optimized Row Columnar) is another columnar storage format used in analytics. It is supported by Apache Spark and other tools like Hive, but it is less common in Azure Synapse Analytics compared to Parquet.
- ORC is not a good fit here: the shared metadata model exposes Spark tables to the serverless SQL pool only when they use a format that the serverless SQL pool can read, such as Parquet or CSV, and the serverless SQL pool cannot read ORC files.
- This option is less ideal because it is not as natively supported or commonly used for t...
Author: Stella · Last updated May 27, 2026
You have an Azure Data Lake Storage Gen2 account named storage1.
You plan to implement query acceleration for storage1.
Which two file types support query acceleration? Each correct answe...
In the context of Azure Data Lake Storage Gen2, query acceleration lets an application push row filter predicates and column projections, written in a SQL-like syntax, down to the storage service, so that only the data the application needs is transferred. The question is which file formats this feature can process.
Let's analyze each file format option:
A) JSON
- JSON is a text-based, semi-structured format that is often used for web data, logs, or small data exchanges. While it is widely supported, it is less efficient than binary formats like Parquet for large-scale queries. JSON is row-based, which is less optimized for analytical workloads than columnar storage formats.
- This option is rejected because JSON does not provide the best performance or query acceleration compared to columnar formats like Parquet.
B) Apache Parquet
- Parquet is a columnar storage format that is specifically designed for large-scale data analytics. It supports efficient data compression and encoding schemes that improve query performance. It is natively supported by query acceleration technologies and is the most common file format for high-performance analytics workloads in Azure.
- This option is selected because Parquet is one of the formats most optimized for query acceleration, especially when working with services like Azure Synapse Analytics and Azure Data Explorer.
C) XML
- XML is another text-based format, but it is highly verbose and not optimized for analytical queries. Like JSON, it is often used for data interchange but doesn't offer the same query performance be...
Author: NightmareDragon2025 · Last updated May 27, 2026
You have an Azure subscription that contains the resources shown in the following table.
You need to read the files in storage1 by using ad-hoc queries and the OPENROWSET function. The solution must ensure that each rowset co...
In this scenario, the goal is to use the `OPENROWSET` function to read files stored in an Azure Storage account (in `storage1`) by executing ad-hoc queries. Additionally, it is specified that each rowset returned should contain a single JSON record.
Let's analyze each option:
A) JSON
- The JSON format is the most suitable option when you are working with JSON files in Azure Storage, particularly when you want to read data as individual JSON records.
- Using the `JSON` option with the `OPENROWSET` function ensures that the query is correctly interpreting the data as JSON and returns individual JSON records as separate rows. This aligns with the requirement of each rowset containing a single JSON record.
B) DELTA
- Delta is a storage format designed for use with Delta Lake, which is built on top of Apache Spark. Delta files are typically used to support versioning and ACID transactions in big data scenarios.
- The DELTA format is used when you need transactional consistency and versioning for data, but it’s not relevant for simple JSON records in this context.
- Since the task focuses on reading sing...
Author: Abigail · Last updated May 27, 2026
HOTSPOT
-
You have an Azure subscription that contains the Azure Synapse Analytics workspaces shown in the following table.
Each workspace must read and write data to datalake1.
Each workspace contains an unused Apache Spark pool.
You plan to configure each Spark pool to share catalog objects that reference datalake1.
F...