# GCP Professional Data Engineer Question Dump 1 - Easily Solved

Commit 2e18e46, authored by OdyAsh on Jul 1, 2024.
You store and analyze your relational data in BigQuery on Google Cloud with all

Sol: D.

# Question #: 222
Topic #: 1

You have a variety of files in Cloud Storage that your data science team wants to use in their models. Currently, users do not have a method to explore, cleanse, and validate the data in Cloud Storage. You are looking for a low-code solution that your data science team can use to quickly cleanse and explore data within Cloud Storage. What should you do?

- A. Provide the data science team access to Dataflow to create a pipeline to prepare and validate the raw data and load data into BigQuery for data exploration.
- B. Create an external table in BigQuery and use SQL to transform the data as necessary. Provide the data science team access to the external tables to explore the raw data.
- C. Load the data into BigQuery and use SQL to transform the data as necessary. Provide the data science team access to staging tables to explore the raw data.
- D. Provide the data science team access to Dataprep to prepare, validate, and explore the data within Cloud Storage.

Sol: D.

# Question #: 223
Topic #: 1

You are building an ELT solution in BigQuery by using Dataform. You need to perform uniqueness and null value checks on your final tables. What should you do to efficiently integrate these checks into your pipeline?

- A. Build BigQuery user-defined functions (UDFs).
- B. Create Dataplex data quality tasks.
- C. Build <mark style="background: #FFF3A3A6;">Dataform assertions</mark> into your code.
- D. Write a Spark-based stored procedure.

Sol: C.
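Dataform supports these checks natively: assertions are declared in a table's `config` block and run as part of the pipeline. A minimal sketch of a `.sqlx` definition file; the table and column names are hypothetical:

```sqlx
// definitions/orders.sqlx -- illustrative example, not from the question
config {
  type: "table",
  assertions: {
    uniqueKey: ["order_id"],              // fail if order_id has duplicates
    nonNull: ["order_id", "customer_id"]  // fail if either column has NULLs
  }
}

select order_id, customer_id, amount
from ${ref("raw_orders")}
```

On each run, Dataform compiles every assertion into a query that should return zero rows; any rows returned fail the run, which is why this integrates more efficiently than external tooling (options A, B, D).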

# Question #: 224
Topic #: 1

A web server sends click events to a Pub/Sub topic as messages. The web server includes an eventTimestamp attribute in the messages, which is the time when the click occurred. You have a Dataflow streaming job that reads from this Pub/Sub topic through a subscription, applies some transformations, and writes the result to another Pub/Sub topic for use by the advertising department. The advertising department needs to receive each message within 30 seconds of the corresponding click occurrence, but they report receiving the messages late. Your Dataflow job's system lag is about 5 seconds, and the data freshness is about 40 seconds. Inspecting a few messages shows no more than a 1-second lag between their eventTimestamp and publishTime. What is the problem and what should you do?

- A. The advertising department is causing delays when consuming the messages. Work with the advertising department to fix this.
- B. Messages in your Dataflow job are taking more than 30 seconds to process. Optimize your job or increase the number of workers to fix this.
- C. Messages in your Dataflow job are processed in less than 30 seconds, but your job cannot keep up with the backlog in the Pub/Sub subscription. Optimize your job or increase the number of workers to fix this.
- D. The web server is not pushing messages fast enough to Pub/Sub. Work with the web server team to fix this.

Sol: C.
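The numbers in the question already point to the backlog: data freshness measures the age of the oldest element not yet fully processed (publish delay + backlog wait + in-pipeline processing), while system lag covers only the in-pipeline part. A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Decomposing end-to-end latency for the scenario above.
# All numbers come from the question itself.

publish_delay_s = 1    # publishTime - eventTimestamp (observed <= 1 s)
system_lag_s = 5       # time spent processing inside the Dataflow job
data_freshness_s = 40  # age of the oldest element not yet processed

# Approximate time a message waits in the Pub/Sub subscription backlog
# before the job picks it up:
backlog_wait_s = data_freshness_s - publish_delay_s - system_lag_s
print(backlog_wait_s)  # 34
```

A backlog wait of roughly 34 seconds (versus 5 seconds of actual processing) already exceeds the 30-second SLA on its own, which is why C fits: processing is fast, but the job cannot keep up with the subscription backlog.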

# Question #: 226
Topic #: 1

Your organization has two Google Cloud projects, project A and project B. In project A, you have a Pub/Sub topic that receives data from confidential sources. Only the resources in project A should be able to access the data in that topic. You want to ensure that project B and any future project cannot access data in the project A topic. What should you do?

- A. Add firewall rules in project A so only traffic from the VPC in project A is permitted.
- B. Configure VPC Service Controls in the organization with a perimeter around project A.
- C. Use Identity and Access Management conditions to ensure that only users and service accounts in project A can access resources in project A.
- D. Configure VPC Service Controls in the organization with a perimeter around the VPC of project A.

Sol: <mark style="background: #FF5582A6;">C. or B.</mark>, but I believe C. (full discussion: [here](https://www.examtopics.com/discussions/google/view/129873-exam-professional-data-engineer-topic-1-question-226/))
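For the option B approach, a VPC Service Controls perimeter is created under the organization's access policy and scoped to the project, not to a VPC. A hedged CLI sketch; the perimeter name, policy ID, and project number are placeholders, not values from the question:

```shell
# Illustrative only: restrict Pub/Sub so data in project A cannot be
# accessed from outside the perimeter.
gcloud access-context-manager perimeters create confidential_pubsub \
  --policy=POLICY_ID \
  --title="Project A perimeter" \
  --resources=projects/PROJECT_A_NUMBER \
  --restricted-services=pubsub.googleapis.com
```

Note that a perimeter restricts API access at the service level for the whole project, whereas the IAM-conditions approach in option C controls which principals hold permissions; the linked discussion debates which of the two the question intends.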
