
# Question #: 228
Topic #: 1

You have a streaming pipeline that ingests data from Pub/Sub in production. You need to update this streaming pipeline with improved business logic. You need to ensure that the updated pipeline reprocesses the previous two days of delivered Pub/Sub messages. What should you do? (Choose two.)

·         A. Use the Pub/Sub subscription clear-retry-policy flag.

·         B. Use Pub/Sub Snapshot capture two days before the deployment.

·         C. Create a new Pub/Sub subscription two days before the deployment.

·         D. Use the Pub/Sub subscription retain-acked-messages flag.

·         E. Use Pub/Sub Seek with a timestamp.

Sol: D., E. Retaining acked messages (D.) keeps the already-delivered messages in the subscription, and seeking to a timestamp two days in the past (E.) marks them as undelivered again so the updated pipeline reprocesses them. Proof: [here](https://cloud.google.com/pubsub/docs/replay-message#seek_to_a_timestamp); see [here](https://www.examtopics.com/discussions/google/view/129875-exam-professional-data-engineer-topic-1-question-228/) for further discussion.
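
As a rough illustration (this sketch is not from the dump, and the project/subscription names are placeholders), D. and E. could be combined with the Python `google-cloud-pubsub` client as follows; note that retained acked messages can only be replayed if retention was already enabled before they were acknowledged:

```python
from datetime import datetime, timedelta, timezone

from google.cloud import pubsub_v1
from google.protobuf import field_mask_pb2, timestamp_pb2

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path("my-project", "my-streaming-sub")  # placeholders

# D. Keep acknowledged messages so they can be replayed later. This must
# already be in effect before the messages you want to replay are acked.
subscriber.update_subscription(
    request={
        "subscription": pubsub_v1.types.Subscription(
            name=sub_path, retain_acked_messages=True
        ),
        "update_mask": field_mask_pb2.FieldMask(paths=["retain_acked_messages"]),
    }
)

# E. Seek back two days: retained messages published after this timestamp are
# marked as undelivered again, so the updated pipeline reprocesses them.
seek_time = timestamp_pb2.Timestamp()
seek_time.FromDatetime(datetime.now(timezone.utc) - timedelta(days=2))
subscriber.seek(request={"subscription": sub_path, "time": seek_time})
```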

# Question #: 229
Topic #: 1

You currently use a SQL-based tool to visualize your data stored in BigQuery. The data visualizations require the use of outer joins and analytic functions. Visualizations must be based on data that is no less than 4 hours old. Business users are complaining that the visualizations are too slow to generate. You want to improve the performance of the visualization queries while minimizing the maintenance overhead of the data preparation pipeline. What should you do?

·         A. Create materialized views with the allow_non_incremental_definition option set to true for the visualization queries. Specify the max_staleness parameter to 4 hours and the enable_refresh parameter to true. Reference the materialized views in the data visualization tool.

·         B. Create views for the visualization queries. Reference the views in the data visualization tool.

·         C. Create a Cloud Function instance to export the visualization query results as parquet files to a Cloud Storage bucket. Use Cloud Scheduler to trigger the Cloud Function every 4 hours. Reference the parquet files in the data visualization tool.

·         D. Create materialized views for the visualization queries. Use the incremental updates capability of BigQuery materialized views to handle changed data automatically. Reference the materialized views in the data visualization tool.

Sol: A. Explanation: outer joins and analytic functions aren't supported by incremental materialized views, but they are allowed in non-incremental ones, so we have to set the allow_non_incremental_definition parameter to true. Setting max_staleness to 4 hours lets queries be served from the materialized view as long as its data is no more than 4 hours behind the base table, and enable_refresh = true makes BigQuery refresh the view automatically (otherwise we would have to refresh it manually each time we want to display newly arrived data). ([discussion](https://www.examtopics.com/discussions/google/view/129876-exam-professional-data-engineer-topic-1-question-229/))
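
For illustration only (the project, dataset, and query below are hypothetical, not part of the dump), option A. could be set up through the Python BigQuery client roughly like this:

```python
from google.cloud import bigquery

client = bigquery.Client()

# DDL for a non-incremental materialized view: allow_non_incremental_definition
# permits the OUTER JOIN and analytic function, max_staleness bounds how stale
# served data may be (4 hours here), and enable_refresh turns on automatic refresh.
ddl = """
CREATE MATERIALIZED VIEW `my-project.reporting.customer_orders_mv`
OPTIONS (
  allow_non_incremental_definition = true,
  max_staleness = INTERVAL "4:0:0" HOUR TO SECOND,
  enable_refresh = true,
  refresh_interval_minutes = 60
)
AS
SELECT
  c.customer_id,
  o.order_date,
  SUM(o.amount) OVER (PARTITION BY c.customer_id) AS customer_total
FROM `my-project.reporting.customers` AS c
LEFT OUTER JOIN `my-project.reporting.orders` AS o
  ON c.customer_id = o.customer_id
"""

client.query(ddl).result()  # the visualization tool then queries customer_orders_mv directly
```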
