
# Question #: 228
Topic #: 1

You have a streaming pipeline that ingests data from Pub/Sub in production. You need to update this streaming pipeline with improved business logic. You need to ensure that the updated pipeline reprocesses the previous two days of delivered Pub/Sub messages. What should you do? (Choose two.)

·         A. Use the Pub/Sub subscription clear-retry-policy flag.

·         B. Use Pub/Sub Snapshot capture two days before the deployment.

·         C. Create a new Pub/Sub subscription two days before the deployment.

·         D. Use the Pub/Sub subscription retain-acked-messages flag.

·         E. Use Pub/Sub Seek with a timestamp.

Sol: D., E. Retaining acked messages (D.) keeps the already-delivered messages in the subscription, and seeking to a timestamp two days in the past (E.) marks them as undelivered again so the updated pipeline reprocesses them. Proof: [here](https://cloud.google.com/pubsub/docs/replay-message#seek_to_a_timestamp); see [here](https://www.examtopics.com/discussions/google/view/129875-exam-professional-data-engineer-topic-1-question-228/) for further discussion.
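
As a rough illustration (this sketch is not from the dump, and the project/subscription names are placeholders), D. and E. could be combined with the Python `google-cloud-pubsub` client as follows; note that retained acked messages can only be replayed if retention was already enabled before they were acknowledged:

```python
from datetime import datetime, timedelta, timezone

from google.cloud import pubsub_v1
from google.protobuf import field_mask_pb2, timestamp_pb2

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path("my-project", "my-streaming-sub")  # placeholders

# D. Keep acknowledged messages so they can be replayed later. This must
# already be in effect before the messages you want to replay are acked.
subscriber.update_subscription(
    request={
        "subscription": pubsub_v1.types.Subscription(
            name=sub_path, retain_acked_messages=True
        ),
        "update_mask": field_mask_pb2.FieldMask(paths=["retain_acked_messages"]),
    }
)

# E. Seek back two days: retained messages published after this timestamp are
# marked as undelivered again, so the updated pipeline reprocesses them.
seek_time = timestamp_pb2.Timestamp()
seek_time.FromDatetime(datetime.now(timezone.utc) - timedelta(days=2))
subscriber.seek(request={"subscription": sub_path, "time": seek_time})
```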

# Question #: 229
Topic #: 1

You currently use a SQL-based tool to visualize your data stored in BigQuery. The data visualizations require the use of outer joins and analytic functions. Visualizations must be based on data that is no less than 4 hours old. Business users are complaining that the visualizations are too slow to generate. You want to improve the performance of the visualization queries while minimizing the maintenance overhead of the data preparation pipeline. What should you do?

·         A. Create materialized views with the allow_non_incremental_definition option set to true for the visualization queries. Specify the max_staleness parameter to 4 hours and the enable_refresh parameter to true. Reference the materialized views in the data visualization tool.

·         B. Create views for the visualization queries. Reference the views in the data visualization tool.

·         C. Create a Cloud Function instance to export the visualization query results as parquet files to a Cloud Storage bucket. Use Cloud Scheduler to trigger the Cloud Function every 4 hours. Reference the parquet files in the data visualization tool.

·         D. Create materialized views for the visualization queries. Use the incremental updates capability of BigQuery materialized views to handle changed data automatically. Reference the materialized views in the data visualization tool.

Sol: A. Explanation: outer joins and analytic functions aren't supported by incremental materialized views, but they are allowed in non-incremental ones, so we have to set the allow_non_incremental_definition parameter to true. Setting max_staleness to 4 hours lets queries be served from the materialized view as long as its data is no more than 4 hours behind the base table, and enable_refresh = true makes BigQuery refresh the view automatically (otherwise we would have to refresh it manually each time we want to display newly arrived data). ([discussion](https://www.examtopics.com/discussions/google/view/129876-exam-professional-data-engineer-topic-1-question-229/))
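
For illustration only (the project, dataset, and query below are hypothetical, not part of the dump), option A. could be set up through the Python BigQuery client roughly like this:

```python
from google.cloud import bigquery

client = bigquery.Client()

# DDL for a non-incremental materialized view: allow_non_incremental_definition
# permits the OUTER JOIN and analytic function, max_staleness bounds how stale
# served data may be (4 hours here), and enable_refresh turns on automatic refresh.
ddl = """
CREATE MATERIALIZED VIEW `my-project.reporting.customer_orders_mv`
OPTIONS (
  allow_non_incremental_definition = true,
  max_staleness = INTERVAL "4:0:0" HOUR TO SECOND,
  enable_refresh = true,
  refresh_interval_minutes = 60
)
AS
SELECT
  c.customer_id,
  o.order_date,
  SUM(o.amount) OVER (PARTITION BY c.customer_id) AS customer_total
FROM `my-project.reporting.customers` AS c
LEFT OUTER JOIN `my-project.reporting.orders` AS o
  ON c.customer_id = o.customer_id
"""

client.query(ddl).result()  # the visualization tool then queries customer_orders_mv directly
```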
