Commit

update
Jay authored and Jay committed Nov 11, 2024
1 parent 15a1a7f commit ddf0645
Showing 1 changed file with 14 additions and 3 deletions.
17 changes: 14 additions & 3 deletions main.py
@@ -16,21 +16,32 @@
 def main():
     # Extract data
     extract()

     # Start Spark session
     spark = start_spark("USBirthDataProcessing")

     # Load data into DataFrame
     df = load_data(spark)
-    # Generate descriptive statistics
+
+    # Example metrics
     describe(df)
-    # Query example: Count births by year
+
+    # Query the data
     query(
         spark,
         df,
-        "SELECT year, COUNT(*) AS birth_count FROM USBirthData GROUP BY year ORDER BY year",
+        (
+            "SELECT year, COUNT(*) AS birth_count "
+            "FROM USBirthData "
+            "GROUP BY year "
+            "ORDER BY year"
+        ),
         "USBirthData",
     )
+
+    # Example transformation
+    example_transform(df)
+
     # End Spark session
     end_spark(spark)
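
The query string in this change is split across adjacent string literals inside parentheses. Python joins such literals with no separator, so the trailing space on each fragment is what keeps the SQL keywords apart. The sketch below checks that the split form equals the original one-line string. The `missing_space` variable is a hypothetical counter-example and is not part of the commit.

```python
# The one-line SQL string that this commit removes.
single_line = "SELECT year, COUNT(*) AS birth_count FROM USBirthData GROUP BY year ORDER BY year"

# The parenthesized form that this commit adds. Adjacent string literals
# are concatenated by Python with nothing inserted between them.
multi_line = (
    "SELECT year, COUNT(*) AS birth_count "
    "FROM USBirthData "
    "GROUP BY year "
    "ORDER BY year"
)

# Hypothetical variant (not in the commit): without the trailing space,
# the alias and the next keyword fuse into a single token.
missing_space = (
    "SELECT year, COUNT(*) AS birth_count"
    "FROM USBirthData"
)

print(multi_line == single_line)
print(missing_space)
```

Because the two forms compare equal, the refactor should only change line length, not the query passed to `query`.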
