Raad van State Scraper and Analyzer

This repository contains a web scraper and analyzer for advices of the Dutch Raad van State (Council of State). The analysis is performed by Llama3.1-405B-instruct, accessed through the Replicate serverless LLM API.

Features

Scrapes advices from the Raad van State website for a specified year
Analyzes the scraped advices and categorizes them based on standard dictum formulations
Provides reasoning for the categorization
Saves the scraped and analyzed data to CSV files
Validates date fields and converts text dates to dd-mm-yyyy
Merges the separate per-year files into a single CSV file

Scraped Data Structure

The scraper collects the following information for each advice:

URL: The URL of the advice on the Raad van State website.
Content: The full text content of the advice.
Reference: The reference number (kenmerk) of the advice.
Advice Type: The type of advice (Wet or Algemene maatregel van bestuur).
Date Aanhanging: Date of "aanhanging", in text format.
Date Vaststelling: Date of "vaststelling", in text format.
Date Advies: Date of "advies", in text format.
Date Publicatie: Date of "publicatie", in text format.

The scraped data is saved in a CSV file named raad_van_state_adviezen_<year>.csv.
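To make the record layout concrete, here is a minimal sketch of one scraped advice and how such records could be written as CSV. The class name `Advice`, the snake_case field names, and the helper `to_csv` are illustrative assumptions; the actual column headers used by the scraper may differ.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class Advice:
    # Field names are hypothetical; they mirror the list above, not the real CSV headers.
    url: str
    content: str
    reference: str
    advice_type: str
    date_aanhanging: str
    date_vaststelling: str
    date_advies: str
    date_publicatie: str


def to_csv(advices):
    """Serialize Advice records to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Advice)])
    writer.writeheader()
    for advice in advices:
        writer.writerow(asdict(advice))
    return buffer.getvalue()
```

Writing through `csv.DictWriter` keeps the full advice text safe even when it contains commas, quotes, or line breaks.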

Afterwards, the validator script converts the dates into a machine-readable format:

Date Advies Formatted: Date of "advies" in dd-mm-yyyy format.

Installation

Clone the repository:

git clone https://github.com/Democratie-Monitor/raad-van-state.git

Install the required dependencies:

pip install -r requirements.txt

Set up the necessary environment variables:

REPLICATE_API_TOKEN: Your Replicate API token for using the language model.
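A script that depends on this variable can fail early with a clear message when it is missing. The sketch below shows one way to do that; the helper name `get_replicate_token` is an assumption and is not taken from the repository.

```python
import os


def get_replicate_token(env=os.environ):
    """Return the Replicate API token, or raise a clear error if it is missing."""
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before running the analyzer."
        )
    return token
```

Checking up front avoids a failed API call partway through an analysis run.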

Usage

1. Run the scraper to fetch advices for a specific year (default is 2025):

python src/scraper.py --year 2025

Optional arguments:

--test: Run in test mode to scrape only 10 advices.
--year: Specify the year to scrape advices for (e.g., 2024).
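The two options above could be declared with `argparse` roughly as follows. This is a sketch of the command-line interface as described here, not the code in src/scraper.py; `build_parser` and `advice_limit` are hypothetical names.

```python
import argparse


def build_parser():
    """Build a parser for the scraper options described above."""
    parser = argparse.ArgumentParser(
        description="Scrape Raad van State advices for one year."
    )
    parser.add_argument("--year", type=int, default=2025,
                        help="Year to scrape advices for (default: 2025).")
    parser.add_argument("--test", action="store_true",
                        help="Test mode: scrape only 10 advices.")
    return parser


def advice_limit(args):
    """Return the maximum number of advices to scrape, or None for no limit."""
    return 10 if args.test else None
```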

2. Run the analyzer to categorize the scraped advices:

python src/analyzer.py data/raad_van_state_adviezen_2025.csv

Optional arguments:

--test: Run in test mode to analyze only 10 advices.
--start-row: Start processing from a specific row number.

This produces a file named raad_van_state_adviezen_YYYY_analyzed.csv.
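The --start-row and --test options both amount to selecting a slice of the input rows, which is useful for resuming an interrupted run. A minimal sketch, assuming rows are counted from 0 and excluding the header (the analyzer's actual numbering is not documented here); `select_rows` is a hypothetical helper.

```python
from itertools import islice


def select_rows(rows, start_row=0, test=False):
    """Return (row_number, row) pairs from start_row on; at most 10 in test mode."""
    numbered = islice(enumerate(rows), start_row, None)
    return list(islice(numbered, 10 if test else None))
```

Keeping the original row number alongside each row makes it easy to log where a run stopped and to pass that number back in as the next start row.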

3. Run the date validator to check date fields and format into dd-mm-yyyy:

This script processes all RvS advice CSV files in the current directory and adds formatted date columns.

python src/validator.py

No arguments needed - the script will:

Find all files matching 'raad_van_state_adviezen_*.csv'
Convert Dutch format dates to dd-mm-yyyy format
Create new columns with '_formatted' suffix
Prompt for manual input when datum_advies cannot be automatically converted
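The conversion step can be sketched as below. It assumes the input looks like "3 maart 2024" (day, Dutch month name, year); the formats actually handled by src/validator.py may be broader. The function name `format_dutch_date` is an assumption. A return value of None marks the case where a caller would fall back to manual input.

```python
import datetime
import re

DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "augustus": 8,
    "september": 9, "oktober": 10, "november": 11, "december": 12,
}


def format_dutch_date(text):
    """Convert e.g. '3 maart 2024' to '03-03-2024'; return None if unparseable."""
    match = re.fullmatch(r"\s*(\d{1,2})\s+([a-z]+)\s+(\d{4})\s*", text.lower())
    if not match:
        return None
    day, month_name, year = match.groups()
    month = DUTCH_MONTHS.get(month_name)
    if month is None:
        return None
    try:
        # Reject impossible dates such as 31 februari.
        datetime.date(int(year), month, int(day))
    except ValueError:
        return None
    return f"{int(day):02d}-{month:02d}-{year}"
```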

4. Transfer date information from the original RvS advice CSV files to their analyzed counterparts:

python date_merger.py

No arguments needed - the script will:

Find pairs of files like 'raad_van_state_adviezen_YYYY.csv' and 'raad_van_state_adviezen_YYYY_analyzed.csv'
Copy the 'datum_advies_formatted' column from the original to the analyzed file
Create a log file 'date_merger_errors.log' with details of any issues encountered
Process all years found in the current directory

File requirements:

Original files should be named: raad_van_state_adviezen_YYYY.csv
Analyzed files should be named: raad_van_state_adviezen_YYYY_analyzed.csv
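The pairing and copying described above might look like the following sketch. How date_merger.py matches rows between the two files is not stated here; this example assumes they are matched on a column named `url`, which is hypothetical, as are the helper names `find_pairs` and `copy_dates`.

```python
import re

PATTERN = re.compile(r"raad_van_state_adviezen_(\d{4})(_analyzed)?\.csv")


def find_pairs(filenames):
    """Return {year: (original, analyzed)} for years where both files exist."""
    originals, analyzed = {}, {}
    for name in filenames:
        match = PATTERN.fullmatch(name)
        if match:
            target = analyzed if match.group(2) else originals
            target[match.group(1)] = name
    return {y: (originals[y], analyzed[y]) for y in sorted(originals) if y in analyzed}


def copy_dates(original_rows, analyzed_rows, key="url",
               column="datum_advies_formatted"):
    """Copy `column` onto analyzed rows by matching `key`; return unmatched keys."""
    dates = {row[key]: row.get(column, "") for row in original_rows}
    missing = []
    for row in analyzed_rows:
        if row[key] in dates:
            row[column] = dates[row[key]]
        else:
            row[column] = ""
            missing.append(row[key])  # candidates for the error log
    return missing
```

Matching on a key rather than on row position keeps the copy correct even if the analyzed file has fewer rows, for example after a test run.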

5. View the scraped and analyzed data in your directory.

Do a couple of manual checks to make sure everything went as expected.

6. Merge all analyzed CSV files into a single file:

This script combines all analyzed RvS advice CSV files into a single comprehensive CSV file.

python csv_merger.py

No arguments needed - the script will:

Find all files matching '*_analyzed.csv' in the current directory
Merge them into a single CSV file
Add a 'source_file' column to track the origin of each row
Create detailed logs in 'merger_log.log'
Output file will be named: merged_raad_van_state_adviezen_YYYYMMDD.csv (using current date)

Features:

Verifies column consistency across files
Detects and logs duplicate URLs
Preserves all original data
Creates summary statistics of the merge operation
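The merge behavior listed above can be sketched on in-memory rows as follows. This is an illustration under assumptions, not the code in csv_merger.py: the column name `url` and the helpers `merge_tables` and `output_name` are hypothetical.

```python
from collections import Counter
from datetime import date


def merge_tables(tables, url_column="url"):
    """Merge {file_name: [row dicts]} into one list of rows.

    Returns (merged_rows, duplicate_urls, mismatched_files).
    """
    merged, mismatched = [], []
    expected = None
    for name in sorted(tables):
        rows = tables[name]
        if rows:
            columns = set(rows[0])
            if expected is None:
                expected = columns          # first non-empty file sets the schema
            elif columns != expected:
                mismatched.append(name)     # column consistency check
        for row in rows:
            merged.append({**row, "source_file": name})
    counts = Counter(row.get(url_column) for row in merged)
    duplicates = sorted(u for u, n in counts.items() if u is not None and n > 1)
    return merged, duplicates, mismatched


def output_name(today=None):
    """Build the dated output file name, e.g. for 5 January 2025."""
    today = today or date.today()
    return f"merged_raad_van_state_adviezen_{today:%Y%m%d}.csv"
```

Duplicates are reported rather than dropped, in line with the goal of preserving all original data.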

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.

License

This project is licensed under the GNU GPL v3.
