Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres fixes: Concurrency issue, connection leak, and fixes to guard CRUD operations #83

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

AlejandroEsquivel
Copy link
Contributor

@AlejandroEsquivel AlejandroEsquivel commented Nov 1, 2024

Pull Request Type

  • Breaking Change
  • Feature
  • Bug Fix
  • Non-bug Patch (dependency update, non-production code, etc.)

Link to Notion Task or Github Issue

Summary of Feature(s)

Summary of Bug Fix(es)

  • The docker images using gunicorn need to use uvicorn workers (WSGI / ASGI incompatibility) due to the FastAPI switch
  • The containers were crashing upon coming online (workers were dying) due to workers concurrently attempting to execute the sql statements with the audit function, now using advisory lock
  • Using db = next(self.get_db()) was essentially causing connections to proliferate due to not being properly closed, so opted for contextmanager approach
  • Also db = next(self.get_db()) was causing some changes not to commit properly so deletions / updates weren't working properly due to inconsistent states during db.commits

Previous Behaviour

  • Failed to start in Docker due to WSGI / ASGI incompatibility after FastAPI change.
  • In POSTGRES mode workers were dying due to attempting to execute audit function creation SQL statements concurrently.
  • Connection leak due to sessions not being explicitly closed
  • UDATE / DELETE API CRUD operations were not working

New Behaviour

  • No connection leaks
  • UDATE / DELETE API CRUD operations working
  • Multiple workers able to come up reliably while only one worker executes initialization SQL statements with advisory lock

Other details

  • PG Client now has util_ methods, these are utility functions which don't create db connections but rather take db sessions as arguments and perform auxiliary actions for the non util functions which are called directly by API controllers and do start db sessions

Dependencies

@AlejandroEsquivel AlejandroEsquivel marked this pull request as ready for review November 13, 2024 20:21
Dockerfile Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
compose.yml Outdated
@@ -9,7 +9,7 @@ services:
volumes:
- ./postgres:/data/postgres
ports:
- "5432:5432"
- "5932:5432"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch shouldn't be checked in

Comment on lines +45 to +49
PGPORT: 5432
PGDATABASE: postgres
PGHOST: postgres
PGUSER: ${PGUSER:-postgres}
PGPASSWORD: ${PGPASSWORD:-changeme}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to think we should just have two different Dockfiles and compose files to run with PG vs config.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed 👌🏼

Comment on lines +80 to +82
# Support for providing env vars as uvicorn does not support supplying args to create_app
# - Usage: GR_CONFIG_FILE_PATH=config.py GR_ENV_FILE=.env PORT=8080 uvicorn --factory 'guardrails_api.app:create_app' --host 0.0.0.0 --port $PORT --workers 2 --timeout-keep-alive 90
# - Usage: gunicorn -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:$PORT --timeout=90 --workers=2 "guardrails_api.app:create_app(None, None, $PORT)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's rough. We'll have to address this in the OSS's cli as well since it calls this.

Comment on lines +100 to +115
with engine.begin() as connection:
lock_acquired = connection.execute(text(f"SELECT pg_try_advisory_lock({lock_id});")).scalar()
if lock_acquired:
self.run_initialization(connection)
# Release the lock after initialization is complete
connection.execute(text(f"SELECT pg_advisory_unlock({lock_id});"))

def run_initialization(self, connection):
# Perform the actual initialization tasks
from guardrails_api.models import GuardItem, GuardItemAudit # noqa
Base.metadata.create_all(bind=self.engine)

# Execute custom SQL extensions and triggers
connection.execute(text(INIT_EXTENSIONS))
connection.execute(text(AUDIT_FUNCTION))
connection.execute(text(AUDIT_TRIGGER))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good patch. We probably just need a real migration tool in the long run since sql-alchemy doesn't do updates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally I believe they can be kicked off programatically as well so we can probably add this to lifecycle hooks

CalebCourier
CalebCourier previously approved these changes Feb 6, 2025
compose.yml Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants