Jupyter anywidget
and magic for working with pglite
(single user postgres wasm build) (provodes access to a PostgreSQL browser running solely in the the browser).
Try it in JupyterLite (in pyodide kernel, %pip install anywidget==0.9.13 jupyter_anywidget_pglite
)
Install from PyPi as: pip install jupyter_anywidget_pglite
Uses: anywidget==0.9.13
Usage:
- import package and magic:
%load_ext jupyter_anywidget_pglite
from jupyter_anywidget_pglite import pglite_panel
pg = pglite_panel()
#This should open a panel in the right-hand sidebar
# (`split-right`) by default.
# Close the panel either manually or via:
# pg.close()
# w = pglite_panel("example panel title)`
# w = pglite_panel(None, "split-bottom")`
# Headless mode (no HTML UI, works in:
# Jupyter Lab, Jupyter Notebook, VS Code w/ Jupyter notebook support)
#from jupyter_anywidget_pglite import pglite_headless
#pg = pglite_headless()
# Inline display
# Display HTML UI as initialising code cell output
# Display will be updated with consequent queries
#from jupyter_anywidget_pglite import pglite_inline
#pg = pglite_inline()
Use pg.ready()
/ pg.ready(timeout=TIME_IN_S)
function that will block until the pglite
widget is loaded and ready to accept requests (not JupyterLite).
To persist the database in browser storage, set the `idb='DBNAME`` parameter when creating a widget. For example:
pg_headless_persist = pglite_headless(idb="pglitetest1")
To run a query, place the query insde a %%pglite
cell block magic.
- use the
-w / --widget-name
setting to set the widget within the magic and it does not need to be passed again (for example,%%pglite_magic -w pg
) - alternatively, prior to calling the block magic, set the widget used in the magic via a line magic:
%setwidget pg
Running queries on the database using IPython cell block magic %%pglite -w WIDGET_VARIABLE
:
%%pglite_magic -w pg
CREATE TABLE IF NOT EXISTS test (
id serial primary key,
title varchar not null
);
#----
%%pglite_magic
INSERT INTO test (title) VALUES ('dummy');
#----
%%pglite_magic
SELECT * FROM test;
To run multiple SQL statements in the same cell:
- use the
-m / --multiple-statements
flag (default:False
) when calling the cell block magic [NOT RECOMMENDED]. This will naively split the query on each;
character, and then run each split item as a separate command. The response will be set to the response from the final query; - use the
-M / --multiple-statement-block
flag to run all the tems using thepglite
.exec()
command.
We can also run queries (with the same arguments) using the %pglite_query
line magic, with the query set via the -q / --query
parameter:
%pglite_query -r -q 'SELECT * FROM test LIMIT 1;'
Add the -d / --dataframe
flag to return the query result as a dataframe (not JupyterLite).
Having made a query onto the database via a magic cell, we can retrieve the response:
pg.response
If pandas
is installed, we can get rows returned from a query response as a dataframe:
pg.df()
We run a query by setting query state on the widget. The following Python function helps with that:
pg_headless.query("SELECT 'hello';")
If you are not in an emscripten
platform, blocking replies will be attempted automatically. These can be disabled by setting pg_headless.prefer_use_blocking=False
. Blocking attempts can also be forced by calling queries with autorespond=True
.
# If blocking is available (not JuphyterLite, marimo)
pg_headless.query("SELECT 'hello';", autorespond=True, df=True)
List tables: pg_headless.tables(autorespond=True)
Show table schema: pg_headless.table_schema("test", autorespond=True)
We can insert data from a data frame into a pre-existing table with an appropriate schema:
import pandas as pd
df = pd.DataFrame({"title":["a","b","c"]})
# Insert data from a dataframe into a table that already exists
pg_headless.insert_from_df("test", df, autorespond=True)
Partial support is provided for DBAPI2 and SQLAlchemy connections.
For example, we can pass a connection into a pandas.read_sql()
function (this probably won't work in JupyterLite, marimo, etc., where blocking is not supported):
from jupyter_anywidget_pglite.dbapi2 import create_connection
conn = create_connection(pg_headless)
pd.read_sql("SELECT * FROM test;", conn)
# Minimal SQLAlchemy connection object support
from jupyter_anywidget_pglite.sqlalchemy_api import create_engine
conn2 = create_engine(pg_headless)
pd.read_sql("SELECT * FROM test;", conn2)
Note that the pglite
query runs asynchronously, so how do we know on the Python side when the response is ready?
Using the jupyter_ui_poll
package (not JupyterLite), we can run a blocking wait on a response from pglite
(not JupyterLite):
response = pg.blocking_reply()
Optionally provide a timeout period (seconds):
response = pg.blocking_reply(timeout=5)
We can also use a blocking trick to return a response from the magic cell (not JupyterLite). Set -r / --response
flag when calling the magic. Optionally set the -t / --timeout
to the timeout period in seconds (default 5s; if the timeout is explicitly set, -r
is assumed): %%pglite -r
, %pglite -t 10
Note: I think that IPython notebook cells should have cell run IDs cleared prior to running. I have seen errors if there are non-unique cell run IDs for the blocking cell.
Recall also the option of running pg.ready()
/ pg.ready(timeout=TIME_IN_S)
.
If we have a table already defined on the database, and a dataframe that confoms to it, we can add the data in the dataframe to the table using magic as follows (BROKEN?):
%pglite_df_insert -d df -t my_table
Save and load data to / from a file.
We can get an export of the data using the pglite
data exporter (.dumpdatadir()
) in the database by calling:
pg.create_data_dump()
For a blocking wait until the datadump is read, use pg.create_data_dump(True)
or pg.create_data_dump(wait=True)
(not JupyterLite). You can also pass a timeout
parameter in seconds (wait=True
is assumed if the timeout parameter is explicitly set).
After a moment or two, the data will appear in a dictionary on: pg.file_package
If we make a copy of that data, we can then create a new pglite
widget with a new pglite
instance that can load in the data using the pglite
data load option (loadDataDir
).
Use the data=
argument when creating the widget to pass in the data:
datadump = pg.file_package.copy()
# View info
#datadump["file_info"]
# Actual data is in: datadump["file_content"]
pg1 = pglite_inline(data=datadump)
We can export the datadump to a file using:
pg.save_datadump_to_file()
Or pass a filename: pg.save_datadump_to_file('myfile.tar.gz')
Load data into a datadump object:
from jupyter_anywidget_pglite import load_datadump_from_file
dd = load_datadump_from_file("pgdata.tar.gz")
dd["file_info"]
Or create a new widget with the pglite
database seeded from the file:
pg2 = pglite_panel(data="pgdata.tar.gz")
An increasing range of PostgreSQL extensions are available for pglite
(catalogue), and several of these are currently supported by jupyter_anywidget_pglite
:
from jupyter_anywidget_pglite import AVAILABLE_EXTENSIONS
AVAILABLE_EXTENSIONS
>>> ['fuzzystrmatch', 'pg_trgm', 'vector']
When creating the widget, pass the list of required extensions via the extensions=
parameter. For example: pglite_panel(extensions=["vector", "fuzzystrmatch", "pg_trgm"])
If specified, the extension package is dynamically downloaded and the extension automatically enabled via an automatically run CREATE EXTENSION IF NOT EXISTS extension_name
call.
Examples:
pg_trgm
:%pglite_query -w pg_panel -r -q "SELECT similarity('One sentence', 'Another sentence');"
fuzzystrmatch
:%pglite_query -r -q "SELECT soundex('hello world!');"
TO DO: provide a way for users to dynamically add references to additional extension packages.
To provide an audible alert when a query or a data dump generation operation has completed, set: pg.audio = True
.
If audible alerts are enabled, if an error is raised, an audible alert will sound and the error message will also be reported using the browser text-to-speech engine.
jupyter_anywidget_pglite
works in:
- JupyterLab (presumably also Jupyter notebook v7(?))
- VS Code (not the side panel)
- JupyterLite (not the blocking parts)
marimo
notebooks (note the magics, not the side panel, (blocking not tested))
- options to display outputs in the panel;
- button to clear input history;
- button to reset database;
- explore possibility of a JuptyerLab extension to load
pglite
"centrally" and then connect to the same instance from any notebook. - need a better py api (I've been assuming use through IPython magics)
- make an async call for loading the db and running queries for use in JupyterLite?
- look at use of a (multi-tab) worker? e.g. if we have multiple noteboooks in JupyterLab tabs, or multiple standalone notebook tabs open, each wanting their own database connection?
See the GitHub topic tag: jupyter-wasm-anywidget-extension