Dictionary for altair chart differs when the VegaFusion data transformer is used #3782
Comments
Related to #3759 |
@firasm have you tried IIRC, the difference you're seeing here is from explicitly asking for a If that isn't possible, then you might need https://github.com/vega/vl-convert?tab=readme-ov-file#python
|
Thanks for the help! When I try:
I get an error saying it needs to be the
I'm taking a look at
|
Thanks for following this up @firasm.
Solution 1
If size of the notebook is your primary concern, I would suggest using the url directly in the chart:
import altair as alt
url = "https://raw.githubusercontent.com/firasm/bits/refs/heads/master/street_trees.csv"
chart = (
alt.Chart(url)
.mark_point()
.encode(alt.X("count(diameter):Q"), alt.Y("species_name:N"))
)
>>> chart.to_dict()["mark"]["type"]
'point'
The trade-off for this is that you'll need to specify encoding types - as the data will be entirely opaque to
Note
Including the data in the spec would only be beneficial if you expect any students to make transformations prior to passing data to
Solution 2 (advice)
Another way to approach the problem is normalizing the dataset to multiple tables with less redundant information. Looking at only the string columns, they all have a relatively low cardinality.
import pandas as pd
import polars as pl
import polars.selectors as cs  # provides cs.string() used below
url = "https://raw.githubusercontent.com/firasm/bits/refs/heads/master/street_trees.csv"
df = pl.DataFrame(pd.read_csv(url))
>>> df.lazy().select(cs.string().n_unique()).collect()
shape: (1, 13)
┌────────────┬────────────┬──────────────┬───────────────┬─────────────┬──────────┬──────────────┬────────────┬───────────┬────────────────────┬──────────────────┬──────┬──────────────┐
│ std_street ┆ genus_name ┆ species_name ┆ cultivar_name ┆ common_name ┆ assigned ┆ root_barrier ┆ plant_area ┆ on_street ┆ neighbourhood_name ┆ street_side_name ┆ curb ┆ date_planted │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 ┆ u32 ┆ u32 ┆ u32 ┆ u32 ┆ u32 ┆ u32 ┆ u32 ┆ u32 ┆ u32 ┆ u32 │
╞════════════╪════════════╪══════════════╪═══════════════╪═════════════╪══════════╪══════════════╪════════════╪═══════════╪════════════════════╪══════════════════╪══════╪══════════════╡
│ 805 ┆ 97 ┆ 283 ┆ 294 ┆ 634 ┆ 2 ┆ 2 ┆ 49 ┆ 812 ┆ 22 ┆ 6 ┆ 2 ┆ 3995 │
└────────────┴────────────┴──────────────┴───────────────┴─────────────┴──────────┴──────────────┴────────────┴───────────┴────────────────────┴──────────────────┴──────┴──────────────┘
I'd lean towards this option generally, but it does require more care in thinking what information is available where. |
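To make the normalization idea in Solution 2 concrete, here is a minimal sketch in plain Python. The rows, the `tree_id` column, and the `split_lookup` helper are all hypothetical and only stand in for the real table; the point is that a low-cardinality string column can be swapped for a small integer key plus a separate lookup table.

```python
# Hypothetical rows standing in for the street-trees table; the values are made up.
rows = [
    {"tree_id": 1, "genus_name": "ACER", "diameter": 10.0},
    {"tree_id": 2, "genus_name": "PRUNUS", "diameter": 4.5},
    {"tree_id": 3, "genus_name": "ACER", "diameter": 7.25},
]


def split_lookup(rows, column):
    """Replace a repeated string column with integer keys plus a lookup table."""
    lookup = {}  # value -> integer key, in order of first appearance
    slim_rows = []
    for row in rows:
        # Reuse the key if the value was seen before, otherwise assign the next one.
        key = lookup.setdefault(row[column], len(lookup))
        slim = {k: v for k, v in row.items() if k != column}
        slim[f"{column}_id"] = key
        slim_rows.append(slim)
    lookup_table = [
        {f"{column}_id": key, column: value} for value, key in lookup.items()
    ]
    return slim_rows, lookup_table


trees, genera = split_lookup(rows, "genus_name")
```

Each distinct string is then stored once in `genera`, while `trees` carries only the integer key. The cost is the one mentioned above: anything that needs the genus name has to join the two tables back together first.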
I've run into this a few times when I want to grade/compare the spec of a chart with a correct spec. I believe the easiest workaround is to temporarily disable the vegafusion data transformer:
with alt.data_transformers.enable('default'):
    chart.to_dict()
That should be all that's needed, but I also tend to remove the data since I rarely want to compare that, just the rest of the spec, and the data might be huge:
chart_with_less_data = chart.copy()  # optional but cleaner
with alt.data_transformers.enable('default'):
    chart_with_less_data['data'] = chart_with_less_data['data'][:1]  # optional but potentially faster
    spec = chart_with_less_data.to_dict()
assert spec['mark']['type'] in ['circle', 'point']  # For example
I've thought that maybe we should do this disabling of the context manager automatically if the |
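For grading, the comparison itself can be kept independent of the data. The sketch below works on an ordinary Vega-Lite dictionary and does not import altair. The helper names `mark_type` and `without_data` are hypothetical, and the `spec` literal is a hand-written stand-in, not output captured from `chart.to_dict()`. It assumes a single-view chart; layered or concatenated charts nest their marks deeper and would need extra handling.

```python
import copy


def mark_type(spec):
    """Return the mark type whether `mark` is a string or a dict with a `type` key."""
    mark = spec["mark"]
    return mark if isinstance(mark, str) else mark["type"]


def without_data(spec):
    """Return a deep copy of the spec with the top-level data entries removed."""
    trimmed = copy.deepcopy(spec)
    for key in ("data", "datasets"):
        trimmed.pop(key, None)
    return trimmed


# Hand-written stand-in for a Vega-Lite spec dictionary (assumed shape).
spec = {
    "mark": {"type": "point"},
    "encoding": {"y": {"field": "species_name", "type": "nominal"}},
    "data": {"name": "data-abc"},
    "datasets": {"data-abc": [{"species_name": "RUBRUM"}]},
}

assert mark_type(spec) in ["circle", "point"]
assert "datasets" not in without_data(spec)
```

Vega-Lite accepts both the string form and the object form of `mark`, so checking through `mark_type` keeps a test from depending on which one is emitted.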
What is your suggestion?
I am updating some labs for a university course that teaches altair, and I wanted to avoid embedding the full dataset into the Jupyter notebook. So I figured I'd try to use the vegafusion data transformer to reduce the size of the notebook (without it, it's a ~50 MB file).
The trouble is that the chart spec as a dictionary varies significantly when the vegafusion transformer is used, so I have to painstakingly update all the OtterGrader tests that have already been written.
For example, here is a simple chart (with a large dataset):
Here's how I can get the mark (point):
When I use the vegafusion data transformer, I have to do:
Is this something that should be expected? I can either write my tests using the vegafusion transformer, or accept a large file size and keep the standard tests as-is.
I would have expected the chart spec to have the same format.
Have you considered any alternative solutions?
No response