Dictionary for altair chart differs when the VegaFusion data transformer is used #3782

Open
firasm opened this issue Jan 26, 2025 · 6 comments
Labels: needs-user-response (Issue triaged, waiting on user), question, vega: vegafusion (Requires upstream/integration action w/ `vegafusion`)


firasm commented Jan 26, 2025

What is your suggestion?

I am updating some labs for a university course that teaches altair, and I wanted to avoid embedding the full dataset into the Jupyter notebook. So I figured I'd try the vegafusion data transformer to reduce the size of the notebook (without it, it's a ~50 MB file).

The trouble is that the chart spec as a dictionary varies significantly when the vegafusion transformer is used, so I have to painstakingly update all the OtterGrader tests that have already been written.

For example, here is a simple chart (with a large dataset):

import pandas as pd
import altair as alt
alt.data_transformers.disable_max_rows()
#alt.data_transformers.enable("vegafusion")
#alt.renderers.enable("jupyterlab")

url = "https://raw.githubusercontent.com/firasm/bits/refs/heads/master/street_trees.csv"

df = pd.read_csv(url)

chart = alt.Chart(df).mark_point().encode(alt.X('count(diameter)'),alt.Y('species_name'))
chart

Here's how I can get the mark (point):

chart.to_dict()['mark']['type']

When I use the vegafusion data transformer, I have to do:

chart.to_dict(format="vega")['marks'][0]['style'][0]

Is this something that should be expected? Should I write my tests against the vegafusion transformer's output, or accept the large file size and keep the standard tests as-is?

I would have expected the chart spec to have the same format.

Have you considered any alternative solutions?

No response

firasm commented Jan 26, 2025

Related to #3759

@dangotbanned dangotbanned added the vega: vegafusion Requires upstream/integration action w/ `vegafusion` label Jan 26, 2025

dangotbanned commented Jan 26, 2025

chart.to_dict(format="vega")['marks'][0]['style'][0]

@firasm have you tried format="vega-lite"?

IIRC, the difference you're seeing here is from explicitly asking for a vega spec - instead of vegalite.

If that isn't possible, then you might need https://github.com/vega/vl-convert?tab=readme-ov-file#python

vl_convert should be able to export to either format

firasm commented Jan 27, 2025

Thanks for the help!

When I try:

chart.to_dict(format="vega-lite")

I get an error saying it needs to be the vega format:

ValueError: When the "vegafusion" data transformer is enabled, the 
to_dict() and to_json() chart methods must be called with format="vega". 
For example: 
    >>> chart.to_dict(format="vega")
    >>> chart.to_json(format="Vega")

I'm taking a look at vl_convert, but it seems there isn't an option to go from Vega to Vegalite:

vl2html    Convert a Vega-Lite specification to an HTML file
vg2svg     Convert a Vega specification to an SVG image
vg2png     Convert a Vega specification to an PNG image
vg2jpeg    Convert a Vega specification to an JPEG image
vg2pdf     Convert a Vega specification to an PDF image
vg2url     Convert a Vega specification to a URL that opens the chart in the Vega editor
vg2html    Convert a Vega specification to an HTML file

dangotbanned commented

Thanks for following this up @firasm.
It seems I was thinking of vegalite_to_vega, which wouldn't be helpful for this case.

Solution 1

If size of the notebook is your primary concern, I would suggest using the url directly in the chart:

import altair as alt

url = "https://raw.githubusercontent.com/firasm/bits/refs/heads/master/street_trees.csv"
chart = (
    alt.Chart(url)
    .mark_point()
    .encode(alt.X("count(diameter):Q"), alt.Y("species_name:N"))
)
>>> chart.to_dict()["mark"]["type"]
'point'

The trade-off for this is that you'll need to specify encoding types - as the data will be entirely opaque to altair and the resulting vega-lite spec.
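To make the size argument concrete, here is a rough sketch comparing a spec that embeds its rows with one that only references the URL. The two dicts are hand-written Vega-Lite specs rather than real `chart.to_dict()` output, and the rows are synthetic stand-ins for the CSV, so the numbers are illustrative only.

```python
import json

URL = "https://raw.githubusercontent.com/firasm/bits/refs/heads/master/street_trees.csv"

# Encoding shared by both specs, with the types spelled out explicitly
encoding = {
    "x": {"aggregate": "count", "field": "diameter", "type": "quantitative"},
    "y": {"field": "species_name", "type": "nominal"},
}

# Synthetic rows standing in for the real CSV (not actual data from the file)
rows = [
    {"diameter": float(i % 40), "species_name": f"species_{i % 50}"}
    for i in range(10_000)
]

inline_spec = {"data": {"values": rows}, "mark": {"type": "point"}, "encoding": encoding}
url_spec = {"data": {"url": URL}, "mark": {"type": "point"}, "encoding": encoding}

# Serialized length is a rough proxy for what ends up stored in the notebook
inline_size = len(json.dumps(inline_spec))
url_size = len(json.dumps(url_spec))
print(f"inline: {inline_size:,} characters, url: {url_size:,} characters")
```

The URL version stays the same size no matter how many rows the file has, and `url_spec["mark"]["type"]` is still available for tests.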

Note

Including the data in the spec would only be beneficial if you expect any students to make transformations prior to passing data to altair.

Solution 2 (advice)

Another way to approach the problem is normalizing the dataset to multiple tables with less redundant information.
The current shape is (146650, 21).

Looking at only the string columns, they all have a relatively low cardinality.

import pandas as pd
import polars as pl
import polars.selectors as cs  # column selectors used below

url = "https://raw.githubusercontent.com/firasm/bits/refs/heads/master/street_trees.csv"
df = pl.DataFrame(pd.read_csv(url))

>>> df.lazy().select(cs.string().n_unique()).collect()

shape: (1, 13)
┌────────────┬────────────┬──────────────┬───────────────┬─────────────┬──────────┬──────────────┬────────────┬───────────┬────────────────────┬──────────────────┬──────┬──────────────┐
│ std_street ┆ genus_name ┆ species_name ┆ cultivar_name ┆ common_name ┆ assigned ┆ root_barrier ┆ plant_area ┆ on_street ┆ neighbourhood_name ┆ street_side_name ┆ curb ┆ date_planted │
│ ---        ┆ ---        ┆ ---          ┆ ---           ┆ ---         ┆ ---      ┆ ---          ┆ ---        ┆ ---       ┆ ---                ┆ ---              ┆ ---  ┆ ---          │
│ u32        ┆ u32        ┆ u32          ┆ u32           ┆ u32         ┆ u32      ┆ u32          ┆ u32        ┆ u32       ┆ u32                ┆ u32              ┆ u32  ┆ u32          │
╞════════════╪════════════╪══════════════╪═══════════════╪═════════════╪══════════╪══════════════╪════════════╪═══════════╪════════════════════╪══════════════════╪══════╪══════════════╡
│ 805        ┆ 97         ┆ 283          ┆ 294           ┆ 634         ┆ 2        ┆ 2            ┆ 49         ┆ 812       ┆ 22                 ┆ 6                ┆ 2    ┆ 3995         │
└────────────┴────────────┴──────────────┴───────────────┴─────────────┴──────────┴──────────────┴────────────┴───────────┴────────────────────┴──────────────────┴──────┴──────────────┘

I'd lean towards this option generally, but it does require more care in thinking what information is available where.
Depending on the skill level of the course, this could either reinforce best practices or be a stumbling block if these concepts haven't been introduced.
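As a rough illustration of what that normalization could look like, here is a plain-Python sketch that splits repeated string columns into a small lookup table and a slimmer main table. `normalize` and the `species_id` column are hypothetical names, and the rows are synthetic, not taken from the CSV.

```python
def normalize(rows, key_cols):
    """Split rows into a lookup table of unique key_cols values and a slimmer main table."""
    lookup = {}  # tuple of key values -> integer id
    facts = []
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        key_id = lookup.setdefault(key, len(lookup))  # reuse the id if the key was seen before
        slim = {c: v for c, v in row.items() if c not in key_cols}
        slim["species_id"] = key_id  # hypothetical foreign-key column
        facts.append(slim)
    species = [dict(zip(key_cols, key), species_id=i) for key, i in lookup.items()]
    return species, facts


# Synthetic rows using a few of the column names discussed above (not real data)
rows = [
    {"genus_name": "ACER", "species_name": "RUBRUM", "common_name": "RED MAPLE", "diameter": 10.0},
    {"genus_name": "PRUNUS", "species_name": "SERRULATA", "common_name": "KWANZAN CHERRY", "diameter": 6.5},
    {"genus_name": "ACER", "species_name": "RUBRUM", "common_name": "RED MAPLE", "diameter": 12.0},
    {"genus_name": "ACER", "species_name": "RUBRUM", "common_name": "RED MAPLE", "diameter": 8.0},
]

species, trees = normalize(rows, ["genus_name", "species_name", "common_name"])
print(len(species), "species rows,", len(trees), "tree rows")
```

Each repeated string triple is stored once in `species`, and `trees` carries only an integer reference, which is where the savings come from when cardinality is low.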

@dangotbanned dangotbanned added question needs-user-response Issue triaged, waiting on user vega: vegafusion Requires upstream/integration action w/ `vegafusion` and removed question vega: vegafusion Requires upstream/integration action w/ `vegafusion` labels Jan 27, 2025

jonmmease commented

Hi @firasm,
Yeah, this difference is expected because VegaFusion operates on the lower-level Vega specifications rather than Vega-Lite specifications, which is what Altair is based on.
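If the grading code has to accept either format, one option is a small helper that reads the mark from whichever shape it receives. This is only a sketch: `mark_name` is a hypothetical helper, the Vega-Lite branch assumes `mark` is a string or a dict with a `type` key, and the Vega branch relies on the `['marks'][0]['style'][0]` path reported earlier in this thread, which may not hold for layered or faceted charts.

```python
def mark_name(spec: dict) -> str:
    """Return the mark name from a Vega-Lite or compiled Vega spec dict (hypothetical helper)."""
    if "mark" in spec:  # Vega-Lite: "point" or {"type": "point", ...}
        mark = spec["mark"]
        return mark if isinstance(mark, str) else mark["type"]
    if "marks" in spec:  # Vega: path taken from the example in this thread
        return spec["marks"][0]["style"][0]
    raise KeyError("spec has neither 'mark' nor 'marks'")


# Hand-written stand-ins for chart.to_dict() and chart.to_dict(format="vega")
vega_lite_spec = {"mark": {"type": "point"}, "encoding": {}}
vega_spec = {"marks": [{"type": "symbol", "style": ["point"]}]}

print(mark_name(vega_lite_spec), mark_name(vega_spec))
```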

joelostblom commented

I've run into this a few times when I want to grade/compare the spec of a chart with a correct spec. I believe the easiest workaround is to temporarily disable the vegafusion data transformer:

with alt.data_transformers.enable('default'):
    chart.to_dict()

That should be all that's needed, but I also tend to remove the data, since I rarely want to compare that, just the rest of the spec, and the data might be huge:

chart_with_less_data = chart.copy()  # optional but cleaner
with alt.data_transformers.enable('default'):
    chart_with_less_data['data'] = chart_with_less_data['data'][:1]  # optional but potentially faster
    spec = chart_with_less_data.to_dict()
assert spec['mark']['type'] in ['circle', 'point']  # For example
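A variation on the same idea, for the case where the spec dict already exists: drop the data entries from the dict before comparing. This is a sketch under assumptions: `without_data` is a hypothetical helper, it only looks at the top-level `data` and `datasets` keys (so it would miss data attached to individual layers), and the two dicts are hand-written stand-ins for real `to_dict()` output.

```python
def without_data(spec: dict) -> dict:
    """Return a shallow copy of a Vega-Lite spec dict without its top-level data entries."""
    return {k: v for k, v in spec.items() if k not in ("data", "datasets")}


# Hand-written stand-ins for a student's spec and the reference spec
student = {
    "data": {"name": "data-abc"},
    "datasets": {"data-abc": [{"diameter": 3.0}]},
    "mark": {"type": "point"},
    "encoding": {"y": {"field": "species_name", "type": "nominal"}},
}
expected = {
    "data": {"url": "street_trees.csv"},
    "mark": {"type": "point"},
    "encoding": {"y": {"field": "species_name", "type": "nominal"}},
}

# The specs differ only in how the data is attached, so they compare equal
assert without_data(student) == without_data(expected)
```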

I've thought that maybe we should disable the vegafusion data transformer automatically when the vegalite format of a dict is explicitly requested from a chart created with vegafusion (or it could be part of #3759). We would need to decide whether it is explicit enough that such a conversion might return a huge dictionary due to the more verbose VL format.
