Dictionary for altair chart differs when the VegaFusion data transformer is used #3782

firasm · 2025-01-26T23:07:34Z

What is your suggestion?

I am updating some labs for a university course that teaches altair, and I wanted to avoid embedding the full dataset into the Jupyter notebook. So I figured I'd try the vegafusion data transformer to reduce the size of the notebook (without it, it's a ~50 MB file).

The trouble is that the chart spec as a dictionary varies significantly when the vegafusion transformer is used, so I have to painstakingly update all the OtterGrader tests that have already been written.

For example, here is a simple chart (with a large dataset):

import pandas as pd
import altair as alt
alt.data_transformers.disable_max_rows()
#alt.data_transformers.enable("vegafusion")
#alt.renderers.enable("jupyterlab")

url = "https://raw.githubusercontent.com/firasm/bits/refs/heads/master/street_trees.csv"

df = pd.read_csv(url)

chart = alt.Chart(df).mark_point().encode(alt.X('count(diameter)'),alt.Y('species_name'))
chart

Here's how I can get the mark (point):

chart.to_dict()['mark']['type']

When I use the vegafusion data transformer, I have to do:

chart.to_dict(format="vega")['marks'][0]['style'][0]

Is this something that should be expected? I can either rewrite my tests for the vegafusion transformer, or accept a large file size and keep the standard tests as-is.

I would have expected the chart spec to have the same format.

Have you considered any alternative solutions?

No response


firasm · 2025-01-26T23:08:24Z

Related to #3759

dangotbanned · 2025-01-26T23:26:53Z

chart.to_dict(format="vega")['marks'][0]['style'][0]

@firasm have you tried format="vega-lite"?

IIRC, the difference you're seeing here is from explicitly asking for a vega spec - instead of vegalite.

If that isn't possible, then you might need https://github.com/vega/vl-convert?tab=readme-ov-file#python

vl_convert should be able to export to either format

firasm · 2025-01-27T03:04:55Z

Thanks for the help!

When I try:

chart.to_dict(format="vega-lite")

I get an error saying it needs to be the vega format:

ValueError: When the "vegafusion" data transformer is enabled, the 
to_dict() and to_json() chart methods must be called with format="vega". 
For example: 
    >>> chart.to_dict(format="vega")
    >>> chart.to_json(format="Vega")

I'm taking a look at vl_convert, but it seems there isn't an option to go from Vega to Vegalite:

vl2html    Convert a Vega-Lite specification to an HTML file
vg2svg     Convert a Vega specification to an SVG image
vg2png     Convert a Vega specification to an PNG image
vg2jpeg    Convert a Vega specification to an JPEG image
vg2pdf     Convert a Vega specification to an PDF image
vg2url     Convert a Vega specification to a URL that opens the chart in the Vega editor
vg2html    Convert a Vega specification to an HTML file

dangotbanned · 2025-01-27T10:40:53Z

Thanks for following this up @firasm.
It seems I was thinking of vegalite_to_vega, which wouldn't be helpful for this case.

Solution 1

If size of the notebook is your primary concern, I would suggest using the url directly in the chart:

import altair as alt

url = "https://raw.githubusercontent.com/firasm/bits/refs/heads/master/street_trees.csv"
chart = (
    alt.Chart(url)
    .mark_point()
    .encode(alt.X("count(diameter):Q"), alt.Y("species_name:N"))
)
>>> chart.to_dict()["mark"]["type"]
'point'

The trade-off for this is that you'll need to specify encoding types - as the data will be entirely opaque to altair and the resulting vega-lite spec.

Note

Including the data in the spec would only be beneficial if you expect any students to make transformations prior to passing data to altair.

Solution 2 (advice)

Another way to approach the problem is normalizing the dataset to multiple tables with less redundant information.
The current shape is (146650, 21).

Looking at only the string columns, they all have a relatively low cardinality.

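import pandas as pd
import polars as pl
import polars.selectors as cs  # provides cs.string() used below

url = "https://raw.githubusercontent.com/firasm/bits/refs/heads/master/street_trees.csv"
df = pl.DataFrame(pd.read_csv(url))

# Count the distinct values in every string column
>>> df.lazy().select(cs.string().n_unique()).collect()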

shape: (1, 13)
┌────────────┬────────────┬──────────────┬───────────────┬─────────────┬──────────┬──────────────┬────────────┬───────────┬────────────────────┬──────────────────┬──────┬──────────────┐
│ std_street ┆ genus_name ┆ species_name ┆ cultivar_name ┆ common_name ┆ assigned ┆ root_barrier ┆ plant_area ┆ on_street ┆ neighbourhood_name ┆ street_side_name ┆ curb ┆ date_planted │
│ ---        ┆ ---        ┆ ---          ┆ ---           ┆ ---         ┆ ---      ┆ ---          ┆ ---        ┆ ---       ┆ ---                ┆ ---              ┆ ---  ┆ ---          │
│ u32        ┆ u32        ┆ u32          ┆ u32           ┆ u32         ┆ u32      ┆ u32          ┆ u32        ┆ u32       ┆ u32                ┆ u32              ┆ u32  ┆ u32          │
╞════════════╪════════════╪══════════════╪═══════════════╪═════════════╪══════════╪══════════════╪════════════╪═══════════╪════════════════════╪══════════════════╪══════╪══════════════╡
│ 805        ┆ 97         ┆ 283          ┆ 294           ┆ 634         ┆ 2        ┆ 2            ┆ 49         ┆ 812       ┆ 22                 ┆ 6                ┆ 2    ┆ 3995         │
└────────────┴────────────┴──────────────┴───────────────┴─────────────┴──────────┴──────────────┴────────────┴───────────┴────────────────────┴──────────────────┴──────┴──────────────┘

I'd lean towards this option generally, but it does require more care in thinking what information is available where.
Depending on the skill level of the course, this could either reinforce best practices or be a stumbling block if these concepts haven't been introduced.
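To make the normalization idea concrete, here is a minimal sketch in plain Python. It moves one repeated string column into a small lookup table and leaves integer ids behind. The helper name `split_lookup`, the `tree_id` column, and the sample values are made up for illustration; in practice a DataFrame library would do the same job on the real data.

```python
def split_lookup(rows, column):
    """Replace a repeated string column with integer ids.

    Returns (slim_rows, lookup_rows). Hypothetical helper, not part of
    altair, pandas, or polars.
    """
    ids = {}  # value -> id, in first-seen order
    slim = []
    for row in rows:
        key = ids.setdefault(row[column], len(ids))
        new_row = {k: v for k, v in row.items() if k != column}
        new_row[f"{column}_id"] = key
        slim.append(new_row)
    lookup = [{"id": i, column: value} for value, i in ids.items()]
    return slim, lookup


# Made-up rows standing in for the real dataset
rows = [
    {"tree_id": 1, "species_name": "RUBRUM"},
    {"tree_id": 2, "species_name": "PLATANOIDES"},
    {"tree_id": 3, "species_name": "RUBRUM"},
]
slim, species = split_lookup(rows, "species_name")
print(slim)
print(species)
```

Each distinct string is then stored once in the lookup table, which is where the size saving would come from for low-cardinality columns.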

jonmmease · 2025-01-27T11:42:12Z

Hi @firasm,
Yeah, this difference is expected because VegaFusion operates on the lower-level Vega specifications rather than Vega-Lite specifications, which is what Altair is based on.
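If a grader test has to accept either shape, the two lookups from the original post could be folded into one helper. This is only a sketch: `mark_name` is a hypothetical name, the two dictionaries are hand-written and heavily trimmed stand-ins for real `to_dict()` output, and it assumes a single-view chart (no layers or concatenation).

```python
def mark_name(spec):
    """Return the mark name from a Vega-Lite dict or a compiled Vega dict."""
    if "mark" in spec:
        # Vega-Lite: "mark" is either a string or a {"type": ...} dict
        mark = spec["mark"]
        return mark["type"] if isinstance(mark, dict) else mark
    if "marks" in spec:
        # Compiled Vega: same lookup as in the original post
        return spec["marks"][0]["style"][0]
    raise KeyError("spec has neither 'mark' nor 'marks'")


# Hand-written stand-ins, not real chart output
vegalite_like = {"mark": {"type": "point"}, "encoding": {}}
vega_like = {"marks": [{"type": "symbol", "style": ["point"]}]}

print(mark_name(vegalite_like))
print(mark_name(vega_like))
```

This keeps existing tests readable, but it only papers over one field; encodings and transforms are also laid out differently in the Vega output.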

joelostblom · 2025-01-27T15:12:55Z

I've run into this a few times when I want to grade/compare the spec of a chart with a correct spec. I believe the easiest workaround is to temporarily disable the vegafusion data transformer:

with alt.data_transformers.enable('default'):
    chart.to_dict()

That should be all that's needed, but I also tend to remove the data, since I rarely want to compare it (just the rest of the spec) and the data might be huge:

chart_with_less_data = chart.copy()  # optional but cleaner
with alt.data_transformers.enable('default'):
    chart_with_less_data['data'] = chart_with_less_data['data'][:1]  # optional but potentially faster
    spec = chart_with_less_data.to_dict()
assert spec['mark']['type'] in ['circle', 'point']  # For example
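An alternative to slicing the DataFrame is to drop the data from the dictionary after it has been generated. The sketch below assumes the Vega-Lite dictionary stores inline rows under a top-level `datasets` key and the reference to them under `data`; `spec_without_data` is a hypothetical helper, and the `student` dictionary is a hand-written stand-in rather than real chart output.

```python
import copy


def spec_without_data(spec):
    """Return a copy of a Vega-Lite dict with its data removed."""
    slim = copy.deepcopy(spec)
    slim.pop("datasets", None)  # inline rows (assumed key)
    slim.pop("data", None)      # dataset reference or URL (assumed key)
    return slim


# Hand-written stand-in for a student's chart.to_dict() result
student = {
    "mark": {"type": "point"},
    "data": {"name": "data-abc"},
    "datasets": {"data-abc": [{"diameter": 10.0}]},
}
expected = {"mark": {"type": "point"}}

print(spec_without_data(student) == expected)
```

With a real chart this would be called on the result of `chart.to_dict()` inside the `with alt.data_transformers.enable('default'):` block. Unlike the slicing approach, the full data is still serialized once, so it is simpler but not faster.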

I've thought that maybe we should disable the vegafusion transformer automatically in this way if the vegalite format of a dict is explicitly requested from a chart created with vegafusion (or it could be part of #3759). We would need to decide if it is explicit enough that such a conversion might return a huge dictionary due to the more verbose VL format.

firasm added the enhancement label Jan 26, 2025

dangotbanned added the vega: vegafusion label Jan 26, 2025

dangotbanned added question and removed enhancement labels Jan 26, 2025

dangotbanned added question, needs-user-response, vega: vegafusion and removed question, vega: vegafusion labels Jan 27, 2025
