ESM2 Interactive inference #654

Draft · farhadrgh wants to merge 7 commits into main

Conversation

farhadrgh (Collaborator)

Description

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

Note

By default, the notebooks validation tests are skipped unless explicitly enabled.

Usage

TODO: Add code snippet

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
from bionemo.core.utils.dtypes import PrecisionTypes, get_autocast_dtype
from bionemo.esm2.data import tokenizer
from bionemo.esm2.model.model import ESM2Config
from bionemo.testing import megatron_parallel_state_utils
farhadrgh (Collaborator, Author)

I will have to move megatron_parallel_state_utils from bionemo.testing for this to work.
CC: @jstjohn @pstjohn

Collaborator

Yeah, I had this same issue and almost put a thread together for it. The other option (which honestly might be better) is just to move the context manager outside of eval_esm2 and ensure that eval_esm2 is called inside one. That's what I do with assert_model_equivalence.

Collaborator

I.e., I think we can call into bionemo.testing in the example notebooks without tach yelling at us, and it makes the creation of the megatron context more explicit

farhadrgh (Collaborator, Author)

Yeah, I moved the context manager outside and into the notebooks; no complaints from tach!
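
For illustration, a minimal sketch of the notebook-side pattern being described here, assuming eval_esm2 keeps the (ckpt_path, proteins, precision) signature shown in the diff below; the actual notebook cell may differ.

# Minimal sketch (assumptions noted above): the Megatron parallel state is created
# explicitly in the notebook, and eval_esm2 runs inside that context.
from bionemo.testing import megatron_parallel_state_utils

# from ... import eval_esm2  # import path depends on where this PR places the helper

ckpt_path = ...  # path to a NeMo2 ESM-2 checkpoint
test_proteins = [
    "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLA",
]

with megatron_parallel_state_utils.distributed_model_parallel_state():
    results = eval_esm2(ckpt_path, test_proteins, precision="fp32")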

nemo_model = Float16Module(nemo_config, nemo_model)

nemo_output = nemo_model(input_ids, attention_mask)
nemo_output = eval_esm2(ckpt_path, test_proteins, precision=precision)
pstjohn (Collaborator) · Jan 24, 2025

IMO I'd like to ensure the tensor inputs to the two models are the same, rather than constructing them twice here. It might make sense to decompose your eval_esm2 function into more modular pieces so that's possible.

farhadrgh (Collaborator, Author) · Jan 24, 2025

I figured we might run into cases where we iterate over a dataloader, so I separated the setup/teardown to avoid configuring the model every time we call forward.

Now I can also add a method to check the inputs and tokenize them if they are sequences instead of two tensors (input_ids, attention_mask). But I don't think that would look clean. What do you think?
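
For what it's worth, a hypothetical sketch of such an input-checking helper; the name to_model_inputs and its behavior are illustrative assumptions, not code from this PR. The tokenizer call mirrors the one in assert_model_equivalence.

from typing import Sequence

import torch

from bionemo.esm2.data import tokenizer as esm2_tokenizer


def to_model_inputs(inputs: Sequence[str] | tuple[torch.Tensor, torch.Tensor]) -> tuple[torch.Tensor, torch.Tensor]:
    # Accept either raw protein strings or a pre-tokenized (input_ids, attention_mask) pair.
    if isinstance(inputs, tuple):
        return inputs
    tok = esm2_tokenizer.get_tokenizer()
    tokens = tok(list(inputs), return_tensors="pt", padding=True, truncation=True).to("cuda")
    return tokens["input_ids"], tokens["attention_mask"]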

Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
codecov-commenter commented Jan 24, 2025

❌ 7 Tests Failed:

Tests completed | Failed | Passed | Skipped
            894 |      7 |    887 |      12
View the top 3 failed tests by shortest run time
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[bf16]
Stack Traces | 2.26s run time
precision = 'bf16'

    @pytest.mark.parametrize("precision", ["fp32", "bf16", "fp16", "bf16-mixed"])
    def test_model_equivalence_with_huggingface_8m(precision):
        model_tag = "facebook/esm2_t6_8M_UR50D"
        ckpt_path = load("esm2/8m:2.0")
        with megatron_parallel_state_utils.distributed_model_parallel_state(precision=precision):
>           assert_model_equivalence(ckpt_path, model_tag, precision=precision)

.../esm2/model/test_model.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ckpt_path = PosixPath('.../github/home/.cache/bionemo/2957b2c36d5978d0f595d6f1b72104b312621cf0329209086537b613c1c96d16-esm2_hf_converted_8m_checkpoint.tar.gz.untar')
model_tag = 'facebook/esm2_t6_8M_UR50D', precision = 'bf16', rtol = None
atol = None

    def assert_model_equivalence(
        ckpt_path: Path | str,
        model_tag: str,
        precision: PrecisionTypes = "fp32",
        rtol: float | None = None,
        atol: float | None = None,
    ) -> None:
        """Testing utility to compare the outputs of a NeMo2 checkpoint to the original HuggingFace model weights.
    
        Compares the cosine similarity of the logit and hidden state outputs of a NeMo2 model checkpoint to the outputs of
        the corresponding HuggingFace model.
    
        Args:
            ckpt_path: A path to a NeMo2 checkpoint for an ESM-2 model.
            model_tag: The HuggingFace model tag for the model to compare against.
            precision: The precision type to use for the comparison. Defaults to "fp32".
            rtol: The relative tolerance to use for the comparison. Defaults to None, which chooses the tolerance based on
                the precision.
            atol: The absolute tolerance to use for the comparison. Defaults to None, which chooses the tolerance based on
                the precision.
        """
        tokenizer = get_tokenizer()
    
        test_proteins = [
            "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLA",
            "MKTVRQERLKSI<mask>RILERSKEPVSGAQLAEELS<mask>SRQVIVQDIAYLRSLGYN<mask>VATPRGYVLAGG",
        ]
        tokens = tokenizer(test_proteins, return_tensors="pt", padding=True, truncation=True).to("cuda")
        input_ids = tokens["input_ids"]
        attention_mask = tokens["attention_mask"]
    
        dtype = get_autocast_dtype(precision)
        nemo_config = ESM2Config(
            initial_ckpt_path=ckpt_path,
            include_embeddings=True,
            include_hiddens=True,
            params_dtype=dtype,
            pipeline_dtype=dtype,
            autocast_dtype=dtype,
            bf16=dtype is torch.bfloat16,
            fp16=dtype is torch.float16,
        )
        evaluator = ESM2ModelEvaluator(nemo_config)
        evaluator.setup()
        nemo_output = evaluator.eval(input_ids, attention_mask)
        evaluator.teardown()
    
        nemo_logits = nemo_output["token_logits"].transpose(0, 1).contiguous()[..., : tokenizer.vocab_size]
        nemo_hidden_state = nemo_output["hidden_states"]
    
        hf_model = AutoModelForMaskedLM.from_pretrained(model_tag, torch_dtype=get_autocast_dtype(precision)).cuda().eval()
        hf_output_all = hf_model(input_ids, attention_mask, output_hidden_states=True)
        hf_hidden_state = hf_output_all.hidden_states[-1]
    
        # Rather than directly comparing the logit or hidden state tensors, we compare their cosine similarity. These
        # should be essentially 1 if the outputs are equivalent, but is less sensitive to small numerical differences.
        # We don't care about the padding tokens, so we only compare the non-padding tokens.
        logit_similarity = torch.nn.functional.cosine_similarity(nemo_logits, hf_output_all.logits, dim=2)
        logit_similarity = logit_similarity[attention_mask == 1]
    
        hidden_state_similarity = torch.nn.functional.cosine_similarity(nemo_hidden_state, hf_hidden_state, dim=2)
        hidden_state_similarity = hidden_state_similarity[attention_mask == 1]
    
>       torch.testing.assert_close(logit_similarity, torch.ones_like(logit_similarity), rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 132 / 132 (100.0%)
E       Greatest absolute difference: 0.23046875 at index (64,) (up to 1e-05 allowed)
E       Greatest relative difference: 0.23046875 at index (64,) (up to 0.016 allowed)

.../local/lib/python3.12.../esm2/testing/compare.py:91: AssertionError
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[bf16-mixed]
Stack Traces | 2.27s run time
precision = 'bf16-mixed'

    @pytest.mark.parametrize("precision", ["fp32", "bf16", "fp16", "bf16-mixed"])
    def test_model_equivalence_with_huggingface_8m(precision):
        model_tag = "facebook/esm2_t6_8M_UR50D"
        ckpt_path = load("esm2/8m:2.0")
        with megatron_parallel_state_utils.distributed_model_parallel_state(precision=precision):
>           assert_model_equivalence(ckpt_path, model_tag, precision=precision)

.../esm2/model/test_model.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ckpt_path = PosixPath('.../github/home/.cache/bionemo/2957b2c36d5978d0f595d6f1b72104b312621cf0329209086537b613c1c96d16-esm2_hf_converted_8m_checkpoint.tar.gz.untar')
model_tag = 'facebook/esm2_t6_8M_UR50D', precision = 'bf16-mixed', rtol = None
atol = None

    def assert_model_equivalence(...):  # full source identical to the first failing test above
        ...
    
>       torch.testing.assert_close(logit_similarity, torch.ones_like(logit_similarity), rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 132 / 132 (100.0%)
E       Greatest absolute difference: 0.23046875 at index (64,) (up to 1e-05 allowed)
E       Greatest relative difference: 0.23046875 at index (64,) (up to 0.016 allowed)

.../local/lib/python3.12.../esm2/testing/compare.py:91: AssertionError
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[fp32]
Stack Traces | 2.27s run time
precision = 'fp32'

    @pytest.mark.parametrize("precision", ["fp32", "bf16", "fp16", "bf16-mixed"])
    def test_model_equivalence_with_huggingface_8m(precision):
        model_tag = "facebook/esm2_t6_8M_UR50D"
        ckpt_path = load("esm2/8m:2.0")
        with megatron_parallel_state_utils.distributed_model_parallel_state(precision=precision):
>           assert_model_equivalence(ckpt_path, model_tag, precision=precision)

.../esm2/model/test_model.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ckpt_path = PosixPath('.../github/home/.cache/bionemo/2957b2c36d5978d0f595d6f1b72104b312621cf0329209086537b613c1c96d16-esm2_hf_converted_8m_checkpoint.tar.gz.untar')
model_tag = 'facebook/esm2_t6_8M_UR50D', precision = 'fp32', rtol = None
atol = None

    def assert_model_equivalence(...):  # full source identical to the first failing test above
        ...
    
>       torch.testing.assert_close(logit_similarity, torch.ones_like(logit_similarity), rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 132 / 132 (100.0%)
E       Greatest absolute difference: 0.23029422760009766 at index (65,) (up to 1e-05 allowed)
E       Greatest relative difference: 0.23029422760009766 at index (65,) (up to 1.3e-06 allowed)

.../local/lib/python3.12.../esm2/testing/compare.py:91: AssertionError

Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
farhadrgh (Collaborator, Author)

@pstjohn any idea why the golden value tests are not passing? I am getting AssertionError: Tensor-likes are not close! from sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_convert.py

pstjohn (Collaborator) commented Jan 24, 2025

@farhadrgh looks like you're changing the model comparison eval function; I'm guessing we're not getting the same values out. Maybe try running through a debugger to see where the change is happening?
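
One way to narrow it down (a diagnostic sketch, not part of the PR): compare the hidden states and logits separately so you can see whether the divergence appears before or after the LM head, reusing the tensors that assert_model_equivalence already computes.

import torch


def report_worst_similarity(nemo_logits, hf_logits, nemo_hidden, hf_hidden, attention_mask) -> None:
    # Print the worst per-token cosine similarity for logits and hidden states
    # (non-padding tokens only), mirroring the comparison in assert_model_equivalence.
    logit_sim = torch.nn.functional.cosine_similarity(nemo_logits, hf_logits, dim=2)
    hidden_sim = torch.nn.functional.cosine_similarity(nemo_hidden, hf_hidden, dim=2)
    print("min logit cosine similarity:", logit_sim[attention_mask == 1].min().item())
    print("min hidden-state cosine similarity:", hidden_sim[attention_mask == 1].min().item())

Running the failing test with pytest's --pdb flag drops you into the failure frame, where those tensors can be inspected directly.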

Labels: none yet
Projects: none yet
3 participants