ESM2 Interactive inference #654

Draft · farhadrgh wants to merge 7 commits into main

Conversation

farhadrgh (Collaborator)

Description

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

Note

By default, the notebooks validation tests are skipped unless explicitly enabled.

Usage

TODO: Add code snippet

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
from bionemo.core.utils.dtypes import PrecisionTypes, get_autocast_dtype
from bionemo.esm2.data import tokenizer
from bionemo.esm2.model.model import ESM2Config
from bionemo.testing import megatron_parallel_state_utils
farhadrgh (Collaborator, Author)

I will have to move megatron_parallel_state_utils from bionemo.testing for this to work.
CC: @jstjohn @pstjohn

Collaborator

Yeah, I had this same issue and almost put a thread together for it. The other option (which honestly might be better) is just to move the context manager outside of eval_esm2 and ensure that eval_esm2 is called inside one. That's what I do with assert_model_equivalence.

Collaborator

I.e., I think we can call into bionemo.testing in the example notebooks without tach yelling at us, and it makes the creation of the megatron context more explicit

farhadrgh (Collaborator, Author)

Yeah, I moved the context manager outside and into the notebooks; no complaints from tach!
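
For illustration, a minimal sketch of the notebook-side pattern being described here, assuming eval_esm2 keeps the (ckpt_path, proteins, precision) signature shown in the diff below; the actual notebook cell may differ.

# Minimal sketch (assumptions noted above): the Megatron parallel state is created
# explicitly in the notebook, and eval_esm2 runs inside that context.
from bionemo.testing import megatron_parallel_state_utils

# from ... import eval_esm2  # import path depends on where this PR places the helper

ckpt_path = ...  # path to a NeMo2 ESM-2 checkpoint
test_proteins = [
    "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLA",
]

with megatron_parallel_state_utils.distributed_model_parallel_state():
    results = eval_esm2(ckpt_path, test_proteins, precision="fp32")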

nemo_model = Float16Module(nemo_config, nemo_model)

nemo_output = nemo_model(input_ids, attention_mask)
nemo_output = eval_esm2(ckpt_path, test_proteins, precision=precision)
pstjohn (Collaborator) · Jan 24, 2025

IMO I'd like to ensure the tensor inputs to the two models are the same, rather than constructing them twice here. It might make sense to decompose your eval_esm2 function into more modular pieces so that's possible.

farhadrgh (Collaborator, Author) · Jan 24, 2025

I figured we might run into cases where we iterate over a dataloader, so I separated the setup/teardown to avoid configuring the model every time we call forward.

Now I can also add a method to check the inputs and tokenize them if they are sequences instead of two tensors (input_ids, attention_mask). But I don't think that would look clean. What do you think?
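
For what it's worth, a hypothetical sketch of such an input-checking helper; the name to_model_inputs and its behavior are illustrative assumptions, not code from this PR. The tokenizer call mirrors the one in assert_model_equivalence.

from typing import Sequence

import torch

from bionemo.esm2.data import tokenizer as esm2_tokenizer


def to_model_inputs(inputs: Sequence[str] | tuple[torch.Tensor, torch.Tensor]) -> tuple[torch.Tensor, torch.Tensor]:
    # Accept either raw protein strings or a pre-tokenized (input_ids, attention_mask) pair.
    if isinstance(inputs, tuple):
        return inputs
    tok = esm2_tokenizer.get_tokenizer()
    tokens = tok(list(inputs), return_tensors="pt", padding=True, truncation=True).to("cuda")
    return tokens["input_ids"], tokens["attention_mask"]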

Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
codecov-commenter commented Jan 24, 2025

❌ 7 Tests Failed:

Tests completed | Failed | Passed | Skipped
            894 |      7 |    887 |      12
View the top 3 failed tests by shortest run time
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[bf16]
Stack Traces | 2.26s run time
precision = 'bf16'

    @pytest.mark.parametrize("precision", ["fp32", "bf16", "fp16", "bf16-mixed"])
    def test_model_equivalence_with_huggingface_8m(precision):
        model_tag = "facebook/esm2_t6_8M_UR50D"
        ckpt_path = load("esm2/8m:2.0")
        with megatron_parallel_state_utils.distributed_model_parallel_state(precision=precision):
>           assert_model_equivalence(ckpt_path, model_tag, precision=precision)

.../esm2/model/test_model.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ckpt_path = PosixPath('.../github/home/.cache/bionemo/2957b2c36d5978d0f595d6f1b72104b312621cf0329209086537b613c1c96d16-esm2_hf_converted_8m_checkpoint.tar.gz.untar')
model_tag = 'facebook/esm2_t6_8M_UR50D', precision = 'bf16', rtol = None
atol = None

    def assert_model_equivalence(
        ckpt_path: Path | str,
        model_tag: str,
        precision: PrecisionTypes = "fp32",
        rtol: float | None = None,
        atol: float | None = None,
    ) -> None:
        """Testing utility to compare the outputs of a NeMo2 checkpoint to the original HuggingFace model weights.
    
        Compares the cosine similarity of the logit and hidden state outputs of a NeMo2 model checkpoint to the outputs of
        the corresponding HuggingFace model.
    
        Args:
            ckpt_path: A path to a NeMo2 checkpoint for an ESM-2 model.
            model_tag: The HuggingFace model tag for the model to compare against.
            precision: The precision type to use for the comparison. Defaults to "fp32".
            rtol: The relative tolerance to use for the comparison. Defaults to None, which chooses the tolerance based on
                the precision.
            atol: The absolute tolerance to use for the comparison. Defaults to None, which chooses the tolerance based on
                the precision.
        """
        tokenizer = get_tokenizer()
    
        test_proteins = [
            "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLA",
            "MKTVRQERLKSI<mask>RILERSKEPVSGAQLAEELS<mask>SRQVIVQDIAYLRSLGYN<mask>VATPRGYVLAGG",
        ]
        tokens = tokenizer(test_proteins, return_tensors="pt", padding=True, truncation=True).to("cuda")
        input_ids = tokens["input_ids"]
        attention_mask = tokens["attention_mask"]
    
        dtype = get_autocast_dtype(precision)
        nemo_config = ESM2Config(
            initial_ckpt_path=ckpt_path,
            include_embeddings=True,
            include_hiddens=True,
            params_dtype=dtype,
            pipeline_dtype=dtype,
            autocast_dtype=dtype,
            bf16=dtype is torch.bfloat16,
            fp16=dtype is torch.float16,
        )
        evaluator = ESM2ModelEvaluator(nemo_config)
        evaluator.setup()
        nemo_output = evaluator.eval(input_ids, attention_mask)
        evaluator.teardown()
    
        nemo_logits = nemo_output["token_logits"].transpose(0, 1).contiguous()[..., : tokenizer.vocab_size]
        nemo_hidden_state = nemo_output["hidden_states"]
    
        hf_model = AutoModelForMaskedLM.from_pretrained(model_tag, torch_dtype=get_autocast_dtype(precision)).cuda().eval()
        hf_output_all = hf_model(input_ids, attention_mask, output_hidden_states=True)
        hf_hidden_state = hf_output_all.hidden_states[-1]
    
        # Rather than directly comparing the logit or hidden state tensors, we compare their cosine similarity. These
        # should be essentially 1 if the outputs are equivalent, but is less sensitive to small numerical differences.
        # We don't care about the padding tokens, so we only compare the non-padding tokens.
        logit_similarity = torch.nn.functional.cosine_similarity(nemo_logits, hf_output_all.logits, dim=2)
        logit_similarity = logit_similarity[attention_mask == 1]
    
        hidden_state_similarity = torch.nn.functional.cosine_similarity(nemo_hidden_state, hf_hidden_state, dim=2)
        hidden_state_similarity = hidden_state_similarity[attention_mask == 1]
    
>       torch.testing.assert_close(logit_similarity, torch.ones_like(logit_similarity), rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 132 / 132 (100.0%)
E       Greatest absolute difference: 0.23046875 at index (64,) (up to 1e-05 allowed)
E       Greatest relative difference: 0.23046875 at index (64,) (up to 0.016 allowed)

.../local/lib/python3.12.../esm2/testing/compare.py:91: AssertionError
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[bf16-mixed]
Stack Traces | 2.27s run time
precision = 'bf16-mixed'

    @pytest.mark.parametrize("precision", ["fp32", "bf16", "fp16", "bf16-mixed"])
    def test_model_equivalence_with_huggingface_8m(precision):
        model_tag = "facebook/esm2_t6_8M_UR50D"
        ckpt_path = load("esm2/8m:2.0")
        with megatron_parallel_state_utils.distributed_model_parallel_state(precision=precision):
>           assert_model_equivalence(ckpt_path, model_tag, precision=precision)

.../esm2/model/test_model.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ckpt_path = PosixPath('.../github/home/.cache/bionemo/2957b2c36d5978d0f595d6f1b72104b312621cf0329209086537b613c1c96d16-esm2_hf_converted_8m_checkpoint.tar.gz.untar')
model_tag = 'facebook/esm2_t6_8M_UR50D', precision = 'bf16-mixed', rtol = None
atol = None

    def assert_model_equivalence(...):  # full source identical to the first failing test above
        ...
    
>       torch.testing.assert_close(logit_similarity, torch.ones_like(logit_similarity), rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 132 / 132 (100.0%)
E       Greatest absolute difference: 0.23046875 at index (64,) (up to 1e-05 allowed)
E       Greatest relative difference: 0.23046875 at index (64,) (up to 0.016 allowed)

.../local/lib/python3.12.../esm2/testing/compare.py:91: AssertionError
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[fp32]
Stack Traces | 2.27s run time
precision = 'fp32'

    @pytest.mark.parametrize("precision", ["fp32", "bf16", "fp16", "bf16-mixed"])
    def test_model_equivalence_with_huggingface_8m(precision):
        model_tag = "facebook/esm2_t6_8M_UR50D"
        ckpt_path = load("esm2/8m:2.0")
        with megatron_parallel_state_utils.distributed_model_parallel_state(precision=precision):
>           assert_model_equivalence(ckpt_path, model_tag, precision=precision)

.../esm2/model/test_model.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ckpt_path = PosixPath('.../github/home/.cache/bionemo/2957b2c36d5978d0f595d6f1b72104b312621cf0329209086537b613c1c96d16-esm2_hf_converted_8m_checkpoint.tar.gz.untar')
model_tag = 'facebook/esm2_t6_8M_UR50D', precision = 'fp32', rtol = None
atol = None

    def assert_model_equivalence(...):  # full source identical to the first failing test above
        ...
    
>       torch.testing.assert_close(logit_similarity, torch.ones_like(logit_similarity), rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 132 / 132 (100.0%)
E       Greatest absolute difference: 0.23029422760009766 at index (65,) (up to 1e-05 allowed)
E       Greatest relative difference: 0.23029422760009766 at index (65,) (up to 1.3e-06 allowed)

.../local/lib/python3.12.../esm2/testing/compare.py:91: AssertionError

Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
farhadrgh (Collaborator, Author)

@pstjohn any idea why the golden value tests are not passing? I am getting AssertionError: Tensor-likes are not close! from sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_convert.py

pstjohn (Collaborator) commented Jan 24, 2025

@farhadrgh looks like you're changing the model comparison eval function; I'm guessing we're not getting the same values out. Maybe try running through a debugger to see where the change is happening?
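
One way to narrow it down (a diagnostic sketch, not part of the PR): compare the hidden states and logits separately so you can see whether the divergence appears before or after the LM head, reusing the tensors that assert_model_equivalence already computes.

import torch


def report_worst_similarity(nemo_logits, hf_logits, nemo_hidden, hf_hidden, attention_mask) -> None:
    # Print the worst per-token cosine similarity for logits and hidden states
    # (non-padding tokens only), mirroring the comparison in assert_model_equivalence.
    logit_sim = torch.nn.functional.cosine_similarity(nemo_logits, hf_logits, dim=2)
    hidden_sim = torch.nn.functional.cosine_similarity(nemo_hidden, hf_hidden, dim=2)
    print("min logit cosine similarity:", logit_sim[attention_mask == 1].min().item())
    print("min hidden-state cosine similarity:", hidden_sim[attention_mask == 1].min().item())

Running the failing test with pytest's --pdb flag drops you into the failure frame, where those tensors can be inspected directly.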

Labels: none yet
Projects: none yet
3 participants