
Cannot get a relative error less than 1e-3 #5

Open · ge1mina023 opened this issue Apr 15, 2023 · 15 comments

ge1mina023 commented Apr 15, 2023

I am working through Transformer_Captioning.ipynb in assignment3. After I run the cell that tests MultiHeadAttention, I get incorrect results:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

I even copied your MultiHeadAttention code, but I still get the same result:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

I even downloaded your assignment3 code, and I still get the same output.

Is there anything else I missed?
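
For context, the numbers above come from the notebook's relative-error check against stored reference outputs. A minimal sketch of the usual cs231n helper (the exact definition lives in the notebook itself; this is only its typical form):

```python
import numpy as np

def rel_error(x, y):
    """Maximum relative error, as typically defined in the cs231n notebooks."""
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
```

An error of 1.0 means at least one element disagrees maximally (e.g., one value is zero while the other is not), rather than the outputs differing by a small numerical margin.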

mantasu (Owner) commented Apr 16, 2023

I am not sure why you get these values; I just tried running a fresh copy of this repository on Google Colab and everything worked fine. It could be a problem with the environment. Can you try cloning a fresh repository and then running Transformer_Captioning.ipynb (with only changes to the first code cell)? Or maybe you can try running it in Colab?

ge1mina023 (Author) replied:

> I am not sure why you get these values; I just tried running a fresh copy of this repository on Google Colab and everything worked fine. It could be a problem with the environment. Can you try cloning a fresh repository and then running Transformer_Captioning.ipynb (with only changes to the first code cell)? Or maybe you can try running it in Colab?

Yeah, that's what I did (with only changes to the first code cell) locally. I guess it's an environment issue. My configuration: MacBook Pro M1, Python 3.8.12, conda 4.11.0, torch 2.1.0.dev20230415, macOS Monterey 12.5.

mantasu (Owner) commented Apr 17, 2023

Maybe you can try Python 3.10 and PyTorch 2.0? Colab currently uses 3.9.16 and 2.0.0

ge1mina023 (Author) commented Apr 17, 2023

> Maybe you can try Python 3.10 and PyTorch 2.0? Colab currently uses 3.9.16 and 2.0.0

I updated Python and PyTorch to 3.9.16 and 2.0.0 respectively, and ran pip install -r requirement.txt locally. Because it had some errors, I changed some version numbers and then installed everything successfully.
With all of this done, I still got the same results:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

mantasu (Owner) commented Apr 18, 2023

I see; I will try running this in my local environment and I'll let you know. A temporary workaround for now would be to run it on Colab.

ge1mina023 (Author) replied:

> I see; I will try running this in my local environment and I'll let you know. A temporary workaround for now would be to run it on Colab.

Because of some policy reasons, Colab is not convenient for me.

mantasu (Owner) commented Apr 24, 2023

I tried in my local environment and everything still seems fine. Can you try reproducing with the following steps:

  1. Clone a completely fresh repository:
    git clone https://github.com/mantasu/cs231n
  2. Install PyTorch as described here
  3. Install further requirements:
    pip install h5py numpy matplotlib imageio ipykernel
  4. Change the first code cell in assignment3/Transformer_Captioning.ipynb to
    %cd cs231n/datasets/
    !bash get_datasets.sh
    %cd ../../
  5. Run the cells

ge1mina023 (Author) commented May 3, 2023

> I tried in my local environment and everything still seems fine. Can you try reproducing with the following steps:
>
>   1. Clone a completely fresh repository:
>     git clone https://github.com/mantasu/cs231n
>   2. Install PyTorch as described here
>   3. Install further requirements:
>     pip install h5py numpy matplotlib imageio ipykernel
>   4. Change the first code cell in assignment3/Transformer_Captioning.ipynb to
>     %cd cs231n/datasets/
>     !bash get_datasets.sh
>     %cd ../../
>   5. Run the cells

I have finished all of your steps (only Jupyter was installed extra), and I have Python 3.9.16 and PyTorch 2.0.
But it still doesn't work. I still get

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

in the first cell that computes these errors.

mantasu (Owner) commented May 5, 2023

Hmm, do you only get that in Transformer_Captioning.ipynb? Also, have you tried other solutions (e.g., from other repositories)?

ge1mina023 (Author) commented May 7, 2023

> Hmm, do you only get that in Transformer_Captioning.ipynb? Also, have you tried other solutions (e.g., from other repositories)?

Yeah, I have tried other solutions, and only Transformer_Captioning.ipynb has this error in assignment3.

mantasu (Owner) commented May 7, 2023

If even other solutions don't work, it's most likely a problem with the environment. I would suggest trying the following:

  • Reinstalling Python and Conda
  • Running on another device
  • Running on a server, e.g., Kaggle, Azure Notebook, SageMaker, or CoCalc. I would still suggest using Colab, e.g., with a VPN
  • Running on a virtual machine

ge1mina023 (Author) replied:

> If even other solutions don't work, it's most likely a problem with the environment. I would suggest trying the following:
>
>   • Reinstalling Python and Conda
>   • Running on another device
>   • Running on a server, e.g., Kaggle, Azure Notebook, SageMaker, or CoCalc. I would still suggest using Colab, e.g., with a VPN
>   • Running on a virtual machine

Thanks, I will try.

xiaoyatang replied:

> > If even other solutions don't work, it's most likely a problem with the environment. I would suggest trying the following:
> >
> >   • Reinstalling Python and Conda
> >   • Running on another device
> >   • Running on a server, e.g., Kaggle, Azure Notebook, SageMaker, or CoCalc. I would still suggest using Colab, e.g., with a VPN
> >   • Running on a virtual machine
>
> Thanks, I will try.
Hi, have you solved the issue? I just met the same one and solved it. I would say your error is due to the calculation itself rather than the environment; it does not look like an issue of computational precision. My suggestion is to make sure you follow the right order: query_key -> mask -> softmax -> attn_dropout -> attn_value -> projection.
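
For illustration, a minimal sketch of that ordering inside the attention forward pass could look like the following (this is not the repository's code; the names attention_forward, attn_drop, and proj are hypothetical, and the head splitting/merging is omitted for brevity):

```python
import math
import torch
import torch.nn.functional as F

def attention_forward(query, key, value, attn_drop, proj, mask=None):
    """Sketch of the ordering: query_key -> mask -> softmax -> attn_dropout
    -> attn_value -> projection. Shapes assumed: (N, H, T, E)."""
    scores = query @ key.transpose(-2, -1) / math.sqrt(query.shape[-1])  # query_key, scaled
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))            # mask before softmax
    attn = F.softmax(scores, dim=-1)                                     # softmax over key positions
    attn = attn_drop(attn)                                               # dropout on attention weights
    out = attn @ value                                                   # attn_value
    return proj(out)                                                     # final projection

# Hypothetical shape check with identity dropout/projection:
q = k = v = torch.randn(2, 4, 5, 8)
out = attention_forward(q, k, v, attn_drop=torch.nn.Identity(), proj=torch.nn.Identity())
print(out.shape)  # torch.Size([2, 4, 5, 8])
```

Getting any of these steps out of order (e.g., applying the mask after the softmax) can produce exactly the kind of large relative errors reported above.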

putskan commented Dec 16, 2023

> > If even other solutions don't work, it's most likely a problem with the environment. I would suggest trying the following:
> >
> >   • Reinstalling Python and Conda
> >   • Running on another device
> >   • Running on a server, e.g., Kaggle, Azure Notebook, SageMaker, or CoCalc. I would still suggest using Colab, e.g., with a VPN
> >   • Running on a virtual machine
> >
> > Thanks, I will try.
>
> Hi, have you solved the issue? I just met the same one and solved it. I would say your error is due to the calculation itself rather than the environment; it does not look like an issue of computational precision. My suggestion is to make sure you follow the right order: query_key -> mask -> softmax -> attn_dropout -> attn_value -> projection.

I had the same problem with my solution, and using this repo's solution also gave me the exact same error. Are you sure it's a calculation error?

I am also using an M1; I don't think it's a coincidence. Or are you saying the repo's solution is also inaccurate?

PeterHUistyping commented Dec 9, 2024

I am also having this problem on a Mac with Apple M1 silicon. The reason is that PyTorch does not guarantee full reproducibility across platforms (see the PyTorch notes on reproducibility, quoted below):

> Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.

By debugging step by step, one potential culprit I found is nn.Dropout(), which has also been reported elsewhere under the title "Dropout isn't deterministic between Intel and M1 MacBooks".
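
A quick way to see this is to compare the dropout pattern produced with a fixed seed on two different machines; a minimal sketch (the seed 0 is arbitrary and chosen only for illustration):

```python
import torch
import torch.nn as nn

# Run this identical snippet on two machines (e.g., an Intel laptop and an M1 Mac)
# and compare the printed tensors. Even with the same seed, the positions zeroed
# out by dropout may differ across platforms, so a notebook check that compares
# against reference outputs generated on another machine can fail even when the
# attention implementation itself is correct.
torch.manual_seed(0)
drop = nn.Dropout(p=0.3)
x = torch.ones(2, 8)
print(drop(x))
```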
