
Cannot get a relative error less than 1e-3 #5

Open · ge1mina023 opened this issue Apr 15, 2023 · 15 comments

ge1mina023 commented Apr 15, 2023

I am working through Transformer_Captioning.ipynb in assignment3. After I run the cell that tests MultiHeadAttention, I get incorrect results:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

I even copied your MultiHeadAttention code, but I still get the same result:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

I even downloaded your assignment3 code, and I still get the same output.

Is there anything else I missed?
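
For context, the numbers above come from the notebook's relative-error check against stored reference outputs. A minimal sketch of the usual cs231n helper (the exact definition lives in the notebook itself; this is only its typical form):

```python
import numpy as np

def rel_error(x, y):
    """Maximum relative error, as typically defined in the cs231n notebooks."""
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
```

An error of 1.0 means at least one element disagrees maximally (e.g., one value is zero while the other is not), rather than the outputs differing by a small numerical margin.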

mantasu (Owner) commented Apr 16, 2023

I am not sure why you get these values; I just tried running a fresh copy of this repository on Google Colab and everything worked fine. It could be a problem with the environment. Can you try cloning a fresh repository and then running Transformer_Captioning.ipynb (with only changes to the first code cell)? Or maybe you can try running it in Colab?

ge1mina023 (Author) replied:

> I am not sure why you get these values; I just tried running a fresh copy of this repository on Google Colab and everything worked fine. It could be a problem with the environment. Can you try cloning a fresh repository and then running Transformer_Captioning.ipynb (with only changes to the first code cell)? Or maybe you can try running it in Colab?

Yeah, that's what I did (with only changes to the first code cell) locally. I guess it's an environment issue. My configuration: MacBook Pro M1, Python 3.8.12, conda 4.11.0, torch 2.1.0.dev20230415, macOS Monterey 12.5.

mantasu (Owner) commented Apr 17, 2023

Maybe you can try Python 3.10 and PyTorch 2.0? Colab currently uses 3.9.16 and 2.0.0

ge1mina023 (Author) commented Apr 17, 2023

> Maybe you can try Python 3.10 and PyTorch 2.0? Colab currently uses 3.9.16 and 2.0.0

I updated Python and PyTorch to 3.9.16 and 2.0.0 respectively, and ran pip install -r requirement.txt locally. Because it had some errors, I changed some version numbers and then installed everything successfully.
With all of this done, I still got the same results:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

mantasu (Owner) commented Apr 18, 2023

I see; I will try running this in my local environment and I'll let you know. A temporary workaround for now would be to run it on Colab.

ge1mina023 (Author) replied:

> I see; I will try running this in my local environment and I'll let you know. A temporary workaround for now would be to run it on Colab.

Because of some policy reasons, Colab is not convenient for me.

mantasu (Owner) commented Apr 24, 2023

I tried in my local environment and everything still seems fine. Can you try reproducing with the following steps:

  1. Clone a completely fresh repository:
    git clone https://github.com/mantasu/cs231n
  2. Install PyTorch as described here
  3. Install further requirements:
    pip install h5py numpy matplotlib imageio ipykernel
  4. Change the first code cell in assignment3/Transformer_Captioning.ipynb to
    %cd cs231n/datasets/
    !bash get_datasets.sh
    %cd ../../
  5. Run the cells

ge1mina023 (Author) commented May 3, 2023

> I tried in my local environment and everything still seems fine. Can you try reproducing with the following steps:
>
>   1. Clone a completely fresh repository:
>     git clone https://github.com/mantasu/cs231n
>   2. Install PyTorch as described here
>   3. Install further requirements:
>     pip install h5py numpy matplotlib imageio ipykernel
>   4. Change the first code cell in assignment3/Transformer_Captioning.ipynb to
>     %cd cs231n/datasets/
>     !bash get_datasets.sh
>     %cd ../../
>   5. Run the cells

I have finished all of your steps (only Jupyter was installed extra), and I have Python 3.9.16 and PyTorch 2.0.
But it still doesn't work. I still get

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

in the first cell that computes these errors.

mantasu (Owner) commented May 5, 2023

Hmm, do you only get that in Transformer_Captioning.ipynb? Also, have you tried other solutions (e.g., from other repositories)?

ge1mina023 (Author) commented May 7, 2023

> Hmm, do you only get that in Transformer_Captioning.ipynb? Also, have you tried other solutions (e.g., from other repositories)?

Yeah, I have tried other solutions, and only Transformer_Captioning.ipynb has this error in assignment3.

mantasu (Owner) commented May 7, 2023

If even other solutions don't work, it's most likely a problem with the environment. I would suggest trying the following:

  • Reinstalling Python and Conda
  • Running on another device
  • Running on a server, e.g., Kaggle, Azure Notebook, SageMaker, or CoCalc. I would still suggest using Colab, e.g., with a VPN
  • Running on a virtual machine

ge1mina023 (Author) replied:

> If even other solutions don't work, it's most likely a problem with the environment. I would suggest trying the following:
>
>   • Reinstalling Python and Conda
>   • Running on another device
>   • Running on a server, e.g., Kaggle, Azure Notebook, SageMaker, or CoCalc. I would still suggest using Colab, e.g., with a VPN
>   • Running on a virtual machine

Thanks, I will try.

xiaoyatang replied:

> > If even other solutions don't work, it's most likely a problem with the environment. I would suggest trying the following:
> >
> >   • Reinstalling Python and Conda
> >   • Running on another device
> >   • Running on a server, e.g., Kaggle, Azure Notebook, SageMaker, or CoCalc. I would still suggest using Colab, e.g., with a VPN
> >   • Running on a virtual machine
>
> Thanks, I will try.
Hi, have you solved the issue? I just met the same one and solved it. I would say your error is due to the calculation itself rather than the environment; it does not look like an issue of computational precision. My suggestion is to make sure you follow the right order: query_key -> mask -> softmax -> attn_dropout -> attn_value -> projection.
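
For illustration, a minimal sketch of that ordering inside the attention forward pass could look like the following (this is not the repository's code; the names attention_forward, attn_drop, and proj are hypothetical, and the head splitting/merging is omitted for brevity):

```python
import math
import torch
import torch.nn.functional as F

def attention_forward(query, key, value, attn_drop, proj, mask=None):
    """Sketch of the ordering: query_key -> mask -> softmax -> attn_dropout
    -> attn_value -> projection. Shapes assumed: (N, H, T, E)."""
    scores = query @ key.transpose(-2, -1) / math.sqrt(query.shape[-1])  # query_key, scaled
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))            # mask before softmax
    attn = F.softmax(scores, dim=-1)                                     # softmax over key positions
    attn = attn_drop(attn)                                               # dropout on attention weights
    out = attn @ value                                                   # attn_value
    return proj(out)                                                     # final projection

# Hypothetical shape check with identity dropout/projection:
q = k = v = torch.randn(2, 4, 5, 8)
out = attention_forward(q, k, v, attn_drop=torch.nn.Identity(), proj=torch.nn.Identity())
print(out.shape)  # torch.Size([2, 4, 5, 8])
```

Getting any of these steps out of order (e.g., applying the mask after the softmax) can produce exactly the kind of large relative errors reported above.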

putskan commented Dec 16, 2023

> > If even other solutions don't work, it's most likely a problem with the environment. I would suggest trying the following:
> >
> >   • Reinstalling Python and Conda
> >   • Running on another device
> >   • Running on a server, e.g., Kaggle, Azure Notebook, SageMaker, or CoCalc. I would still suggest using Colab, e.g., with a VPN
> >   • Running on a virtual machine
> >
> > Thanks, I will try.
>
> Hi, have you solved the issue? I just met the same one and solved it. I would say your error is due to the calculation itself rather than the environment; it does not look like an issue of computational precision. My suggestion is to make sure you follow the right order: query_key -> mask -> softmax -> attn_dropout -> attn_value -> projection.

I had the same problem with my solution, and using this repo's solution also gave me the exact same error. Are you sure it's a calculation error?

I am also using an M1; I don't think it's a coincidence. Or are you saying the repo's solution is also inaccurate?

PeterHUistyping commented Dec 9, 2024

I am also having this problem on a Mac with Apple M1 silicon. The reason is that PyTorch does not guarantee full reproducibility across platforms (see the PyTorch notes on reproducibility, quoted below):

> Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.

By debugging step by step, one potential culprit I found is nn.Dropout(), which has also been reported elsewhere under the title "Dropout isn't deterministic between Intel and M1 MacBooks".
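
A quick way to see this is to compare the dropout pattern produced with a fixed seed on two different machines; a minimal sketch (the seed 0 is arbitrary and chosen only for illustration):

```python
import torch
import torch.nn as nn

# Run this identical snippet on two machines (e.g., an Intel laptop and an M1 Mac)
# and compare the printed tensors. Even with the same seed, the positions zeroed
# out by dropout may differ across platforms, so a notebook check that compares
# against reference outputs generated on another machine can fail even when the
# attention implementation itself is correct.
torch.manual_seed(0)
drop = nn.Dropout(p=0.3)
x = torch.ones(2, 8)
print(drop(x))
```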
