I am trying to reproduce the results of your publication for my course project, but I appear to have hit an issue with the PASCAL VOC data set (JPEGImages | SegmentationClass). Training keeps failing with a "File not found" error. The complete log is provided below:
FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[2024-04-13 19:40:45,183][INFO] - {'criterion': {'kwargs': {'use_weight': False}, 'type': 'CELoss'},
'dataset': {'ignore_label': 255,
'mean': [0.485, 0.456, 0.406],
'n_sup': 662,
'std': [0.229, 0.224, 0.225],
'train': {'batch_size': 8,
'crop': {'size': [513, 513], 'type': 'rand'},
'data_list': './data/splitsall/pascal_u2pl/662/labeled.txt',
'data_root': './data/VOC2012',
'flip': True,
'rand_resize': [0.5, 2.0],
'resize_base_size': 500,
'strong_aug': {'flag_use_random_num_sampling': True,
'num_augs': 3}},
'type': 'pascal_semi',
'val': {'batch_size': 1,
'data_list': './data/splitsall/pascal_u2pl/val.txt',
'data_root': './data/VOC2012'},
'workers': 4},
'exp_path': './exps/zrun_vocs_u2pl/voc_semi662',
'log_path': './exps/zrun_vocs_u2pl/voc_semi662/log',
'net': {'decoder': {'kwargs': {'dilations': [6, 12, 18],
'inner_planes': 256,
'low_conv_planes': 48},
'type': 'augseg.models.decoder.dec_deeplabv3_plus'},
'ema_decay': 0.999,
'encoder': {'kwargs': {'multi_grid': True,
'replace_stride_with_dilation': [False,
False,
True],
'zero_init_residual': True},
'pretrain': './pretrained/resnet101.pth',
'type': 'augseg.models.resnet.resnet101'},
'num_classes': 21,
'sync_bn': True},
'save_path': './exps/zrun_vocs_u2pl/voc_semi662/checkpoints',
'saver': {'pretrain': '', 'snapshot_dir': 'checkpoints', 'use_tb': False},
'trainer': {'epochs': 80,
'evaluate_student': True,
'lr_scheduler': {'kwargs': {'power': 0.9}, 'mode': 'poly'},
'optimizer': {'kwargs': {'lr': 0.001,
'momentum': 0.9,
'weight_decay': 0.0001},
'type': 'SGD'},
'sup_only_epoch': 0,
'unsupervised': {'flag_extra_weak': False,
'loss_weight': 1.0,
'threshold': 0.95,
'use_cutmix': True,
'use_cutmix_adaptive': True,
'use_cutmix_trigger_prob': 1.0}}}
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth'
missing_keys: []
unexpected_keys: ['fc.weight', 'fc.bias']
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth'
missing_keys: []
unexpected_keys: ['fc.weight', 'fc.bias']
[2024-04-13 19:40:55,377][INFO] - # samples: 662
[2024-04-13 19:40:55,390][INFO] - # samples: 9920
[2024-04-13 19:40:55,396][INFO] - # samples: 1449
[2024-04-13 19:40:55,396][INFO] - Get loader Done...
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth'
missing_keys: []
unexpected_keys: ['fc.weight', 'fc.bias']
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth'
missing_keys: []
unexpected_keys: ['fc.weight', 'fc.bias']
[2024-04-13 19:40:58,584][INFO] - -------------------------- start training --------------------------
Traceback (most recent call last):
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
main(args)
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 172, in main
res_loss_sup, res_loss_unsup = train(
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 301, in train
_, image_u_weak, image_u_aug, _ = loader_u_iter.next()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/DATA2/dse316/grp_007/augseg/augseg/dataset/pascal_voc.py", line 63, in __getitem__
label = self.img_loader(label_path, "L")
File "/DATA2/dse316/grp_007/augseg/augseg/dataset/base.py", line 44, in img_loader
with open(path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/VOC2012/SegmentationClassAug/2008_006330.png'
Traceback (most recent call last):
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
main(args)
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 172, in main
res_loss_sup, res_loss_unsup = train(
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 301, in train
_, image_u_weak, image_u_aug, _ = loader_u_iter.next()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/DATA2/dse316/grp_007/augseg/augseg/dataset/pascal_voc.py", line 63, in __getitem__
label = self.img_loader(label_path, "L")
File "/DATA2/dse316/grp_007/augseg/augseg/dataset/base.py", line 44, in img_loader
with open(path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/VOC2012/SegmentationClassAug/2008_000085.png'
Exception in thread Thread-1 (_pin_memory_loop):
Traceback (most recent call last):
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/queues.py", line 122, in get
return _ForkingPickler.loads(res)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 297, in rebuild_storage_fd
fd = df.detach()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/resource_sharer.py", line 86, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 508, in Client
answer_challenge(c, authkey)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 752, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 686256) of binary: /home/dse316/miniconda3/envs/grp_007/bin/python
Traceback (most recent call last):
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
elastic_launch(
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./train_semi.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2024-04-13_19:41:03
host : pragyan
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 686257)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-04-13_19:41:03
host : pragyan
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 686256)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Looking forward to your response. Could you please verify that the data source contains all the required files and that the download link on the GitHub repository is correct?