
Problem with Input Data #28

Open
adityamishraaaa opened this issue Apr 13, 2024 · 2 comments

@adityamishraaaa

Hello there!

I am trying to reproduce the results of your publication for a course project, but there seems to be an issue with the "Pascal: JPEGImages | SegmentationClass" dataset. Training keeps failing with a "File not found" error. The complete error output is provided below:

FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

 warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
[2024-04-13 19:40:45,183][INFO] - {'criterion': {'kwargs': {'use_weight': False}, 'type': 'CELoss'},
 'dataset': {'ignore_label': 255,
             'mean': [0.485, 0.456, 0.406],
             'n_sup': 662,
             'std': [0.229, 0.224, 0.225],
             'train': {'batch_size': 8,
                       'crop': {'size': [513, 513], 'type': 'rand'},
                       'data_list': './data/splitsall/pascal_u2pl/662/labeled.txt',
                       'data_root': './data/VOC2012',
                       'flip': True,
                       'rand_resize': [0.5, 2.0],
                       'resize_base_size': 500,
                       'strong_aug': {'flag_use_random_num_sampling': True,
                                      'num_augs': 3}},
             'type': 'pascal_semi',
             'val': {'batch_size': 1,
                     'data_list': './data/splitsall/pascal_u2pl/val.txt',
                     'data_root': './data/VOC2012'},
             'workers': 4},
 'exp_path': './exps/zrun_vocs_u2pl/voc_semi662',
 'log_path': './exps/zrun_vocs_u2pl/voc_semi662/log',
 'net': {'decoder': {'kwargs': {'dilations': [6, 12, 18],
                                'inner_planes': 256,
                                'low_conv_planes': 48},
                     'type': 'augseg.models.decoder.dec_deeplabv3_plus'},
         'ema_decay': 0.999,
         'encoder': {'kwargs': {'multi_grid': True,
                                'replace_stride_with_dilation': [False,
                                                                 False,
                                                                 True],
                                'zero_init_residual': True},
                     'pretrain': './pretrained/resnet101.pth',
                     'type': 'augseg.models.resnet.resnet101'},
         'num_classes': 21,
         'sync_bn': True},
 'save_path': './exps/zrun_vocs_u2pl/voc_semi662/checkpoints',
 'saver': {'pretrain': '', 'snapshot_dir': 'checkpoints', 'use_tb': False},
 'trainer': {'epochs': 80,
             'evaluate_student': True,
             'lr_scheduler': {'kwargs': {'power': 0.9}, 'mode': 'poly'},
             'optimizer': {'kwargs': {'lr': 0.001,
                                      'momentum': 0.9,
                                      'weight_decay': 0.0001},
                           'type': 'SGD'},
             'sup_only_epoch': 0,
             'unsupervised': {'flag_extra_weak': False,
                              'loss_weight': 1.0,
                              'threshold': 0.95,
                              'use_cutmix': True,
                              'use_cutmix_adaptive': True,
                              'use_cutmix_trigger_prob': 1.0}}}
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth' 
missing_keys:  [] 
unexpected_keys:  ['fc.weight', 'fc.bias']
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth' 
missing_keys:  [] 
unexpected_keys:  ['fc.weight', 'fc.bias']
[2024-04-13 19:40:55,377][INFO] - # samples: 662
[2024-04-13 19:40:55,390][INFO] - # samples: 9920
[2024-04-13 19:40:55,396][INFO] - # samples: 1449
[2024-04-13 19:40:55,396][INFO] - Get loader Done...
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth' 
missing_keys:  [] 
unexpected_keys:  ['fc.weight', 'fc.bias']
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth' 
missing_keys:  [] 
unexpected_keys:  ['fc.weight', 'fc.bias']
[2024-04-13 19:40:58,584][INFO] - -------------------------- start training --------------------------
Traceback (most recent call last):
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
    main(args)
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 172, in main
    res_loss_sup, res_loss_unsup = train(
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 301, in train
    _, image_u_weak, image_u_aug, _ = loader_u_iter.next()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/DATA2/dse316/grp_007/augseg/augseg/dataset/pascal_voc.py", line 63, in __getitem__
    label = self.img_loader(label_path, "L")
  File "/DATA2/dse316/grp_007/augseg/augseg/dataset/base.py", line 44, in img_loader
    with open(path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/VOC2012/SegmentationClassAug/2008_006330.png'

Traceback (most recent call last):
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
    main(args)
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 172, in main
    res_loss_sup, res_loss_unsup = train(
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 301, in train
    _, image_u_weak, image_u_aug, _ = loader_u_iter.next()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/DATA2/dse316/grp_007/augseg/augseg/dataset/pascal_voc.py", line 63, in __getitem__
    label = self.img_loader(label_path, "L")
  File "/DATA2/dse316/grp_007/augseg/augseg/dataset/base.py", line 44, in img_loader
    with open(path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/VOC2012/SegmentationClassAug/2008_000085.png'

Exception in thread Thread-1 (_pin_memory_loop):
Traceback (most recent call last):
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 297, in rebuild_storage_fd
    fd = df.detach()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/resource_sharer.py", line 86, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 508, in Client
    answer_challenge(c, authkey)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 752, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 686256) of binary: /home/dse316/miniconda3/envs/grp_007/bin/python
Traceback (most recent call last):
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
./train_semi.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-04-13_19:41:03
  host      : pragyan
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 686257)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-13_19:41:03
  host      : pragyan
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 686256)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Looking forward to your response. Could you please cross-check that the data source contains all the files and that the link in the GitHub repository is correct?
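For what it's worth, the traceback points at masks missing under `./data/VOC2012/SegmentationClassAug/`. A quick way to see how many are affected is to scan the split file against the label directory. This is a hypothetical helper (the real split files in the repo may store full paths rather than bare ids; adjust the parsing if so):

```python
from pathlib import Path

def find_missing_labels(list_file, label_dir):
    """Return ids from a split list whose .png mask is absent in label_dir.

    Hypothetical helper: assumes each non-empty line of the split file
    starts with an image id or image path; the mask is expected to be
    <label_dir>/<id>.png, as in the error message above.
    """
    label_dir = Path(label_dir)
    missing = []
    for line in Path(list_file).read_text().splitlines():
        first = line.strip().split()[0] if line.strip() else ""
        if not first:
            continue
        stem = Path(first).stem  # tolerates "2008_006330" or "JPEGImages/2008_006330.jpg"
        if not (label_dir / f"{stem}.png").is_file():
            missing.append(stem)
    return missing

# e.g. find_missing_labels("./data/splitsall/pascal_u2pl/662/labeled.txt",
#                          "./data/VOC2012/SegmentationClassAug")
```

If this reports a large fraction of the 9920 unlabeled ids missing, the augmented label set was likely never downloaded rather than partially corrupted.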

@ZhenZHAO (Owner)

Hi, please use this link to prepare the labels.
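Once the labels are prepared, the config and the error above imply `data_root` should contain `JPEGImages`, `SegmentationClass`, and the augmented `SegmentationClassAug` directory. A minimal sanity check (a sketch, assuming that layout):

```python
from pathlib import Path

def check_voc_layout(data_root):
    """Return the required VOC sub-directories missing under data_root.

    Expected layout (inferred from the config and traceback, not from
    the repo's docs):
        <data_root>/JPEGImages/
        <data_root>/SegmentationClass/
        <data_root>/SegmentationClassAug/
    """
    required = ("JPEGImages", "SegmentationClass", "SegmentationClassAug")
    root = Path(data_root)
    return [sub for sub in required if not (root / sub).is_dir()]

# e.g. check_voc_layout("./data/VOC2012") -> [] when everything is in place
```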

@adityamishraaaa (Author)

Thank you very much for sharing the link!
