Conflict in psconfig resolve with DNS alias #475

Closed
rhclopes opened this issue May 15, 2024 · 7 comments

@rhclopes

The command 'psconfig remote add <url>' will fail to add a test if the reverse lookup of an IP address returns an alias. For example,

raullopes@JMQWPYG263DJ-CE ~ % host ps-slough-lat.perf.ja.net
ps-slough-lat.perf.ja.net has address 194.81.18.229
ps-slough-lat.perf.ja.net has IPv6 address 2001:630:3c:f803::a
raullopes@JMQWPYG263DJ-CE ~ % host 194.81.18.229
229.18.81.194.in-addr.arpa is an alias for 229.224/27.18.81.194.in-addr.arpa.
229.224/27.18.81.194.in-addr.arpa domain name pointer ps-slough-lat.perf.ja.net.
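
The host output above shows the PTR query for 194.81.18.229 passing through an alias before it reaches the pointer record. A minimal Python sketch for checking whether an address maps back to the expected name through the system resolver is given here. The helper names (reverse_name, names_for_address, round_trips) are hypothetical and are not part of psconfig; the lookup parameter is an assumption added only so the check can be exercised without network access.

```python
import ipaddress
import socket


def reverse_name(ip):
    """Return the .arpa name that a PTR query for this address asks about."""
    return ipaddress.ip_address(ip).reverse_pointer


def names_for_address(ip, lookup=socket.gethostbyaddr):
    """Return every name the resolver reports for ip (primary name first)."""
    try:
        primary, aliases, _addresses = lookup(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return []
    return [primary, *aliases]


def round_trips(hostname, ip, lookup=socket.gethostbyaddr):
    """True if the reverse lookup of ip yields hostname (case-insensitive)."""
    wanted = hostname.rstrip(".").lower()
    found = {name.rstrip(".").lower() for name in names_for_address(ip, lookup)}
    return wanted in found
```

Calling round_trips("ps-slough-lat.perf.ja.net", "194.81.18.229") with the default lookup goes through the live resolver, so the result depends on /etc/hosts, /etc/nsswitch.conf and DNS on the machine where it runs.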

When we run

psconfig remote add "https://..."

no latencybg tests are generated for the host ps-slough-lat.perf.ja.net.

The problem is fixed by adding the following lines to /etc/hosts

194.81.18.229 ps-slough-lat.perf.ja.net
2001:630:3c:f803::a ps-slough-lat.perf.ja.net

and prioritising the hosts file over DNS in /etc/nsswitch.conf (that is, listing files before dns on the hosts line).
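
To verify that the workaround is in effect, one can check the order of sources on the hosts line of /etc/nsswitch.conf. The following is a small sketch in Python under the assumption that the file follows the usual "database: source [action] source" layout; hosts_sources and files_before_dns are hypothetical helper names, not an existing tool.

```python
import re


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            # Remove action items such as [NOTFOUND=return].
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []


def files_before_dns(nsswitch_text):
    """True if the hosts file is consulted before DNS."""
    sources = hosts_sources(nsswitch_text)
    if "files" not in sources:
        return False
    if "dns" not in sources:
        return True
    return sources.index("files") < sources.index("dns")
```

A typical use would be files_before_dns(open("/etc/nsswitch.conf").read()) on the affected host.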

BTW, I think that traceroute and tracepath will segfault in the presence of that alias, even if the alias is well defined.

Raul

@timchown

That is so weird, but we have seen it before. Would be really nice to understand what causes it.

@github-project-automation github-project-automation bot moved this to Ready in perfSONAR May 16, 2024
@arlake228 arlake228 self-assigned this May 17, 2024
@arlake228 arlake228 moved this from Ready to In Progress in perfSONAR May 20, 2024
@arlake228
Contributor

Just adding some notes from our debug session on this last week:

  • To clarify, there is nothing wrong with the psconfig remote add command. It correctly adds the URL to the file. The problem is that, once the URL is added, the psconfig pscheduler agent does not create the tests with the aliased names.
  • Based on the logs we saw in the debugging session, it looked like the automatic address detection was finding the ps-slough-lat.perf.ja.net name but still was not creating tests. So, for some reason, the test was not matching even after the name was discovered. I would like to confirm this locally.

@arlake228
Contributor

arlake228 commented May 20, 2024

@rhclopes do you happen to have /var/log/perfsonar/psconfig-pscheduler-agent.log from a time when the issue occurred? I am specifically looking for a line like the following (except with your addresses):

2024-05-20 19:29:45 INFO pid=4153354 prog=_run_start line=104 guid=f0274777-95bf-4e68-a077-8da5cf5a8a20 pscheduler_assist_url=https://localhost/pscheduler match_addresses=["ps-dev-staging-el9-tk2.c.esnet-perfsonar.internal", "10.128.15.219", "fe80::fedc:61d7:480c:7ed4"] msg=Auto-detected match addresses
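
To pull the detected names out of such a line without reading it by eye, a short Python sketch is given here. It assumes the match_addresses field is always written as a JSON array, as it is in the sample line above; match_addresses() and was_detected() are hypothetical helper names for illustration.

```python
import json
import re

# Assumption: match_addresses is logged as a JSON array, as in the sample line.
MATCH_ADDRESSES_RE = re.compile(r"match_addresses=(\[.*?\])")


def match_addresses(log_line):
    """Return the list of auto-detected match addresses in a log line."""
    found = MATCH_ADDRESSES_RE.search(log_line)
    if not found:
        return []
    return json.loads(found.group(1))


def was_detected(log_line, hostname):
    """True if hostname appears in the line's match_addresses list."""
    wanted = hostname.rstrip(".").lower()
    return wanted in (addr.rstrip(".").lower() for addr in match_addresses(log_line))
```

Running was_detected() over every "Auto-detected match addresses" line in the log shows which name the agent actually picked up at each start.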

IIRC when you were screen sharing, ps-slough-lat.perf.ja.net was in the match_addresses list, but I wanted to be sure. Also, prior to modifying /etc/hosts, was there already an entry for 194.81.18.229 and/or 2001:630:3c:f803::a?

@rhclopes
Author

Andy,

The requested logs are attached.

psconfig-pscheduler-agent.log

I spent quite a few days looking into this problem. At some point I had the entries in /etc/hosts, removed them, changed them to other values, and nothing was helping because there were other problems. Eventually I closed in on the one problem left: the DNS issue. I added the entries and changed nsswitch.conf because I had seen a similar problem with perfSONAR 5.0, traceroute (and my distributed storage), and I just followed the same solution.

@arlake228
Contributor

Thanks! I think we are closer to the cause of the issue. It was detecting the name ps-slough-lat.ja.net instead of ps-slough-lat.perf.ja.net (the latter has .perf). Do you have any ideas where ps-slough-lat.ja.net might have been coming from? /etc/hosts maybe? At least currently I don't see that name in DNS.

@timchown

ps-slough-lat.ja.net is a previous name used for the interface, as was ps-slough-1g.ja.net. We were asked to move our network performance systems all under .perf.ja.net (which seemed quite reasonable), so that's the only name you should see now.
I recall Raul did a complete re-install so that name shouldn't be held or cached anywhere in the system or its configuration.
The problem seems to be around the aliasing in the reverse delegation?

arlake228 added a commit to perfsonar/psconfig that referenced this issue May 24, 2024
…ing in /etc/hosts if name existed. Now combines all sources.
@arlake228 arlake228 moved this from In Progress to In Review in perfSONAR May 24, 2024
@arlake228
Contributor

This should be corrected now. We can re-open if it surfaces again.
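
The commit message referenced above says the detection now combines all sources. As an illustration of that idea only, and not the actual psconfig code, here is a minimal Python sketch that merges names from an /etc/hosts-style table with names returned by DNS. The function names and the stale hosts entry in the usage example are hypothetical; the thread does not confirm where the old name came from.

```python
def parse_hosts(hosts_text):
    """Map each IP in /etc/hosts-style text to the names listed for it."""
    table = {}
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2:
            table.setdefault(fields[0], []).extend(fields[1:])
    return table


def combined_names(ip, hosts_table, dns_names):
    """Union of hosts-file and DNS names for ip, de-duplicated, order kept."""
    seen = set()
    merged = []
    for name in [*hosts_table.get(ip, []), *dns_names]:
        key = name.rstrip(".").lower()
        if key not in seen:
            seen.add(key)
            merged.append(key)
    return merged
```

With a hypothetical stale entry "194.81.18.229 ps-slough-lat.ja.net" in the hosts text and "ps-slough-lat.perf.ja.net." from DNS, combined_names() keeps both names, so a test that refers to the .perf name can still match.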

@github-project-automation github-project-automation bot moved this from In Review to Done in perfSONAR Jun 7, 2024