Ubios discover fails after restart on UDM #206

Open
cleidich opened this issue Jan 6, 2025 · 8 comments

Comments

@cleidich

cleidich commented Jan 6, 2025

I've noticed that discovery of client names from UnifiOS/Ubios fails on the first run of ctrld after my UDM is restarted. This has been happening since at least v1.3.10 of ctrld and v4.0.20 of UnifiOS, although it may have been occurring for longer without my noticing. Unfortunately, I can't confirm the exact start date or versions.

When ctrld starts after a device restart (either a true power off/power on or just a software reboot), I do see an error logged about Ubios discovery failing to initialize:
{"level":"error","error":"exit status 1","time":"2025-01-06T13:45:20-05:00.759","message":"could not init Ubios discover"}

If I restart the ctrld daemon, UnifiOS discovery works properly. I suspect the discovery init is simply running too early in the boot process, although I can't be certain. Perhaps a fix is to have the init re-attempt every x minutes until it succeeds.
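
The retry idea suggested above can be sketched as follows. This is a minimal illustration in Python, not ctrld's actual code: `retry_until_ready`, its parameters, and the backoff schedule are hypothetical names and values chosen for the example.

```python
import time
from typing import Callable


def retry_until_ready(
    init: Callable[[], None],
    attempts: int = 5,
    delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call init() until it succeeds and return the number of attempts used.

    Hypothetical helper: waits `delay` seconds after a failure, multiplying
    the wait by `backoff` each time. Re-raises the last error if every
    attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            init()
            return attempt
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            sleep(delay)
            delay *= backoff
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. A long-running daemon would more likely run this in the background so that DNS serving is not blocked while discovery is retried.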

This doesn't cause any resolution issues, but does mean that I get some duplicate clients for that endpoint as the clients get reported using another discovered name (usually via PTR or mDNS) until I restart ctrld, at which point they report using their UbiOS names again.

A full debug log after a recent restart is below; I don't see a lot that says why the discovery init is failing but I am happy to do additional testing and debugging if it helps.

Full log

{"level":"info","time":"2025-01-06T13:45:16-05:00.141","message":"starting ctrld v1.3.11"}
{"level":"info","time":"2025-01-06T13:45:16-05:00.261","message":"os: linux Debian GNU/Linux 11 (bullseye) (4.19.152-ui-alpine)"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.433","message":"control server started: /var/run/ctrld_control.sock"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.435","message":"resolving \"dns.controld.com\" using bootstrap DNS [\"76.76.2.22:53\" \"[fe80::201:5cff:fe6b:2a46]:53\"]"}
{"level":"error","error":"dial udp 76.76.2.22:53: connect: network is unreachable\ndial udp [fe80::201:5cff:fe6b:2a46]:53: connect: invalid argument","time":"2025-01-06T13:45:20-05:00.436","message":"could not lookup \"AAAA\" record for domain \"dns.controld.com\""}
{"level":"error","error":"dial udp [fe80::201:5cff:fe6b:2a46]:53: connect: invalid argument\ndial udp 76.76.2.22:53: connect: network is unreachable","time":"2025-01-06T13:45:20-05:00.436","message":"could not lookup \"A\" record for domain \"dns.controld.com\""}
{"level":"warn","time":"2025-01-06T13:45:20-05:00.436","message":"could not resolve bootstrap IPs, retrying..."}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.446","message":"resolving \"dns.controld.com\" using bootstrap DNS [\"76.76.2.22:53\" \"[fe80::201:5cff:fe6b:2a46]:53\"]"}
{"level":"error","error":"dial udp 76.76.2.22:53: connect: network is unreachable\ndial udp [fe80::201:5cff:fe6b:2a46]:53: connect: invalid argument","time":"2025-01-06T13:45:20-05:00.446","message":"could not lookup \"AAAA\" record for domain \"dns.controld.com\""}
{"level":"error","error":"dial udp 76.76.2.22:53: connect: network is unreachable\ndial udp [fe80::201:5cff:fe6b:2a46]:53: connect: invalid argument","time":"2025-01-06T13:45:20-05:00.446","message":"could not lookup \"A\" record for domain \"dns.controld.com\""}
{"level":"warn","time":"2025-01-06T13:45:20-05:00.447","message":"could not resolve bootstrap IPs, retrying..."}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.481","message":"resolving \"dns.controld.com\" using bootstrap DNS [\"76.76.2.22:53\" \"[fe80::201:5cff:fe6b:2a46]:53\"]"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.521","message":"got answer from nameserver: 76.76.2.22"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.545","message":"got answer from nameserver: 76.76.2.22"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.545","message":"bootstrap IPs: [2606:1a40::22 76.76.2.22]"}
{"level":"info","time":"2025-01-06T13:45:20-05:00.545","message":"bootstrap IPs for upstream.0: [\"2606:1a40::22\" \"76.76.2.22\"]"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.547","message":"resolving \"dns.controld.com\" using bootstrap DNS [\"76.76.2.22:53\" \"[fe80::201:5cff:fe6b:2a46]:53\"]"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.574","message":"got answer from nameserver: 76.76.2.22"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.594","message":"got answer from nameserver: 76.76.2.22"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.594","message":"bootstrap IPs: [2606:1a40::22 76.76.2.22]"}
{"level":"info","time":"2025-01-06T13:45:20-05:00.594","message":"bootstrap IPs for upstream.1: [\"2606:1a40::22\" \"76.76.2.22\"]"}
{"level":"info","bootstrap_ip":"8.8.8.8","time":"2025-01-06T13:45:20-05:00.594","message":"using bootstrap IP for upstream.2"}
{"level":"info","time":"2025-01-06T13:45:20-05:00.594","message":"starting DNS server on listener.0: 0.0.0.0:5354"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.595","message":"router setup on start"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.595","message":"start checking DNS loop"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.595","message":"skipping external: upstream.0"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.595","message":"skipping external: upstream.1"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.595","message":"skipping external: upstream.2"}
{"level":"debug","time":"2025-01-06T13:45:20-05:00.595","message":"end checking DNS loop"}
{"level":"debug","iface":"lo","time":"2025-01-06T13:45:20-05:00.604","message":"Restoring DNS for interface"}
{"level":"error","error":"exit status 1","time":"2025-01-06T13:45:20-05:00.759","message":"could not init Ubios discover"}
@ericnixmd

Probably related to this: #196

With my EFG, it sometimes doesn't provide custom names (from Network) and sometimes it does.

@yegors
Contributor

yegors commented Feb 12, 2025

Issue reproduced. Will fix.

@yegors
Contributor

yegors commented Feb 14, 2025

You can try running the dev version using this install command, it should resolve the problem.

sh -c 'sh -c "$(curl -sSL https://api.controld.dev/dl)"'

@ericnixmd

ericnixmd commented Feb 14, 2025 via email

@yegors
Contributor

yegors commented Feb 14, 2025

How do you install it on a shadow device? I thought there is no SSH there.

We have the ability to view chat logs, but we never do unless we're troubleshooting something.

@ericnixmd

I'm going to unplug the primary console and install via SSH on the shadow while the primary is down. That's probably the best way to install it.

@cleidich
Author

cleidich commented Mar 1, 2025

Sorry for the delay in responding but I did have a chance to try this out with the dev version (installed today).

After a restart of the hardware I still get a ubios discover failure, but now there's more detail. Looks like it might just be taking too long for the built-in Mongo instance to come up?

-- Boot f353f0b0a60440ac84120bf1a0f4eb79 --
Mar 01 12:52:15 CL-UDM-Home systemd[1]: Started ctrld.service.
Mar 01 12:52:16 CL-UDM-Home ctrld[5297]: Mar  1 12:52:16.000 WRN unable to create log ipc connection error="dial unix /var/run/ctrld_start.sock: connect: no such file or directory"
Mar 01 12:52:23 CL-UDM-Home ctrld[5297]: Mar  1 12:52:23.792 WRN failed to init Ubios discover error="out: 2025-03-01T12:52:23.780-0500 W NETWORK  [thread1] Failed to connect to 127.0.0.1:27117, in(checking socket for error after poll), reason: Connection refused\n2025-03-01T12:52:23.781-0500 E QUERY    [thread1] Error: couldn't connect to server localhost:27117, connection attempt failed :\nconnect@src/mongo/shell/mongo.js:275:13\n@(connect):1:21\nexception: connect failed\n, err: exit status 1"

I also get that "unable to create log ipc connection error" on every start of the service, not just on hardware reboots; but I can't see that it affects anything.
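
If the failure really is the built-in Mongo instance not yet listening on 127.0.0.1:27117 (the address in the log above), one option is to wait for that port to accept connections before running the discovery init. A minimal sketch in Python, assuming plain TCP reachability is a good enough readiness signal; `wait_for_port` and its defaults are hypothetical and not part of ctrld.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper: retries every `interval` seconds until `timeout`
    seconds have elapsed. `interval` also bounds each connection attempt.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening
        except OSError:
            # Covers "Connection refused" as well as timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 27117, timeout=120)` would block for up to two minutes. An open port does not guarantee the database is ready to answer queries, so a retry around the init itself may still be needed.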

@cuonglm
Collaborator

cuonglm commented Mar 2, 2025

Did you re-install the service, or did you just upgrade to the new binary?

You need to re-install the ctrld service so that the new dependency setup can take effect.
