Debugging an SDK Agent
There are many different ways to debug your agent while it is running on the switch. This document will explore some of these options.
You do not have to use the daemon Agent CLI commands to start your agent. Instead, you can drop into bash and run your agent manually, provided you use an EOS release before 4.17.0F (after which this was disabled for legal reasons; see also Create a Mount-Profile for details of what happens at the 4.17.0F juncture) or 4.26.0 and later (when the legal restrictions were removed):
switch> enable
switch# bash
bash# cd /mnt/flash
bash# ./MyAgent
[Warning] AGENT_PROCESS_NAME is not set; using agent name 'MyAgent'
...
You can ignore the AGENT_PROCESS_NAME warning. This variable associates your agent with its collection of configuration and status in Sysdb (the collections accessed via the option name value foo CLI commands in the daemon submode). You may set this environment variable if you wish; otherwise the agent will try to pick a sane default.
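The fallback described above can be pictured with a small sketch. This is not the SDK's actual implementation: the function names, the `fallback_name` parameter, and the choice to treat an empty value as unset are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Sketch only: mirrors the fallback described in the text, not the SDK's code.
// `fallback_name` is a hypothetical default, e.g. the executable's name.
std::string resolve_agent_name(const char* env_value,
                               const std::string& fallback_name) {
    // Assumption: an empty value is treated the same as an unset variable.
    if (env_value != nullptr && env_value[0] != '\0') {
        return env_value;
    }
    return fallback_name;
}

// Convenience wrapper that reads the real process environment.
std::string agent_name_from_environment(const std::string& fallback_name) {
    return resolve_agent_name(std::getenv("AGENT_PROCESS_NAME"), fallback_name);
}
```

With AGENT_PROCESS_NAME unset, `agent_name_from_environment("MyAgent")` yields `MyAgent`, which matches the name shown in the warning above.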
To run your agent manually with tracing, set the TRACE environment variable, like this:
bash# TRACE=EosSdk*,MyAgentTracing ./MyAgent
If you want to use CLI commands to set options for your agent or to show its status, create the agent in the CLI but leave it shut down. Then, when you start it manually from bash, set AGENT_PROCESS_NAME to the daemon name you chose:
ol442(config)#daemon Hallo
ol442(config-daemon-Hallo)#exe /tmp/HelloWorld
ol442(config-daemon-Hallo)#option name value Rista
ol442(config-daemon-Hallo)#show daemon Hallo
Process: Hallo (shutdown)
Configuration:
Option Value
------------ -----
name Rista
No status data stored.
ol442(config-daemon-Hallo)#bash daemonize bash -c "AGENT_PROCESS_NAME=Hallo /tmp/HelloWorld"
ol442(config-daemon-Hallo)#show daemon
Agent: Hallo (shutdown)
Configuration:
Option Value
------------ -----
name Rista
Status:
Data Value
-------------- ------------
greeting Hello Rista! <==== agent is running and has updated its status
Python agents can be difficult to debug because whenever an exception escapes an on_* handler, Arista's event loop swallows it and prints a backtrace that is difficult to follow. We recommend subclassing eossdk_utils.EosSdkAgent, which wraps all of your handlers in debugging code. If you run the agent from a TTY, this code drops you into an interactive pdb (Python debugger) session when an error occurs. Otherwise, the agent prints an informative backtrace in its agent log file. You can see an example of how to use this class in the InotifyExample agent.
Besides enabling tracing, it may be useful to run your agent under GDB. To do this, first compile your agent with debugging symbols. If you're using g++, pass the -g flag to add symbols during compilation.
Then, on the switch, you can simply run the agent using the GDB you know and love:
bash# gdb ./MyAgent
(gdb) set environment TRACE=EosSdk*,MyAgentTracing
(gdb) b MyAgent.cpp:38
Breakpoint 1 at 0x804d469: file MyAgent.cpp, line 38.
(gdb) run
...
If you're running into an eos::panic message, it may be hard to tell where the agent is crashing, because the exception unwinds the stack until it reaches the event loop. You can debug this by installing a custom panic handler in your agent:
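#include <csignal>
#include <eos/panic.h>

// Interrupt the process at the point of the panic so that GDB stops there,
// then re-raise the original exception.
static void panic_handler(eos::error const & exception) {
   std::raise(SIGINT);
   exception.raise();
}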
Then, in your constructor or on_initialized() callback, add the line eos::exception_handler_is(panic_handler); to install the handler. You'll now get an interrupt whenever a panic is triggered, so GDB stops at the point of failure instead of after the stack has unwound.
If your EOS release does not allow starting agents manually from bash, you can still attach gdb to your agent after starting it through the daemon CLI command. If you need to debug right at initialization time, add volatile int stop = 1; while(stop); at the start of your main, then attach gdb, run set variable stop = 0, and continue with your debugging task.
A simpler alternative that avoids the busy loop: add kill(getpid(),19); to your code (signal 19 is SIGSTOP on most Linux architectures, including x86), then attach gdb.
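The two startup pauses described above can be combined into one helper, so that the pause is only active when explicitly requested. This is a minimal sketch: the `DebugWait` type, both function names, and the "spin"/"stop" strings are hypothetical and not part of the SDK. Where the mode string comes from (an environment variable, a command-line argument) is left open.

```cpp
#include <cassert>
#include <cstring>
#include <signal.h>
#include <unistd.h>

// How the agent should pause at startup so that gdb can be attached.
enum class DebugWait { None, Spin, Stop };

// Maps a (hypothetical) mode string to a pause mode; unknown values mean None.
DebugWait parse_debug_wait(const char* value) {
    if (value == nullptr) return DebugWait::None;
    if (std::strcmp(value, "spin") == 0) return DebugWait::Spin;
    if (std::strcmp(value, "stop") == 0) return DebugWait::Stop;
    return DebugWait::None;
}

// Pauses the process according to `mode`; returns immediately for None.
void wait_for_debugger(DebugWait mode) {
    if (mode == DebugWait::Spin) {
        volatile int stop = 1;
        while (stop) {
            // Busy loop: attach gdb, select this frame, then
            // `set variable stop = 0` and `continue`.
        }
    } else if (mode == DebugWait::Stop) {
        // Same effect as kill(getpid(), 19) on systems where SIGSTOP is 19,
        // but without hard-coding the signal number.
        kill(getpid(), SIGSTOP);
    }
}
```

Calling `wait_for_debugger(parse_debug_wait(mode_string))` as the first statement of main keeps the pause out of normal runs. Using the SIGSTOP constant rather than the literal 19 keeps the code correct on architectures that number signals differently.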
Note that if an agent misses its heartbeat, ProcMgr will restart it. To prevent this while the agent is halted in gdb, run bash sudo service ProcMgr stoppm. When done, run bash sudo service ProcMgr start (so that crashing agents get restarted again, although now with a delay because the new ProcMgr is no longer their parent).