
Debugging an SDK Agent

ruferp edited this page Apr 27, 2021 · 7 revisions

How to debug an EOS SDK agent

There are many different ways to debug your agent while it is running on the switch. This document will explore some of these options.

Manually running your agent

You do not have to use the daemon Agent CLI commands to start your agent. Instead, you can drop into bash and run the agent manually, provided your EOS release is either older than 4.17.0F or is 4.26.0 or newer. In the releases in between, manual execution was disabled for legal reasons (see Create a Mount-Profile for details of what changed at 4.17.0F); those restrictions were removed in 4.26.0:

switch> enable
switch# bash
bash# cd /mnt/flash
bash# ./MyAgent
[Warning] AGENT_PROCESS_NAME is not set; using agent name 'MyAgent'
...

You can ignore the AGENT_PROCESS_NAME warning. This variable associates your agent with its collection of configuration and status in Sysdb (the collections that are accessed via the option name value foo CLI commands in the daemon submode). You may set this environment variable if you wish; otherwise a sensible default is chosen, as the warning above shows.

To run your agent manually with tracing, set the TRACE environment variable, like this: bash# TRACE=EosSdk*,MyAgentTracing ./MyAgent

If you want to use the CLI commands to set options for your agent or to show its status, create the agent in the CLI but leave it shut down. Then start it manually from bash, passing the daemon name you chose as AGENT_PROCESS_NAME:

ol442(config)#daemon Hallo
ol442(config-daemon-Hallo)#exe /tmp/HelloWorld
ol442(config-daemon-Hallo)#option name value Rista
ol442(config-daemon-Hallo)#show daemon Hallo
Process: Hallo (shutdown)
Configuration:
Option       Value
------------ -----
name         Rista

No status data stored.

ol442(config-daemon-Hallo)#bash daemonize bash -c "AGENT_PROCESS_NAME=Hallo /tmp/HelloWorld"
ol442(config-daemon-Hallo)#show daemon
Agent: Hallo (shutdown)
Configuration:
Option       Value
------------ -----
name         Rista

Status:
Data           Value
-------------- ------------
greeting       Hello Rista!   <==== agent is running and has updated its status

Debugging Python agents

Python agents can be difficult to debug because whenever an exception escapes an on_* handler, Arista's event loop swallows it and prints a hard-to-follow backtrace. We recommend subclassing eossdk_utils.EosSdkAgent, which wraps all of your handlers in debugging code. If you run the agent from a TTY, this code drops you into an interactive pdb (Python debugger) session when an error occurs. Otherwise, the agent prints an informative backtrace to its agent log file. You can see an example of how to use this class in the InotifyExample agent.

Using GDB for C++ agents

Besides enabling tracing, it may be useful to run your agent under GDB. To do this, first compile your agent with debugging symbols. If you're using g++, you can pass the -g flag to add symbols during compilation.

Then, on the switch, you can simply run the agent using the GDB you know and love:

bash# gdb ./MyAgent
(gdb) set environment TRACE=EosSdk*,MyAgentTracing
(gdb) b MyAgent.cpp:38
Breakpoint 1 at 0x804d469: file MyAgent.cpp, line 38.
(gdb) run
...

If you're running into an eos::panic message, it may be hard to understand where the agent is crashing, because the exception unwinds the stack until we reach the event loop. This can easily be debugged by installing a custom panic handler in your agent:

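#include <csignal>
#include <eos/exception.h>  // full definition of eos::error, needed for raise()
#include <eos/panic.h>

static void panic_handler(eos::error const & exception) {
   std::raise(SIGINT);   // interrupt first, so GDB stops before the stack unwinds
   exception.raise();    // then rethrow the original exception
}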

And then in your constructor or on_initialized() callback, add the line eos::exception_handler_is(panic_handler);. You'll now get an interrupt whenever a panic is triggered, making debugging your agent via GDB extremely simple.

If you cannot run it manually

On releases where manual execution is disabled, you can still attach gdb to your agent after starting it through the daemon CLI command. If you need to debug right at initialization time, add volatile int stop = 1; while(stop); at the start of your main, then attach gdb, run set variable stop = 0, and continue with your debugging task. A simpler alternative that does not spin the CPU is to add kill(getpid(), SIGSTOP); to your code (SIGSTOP is signal 19 on x86 Linux), then attach gdb.
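The stop-and-attach idea can be made switchable so that the agent does not have to be rebuilt for every debugging session. The following is only a minimal sketch under stated assumptions: the helper wait_for_debugger_if_requested and the marker-file convention are hypothetical names introduced here for illustration and are not part of EOS or the EOS SDK. It uses only standard POSIX calls.

```cpp
#include <csignal>   // std::raise, SIGSTOP
#include <cstdio>    // std::fprintf
#include <unistd.h>  // access, getpid

// Pauses the process if marker_path exists, so that gdb can be attached.
// Returns true if the process was stopped (and has since been resumed),
// false if no marker file was found and startup continued normally.
// The marker-file convention is an assumption of this sketch, not an EOS feature.
bool wait_for_debugger_if_requested(const char * marker_path) {
   if (access(marker_path, F_OK) != 0) {
      return false;  // no marker file: normal startup
   }
   std::fprintf(stderr, "pid %d stopped, attach with: gdb -p %d\n",
                static_cast<int>(getpid()), static_cast<int>(getpid()));
   std::raise(SIGSTOP);  // stops the whole process, like kill(getpid(), SIGSTOP)
   return true;          // reached once the process is continued
}
```

Called as the first statement of main, for example with a path such as /tmp/MyAgent.wait_for_gdb (again a made-up name), the agent starts normally unless that file exists. Create the file before starting the daemon, attach with gdb -p and the agent's pid, set your breakpoints, and then continue.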

Note that if an agent misses its heartbeat, ProcMgr restarts it. To prevent this from happening while the agent is halted in gdb, run bash sudo service ProcMgr stoppm. When you are done, run bash sudo service ProcMgr start so that crashing agents are restarted again, although now with a delay, because the new ProcMgr is no longer their parent.