Friday, October 02, 2009

RHQ tip of the day: agent confused?

Darko asked the other day on the Jopr IRC channel:

I have an agent, which isn't collecting my cpu-load data any more. I also have many more problems with the agent. How can I reset the agent to collect all data again? It may have to do with the fact that I had a server crash on the system with the running agent. Since the crash the agent isn't working any more.

It looks like the internal state database of the agent got confused. When the agent is fully configured and working, it will save its inventory of locally managed resources (a part of the global inventory kept on the server) along with their measurement schedules in a local database.

The agent can use this database to resume work on its next start even if the server is not reachable. Changes made on the server will be synchronized once the connection is up again.

Now back to the original question: to get the agent going again, you need to erase the bad local inventory and sync with the server again. You can do this by passing the --purgedata option on the agent command line:

$ bin/rhq-agent.sh --purgedata

To see a full list of the agent's command line options, you can pass the --help option to the agent start script.
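
If you want to script the purge step, a small wrapper along these lines might help. Treat it as a sketch only: the AGENT_HOME variable and the purge_agent_data function name are made up for illustration and are not part of RHQ. Only the --purgedata option and the bin/rhq-agent.sh start script come from the agent itself.

```shell
# Assumption: AGENT_HOME points at the agent install directory.
# It defaults to the current directory, matching the command shown earlier.
AGENT_HOME="${AGENT_HOME:-.}"

# Hypothetical helper: start the agent with --purgedata so that it drops
# its local inventory database and syncs with the server again.
purge_agent_data() {
    agent_script="$AGENT_HOME/bin/rhq-agent.sh"
    # Fail early with a readable message if the start script is not there.
    if [ ! -x "$agent_script" ]; then
        echo "agent start script not found: $agent_script" >&2
        return 1
    fi
    "$agent_script" --purgedata
}
```

After sourcing this, calling purge_agent_data from a shell starts the agent with the purge option, or prints an error and returns non-zero if the start script is missing.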
