Troubleshooting OSAD and jabberd
Open File Count Exceeded
This problem occurs when the maximum number of files that the jabber user can open is lower than the number of connected clients. OSAD clients cannot contact the SUSE Manager Server, and jabberd takes a long time to respond on port 5222. Each client requires one permanently open TCP connection, and each connection requires one file handle.
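To see how close a process is to its limit, you can count the file handles it currently holds. The following is a minimal sketch, assuming a Linux system with /proc mounted. The function name open_fd_count is hypothetical and not part of SUSE Manager.

```shell
# Print the number of file descriptors a process currently has open.
# Usage: open_fd_count PID   (Linux only, reads /proc/PID/fd)
open_fd_count() {
    pid=$1
    if [ ! -d "/proc/$pid/fd" ]; then
        echo "open_fd_count: no such process: $pid" >&2
        return 1
    fi
    # Each entry in /proc/PID/fd is one open descriptor.
    ls "/proc/$pid/fd" | wc -l
}

# Demonstration: count the descriptors held by the current shell.
open_fd_count "$$"
```

To inspect the jabberd client-to-server process, pass its process ID instead, for example open_fd_count "$(pgrep -o c2s)" run as root. This assumes the process is named c2s on your system.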
To resolve OSAD connection problems, edit the following lines in /etc/security/limits.conf:
jabber soft nofile <#clients + 100>
jabber hard nofile <#clients + 1000>
The values vary according to your setup. For example, for 5000 clients:
jabber soft nofile 5100
jabber hard nofile 6000
Ensure you also update the max_fds parameter in /etc/jabberd/c2s.xml. For example:
<max_fds>6000</max_fds>
The soft limit is the maximum number of open files for a single process. In SUSE Manager, the process that consumes the most file handles is c2s, which opens one connection per client. The 100 additional files accommodate any non-connection files that c2s requires to work correctly. The hard limit applies to all processes belonging to the jabber user, and additionally accounts for open files from the router, s2s, and sm processes.
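The arithmetic for the two limits can be sketched as a small shell function. This is an illustration only: the function name jabber_limits is hypothetical, and the offsets of 100 and 1000 are the ones used in this section.

```shell
# Print suggested /etc/security/limits.conf lines for the jabber user.
# Usage: jabber_limits NUMBER_OF_CLIENTS
jabber_limits() {
    clients=$1
    # Accept only a non-empty string of decimal digits.
    case $clients in
        ''|*[!0-9]*)
            echo "usage: jabber_limits NUMBER_OF_CLIENTS" >&2
            return 1
            ;;
    esac
    echo "jabber soft nofile $((clients + 100))"
    echo "jabber hard nofile $((clients + 1000))"
}

# Example for 5000 clients.
jabber_limits 5000
```

For 5000 clients this prints the same two lines as the example for limits.conf. In that example, max_fds in /etc/jabberd/c2s.xml is set to the same value as the hard limit.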
jabberd Database Corruption
SYMPTOMS: After a disk full error or a disk crash, the jabberd database may be corrupted. jabberd may then fail to start when the Spacewalk services are started:
Starting spacewalk services...
Initializing jabberd processes...
Starting router done
Starting sm startproc: exit status of parent of /usr/bin/sm: 2 failed
Terminating jabberd processes...
/var/log/messages shows more details:
jabberd/sm[31445]: starting up
jabberd/sm[31445]: process id is 31445, written to /var/lib/jabberd/pid/sm.pid
jabberd/sm[31445]: loading 'db' storage module
jabberd/sm[31445]: db: corruption detected! close all jabberd processes and run db_recover
jabberd/router[31437]: shutting down
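Before removing anything, it can help to confirm that the log really contains the corruption message. The following is a minimal sketch. The function name jabberd_db_corrupted is hypothetical, and the search string is taken from the log excerpt in this section.

```shell
# Return 0 if the log file contains the jabberd corruption message,
# 1 if it does not, and 2 if the file cannot be read.
# Usage: jabberd_db_corrupted [LOGFILE]   (default: /var/log/messages)
jabberd_db_corrupted() {
    logfile=${1:-/var/log/messages}
    [ -r "$logfile" ] || return 2
    grep -q 'db: corruption detected' "$logfile"
}

if jabberd_db_corrupted; then
    echo "jabberd reported database corruption"
fi
```

Reading /var/log/messages usually requires root privileges. If your system logs elsewhere, pass the path of that file as the first argument.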
CURE: Remove the jabberd database and restart. jabberd will automatically re-create the database.
Enter at the command prompt:
spacewalk-service stop
rm -rf /var/lib/jabberd/db/*
spacewalk-service start
Capturing XMPP Network Data for Debugging Purposes
If you are experiencing problems with OSAD, capturing the network messages can help with debugging. The following procedures describe how to capture data on both the server and the client.
- Install the tcpdump package on the server as root:
zypper in tcpdump
- Stop the OSA dispatcher and Jabber processes:
rcosa-dispatcher stop
rcjabberd stop
- Start data capture on port 5222:
tcpdump -s 0 port 5222 -w server_dump.pcap
- Open a second terminal and start the OSA dispatcher and Jabber processes:
rcosa-dispatcher start
rcjabberd start
- Operate the server and clients so that the bug you experienced is reproduced.
- When you have finished your capture, return to the first terminal and stop the data capture with Ctrl+C.
- Install the tcpdump package on your client as root:
zypper in tcpdump
- Stop the OSA process:
rcosad stop
- Begin data capture on port 5222:
tcpdump -s 0 port 5222 -w client_client_dump.pcap
- Open a second terminal and start the OSA process:
rcosad start
- Operate the server and clients so that the bug you experienced is reproduced.
- When you have finished your capture, return to the first terminal and stop the data capture with Ctrl+C.
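Once a capture has been written, it can be read back with tcpdump for a first look before sharing it. The following is a minimal sketch. The function name show_capture is hypothetical; the tcpdump options -r, -nn, and -A are standard.

```shell
# Print the packets stored in a capture file, with payloads shown as ASCII.
# Usage: show_capture FILE
show_capture() {
    pcap=$1
    # Refuse to continue if the file is missing or has no content.
    if [ ! -s "$pcap" ]; then
        echo "show_capture: '$pcap' is missing or empty" >&2
        return 1
    fi
    # -r reads packets from a file, -nn skips host and port name lookups,
    # -A prints each packet's payload as ASCII.
    tcpdump -nn -A -r "$pcap"
}
```

For example, run show_capture server_dump.pcap | less on the server, or show_capture client_client_dump.pcap | less on the client. If the connection is encrypted, the payloads will not be readable, but connection setup and timing are still visible.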