Troubleshooting hang/crash issues in Adobe Media Server


Requirements
 
Prerequisite knowledge
This article assumes that you have basic knowledge of using the Adobe Media Server and know how to run a Flash-based SWF client.
Additional required products
  • An OSMF-based SWF client for playback (Learn about OSMF)
 
User level
Intermediate
 
Required products
 
This article describes procedures for troubleshooting hang and crash issues in Adobe Media Server (AMS). A hang can occur for many reasons but often stems from a deadlock in application, API, or library code. It may cause certain features to fail or freeze a process completely. The general procedure for analyzing a hang or crash is to generate a dump or stack trace of the process under inspection. However, the AMS fiber system, which is enabled by default, can keep the dump from showing a clean call stack. For that reason, first turn fibers off, then rerun the test and take the hang or crash dump.
 
Disabling fibers on Adobe Media Server
 
These instructions apply to all the analysis steps described in this article. Adobe Media Server can be configured to run without fibers.
 
To disable fibers:
 
   1. Open the Server.xml file in a text editor. The Server.xml file is located in the root_install/conf folder.
   2. In the Server.xml file, add a new tag, <Fibers>, under Root/Server and set it to false. The section you need to add is shown in Figure 1.

 

<Root>
  <Server>
    <Fibers>false</Fibers>
  </Server>
</Root>

 

Figure 1. Server.xml changes to disable fibers in AMS
 
   3. Restart the server.
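The Server.xml change above can also be scripted. The following is a minimal Python sketch, not an Adobe-provided tool: the function name disable_fibers is hypothetical, and the sketch assumes that Server.xml is well-formed XML with <Root> as its document element and <Server> directly beneath it, as in Figure 1.

```python
import xml.etree.ElementTree as ET


def disable_fibers(server_xml_text: str) -> str:
    """Return Server.xml content with Root/Server/Fibers set to false.

    Hypothetical helper: assumes the Root/Server layout shown in Figure 1.
    """
    root = ET.fromstring(server_xml_text)
    if root.tag != "Root":
        raise ValueError("expected <Root> as the document element")
    server = root.find("Server")
    if server is None:
        raise ValueError("no <Server> element under <Root>")
    fibers = server.find("Fibers")
    if fibers is None:
        # Add the tag when it is not present yet.
        fibers = ET.SubElement(server, "Fibers")
    fibers.text = "false"
    return ET.tostring(root, encoding="unicode")
```

Note that this parser drops XML comments when it reads the file, so keep a backup of the original Server.xml before writing the result back. You still need to restart the server afterward.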
 
Crashes on Windows
 
To take crash dumps on Windows, use the ADPlus utility, which is included in Microsoft Debugging Tools for Windows; install that package first. Run ADPlus in crash (monitoring) mode and invoke it with the process name rather than the pid. When a crash occurs, ADPlus creates a crash dump in the output folder you specify. Run the following command to start monitoring:

adplus -crash -pn process_name -o output_path_crash_dump

 
Examples:

C:\Program Files\Debugging Tools for Windows>adplus -crash -pn AMSCore.exe -o c:\amsdumps
C:\Program Files\Debugging Tools for Windows>adplus -crash -pn AMSCore.exe -pn AMSEdge.exe -o c:\amsdumps
cscript adplus.vbs -crash -pn AMSCore.exe -o c:\amsdumps

 
Once started, ADPlus waits for AMSCore to crash or exit. If you shut down AMS while ADPlus is monitoring, it generates false dumps during the shutdown.
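If you start ADPlus from your own tooling, the command line can be assembled programmatically. The sketch below is illustrative only: the function name adplus_crash_command is hypothetical, and it uses only the switches that appear in this article (-crash, -pn, and -o), with one -pn switch per process name.

```python
from typing import List, Sequence


def adplus_crash_command(process_names: Sequence[str], output_dir: str) -> List[str]:
    """Build the argument list for ADPlus crash-monitoring mode.

    Hypothetical helper: mirrors the command-line examples in this article.
    """
    if not process_names:
        raise ValueError("at least one process name is required")
    command = ["adplus", "-crash"]
    for name in process_names:
        # One -pn switch per monitored process name.
        command += ["-pn", name]
    command += ["-o", output_dir]
    return command
```

For example, adplus_crash_command(["AMSCore.exe", "AMSEdge.exe"], r"c:\amsdumps") yields the arguments of the second example in this section. The list can be passed to subprocess.run, assuming ADPlus is on the PATH.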
 
Monitor the output dump folder for new crash dumps. Share the output DMP files and your AMS log files with us for further analysis. To do this, file a bug here, and we will contact you for your files. If you are using custom plugins with AMS, please provide their PDB files as well.
 
Crashes on Linux
 
When a process crashes, Linux can create a core dump automatically. Make sure that the server runs with permissions that allow it to write a core dump, and that the core file size limit is set to unlimited. To check the current limit, type the following command in the shell:

ulimit -c

 

It prints the current core file size limit. Set the limit to unlimited using the following command:

ulimit -c unlimited

 
Alternatively, you can edit the "server" script located in the root_install folder and add the following line to remove the core dump size limit:

ulimit -c unlimited

To be sure of your settings on a debug system, temporarily run AMS as root.
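The same limit can be inspected from a script. The sketch below uses Python's standard resource module (Unix only); the function names are hypothetical. It reports the limit of the process that runs the script, which is inherited from the shell that started it, so it only reflects the AMS processes if they are started from the same environment.

```python
import resource


def describe_core_limit(soft_limit: int) -> str:
    """Translate a soft RLIMIT_CORE value into a short status string."""
    if soft_limit == resource.RLIM_INFINITY:
        return "unlimited"
    if soft_limit == 0:
        return "disabled"
    # getrlimit reports the core file size limit in bytes.
    return f"limited to {soft_limit} bytes"


def current_core_limit() -> str:
    """Report the core file size limit of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return describe_core_limit(soft)
```

A result other than "unlimited" suggests that core dumps may be truncated or not written at all.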
 
To run AMS as root, open the ams.ini file in a text editor. The ams.ini file is located in the root_install/conf folder. In the ams.ini file, set the following values (shown in Figure 2), and restart the server.
 
 

SERVER.PROCESS_UID = 0
SERVER.PROCESS_GID = 0

 
Figure 2. Configuration changes to run AMS as root
 
 
The next time any of the AMS processes crashes, a core.#### file is created, where "####" is the pid of the process that crashed. You can check the AMS logs to see which process crashed. Share your core dump files and your AMS log files with us for further analysis. To do this, file a bug here, and we will contact you for your files.
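To match core files against the pids reported in the AMS logs, you can extract the pid from each file name. This is a small sketch that assumes the core.#### naming described above; the function names are hypothetical, and the directory you scan depends on your system configuration.

```python
import re
from typing import Iterable, List, Optional, Tuple

# "core." followed only by digits, matching the core.#### naming convention.
_CORE_NAME = re.compile(r"^core\.(\d+)$")


def core_file_pid(file_name: str) -> Optional[int]:
    """Return the pid encoded in a core file name, or None if it does not match."""
    match = _CORE_NAME.match(file_name)
    return int(match.group(1)) if match else None


def list_core_files(file_names: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (file name, pid) pairs for every name that looks like a core file."""
    found = []
    for name in file_names:
        pid = core_file_pid(name)
        if pid is not None:
            found.append((name, pid))
    return found
```

For example, list_core_files(os.listdir(some_directory)) returns the core files in that directory together with the pid of the process that produced each one.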
 
Hang issues on Windows
 
To take hang dumps on Windows, use the ADPlus utility, which is included in Microsoft Debugging Tools for Windows; install that package first. Use the same architecture of ADPlus as that of the process you're running. For example, if you are running a 32-bit process on a 64-bit machine, install 32-bit ADPlus to take the hang dump.
 
Once you observe that a hang has occurred, please identify the pid of the hung process. Run the following command to generate hang dumps:

adplus -hang -p pid_of_hung_process -o output_path_hang_dumps

Example:

C:\Program Files\Debugging Tools for Windows (x86)>adplus -hang -p 9828 -o c:\amshangs

Please take two or three hang dumps of the hung process consecutively with an interval of 60 seconds.
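Taking several dumps at a fixed interval is easy to automate. The following sketch is one possible way to do it, not an official procedure: the function name take_hang_dumps is hypothetical, it assumes ADPlus is on the PATH, and the run and sleep parameters exist only so that the loop can be exercised without actually invoking ADPlus.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def take_hang_dumps(
    pid: int,
    output_dir: str,
    count: int = 3,
    interval_seconds: float = 60.0,
    run: Callable[[Sequence[str]], object] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Run 'adplus -hang' count times, pausing interval_seconds between runs."""
    commands = []
    for attempt in range(count):
        command = ["adplus", "-hang", "-p", str(pid), "-o", output_dir]
        run(command)
        commands.append(command)
        if attempt < count - 1:
            # Wait between dumps so that consecutive snapshots can be compared.
            sleep(interval_seconds)
    return commands
```

Calling take_hang_dumps(9828, r"c:\amshangs") would request three dumps of process 9828, 60 seconds apart.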
 
If you don't know the pid of the hung process, you can use the process name instead. Note: If multiple instances of the process are running, this option generates a separate dump file for each instance, so it is not the preferred option.
 

adplus -hang -pn process_name -o output_path_hang_dumps

Use Windows Explorer to navigate to the output folder. Share the output DMP files and your AMS log files with us for further analysis. To do this, file a bug here, and we will contact you for your files. If you are using custom plugins with AMS, then please provide their PDB files as well.
 
Hang issues on Linux
 
To troubleshoot hang issues on Linux, you need stack traces rather than hang dumps. To take stack traces on Linux, use the pstack utility or a similar tool. Install pstack using the following command (if your distribution has no separate pstack package, the utility is typically provided by the gdb package):

yum install pstack

Once you identify that a hang has occurred, identify the pid of the hung process. Then run the following command to generate a stack trace:

pstack pid_of_hung_process > stack1.txt

 
The stack1.txt file should contain the call stack of all the threads running in the process.
 
Please take two or three stack traces of the hung process consecutively with an interval of 60 seconds. Share these stack traces and your AMS log files with us for further analysis. To do this, file a bug here, and we will contact you for your files.
 
In some cases, further analysis is required, for which you need to take a core dump. This is an invasive diagnostic technique and it will kill the process under observation. To generate a core dump from the running process, the easiest way is to execute the following command:
 

kill -6 pid_of_hung_process

Further steps are similar to those described in the section "Crashes on Linux."
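On Linux, signal 6 is SIGABRT, so the kill -6 command above can be reproduced from a script. This is a minimal sketch with a hypothetical function name; the kill parameter exists only so that the function can be tried without signaling a real process. Whether a core file is actually written still depends on the core dump limit described in "Crashes on Linux."

```python
import os
import signal
from typing import Callable


def request_core_dump(pid: int, kill: Callable[[int, int], None] = os.kill) -> None:
    """Send SIGABRT to pid, the equivalent of 'kill -6 pid'.

    This terminates the target process.
    """
    if pid <= 1:
        # 0 and negative values address process groups, and 1 is init.
        raise ValueError(f"refusing to signal pid {pid}")
    kill(pid, signal.SIGABRT)
```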
 
Auto-invoke diagnostic script on AMSCore hang
 
Hang scenarios are often hard to reproduce. By default, AMS recognizes a hung AMSCore process, shuts it down, and launches a new AMSCore process. Sometimes the hung core does not shut down immediately. You can configure AMS to kill the hung core immediately by using the <FastCoreShutdown> property. You can also configure AMS to invoke a custom diagnostic script that takes a hang dump of the hung core.
 
Configuring FastCoreShutdown
To enable FastCoreShutdown, open the Server.xml file in a text editor. The Server.xml file is located in the root_install/conf folder. Navigate to the <Master> tag and add the following configuration tags, and then restart the server. The section you need to add is highlighted in Figure 3:
<Master>
  <!-- Enable fast shutdown of core. -->
  <FastCoreShutdown>true</FastCoreShutdown>
  <!-- How often to gc idle cores. -->
  <CoreGC>300</CoreGC>
  <!-- An idle core being gc'd is given at least this much to exit on its own. -->
  <CoreExitDelay>20</CoreExitDelay>
</Master>
Figure 3. Configurations to enable fast core shutdown
 
Configuration to invoke a diagnostic script
If you want to invoke a diagnostic script, first enable FastCoreShutdown as explained previously. To provide the appropriate path and arguments for the diagnostic process, open the Server.xml file in a text editor. The Server.xml file is located in the root_install/conf folder. Navigate to the <Master> tag and add the following configuration tags, and then restart the server. The section you need to add is highlighted in Figure 4:
<Master>
  <!-- Enable fast shutdown of core. -->
  <FastCoreShutdown>true</FastCoreShutdown>
  <!--
    If a core hangs during shutdown and fast shutdown of core is enabled,
    you can run a diagnostic process to take statistics of the core that is shutting down.
    Path: The absolute path to the executable:
      /usr/bin/perl
      or c:\WINDOWS\system32\cmd.exe on Windows
    Args: Arguments to the process, such as the script to be executed:
      /tmp/diagnostic.sh -d
      or /c E:\ams\Utils\shell\diagnostic.bat arg1
    WaitTime: Time in seconds to wait for the diagnostic process to run before the core is shut down
  -->
  <DiagnosticProcess>
    <Path>c:\WINDOWS\system32\cmd.exe</Path>
    <Args>/c C:\ams\Utils\shell\diagnostic.bat arg1 </Args>
    <WaitTime>30</WaitTime>
  </DiagnosticProcess>
  <!-- How often to gc idle cores. -->
  <CoreGC>300</CoreGC>
  <!-- An idle core being gc'd is given at least this much to exit on its own. -->
  <CoreExitDelay>20</CoreExitDelay>
</Master>
Figure 4. Configurations to launch a diagnostic process when AMSCore hangs
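If you manage several servers, you may prefer to apply the Figure 4 settings with a script. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the tag names are the ones shown in Figure 4, and the code locates the first <Master> element wherever it appears in the document rather than assuming a particular parent.

```python
import xml.etree.ElementTree as ET


def _set_child_text(parent: ET.Element, tag: str, text: str) -> ET.Element:
    """Create the child element if it is missing, then set its text."""
    child = parent.find(tag)
    if child is None:
        child = ET.SubElement(parent, tag)
    child.text = text
    return child


def configure_diagnostic_process(
    server_xml_text: str, path: str, args: str, wait_seconds: int
) -> str:
    """Enable FastCoreShutdown and set DiagnosticProcess under <Master>."""
    root = ET.fromstring(server_xml_text)
    master = next(root.iter("Master"), None)
    if master is None:
        raise ValueError("no <Master> element found")
    _set_child_text(master, "FastCoreShutdown", "true")
    diagnostic = master.find("DiagnosticProcess")
    if diagnostic is None:
        diagnostic = ET.SubElement(master, "DiagnosticProcess")
    _set_child_text(diagnostic, "Path", path)
    _set_child_text(diagnostic, "Args", args)
    _set_child_text(diagnostic, "WaitTime", str(wait_seconds))
    return ET.tostring(root, encoding="unicode")
```

As with any scripted edit of Server.xml, keep a backup first, because this parser does not preserve XML comments, and restart the server after writing the file.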
 
Sample code for diagnostic script on Windows
The code for this sample diagnostic.bat script for Windows is below; you may edit it as needed:
set filePath=E:\ams\Utils\shell\test.txt
echo ======= >> %filePath%
echo "%0" >> %filePath%
echo "%1" >> %filePath%
echo "%2" >> %filePath%
time /T >> %filePath%
adplus.exe -hang -p %2 -o c:\temp
echo "==hang dump taken==" >> %filePath%
time /T >> %filePath%
exit
 
To validate that the batch file creates a dump, run it independently from a command prompt and pass it arg1 and the pid of any running process. It should create a dump.
 
Example:
 

C:\ams\Utils\shell>diagnostic.bat arg1 9912

 

Sample code for diagnostic script on Linux
 
Below is a simple Perl script for Linux. It checks that pstack is installed on your system and then takes three stack snapshots of the hung process, five seconds apart.
 
#!/usr/bin/perl
print "args @ARGV";
die "pstack not installed on system, please install and run again"
    unless ("" eq `pstack 2>&1|grep "command not found"`);
$pid = $ARGV[1];
print "taking stack traces for $pid";
$count = 0;
while($count < 3) {
    system( "pstack $pid > $pid-$count.txt" );
    sleep(5);
    $count++;
}
 
Reproducing hang scenarios for testing
You can reproduce an AMSCore hang scenario by playing a sample video-on-demand (VOD) file. Then set a breakpoint in AMSCore using WinDbg and request the same VOD file again. This time, AMSCore hangs, and the master process invokes the batch file to create a dump.
 
You can also write a native program and provide its path in the <Path> tag. It receives the pid as an argument when invoked and can be used to launch ADPlus to take a hang dump or to measure other system parameters, such as memory usage and CPU load.
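As an illustration of such a program, the following Python sketch records the resident memory of the hung process and the system load on Linux. It is not a native program, so <Path> would have to point to the Python interpreter and <Args> to the script. All function names are hypothetical, and the sketch assumes that the pid is passed as the last argument, as in the sample scripts above.

```python
import os
from typing import Dict, Optional, Sequence


def pid_from_arguments(argv: Sequence[str]) -> int:
    """Return the pid, assumed to be the last command-line argument."""
    if len(argv) < 2:
        raise ValueError("expected the pid as the last argument")
    return int(argv[-1])


def parse_proc_status(status_text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines as found in /proc/<pid>/status on Linux."""
    fields = {}
    for line in status_text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def memory_report(pid: int) -> Optional[str]:
    """Return the VmRSS value for pid, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as handle:
            fields = parse_proc_status(handle.read())
    except OSError:
        return None
    return fields.get("VmRSS")


def main(argv: Sequence[str]) -> str:
    """Build a one-line report for the pid given on the command line."""
    pid = pid_from_arguments(argv)
    load_1min = os.getloadavg()[0]  # 1-minute load average (Unix only)
    return f"pid={pid} VmRSS={memory_report(pid)} load1={load_1min:.2f}"
```

To use it as a script, add print(main(sys.argv)) at the bottom (with import sys) and redirect the output to a file of your choice.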
 

Where to go from here

 


 

You can get additional instructions, tools and documentation here:

 

Creative Commons License