Astricon 10 just finished. It was nice to be back after missing it for the last 2 years. This time I shared my experiences with the Asterisk WebRTC implementation.
Find the presentation here: https://moythreads.com/congresos/astricon2013/
Also available on SlideShare: https://www.slideshare.net/MoisesSilva6/implementation-lessons-using-webrtc
One of the highlights of the presentation is that if you’re trying to use Asterisk for WebRTC over secure WebSockets (TLS), you may notice that the connection is not reliable (it may not work, it may hang, etc.). This is now a known problem and I’ve posted some patches/branches that address the issue. Follow the activity in the Asterisk bug tracker: https://issues.asterisk.org/jira/browse/ASTERISK-21930
Starting with Asterisk 12 you also need to install the pjproject stack to use WebRTC at all. Otherwise no errors are printed on calls, but you may simply end up without audio (due to the lack of ICE support when the pjproject libraries are not installed/compiled and linked to Asterisk).
I’ve updated the Asterisk wiki WebRTC instructions to add this very same warning:
https://wiki.asterisk.org/wiki/display/AST/Installing+pjproject
Find the presentations below:
– Elastix World 2011, Mexico D.F. Nov 3-4 – “Negociación de codecs en Asterisk”
– FSL, Vallarta, México. Nov 5 – “FreeSWITCH – Asterisk con esteroides”
– 4K Conference, Buenos Aires, Argentina, Nov 24 – “Manejo de medios en FreeSWITCH”
What is a core dump?
Sometimes problems with FreeSWITCH, Asterisk or just about any other program in Linux are hard to debug by just looking at the logs. Sometimes the process crashes and you don’t have a chance to use the CLI to look at stats. The logs may not reveal anything particularly interesting, or just some minimal information. Sometimes you need to dig deeper into the state of the program to poke around and see what is going on inside. Linux process core dumps are meant for that.
The most typical case for a core dump is when a process dies violently and unexpectedly. For example, if a programmer does something like this:
*(int *)0 = 0;
It is usually not that straightforward; it may be that the programmer did not expect a certain variable to contain a NULL (0) value. The process is killed by the Linux kernel because, by default, a Linux process does not have anything mapped at memory address 0. The access causes a page fault in the processor, which is trapped by the kernel. The kernel then sees that the process has nothing mapped at address 0 and sends the SIGSEGV Unix signal to the process. This is called a segmentation fault.
The default signal handler for SIGSEGV dumps the memory of the process and kills the process. This memory dump contains all the memory for this process (and all the threads belonging to that process) and that is what is used to determine what went wrong with the program that attempted to reference an invalid address (0 in this case, but any invalid address can cause it).
You can learn more about it here: http://en.wikipedia.org/wiki/Segmentation_fault
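The sequence described above can be observed from a parent process. What follows is a minimal sketch in C, assuming a Linux/POSIX system; the function name crash_child_signal is made up for illustration. A child process writes through a NULL pointer and the parent reports which signal killed it.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes to address 0 and return the number of the
 * signal that killed it, or -1 if it did not die from a signal.
 * (Hypothetical helper name, for illustration only.) */
int crash_child_signal(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* child: set the core limit to 0 so the demo leaves no core file behind */
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);

        /* the pointer itself is volatile so the compiler must really load it
         * and perform the store instead of reasoning about a known NULL */
        int *volatile p = NULL;
        *p = 0;       /* page fault, the kernel delivers SIGSEGV */
        _exit(0);     /* not reached */
    }

    /* parent: wait for the child and inspect how it ended */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling crash_child_signal() and comparing the result with SIGSEGV confirms that the child was terminated by a segmentation fault rather than by a normal exit.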
How can I make sure the core dump will be saved?
Each process has a limit on how big this core can be. A core that would exceed the limit is cut short, and a limit of 0 means no core is written at all. By default this limit is 0, which means no core will be dumped by default!
Before starting the process you must use “ulimit”. The “ulimit” command sets various limits for the current process. If you execute it from the bash shell, the limits are applied to your bash shell process. This also means any processes that you start from bash will inherit those limits (because they are child processes of your bash shell).
ulimit -a
That shows you all the limits for your bash shell. In order to guarantee that a core will be dumped you must set the “core file size” limit to “unlimited”.
ulimit -c unlimited
If you are starting the process from an init script or something like that, the init script has to do it. Some programs are smart enough to raise their limits themselves, but it is always better to make sure you have unlimited core file size for your bash shell. You may then want to add those ulimit instructions to your $HOME/.bashrc file.
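For programs that raise the limit themselves, the usual mechanism is getrlimit()/setrlimit() with RLIMIT_CORE. This is only a sketch of that idea, not code taken from FreeSWITCH or Asterisk; the function name raise_core_limit is an assumption for illustration.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit up to the hard limit.
 * Returns 0 on success, -1 on error.
 * (Hypothetical helper name, for illustration only.) */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        return -1;
    }
    /* an unprivileged process may raise its soft limit only up to the hard limit */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Note that this only goes as far as the hard limit allows. If the hard limit itself is not unlimited, getting the equivalent of "ulimit -c unlimited" requires raising it from a privileged process first.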
Where is the core dump saved?
Each process has a “working directory”. See http://en.wikipedia.org/wiki/Working_directory
That is where the process core dump will be saved by default. However, some system-wide settings affect where the core is dumped.
“/proc/sys/kernel/core_pattern” and “/proc/sys/kernel/core_uses_pid” are 2 files that control the base file name pattern for the core and whether the PID (process ID) is appended to the core name.
The recommended settings are:
mkdir -p /var/core
echo "/var/core/core" > /proc/sys/kernel/core_pattern
echo 1 > /proc/sys/kernel/core_uses_pid
You can confirm what you just did with:
cat /proc/sys/kernel/core_pattern
cat /proc/sys/kernel/core_uses_pid
These settings will cause any process in the system that crashes to dump its core at:
/var/core/core.
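How the two settings combine can be sketched in a few lines of C. According to the core(5) manual page, when core_uses_pid is nonzero and the pattern contains no %p specifier, the kernel appends ".PID" to the name. The helper below (expected_core_path is a made-up name) only models that one rule and does not expand any of the % specifiers the kernel supports.

```c
#include <stdio.h>
#include <string.h>

/* Build the file name a core is expected to get for a plain pattern.
 * Only models the core_uses_pid rule, no % specifier expansion.
 * Returns 0 on success, -1 if the result does not fit in 'out'.
 * (Hypothetical helper name, for illustration only.) */
int expected_core_path(char *out, size_t outlen, const char *pattern,
                       int uses_pid, long pid)
{
    int n;

    if (uses_pid && strstr(pattern, "%p") == NULL) {
        /* kernel appends ".PID" when the pattern has no %p */
        n = snprintf(out, outlen, "%s.%ld", pattern, pid);
    } else {
        n = snprintf(out, outlen, "%s", pattern);
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With the recommended settings, a crashing process with PID 1234 would be expected to leave its core at the pattern followed by ".1234".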
What if I just want a core dump without killing the process?
In some situations a process becomes unresponsive or its response times are not ideal. For example, you try to execute CLI commands in Asterisk or FreeSWITCH but there is no output, or worse, your command line prompt gets stuck. The process is still there, it may even be processing calls, but some things are taking a lot of time or just don’t get done. You can use “gdb” (the GNU debugger) to dump a core of the process without killing it and with almost no disruption of the service. I say almost because, for a large process, dumping a core may take a second or two. During that time the process is frozen by the kernel, so active calls may drop some audio (if you’re debugging a real time audio system like Asterisk or FreeSWITCH).
The trick to do it fast is to first create a file with the GDB commands required to dump the core.
Recent versions of CentOS include (with the gdb RPM package) the “gcore” command to do everything for you. To dump the core of the running process you only need to execute:
gcore $(pidof freeswitch)
If you are on a system that does not include gcore, you can do the following:
echo -ne "generate-core-file\ndetach\nquit" > gdb-instructions.txt
The 3 instructions added to the file are:
generate-core-file
detach
quit
This will do exactly what we want. Generate the core file for the attached process, then detach from the process (to let it continue) and then quit gdb.
You then use GDB (you may need to install it with “yum install gdb”) to attach to the running process, dump the core, and get out as fast as possible.
gdb /usr/local/freeswitch/bin/freeswitch $(pidof freeswitch) -x gdb-instructions.txt
The arguments to attach to the process include the original binary that was used to start it and the PID of the running process. The -x switch tells GDB to execute the commands in the file after attaching.
The core will be named core.<pid> by default and the path is not affected by the /proc/sys/kernel settings of the system.
This core can now be used by the developer to troubleshoot the problem.
Sometimes, though, the developer will be more interested in a full backtrace (stack trace), because the core dump itself can’t be easily examined on any box other than the one where it was generated. It might therefore be up to you to provide that stack trace, which you can do with:
gdb /usr/local/freeswitch/bin/freeswitch core.
(gdb) set logging file my_back_trace.txt
(gdb) set logging on
(gdb) thread apply all bt full
(gdb) quit
Then send the file my_back_trace.txt to the developer, or analyze it yourself. Sometimes it is easy to spot the problem even without development experience!
Presentation in PDF and PPT available here:
http://www.moythreads.com/congresos/astricon2010/asterisk-pri-passive-recording.pdf
http://www.moythreads.com/congresos/astricon2010/asterisk-pri-passive-recording.ppt
As you can see, you receive the line from the CPE and plug it into the PN 633, and then plug in the line from the NET too. On the other side of the adapter you can pull out 2 cables and connect them to your box with Sangoma boards specially configured in high impedance mode, which basically means they will be passive and will not affect the behavior of your PRI link; they will just passively monitor the E1/T1 link.
As you can tell from the image, there are also 2 cables involved for a single link. This means that you need an A102 card to monitor just 1 E1/T1 link, because one port is used for tx from the CPE and the other for tx from the NET. Until today it was not possible to use Asterisk for this, because Asterisk assumes each card port is meant to send and receive data for a circuit; Asterisk was meant to be an active component of the circuit, not a passive element.
Now Asterisk can do that. I just submitted 2 patches to Digium’s bug tracker. One for LibPRI, the library that takes care of the Q921 and Q931 signaling and the other for chan_dahdi.c, the Asterisk channel driver used to connect it to PSTN circuits.
LibPRI needed to be modified because it does a lot of state machine checking when receiving a Q921/Q931 frame. For example, if it receives a Q931 “PROCEEDING” message, it checks that a call was already in place and that a SETUP message was previously sent. When working as a passive element in the circuit no state checks should be done; we just need to decode the message and return a meaningful telephony event to Asterisk (like a RING event when the SETUP message is seen on the line).
https://issues.asterisk.org/view.php?id=15970
The most important change is this:
pri_event *pri_read_event(struct pri *pri)
{
    char buf[1024];
    int res;
    res = pri->read_func ? pri->read_func(pri, buf, sizeof(buf)) : 0;
    /* this check should be at some routine in q921.c */
    /* need at least the 4 bytes of Q921 header, then check buf[4] for the Q931 protocol discriminator (8) */
    if (res < 5 || ((int)buf[4] != 8)) {
        return NULL;
    }
    res = q931_read_event(pri, (q931_h*)(buf + 4), res - 4 - 2 /* remove 4 bytes of Q921 h and 2 of CRC */);
    if (res == -1) {
        return NULL;
    }
    if (res & Q931_RES_HAVEEVENT) {
        return &pri->ev;
    }
    return NULL;
}
/* here we trust receiving q931 only (no maintenance or anything else)*/
int q931_read_event(struct pri *ctrl, q931_h *h, int len)
{
    q931_mh *mh;
    q931_call *c;
    int cref;
    int missingmand;
    //q931_dump(ctrl, h, len, 0);
    mh = (q931_mh *)(h->contents + h->crlen);
    cref = q931_cr(h);
    c = q931_getcall(ctrl, cref & 0x7FFF);
    if (!c) {
        pri_error(ctrl, "Unable to locate call %d\n", cref);
        return -1;
    }
    if (prepare_to_handle_q931_message(ctrl, mh, c)) {
        return 0;
    }
    missingmand = 0;
    if (q931_process_ies(ctrl, h, len, mh, c, &missingmand)) {
        return -1;
    }
    return post_handle_q931_message(ctrl, mh, c, missingmand, 0);
}
The rest is pretty much just modifying post_handle_q931_message to skip state machine checking when invoked with the last parameter as 0, which is only done from q931_read_event, the passive Q931 processing routine I added.
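The frame check at the top of pri_read_event can be isolated into a small standalone function, which makes the byte layout easier to see. This is a sketch and not part of the submitted patch: the names q931_payload, Q921_HEADER_LEN and Q921_CRC_LEN are made up here, and unlike the patch it also requires at least one payload byte so the returned length can never be zero or negative.

```c
#include <stddef.h>

#define Q921_HEADER_LEN 4               /* Q921 header size, as in the patch comment */
#define Q921_CRC_LEN 2                  /* trailing CRC, as in the patch comment */
#define Q931_PROTOCOL_DISCRIMINATOR 0x08

/* Locate the Q931 message inside a raw frame read from the D-channel.
 * On success stores a pointer to the Q931 message in *payload and returns
 * its length; returns -1 for frames that are too short or not Q931.
 * (Hypothetical helper, for illustration only.) */
int q931_payload(const unsigned char *frame, int len, const unsigned char **payload)
{
    /* stricter than the patch: header + CRC + at least 1 byte of Q931 */
    if (len <= Q921_HEADER_LEN + Q921_CRC_LEN) {
        return -1;
    }
    /* first byte after the Q921 header must be the Q931 discriminator */
    if (frame[Q921_HEADER_LEN] != Q931_PROTOCOL_DISCRIMINATOR) {
        return -1;
    }
    *payload = frame + Q921_HEADER_LEN;
    return len - Q921_HEADER_LEN - Q921_CRC_LEN;
}
```

For an 11 byte frame whose fifth byte is 0x08, the function reports a 5 byte Q931 message starting at offset 4, which is the same slice that pri_read_event hands to q931_read_event.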
Asterisk needed to be modified because it needs to correlate that a RING event received, let’s say in span 1 and channel 1, is probably going to be related to a PROGRESS message in span 2 channel 1 (which is the other side of the connection replying to the SETUP Q931 message). Furthermore, once the PROGRESS message is received, Asterisk must launch a regular channel and then create a DAHDI conference (using DAHDI pseudo channels) to mix the audio transmitted by the CPE and NET and return this mixed audio as a single frame to any Asterisk application reading from the channel.
https://issues.asterisk.org/view.php?id=15971
An interesting code snippet from the changes to Asterisk is where I create the DAHDI conference to do the audio mixing; the mixed audio is later retrieved by the core via the channel driver interface dahdi_read().
This is done in the dahdi_new() routine, when creating the new Asterisk channel.
if (i->tappingpeer) {
    struct dahdi_confinfo dahdic = { 0, };
    /* create the mixing conference
     * some DAHDI_SETCONF interface rules to keep in mind
     * confno == -1 means create new conference with the given confmode
     * confno and confmode == 0 means remove the channel from its conference */
    dahdic.chan = 0; /* means use current channel (the one the fd belongs to)*/
    dahdic.confno = -1; /* -1 means create new conference */
    dahdic.confmode = DAHDI_CONF_CONFMON | DAHDI_CONF_LISTENER | DAHDI_CONF_PSEUDO_LISTENER;
    fd = dahdi_open("/dev/dahdi/pseudo");
    if (fd < 0 || ioctl(fd, DAHDI_SETCONF, &dahdic)) {
        ast_log(LOG_ERROR, "Unable to create dahdi conference for tapping\n");
        ast_hangup(tmp);
        i->owner = NULL;
        return NULL;
    }
    i->tappingconf = dahdic.confno;
    i->tappingfd = fd;
    /* add both parties to the conference */
    dahdic.chan = 0;
    dahdic.confno = i->tappingconf;
    dahdic.confmode = DAHDI_CONF_CONF | DAHDI_CONF_TALKER;
    if (ioctl(i->subs[SUB_REAL].dfd, DAHDI_SETCONF, &dahdic)) {
        ast_log(LOG_ERROR, "Unable to add chan to conference for tapping devices: %s\n", strerror(errno));
        ast_hangup(tmp);
        i->owner = NULL;
        return NULL;
    }
    dahdic.chan = 0;
    dahdic.confno = i->tappingconf;
    dahdic.confmode = DAHDI_CONF_CONF | DAHDI_CONF_TALKER;
    if (ioctl(i->tappingpeer->subs[SUB_REAL].dfd, DAHDI_SETCONF, &dahdic)) {
        ast_log(LOG_ERROR, "Unable to add peer chan to conference for tapping devices: %s\n", strerror(errno));
        ast_hangup(tmp);
        i->owner = NULL;
        return NULL;
    }
    ast_log(LOG_DEBUG, "Created tapping conference %d with fd %d between dahdi chans %d and %d for ast channel %s\n",
        i->tappingconf,
        i->tappingfd,
        i->channel,
        i->tappingpeer->channel,
        tmp->name);
    i->tappingpeer->owner = i->owner;
}
Later we send the Asterisk channel to the dial plan.
/* from now on, reading from the conference has the mix of both tapped channels, we can now launch the pbx thread */
if (ast_pbx_start(c) != AST_PBX_SUCCESS) {
    ast_log(LOG_ERROR, "Failed to launch PBX thread for passive channel %s\n", c->name);
    ast_hangup(c);
}
break;
And finally when the Asterisk core requests a frame from chan_dahdi we return the mixed audio.
/* if we have a tap peer we must read the mixed audio */
if (p->tappingpeer) {
    /* passive channel reading */
    /* first read from the 2 involved dahdi channels just to consume their frames */
    res = read(p->subs[idx].dfd, readbuf, p->subs[idx].linear ? READ_SIZE * 2 : READ_SIZE);
    CHECK_READ_RESULT(res);
    res = read(p->tappingpeer->subs[idx].dfd, readbuf, p->subs[idx].linear ? READ_SIZE * 2 : READ_SIZE);
    CHECK_READ_RESULT(res);
    /* now read the mixed audio that will be returned to the core */
    res = read(p->tappingfd, readbuf, p->subs[idx].linear ? READ_SIZE * 2 : READ_SIZE);
} else {
    /* no tapping peer, normal reading */
    res = read(p->subs[idx].dfd, readbuf, p->subs[idx].linear ? READ_SIZE * 2 : READ_SIZE);
}
Finally, you can use regular Asterisk dial plan rules to Record() the conversation, Dial() someone interested in auditing the call, etc. Of course, any audio transmitted to this passive channel will be dropped, so using applications like Playback() on these channels just doesn’t make sense. You can still do it, but all the audio will be silently dropped. Also remember this is a regular channel (that just happens to ignore any transmitted media or signaling), therefore you can still retrieve the ANI, DNID etc.
[from-pstn]
exten => s,1,Answer()
;exten => s,n,Dial(SIP/moy)
exten => s,n,Record(advanced-recording%d:wav)
exten => s,n,Hangup()
exten => _X.,1,Goto(s,1)
The configuration required is minimal. Sangoma board configuration is described here. It’s not much different than configuring a regular T1/E1 link though. On the Asterisk side, you just have to set the parameter “tappingpeerpos=next” or “tappingpeerpos=prev” in chan_dahdi.conf to specify which span is the tapping peer of the current span. If you set “tappingpeerpos=no”, or any other value for that matter, tapping will be disabled for that span (and it will then be a regular active span).
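The pairing rule can be pictured with a tiny C function. This is only my reading of the option, assuming “next” and “prev” refer to the adjacent span number; the function name tapping_peer_span is made up and this is not the code in the branch.

```c
#include <string.h>

/* Resolve the peer span for a tapping span from its tappingpeerpos value.
 * Returns 1 and stores the peer span in *peer when tapping is enabled,
 * returns 0 when the value disables tapping ("no" or anything else).
 * Assumption: "next"/"prev" mean the adjacent span number.
 * (Hypothetical helper, for illustration only.) */
int tapping_peer_span(int span, const char *tappingpeerpos, int *peer)
{
    if (strcmp(tappingpeerpos, "next") == 0) {
        *peer = span + 1;
        return 1;
    }
    if (strcmp(tappingpeerpos, "prev") == 0) {
        *peer = span - 1;
        return 1;
    }
    return 0; /* regular active span */
}
```

Under that assumption, span 1 configured with “next” and span 2 configured with “prev” would point at each other and form one tapped link.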
I have the code already in 3 branches: one branch for libpri and 2 branches for Asterisk, one based on trunk and the other on Asterisk 1.6.2. Keep in mind that the 1.6.2 one has a slightly different configuration at this point: the parameter to enable tapping is “passive=yes”. This does not let you specify whether the peer tapping device is the next or the previous one, so it assumes your tapping spans always start at an even number (0, 2, 4 etc). I will change that soon, hopefully …
http://svn.digium.com/svn/asterisk/team/moy/dahdi-tap-trunk
http://svn.digium.com/svn/asterisk/team/moy/dahdi-tap-1.6.2
http://svn.digium.com/svn/libpri/team/moy/tap-1.4
Now every time a new call is detected you will receive a call that you can send wherever you want.
Enjoy!
http://lists.digium.com/pipermail/asterisk-dev/2009-March/037262.html (English)
http://www.saghul.net/blog/2009/03/18/sobre-el-modelo-de-desarrollo-de-asterisk/ (Spanish)
For those not involved in the Asterisk users and/or developers community, it all comes down to users complaining about Asterisk reliability and the new (1.6) development model that will allow new features to be introduced in dot releases, which (they say) will make it worse. Telephony systems are damn critical. Users are used to seeing their computer crash (yeah, even Linux users, not that often but it happens). But telephony lines are very reliable circuits (basically because they’re very simple in nature and have been around a long time). Asterisk started a revolution and I don’t think anyone can deny that. It has brought a lot of flexibility to telephony systems, but that revolution of course comes with a cost.
I started playing with Asterisk at the beginning of 2004, and I’ve seen segfaults, deadlocks and all kinds of funny behaviours here and there. Sometimes a brand new release of Asterisk has basic functionality broken (like originating calls from the manager). That’s true as well: it can be said that Asterisk, out of the box, is not reliable (scalability is another beast I don’t want to talk about now), and naturally users don’t like that.
Having said that, I need to clarify what I mean by “reliable”. In this context, by reliable I mean: if you take a 1.4.N release, deploy applications on top of it and then blindly upgrade from 1.4.N to 1.4.(N+1) WITHOUT TESTING and put it in production expecting it to just work, well, good luck with that, and enjoy your new job flipping burgers at McDonalds. That is, Asterisk is not reliable for users who are not willing to put in some effort when upgrading. Asterisk is a complex beast (and built on top of a still somewhat shaky core). The Asterisk developers have been doing a great job improving the core; however, nasty hacks like masquerade are still there.
I don’t have problems with Asterisk being criticized, that’s what will make it better in the end. I do have a problem though with all those people who are nothing but leeches of the community. Here is the kind of user I have a problem with:
1. They typically just ask questions in the mailing list, never spend time helping other users.
2. They download Asterisk for free and expect it to work out-of-the-box for their particular (profit) purpose without even spending time doing their homework, testing their particular scenarios etc.
3. They bitch about Digium not fixing bugs in the bug-tracker, bugs that according to them are so damn critical to their business that they don’t even put a bounty on the voip-info wiki or in the asterisk-biz mailing list for someone to fix them.
4. They do not download beta releases or release candidates, they just wait for the “stable” release and again, expect things to magically work for them to profit.
In short, they don’t want to spend a single cent, nor spend some time out of their busy life; all they expect is for their life to be free of problems and to cash big bucks. One common argument from these users is that they are not developers and cannot help. That’s just bs: there are other ways to help, and in the end you can always spend some of your free-asterisk-based business revenue to place bounties for the development of test beds or whatever you feel is needed to improve Asterisk.
In the end, I agree Asterisk development has to improve. But Digium does not have unlimited resources, and it is already paying for a big development team creating Asterisk, which you can download for free. Being free is not an excuse for lacking quality, but just think: there are limited resources and Digium is allocating those resources where it makes sense for its business. Type “core show warranty” in your Asterisk CLI and tell me what your warranty is.
On the other hand, those leech-users should know that not only users, but also some developers are unhappy with the development model (but with different arguments than the leech-users have). That’s why CallWeaver was born, and not only that. The biggest example I like to use of one of the foundations of open source (if you don’t like it, then fix it) is the FreeSwitch project.
The FreeSwitch project was born out of the discontent (to put it nicely) of one of the top developers of Asterisk: Anthony Minessale II. He complains a lot about Asterisk, yeah, but he also did a lot more for Asterisk than anyone else I know of beyond Mark himself and just a couple of the top developers of Asterisk. So, yes, from my perspective in some way he has earned the right to talk shit about Asterisk, because he knows it, he has proposed solutions and he has actually brought a solution: FreeSwitch. FreeSwitch has brought some serious competition to Asterisk (sorry, but let’s be serious, CallWeaver and Yate have never come close to matching Asterisk functionality and usage, let alone GNU Bayonne).
FreeSwitch, from my perspective, has solved many of the core problems that Asterisk has. It also solves the licensing issues (it’s MPL) and it’s developer friendly (since there is no business driving force behind it yet and you can get an svn branch right away). FreeSwitch is probably not a short-term solution for those who are already hooked into Asterisk business, but in any case, at this point I don’t think there are short-term solutions for the problems Asterisk users complain about. It’s gonna cost ya baby, one way or another, and from my perspective that is the way it is supposed to be.
In the end, Olle Johansson did quite well being the first to reply to that post and summarize the situation, which is not simple, but one thing is for sure: the community has to stop bitching and start doing. Well, I really don’t care if you keep bitching as long as you fucking do something else to make the situation better.
But people have asked me this a couple of times lately and my answer is always “I don’t know”. However, ps can give you more information about it. In fact, this works for any application you have and want to debug to find out why it is going crazy.
First, check which thread (Asterisk is a multi threaded application) is going crazy.
# ps -LlFm -p `pidof asterisk`
That should show you the % of CPU being used by each Asterisk thread in the column named “C”. Write down the LWP column value for the thread you are interested in (LWP is a light weight process number, roughly speaking, the thread id). Now that you have the thread id, you need to know what that thread is doing.
# pstack `pidof asterisk` > /tmp/asterisk.stack.txt
That will cause the asterisk process to dump the stack state to the /tmp/asterisk.stack.txt file. If you don’t have the pstack command, google for it; I think in CentOS it is as easy as yum install pstack.
Then open the file and search for the LWP that you just wrote down. Hopefully you will find some hints that let you know how to avoid the problem, or at least a lot more information to post in bugs.digium.com.
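If you prefer to automate that search, the lookup is simple string matching. The sketch below assumes the saved file has already been read into memory and that each thread section starts with a header line of the form "Thread 10 (Thread 0x41d8f940 (LWP 3406)):"; the function name find_lwp_section is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Find the thread section for a given LWP in pstack-style output.
 * Returns a pointer to the start of the header line containing
 * "(LWP <lwp>)", or NULL if that LWP is not present.
 * (Hypothetical helper name, for illustration only.) */
const char *find_lwp_section(const char *text, long lwp)
{
    char needle[32];
    const char *hit;

    /* include the closing parenthesis so LWP 34 does not match LWP 3406 */
    snprintf(needle, sizeof(needle), "(LWP %ld)", lwp);
    hit = strstr(text, needle);
    if (hit == NULL) {
        return NULL;
    }
    /* walk back to the beginning of the header line */
    while (hit > text && hit[-1] != '\n') {
        hit--;
    }
    return hit;
}
```

Everything from the returned pointer up to the next "Thread" header is the stack of the thread you wrote down from the ps output.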
UPDATE:
One of the guys who asked this question later told me what he found:
Thread 10 (Thread 0x41d8f940 (LWP 3406)):
#0 0x00000033ce2ca436 in poll () from /lib64/libc.so.6
#1 0x00000000004933c0 in ast_io_wait ()
#2 0x00002aaabd9510cd in network_thread ()
#3 0x00000000004f8b2c in dummy_start ()
#4 0x00000033cee06367 in start_thread () from /lib64/libpthread.so.0
#5 0x00000033ce2d2f7d in clone () from /lib64/libc.so.6
A quick grep -rI “network_thread” in the Asterisk source code reveals that this function belongs to chan_iax.c. Disabling chan_iax.so in modules.conf is a good workaround for his problem; however, further debugging would be needed to determine why the monitor thread is looping like that.
More details in the following commit: http://lists.digium.com/pipermail/asterisk-commits/2009-March/031735.html
I want to thank all the people that supported the development of OpenR2 with code, testing and build infrastructure. Particularly thanks to:
Neocenter, a company located in México, Distrito Federal, that supported the development of OpenR2 from the very beginning, even when I myself was not even sure it could work. Thanks Octavio, Pop and Alejandro.
Sangoma Technologies for their sponsorship during all this time.
Digium Inc for creating Asterisk.
Alexandre Alencar for all his contributions to the project in different areas.
There are obviously more people who have contributed; you all know who you are.