Understanding CSTA - Part 1

What does it take to connect to a communications system using CSTA? What does it take to implement the service provider side of CSTA communications? What exactly is CSTA and what purpose does it serve? As with all questions, let's have a look at the Wikipedia article.

Computer Supported Telecommunications Applications (CSTA) is an abstraction layer for telecommunications applications. It is independent of underlying protocols. – Wikipedia

CSTA is used to connect applications to PBXs to be able to control the connected phones, control the routing of calls and monitor events happening inside the PBX. CSTA applications range from simple CTI (Computer Telephony Integration) to automatic call distribution (think call centers) solutions.

CSTA is available in three versions called phases; each phase added more functionality and extended the original CSTA specification. In theory, every client that does not rely on CSTA phase 2 or 3 functionality should be able to communicate with a server that is CSTA phase 1 compatible (the protocol version used is determined while establishing a connection). This backwards compatibility went so far that even when CSTA-XML was introduced, ECMA chose to keep the misspelling “TransferedEvent” with its missing “r”.

CSTA messages may be encoded in two different schemes, ASN.1 and XML. ASN.1 allows for more efficient implementations on limited hardware like PBXs from 1992. ASN.1 encoding results in tightly packed binary data on the wire and can be read into C-style structs for speedy parsing. The XML encoding of CSTA is much more verbose, but human-readable and much easier to understand. The overhead of processing XML-encoded CSTA messages can be negligible on modern systems.

As Wikipedia correctly states, CSTA is independent of underlying protocols. On many vintage PBXs, a serial port is used to connect to the CSTA interface. On this serial interface, raw ASN.1 encoded CSTA messages would be exchanged. Most modern systems feature some form of Ethernet connectivity and provide a CSTA interface over TCP. Because of this abstraction, CSTA defines some obscure terms for its components. The server (in TCP terms) is known as the Switching Function and the client is known as the Computing Function. The Switching Function has many responsibilities in CSTA: it has to keep track of the complete system's state and notify any interested Computing Functions of any events that happened, even if the Computing Function itself triggered the event.

Every CSTA message includes an Invoke ID. The Invoke ID helps the Computing Function keep track of its actions and the associated results. The Invoke ID may be any randomly chosen four digit integer between 0 and 9998. There must only be one action associated with an Invoke ID at any given time or the Computing Function could not reliably determine the result of the action. The Invoke ID 9999 is reserved for CSTA Events. The Switching Function will set the Invoke ID to 9999 whenever it is sending an event notification to a Computing Function.
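The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the rule "one pending action per Invoke ID, 9999 reserved for events"; the class name InvokeIdPool and its methods are my own invention and not part of any CSTA library.

```python
import random


class InvokeIdPool:
    """Hands out Invoke IDs and remembers which ones still await a result.

    Hypothetical helper for illustration; not taken from the CSTA standard.
    """

    EVENT_INVOKE_ID = 9999  # reserved for event notifications

    def __init__(self):
        self._pending = {}  # Invoke ID -> description of the outstanding request

    def allocate(self, description):
        """Pick a random unused ID in 0..9998 and mark it as pending."""
        if len(self._pending) >= self.EVENT_INVOKE_ID:
            raise RuntimeError("all Invoke IDs are in use")
        while True:
            candidate = random.randint(0, self.EVENT_INVOKE_ID - 1)
            if candidate not in self._pending:
                self._pending[candidate] = description
                return candidate

    def resolve(self, invoke_id):
        """Return the pending description and free the ID; None for events.

        Raises KeyError if the ID does not belong to a pending request.
        """
        if invoke_id == self.EVENT_INVOKE_ID:
            return None
        return self._pending.pop(invoke_id)
```

A Computing Function would call allocate() before sending a request and resolve() when a message with that Invoke ID comes back, which frees the ID for reuse.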

On the wire, CSTA-XML Messages are transmitted using the following structure (as specified in ECMA-323 Appendix J.2):

[Figure: CSTA Message Structure]

The Format Indicator is used to differentiate between TCP with SOAP and TCP without SOAP. 0x00 0x00 indicates TCP without SOAP, 0x00 0x01 indicates TCP with SOAP. The most commonly used variant seems to be TCP without SOAP, so I will not get into detail on the TCP with SOAP transport.

Following the Format Indicator is the length of the complete message, including the 8-byte header, most significant byte first. The length field is two bytes wide, so a CSTA message is limited to 65535 bytes.

The Invoke ID is represented as ASCII numerical characters. In this example it is “9999”, representing a CSTA event.
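To make the header layout concrete, here is a minimal Python sketch that builds and reads the 8-byte header for the "TCP without SOAP" variant. The function names are hypothetical, and I am assuming that the Invoke ID is zero-padded to exactly four ASCII digits.

```python
import struct

FORMAT_TCP_WITHOUT_SOAP = 0x0000  # the two bytes 0x00 0x00
HEADER_LENGTH = 8                 # 2 bytes format + 2 bytes length + 4 bytes Invoke ID
MAX_MESSAGE_LENGTH = 0xFFFF       # largest value a 2-byte length field can hold


def encode_frame(invoke_id, xml_body):
    """Prefix an XML body (bytes) with the 8-byte header."""
    if not 0 <= invoke_id <= 9999:
        raise ValueError("Invoke ID must fit in four decimal digits")
    total_length = HEADER_LENGTH + len(xml_body)
    if total_length > MAX_MESSAGE_LENGTH:
        raise ValueError("message does not fit into a single frame")
    # ">" selects big-endian, i.e. most significant byte first.
    # Zero-padding the Invoke ID to four digits is an assumption.
    header = struct.pack(
        ">HH4s", FORMAT_TCP_WITHOUT_SOAP, total_length, b"%04d" % invoke_id
    )
    return header + xml_body


def decode_header(frame):
    """Return (format_indicator, total_length, invoke_id) from the first 8 bytes."""
    format_indicator, total_length, invoke_digits = struct.unpack(
        ">HH4s", frame[:HEADER_LENGTH]
    )
    return format_indicator, total_length, int(invoke_digits.decode("ascii"))
```

Note that the length written into the header counts the header itself, so an empty body still yields a length of 8.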

Following the Invoke ID is the message body. The ECMA-323 document states that the message body is encoded as ASCII text, but the examples provided by ECMA include XML declarations that set the encoding to UTF-8. The easiest method of parsing the XML message is to remove the 8-byte message header and parse the remainder using an XML library. The standard allows the concatenation of messages inside a single TCP packet (to reduce overhead) and the fragmentation of messages (to transmit large messages over low MTU connections). A CSTA implementation therefore first has to split or reassemble the received data into single complete messages before parsing.
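Splitting and reassembling boils down to buffering whatever arrives on the socket and cutting off one message at a time using the length field. The following Python sketch shows one way to do that, assuming the TCP without SOAP header layout described above; the class name FrameReassembler is hypothetical.

```python
import struct
import xml.etree.ElementTree as ET

HEADER_LENGTH = 8


class FrameReassembler:
    """Buffers received bytes and yields complete, parsed CSTA-XML messages."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data):
        """Add received bytes; return a list of (invoke_id, root_element)."""
        self._buffer.extend(data)
        messages = []
        while len(self._buffer) >= HEADER_LENGTH:
            _format, total_length, invoke_digits = struct.unpack(
                ">HH4s", bytes(self._buffer[:HEADER_LENGTH])
            )
            if total_length < HEADER_LENGTH:
                raise ValueError("length field is smaller than the header")
            if len(self._buffer) < total_length:
                break  # fragmented message, wait for more data
            # Strip the header and hand the remainder to the XML parser.
            body = bytes(self._buffer[HEADER_LENGTH:total_length])
            del self._buffer[:total_length]
            invoke_id = int(invoke_digits.decode("ascii"))
            messages.append((invoke_id, ET.fromstring(body)))
        return messages
```

Every chunk returned by the socket's recv() goes into feed(); the result may be empty (message still incomplete) or contain several messages (concatenated in one packet).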

An example XML message (MakeCall request) looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<MakeCall xmlns="http://www.ecma-international.org/standards/ecma-323/csta/ed6">
	<callingDevice>100</callingDevice>
	<calledDirectoryNumber>+494121...</calledDirectoryNumber>
</MakeCall>

A Computing Function sending this request must provide an Invoke ID and the total length in order to send this message to a Switching Function. The Invoke ID can be chosen randomly.
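The body of such a request does not have to be assembled by hand. As a sketch, Python's xml.etree.ElementTree can produce the same structure as the MakeCall example above; the function name build_make_call is hypothetical, and the device and number are passed in by the caller.

```python
import xml.etree.ElementTree as ET

CSTA_NS = "http://www.ecma-international.org/standards/ecma-323/csta/ed6"


def build_make_call(calling_device, called_number):
    """Return the MakeCall request body as UTF-8 encoded bytes."""
    # Register the CSTA namespace as the default one so that the output
    # uses xmlns="..." instead of a generated prefix such as ns0.
    ET.register_namespace("", CSTA_NS)
    root = ET.Element(f"{{{CSTA_NS}}}MakeCall")
    ET.SubElement(root, f"{{{CSTA_NS}}}callingDevice").text = calling_device
    ET.SubElement(root, f"{{{CSTA_NS}}}calledDirectoryNumber").text = called_number
    body = ET.tostring(root, encoding="utf-8")
    # tostring() omits the XML declaration for utf-8, so it is added here.
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The returned bytes are what the length field has to account for, in addition to the 8-byte header.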

Prior to sending requests to a Switching Function, a Computing Function must authenticate with the Switching Function, and the supported protocol version has to be determined. For this purpose the StartApplicationSession message is sent. The connecting application must provide an applicationID, username, password and a list of supported protocol versions. The application ID is a friendly name that might be shown to the administrator of the Switching Function to identify the client.

The Switching Function responds with either a StartApplicationSessionPosResponse or a StartApplicationSessionNegResponse. The Switching Function might impose limits on the number of active Computing Functions or might not be able to comply with the requested protocol version. The complete process is specified in ETSI TS 102 344.

When an application session has been established successfully, the Computing Function has to reset the session timer periodically. The StartApplicationSessionPosResponse includes the actualSessionDuration. The Computing Function has to reset the timer before it elapses; Switching Functions are advised to allow for a grace period before the session is dropped.
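On the client side this comes down to remembering when the timer was last reset and acting well before actualSessionDuration runs out. Here is a minimal Python sketch of that bookkeeping. The class name SessionTimer and the safety factor of 0.5 are my own choices, and I am assuming the duration is given in seconds. Actually sending the reset message is left out.

```python
import time


class SessionTimer:
    """Tracks when the application session timer should be reset.

    Hypothetical helper; assumes actual_session_duration is in seconds.
    """

    def __init__(self, actual_session_duration, safety_factor=0.5,
                 clock=time.monotonic):
        self._duration = actual_session_duration
        self._safety_factor = safety_factor  # reset after this share of the duration
        self._clock = clock
        self._last_reset = clock()

    def reset_due(self):
        """True once the chosen share of the session duration has passed."""
        elapsed = self._clock() - self._last_reset
        return elapsed >= self._duration * self._safety_factor

    def mark_reset(self):
        """Call after the Switching Function has confirmed the reset."""
        self._last_reset = self._clock()
```

Resetting at half the session duration leaves room for a lost or delayed message without having to rely on the grace period of the Switching Function.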

This concludes part 1. Part 2 will cover monitor points and events as well as device discovery.