Revisiting Cheap Action Cams

In the last blog post (from over two years ago!) I had a first look at the protocol that is used to control cheap action cameras via their corresponding smartphone apps. It turns out that this protocol is used by many similar cameras without any modifications and any app can be used for any camera.

Several people contacted me because they had problems with the Node.js example code I provided. There were several issues, such as delays and crashes, that I simply couldn’t fix without a camera available for testing.

Thankfully Wes Honeycutt from the Oklahoma Biological Survey contacted me and asked for help to integrate his action cam with OpenCV. He provided me with a CamPark ACT76 camera to help me with development. He plans on using the cheap action cameras to reduce the total cost of the hardware in his fascinating LunAero project, tracking the nocturnal migration of birds (see also: Moonwatching at The Bridge Lab).

Photo of CamPark ACT76 Camera

Reverse engineering additional commands

I installed the application that was meant to be used with the camera and started capturing traffic between my Android phone and the camera. The packet structure was the same as the one I had observed with the TecTecTec camera before:

Message structure of a single packet

I started reverse engineering more of the available commands and came up with the following list:

Command               Description
0x00 0x00 0x01 0x10   Login
0x00 0x00 0x01 0x11   Login accepted
0x00 0x00 0x11 0x12   Alive request
0x00 0x00 0x11 0x13   Alive response
0x00 0x00 0x01 0x14   Discovery request
0x00 0x00 0x01 0x15   Discovery response
0x00 0x00 0x01 0xFF   Start preview stream
0x00 0x00 0xA0 0x25   Request list of files
0x00 0x00 0xA0 0x26   List of files content
0x00 0x00 0xA0 0x34   Request firmware information
0x00 0x00 0xA0 0x35   Firmware information
0x00 0x00 0xA0 0x38   Capture a still image
0x00 0x00 0xA0 0x39   Still image captured
0x00 0x00 0xA0 0x3A   Start/Stop video recording (control with first byte of 4 byte payload)
0x00 0x00 0xA0 0x3B   Start/Stop video recording accepted

The basic communication scheme has not changed since I last looked at the protocol. Some commands require a payload. For example, the start/stop video recording command requires 4 bytes of payload, and the first byte decides whether the camera should start or stop recording. The command to retrieve the list of files also requires 4 bytes of payload, which I simply copied from the capture because I could not make sense of it yet. The app also uses a set of requests to read the camera’s current settings and to change them through its interface. I never needed to change any settings during testing, so I did not implement these requests, but they would be trivial to add given the code structure of the new client utility. All settings can be changed through the camera’s own UI as well, which makes this feature not too useful anyway.
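The command list lends itself to a small lookup type. The following is only a sketch and not the code from the library: the type name Command, the constant names and the helper methods are made up for illustration, and it assumes that the four bytes go onto the wire in the order in which they are listed in the table.

```go
package main

import "fmt"

// Command is a four-byte command code from the table, stored as one integer.
// Assumption: the bytes are transmitted in the order listed in the table.
type Command uint32

// Hypothetical constant names; the values are taken from the table.
const (
	CmdLogin              Command = 0x00000110
	CmdLoginAccepted      Command = 0x00000111
	CmdAliveRequest       Command = 0x00001112
	CmdAliveResponse      Command = 0x00001113
	CmdDiscoveryRequest   Command = 0x00000114
	CmdDiscoveryResponse  Command = 0x00000115
	CmdStartPreview       Command = 0x000001FF
	CmdRequestFileList    Command = 0x0000A025
	CmdFileList           Command = 0x0000A026
	CmdRequestFirmware    Command = 0x0000A034
	CmdFirmware           Command = 0x0000A035
	CmdCaptureStill       Command = 0x0000A038
	CmdStillCaptured      Command = 0x0000A039
	CmdRecordVideo        Command = 0x0000A03A
	CmdRecordVideoAccept  Command = 0x0000A03B
)

var commandNames = map[Command]string{
	CmdLogin:             "Login",
	CmdLoginAccepted:     "Login accepted",
	CmdAliveRequest:      "Alive request",
	CmdAliveResponse:     "Alive response",
	CmdDiscoveryRequest:  "Discovery request",
	CmdDiscoveryResponse: "Discovery response",
	CmdStartPreview:      "Start preview stream",
	CmdRequestFileList:   "Request list of files",
	CmdFileList:          "List of files content",
	CmdRequestFirmware:   "Request firmware information",
	CmdFirmware:          "Firmware information",
	CmdCaptureStill:      "Capture a still image",
	CmdStillCaptured:     "Still image captured",
	CmdRecordVideo:       "Start/Stop video recording",
	CmdRecordVideoAccept: "Start/Stop video recording accepted",
}

// String returns the description from the table, or a hex dump for
// commands that have not been reverse engineered yet.
func (c Command) String() string {
	if name, ok := commandNames[c]; ok {
		return name
	}
	return fmt.Sprintf("unknown command 0x%08X", uint32(c))
}

// Bytes returns the four command bytes in the order used in the table.
func (c Command) Bytes() [4]byte {
	return [4]byte{byte(c >> 24), byte(c >> 16), byte(c >> 8), byte(c)}
}
```

A lookup like this makes captured traffic easier to read while tinkering, because unknown codes show up as hex values instead of being dropped silently.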

Development of a new client library and utility

To keep the requirements low and increase the performance of the new client utility, I switched to my new favorite programming language: Go. I’ve written easy-to-use library code that abstracts away almost all aspects of the communication protocol but also allows raw messages to be sent for hacking and tinkering.

Preview streaming is now smoother and much more reliable. It also uses little memory and CPU, so it could run on low-end devices like the Raspberry Pi.

Camera discovery

To make the utility easier to use, I also reverse engineered the method that most of the apps use to discover a camera in their network: UDP broadcasts. The app sends a Discovery request message to the network’s broadcast address. Most apps restrict this to the current subnet, but using 255.255.255.255 as the target address works too. The target port is either 22600 or 21600. I assume that it changed between firmware versions, and the app simply sends requests to both ports. Upon receiving a Discovery request message, the camera replies with a Discovery response message containing its serial number, sent to the source address and port of the request. My code simply waits for any response, accepts the first one it receives, extracts the source address and returns.

buffer := make([]byte, 80)
_, remoteAddr, err := conn.ReadFrom(buffer)

if err != nil {
    return nil, err
}

udpAddr := remoteAddr.(*net.UDPAddr)
return udpAddr.IP, nil

The 80-byte buffer is used to hold the message including its payload. Parsing the message or the payload isn’t necessary to discover the camera.

RTP improvements

I still haven’t found an easy-to-use library that allows sending raw RTP data to the network without any setup or complicated stream description. I had to build my own naive RTP implementation again, which wasn’t that hard but required some tweaking for maximum performance. It took several passes of profiling and changing the code to reduce memory usage and leaks, and to get the order of operations right so that no time is wasted on unnecessary parsing or preparation of packets. The resulting code is reusable and implements a simple RTP relay that listens for a camera preview stream and relays it to the specified target. Every time an RTP packet has been fully assembled and sent, which happens after a Time packet has been received from the camera, the header of the next packet is prepared right away. The work during the reception of a packet containing H.264 data has been reduced to just buffering the data.

case 0x0001: // H.264 Data
    frameBuffer.Write(payload)
case 0x0002: // Time
    // Append the Framebuffer
    packetBuffer.Write(frameBuffer.Bytes())

    // Send out the packet
    rtpConn.Write(packetBuffer.Bytes())

    // Prepare the next packet
    packetBuffer.Reset()
    packetBuffer.Write([]byte{0x80, 0x63})
    binary.Write(&packetBuffer, binary.BigEndian, sequenceNumber+1)  // RTP Sequence No
    binary.Write(&packetBuffer, binary.BigEndian, (uint32)(elapsed)) // RTP Timestamp
    binary.Write(&packetBuffer, binary.BigEndian, (uint64(0)))       // SSRC

    // Reset the Framebuffer
    frameBuffer.Reset()
    sequenceNumber++

    elapsed = binary.LittleEndian.Uint32(payload[12:])
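The two constant bytes 0x80 and 0x63 at the start of each packet are the first two bytes of the fixed RTP header defined in RFC 3550. As a sketch for readers who want to check what they mean, the following helper unpacks them. The type and function names are hypothetical and not part of the library.

```go
package main

// rtpPrefix holds the fields packed into the first two bytes of the
// fixed RTP header as defined in RFC 3550.
type rtpPrefix struct {
	Version     uint8 // always 2 for current RTP
	Padding     bool
	Extension   bool
	CSRCCount   uint8
	Marker      bool
	PayloadType uint8
}

// decodeRTPPrefix unpacks the bit fields of the first two header bytes.
func decodeRTPPrefix(b0, b1 byte) rtpPrefix {
	return rtpPrefix{
		Version:     b0 >> 6,
		Padding:     b0&0x20 != 0,
		Extension:   b0&0x10 != 0,
		CSRCCount:   b0 & 0x0F,
		Marker:      b1&0x80 != 0,
		PayloadType: b1 & 0x7F,
	}
}
```

For 0x80 and 0x63 this yields version 2, no padding, no extension, no CSRC entries, a cleared marker bit and payload type 99, which lies in the dynamic payload type range that is commonly used for H.264.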

I discovered that most RTP clients don’t really care about the elapsed timer. The camera already seems to send multiples of 90, which strongly suggests that it respects the 90 kHz clock rate specified for H.264 in RFC 6184 (RTP Payload Format for H.264 Video).

RTSP server implementation

Not all clients make it easy to open an SDP file to listen for incoming RTP data. To improve on this, I added a very rudimentary RTSP server to the utility that listens for incoming connections and allows one client to set up a video stream. I tested the server with ffmpeg, VLC and mplayer and could reliably start a preview with all of them by just pointing them to the RTSP URL. For viewing the preview video, mplayer has proven to be the best choice. When given the -nocache option, latency between the camera and the preview video is reduced to a minimum. Both VLC and ffmpeg are not as easy to set up for low latency. While reading the RTSP specification I stumbled across the RECORD request, which I implemented to start recording on the camera. As of today, I haven’t found a single client that could trigger this.

Go utility and library

I’ve implemented the camera control library and utility as two mostly separate pieces to allow usage of the library in other projects. The libipcamera code can be integrated into any Go application and provides functions for all of the useful things the client apps can do. The utility provides a command line interface to all of the library’s features and allows you to send custom commands and view raw responses while tinkering with your camera to discover new features.

All code is licensed under the Apache 2.0 License and can be accessed on GitHub. Feel free to file an issue or make your own additions through pull requests.