Optimizing C code for ROS nodes on embedded systems

Data rates for RGBD images

My ROS node processes RGBD images: a color (RGB) image plus a second, depth image, i.e., RGB+D images. These images are generated by drivers that have significant computational cost and also demand high bandwidth. For example, a 640×480 RGB color image and its accompanying 640×480 depth image contain 640×480 = 307,200 pixels, where each pixel has three RGB byte values (3 bytes) and a floating-point depth (32-bit float = 4 bytes). Hence there are 7×640×480 = 2,150,400 raw data bytes per RGBD frame. These frames are typically generated at 30 frames per second, giving a data rate of 2,150,400×30 = 64,512,000 bytes/sec, or about 61.5 MiB/sec. That's a lot of data to process each second! Doing so requires careful consideration of the computational tasks in your node.
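The arithmetic above can be checked with a few lines of C++. This is a sketch: the function names are hypothetical, and it assumes 3 color bytes plus a 32-bit float depth per pixel, as described in the text.

```cpp
#include <cstdint>

// 3 bytes of RGB color + 4 bytes of 32-bit float depth per pixel (assumption from the text).
const std::uint64_t kBytesPerRgbdPixel = 3 + 4;

// Raw bytes in one RGB+D frame of the given resolution.
constexpr std::uint64_t rgbdFrameBytes(std::uint64_t width, std::uint64_t height) {
    return width * height * kBytesPerRgbdPixel;
}

// Raw data rate in bytes per second at the given frame rate.
constexpr std::uint64_t rgbdBytesPerSecond(std::uint64_t width, std::uint64_t height,
                                           std::uint64_t fps) {
    return rgbdFrameBytes(width, height) * fps;
}

// Convert bytes to MiB (1 MiB = 1024 * 1024 bytes).
constexpr double toMiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

For 640×480 at 30 fps this gives 2,150,400 bytes per frame and 64,512,000 bytes per second, which is exactly 61.5234375 MiB/s; the "MB" figure people often quote for this stream is really MiB.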

Allocate ahead of time

Allocating memory takes time, and generally the more you allocate, the longer it takes; inside a per-frame callback that cost is paid 30 times a second. So once I know the resolution of the images I will be processing, I allocate memory for a bunch of objects that I know will be needed. These include a collection of float, int, and uchar buffers provided by a BufferProvider class, which my node extends. Whenever I need a buffer, I get a pointer to one from the BufferProvider. My provider is very primitive, so I need to make sure my code never asks for a buffer that is already in use. The sequential nature of most C++ code makes this fairly easy in all the cases I've encountered so far.

BufferProvider.hpp
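#ifndef BUFFERPROVIDER_HPP
#define BUFFERPROVIDER_HPP

#include <cstddef>  // NULL

// OpenCV declares the same typedef; repeating an identical typedef is legal C++.
typedef unsigned char uchar;

// Owns numBuffers float, uchar, and int buffers of bufsize elements each.
// It does not track which buffers are in use; callers must avoid reusing a busy index.
class BufferProvider {
public:

    BufferProvider()
        : numBuffers(0), fBuffers(NULL), ucBuffers(NULL), iBuffers(NULL), bufsize(0) {
    }

    virtual ~BufferProvider() {
        deleteBuffers();  // release memory even if the owner never calls deleteBuffers()
    }

    int allocatedBuffers() const {
        return numBuffers;
    }

    int buffersize() const {
        return bufsize;
    }

    // Each getter returns NULL for an out-of-range index.
    float* getFloatBuffer(int index) {
        if (index >= 0 && index < numBuffers) {
            return fBuffers[index];
        }
        return NULL;
    }

    uchar* getUCharBuffer(int index) {
        if (index >= 0 && index < numBuffers) {
            return ucBuffers[index];
        }
        return NULL;
    }

    int* getIntBuffer(int index) {
        if (index >= 0 && index < numBuffers) {
            return iBuffers[index];
        }
        return NULL;
    }

    // Allocate num buffers of numElems elements for each element type.
    // Any previous allocation is released first, so calling this again
    // (e.g. after a resolution change) does not leak.
    void allocateBuffers(int num, int numElems) {
        deleteBuffers();
        numBuffers = num;
        bufsize = numElems;
        fBuffers = new float*[num];
        ucBuffers = new uchar*[num];
        iBuffers = new int*[num];
        for (int i = 0; i < num; ++i) {
            fBuffers[i] = new float[bufsize];
            ucBuffers[i] = new uchar[bufsize];
            iBuffers[i] = new int[bufsize];
        }
    }

    // De-allocate memory to prevent a memory leak; safe to call repeatedly.
    void deleteBuffers() {
        for (int i = 0; i < numBuffers; ++i) {
            delete [] fBuffers[i];
            delete [] ucBuffers[i];
            delete [] iBuffers[i];
        }
        delete [] fBuffers;   // delete [] on NULL is a no-op
        delete [] ucBuffers;
        delete [] iBuffers;
        fBuffers = NULL;
        ucBuffers = NULL;
        iBuffers = NULL;
        numBuffers = 0;
        bufsize = 0;
    }

private:
    // Copying would make two providers delete the same memory, so forbid it
    // (declared but not defined, C++03 style).
    BufferProvider(const BufferProvider&);
    BufferProvider& operator=(const BufferProvider&);

    int numBuffers;
    float** fBuffers;
    uchar** ucBuffers;
    int** iBuffers;
    int bufsize;
};

#endif // BUFFERPROVIDER_HPP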
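A usage sketch of the allocate-once idea follows. It uses std::vector rather than the BufferProvider class, and DepthFrameProcessor, process(), and allocations() are hypothetical names for illustration, not the author's node. Scratch storage is sized only when the image resolution changes, so steady-state frames perform no allocation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-frame processor: its scratch buffer is sized once per
// resolution and reused for every subsequent frame.
class DepthFrameProcessor {
public:
    DepthFrameProcessor() : width_(0), height_(0), allocations_(0) {}

    // Returns the number of pixels with a positive depth (NaN compares false,
    // so NaN pixels count as invalid). Writes a 0/1 validity mask as a side effect.
    std::size_t process(const float* depth, int width, int height) {
        ensureBuffers(width, height);
        const std::size_t n = static_cast<std::size_t>(width) * height;
        std::size_t valid = 0;
        for (std::size_t i = 0; i < n; ++i) {
            // Write into the preallocated mask; nothing is allocated here.
            mask_[i] = (depth[i] > 0.0f) ? 1 : 0;
            valid += mask_[i];
        }
        return valid;
    }

    // How many times the scratch buffer has been (re)allocated.
    int allocations() const { return allocations_; }

private:
    // Reallocate only when the resolution changes (normally once, on the first frame).
    void ensureBuffers(int width, int height) {
        if (width == width_ && height == height_) return;
        width_ = width;
        height_ = height;
        mask_.assign(static_cast<std::size_t>(width) * height, 0);
        ++allocations_;
    }

    int width_, height_;
    int allocations_;
    std::vector<unsigned char> mask_;
};
```

Keeping an allocation counter like this makes it easy to check in a unit test that the per-frame path really does not allocate after the first frame.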

Notes / Links

Howard Hinnant's stack allocator: control how std template classes (containers such as std::vector) allocate memory, which can make them much faster. http://howardhinnant.github.io/stack_alloc.html
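Hinnant's page builds a custom allocator that serves container allocations from a fixed buffer. If you can use C++17, the standard <memory_resource> header offers a similar facility. The sketch below uses std::pmr::monotonic_buffer_resource, not Hinnant's allocator, and the function name sumWithStackArena is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Sums 0..n-1 using a vector whose storage is carved out of a stack buffer
// instead of the heap (requires C++17).
long sumWithStackArena(int n) {
    // 64 KiB of stack storage. If it runs out, the resource falls back to its
    // upstream resource, which by default uses ordinary new/delete.
    std::array<std::byte, 64 * 1024> storage;
    std::pmr::monotonic_buffer_resource arena(storage.data(), storage.size());

    std::pmr::vector<int> values(&arena);
    values.reserve(n);  // a single allocation served from the arena
    for (int i = 0; i < n; ++i) {
        values.push_back(i);
    }

    long sum = 0;
    for (int v : values) {
        sum += v;
    }
    // values, then arena, are destroyed on return; the arena frees everything at once.
    return sum;
}
```

A monotonic resource never reuses freed memory until it is destroyed, which suits per-frame scratch containers that are built, used, and discarded together.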

When you need to sort values, consider your options

Some obvious options for sorting values are std::sort(), std::qsort(), and std::partial_sort() (the last orders only the first k elements of a range). Yet, depending on your data, a different sorting method may be a better fit. This site has code for a number of sorting algorithms: http://home.westman.wave.ca/~rhenry/sort/
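As one example of matching the method to the data: 8-bit values (such as the contents of a uchar buffer) can be sorted with a counting sort, which runs in O(n + 256) time and makes no comparisons. This is a minimal sketch of the textbook algorithm, not code from the linked site; countingSortU8 is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Counting sort for 8-bit values; sorts data in place in ascending order.
void countingSortU8(std::vector<unsigned char>& data) {
    std::array<std::size_t, 256> counts{};  // zero-initialized histogram
    for (unsigned char v : data) {
        ++counts[v];
    }
    // Rewrite the vector from the histogram, smallest value first.
    std::size_t out = 0;
    for (std::size_t value = 0; value < counts.size(); ++value) {
        for (std::size_t c = 0; c < counts[value]; ++c) {
            data[out++] = static_cast<unsigned char>(value);
        }
    }
}
```

The histogram is only 256 entries, so it fits comfortably in cache; the same idea works for 16-bit integer depth values with a 65,536-entry histogram, at a higher fixed cost.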

optimizing_c_code_for_ros_nodes_on_embedded_systems.txt · Last modified: 2015/10/14 09:13 by arwillis