Android: Converting RGBA to RGB and BGRA Image Formats

Common image formats include RGBA (32-bit), RGB (24-bit), and BGRA (32-bit). What is a good way to convert between them on 32-bit and 64-bit ARM platforms?

Tsinghua University AOSP code mirror

1. https://aosp.tuna.tsinghua.edu.cn

2. https://mirrors.tuna.tsinghua.edu.cn/help/AOSP/

3. Downloading the code:

 repo init -u https://aosp.tuna.tsinghua.edu.cn/platform/manifest -b android-7.1.1_r1 --repo-url=https://aosp.tuna.tsinghua.edu.cn/tools/repo.git

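NOTE: The Skia code is used as a reference here.

C implementation

The CPU is little-endian, so an RGBA pixel is laid out in memory from low to high address as RR GG BB AA. Read as a 32-bit integer (int32_t), the same pixel is 0xAABBGGRR.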
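The relationship between the byte order in memory and the 32-bit value can be checked with a few lines of C. This is only a minimal sketch: the helper names pack_rgba, is_little_endian, and word_to_bytes are made up for illustration and do not come from Skia or AOSP.

```c
#include <stdint.h>
#include <string.h>

/* Pack four channel bytes into one RGBA word whose value is 0xAABBGGRR. */
static uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)r;
}

/* Returns 1 when the lowest-addressed byte of a word is its least
 * significant byte, i.e. the CPU is little-endian. */
static int is_little_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Copy a pixel word into four bytes in memory order (low address first). */
static void word_to_bytes(uint32_t pixel, uint8_t out[4])
{
    memcpy(out, &pixel, sizeof pixel);
}
```

On a little-endian CPU, word_to_bytes(pack_rgba(r, g, b, a), out) leaves out[0] = r, out[1] = g, out[2] = b, out[3] = a, which is exactly the RR GG BB AA layout described above.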

1. RGBA to RGB

void rgba_to_rgb(unsigned char *src, unsigned char *dst, int numPixels)
{
    int col;

    for (col = 0; col < numPixels; col++, src += 4, dst += 3) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
    }
}

2. RGBA to BGRA, also known as an RB swap

void rgba_to_bgra(unsigned char *src, unsigned char *dst, int numPixels)
{
    int col;

    for (col = 0; col < numPixels; col++, src += 4, dst += 4) {
        dst[0] = src[2];
        dst[1] = src[1];
        dst[2] = src[0];
        dst[3] = src[3];
    }
}
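Because each pixel is a single 0xAABBGGRR word on a little-endian CPU, the same RB swap can also be written with masks and shifts on whole 32-bit words. The sketch below is one possible way to do that; swap_rb_word and rgba_to_bgra_words are hypothetical names, and it assumes the buffers are suitably aligned for uint32_t access.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap R and B inside one 0xAABBGGRR word, producing 0xAARRGGBB. */
static uint32_t swap_rb_word(uint32_t pixel)
{
    return (pixel & 0xFF00FF00u)           /* A and G stay where they are */
         | ((pixel & 0x000000FFu) << 16)   /* R moves from byte 0 to byte 2 */
         | ((pixel >> 16) & 0x000000FFu);  /* B moves from byte 2 to byte 0 */
}

/* Convert numPixels RGBA words to BGRA words. */
static void rgba_to_bgra_words(const uint32_t *src, uint32_t *dst, size_t numPixels)
{
    for (size_t i = 0; i < numPixels; i++)
        dst[i] = swap_rb_word(src[i]);
}
```

Since every output word depends only on the input word at the same index, this variant also works when src and dst point to the same buffer.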

32-bit ARM with NEON

On 32-bit ARM, NEON is optional, so whether the code below runs depends on whether your device supports it.

1. RGBA to BGRA

For details, see this gist: https://gist.github.com/micahpearlman/2512316
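One way to decide at run time whether the NEON path may be used is the cpufeatures helper library shipped with the Android NDK. The sketch below assumes that module is imported into the build (cpu-features.h is on the include path and the library is linked); neon_available is a hypothetical name, and on non-Android builds the function simply reports that NEON is unavailable.

```c
#if defined(__ANDROID__) && defined(__arm__)
#include <cpu-features.h>   /* from the NDK cpufeatures module */
#endif

/* Returns 1 when NEON instructions may be executed, 0 otherwise. */
static int neon_available(void)
{
#if defined(__aarch64__)
    return 1;   /* Advanced SIMD is always present on AArch64 */
#elif defined(__ANDROID__) && defined(__arm__)
    return android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM &&
           (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0;
#else
    return 0;   /* not an ARM Android build: fall back to the C version */
#endif
}
```

A caller could then pick neon_rgba_to_bgra when neon_available() returns 1 and the plain C rgba_to_bgra otherwise.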

// Really awesome code taken from: http://apangborn.com/2011/05/pixel-processing-using-arm-assembly/
inline static void neon_rgba_to_bgra(unsigned char *src, unsigned char *dst, int numPixels)
{
#ifdef __ARM_NEON__
    int simd_pixels = numPixels & ~7; // round down to nearest 8
    int simd_iterations = simd_pixels >> 3;
    int col;
    if(simd_iterations) { // make sure at least 1 iteration
        __asm__ __volatile__ ("1: \n\t"
                              // structured load of 8 pixels into d0-d3 (64-bit) NEON registers
                              "vld4.8 {d0, d1, d2, d3}, [%[source]]! \n\t" // the "!" increments the pointer by number of bytes read
                              "vswp d0, d2 \n\t" // swap registers d0 and d2 (swaps red and blue, 8 pixels at a time)
                              "vst4.8 {d0, d1, d2, d3}, [%[dest]]! \n\t" // structured store the 8 pixels back, the "!" increments the pointer by number of bytes written
                              "subs %[iterations],%[iterations],#1 \n\t"
                              "bne 1b" // jump to label "1", "b" suffix means the jump is back/behind the current statement
                              : [source]"+r"(src), [dest] "+r"(dst), [iterations]"+r"(simd_iterations) // output parameters, we list read-write, "+", value as outputs. Read-write so that the auto-increment actually affects the 'src' and 'dst'
                              :  // no input parameters, they're all read-write so we put them in the output parameters
                              : "memory", "cc", "d0", "d1", "d2", "d3" // clobbered registers; "cc" because subs updates the condition flags
                              );
    }
    // ...
#endif
}

The main instructions used are vld4.8, vswp, and vst4.8, which process 8 pixels per iteration.

2. RGBA to RGB

Based on the code above, the corresponding version is easy to write:

void neon_rgba_to_rgb(unsigned char *src, unsigned char *dst, int numPixels)
{
    int simd_pixels = numPixels & ~7; // round down to nearest 8
    int simd_iterations = simd_pixels >> 3;
    int col;
    if(simd_iterations) { // make sure at least 1 iteration
        __asm__ __volatile__ ("1: \n\t"
              // structured load of 8 pixels into d0-d3 (64-bit) NEON registers
              "vld4.8 {d0, d1, d2, d3}, [%[source]]! \n\t" // the "!" increments the pointer by number of bytes read
              "vst3.8 {d0, d1, d2}, [%[dest]]! \n\t" // structured store the 8 pixels back, the "!" increments the pointer by number of bytes written
              "subs %[iterations],%[iterations],#1 \n\t"
              "bne 1b" // jump to label "1", "b" suffix means the jump is back/behind the current statement
            : [source]"+r"(src), [dest] "+r"(dst), [iterations]"+r"(simd_iterations) // output parameters, we list read-write, "+", value as outputs.
                                                                                     // Read-write so that the auto-increment actually affects the 'src' and 'dst'
            :  // no input parameters, they're all read-write so we put them in the output parameters
            : "memory", "cc", "d0", "d1", "d2", "d3" // clobbered registers; "cc" because subs updates the condition flags
        );
    }

    // convert the leftover pixels
    // ...
}
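The NEON loop only covers a multiple of 8 pixels, so up to 7 pixels at the end of the row still need a scalar pass. The original leaves that part elided; the helper below is one possible sketch of it, not the author's code. rgba_to_rgb_tail is a hypothetical name, and it takes the original (un-advanced) src and dst pointers and skips past the part the SIMD loop already handled.

```c
#include <stddef.h>

/* Convert the pixels that an 8-pixel-wide SIMD loop did not cover.
 * src and dst are the start of the row, not the advanced pointers. */
static void rgba_to_rgb_tail(const unsigned char *src, unsigned char *dst, int numPixels)
{
    int done = numPixels & ~7;      /* pixels already handled, a multiple of 8 */
    int col;

    src += (size_t)done * 4;        /* 4 bytes per RGBA pixel */
    dst += (size_t)done * 3;        /* 3 bytes per RGB pixel */

    for (col = done; col < numPixels; col++, src += 4, dst += 3) {
        dst[0] = src[0];            /* R */
        dst[1] = src[1];            /* G */
        dst[2] = src[2];            /* B; alpha (src[3]) is dropped */
    }
}
```

Inside neon_rgba_to_rgb itself the inline assembly has already advanced src and dst, so there the two pointer adjustments would not be needed and only the loop would remain.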

3. Building the code

Add the following flags when compiling:

LOCAL_CFLAGS := \
    -march=armv7-a -mtune=cortex-a8 -mfpu=neon

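4. Conversion performance

memcpy is used as the baseline here; see the systrace:

On 64-bit ARM (AArch64)

On 64-bit ARM, NEON is standard.

I originally assumed the 32-bit ARM assembly could be compiled directly into a 64-bit program, but that is not the case: the instructions are different, so the Skia code is needed as a reference.

Version: the Skia from android-7.1.1_r1 is used here; the code is in external/skia/src/opts/SkSwizzler_opts.h:

NOTE:

1. The arm_neon.h header is required. In the Android NDK it is located at: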

toolchains/aarch64-linux-android-4.9/prebuilt/darwin-x86_64/lib/gcc/aarch64-linux-android/4.9.x/include/arm_neon.h

2. The vld4_u8() and vst4_u8() intrinsics are used here, but there is no intrinsic that corresponds to the vswp instruction.

References
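Without a vswp intrinsic, the swap can be expressed by exchanging two members of the structure that vld4_u8() returns before storing it with vst4_u8(). The following is a minimal sketch of that idea rather than a copy of the Skia source; rgba_to_bgra_intrinsics is a hypothetical name, and the NEON branch is only compiled when the compiler defines __ARM_NEON or __ARM_NEON__, so the same file also builds on other targets.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

static void rgba_to_bgra_intrinsics(const uint8_t *src, uint8_t *dst, int numPixels)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* 8 pixels per iteration: vld4_u8 de-interleaves 32 bytes into
     * val[0] = R, val[1] = G, val[2] = B, val[3] = A. */
    while (numPixels >= 8) {
        uint8x8x4_t px = vld4_u8(src);
        uint8x8_t r = px.val[0];
        px.val[0] = px.val[2];      /* B goes to the first channel */
        px.val[2] = r;              /* R goes to the third channel */
        vst4_u8(dst, px);           /* re-interleave and store as BGRA */
        src += 32;
        dst += 32;
        numPixels -= 8;
    }
#endif
    /* Scalar loop: leftover pixels, or every pixel when NEON is not enabled. */
    for (; numPixels > 0; numPixels--, src += 4, dst += 4) {
        uint8_t r = src[0];
        dst[0] = src[2];
        dst[1] = src[1];
        dst[2] = r;
        dst[3] = src[3];
    }
}
```

Exchanging the two structure members is a plain register rename for the compiler, so this form works for both 32-bit and 64-bit ARM builds without separate assembly.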

https://mirrors.tuna.tsinghua.edu.cn/help/AOSP/
https://gist.github.com/micahpearlman/2512316
