Google Protocol Buffers tools (C code generator).

Overview

About

Google Protocol Buffers tools in Python 3.6+.

  • C source code generator.
  • Rust source code generator ( 🚧 🚧 🚧 under construction 🚧 🚧 🚧 ).
  • proto3 language parser.

Known limitations:

  • Options, services (gRPC) and reserved fields are ignored.
  • Public imports are not implemented.

Project homepage: https://github.com/eerimoq/pbtools

Documentation: https://pbtools.readthedocs.io

Installation

pip install pbtools

C source code design

The C source code is designed with the following in mind:

  • Clean and easy to use API.
  • No malloc/free. Uses a workspace/arena for memory allocations.
  • Fast encoding and decoding.
  • Small memory footprint.
  • Thread safety.

Known limitations:

  • char must be 8 bits.

ToDo:

  • Make map easier to use. Only one allocation should be needed before encoding, not one per sub-message item.

Memory management

A workspace, or arena, is used to allocate memory when encoding and decoding messages. For simplicity, allocated memory can't be freed, which restricts how a message can be modified between encodings (if one wants to do that). Scalar value type fields (ints, strings, bytes, etc.) can be modified, but the length of repeated fields can't.
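
The workspace idea can be illustrated with a minimal bump allocator. This is only a sketch under stated assumptions, not the pbtools implementation: the names arena_t, arena_init and arena_alloc are hypothetical. Allocating only moves an offset forward and nothing is ever returned to the workspace, which is why a repeated field cannot grow once it has been allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator; not part of the pbtools API. */
struct arena_t {
    uint8_t *buf_p;
    size_t size;
    size_t used;
};

void arena_init(struct arena_t *self_p, void *buf_p, size_t size)
{
    self_p->buf_p = buf_p;
    self_p->size = size;
    self_p->used = 0;
}

/* Returns NULL when the workspace is exhausted. There is no free(). */
void *arena_alloc(struct arena_t *self_p, size_t size, size_t alignment)
{
    uintptr_t addr;
    size_t misalign;
    size_t padding;
    size_t offset;

    /* The alignment must be a power of two. */
    assert((alignment != 0) && ((alignment & (alignment - 1)) == 0));

    /* Pad so that the returned address is a multiple of alignment. */
    addr = (uintptr_t)&self_p->buf_p[self_p->used];
    misalign = (size_t)(addr & (alignment - 1));
    padding = (misalign == 0) ? 0 : (alignment - misalign);

    if (padding > (self_p->size - self_p->used)) {
        return (NULL);
    }

    offset = self_p->used + padding;

    if (size > (self_p->size - offset)) {
        return (NULL);
    }

    /* Bump the offset; earlier allocations are never reclaimed. */
    self_p->used = offset + size;

    return (&self_p->buf_p[offset]);
}
```

With a 32-byte buffer and an alignment of 1, two 8-byte allocations land at offsets 0 and 8, and a following 17-byte request returns NULL because only 16 bytes remain.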

Scalar Value Types

Protobuf scalar value types are mapped to C types as shown in the table below.

Protobuf Type   C Type
double          double
float           float
int32           int32_t
int64           int64_t
uint32          uint32_t
uint64          uint64_t
sint32          int32_t
sint64          int64_t
fixed32         int32_t
fixed64         int64_t
sfixed32        int32_t
sfixed64        int64_t
bool            bool
string          char *
bytes           struct { uint8_t *buf_p; size_t size; }
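
Several protobuf types share one C type in the table because they differ only in how they are written on the wire. As an illustration, int32 and sint32 both map to int32_t, but the protobuf encoding specifies ZigZag coding for sint32 so that small negative numbers stay short. The sketch below shows that mapping; the function names are hypothetical and this is not the pbtools source.

```c
#include <assert.h>
#include <stdint.h>

/* ZigZag maps signed to unsigned: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... */
uint32_t zigzag_encode32(int32_t value)
{
    return (((uint32_t)value << 1) ^ ((value < 0) ? 0xffffffffu : 0u));
}

/* Inverse of zigzag_encode32(). */
int32_t zigzag_decode32(uint32_t value)
{
    return ((int32_t)(value >> 1) ^ -(int32_t)(value & 1));
}

/* Checks the mapping for a few sample values. */
void zigzag_self_check(void)
{
    assert(zigzag_encode32(0) == 0);
    assert(zigzag_encode32(-1) == 1);
    assert(zigzag_encode32(1) == 2);
    assert(zigzag_encode32(-2) == 3);
    assert(zigzag_decode32(zigzag_encode32(INT32_MIN)) == INT32_MIN);
}
```

The application code sees a plain int32_t either way; only the bytes produced by the encoder differ.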

Message

A message is a struct in C.

For example, let's create a protocol specification.

syntax = "proto3";

package foo;

message Bar {
    bool v1 = 1;
}

message Fie {
    optional int32 v2 = 1;
    Bar v3 = 2;
}

One struct is generated per message.

struct foo_bar_t {
    bool v1;
};

struct foo_fie_t {
    struct {
        bool is_present;
        int32_t value;
    } v2;
    struct foo_bar_t *v3_p;
};

The sub-message v3 has to be allocated before encoding, and checked for NULL after decoding.

struct foo_fie_t *fie_p;

/* Encode. */
fie_p = foo_fie_new(...);
fie_p->v2.is_present = true;
fie_p->v2.value = 5;
foo_fie_v3_alloc(fie_p);
fie_p->v3_p->v1 = true;
foo_fie_encode(fie_p, ...);

/* Decode. */
fie_p = foo_fie_new(...);
foo_fie_decode(fie_p, ...);

if (fie_p->v2.is_present) {
    printf("%d\n", fie_p->v2.value);
}

if (fie_p->v3_p != NULL) {
    printf("%d\n", fie_p->v3_p->v1);
}

Oneof

A oneof is an enum (the choice) and a union in C.

For example, let's create a protocol specification.

syntax = "proto3";

package foo;

message Bar {
    oneof fie {
        int32 v1 = 1;
        bool v2 = 2;
    };
}

One enum and one struct are generated per oneof.

enum foo_bar_fie_choice_e {
    foo_bar_fie_choice_none_e = 0,
    foo_bar_fie_choice_v1_e = 1,
    foo_bar_fie_choice_v2_e = 2
};

struct foo_bar_fie_oneof_t {
    enum foo_bar_fie_choice_e choice;
    union {
        int32_t v1;
        bool v2;
    } value;
};

struct foo_bar_t {
    struct foo_bar_fie_oneof_t fie;
};

The generated code can encode and decode messages. Call <message>_<oneof>_<field>_init() (foo_bar_fie_v1_init() in the example below) to select which oneof field to encode. Use the choice member to check which oneof field was decoded (if any).

struct foo_bar_t *bar_p;

/* Encode with choice v1. */
bar_p = foo_bar_new(...);
foo_bar_fie_v1_init(bar_p);
bar_p->fie.value.v1 = -2;
foo_bar_encode(bar_p, ...);

/* Decode. */
bar_p = foo_bar_new(...);
foo_bar_decode(bar_p, ...);

switch (bar_p->fie.choice) {

case foo_bar_fie_choice_none_e:
    printf("Not present.\n");
    break;

case foo_bar_fie_choice_v1_e:
    printf("%d\n", bar_p->fie.value.v1);
    break;

case foo_bar_fie_choice_v2_e:
    printf("%d\n", bar_p->fie.value.v2);
    break;

default:
    printf("Can not happen.\n");
    break;
}

Benchmark

See benchmark for a benchmark of a few C/C++ protobuf libraries.

Rust source code design

🚧 🚧 🚧 🚧 🚧 Under construction - DO NOT USE 🚧 🚧 🚧 🚧 🚧

The Rust source code is designed with the following in mind:

  • Clean and easy to use API.
  • Fast encoding and decoding.

Scalar Value Types

Protobuf scalar value types are mapped to Rust types as shown in the table below.

Protobuf Type   Rust Type
double          f64
float           f32
int32           i32
int64           i64
uint32          u32
uint64          u64
sint32          i32
sint64          i64
fixed32         i32
fixed64         i64
sfixed32        i32
sfixed64        i64
bool            bool
string          String
bytes           Vec<u8>

Message

A message is a struct in Rust.

For example, let's create a protocol specification.

syntax = "proto3";

package foo;

message Bar {
    bool v1 = 1;
}

message Fie {
    optional int32 v2 = 1;
    Bar v3 = 2;
}

One struct is generated per message.

pub struct Bar {
    pub v1: bool
}

pub struct Fie {
    pub v2: Option<i32>,
    pub v3: Option<Box<Bar>>
}

// Encode.
let mut fie = Fie {
    v2: Some(5),
    v3: Some(Box::new(Bar {
        v1: true
    }))
};

let encoded = fie.encode();

// Decode.
fie = Default::default();
fie.decode(encoded);

if let Some(v2) = fie.v2 {
    println!("v2: {}", v2);
}

if let Some(v3) = fie.v3 {
    println!("v3.v1: {}", v3.v1);
}

Oneof

A oneof is an enum in Rust.

For example, let's create a protocol specification.

syntax = "proto3";

package foo;

message Bar {
    oneof fie {
        int32 v1 = 1;
        bool v2 = 2;
    };
}

One enum is generated per oneof.

mod bar {
    pub enum Fie {
        v1(i32),
        v2(bool)
    }
}

pub struct Bar {
    pub fie: Option<bar::Fie>
}

The generated code can encode and decode messages.

// Encode with choice v1.
let mut bar = Bar {
    fie: Some(bar::Fie::v1(-2))
};

let encoded = bar.encode();

// Decode.
bar = Default::default();
bar.decode(encoded);

if let Some(fie) = bar.fie {
    match fie {
        bar::Fie::v1(v1) => println!("v1: {}", v1),
        bar::Fie::v2(v2) => println!("v2: {}", v2)
    }
}

Example usage

C source code

In this example we use the simple proto-file hello_world.proto.

syntax = "proto3";

package hello_world;

message Foo {
    int32 bar = 1;
}

Generate C source code from the proto-file.

$ pbtools generate_c_source examples/hello_world/hello_world.proto

See hello_world.h and hello_world.c for the contents of the generated files.

We'll use the generated types and functions below.

struct hello_world_foo_t {
   struct pbtools_message_base_t base;
   int32_t bar;
};

struct hello_world_foo_t *hello_world_foo_new(
    void *workspace_p,
    size_t size);

int hello_world_foo_encode(
    struct hello_world_foo_t *self_p,
    void *encoded_p,
    size_t size);

int hello_world_foo_decode(
    struct hello_world_foo_t *self_p,
    const uint8_t *encoded_p,
    size_t size);

Encode and decode the Foo-message in main.c.

#include <stdio.h>
#include "hello_world.h"

int main(int argc, const char *argv[])
{
    int size;
    uint8_t workspace[64];
    uint8_t encoded[16];
    struct hello_world_foo_t *foo_p;

    /* Encode. */
    foo_p = hello_world_foo_new(&workspace[0], sizeof(workspace));

    if (foo_p == NULL) {
        return (1);
    }

    foo_p->bar = 78;
    size = hello_world_foo_encode(foo_p, &encoded[0], sizeof(encoded));

    if (size < 0) {
        return (2);
    }

    printf("Successfully encoded Foo into %d bytes.\n", size);

    /* Decode. */
    foo_p = hello_world_foo_new(&workspace[0], sizeof(workspace));

    if (foo_p == NULL) {
        return (3);
    }

    size = hello_world_foo_decode(foo_p, &encoded[0], size);

    if (size < 0) {
        return (4);
    }

    printf("Successfully decoded %d bytes into Foo.\n", size);
    printf("Foo.bar: %d\n", foo_p->bar);

    return (0);
}

Build and run the program.

$ gcc -I lib/include main.c hello_world.c lib/src/pbtools.c -o main
$ ./main
Successfully encoded Foo into 2 bytes.
Successfully decoded 2 bytes into Foo.
Foo.bar: 78
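
The two bytes reported above follow from the protobuf wire format: a field is written as a tag, (field_number << 3) | wire_type, followed by its value, and for an int32 field both are varints. The sketch below hand-encodes field 1 with the value 78 to show where the bytes come from. The helper names are hypothetical and are not part of pbtools, and unlike a real proto3 encoder the sketch does not skip fields that hold their default value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WIRE_TYPE_VARINT 0

/* Hypothetical helper: writes value as a base-128 varint and returns
   the number of bytes written. buf_p needs room for up to 10 bytes. */
size_t write_varint(uint8_t *buf_p, uint64_t value)
{
    size_t pos = 0;

    assert(buf_p != NULL);

    /* Seven payload bits per byte; the high bit means "more follows". */
    while (value >= 0x80) {
        buf_p[pos++] = (uint8_t)(value | 0x80);
        value >>= 7;
    }

    buf_p[pos++] = (uint8_t)value;

    return (pos);
}

/* Hypothetical helper: encodes an int32 field as tag plus value. */
size_t write_int32_field(uint8_t *buf_p, int field_number, int32_t value)
{
    size_t pos;

    pos = write_varint(buf_p,
                       ((uint64_t)field_number << 3) | WIRE_TYPE_VARINT);

    /* Negative int32 values are sign-extended to 64 bits. */
    pos += write_varint(&buf_p[pos], (uint64_t)(int64_t)value);

    return (pos);
}
```

Calling write_int32_field(buf, 1, 78) produces the bytes 0x08 (field 1, varint wire type) and 0x4e (78), which matches the 2-byte size printed by the example program.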

See examples/hello_world for all files used in this example.

Command line tool

The generate C source subcommand

Below is an example of how to generate C source code from a proto-file.

$ pbtools generate_c_source examples/address_book/address_book.proto

See address_book.h and address_book.c for the contents of the generated files.

Comments
  • Incorrect field names when generating c code in pbtools/c-source/__init__.py proposed changes below

    diff --git a/pbtools/c_source/__init__.py b/pbtools/c_source/__init__.py
    index a4403c9..e65d890 100644
    --- a/pbtools/c_source/__init__.py
    +++ b/pbtools/c_source/__init__.py
    @@ -330,7 +330,7 @@ void {oneof.full_name_snake_case}_encode(
     
     DECODE_MEMBER_FMT = '''\
             case {field.field_number}:
    -            self_p->{field.name_snake_case} = \
    +            self_p->{field.name} = \
     pbtools_decoder_read_{field.full_type_snake_case}(decoder_p, wire_type);
                 break;
     '''
    @@ -338,14 +338,14 @@ pbtools_decoder_read_{field.full_type_snake_case}(decoder_p, wire_type);
     DECODE_MEMBER_BYTES_FMT = '''\
             case {field.field_number}:
                 pbtools_decoder_read_bytes(\
    -decoder_p, wire_type, &self_p->{field.name_snake_case});
    +decoder_p, wire_type, &self_p->{field.name});
                 break;
     '''
     
     DECODE_MEMBER_STRING_FMT = '''\
             case {field.field_number}:
                 pbtools_decoder_read_string(\
    -decoder_p, wire_type, &self_p->{field.name_snake_case}_p);
    +decoder_p, wire_type, &self_p->{field.name}_p);
                 break;
     '''
     
    @@ -354,7 +354,7 @@ DECODE_REPEATED_MEMBER_FMT = '''\
                 pbtools_decoder_read_repeated_{field.full_type_snake_case}(
                     decoder_p,
                     wire_type,
    -                &self_p->{field.name_snake_case});
    +                &self_p->{field.name});
                 break;
     '''
     
    @@ -363,7 +363,7 @@ DECODE_REPEATED_ENUM_FMT = '''\
                 pbtools_decoder_read_repeated_int32(
                     decoder_p,
                     wire_type,
    -                &self_p->{field.name_snake_case});
    +                &self_p->{field.name});
                 break;
     '''
     
    @@ -372,7 +372,7 @@ DECODE_REPEATED_MESSAGE_MEMBER_FMT = '''\
                 {field.full_type_snake_case}_decode_repeated_inner(
                     decoder_p,
                     wire_type,
    -                &self_p->{field.name_snake_case});
    +                &self_p->{field.name});
                 break;
     '''
     
    @@ -381,14 +381,14 @@ DECODE_SUB_MESSAGE_MEMBER_FMT = '''\
                 pbtools_decoder_sub_message_decode(
                     decoder_p,
                     wire_type,
    -                &self_p->{field.name_snake_case}.base,
    +                &self_p->{field.name}.base,
                     (pbtools_message_decode_inner_t){field.full_type_snake_case}_decode_inner);
                 break;
     '''
     
     DECODE_ENUM_FMT = '''\
             case {field.field_number}:
    -            self_p->{field.name_snake_case} = pbtools_decoder_read_enum(\
    +            self_p->{field.name} = pbtools_decoder_read_enum(\
     decoder_p, wire_type);
                 break;
     '''
    
    opened by ggoggg 6
  • example make generate fails

    Hi,

    With a clean checkout of the master branch, a cd to examples/c/hello_world/ fails when I run make generate. PBTools installed using pip3 install pbtools. The error pasted below (with some manual filename <snip>'s), please let me know if there's something else I can try here? My (limited) python skillz didn't show an obvious problem...

    $ <snip>/pbtools/examples/c/hello_world# make generate
    rm -rf generated
    mkdir -p generated
    cd generated && \
        env PYTHONPATH=../../../.. \
            python3 -m pbtools generate_c_source ../hello_world.proto
    Traceback (most recent call last):
      File "/usr/lib/python3.5/runpy.py", line 174, in _run_module_as_main
        mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
      File "/usr/lib/python3.5/runpy.py", line 133, in _get_module_details
        return _get_module_details(pkg_main_name, error)
      File "/usr/lib/python3.5/runpy.py", line 109, in _get_module_details
        __import__(pkg_name)
      File "/<snip>/tests/pbtools/pbtools/__init__.py", line 4, in <module>
        from .parser import parse_file
      File "/<snip>/tests/pbtools/pbtools/parser.py", line 293
        self.name = f'({full_ident})'
                                    ^
    SyntaxError: invalid syntax
    Makefile:14: recipe for target 'generate' failed
    make: *** [generate] Error 1
    

    Cheers, -Harry

    opened by harryhaaren 3
  • Is pbtools thread safe?

    I was wondering if pbtools can be considered thread safe with its functions re-entrant.

    For example, let's say this is used in an embedded system with RTOS and two UART peripheral communicating independently. Each one is handled by its own rx/tx threads and those are encoding/decoding protobuf messages. Is there any reason why those functions might be non re-entrant? Should the encoding and decoding be protected by some kind of lock for mutual exclusion to avoid concurrency issues?

    I know it is possible to look at the code, but it is always better to have a comment by the author of the library.

    Thanks, Alex.

    opened by alexpacini 2
  • alignof applied to an expression is a GNU extension

    My gcc compiler gives me this warning:

    '_Alignof' applied to an expression is a GNU extension [-Wgnu-alignof-expression]
    

    The alignof is used for example in:

    void pbtools_decoder_read_string(struct pbtools_decoder_t *self_p,
                                     int wire_type,
                                     char **value_pp)
    {
        uint64_t size;
    
        size = decoder_read_length_delimited(self_p, wire_type);
        *value_pp = decoder_heap_alloc(self_p,
                                       size + 1,
                                       alignof(**value_pp) );
    
        if (*value_pp == NULL) {
            return;
        }
    
        decoder_read(self_p, (uint8_t *)*value_pp, size);
        (*value_pp)[size] = '\0';
    }
    

    If I change it to __alignof__ (https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_73/rzarg/alignof.htm) this warning disappears.

    opened by alexpacini 2
  • utf8 error

    When the proto file is UTF-8 encoded, pbtools fails with the error: 'gbk' codec can't decode byte 0xa4 in position 218: illegal multibyte sequence. Hope you can fix it.

    opened by JimsC 1
  • How do you use the Proto object generated by pbtools.parse_file()?

    I am trying to get the list of messages/enums after parsing a .proto file, but can't seem to figure it out.

    Here's what I'm trying:

    proto = pbtools.parse_file(filepath)
    print(proto.messages)
    

    Result: [<pbtools.parser.Message object at 0x0000025B7E714F10>]

    Expected result: ["message_name_1", "message_name_2", etc.]

    If you can help me understand how to get the names of the .proto messages and field names inside each message, I would be happy to open a PR to add documentation for this.

    opened by ksemelka 1
  • Non-conformance with protolint and protoc

    Something I've been having trouble with is not seeing the same syntax enforcement between protolint, protoc, and the pbtools C code generator.

    For example, the following .proto definition will cause an error enforced by protolint and protoc, but not by pbtools:

    Enum value defined multiple times in the same message

    syntax = "proto3";
    
    message foo
    {
      enum bar 
      {
        UNSPECIFIED = 0;
      }
      enum zoo
      {
        UNSPECIFIED = 0;
      }
    }
    

    Error:

    $ protoc --python_out=.  foo.proto
    foo.proto:11:5: "UNSPECIFIED" is already defined in "foo".
    foo.proto:11:5: Note that enum values use C++ scoping rules, meaning that enum values are siblings of their type, not children of it.  Therefore, "UNSPECIFIED" must be unique within "foo", not just within "zoo".
    
    opened by ksemelka 6
  • How does it compare with upb?

    Could you consider adding upb (https://github.com/protocolbuffers/upb) to your comparison?

    I think upb is made by Joshua Haberman, who seems to be working for Google. The repo has 1.1k stars, it seems quite popular.

    It is also based on arenas.

    opened by alexpacini 3
Releases(v1.0.1)
Owner
Erik Moqvist