HTTP request/response parser for C

Overview

HTTP Parser

http-parser is not actively maintained. New projects and projects looking to migrate should consider llhttp.

This is a parser for HTTP messages written in C. It parses both requests and responses. The parser is designed to be used in high-performance HTTP applications. It makes no syscalls or allocations, does not buffer data, and can be interrupted at any time. Depending on your architecture, it requires only about 40 bytes of data per message stream (in a web server, that is per connection).

Features:

  • No dependencies
  • Handles persistent streams (keep-alive).
  • Decodes chunked encoding.
  • Upgrade support
  • Defends against buffer overflow attacks.

The parser extracts the following information from HTTP messages:

  • Header fields and values
  • Content-Length
  • Request method
  • Response status code
  • Transfer-Encoding
  • HTTP version
  • Request URL
  • Message body

Usage

One http_parser object is used per TCP connection. Initialize the struct using http_parser_init() and set the callbacks. That might look something like this for a request parser:

http_parser_settings settings;
http_parser_settings_init(&settings); /* clear all callbacks before setting any */
settings.on_url = my_url_callback;
settings.on_header_field = my_header_field_callback;
/* ... */

http_parser *parser = malloc(sizeof(http_parser));
http_parser_init(parser, HTTP_REQUEST);
parser->data = my_socket;

When data is received on the socket execute the parser and check for errors.

size_t len = 80*1024, nparsed;
char buf[len];
ssize_t recved;

recved = recv(fd, buf, len, 0);

if (recved < 0) {
  /* Handle error. */
}

/* Start up / continue the parser.
 * Note we pass recved==0 to signal that EOF has been received.
 */
nparsed = http_parser_execute(parser, &settings, buf, recved);

if (parser->upgrade) {
  /* handle new protocol */
} else if (nparsed != (size_t)recved) {
  /* Handle error. Usually just close the connection. */
}

http_parser needs to know where the end of the stream is. For example, sometimes servers send responses without Content-Length and expect the client to consume input (for the body) until EOF. To tell http_parser about EOF, give 0 as the fourth parameter to http_parser_execute(). Callbacks and errors can still be encountered during an EOF, so one must still be prepared to receive them.

Scalar-valued message information such as status_code, method, and the HTTP version is stored in the parser structure. This data is only temporarily stored in http_parser and is reset on each new message. If this information is needed later, copy it out of the structure during the on_headers_complete callback.

The parser decodes the transfer-encoding for both requests and responses transparently. That is, a chunked encoding is decoded before being sent to the on_body callback.
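To make "decoded before being sent to on_body" concrete, here is a minimal standalone sketch of what decoding a chunked body means. It is not http-parser's code: the library decodes incrementally as data arrives, whereas this hypothetical helper, decode_chunked(), assumes the whole body is already in one buffer and ignores chunk extensions and trailers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode a complete chunked body in `in` (in_len bytes) into `out`.
 * Returns the number of decoded bytes, or -1 if the input is malformed
 * or `out` (capacity out_cap) is too small. Illustrative helper only. */
long decode_chunked(const char *in, size_t in_len, char *out, size_t out_cap) {
  size_t pos = 0, written = 0;
  assert(in != NULL && out != NULL);

  for (;;) {
    /* Each chunk starts with its size in hexadecimal. */
    size_t size = 0;
    int digits = 0;
    while (pos < in_len) {
      char c = in[pos];
      int v;
      if (c >= '0' && c <= '9') v = c - '0';
      else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
      else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
      else break;
      if (++digits > 8) return -1; /* refuse absurdly large sizes */
      size = size * 16 + (size_t)v;
      pos++;
    }
    if (digits == 0) return -1;

    /* The size line ends with CRLF. */
    if (in_len - pos < 2 || in[pos] != '\r' || in[pos + 1] != '\n') return -1;
    pos += 2;

    /* A zero-sized chunk marks the end of the body. */
    if (size == 0) return (long)written;

    /* The chunk data must be present and followed by CRLF. */
    if (in_len - pos < 2 || in_len - pos - 2 < size) return -1;
    if (out_cap - written < size) return -1;
    memcpy(out + written, in + pos, size);
    written += size;
    pos += size;
    if (in[pos] != '\r' || in[pos + 1] != '\n') return -1;
    pos += 2;
  }
}
```

With http-parser itself none of this is needed: the on_body callback receives only the chunk payloads, with the size lines and CRLF framing already removed.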

The Special Problem of Upgrade

http_parser supports upgrading the connection to a different protocol. An increasingly common example of this is the WebSocket protocol which sends a request like

    GET /demo HTTP/1.1
    Upgrade: WebSocket
    Connection: Upgrade
    Host: example.com
    Origin: http://example.com
    WebSocket-Protocol: sample

followed by non-HTTP data.

(See RFC6455 for more information on the WebSocket protocol.)

To support this, the parser will treat this as a normal HTTP message without a body, issuing both on_headers_complete and on_message_complete callbacks. However, http_parser_execute() will stop parsing at the end of the headers and return.

The user is expected to check whether parser->upgrade has been set to 1 after http_parser_execute() returns. The non-HTTP data begins in the supplied buffer at the offset given by the return value of http_parser_execute().
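The offset arithmetic can be sketched as a small helper. The names parse_outcome, leftover and classify_execute() are hypothetical and not part of http-parser; the helper only takes the values the text mentions (parser->upgrade, the return value of http_parser_execute(), and the number of bytes received) so that it stands alone.

```c
#include <assert.h>
#include <stddef.h>

enum parse_outcome { PARSE_OK, PARSE_UPGRADE, PARSE_ERROR };

/* Bytes in the receive buffer that belong to the upgraded protocol. */
struct leftover {
  const char *data;
  size_t length;
};

/* Decide what to do after one http_parser_execute() call.
 *   upgrade  - value of parser->upgrade after the call
 *   nparsed  - return value of http_parser_execute()
 *   received - number of bytes that were passed to the parser
 * On PARSE_UPGRADE, *rest describes the non-HTTP data left in buf. */
enum parse_outcome classify_execute(int upgrade, size_t nparsed,
                                    size_t received, const char *buf,
                                    struct leftover *rest) {
  assert(buf != NULL && rest != NULL);
  rest->data = NULL;
  rest->length = 0;

  if (nparsed > received) return PARSE_ERROR; /* should never happen */

  if (upgrade) {
    /* The parser stopped after the headers; the rest is not HTTP. */
    rest->data = buf + nparsed;
    rest->length = received - nparsed;
    return PARSE_UPGRADE;
  }
  return nparsed == received ? PARSE_OK : PARSE_ERROR;
}
```

A caller would hand rest.data and rest.length to whatever handles the new protocol (for example a WebSocket implementation) and stop feeding that connection to http_parser.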

Callbacks

During the http_parser_execute() call, the callbacks set in http_parser_settings will be executed. The parser maintains state and never looks behind, so buffering the data is not necessary. If you need to save certain data for later usage, you can do that from the callbacks.

There are two types of callbacks:

  • notification: typedef int (*http_cb) (http_parser*);
    Callbacks: on_message_begin, on_headers_complete, on_message_complete.
  • data: typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);
    Callbacks: (requests only) on_url, (common) on_header_field, on_header_value, on_body.

Callbacks must return 0 on success. Returning a non-zero value indicates error to the parser, making it exit immediately.

For cases where it is necessary to pass local information to or from a callback, the http_parser object's data field can be used. An example of such a case is when using threads to handle a socket connection, parse a request, and then give a response over that socket. By instantiating a thread-local struct containing the relevant data (e.g. the accepted socket, memory allocated for callbacks to write into, etc.), a parser's callbacks are able to communicate data between the scope of the thread and the scope of the callback in a thread-safe manner. This allows http_parser to be used in multi-threaded contexts.

Example:

typedef struct {
  socket_t sock;
  void* buffer;
  int buf_len;
} custom_data_t;


int my_url_callback(http_parser* parser, const char *at, size_t length) {
  /* Access the thread-local custom_data_t struct through parser->data.
   * Use it to save parsed data into the thread-local buffer for later use,
   * or to communicate over the socket.
   */
  custom_data_t *my_data = parser->data;
  ...
  return 0;
}

...

void http_parser_thread(socket_t sock) {
  size_t nparsed = 0;
  /* allocate memory for user data */
  custom_data_t *my_data = malloc(sizeof(custom_data_t));

  /* some information for use by callbacks.
   * achieves thread -> callback information flow */
  my_data->sock = sock;

  /* instantiate a thread-local parser */
  http_parser *parser = malloc(sizeof(http_parser));
  http_parser_init(parser, HTTP_REQUEST); /* initialise parser */
  /* this custom data reference is accessible through the reference to the
   * parser supplied to callback functions */
  parser->data = my_data;

  http_parser_settings settings; /* set up callbacks */
  http_parser_settings_init(&settings); /* clear unused callbacks */
  settings.on_url = my_url_callback;

  /* execute parser; buf and recved come from recv() as shown under Usage */
  nparsed = http_parser_execute(parser, &settings, buf, recved);

  ...
  /* parsed information copied from callback.
   * can now perform action on data copied into thread-local memory from callbacks.
   * achieves callback -> thread information flow */
  my_data->buffer;
  ...
}

If you parse an HTTP message in chunks (i.e. read() the request line from the socket, parse, read half of the headers, parse, etc.), your data callbacks may be called more than once. http_parser guarantees that the data pointer is valid only for the lifetime of the callback. You can also read() into a heap-allocated buffer to avoid copying memory around if this fits your application.
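Because a data callback may fire several times for one logical value and the pointer does not outlive the callback, the bytes have to be copied and appended. A minimal sketch for the URL follows; url_buf, URL_MAX and url_append() are hypothetical names, and the fixed-size buffer is an assumption made to keep the example short. In a real on_url callback the url_buf would be reached through parser->data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define URL_MAX 2048 /* arbitrary limit chosen for this sketch */

struct url_buf {
  char data[URL_MAX]; /* always NUL-terminated */
  size_t length;      /* bytes stored, excluding the terminator */
};

/* Append one fragment delivered by a data callback.
 * Returns 0 on success and non-zero if the URL would not fit, which
 * matches the convention that a non-zero callback result is an error. */
int url_append(struct url_buf *u, const char *at, size_t length) {
  assert(u != NULL && at != NULL);
  if (length >= URL_MAX - u->length) return -1; /* keep room for NUL */
  memcpy(u->data + u->length, at, length);
  u->length += length;
  u->data[u->length] = '\0';
  return 0;
}
```

An on_url callback could simply return url_append(parser->data, at, length); reset length to 0 in on_message_begin so that each message starts with an empty URL.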

Reading headers can be tricky if you read and parse them partially. Basically, you need to remember whether the last header callback was a field or a value and apply the following logic:

(on_header_field and on_header_value shortened to on_h_*)
 ------------------------ ------------ --------------------------------------------
| State (prev. callback) | Callback   | Description/action                         |
 ------------------------ ------------ --------------------------------------------
| nothing (first call)   | on_h_field | Allocate new buffer and copy callback data |
|                        |            | into it                                    |
 ------------------------ ------------ --------------------------------------------
| value                  | on_h_field | New header started.                        |
|                        |            | Copy current name,value buffers to headers |
|                        |            | list and allocate new buffer for new name  |
 ------------------------ ------------ --------------------------------------------
| field                  | on_h_field | Previous name continues. Reallocate name   |
|                        |            | buffer and append callback data to it      |
 ------------------------ ------------ --------------------------------------------
| field                  | on_h_value | Value for current header started. Allocate |
|                        |            | new buffer and copy callback data to it    |
 ------------------------ ------------ --------------------------------------------
| value                  | on_h_value | Value continues. Reallocate value buffer   |
|                        |            | and append callback data to it             |
 ------------------------ ------------ --------------------------------------------
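The table can be sketched in code as follows. All names here (header_collector, collect_field(), collect_value(), MAX_HEADERS) are hypothetical and not part of http-parser. As a simplification, the in-progress header is stored directly in the list instead of in separate buffers that are copied over later. In real use the two functions would be called from on_header_field and on_header_value, with the collector reached through parser->data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HEADERS 32 /* arbitrary limit chosen for this sketch */

enum last_cb { LAST_NONE, LAST_FIELD, LAST_VALUE };

struct header {
  char *name;  /* heap-allocated, NUL-terminated */
  char *value; /* heap-allocated, NUL-terminated, NULL until a value arrives */
};

struct header_collector {
  enum last_cb last; /* which header callback ran most recently */
  struct header headers[MAX_HEADERS];
  size_t count;      /* headers seen so far, including the current one */
};

/* Grow *dst and append `length` bytes from `at`; returns 0 or -1. */
static int append(char **dst, const char *at, size_t length) {
  size_t old = *dst ? strlen(*dst) : 0;
  char *grown = realloc(*dst, old + length + 1);
  if (grown == NULL) return -1;
  memcpy(grown + old, at, length);
  grown[old + length] = '\0';
  *dst = grown;
  return 0;
}

/* Logic for on_header_field. */
int collect_field(struct header_collector *c, const char *at, size_t length) {
  assert(c != NULL && at != NULL);
  if (c->last != LAST_FIELD) {
    /* First call, or the previous callback was a value: new header. */
    if (c->count == MAX_HEADERS) return -1;
    c->headers[c->count].name = NULL;
    c->headers[c->count].value = NULL;
    c->count++;
  }
  /* Otherwise the previous name continues; append in both cases. */
  c->last = LAST_FIELD;
  return append(&c->headers[c->count - 1].name, at, length);
}

/* Logic for on_header_value. */
int collect_value(struct header_collector *c, const char *at, size_t length) {
  assert(c != NULL && at != NULL);
  if (c->count == 0) return -1; /* value without a preceding field */
  /* Starts the value after a field, or continues it after a value. */
  c->last = LAST_VALUE;
  return append(&c->headers[c->count - 1].value, at, length);
}

/* Release everything collected so far. */
void free_headers(struct header_collector *c) {
  for (size_t i = 0; i < c->count; i++) {
    free(c->headers[i].name);
    free(c->headers[i].value);
  }
  c->count = 0;
  c->last = LAST_NONE;
}
```

A zero-initialized struct header_collector is a valid empty collector. Both functions return non-zero on failure, so their result can be returned straight from the callback to make the parser stop.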

Parsing URLs

A simplistic zero-copy URL parser is provided as http_parser_parse_url(). Users of this library may wish to use it to parse URLs constructed from consecutive on_url callbacks.

See examples of reading in headers:

Issues
  • Implementing extension methods in a way that does not slow down performance...

    This does two things:

    1. If an extension method is detected, an optional callback is called
    2. All the characters in the method are validated using TOKEN(ch)
    opened by jasnell 24
  • Connection header not parsed properly, breaks FF websockets

    See https://bugzilla.mozilla.org/show_bug.cgi?id=730742 and joyent/node#2849.

    In a nutshell, FF sends a header that looks like Connection: keep-alive, upgrade while http-parser only understands Connection: upgrade. Should be fixed on short notice.

    opened by bnoordhuis 23
  • API CHANGE: Remove path, query, fragment CBs.

    • Get rid of support for these callbacks in http_parser_settings.
    • Retain state transitions between different URL portions in http_parser_execute() so that we're making the same correctness guarantees as before.
    • These are being removed because making multiple callbacks for the same byte makes it more difficult to pause the parser.
    opened by pgriess 20
  • Request with Transfer-Encoding: chunked and Content-Length is valid per RFC, but rejected with HPE_UNEXPECTED_CONTENT_LENGTH

    Current implementation rejects requests with Transfer-Encoding: chunked and Content-Length headers with HPE_UNEXPECTED_CONTENT_LENGTH. https://github.com/nodejs/http-parser/blob/master/http_parser.c#L1804-L1815

    But per https://tools.ietf.org/html/rfc7230#section-3.3.3 that's a valid http request, Content-Length must be ignored:

           If a message is received with both a Transfer-Encoding and a
           Content-Length header field, the Transfer-Encoding overrides the
           Content-Length. 
    

    llhttp parser has the correct behavior: https://github.com/nodejs/llhttp/blob/master/src/native/http.c#L50

    opened by veshij 18
  • assert(!messages[num_messages].message_begin_cb_called);

    My application runs on a single thread, so I define static struct message messages[1];. Every time I need to parse HTTP data, I do: num_messages = 0; bzero(messages, sizeof(messages)); messages[num_messages].message_begin_cb_called = FALSE; http_parser_init(parser, HTTP_BOTH); http_parser_execute(parser, &settings, data, slDataLen);

    But it always exits because of assert(!messages[num_messages].message_begin_cb_called); in message_begin_cb. I don't know what is happening. Please give me some advice. Thank you.

    opened by jiamuluo 18
  • RFC-7230 Sec 3.2.4 expressly forbids line-folding in header field-names.

    This change no longer allows obsolete line-folding between the header field-name and the colon. If HTTP_PARSER_STRICT is unset, the parser still allows space characters.

    opened by jpinner 17
  • Can no longer parse header "X-Custom-Header-Bytes: …" despite browsers doing this fine

    This is causing jsdom to fail several web-platform-tests where it is tested that its behavior emulates XMLHttpRequest (https://github.com/tmpvar/jsdom/issues/1380). I believe this is a regression as of Node.js 5.6.0.

    The test is a combination of https://github.com/w3c/web-platform-tests/blob/master/XMLHttpRequest/resources/headers.py and https://github.com/w3c/web-platform-tests/blob/master/XMLHttpRequest/getresponseheader-special-characters.htm where it tests that the server setting

    X-Custom-Header-Bytes: …
    

    results in the client receiving "\xE2\x80\xA6" (i.e. "…").

    We would like to be able to achieve this behavior in jsdom just like browsers are specified to do.

    /cc @annevk for the standards-based perspective. I imagine it's something like "HTTP underspecifies interoperable behavior" but couldn't find anything in Fetch that specifically addresses the issue of what to do when headers are not valid according to HTTP's field-content production. Maybe the test is wrong?

    opened by domenic 15
  • http response body can't be totally parsed

    2017-09-21 19:48:16 [thread:10505] DEBUG (line=534, func=on_message_complete): tid:10505 on_message_complete 2017-09-21 19:48:16 [thread:10505] DEBUG file=http_analyzer.c (line=369, func=on_message_begin): tid:10505, 2017-09-21 19:48:16 [thread:10505] DEBUG file=http_analyzer.c (line=371, func=on_message_begin): tid:10505

    Another problem is that the HTTP response was only partially parsed. The HTTP response is 744 bytes in total, of which 580 bytes were parsed. Part of the parsed content is: HTTP/1.1 404 Not Found Server: Tengine Date: Thu, 21 Sep 2017 11:48:16 GMT Content-Type: text/htq

    And part of the unparsed content:

    Server: lb1
    opened by tianchao-haohan 14
  • Refactor method parsing for clearer and faster code

    I've refactored the code that parses HTTP methods, replacing the multiple levels of nested if/else with a single switch. The code is now much easier to read and faster. In the second commit, the parsing for the PUT/PATCH methods is put at the top of the table, as those requests are much more common than the other methods in this second-level dispatch.

    opened by dolmen 13
  • Do not accept PUN/GEM methods as PUT/GET.

    • Encountering them returns an error, HPE_INVALID_METHOD
    • Tests have been added.

    There's not a clear distinction between what erroneous methods should trigger HPE_INVALID_METHOD vs. HPE_UNKNOWN (there are extant tests for HPE_UNKNOWN against methods like "C*****"), advice would be appreciated. Tested with make test on OSX 10.7.5.

    This fixes joyent/node#6078.

    opened by chrisdickinson 13
  • Version release

    A query more than an issue: When is the next planned version release? v1.0 was back on May 11, 2011, and there have been a ton of changes since then.

    opened by scunningham 13
  • Split macro 'HTTP_ERRNO_MAP'

    The macro 'HTTP_ERRNO_MAP' covers two kinds of errors (callback errors and parsing errors). When coding, we often need to handle the two kinds separately, which the single macro makes awkward. Splitting 'HTTP_ERRNO_MAP' into 'HTTP_ERRNO_CALLBACK_MAP' and 'HTTP_ERRNO_PARSING_MAP' would make this easier.

    opened by ProjectDInitial 0
  • http: unset `F_CHUNKED` on new `Transfer-Encoding` (Fixes CVE-2020-8287)

    This change has only been integrated with the bundled version of http-parser in the node sources;

    https://github.com/nodejs/node/commit/fc70ce08f5818a286fb5899a1bc3aff5965a745e

    Can it please be synced here, with a release being made as well?

    opened by jellelicht 0
  • if I put two http request buf, function http_context_parser will crash,why???

    two request once by tcp tools:

    GET /favicon.ico HTTP/1.1 Host: 127.0.0.1 Connection: keep-alive Sec-Fetch-Site: none Sec-Fetch-Mode: no-cors User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36 Accept-Encoding: gzip, deflate, br Accept-Language: zh-CN,zh;q=0.9,en;q=0.8

    GET /favicon.ico HTTP/1.1 Host: 227.0.0.1 Connection: keep-alive Sec-Fetch-Site: none Sec-Fetch-Mode: no-cors User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36 Accept-Encoding: gzip, deflate, br Accept-Language: zh-CN,zh;q=0.9,en;q=0.8

    opened by luckybirdtom 0
  • Test for the size of struct http_parser fails on 32 bit systems where there is padding/alignment for void*

    Bonjour,

    Porting http-parser to AIX, where there are 32 and 64 bit environments, there is a test on sizeof (struct http_parser) which fails due to 8 byte reservation for void* on both 32 and 64 bit compiles.

    A clean option might be to change the void *data element of struct http_parser to be a union of uint64 and void *.

    But this may not work if the 32 bit size of struct http_parser must remain at a total of 28 bytes (in which case the test is incorrect).

    However, supposing that this is an error in the test code, I provisionally used the following patch for the 32 bit build

    --- ./test.c.ORIG       2020-11-27 22:02:53 +0100
    +++ ./test.c    2020-11-27 22:57:18 +0100
    @@ -4234,7 +4234,7 @@
       printf("http_parser v%u.%u.%u (0x%06lx)\n", major, minor, patch, version);
     
       printf("sizeof(http_parser) = %u\n", (unsigned int)sizeof(http_parser));
    -  assert(sizeof(http_parser) == 4 + 4 + 8 + 2 + 2 + 4 + sizeof(void *));
    +  assert(sizeof(http_parser) == 4 + 4 + 8 + 2 + 2 + 4 + 2*sizeof(void *));
     
       //// API
       test_preserve_data();
    

    We would like to ask for your expert advice on how best to resolve this issue.

    Thanks.

    opened by frmichael 6
Owner
Node.js