Making Type Info Library (TIL) file for Apache modules

Overview

Creating TIL files for IDA

Intro

Creating a Type Information Library makes it easier to reverse engineer binaries by providing IDA with detailed and accurate information about types.

Types include:

  • function prototypes
  • structures
  • enums

The main point is that IDA will apply function prototypes to the imports and include the relevant data types in the database.

Creating a TIL file for Apache

As an example, we will create a TIL file which can help reversing Apache modules.

Everything here was done on Debian Sid (amd64, March 2021), but most of it should work on most Linux distros.

Prerequisites

We need the source code of the libraries we want to analyze. My target used Apache 2.2, so let's fetch it:

wget https://archive.apache.org/dist/httpd/httpd-2.2.34.tar.bz2
wget https://archive.apache.org/dist/httpd/httpd-2.2.34.tar.bz2.asc 
curl https://downloads.apache.org/httpd/KEYS | gpg2 --import
gpg2 --verify httpd-2.2.34.tar.bz2.asc httpd-2.2.34.tar.bz2 

The archive contains things we want to include in our TIL:

  • the headers for writing modules
  • the Apache Portable Runtime (APR) library

First, we need to run ./configure so that the right headers are generated. Of course, this step needs to reflect the configuration that was used to build your target.
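As a rough sketch of that step, assuming the archive fetched above sits in the current directory and that the bundled APR is good enough (--with-included-apr is an httpd configure option; the right set of options depends on your target):

```shell
# Unpack and configure httpd so that the generated headers exist.
# ARCHIVE and SRC are illustrative names, not part of the original scripts.
ARCHIVE=${ARCHIVE:-httpd-2.2.34.tar.bz2}
SRC=${ARCHIVE%.tar.bz2}

if [ -f "$ARCHIVE" ]; then
    tar xjf "$ARCHIVE"
    # Assumption: build against the APR copy shipped in srclib/
    (cd "$SRC" && ./configure --with-included-apr)
    # These two headers are produced by configure
    ls -l "$SRC/include/ap_config_auto.h" "$SRC/srclib/apr/include/apr.h"
fi
```

If the two headers are listed at the end, the tree is ready to be used as an include path.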

In my case, the binary was compiled with GCC: (GNU) 3.2.3 20030502 (Red Hat Linux 3.2.3-56), which is ancient. But in theory, there should not be real differences in ABI between a recent and old GCC compiler on Linux amd64, so let's proceed anyway.

TIL

Compiler config

First, we need to get the right configuration for the compiler options in tilib: depending on the architecture and target ABI, structure padding, type sizes, etc. will vary.

This is the "documentation":

$ ./tilib -C?
  -C... specifies the compiler information
  It has the -Cx# form, where # - value, x is one of the following:
  c-compiler id, m-model, p-sizeof(near*), g-defalign (0/1/2/4/8/6 for16)
  b-sizeof(bool), e-sizeof(enum), i-sizeof(int), s-sizeof(short)
  l-sizeof(long), L-sizeof(longlong), R-explicit stack offsets
  v-calling convention, B-bitness (3 for 32 or 6 for 64), D-sizeof(long double)
  8-4 byte alignment for 8byte scalars (__int64/double) inside structures (y/n)
  a-shorthand for cmpgbeislLvB8. The default is us40144248i3n
Compiler ids:        Pointer sizes:
  0 or u: Unknown          1: sizeof(near*)=1, sizeof(far*)=2
  1 or v: Visual C++       2: sizeof(near*)=2, sizeof(far*)=4
  2 or b: Borland C++      4: sizeof(near*)=4, sizeof(far*)=6
  3 or w: Watcom C++       8: sizeof(near*)=8, sizeof(far*)=8
  6 or g: GNU C++         Memory models:
  7 or a: Visual Age C++   s: small   (code=near, data=near)
  8 or d: Delphi           l: large   (code=far, data=far)
                           c: compact (code=near, data=far)
                           m: medium  (code=far, data=near)
Calling conventions:
  i: invalid    s: stdcall      u: unknown (default)
  v: void       p: pascall
  c: cdecl      r: fastcall
  e: (...)      t: thiscall
For example, BCC small model v3.1: -Cabs2122224
             GNU C++:              -Cags44444248u

As you can see, -C is difficult to master. Here's how to read the -Cags44444 which you can find in tilib's gcc.cfg:

; from GCC 32 config:
; -Cags44444
; cmpgbeislLvB8 (expansion for "Ca")
; us40144248i3n (default)
; gs44444
; |||||||||||||_ 8bytes scalars alignment : n (default)
; ||||||||||||__ bitness            : 3 (default)
; |||||||||||___ calling convention : i (default)
; ||||||||||____ sizeof(longlong)   : 8 (default)
; |||||||||_____ sizeof(long)  : 4 (default)
; ||||||||______ sizeof(short) : 2 (default)
; |||||||_______ sizeof(int)   : 4
; ||||||________ sizeof(enum)  : 4
; |||||_________ sizeof(bool)  : 4
; ||||__________ defalign: 4
; |||___________ pointer size: 4
; ||____________ mem model: small
; |_____________ compiler: gcc
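
To check such a string without counting characters by hand, here is a small sketch of a decoder in POSIX shell. It is not part of tilib: decode_ca and the field names are made up for illustration, and it assumes that fields missing at the end of the string fall back to the default shown by tilib -C?.

```shell
# Decode a tilib "-Ca..." shorthand into named fields.
# Assumption: trailing fields that are not given keep the documented default.
decode_ca() {
    value=${1#-Ca}
    defaults=us40144248i3n          # default string printed by `tilib -C?`
    for name in compiler model pointer defalign bool enum int \
                short long longlong callconv bitness align8; do
        # Peel the first character off the defaults string
        default_ch=${defaults%"${defaults#?}"}
        defaults=${defaults#?}
        if [ -n "$value" ]; then
            # Same for the user-supplied string, while it lasts
            ch=${value%"${value#?}"}
            value=${value#?}
        else
            ch=$default_ch
        fi
        printf '%s=%s\n' "$name" "$ch"
    done
}

decode_ca -Cags44444
```

Running it on the other example from the help text, decode_ca -Cags44444248u, is a quick way to compare two configurations field by field.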

Creating our own config

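  • Use sizes.c to get the sizes of the basic types on the target platform
  • Copy the existing config: cp gcc.cfg gcc64.cfg
  • Update the -C string in gcc64.cfg accordingly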

Note: the (updated) gcc64.cfg was provided by Igor Skochinsky from Hex-Rays, I just added the comments.
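
If you do not have sizes.c at hand, a probe along these lines gives the numbers needed for the -C string. This is a minimal sketch, not the sizes.c from this repository: sizes_probe.c is a hypothetical file name, and the probe must be built with the compiler and flags matching your target for the result to mean anything.

```shell
# Write a tiny C program that prints the sizes tilib cares about.
cat > sizes_probe.c <<'EOF'
#include <stdio.h>
#include <stdbool.h>

enum probe { PROBE_A };

int main(void)
{
    printf("pointer=%zu bool=%zu enum=%zu int=%zu short=%zu\n",
           sizeof(void *), sizeof(bool), sizeof(enum probe),
           sizeof(int), sizeof(short));
    printf("long=%zu longlong=%zu longdouble=%zu\n",
           sizeof(long), sizeof(long long), sizeof(long double));
    return 0;
}
EOF

# Build and run it only if a C compiler is available
if command -v cc >/dev/null 2>&1; then
    cc -o sizes_probe sizes_probe.c && ./sizes_probe
fi
```

The printed values map onto the p, b, e, i, s, l, L and D fields of the -C option described above.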

Building TIL steps

First we need to make a top level header which includes everything: apache_all.h.

Then, we preprocess it with gcc -E, which resolves all includes and macros and makes the result easier for tilib to ingest.
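
A minimal sketch of these two steps could look like this. The list of headers is only illustrative (the real apache_all.h includes more), apache_all.i is a made-up output name, and SRC is assumed to point at the configured httpd-2.2.34 tree:

```shell
SRC=${SRC:-httpd-2.2.34}   # assumed location of the configured source tree

# Top-level header pulling in the module API and parts of APR
cat > apache_all.h <<'EOF'
/* "nop" inline asm, see the list of hacks further down */
#define __asm__(arg)

#include "httpd.h"
#include "http_config.h"
#include "http_core.h"
#include "http_log.h"
#include "http_protocol.h"
#include "http_request.h"
#include "apr_strings.h"
#include "apr_tables.h"
EOF

# Preprocess only when the source tree and gcc are actually there
if [ -d "$SRC/include" ] && command -v gcc >/dev/null 2>&1; then
    gcc -E \
        -I"$SRC/include" -I"$SRC/os/unix" \
        -I"$SRC/srclib/apr/include" -I"$SRC/srclib/apr-util/include" \
        apache_all.h > apache_all.i
fi
```

The include paths are the ones used by the httpd 2.2 source layout; adjust them if your headers come from distribution packages instead.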

Then we begin the loop of fixing errors and warnings.

The most important hacks are:

  • Adding #define __asm__(arg) to our apache_all.h file, to "nop" inline asm
  • Adding -D__extension__= to the tilib call, which will "nop" the unsupported __extension__ keyword
  • Adding "-D__builtin_va_list=void *" to the tilib call, which works around the need for the internal definition of va_list
  • Adding -D__UNKNOWN_ATTR__=UNKNOWN_ATTR in gcc64.cfg

Of course the command line options could be included in the .cfg file.
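
Put together, the tilib call might be assembled roughly as below. Treat it as a sketch rather than a copy of make_til.sh: the -c (create), -h (input header) and @file (response file) options are written from memory of tilib's usage text and should be checked against your tilib version, and apache_all.i is a made-up name for the preprocessed header.

```shell
TILIB=${TILIB:-./tilib}

# Build the argument list; the va_list define contains a space,
# so it has to stay quoted as a single argument.
set -- -c @gcc64.cfg \
    -D__extension__= \
    "-D__builtin_va_list=void *" \
    -hapache_all.i \
    apache22-debian64.til

if [ -x "$TILIB" ]; then
    "$TILIB" "$@"
else
    # Dry run: show each argument in brackets to make the quoting visible
    printf 'would run: %s' "$TILIB"
    printf ' [%s]' "$@"
    printf '\n'
fi
```

Printing the arguments in brackets makes it easy to spot a define that was accidentally split in two by the shell.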

See make_til.sh for the final result.

Fixing "opaque" structures

Identify which structures have no "size" in the .til file:

$ tilib  -l apache22-debian64.til  | grep "FFFFFFFF struct"
[...]
FFFFFFFF struct ap_conf_vector_t;
FFFFFFFF struct ap_filter_provider_t;
FFFFFFFF struct apr_allocator_t;
FFFFFFFF struct apr_bucket_alloc_t;
[...]

Some are opaque by design, such as ap_conf_vector_t. The others should be added to apache_all.h by copy-pasting their definitions from the source files.
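
To find where the missing definitions live, something like the following can help. It is a sketch: opaque_names is a made-up helper, SRC is assumed to be the httpd source tree, and the grep pattern assumes the definition starts with "struct name" at the beginning of a line.

```shell
SRC=${SRC:-httpd-2.2.34}
TIL=${TIL:-apache22-debian64.til}
TILIB=${TILIB:-tilib}

# Keep only the names of structures listed with size FFFFFFFF
opaque_names() {
    sed -n 's/^FFFFFFFF struct \([A-Za-z_][A-Za-z0-9_]*\);.*$/\1/p'
}

if command -v "$TILIB" >/dev/null 2>&1 && [ -d "$SRC" ]; then
    "$TILIB" -l "$TIL" | opaque_names | while read -r name; do
        echo "== $name"
        # Look for the full definition in the .c and .h files
        grep -rn --include='*.[ch]' "^struct $name" "$SRC" \
            || echo "   no definition found, probably opaque by design"
    done
fi
```

Structures whose definition only shows up in a .c file are the ones that have to be copied into apache_all.h by hand.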

Result

The TIL file should be put in the til/pc directory of the IDA installation so that IDA can discover it.

After loading the TIL file (Shift-F11, Insert) and setting the type of the module export to module, note how all the Apache-related imports are now in bold, with their types defined (before and after screenshots).

Owner
Raphaël Rigo