
LIEF - Library to Instrument Executable Formats

Romain Thomas, April 18, 2017

TL;DR

LIEF is a library to parse and manipulate ELF, PE and Mach-O formats. Source code is available on GitHub and use cases are here.

Executable File Formats in a Nutshell

When dealing with executable files, the first layer of information is the format in which the code is wrapped. An executable file format can be seen as an envelope: it carries the information that the postman (i.e. the operating system) needs to handle and deliver (i.e. execute) it. The message wrapped by this envelope is the machine code.

There are three mainstream formats, one per OS family:

  • Portable Executable (PE) for Windows systems
  • Executable and Linkable Format (ELF) for UN*X systems (Linux, Android…)
  • Mach-O for OS X, iOS…

Other executable file formats, such as COFF, exist but they are less relevant.

Usually, each format has a header that describes at least the target architecture, the program’s entry point and the type of the wrapped object (executable, library…). Then come blocks of data that will be mapped by the OS’s loader. These blocks can hold machine code (.text), read-only data (.rodata) or other OS-specific information.
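
To make these header fields concrete, here is a minimal sketch that decodes them for ELF using only Python's standard library. It is an illustration, not LIEF code: the function name read_elf_header and the two lookup tables (which list only a handful of the values defined by the ELF specification) are assumptions made for this example.

```python
import struct

# Small subsets of the values defined by the ELF specification.
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def read_elf_header(data):
    """Decode the type, architecture and entry point of an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big
    # e_type and e_machine are 16-bit, e_version is 32-bit, and e_entry is
    # as wide as an address for the given class.
    fmt = endian + ("HHIQ" if is_64 else "HHII")
    e_type, e_machine, _version, e_entry = struct.unpack_from(fmt, data, 16)
    return {
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
        "entrypoint": e_entry,
    }
```

Given the first 64 bytes of an ELF file, the function returns a dictionary with the object type, the architecture and the entry point. PE and Mach-O headers carry the same kind of information in different layouts.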

For PE there is only one kind of such block: the section. For ELF and Mach-O, a section has a different meaning. In these formats, sections are used by the linker at build time, whereas segments (a second type of block) are used by the OS’s loader at execution time. Thus sections are not mandatory for ELF and Mach-O formats and can be removed without affecting the execution.
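
For ELF, this can be illustrated on the file header itself. The sketch below assumes a 64-bit little-endian ELF image; the helper names table_counts and drop_section_table are hypothetical. It only clears the header fields that reference the section header table and leaves the rest of the file untouched.

```python
import struct

# Offsets of fields in a 64-bit ELF header (little-endian assumed here).
E_SHOFF, E_PHNUM, E_SHNUM, E_SHSTRNDX = 0x28, 0x38, 0x3C, 0x3E


def table_counts(image):
    """Return (number of segments, number of sections) declared in the header."""
    (phnum,) = struct.unpack_from("<H", image, E_PHNUM)
    (shnum,) = struct.unpack_from("<H", image, E_SHNUM)
    return phnum, shnum


def drop_section_table(image):
    """Return a copy of the image whose header no longer references sections."""
    out = bytearray(image)
    struct.pack_into("<Q", out, E_SHOFF, 0)      # no section header table
    struct.pack_into("<H", out, E_SHNUM, 0)      # zero section headers
    struct.pack_into("<H", out, E_SHSTRNDX, 0)   # no section name table
    return bytes(out)
```

The segment count, which is what the loader relies on, is unchanged by drop_section_table. Tools that depend on sections (debuggers, disassemblers) may no longer work as expected on such a file.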

Purpose of LIEF

It turns out that many projects need to parse executable file formats but don’t use a standard library and re-implement their own parser (and the wheel). Moreover, these parsers are usually bound to one language.

On Unix systems one can find the objdump and objcopy utilities, but they are limited to Unix and their API is not user-friendly.

The purpose of LIEF is to fill this void:

  • Providing a cross-platform library which can parse and modify (to a certain extent) ELF, PE and Mach-O formats using a common abstraction
  • Providing an API for different languages (Python, C++, C…)
  • Abstracting common features from the different formats (section, header, entry point, symbols…)
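
A common abstraction starts with recognising which format a file uses. The following sketch shows one way to do this from the magic bytes, using only the standard library. It is not LIEF's implementation; detect_format and MACHO_MAGICS are names chosen for this illustration.

```python
import struct

MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",   # 32-bit, big/little endian
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",   # 64-bit, big/little endian
    b"\xca\xfe\xba\xbe",                        # FAT (multi-architecture)
}


def detect_format(data):
    """Guess the executable format from the first bytes of a file."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:4] in MACHO_MAGICS:
        return "Mach-O"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The DOS header stores the offset of the PE signature at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    return "unknown"
```

Magic bytes alone are only a first guess: the FAT Mach-O magic, for example, is shared with Java class files, so a real parser has to validate the rest of the header as well.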

The following snippets show how to obtain information about an executable using the different LIEF APIs. With the Python API:

import lief

# ELF
binary = lief.parse("/usr/bin/ls")
print(binary)

# PE
binary = lief.parse("C:\\Windows\\explorer.exe")
print(binary)

# Mach-O
binary = lief.parse("/usr/bin/ls")
print(binary)

With the C++ API:

#include <iostream>
#include <LIEF/LIEF.hpp>

int main(int argc, const char** argv) {
  // One parser per format, each returning the matching Binary object
  LIEF::ELF::Binary*   elf   = LIEF::ELF::Parser::parse("/usr/bin/ls");
  LIEF::PE::Binary*    pe    = LIEF::PE::Parser::parse("C:\\Windows\\explorer.exe");
  LIEF::MachO::Binary* macho = LIEF::MachO::Parser::parse("/usr/bin/ls");

  std::cout << *elf   << std::endl;
  std::cout << *pe    << std::endl;
  std::cout << *macho << std::endl;

  // The caller owns the parsed objects
  delete elf;
  delete pe;
  delete macho;
  return 0;
}

And finally with the C API:

#include <stdio.h>
#include <LIEF/LIEF.h>

int main(int argc, const char** argv) {
  /* Parse one binary per format; Mach-O yields an array of binaries */
  Elf_Binary_t*    elf_binary     = elf_parse("/usr/bin/ls");
  Pe_Binary_t*     pe_binary      = pe_parse("C:\\Windows\\explorer.exe");
  Macho_Binary_t** macho_binaries = macho_parse("/usr/bin/ls");

  /* Section arrays are NULL-terminated */
  Pe_Section_t**    pe_sections    = pe_binary->sections;
  Elf_Section_t**   elf_sections   = elf_binary->sections;
  Macho_Section_t** macho_sections = macho_binaries[0]->sections;

  for (size_t i = 0; pe_sections[i] != NULL; ++i) {
    printf("%s\n", pe_sections[i]->name);
  }

  for (size_t i = 0; elf_sections[i] != NULL; ++i) {
    printf("%s\n", elf_sections[i]->name);
  }

  for (size_t i = 0; macho_sections[i] != NULL; ++i) {
    printf("%s\n", macho_sections[i]->name);
  }

  elf_binary_destroy(elf_binary);
  pe_binary_destroy(pe_binary);
  macho_binaries_destroy(macho_binaries);
  return 0;
}

LIEF supports FAT-MachO and one can iterate over binaries as follows:

import lief

binaries = lief.MachO.parse("/usr/lib/libc++abi.dylib")
for binary in binaries:
    print(binary)

Note

The above script uses the lief.MachO.parse function instead of lief.parse because lief.parse returns a single lief.MachO.Binary object, whereas lief.MachO.parse returns a list of lief.MachO.Binary objects (one per architecture in the FAT-MachO file).
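
A FAT Mach-O file starts with a small big-endian table that points at several regular Mach-O binaries, one per architecture. The following sketch lists that table with the standard library only. It is not LIEF's parser, and list_fat_slices and CPU_NAMES (a partial table of CPU types) are names chosen for this illustration.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_NAMES = {0x00000007: "x86", 0x01000007: "x86_64",
             0x0000000C: "arm", 0x0100000C: "arm64"}


def list_fat_slices(data):
    """Return (cpu name, offset, size) for each binary in a FAT Mach-O."""
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a FAT Mach-O file")
    slices = []
    for index in range(count):
        # Each fat_arch entry holds five big-endian 32-bit fields:
        # cputype, cpusubtype, offset, size and align.
        cputype, _subtype, offset, size, _align = struct.unpack_from(
            ">iiIII", data, 8 + 20 * index)
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices
```

Each returned offset and size delimits one complete Mach-O binary inside the file, which is why a FAT-aware parser naturally returns a list of binaries rather than a single one.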

Along with standard format components such as headers, sections, import table, load commands and symbols, LIEF is also able to parse PE Authenticode:

import lief

driver = lief.parse("driver.sys")

for crt in driver.signature.certificates:
    print(crt)

Version:             3
Serial Number:       61:07:02:dc:00:00:00:00:00:0b
Signature Algorithm: SHA1_WITH_RSA_ENCRYPTION
Valid from:          2005-9-15 21:55:41
Valid to:            2016-3-15 22:5:41
Issuer:              DC=com, DC=microsoft, CN=Microsoft Root Certificate Authority
Subject:             C=US, ST=Washington, L=Redmond, O=Microsoft Corporation, CN=Microsoft Windows Verification PCA
...

Full API documentation is available here.

Architecture

In the LIEF architecture, each format implements at least the following classes:

  • Parser: Parse the format and decompose it into a Binary class
  • Binary: Model the format and provide an API to modify and explore it.
  • Builder: Transform the binary object into a valid file.

[Figure: Architecture]
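
The division of labour between these three classes can be sketched on a toy format. Everything below is hypothetical: the "TOY1" container (a magic, a 32-bit entry point, then a raw payload) is made up for this example and the classes only mirror the roles described above, not LIEF's actual code.

```python
import struct
from dataclasses import dataclass

MAGIC = b"TOY1"   # made-up format: magic, 32-bit entry point, raw payload


@dataclass
class Binary:
    """In-memory model that can be explored and modified."""
    entrypoint: int
    payload: bytes


class Parser:
    @staticmethod
    def parse(raw):
        """Decompose raw bytes into a Binary object."""
        if raw[:4] != MAGIC:
            raise ValueError("bad magic")
        (entrypoint,) = struct.unpack_from("<I", raw, 4)
        return Binary(entrypoint, raw[8:])


class Builder:
    @staticmethod
    def build(binary):
        """Serialise a Binary object back into a valid file image."""
        return MAGIC + struct.pack("<I", binary.entrypoint) + binary.payload
```

Parsing a file, changing an attribute of the Binary object (for instance its entry point) and handing it to the Builder yields a new, consistent file. Keeping the model separate from the serialisation is what lets modifications stay format-agnostic.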

To factor out the characteristics shared by the formats, the classes that model them are linked by an inheritance relationship.

For symbols, this gives the following diagram:

[Figure: LIEF Symbol Inheritance]

This inheritance makes it possible to write cross-format utilities such as nm. nm is a Unix utility that lists the symbols in an executable. Its source code is available here: binutils

With the given inheritance relationship one can write this utility for the three formats in a single script:

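import lief
import sys

def nm(path):
    # Parse the file; the format (ELF, PE or Mach-O) is detected by LIEF
    binary = lief.parse(path)
    if binary is None:
        print("Unable to parse {}".format(path), file=sys.stderr)
        return 1

    # 'symbols' is part of the abstraction shared by the three formats
    for symbol in binary.symbols:
        print(symbol)

    return 0

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: {} <binary>".format(sys.argv[0]), file=sys.stderr)
        sys.exit(1)
    sys.exit(nm(sys.argv[1]))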
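
Why a single loop is enough can be shown with a toy model of the symbol hierarchy. The classes below are hypothetical stand-ins written for this illustration, not LIEF's classes: each format-specific symbol adds its own attributes, while the listing code relies only on what the base class provides.

```python
class Symbol:
    """Format-independent view of a symbol: just a name."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


class ElfSymbol(Symbol):
    def __init__(self, name, binding):
        super().__init__(name)
        self.binding = binding               # ELF-specific attribute


class PeSymbol(Symbol):
    def __init__(self, name, storage_class):
        super().__init__(name)
        self.storage_class = storage_class   # PE-specific attribute


def list_names(symbols):
    """List symbol names using only the shared base-class interface."""
    return [str(symbol) for symbol in symbols]
```

Calling list_names on a mix of ElfSymbol and PeSymbol objects works without any format check, which is the property the nm script relies on.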

Conclusion

As LIEF is still a young project, we hope to receive feedback, ideas, suggestions and pull requests.

The source code is available at https://github.com/lief-project (under the Apache 2.0 license) and the associated website is http://lief.quarkslab.com

If you are interested in use cases, you can take a look at these tutorials:

The project will be presented at the Third French Japanese Meeting on Cybersecurity.

Contact

Thanks

Thanks to Serge Guelton and Adrien Guinet for their advice about the design and their code review. Thanks to Quarkslab for making this project open-source.
