Note format

From Infinity Wiki
This document is NOT FINAL

Basics

  1. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
  2. Please familiarize yourself with LEB128 encoding.
  3. Note providers are executables or shared libraries containing Infinity notes.
  4. Note consumers are tools that access Infinity notes from note providers.
  5. This document specifies three reasons to reject notes. Note consumers SHOULD differentiate between these in error messages, etc.

Note rejection reasons

  • A CORRUPT note is one that cannot be decoded (either fully or at all) because it does not follow the specification.
  • An UNHANDLED note is one that requires a feature that the note consumer does not implement.
  • An INVALID note is one that is both decodable by and within the capabilities of the note consumer, but it cannot be processed because it is in some other way unusable.
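One way a note consumer might keep the three rejection reasons distinct in its error messages is sketched below. The names `Rejection` and `NoteRejected` are hypothetical and are not defined by this document.

```python
from enum import Enum


class Rejection(Enum):
    """The three rejection reasons named in this document."""
    CORRUPT = "cannot be decoded"
    UNHANDLED = "requires a feature this consumer does not implement"
    INVALID = "decodable and supported, but otherwise unusable"


class NoteRejected(Exception):
    """Hypothetical exception that carries the rejection reason."""

    def __init__(self, reason, detail):
        # Put the reason name first so error messages differ per reason.
        super().__init__(f"{reason.name}: {detail}")
        self.reason = reason
        self.detail = detail
```

A caller could then branch on `exc.reason` or simply print the exception.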

Outer format

Infinity notes are embedded into executables or shared libraries, so part of the note format depends on the format of the containing file. The only currently supported containing file format is ELF. If more formats are supported they should be added here.

For ELF files

Each Infinity note is contained within an ELF note with a note name of "GNU\0" and a note type of NT_GNU_INFINITY. The contents of the desc field are as described in #Inner format below. Each ELF note with a note name of "GNU\0" and a note type of NT_GNU_INFINITY MUST contain exactly one Infinity note.

NT_GNU_INFINITY is currently defined as 8995, though this will change to a low integer when Infinity becomes final.

Some information on ELF notes may be found here: http://www.netbsd.org/docs/kernel/elf-notes.html
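As a rough illustration, the sketch below walks the raw bytes of an ELF note section and yields the desc field of each Infinity note. It assumes the common ELF note layout (three 4-byte words namesz, descsz and type, followed by the name and desc, each padded to a 4-byte boundary). The function name `iter_infinity_notes` and the `byteorder` parameter are hypothetical, and locating the note section within the file is left out.

```python
import struct

NT_GNU_INFINITY = 8995  # provisional value given in this document


def iter_infinity_notes(section, byteorder="<"):
    """Yield the desc bytes of each Infinity note in a raw note section.

    byteorder is a struct prefix: "<" for little endian, ">" for big endian.
    Assumes 4-byte header words and 4-byte padding of name and desc.
    """
    offset = 0
    while offset + 12 <= len(section):
        namesz, descsz, ntype = struct.unpack_from(
            byteorder + "III", section, offset)
        offset += 12
        name = section[offset:offset + namesz]
        offset += (namesz + 3) & ~3          # skip name plus padding
        desc = section[offset:offset + descsz]
        if len(name) != namesz or len(desc) != descsz:
            raise ValueError("truncated ELF note")
        offset += (descsz + 3) & ~3          # skip desc plus padding
        if name == b"GNU\0" and ntype == NT_GNU_INFINITY:
            yield desc
```

ELF notes with other names or types are skipped silently, since they are not Infinity notes.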

Inner format

Each Infinity note is built up from chunks. The format of each chunk is as follows:

uleb128/ur            chunk_type
uleb128               chunk_version
uleb128               chunk_data_size
byte[chunk_data_size] chunk_data
  • Note consumers MUST cope with chunks in any order.
  • Note consumers MUST skip over chunks they don't understand or don't wish to process.
  • Notes SHOULD NOT contain chunks of zero chunk_data_size. Note consumers MUST treat chunks of zero chunk_data_size as if they were not present.
  • If a chunk is truncated then note consumers SHOULD reject the note as CORRUPT.
  • If a note consumer needs to process a particular chunk but the chunk's version is not supported then the consumer SHOULD reject the note as UNHANDLED.
  • If a note does not contain a chunk the consumer requires then the consumer SHOULD reject the note as UNHANDLED.
  • The chunk_type field has user ranges:
    • Notes MUST NOT contain chunks with chunk_types in user ranges.
    • Note consumers SHOULD reject notes with chunk_types in user ranges as INVALID.
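The chunk rules above can be sketched as a small iterator. This is a minimal illustration, not a reference implementation: the names `CorruptNote`, `read_uleb128` and `iter_chunks` are hypothetical, and user ranges are not checked because their bounds are not given in this document.

```python
class CorruptNote(Exception):
    """Hypothetical exception for notes rejected as CORRUPT."""


def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise CorruptNote("truncated uleb128")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear marks the last byte
            return result, pos
        shift += 7


def iter_chunks(note):
    """Yield (chunk_type, chunk_version, chunk_data) for each chunk."""
    pos = 0
    while pos < len(note):
        chunk_type, pos = read_uleb128(note, pos)
        chunk_version, pos = read_uleb128(note, pos)
        size, pos = read_uleb128(note, pos)
        if pos + size > len(note):
            raise CorruptNote("truncated chunk")
        if size:                     # zero-size chunks count as absent
            yield chunk_type, chunk_version, note[pos:pos + size]
        pos += size
```

A consumer would typically collect the chunks it cares about by chunk_type and ignore the rest, which satisfies the any-order and skip-unknown requirements.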

Chunks

Signature chunks

Each note SHOULD have exactly one signature chunk. Signature chunks have a chunk_type of I8_CHUNK_SIGNATURE == 1. The current chunk_version for signature chunks is 2, which has the following mandatory fields:

uleb128 provider_offset
uleb128 name_offset
uleb128 ptypes_offset
uleb128 rtypes_offset

The four offsets reference strings in the note's string table and are used to construct the function's signature. Provider names starting with the string "i8" are reserved.

  • Note consumers SHOULD reject notes without exactly one signature chunk as UNHANDLED.
  • Note consumers SHOULD reject notes as CORRUPT if a signature chunk is present but truncated in the middle of a field.
  • Note consumers SHOULD process the strings referenced by provider_offset, name_offset, ptypes_offset and rtypes_offset as detailed in #Function signatures and #Encoded type lists below.
  • Note consumers MUST ignore any data in signature chunks beyond the fields they understand.
  • Note consumers SHOULD reject notes as UNHANDLED if the chunk does not extend to a field they require.
  • Note consumers SHOULD reject notes as INVALID if the provider specified by the signature chunk starts with the string "i8".
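A possible decoder for version 2 signature chunks is sketched below. It distinguishes a chunk that ends inside a field (CORRUPT) from one that ends cleanly before a required field (UNHANDLED), and ignores trailing data. All names are hypothetical, and accepting only chunk_version 2 is an assumption of this sketch rather than a rule from this document. The "i8" provider check is omitted because it needs the string table.

```python
class CorruptNote(Exception):
    """Hypothetical exception for notes rejected as CORRUPT."""


class UnhandledNote(Exception):
    """Hypothetical exception for notes rejected as UNHANDLED."""


def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise CorruptNote("chunk truncated in the middle of a field")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


SIGNATURE_FIELDS = ("provider_offset", "name_offset",
                    "ptypes_offset", "rtypes_offset")


def decode_signature(chunk_data, chunk_version):
    """Return the four string table offsets as a dict."""
    if chunk_version != 2:           # assumption: only version 2 supported
        raise UnhandledNote(f"signature chunk version {chunk_version}")
    fields = {}
    pos = 0
    for name in SIGNATURE_FIELDS:
        if pos >= len(chunk_data):   # ended cleanly before a needed field
            raise UnhandledNote(f"chunk does not extend to {name}")
        fields[name], pos = read_uleb128(chunk_data, pos)
    return fields                    # any data past pos is ignored
```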

Code info chunks

Each note SHOULD have exactly 0 or 1 code info chunks. Code info chunks have a chunk_type of I8_CHUNK_CODEINFO == 5. The current chunk_version for code info chunks is 1, which has the following mandatory fields:

2byte   arch_spec_mark
uleb128 max_stack

arch_spec_mark is an architecture specifier mark specifying the word size and byte ordering of the note's bytecode chunk. max_stack is the maximum stack depth this function's bytecode can generate.

  • Note consumers SHOULD reject notes with more than one code info chunk as UNHANDLED.
  • Notes with no bytecode chunk SHOULD NOT contain a code info chunk.
  • Note consumers MAY reject notes as UNHANDLED if the code info chunk specifies a different word size and/or byte ordering than the note-containing file.

Bytecode chunks

Each note SHOULD have exactly 0 or 1 bytecode chunks. Bytecode chunks have a chunk_type of I8_CHUNK_BYTECODE == 2. The current chunk_version for bytecode chunks is 3. The content of the bytecode chunk is a serialized DWARF expression as described in Infinity bytecode.

  • Note consumers SHOULD reject notes with more than one bytecode chunk as UNHANDLED.
  • Notes with no bytecode chunk MUST be handled as if they have zero length bytecode.

Externals table chunks

Each note SHOULD have exactly 0 or 1 externals table chunks. Externals table chunks have a chunk_type of I8_CHUNK_EXTERNALS == 3. The current chunk_version for externals table chunks is 2. Externals table chunks comprise one or more externals table entries concatenated together.

  • Note consumers SHOULD reject notes with more than one externals table chunk as UNHANDLED.
  • Note consumers SHOULD reject notes as CORRUPT if an externals table chunk is present but truncated in the middle of an entry.

The format of an externals table entry is as follows:

uleb128 provider_offset
uleb128 name_offset
uleb128 ptypes_offset
uleb128 rtypes_offset

The four fields have the same meaning as the fields of the same name in the signature chunk and SHOULD be processed in the same way with the exception that providers starting with "i8" are allowed here.
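Decoding an externals table could look roughly like the following. Unlike the signature chunk, an entry that stops short at any point is treated as CORRUPT, since entries have a fixed set of four fields. The names `CorruptNote`, `read_uleb128` and `decode_externals` are hypothetical.

```python
class CorruptNote(Exception):
    """Hypothetical exception for notes rejected as CORRUPT."""


def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise CorruptNote("externals table truncated inside an entry")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_externals(chunk_data):
    """Return a list of (provider, name, ptypes, rtypes) offset tuples."""
    entries = []
    pos = 0
    while pos < len(chunk_data):
        entry = []
        for _ in range(4):           # four uleb128 fields per entry
            value, pos = read_uleb128(chunk_data, pos)
            entry.append(value)
        entries.append(tuple(entry))
    return entries
```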

String table chunks

Each note SHOULD have exactly 0 or 1 string table chunks. String table chunks have a chunk_type of I8_CHUNK_STRINGS == 4. The current chunk_version for string table chunks is 1. The content of a string table chunk is one or more NUL-terminated strings concatenated together. Strings are encoded in Modified UTF-8 format to allow embedded NULs. Fields in other chunks reference strings by their offset from the start of the string table. Note that any offset into the table that yields a NUL-terminated Modified UTF-8 string is permitted, so in a string table chunk whose chunk_data field contains:

example\0string\xC0\x80table\0

the obvious strings are "example" at offset 0 and "string\0table" at offset 8, but note that, because the embedded NUL is encoded as the two bytes \xC0\x80, an offset of 16 will yield the string "table" and offsets of 7 or 21 will both yield the empty string.

  • Note consumers SHOULD reject notes with more than one string table chunk as UNHANDLED.
  • Note consumers SHOULD reject notes that reference strings but do not contain a string table chunk as UNHANDLED.
  • String tables MUST end with a terminal NUL. Note consumers SHOULD reject notes containing string tables that do not end with a terminal NUL as CORRUPT.
  • If a note contains a reference into the string table beyond the final NUL then the note consumer SHOULD reject the note as CORRUPT.

Note that no current use of strings in Infinity allows characters outside of A-Za-z0-9()_ (i.e. nobody needs to write a Modified UTF-8 decoder just yet!)
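A string table lookup along these lines might be written as below. The names `CorruptNote` and `get_string` are hypothetical. The sketch handles only the two-byte embedded NUL of Modified UTF-8 and otherwise relies on Python's standard UTF-8 decoder, so it is not a full Modified UTF-8 implementation.

```python
class CorruptNote(Exception):
    """Hypothetical exception for notes rejected as CORRUPT."""


def get_string(table, offset):
    """Return the string starting at a byte offset in the string table."""
    if not table.endswith(b"\0"):
        raise CorruptNote("string table does not end with a NUL")
    if offset >= len(table):
        raise CorruptNote("string offset beyond the final NUL")
    end = table.index(b"\0", offset)     # the terminating NUL byte
    raw = table[offset:end]
    # Modified UTF-8 stores an embedded NUL as the two bytes C0 80.
    return raw.replace(b"\xc0\x80", b"\0").decode("utf-8")
```

With the example table above, `get_string(table, 0)` gives "example" and `get_string(table, 8)` gives "string\0table". An offset that lands in the middle of a multi-byte sequence raises `UnicodeDecodeError` in this sketch. A real consumer would need to map that to one of the rejection reasons.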

Architecture specifier marks

An architecture specifier mark encodes a word size and byte ordering. An architecture specifier mark is two bytes with the following meaning:

First byte  Second byte  Word size  Byte ordering
0x18        0x49         32 bits    big endian
0x49        0x18         32 bits    little endian
0x78        0x29         64 bits    big endian
0x29        0x78         64 bits    little endian
  • Note consumers SHOULD reject notes as UNHANDLED if the note contains an unknown architecture specifier mark.
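The table above translates directly into a lookup. The names `ARCH_SPEC_MARKS`, `UnhandledNote` and `decode_arch_spec` are hypothetical.

```python
class UnhandledNote(Exception):
    """Hypothetical exception for notes rejected as UNHANDLED."""


# Two-byte mark -> (word size in bits, byte ordering), from the table above.
ARCH_SPEC_MARKS = {
    b"\x18\x49": (32, "big"),
    b"\x49\x18": (32, "little"),
    b"\x78\x29": (64, "big"),
    b"\x29\x78": (64, "little"),
}


def decode_arch_spec(mark):
    """Return (word_size_bits, byte_order) for a two-byte mark."""
    try:
        return ARCH_SPEC_MARKS[bytes(mark)]
    except KeyError:
        raise UnhandledNote(
            f"unknown architecture specifier mark {bytes(mark).hex()}")
```

One observation from the table, not a statement made by this document: reading the two bytes as a 16-bit integer in the indicated byte ordering gives 0x1849 for both 32-bit marks and 0x7829 for both 64-bit marks.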

Encoded type lists

Lists of Infinity types (e.g. parameter types, return types) are encoded as strings. The basic types int, ptr and opaque are encoded as "i", "p" and "o" respectively. Function types are encoded as:

"F" + encoded return types + "(" + encoded parameter types + ")"

Examples:

  1. A function that accepts one ptr parameter and returns two int values has an encoded type of "Fii(p)".
  2. A function that returns two ptr values and accepts two parameters, (1) a function that accepts an opaque parameter followed by an int parameter and returns an int and a ptr, and (2) an opaque parameter, has an encoded type of "Fpp(Fip(oi)o)".
  3. A function that accepts one int parameter and returns a function that accepts one ptr parameter and returns two int values has an encoded type of "FFii(p)(i)".

Processing:

  • Note consumers SHOULD reject notes containing encoded type lists containing characters other than ipoF() as UNHANDLED.
  • Note consumers SHOULD reject notes containing otherwise undecodable encoded type lists as CORRUPT.
  • Note consumers SHOULD NOT attempt to handle notes with invalid encoded type lists in any way other than trivial things like displaying information.
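The encoding above lends itself to a small recursive-descent parser. The sketch below is one possible approach. The function names, the exception classes and the `("func", ptypes, rtypes)` tuple representation are all hypothetical choices made for illustration.

```python
class CorruptNote(Exception):
    """Hypothetical exception for notes rejected as CORRUPT."""


class UnhandledNote(Exception):
    """Hypothetical exception for notes rejected as UNHANDLED."""


BASIC_TYPES = {"i": "int", "p": "ptr", "o": "opaque"}


def _parse_list(s, pos):
    """Parse types until '(' or ')' or end; return (types, next_pos)."""
    types = []
    while pos < len(s) and s[pos] not in "()":
        if s[pos] == "F":
            # "F" + return types + "(" + parameter types + ")"
            rtypes, pos = _parse_list(s, pos + 1)
            if pos >= len(s) or s[pos] != "(":
                raise CorruptNote("function type lacks '('")
            ptypes, pos = _parse_list(s, pos + 1)
            if pos >= len(s) or s[pos] != ")":
                raise CorruptNote("function type lacks ')'")
            pos += 1
            types.append(("func", ptypes, rtypes))
        else:
            types.append(BASIC_TYPES[s[pos]])
            pos += 1
    return types, pos


def parse_type_list(encoded):
    """Decode an encoded type list into a list of types."""
    if set(encoded) - set("ipoF()"):
        raise UnhandledNote("unknown character in encoded type list")
    types, pos = _parse_list(encoded, 0)
    if pos != len(encoded):          # stray '(' or ')' at top level
        raise CorruptNote("unbalanced encoded type list")
    return types
```

For instance, `parse_type_list("Fii(p)")` returns `[("func", ["ptr"], ["int", "int"])]`, a single function type with one ptr parameter and two int return values.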

Function signatures

Infinity functions are referenced by their signature. A function's signature is constructed from its provider, its name, and its encoded parameter and return types lists as follows:

 provider + "::" + name + "(" + encoded_paramtypes + ")" + encoded_returntypes

A function called "a_function" with provider "example_provider" that accepts one ptr parameter and returns two int values has a signature of "example_provider::a_function(p)ii".

  • Note consumers SHOULD reject notes as UNHANDLED if either provider or name is the empty string.
  • Note consumers SHOULD reject notes as UNHANDLED if the first character of either provider or name is not in the range A-Za-z_.
  • Note consumers SHOULD reject notes as UNHANDLED if any subsequent characters of either provider or name are not in the range A-Za-z0-9_.
  • Note consumers SHOULD NOT attempt to handle notes with invalid values of provider or name in any way other than trivial things like displaying information.
  • Note consumers SHOULD process both encoded_paramtypes and encoded_returntypes as detailed in #Encoded type lists above.
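Signature construction with the provider and name checks above could be sketched as follows. The names `UnhandledNote` and `build_signature` are hypothetical. The encoded type lists are passed through unchanged and would need to be validated separately. The reserved "i8" provider prefix is not checked here because it applies only to signature chunks.

```python
import re


class UnhandledNote(Exception):
    """Hypothetical exception for notes rejected as UNHANDLED."""


# First character A-Za-z_, any subsequent characters A-Za-z0-9_.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_signature(provider, name, encoded_paramtypes, encoded_returntypes):
    """Return provider::name(paramtypes)returntypes after validation."""
    for label, value in (("provider", provider), ("name", name)):
        # fullmatch also rejects the empty string.
        if not _IDENTIFIER.fullmatch(value):
            raise UnhandledNote(f"unsupported {label}: {value!r}")
    return f"{provider}::{name}({encoded_paramtypes}){encoded_returntypes}"
```

Using the example above, `build_signature("example_provider", "a_function", "p", "ii")` returns "example_provider::a_function(p)ii".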