Format String Vulnerability: What is under the hood of printf?

Intro

What is under the hood of printf in glibc? WHERE and HOW does printf get its arguments when only a format string is provided, as in a format string vulnerability? This question puzzled me for a long time, so I'd like to share my findings and open up a discussion on it.

Leave a comment for me if you have further thoughts!🤠

Printf and Variable arguments

printf accepts a variable number of arguments, which it accesses through a va_list (conventionally named ap). In recent glibc versions, printf calls __vfprintf_internal and passes the va_list as one of its arguments. You can set a breakpoint at __vfprintf_internal and run info args in gdb to print the value of the ap pointer.


  • s: The output stream (stdout in this case).
  • format: The format string used in printf.
  • ap: The va_list containing the variable arguments.
  • mode_flags: Additional flags (not critical for our purpose).
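The hand-off described above follows the standard stdarg pattern: the variadic function never reads its extra arguments itself, it only packages them into a va_list and passes that to a worker. Below is a minimal sketch of a printf-like function built the same way. The name my_snprintf is hypothetical, and it forwards to the public vsnprintf rather than to glibc's internal formatting routine, which is not a public API.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical printf-like wrapper. Like glibc's printf, it only
 * packages its variadic arguments into a va_list and hands that
 * to a v* worker, which is what actually pulls arguments out. */
int my_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);                       /* ap now describes the arguments after fmt */
    int n = vsnprintf(buf, size, fmt, ap);   /* the worker consumes arguments from ap */
    va_end(ap);
    return n;
}
```

For example, my_snprintf(buf, sizeof buf, "%d-%s", 42, "ok") writes "42-ok" and returns 5.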

On x86-64 (System V ABI), the va_list is a small structure that tracks two areas holding candidate values for the format specifiers: a register save area and a stack argument area.

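The register save area holds copies of the argument registers. printf's prologue spills them to memory on entry, so their values remain available after the registers themselves have been reused for other purposes.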

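Whenever the formatting routine encounters a conversion specifier (like %s or %p) in the format string, it consumes the next argument from one of the two areas tracked by the va_list. Nothing in the va_list records how many arguments the caller actually passed, so the routine simply keeps reading as long as the format string asks for more.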
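To see why a missing argument goes unnoticed, here is a deliberately tiny formatter in the same spirit. It is a hypothetical sketch (mini_vformat and mini_format are made-up names supporting only %d, %x, %s and %%), not glibc's code: every conversion just calls va_arg for the next argument and trusts the format string completely.

```c
#include <stdarg.h>
#include <stddef.h>

/* Store one character if it fits (keeping room for the NUL),
 * but count it either way, as snprintf does. */
static void put_char(char *out, size_t size, size_t *len, char c)
{
    if (*len + 1 < size)
        out[*len] = c;
    (*len)++;
}

/* Write an unsigned value in base 10 or 16. */
static void put_uint(char *out, size_t size, size_t *len,
                     unsigned long v, unsigned base)
{
    char tmp[32];
    int i = 0;
    do {
        tmp[i++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0);
    while (i > 0)
        put_char(out, size, len, tmp[--i]);
}

/* Hypothetical mini formatter. Each conversion fetches the NEXT
 * argument with va_arg; it cannot tell whether one was passed. */
size_t mini_vformat(char *out, size_t size, const char *fmt, va_list ap)
{
    size_t len = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%' || p[1] == '\0') {      /* ordinary character */
            put_char(out, size, &len, *p);
            continue;
        }
        p++;                                  /* move to the conversion letter */
        if (*p == 'd') {
            int v = va_arg(ap, int);
            unsigned long u = (unsigned long)v;
            if (v < 0) {
                put_char(out, size, &len, '-');
                u = 0UL - u;                  /* magnitude, also correct for INT_MIN */
            }
            put_uint(out, size, &len, u, 10);
        } else if (*p == 'x') {
            put_uint(out, size, &len, va_arg(ap, unsigned int), 16);
        } else if (*p == 's') {
            for (const char *s = va_arg(ap, const char *); *s != '\0'; s++)
                put_char(out, size, &len, *s);
        } else if (*p == '%') {
            put_char(out, size, &len, '%');
        } else {                              /* unknown conversion: copy it through */
            put_char(out, size, &len, '%');
            put_char(out, size, &len, *p);
        }
    }
    if (size > 0)
        out[len < size ? len : size - 1] = '\0';
    return len;                               /* length the full output would need */
}

/* Variadic front end, shaped like printf handing off to its worker. */
size_t mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    size_t n = mini_vformat(out, size, fmt, ap);
    va_end(ap);
    return n;
}
```

Calling mini_format(buf, sizeof buf, "%d|%x|%s|%%", -42, 255u, "hi") produces "-42|ff|hi|%". Calling it with "%x%x" and no extra arguments would compile and run just as happily, reading whatever the va_list points at next, which is undefined behavior in C and exactly the weakness a format string attack exploits.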

```c
typedef struct {                 // GCC names this struct __va_list_tag
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;     // pointer to the stack arguments list
    void *reg_save_area;         // pointer to the saved register arguments list
} va_list[1];
```

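  • gp_offset: byte offset into reg_save_area of the next general-purpose (integer or pointer) argument. It runs from 0 to 48, covering the six integer argument registers; once it reaches 48, those registers are exhausted.
  • fp_offset: byte offset into reg_save_area of the next floating-point argument, running from 48 to 176 (eight XMM registers, 16 bytes each).
  • overflow_arg_area: pointer to the next argument passed on the stack; it advances past each argument read from there.
  • reg_save_area: pointer to the area where the register arguments were saved.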
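The four fields work together when va_arg fetches an integer argument: use a saved register slot while one is left, otherwise fall back to the stack. Below is a sketch of that decision, modeled on the va_arg algorithm in the System V x86-64 ABI but restricted to 8-byte integer arguments. The struct model_va_list and the function model_va_arg_u64 are hypothetical stand-ins, not the compiler's real type or code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the x86-64 va_list (field names follow
 * the ABI; this is NOT the compiler's real type). Integers only. */
struct model_va_list {
    unsigned int gp_offset;            /* next integer slot in reg_save_area, 0..48 */
    unsigned int fp_offset;            /* unused here: no floating-point arguments */
    unsigned char *overflow_arg_area;  /* next argument on the stack */
    unsigned char *reg_save_area;      /* saved argument registers */
};

/* Fetch the next 8-byte integer-class argument. */
uint64_t model_va_arg_u64(struct model_va_list *ap)
{
    uint64_t v;
    if (ap->gp_offset <= 48 - 8) {                 /* a saved register slot is left */
        memcpy(&v, ap->reg_save_area + ap->gp_offset, sizeof v);
        ap->gp_offset += 8;
    } else {                                       /* registers used up: read the stack */
        memcpy(&v, ap->overflow_arg_area, sizeof v);
        ap->overflow_arg_area += 8;
    }
    return v;
}
```

Starting with gp_offset = 8 (one named argument, the format string, already occupies slot 0) yields five values from the register area and then continues on the stack, with slot 0 never returned.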

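When paused at that breakpoint, we can inspect the va_list structure and both areas it points to. First, p *ap prints the decoded fields, and x/3gx ap shows the same 24 bytes as raw 8-byte words: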

```
p *ap
x/3gx ap
```


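Then dump the stack arguments list (overflow_arg_area, 0x7fffffffd9f0 here) and the saved register arguments list (reg_save_area, 0x7fffffffd930 here):

```
x/16gx 0x7fffffffd9f0
x/16gx 0x7fffffffd930
```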


If you try a leak attack such as printf("%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p"), with no arguments after the format string, it outputs:

```
0x555555559830.0x555555559830.(nil).(nil).0x1.0x555555559830.(nil).0x2e70252e70252e70.(nil).0x7fffffffda80.0x555555555b1a.0x5555555592a0.0xa00000000.(nil).(nil).(nil).
```

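The leaked values come from both areas, which live at different memory locations. Because the format string occupies the first general-purpose register, the first five values are read from the saved register arguments (the copies of rsi, rdx, rcx, r8 and r9); from the sixth value on, gp_offset has reached 48 and the values are read from the stack arguments list. The eighth value, 0x2e70252e70252e70, is the ASCII text "p.%p.%p." stored little-endian, which suggests that the format string text (or a copy of it) sits on the stack.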
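One of the leaked values, 0x2e70252e70252e70, looks like text rather than an address. A small helper can turn a leaked 8-byte value back into the bytes it occupied in memory; since x86-64 is little-endian, the least significant byte comes first. The name qword_to_ascii is hypothetical, and it follows the common hexdump convention of printing non-printable bytes as '.'.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: rebuild the in-memory byte order of a leaked
 * 64-bit value (least significant byte first on little-endian x86-64)
 * and render it as text; non-printable bytes become '.'. */
void qword_to_ascii(uint64_t v, char out[9])
{
    for (size_t i = 0; i < 8; i++) {
        unsigned char b = (unsigned char)(v >> (8 * i));
        out[i] = (b >= 0x20 && b < 0x7f) ? (char)b : '.';
    }
    out[8] = '\0';
}
```

Applied to 0x2e70252e70252e70 it yields "p.%p.%p.", a slice of the format string itself.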


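If you are curious why the first register argument, 0x555555557dd8, is not leaked: remember that the format string is itself an argument. That address is where the format string is stored, and it was passed in rdi as printf's named parameter. Because format is named rather than variadic, va_start initializes gp_offset to 8, so the first %p already reads the second slot (rsi) and the format string pointer is never printed.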
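You can watch this bookkeeping in a real compiler-generated va_list. The sketch below relies on the x86-64 System V layout that GCC and Clang expose, where ap[0].gp_offset is readable; that is a compiler and target detail, not portable C, so on other targets the function just returns -1. The name gp_offset_after is hypothetical.

```c
#include <stdarg.h>

/* Consume n int arguments, then report the real gp_offset.
 * Relies on GCC/Clang's x86-64 System V va_list (ap[0].gp_offset);
 * returns -1 on any other target. */
int gp_offset_after(int n, ...)
{
#if defined(__x86_64__) && !defined(_WIN32)
    va_list ap;
    va_start(ap, n);                  /* n is named, so gp_offset starts at 8 */
    for (int i = 0; i < n; i++)
        (void)va_arg(ap, int);        /* each register-backed int adds 8, up to 48 */
    int off = (int)ap[0].gp_offset;
    va_end(ap);
    return off;
#else
    (void)n;
    return -1;
#endif
}
```

On x86-64 Linux, gp_offset_after(0) should report 8, gp_offset_after(2, 10, 20) should report 24, and gp_offset_after(6, 1, 2, 3, 4, 5, 6) should stop at 48, because the sixth variadic int comes from the stack and leaves gp_offset unchanged.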
