iPhone Mach-O File Format and Code Signing

Symptoms
1) Running the binary directly
/Applications/MobileFonex.app/MobileFonex
Killed: 9

2) Debugging with gdb
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 50 at address: 0x00043030

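Background

Every executable and library must carry a valid code signature trusted by Apple before it can run on iOS.
While handling execve, the kernel checks whether the signature referenced by the LC_CODE_SIGNATURE load command in the Mach-O file is valid and trusted.

The iOS kernel and its kernel extensions are stored in encrypted form in the KernelCache file.
The decryption keys can be looked up on The iPhone Wiki.

Generally, after a jailbreak, code-signing enforcement is disabled and executable code pages are writable.

Every binary on the iPhone has a digital signature.
Every memory page has a SHA-1 checksum.
The kernel checks the signature.
The check is performed when a page marked executable is accessed.
If the signature is invalid, the kernel kills the process.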
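
The per-page checksum idea can be illustrated with a minimal sketch. It assumes 4096-byte pages and hashes a synthetic byte string rather than a real binary. The helper name page_hashes is hypothetical, and a real signature stores its hashes in the CodeDirectory, which this sketch does not parse.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for this illustration


def page_hashes(data, page_size=PAGE_SIZE):
    """Return the SHA-1 hex digest of each page-sized chunk of data."""
    return [
        hashlib.sha1(data[off:off + page_size]).hexdigest()
        for off in range(0, len(data), page_size)
    ]


# Synthetic 10240-byte "binary" (2.5 pages), for illustration only.
image = bytes(range(256)) * 40
original = page_hashes(image)

# Flip one byte inside the second page and see which page hashes differ.
patched = bytearray(image)
patched[5000] ^= 0xFF
modified = page_hashes(bytes(patched))
changed = [i for i, (a, b) in enumerate(zip(original, modified)) if a != b]

print(len(original), changed)
```

Only the page containing the modified byte gets a different digest, which is why a patched executable page can be detected when it is accessed.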

How to check a binary's digital signature

otool -l debugserver | grep LC_CODE_SIGNATURE
         cmd LC_CODE_SIGNATURE

Or
grep -b "Apple Code Signing Certification Authority" debugserver

codesign shows more detailed signature information

codesign -dvvvv debugserver
Executable=/private/tmp/x/debugserver
Identifier=debugserver
Format=Mach-O thin (armv7)
CodeDirectory v=20100 size=900 flags=0x0(none) hashes=37+5 location=embedded
CDHash=577b90c4091310762cbd2350e4e32ee919deccf2
Signature size=1599
Authority=iPhone Developer
Signed Time=Feb 9, 2012 11:29:10 PM
Info.plist=not bound
Sealed Resources=none
Internal requirements count=1 size=112

References
https://developer.apple.com/library/mac/#technotes/tn2206/_index.html

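Mach-O file format

A Mach-O file is divided into three regions: the header, the load commands, and the raw segment data.
The header and load commands describe the file's features, layout, and other characteristics;
the raw segment data contains the byte sequences referenced by the load commands.

otool -l shows them: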
Section
sectname __const
segname __TEXT
addr 0x00043908
size 0x000002fc
offset 272648
align 2^2 (4)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0

Load command 7
cmd LC_UUID
cmdsize 24
uuid 5202B55F-F094-2350-9B4C-8E2C83358B11

Load command 8
cmd LC_UNIXTHREAD
cmdsize 84
flavor ARM_THREAD_STATE
count ARM_THREAD_STATE_COUNT
r0 0x00000000 r1 0x00000000 r2 0x00000000 r3 0x00000000
r4 0x00000000 r5 0x00000000 r6 0x00000000 r7 0x00000000
r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0x00000000
r12 0x00000000 sp 0x00000000 lr 0x00000000 pc 0x00002000
cpsr 0x00000000

The pc value shows that start is at address 0x2000.
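
As a rough sketch of where that pc value lives, the snippet below packs a synthetic LC_UNIXTHREAD command with the same values as the listing and reads pc back out. It assumes a little-endian 32-bit ARM image; the function name entry_point is hypothetical.

```python
import struct

LC_UNIXTHREAD = 0x5
ARM_THREAD_STATE = 1
ARM_THREAD_STATE_COUNT = 17  # r0-r12, sp, lr, pc, cpsr
PC_INDEX = 15                # position of pc within the register array


def entry_point(cmd_bytes):
    """Return the initial pc stored in a 32-bit ARM LC_UNIXTHREAD command."""
    cmd, cmdsize, flavor, count = struct.unpack_from("<IIII", cmd_bytes, 0)
    if cmd != LC_UNIXTHREAD or flavor != ARM_THREAD_STATE:
        raise ValueError("not an ARM LC_UNIXTHREAD command")
    regs = struct.unpack_from("<%dI" % count, cmd_bytes, 16)
    return regs[PC_INDEX]


# Synthetic command matching the otool listing: all registers 0, pc = 0x2000.
regs = [0] * ARM_THREAD_STATE_COUNT
regs[PC_INDEX] = 0x2000
cmd_bytes = struct.pack("<IIII17I", LC_UNIXTHREAD, 84,
                        ARM_THREAD_STATE, ARM_THREAD_STATE_COUNT, *regs)

print(len(cmd_bytes), hex(entry_point(cmd_bytes)))
```

The 84-byte cmdsize in the listing is the 16-byte command prefix plus 17 registers of 4 bytes each.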

Load command 9
cmd LC_ENCRYPTION_INFO
cmdsize 20
cryptoff 4096
cryptsize 270336
cryptid 0

Load command 23
cmd LC_CODE_SIGNATURE
cmdsize 16
dataoff 590896
datasize 3040

Note that cryptid in LC_ENCRYPTION_INFO is 0, which means the binary is not encrypted.
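
A minimal sketch of reading those three fields, assuming a little-endian 32-bit image. The command bytes are built by hand from the values in the listing, and describe_encryption is a hypothetical helper name.

```python
import struct

LC_ENCRYPTION_INFO = 0x21
# cmd, cmdsize, cryptoff, cryptsize, cryptid
ENCRYPTION_INFO = struct.Struct("<5I")


def describe_encryption(cmd_bytes):
    """Return the encrypted file range and whether encryption is active."""
    cmd, cmdsize, cryptoff, cryptsize, cryptid = ENCRYPTION_INFO.unpack_from(cmd_bytes, 0)
    if cmd != LC_ENCRYPTION_INFO or cmdsize != ENCRYPTION_INFO.size:
        raise ValueError("not a 32-bit LC_ENCRYPTION_INFO command")
    return {
        "start": cryptoff,
        "end": cryptoff + cryptsize,  # exclusive end offset in the file
        "encrypted": cryptid != 0,
    }


# Values copied from the otool listing above.
sample = ENCRYPTION_INFO.pack(LC_ENCRYPTION_INFO, 20, 4096, 270336, 0)
info = describe_encryption(sample)
print(info)
```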

http://opensource.apple.com/source/security_systemkeychain/security_systemkeychain-55105/src/cs_dump.cpp
// if the code is not signed, stop here
if (!api.get(kSecCodeInfoIdentifier))
MacOSError::throwMe(errSecCSUnsigned);

https://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/MachORuntime/Reference/reference.html

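Object file format
A file format that stores object code and its associated metadata
The blueprint for building the process image in memory
Usually produced by a compiler or an assembler

Layout
Three main parts: header, load commands, and segment data (made up of sections)
Each segment command is associated with zero or more sections

The header structure can be found in /usr/include/mach-o/loader.h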

/*
 * The 32-bit mach header appears at the very beginning of the object file for
 * 32-bit architectures.
 */
struct mach_header {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
};

The header can be displayed with the otool -h command.
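
For readers without otool at hand, the same seven fields can be decoded with a few lines of Python. This is only a sketch: it assumes a little-endian 32-bit image, and the sample header is synthetic (the sizeofcmds and flags values are made up for illustration).

```python
import struct

MH_MAGIC = 0xFEEDFACE  # 32-bit magic from <mach-o/loader.h>
# cpu_type_t and cpu_subtype_t are signed ints, the rest are uint32_t.
MACH_HEADER = struct.Struct("<IiiIIII")
FIELDS = ("magic", "cputype", "cpusubtype", "filetype",
          "ncmds", "sizeofcmds", "flags")


def parse_mach_header(data):
    """Decode struct mach_header from the start of data."""
    header = dict(zip(FIELDS, MACH_HEADER.unpack_from(data, 0)))
    if header["magic"] != MH_MAGIC:
        raise ValueError("not a little-endian 32-bit Mach-O image")
    return header


# Synthetic header: CPU_TYPE_ARM (12), CPU_SUBTYPE_ARM_V7 (9), MH_EXECUTE (2).
# ncmds, sizeofcmds and flags are illustrative values, not from a real file.
sample = MACH_HEADER.pack(MH_MAGIC, 12, 9, 2, 24, 2800, 0x85)
header = parse_mach_header(sample)
print(header)
```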

The load commands directly follow the header. The structure is defined as follows:

/*
 * The load commands directly follow the mach_header.  The total size of all
 * of the commands is given by the sizeofcmds field in the mach_header.  All
 * load commands must have as their first two fields cmd and cmdsize.  The cmd
 * field is filled in with a constant for that command type.  Each command type
 * has a structure specifically for it.  The cmdsize field is the size in bytes
 * of the particular load command structure plus anything that follows it that
 * is a part of the load command (i.e. section structures, strings, etc.).  To
 * advance to the next load command the cmdsize can be added to the offset or
 * pointer of the current load command.  The cmdsize for 32-bit architectures
 * MUST be a multiple of 4 bytes and for 64-bit architectures MUST be a multiple
 * of 8 bytes (these are forever the maximum alignment of any load commands).
 * The padded bytes must be zero.  All tables in the object file must also
 * follow these rules so the file can be memory mapped.  Otherwise the pointers
 * to these tables will not work well or at all on some machines.  With all
 * padding zeroed like objects will compare byte for byte.
 */
struct load_command {
	uint32_t cmd;		/* type of load command */
	uint32_t cmdsize;	/* total size of command in bytes */
};
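
The "add cmdsize to advance" rule from the comment above can be sketched as a small iterator. It assumes a little-endian 32-bit image; the test image is synthetic and holds just two commands (an LC_UUID and an LC_CODE_SIGNATURE, with values borrowed from the otool listing). The name iter_load_commands is hypothetical.

```python
import struct

MACH_HEADER_SIZE = 28  # sizeof(struct mach_header)
LC_UUID = 0x1B
LC_CODE_SIGNATURE = 0x1D


def iter_load_commands(data):
    """Yield (cmd, cmdsize, offset) for each load command in the image."""
    ncmds, sizeofcmds = struct.unpack_from("<II", data, 16)
    offset = MACH_HEADER_SIZE
    end = offset + sizeofcmds
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        # 32-bit commands must be at least 8 bytes and a multiple of 4.
        if cmdsize < 8 or cmdsize % 4 or offset + cmdsize > end:
            raise ValueError("malformed load command at offset %d" % offset)
        yield cmd, cmdsize, offset
        offset += cmdsize  # advance to the next command


# Synthetic image: header followed by two load commands.
uuid_cmd = struct.pack("<II", LC_UUID, 24) + bytes.fromhex(
    "5202B55FF09423509B4C8E2C83358B11")
sig_cmd = struct.pack("<IIII", LC_CODE_SIGNATURE, 16, 590896, 3040)
commands = uuid_cmd + sig_cmd
header = struct.pack("<IiiIIII", 0xFEEDFACE, 12, 9, 2, 2, len(commands), 0)
image = header + commands

found = list(iter_load_commands(image))
print(found)
```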

MachOView is a tool for inspecting the structure of a Mach-O file

http://sourceforge.net/projects/machoview/files/current/

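The third part is the segments. Each segment contains zero or more sections.
Each section contains data or code.
Each segment defines a region of virtual memory, which the dynamic linker maps into the address space of the process.

In a fully linked user-level Mach-O file, the last segment is the __LINKEDIT segment.
This segment contains the tables of link edit information, such as the symbol table and the string table.

By specifying an in-memory size, a segment can request more space at run time than it actually occupies on disk.
The __PAGEZERO segment generated by the linker has a virtual memory size but takes up zero space on disk. Because __PAGEZERO contains no data, it does not need to occupy any disk space.

The static linker creates a __PAGEZERO segment as the first segment of an executable. The segment is located at virtual address 0 and has no protection rights assigned. This guarantees that a program crashes immediately when it dereferences a NULL pointer. The size of the segment is one full virtual memory page for the current architecture (4096 bytes for ARM, Intel, and PowerPC).

The __TEXT segment contains executable code and read-only data. To let the kernel map it directly from the executable file into shareable memory, the static linker sets the segment's virtual memory permissions to disallow writing. Once the segment is mapped into memory, it can be shared by all processes. (This is used mainly with frameworks, bundles, and shared libraries, but it also applies to multiple running copies of the same executable.)

The __LINKEDIT segment contains raw data used by the dynamic linker, such as symbol, string, and relocation table entries.

Sections

A segment may contain multiple sections.
__TEXT, __text: executable machine code
__TEXT, __cstring: constant C strings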

/*
 * The segment load command indicates that a part of this file is to be
 * mapped into the task's address space.  The size of this segment in memory,
 * vmsize, maybe equal to or larger than the amount to map from this file,
 * filesize.  The file is mapped starting at fileoff to the beginning of
 * the segment in memory, vmaddr.  The rest of the memory of the segment,
 * if any, is allocated zero fill on demand.  The segment's maximum virtual
 * memory protection and initial virtual memory protection are specified
 * by the maxprot and initprot fields.  If the segment has sections then the
 * section structures directly follow the segment command and their size is
 * reflected in cmdsize.
 */
struct segment_command { /* for 32-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT */
	uint32_t	cmdsize;	/* includes sizeof section structs */
	char		segname[16];	/* segment name */
	uint32_t	vmaddr;		/* memory address of this segment */
	uint32_t	vmsize;		/* memory size of this segment */
	uint32_t	fileoff;	/* file offset of this segment */
	uint32_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};

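Note that a segment's cmdsize must include the size of all the section structures it owns.

fileoff is the file offset from which the mapping to vmaddr starts.
filesize may be smaller than vmsize.

Immediately following segment_command is the array of section structures that belong to it.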
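
A minimal sketch of decoding a segment_command together with its trailing section array, assuming a little-endian 32-bit image. The sample __TEXT segment is synthetic: the __const section values come from the otool listing, while the segment's vmaddr, vmsize, and filesize are assumed for illustration. parse_segment is a hypothetical helper name.

```python
import struct

LC_SEGMENT = 0x1
SEGMENT = struct.Struct("<II16sIIIIiiII")    # struct segment_command, 56 bytes
SECTION = struct.Struct("<16s16sIIIIIIIII")  # struct section, 68 bytes


def _name(raw):
    """Decode a NUL-padded 16-byte name field."""
    return raw.split(b"\0", 1)[0].decode("ascii")


def parse_segment(data, offset=0):
    """Decode one LC_SEGMENT command and the section structs after it."""
    (cmd, cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, flags) = SEGMENT.unpack_from(data, offset)
    if cmd != LC_SEGMENT:
        raise ValueError("not an LC_SEGMENT command")
    if cmdsize != SEGMENT.size + nsects * SECTION.size:
        raise ValueError("cmdsize does not cover the section array")
    sections = []
    for i in range(nsects):
        f = SECTION.unpack_from(data, offset + SEGMENT.size + i * SECTION.size)
        sections.append({"sectname": _name(f[0]), "segname": _name(f[1]),
                         "addr": f[2], "size": f[3], "offset": f[4]})
    return {"segname": _name(segname), "vmaddr": vmaddr, "vmsize": vmsize,
            "fileoff": fileoff, "filesize": filesize,
            "zero_fill": vmsize - filesize,  # bytes allocated zero-filled
            "sections": sections}


# Synthetic __TEXT segment (vmaddr/vmsize/filesize assumed) with the
# __const section from the otool listing.
sample = SEGMENT.pack(LC_SEGMENT, 56 + 68, b"__TEXT", 0x1000, 0x43000,
                      0, 0x43000, 5, 5, 1, 0)
sample += SECTION.pack(b"__const", b"__TEXT", 0x43908, 0x2FC, 272648,
                       2, 0, 0, 0, 0, 0)
seg = parse_segment(sample)
const = seg["sections"][0]
print(seg["segname"], const["sectname"], hex(const["addr"]))
```

Under these assumed segment values, the section's distance from vmaddr equals its distance from fileoff, which is the fileoff-to-vmaddr mapping described above.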

/*
 * A segment is made up of zero or more sections.  Non-MH_OBJECT files have
 * all of their segments with the proper sections in each, and padded to the
 * specified segment alignment when produced by the link editor.  The first
 * segment of a MH_EXECUTE and MH_FVMLIB format file contains the mach_header
 * and load commands of the object file before its first section.  The zero
 * fill sections are always last in their segment (in all formats).  This
 * allows the zeroed segment padding to be mapped into memory where zero fill
 * sections might be. The gigabyte zero fill sections, those with the section
 * type S_GB_ZEROFILL, can only be in a segment with sections of this type.
 * These segments are then placed after all other segments.
 *
 * The MH_OBJECT format has all of its sections in one segment for
 * compactness.  There is no padding to a specified segment boundary and the
 * mach_header and load commands are not part of the segment.
 *
 * Sections with the same section name, sectname, going into the same segment,
 * segname, are combined by the link editor.  The resulting section is aligned
 * to the maximum alignment of the combined sections and is the new section's
 * alignment.  The combined sections are aligned to their original alignment in
 * the combined section.  Any padded bytes to get the specified alignment are
 * zeroed.
 *
 * The format of the relocation entries referenced by the reloff and nreloc
 * fields of the section structure for mach object files is described in the
 * header file <reloc.h>.
 */
struct section { /* for 32-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint32_t	addr;		/* memory address of this section */
	uint32_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
};

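Here, addr specifies the address of the section in virtual memory.
offset specifies the offset of the section within the executable file.
reloff is the file offset of the first relocation entry.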

/*
 * The linkedit_data_command contains the offsets and sizes of a blob
 * of data in the __LINKEDIT segment.
 */
struct linkedit_data_command {
    uint32_t	cmd;		/* LC_CODE_SIGNATURE or LC_SEGMENT_SPLIT_INFO */
    uint32_t	cmdsize;	/* sizeof(struct linkedit_data_command) */
    uint32_t	dataoff;	/* file offset of data in __LINKEDIT segment */
    uint32_t	datasize;	/* file size of data in __LINKEDIT segment  */
};

pagestuff mybinary -a shows which file pages hold the signature:

File Page 144 contains data of code signature
File Page 145 contains data of code signature
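
The page numbers follow from dataoff and datasize. As a worked sketch, assuming 4096-byte pages numbered from 0, the LC_CODE_SIGNATURE values from the earlier otool listing (dataoff 590896, datasize 3040) land on exactly these two pages. The function name file_pages is hypothetical.

```python
PAGE_SIZE = 4096  # assumed file page size


def file_pages(dataoff, datasize, page_size=PAGE_SIZE):
    """Return the file page numbers touched by [dataoff, dataoff + datasize)."""
    first = dataoff // page_size
    last = (dataoff + datasize - 1) // page_size
    return list(range(first, last + 1))


# dataoff and datasize from the LC_CODE_SIGNATURE command listed earlier.
pages = file_pages(590896, 3040)
print(pages)  # [144, 145]
```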

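Signing modifies the executable
Signing a program modifies its main executable file.
1) If your program has a self-verification mode that detects changes to the file, your code may refuse to run.
2) If data was appended to the executable before signing, the signing process may remove that data or move it elsewhere.
If you modify the executable or its bundle again after signing, the code signing verification engine will detect the change and act accordingly.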

https://bitbucket.org/ronaldoussoren/macholib
macholib can be used to analyze and edit Mach-O headers, the executable
format used by Mac OS X.

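Changes after signing
In the header, the number of load commands increases by 1 and the total size of the load commands increases by 16.

The load command for __LINKEDIT changes: both its vm size and file size grow.

A new command is then appended to the end of the load command array: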
cmd LC_CODE_SIGNATURE
cmdsize 16
dataoff 4176
datasize 5184
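
The "+1 and +16" bookkeeping can be checked with a short sketch. It only models the two header counters and the 16-byte linkedit_data_command; it does not reproduce what codesign actually does to the file. The starting counters (11 commands, 1004 bytes) are made-up values and append_signature_command is a hypothetical name.

```python
import struct

LC_CODE_SIGNATURE = 0x1D
# cmd, cmdsize, dataoff, datasize (struct linkedit_data_command)
LINKEDIT_DATA = struct.Struct("<IIII")


def append_signature_command(ncmds, sizeofcmds, dataoff, datasize):
    """Return the updated header counters and the new command bytes."""
    command = LINKEDIT_DATA.pack(LC_CODE_SIGNATURE, LINKEDIT_DATA.size,
                                 dataoff, datasize)
    return ncmds + 1, sizeofcmds + len(command), command


# Hypothetical counters before signing; dataoff/datasize from the listing.
before = (11, 1004)
ncmds, sizeofcmds, command = append_signature_command(
    before[0], before[1], 4176, 5184)
print(ncmds - before[0], sizeofcmds - before[1])
```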

https://zhiwei.li/text/2012/02/15/iphone-mach-o文件格式与代码签名/

Original article: https://www.cnblogs.com/feng9exe/p/8258910.html