Efficient storage of code object debug information

### Discussed in https://github.com/faster-cpython/ideas/discussions/324

<div type='discussions-op-text'>

<sup>Originally posted by **markshannon** March 18, 2022</sup>
Code objects include a lot of information, much of it only used for tracing and debugging of various sorts.
It would be good if that information used less space when we aren't using it.

Classifying the fields of a code object (ignoring computed attributes and int fields), we have:

#### Needed for normal execution:
* co_code : The bytecode
* co_consts: Constants
* co_names: Names of attributes and global variables.
* co_exceptiontable: The exception handler table.
* co_localsplusnames: Names of local variables, including parameters.

#### Debug info, all bytes objects:
* co_localspluskinds: Describes the "kind" of locals, used for signatures and `locals()`
* co_linetable: Line information
* co_endlinetable: Ditto, for fancy traceback printing
* co_columntable: Column information, also for fancy traceback printing

Currently this information uses about 8 bytes per instruction, in large part due to having column table entries for each cache entry.

### A single table

I would like to replace the four debug objects with a single bytes object with the following format:
* `[0]`: 0 if compact, 1 if expanded.
* `[1 - 3]`: len(co_localspluskinds) (16 million max local variables seems sufficient)
* `[4 - 4+len]`: co_localspluskinds
* `[...]`: Location info in compact or expanded form.
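
A minimal sketch of how such a combined object could be built and split apart again. The function names are hypothetical, and the little-endian byte order of the 3-byte length is an assumption; the proposal does not specify one.

```python
def build_debug_table(localspluskinds: bytes, location_info: bytes,
                      expanded: bool = False) -> bytes:
    """Concatenate the flag byte, 3-byte length, kinds and location info."""
    if len(localspluskinds) >= 1 << 24:
        raise ValueError("too many locals for a 3-byte length field")
    flag = bytes([1 if expanded else 0])
    # Assumption: the length is stored little-endian in bytes 1..3.
    length = len(localspluskinds).to_bytes(3, "little")
    return flag + length + localspluskinds + location_info


def split_debug_table(table: bytes) -> tuple[bool, bytes, bytes]:
    """Return (expanded, localspluskinds, location_info)."""
    expanded = table[0] == 1
    nkinds = int.from_bytes(table[1:4], "little")
    kinds = table[4:4 + nkinds]
    location_info = table[4 + nkinds:]
    return expanded, kinds, location_info
```

The kinds stay at a fixed offset, so reading them never requires decoding the location info that follows.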

The expanded form should be an easily searchable table; one fixed-width entry per instruction.
* Offset of instruction start (including preceding `EXTENDED_ARG`s)
* Instruction length (including `EXTENDED_ARG`s prefix and inline cache)
* Resumption point (where we currently have `RESUME`). 1 bit
* Line start, for tracing line events. 1 bit.
* Start line
* Start column
* End line
* End column
 
I think we can impose some reasonable limits on the size of the fields, even in the expanded form.

For example, we know the instruction length cannot exceed 10 or so, so it fits into 4 bits. It might then make sense to limit the start offset to 28 bits and fit both into 32 bits.
 
Then we can fit the whole entry into 96 bits:
* Offset start and length: 32 bits
* Start line (30 bits), resumption and line-start flags: 32 bits
* Start column (8 bits), end line delta (16 bits), end column (8 bits): 32 bits (or 10/12/10 bits?)
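
To make the 96-bit layout concrete, here is a rough sketch that packs one entry into three 32-bit words using the 8/16/8 split. The position of each field within its word, the little-endian encoding, and the names `pack_entry` and `unpack_entry` are all assumptions for illustration, not part of the proposal.

```python
import struct

# Three unsigned 32-bit words, little-endian (assumed): 12 bytes per entry.
ENTRY = struct.Struct("<III")


def pack_entry(offset, length, start_line, is_resume, is_line_start,
               start_col, end_line_delta, end_col) -> bytes:
    """Pack one expanded-form entry into 96 bits."""
    limits = [(offset, 28), (length, 4), (start_line, 30),
              (start_col, 8), (end_line_delta, 16), (end_col, 8)]
    for value, bits in limits:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    # Word 0: start offset (28 bits) and instruction length (4 bits).
    w0 = (offset << 4) | length
    # Word 1: start line (30 bits), resumption flag, line-start flag.
    w1 = (start_line << 2) | (int(bool(is_resume)) << 1) | int(bool(is_line_start))
    # Word 2: start column (8), end line delta (16), end column (8).
    w2 = (start_col << 24) | (end_line_delta << 8) | end_col
    return ENTRY.pack(w0, w1, w2)


def unpack_entry(data: bytes):
    """Inverse of pack_entry; returns the fields in the same order."""
    w0, w1, w2 = ENTRY.unpack(data)
    return (w0 >> 4, w0 & 0xF,
            w1 >> 2, bool(w1 & 2), bool(w1 & 1),
            w2 >> 24, (w2 >> 8) & 0xFFFF, w2 & 0xFF)
```

The 10/12/10 alternative would only change the shifts and masks for the third word.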

This should save memory, as most tables would remain in their compact form, and it saves 3 pointers and 3 bytes object headers per code object.
It should also speed up tracing and debugging, as once expanded, the table is quickly searchable.
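
As an illustration of why the expanded form is quick to search: with fixed-width entries sorted by start offset, a plain binary search finds the entry covering a given offset. This sketch assumes 12-byte entries whose first little-endian 32-bit word holds the start offset in its upper 28 bits and the length in its lower 4 bits, with both in the same units; `find_entry` is a hypothetical name.

```python
import struct

ENTRY_SIZE = 12  # 96 bits per entry, as proposed above


def find_entry(table: bytes, target: int):
    """Return the index of the entry whose instruction covers `target`,
    or None if no entry does. `table` holds only the fixed-width entries."""
    lo, hi = 0, len(table) // ENTRY_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        # Only the first word is needed to decide which half to search.
        (w0,) = struct.unpack_from("<I", table, mid * ENTRY_SIZE)
        start, length = w0 >> 4, w0 & 0xF
        if target < start:
            hi = mid
        elif target >= start + length:
            lo = mid + 1
        else:
            return mid
    return None
```

The search takes O(log n) steps for n instructions and never has to decode a variable-length encoding.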

</div>
