Conversation
@nanosoldier

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@nanosoldier

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
The IOError thing is a Pkg bug (which might hide some other real bug). Edit: actually, a rebase should probably update to a Pkg where it is fixed.
vtjnash left a comment
Some minor comments from looking this over with you, but it looks good to me as a whole. Really solid work, Diogo.
```c
size_t elsize = ((jl_array_t *)ary8_parent)->elsize / sizeof(jl_value_t *);
#ifndef GC_VERIFY
// Decide whether need to chunk ary8
size_t nrefs = (ary8_end - ary8_begin) / elsize;
```
Should we multiply by `(elem_end - elem_begin)` to get the number of pointers in each element?
```c
// Initialize GC mark-queue
size_t init_size = (1 << 18);
gc_cache->data_stack = (jl_gc_mark_data_t *)malloc_s(init_size * sizeof(jl_gc_mark_data_t));
```
2 MB seems rather large (though 8k does seem a little small)

I think the chunk-queue shouldn't be large (the number of chunks grows with the depth of graphs of objarrays).

This is over 2 MB per thread though, which seems large
Force-pushed from f594a8d to e82cd9d (Compare)
@vtjnash any idea of what could be causing the error in

Try annotating the
It is a heuristic evaluation tool, so any changes to the code can sometimes arbitrarily change which branches it considers and what errors it thinks it might have found. You can run it locally with
Force-pushed from c0cf1ed to 5bb5bab (Compare)
@nanosoldier

1 similar comment

@nanosoldier

Hm, not sure why this didn't go through. Let's try again: @nanosoldier

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Force-pushed from 80f71bb to be4b416 (Compare)
I'm seeing some regressions after the rebase, so not good to merge yet.

NVM, I was running with obj pools disabled.

@nanosoldier

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

Bump
```c
#define LLT_FREE(x) free(x)

#define STATIC_INLINE static inline
#define FORCE_INLINE static inline __attribute__((always_inline))
```
This is already defined in julia/src/support/MurmurHash3.c, line 15 (at d775750):

```
In file included from /home/mose/repo/julia/src/support/hashing.c:51:
/home/mose/repo/julia/src/support/MurmurHash3.c:15: warning: "FORCE_INLINE" redefined
   15 | #define FORCE_INLINE inline __attribute__((always_inline))
      |
In file included from /home/mose/repo/julia/src/support/hashing.c:7:
/home/mose/repo/julia/src/support/dtypes.h:120: note: this is the location of the previous definition
  120 | #define FORCE_INLINE static inline __attribute__((always_inline))
      |
```
Previous work
Since #21590, the GC mark-loop has been implemented with two manually managed stacks, one of which contains iterator states used to keep track of progress through the object currently being marked. For example, to mark an array we would pop the corresponding iterator state from the stack, iterate over the array until we found an unmarked reference, and, if one was found, update the iterator state (to record the index we left off at), re-push it onto the stack, and proceed with marking the reference we just found.
This PR
This PR eliminates the need to keep iterator states by modifying the object-graph traversal code. We keep a single stack of `jl_value_t *` currently being processed. To mark an object, we first pop it from the stack, push all of its unmarked references onto the stack, and proceed with marking.

I believe this doesn't break any invariant of the generational GC. Indeed, the age bits are set after marking (if the object survived one GC cycle it's moved to the old generation), so this new traversal scheme wouldn't change whether an object has references to old objects or not. Furthermore, we must not update GC metadata for objects in the remset, and we ensure this by calling `gc_mark_outrefs` in `gc_queue_remset` with `meta_updated` set to 1.

Additional advantages

We can modify the `jl_gc_markqueue_t` insertion/removal methods (from LIFO to FIFO, for example) without touching the mark-loop itself, which could enable further exploration of the GC in the future.

Since this PR changes the mark-loop graph traversal code, there are some changes in the heap-snapshot, though I'm not familiar with that PR.
Some benchmark results are here: https://hackmd.io/@Idnmfpb3SxK98-OsBtRD5A/H1V6QSzvs.