Conversation

@hmmr commented Oct 28, 2025

This implements martinsumner#478.

When called here, it returns a map containing the reportable items described in the issue above.

(dev2@127.0.0.1)12> riak_kv_vnode:vnode_status(riak_core_vnode_manager:all_index_pid(riak_kv_vnode)).
[{1347321821914426127719021955160323408745312813056,
  [{backend_status,riak_kv_leveled_backend,
                   #{ledger_cache_size => 4,
                     journal_last_compaction_result => {0,0.0},
                     get_sample_count => 0,get_body_time => 0,
                     head_sample_count => 0,head_rsp_time => 0,
                     put_sample_count => 0,put_prep_time => 0,put_ink_time => 0,
                     put_mem_time => 0}},
   {vnodeid,<<"}ÍyرYdå">>},
   {counter,140003},
   {counter_lease,150000},
   {counter_lease_size,10000},
   {counter_leasing,false}]},
...

Not all items may be available at the time of the call, in which case backend_status will not include the corresponding keys.
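A caller therefore has to read the map defensively. A minimal sketch (not part of this PR), using a key name from the example output above:

report_last_compaction(BackendStatus) when is_map(BackendStatus) ->
    %% maps:get/3 supplies a fallback when the item has not been collected yet.
    case maps:get(journal_last_compaction_result, BackendStatus, undefined) of
        undefined -> io:format("no journal compaction recorded yet~n");
        Result -> io:format("last journal compaction result: ~p~n", [Result])
    end.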

In riak_control, the results will show as follows:
[Screenshots of the riak_control display: Screenshot_2025-10-29_01-11-07, Screenshot_2025-10-29_01-11-33]

@hmmr (Author) commented Oct 28, 2025

Marking as WIP pending work on the last item ("Mean level for recent fetches").

@hmmr changed the title from "WIP leveled_bookie:status/1" to "leveled_bookie:status/1" on Oct 29, 2025
@martinsumner moved this to Todo in OpenRiak 3.4.1 on Jan 7, 2026
@martinsumner moved this from Todo to Ready For Review in OpenRiak 3.4.1 on Jan 7, 2026
@martinsumner self-requested a review on January 7, 2026 13:25
@martinsumner

Can we add a test to the ct tests, or perhaps just extend one of the tests in basic_SUITE?

It would be good to verify that the results returned are as expected. Some may just need to be range checked, but you can get the PIDs of the Inker/Penciller via leveled_bookie:book_returnactors/1 to absolutely verify things like the penciller cache size.
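A rough sketch of that kind of check (the {ok, InkerPid, PencillerPid} return shape assumed for book_returnactors/1 should be confirmed against leveled_bookie before use):

{ok, InkerPid, PencillerPid} = leveled_bookie:book_returnactors(Bookie),
true = is_process_alive(InkerPid),
true = is_process_alive(PencillerPid),
Report = leveled_bookie:book_status(Bookie),
%% Range check only; an exact comparison would query PencillerPid directly.
CacheSize = maps:get(penciller_inmem_cache_size, Report),
true = is_integer(CacheSize) andalso CacheSize >= 0.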

Obviously fails the coverage check at the moment as book_status/1 is never called:

|------------------------|------------|
  |                module  |  coverage  |
  |------------------------|------------|
  |        leveled_bookie  |       99%  |
  |           leveled_cdb  |      100%  |
  |         leveled_codec  |      100%  |
  |        leveled_ebloom  |      100%  |
  |          leveled_eval  |      100%  |
  |        leveled_filter  |      100%  |
  |          leveled_head  |      100%  |
  |        leveled_iclerk  |      100%  |
  |     leveled_imanifest  |      100%  |
  |         leveled_inker  |      100%  |
  |           leveled_log  |      100%  |
  |       leveled_monitor  |       97%  |
  |        leveled_pclerk  |      100%  |
  |     leveled_penciller  |      100%  |
  |     leveled_pmanifest  |      100%  |
  |          leveled_pmem  |      100%  |
  |        leveled_runner  |      100%  |
  |         leveled_setop  |      100%  |
  |           leveled_sst  |      100%  |
  |      leveled_sstblock  |      100%  |
  |        leveled_tictac  |      100%  |
  |          leveled_tree  |      100%  |
  |          leveled_util  |      100%  |
  |------------------------|------------|
  |                 total  |       99%  |
  |------------------------|------------|

There are tests in basic_SUITE that prompt journal compaction, so you can follow that pattern to check the results are as expected after a compaction.

@hmmr changed the title from "leveled_bookie:status/1" to "WIP leveled_bookie:status/1" on Jan 15, 2026
@martinsumner

The PUT will call leveled_monitor:maybe_time/2 - so sometimes it will be timed, and sometimes not. So I'm not sure that the test will pass reliably.

I do think it should try a number of operations, so that we can check all the values change to the right order of magnitude. There are a lot of load functions, or you can add a fetch and validation of the book status into one of the other tests.

Also, what happened to reporting the journal compaction scores and results?


-spec book_status(pid()) -> proplists:proplist().
%% @doc
%% Return a proplist conteaining the following items:

containing

@hmmr (Author)

Fixed in 4314334.

book_headstatus(Pid) ->
    gen_server:call(Pid, head_status, infinity).

-spec book_status(pid()) -> proplists:proplist().

A proplist or a map?

@hmmr (Author)

Fixed in 4314334.

put_mem_time => PT#bookie_put_timings.mem_time,
fetch_count_by_level => FCL
}.

If there has been no compaction, is there no report on compaction? Would it not be better to consistently ensure all keys exist, but have them set to a default value when nothing has been collected yet?
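One way of doing this (a sketch only, not the implementation referenced in the reply below; collect_stats/1 and the chosen defaults are illustrative, key names follow the report above):

-define(STATUS_DEFAULTS,
        #{ledger_cache_size => undefined,
          journal_last_compaction_result => undefined,
          penciller_last_merge_time => undefined,
          get_sample_count => 0,
          head_sample_count => 0,
          put_sample_count => 0}).

status_report(State) ->
    %% Keys present in the defaults always appear in the report, with a
    %% default value until a real value has been collected.
    maps:merge(?STATUS_DEFAULTS, collect_stats(State)).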

@hmmr (Author)

All keys are now present in the report, with values of undefined when no actual values have been collected yet (11e30b9).

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

enriched_bookie_status(#state{

@martinsumner Jan 20, 2026

dialyzer spec required.

By default, in leveled, we try to avoid passing the LoopState to functions outside of the standard handle_call/cast/info functions. I would probably do this within the handle_call rather than have it as a separate function - as it is just a function based on knowledge of #state, and there's no separate testing of the function. [In this case there would be no need for a dialyzer spec.]
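A sketch of the suggested shape (the record fields used here are illustrative, not leveled's actual #state fields):

handle_call(book_status, _From, State) ->
    %% Build the report directly from the LoopState within handle_call,
    %% so no separate function (and no extra dialyzer spec) is needed.
    Status =
        #{
            ledger_cache_size => State#state.cache_size,
            put_sample_count => State#state.put_count
        },
    {reply, Status, State};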

@hmmr (Author)

Addressed in dffe29a.

@martinsumner

One other thing to note, which came up in a thread about stats - is the riak admin vnode-status command.

I think it would be worth looking at the formatting of the output of this CLI command as part of this. There's an open issue from the basho days: basho/riak_kv#343

As we have the OTP json module available, perhaps we could fix this by just pretty-printing json, so that we don't end up coupling some presentation logic in the CLI to the format of the status output.
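As a sketch of that idea (assuming OTP 27.1+, where json:format/1 is available; values that are not JSON-encodable, such as the compaction-result tuple, would need converting first):

print_vnode_status(StatusMap) when is_map(StatusMap) ->
    %% json:format/1 returns pretty-printed iodata, so the CLI needs no
    %% knowledge of the individual status keys.
    io:put_chars(json:format(StatusMap)).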

@hmmr force-pushed the tiot/openriak-3.4/leveled_bookie_status branch from 1c897f3 to 56d848e on January 29, 2026 16:42
@hmmr (Author) commented Jan 29, 2026

The PUT will call leveled_monitor:maybe_time/2 - so sometimes it will be timed, and sometimes not. So I'm not sure that the test will pass reliably.

With this option, stats are always collected.

@hmmr (Author) commented Jan 29, 2026

One other thing to note, which came up in a thread about stats - is the riak admin vnode-status command.

I think it would be worth looking at the formatting of the output of this CLI command as part of this. There's an open issue from the basho days: basho/riak_kv#343

As we have the OTP json module available, perhaps we could fix this by just pretty-printing json, so that we don't end up coupling some presentation logic in the CLI to the format of the status output.

That will be a matter for a PR of its own, in riak_kv; I'll see what I can do.

@hmmr changed the title from "WIP leveled_bookie:status/1" to "leveled_bookie:status/1" on Jan 29, 2026
@martinsumner

Quick check to extend the test to do some genuine load that should generate both journal and ledger compaction (just printing out the status for now, rather than checking with assertions):

bookie_status_report(_Config) ->
    RootPath = testutil:reset_filestructure(),
    StartOpts =
        [
            {root_path, RootPath},
            {sync_strategy, testutil:sync_strategy()},
            {log_level, info},
            {stats_percentage, 100},
            {max_journalobjectcount, 10000},
            {forced_logs, []}
        ],
    {ok, Bookie} = leveled_bookie:book_start(StartOpts),

    InitialReport =
        #{
            ledger_cache_size => undefined,
            n_active_journal_files => undefined,
            level_files_count => undefined,
            penciller_inmem_cache_size => undefined,
            penciller_work_backlog_status => undefined,
            penciller_last_merge_time => undefined,
            journal_last_compaction_time => undefined,
            journal_last_compaction_result => undefined,
            fetch_count_by_level =>
                #{
                    not_found => #{count => 0, time => 0},
                    mem => #{count => 0, time => 0},
                    lower => #{count => 0, time => 0},
                    '0' => #{count => 0, time => 0},
                    '1' => #{count => 0, time => 0},
                    '2' => #{count => 0, time => 0},
                    '3' => #{count => 0, time => 0}
                },
            get_sample_count => 0,
            get_body_time => 0,
            head_sample_count => 0,
            head_rsp_time => 0,
            put_sample_count => 0,
            put_prep_time => 0,
            put_ink_time => 0,
            put_mem_time => 0,
            avg_compaction_score => undefined
        },
    InitialReport = leveled_bookie:book_status(Bookie),

    {TObj, TSpec} = testutil:generate_testobject(),
    ok = testutil:book_riakput(Bookie, TObj, TSpec),

    Rep1 = leveled_bookie:book_status(Bookie),
    1 = maps:get(ledger_cache_size, Rep1),
    1 = maps:get(put_sample_count, Rep1),
    GoodPutPrepTime = 10000,
    GoodPutInkTime = 10000,
    GoodPutMemTime = 100,
    within_range(1, GoodPutPrepTime, maps:get(put_prep_time, Rep1)),
    within_range(1, GoodPutInkTime, maps:get(put_ink_time, Rep1)),
    within_range(1, GoodPutMemTime, maps:get(put_mem_time, Rep1)),

    {r_object, TBkt, TKey, _, _, _, _} = TObj,
    {ok, _} = testutil:book_riakget(Bookie, TBkt, TKey),
    Rep2 = leveled_bookie:book_status(Bookie),
    1 = maps:get(get_sample_count, Rep2),
    GoodGetBodyTime = 500,
    within_range(1, GoodGetBodyTime, maps:get(get_body_time, Rep2)),

    io:format("Prompt journal compaction~n"),
    ok = leveled_bookie:book_compactjournal(Bookie, 30000),
    testutil:wait_for_compaction(Bookie),

    io:format(user, "~p~n", [leveled_bookie:book_status(Bookie)]),

    io:format("Load 80K objects and then delete them~n"),
    testutil:load_objects(
        20000,
        [binary_uuid, binary_uuid, binary_uuid, binary_uuid],
        Bookie,
        no_check,
        fun testutil:generate_compressibleobjects/2
    ),
    FoldKeysFun = fun(B, K, Acc) -> [{B, K} | Acc] end,
    {async, F1} =
        leveled_bookie:book_keylist(Bookie, o_rkv, {FoldKeysFun, []}),
    KL1 = F1(),
    lists:foreach(
        fun({Bucket, Key}) ->
            testutil:book_riakdelete(Bookie, Bucket, Key, [])
        end,
        KL1
    ),

    io:format("Prompt journal compaction~n"),
    ok = leveled_bookie:book_compactjournal(Bookie, 30000),
    testutil:wait_for_compaction(Bookie),

    io:format(user, "~p~n", [leveled_bookie:book_status(Bookie)]),

    ok = leveled_bookie:book_destroy(Bookie).

However, there are still some unexpected undefined results.

@hmmr force-pushed the tiot/openriak-3.4/leveled_bookie_status branch from 7725c19 to 4f1fabb on February 1, 2026 22:07
),
%% penciller last merge actually happens before second compaction,
%% but only gets recorded in book monitor at the time level files
%% count is updated, i.e., earlier than CompactionStarted2, so:

The default penciller cache size is 28K - but there is support for jitter. This is the count of keys and index entries added. At about this level, a merge will be prompted - but the first merge will create an L0 file, and as L1 is empty this will be switched rather than merged below.

I think the stat update of the merge will only be prompted on the second merge event. Merge is prompted by a push of the ledger cache to an oversized penciller - but this has nothing to do with journal compaction.

It might be possible that the merge is prompted and doesn't complete until after the journal compaction - though there are probably enough keys (80K) that this doesn't happen.

This would be more predictable if you set a lower max_pencillercachesize in the start options, e.g. {max_pencillercachesize, 16000}. This should mean more penciller merge events, and the second merge will be prompted well before the load/delete is complete - and so the store will slow down if it doesn't happen in a timely manner.

As the resting cache size is size + jitter, this should also make the < 20K test reliably true.

{sync_strategy, testutil:sync_strategy()},
{log_level, info},
{stats_percentage, 100},
{forced_logs, []}

@martinsumner Feb 2, 2026

Set {max_journalobjectcount, 10000} in StartOpts, so that there are non-active journals when the compaction is run, and compaction will have work to do.
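For reference, the test's start options with this and the earlier max_pencillercachesize suggestion applied would look something like:

    StartOpts =
        [
            {root_path, RootPath},
            {sync_strategy, testutil:sync_strategy()},
            {log_level, info},
            {stats_percentage, 100},
            {max_journalobjectcount, 10000},
            {max_pencillercachesize, 16000},
            {forced_logs, []}
        ],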

@martinsumner

I think you still need to increment on open_reader. I expected an additional increment on cdb_roll, not the removal of the open_reader counts.

Otherwise, when a store is restarted, it will simply not count all the old cdb files it starts as read-only. Also, it will not count new files created by compaction.

@martinsumner moved this from Ready For Review to Done in OpenRiak 3.4.1 on Feb 4, 2026
@martinsumner moved this from Done to Review In Progress in OpenRiak 3.4.1 on Feb 4, 2026
@hmmr force-pushed the tiot/openriak-3.4/leveled_bookie_status branch from dd7eeff to 0bf6905 on February 5, 2026 01:27
@martinsumner commented Feb 10, 2026

Some comments on some of the stats:

  • recent_putgethead_counts: doesn't seem to be connected (i.e. can't be updated).
  • avg_compaction_score: doesn't seem to be connected either (i.e. can't be updated).
    • It would, I think, be preferable to avg_compaction_score_sample, which is not really a sample: it is not a random slice of the journal scores (the scores are in order, so it will always show one end of the Journal only).
    • Better still would be to have avg_compaction_score and max_compaction_score, and drop the sample scores.
  • penciller_work_backlog_status: I think this probably needs to be updated in all three case clauses in leveled_penciller, not just the final one. The first case would be an update of {0, false, false}, the second {WC, false, true}, and the third case is OK as-is.
  • journal_last_compaction_time/penciller_last_merge_time: would it be better to present these in a human-readable format? Especially if we're to use this more generally, beyond riak_control.

@hmmr force-pushed the tiot/openriak-3.4/leveled_bookie_status branch from 9aed1a6 to b2915f5 on February 11, 2026 02:44
@hmmr (Author) commented Feb 11, 2026

I have

  • removed recent_putgethead_counts, which was a resurfaced leftover from early work (those counts exist as individual items);
  • made the suggested change to penciller_work_backlog_status, which gives it a more meaningful value than undefined;
  • on compaction scores, switched to reporting min_ and max_compaction_score, recalculated on every avg_compaction_score update; the latter is no longer included in the report.

I would still keep the various times in the report as unixtime milliseconds, and only convert them where they are presented, i.e. in the Elm code in riak_control and in the JSON printed by riak admin vnode-status (in the works); see the sketch below.
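A sketch of that kind of presentation-side conversion (key name taken from the report above; calendar:system_time_to_rfc3339/2 is standard OTP):

Report = leveled_bookie:book_status(Bookie),
case maps:get(journal_last_compaction_time, Report) of
    undefined ->
        io:format("journal_last_compaction_time: never~n");
    Ms when is_integer(Ms) ->
        %% Convert unixtime milliseconds to an RFC 3339 timestamp for display.
        io:format(
            "journal_last_compaction_time: ~s~n",
            [calendar:system_time_to_rfc3339(Ms, [{unit, millisecond}, {offset, "Z"}])])
end.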
