leveled_bookie:status/1 #17
base: openriak-3.4
Conversation
* avg/best_compaction_score
* level_files_count
* penciller_inmem_cache_size
Marking as WIP pending work on the last item ("Mean level for recent fetches").
Can we add a test? It would be good to verify that the results returned are as expected. Some may just need to be range checked, but you can get the PIDs of the Inker/Penciller. Obviously fails the coverage check at the moment. There are tests in basic_SUITE that prompt journal compaction, so you can follow the pattern to check results are expected after a compaction.
The PUT will call … I do think it should try a number of operations, so that we can check all the values change to the right order of magnitude. There are a lot of load functions, or you can add a fetch and validation of the book status into one of the other tests. Also, what happened to reporting the journal compaction scores and results?
src/leveled_bookie.erl (Outdated)

```erlang
-spec book_status(pid()) -> proplists:proplist().
%% @doc
%% Return a proplist conteaining the following items:
```
containing
Fixed in 4314334.
src/leveled_bookie.erl (Outdated)

```erlang
book_headstatus(Pid) ->
    gen_server:call(Pid, head_status, infinity).

-spec book_status(pid()) -> proplists:proplist().
```
A proplist or a map?
Fixed in 4314334.
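To illustrate the shape question resolved above, here is a hypothetical sketch (not the PR's code; the module and key names are invented) showing the same status exposed as a proplist and as a map. Both shapes support defaulted lookups for absent keys, but the map makes the full key set easier to pattern match in tests:

```erlang
%% Hypothetical sketch: one status exposed in both shapes.
-module(status_shapes).
-export([as_proplist/0, as_map/0]).

as_proplist() ->
    %% proplists:get_value/3 supplies a default for absent keys
    [{ledger_cache_size, 1}, {put_sample_count, 1}].

as_map() ->
    %% maps:get/3 supplies a default for absent keys
    #{ledger_cache_size => 1, put_sample_count => 1}.
```

A test can then match the whole map against an expected literal, which a proplist would only allow if the key order were fixed.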
src/leveled_monitor.erl (Outdated)

```erlang
        put_mem_time => PT#bookie_put_timings.mem_time,
        fetch_count_by_level => FCL
    }.
```
If there has been no compaction, is there no report on compaction? Would it not be better to consistently ensure all keys exist, but have them set to a default value when the corresponding events haven't occurred?
All keys are now present in report, with values undefined when no actual values have been collected yet (11e30b9).
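One way to guarantee every key is present is to start from a defaults map and overlay collected values. This is an assumed illustration of the approach, not the actual change in 11e30b9, and the key names are a subset chosen for the example:

```erlang
%% Assumed illustration: every reportable key exists from the start,
%% set to undefined until a value has actually been collected.
-module(status_defaults).
-export([report/1]).

defaults() ->
    #{journal_last_compaction_time => undefined,
      journal_last_compaction_result => undefined,
      avg_compaction_score => undefined}.

report(Collected) when is_map(Collected) ->
    %% maps:merge/2 takes values from the second map on key clashes,
    %% so collected values override the undefined defaults.
    maps:merge(defaults(), Collected).
```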
src/leveled_monitor.erl (Outdated)

```erlang
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

enriched_bookie_status(#state{
```
dialyzer spec required.
By default, in leveled, we try to avoid passing the LoopState to functions outside of the standard handle_call/cast/info functions. I would probably do this within the handle_call rather than have it as a separate function, as it is just a function based on knowledge of #state, and there's no separate testing of the function. [In this case there would be no need for a dialyzer spec.]
Addressed in dffe29a.
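The reviewer's suggestion can be sketched as below. The field names are illustrative only, not leveled_monitor's actual #state record, and this is not the code from dffe29a:

```erlang
%% Sketch only: build the status map inline in handle_call/3, so no
%% helper function takes the loop state and no extra dialyzer spec is
%% needed. Field names here are invented for illustration.
handle_call(bookie_status, _From, State) ->
    Status =
        #{ledger_cache_size => State#state.ledger_cache_size,
          level_files_count => State#state.level_files_count},
    {reply, Status, State};
```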
One other thing to note, which came up in a thread about stats: I think it would be worth looking at the formatting of the output of this CLI command as part of this. There's an open issue from the basho days: basho/riak_kv#343. As we have the OTP json module available, perhaps we could fix this by just pretty-printing JSON, so that we don't end up coupling presentation logic in the CLI to the format of the status output.
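On the pretty-printing idea, a minimal sketch, assuming OTP 27 or later where the json module is available. json:encode/1 returns iodata; atom keys and atom values other than true/false/null encode as JSON strings, so undefined values come out as "undefined". The function name is invented:

```erlang
%% Sketch assuming OTP 27+: dump a status map as JSON so the CLI does
%% not need to know the internal shape of the status output.
print_status(Status) when is_map(Status) ->
    io:put_chars([json:encode(Status), $\n]).
```

Indented (pretty) output would still need a formatting pass on top of this, but it keeps presentation out of the status-producing code.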
(force-pushed 1c897f3 to 56d848e)
With this option, stats are always collected.
That will be a matter of a PR of its own, in riak_kv; I'll see what I can do.
Quick check to extend the test to do some genuine load that should generate both journal and ledger compaction (just printed out status for now, rather than check with assertions):

```erlang
bookie_status_report(_Config) ->
    RootPath = testutil:reset_filestructure(),
    StartOpts =
        [
            {root_path, RootPath},
            {sync_strategy, testutil:sync_strategy()},
            {log_level, info},
            {stats_percentage, 100},
            {max_journalobjectcount, 10000},
            {forced_logs, []}
        ],
    {ok, Bookie} = leveled_bookie:book_start(StartOpts),
    InitialReport =
        #{
            ledger_cache_size => undefined,
            n_active_journal_files => undefined,
            level_files_count => undefined,
            penciller_inmem_cache_size => undefined,
            penciller_work_backlog_status => undefined,
            penciller_last_merge_time => undefined,
            journal_last_compaction_time => undefined,
            journal_last_compaction_result => undefined,
            fetch_count_by_level =>
                #{
                    not_found => #{count => 0, time => 0},
                    mem => #{count => 0, time => 0},
                    lower => #{count => 0, time => 0},
                    '0' => #{count => 0, time => 0},
                    '1' => #{count => 0, time => 0},
                    '2' => #{count => 0, time => 0},
                    '3' => #{count => 0, time => 0}
                },
            get_sample_count => 0,
            get_body_time => 0,
            head_sample_count => 0,
            head_rsp_time => 0,
            put_sample_count => 0,
            put_prep_time => 0,
            put_ink_time => 0,
            put_mem_time => 0,
            avg_compaction_score => undefined
        },
    InitialReport = leveled_bookie:book_status(Bookie),
    {TObj, TSpec} = testutil:generate_testobject(),
    ok = testutil:book_riakput(Bookie, TObj, TSpec),
    Rep1 = leveled_bookie:book_status(Bookie),
    1 = maps:get(ledger_cache_size, Rep1),
    1 = maps:get(put_sample_count, Rep1),
    GoodPutPrepTime = 10000,
    GoodPutInkTime = 10000,
    GoodPutMemTime = 100,
    within_range(1, GoodPutPrepTime, maps:get(put_prep_time, Rep1)),
    within_range(1, GoodPutInkTime, maps:get(put_ink_time, Rep1)),
    within_range(1, GoodPutMemTime, maps:get(put_mem_time, Rep1)),
    {r_object, TBkt, TKey, _, _, _, _} = TObj,
    {ok, _} = testutil:book_riakget(Bookie, TBkt, TKey),
    Rep2 = leveled_bookie:book_status(Bookie),
    1 = maps:get(get_sample_count, Rep2),
    GoodGetBodyTime = 500,
    within_range(1, GoodGetBodyTime, maps:get(get_body_time, Rep2)),
    io:format("Prompt journal compaction~n"),
    ok = leveled_bookie:book_compactjournal(Bookie, 30000),
    testutil:wait_for_compaction(Bookie),
    io:format(user, "~p~n", [leveled_bookie:book_status(Bookie)]),
    io:format("Load 80K objects and then delete them~n"),
    testutil:load_objects(
        20000,
        [binary_uuid, binary_uuid, binary_uuid, binary_uuid],
        Bookie,
        no_check,
        fun testutil:generate_compressibleobjects/2
    ),
    FoldKeysFun = fun(B, K, Acc) -> [{B, K} | Acc] end,
    {async, F1} =
        leveled_bookie:book_keylist(Bookie, o_rkv, {FoldKeysFun, []}),
    KL1 = F1(),
    lists:foreach(
        fun({Bucket, Key}) ->
            testutil:book_riakdelete(Bookie, Bucket, Key, [])
        end,
        KL1
    ),
    io:format("Prompt journal compaction~n"),
    ok = leveled_bookie:book_compactjournal(Bookie, 30000),
    testutil:wait_for_compaction(Bookie),
    io:format(user, "~p~n", [leveled_bookie:book_status(Bookie)]),
    ok = leveled_bookie:book_destroy(Bookie).
```

However, there are still some unexpected …
(force-pushed 7725c19 to 4f1fabb)
test/end_to_end/basic_SUITE.erl (Outdated)

```erlang
    ),
    %% penciller last merge actually happens before second compaction,
    %% but only gets recorded in book monitor at the time level files
    %% count is updated, i.e., earlier than CompactionStarted2, so:
```
The default penciller cache size is 28K, but there is support for jitter. This is the count of keys and index entries added. At about this level, a merge will be prompted, but the first merge will create a L0 file, and as L1 is empty this will be switched rather than merged below.
I think the stat update of the merge will only be prompted on the second merge event. Merge is prompted by a push of the ledger cache to an oversized penciller, but this has nothing to do with journal compaction.
It might be possible that the merge is prompted and doesn't complete until after the journal compaction, though there are probably enough keys (80K) that this doesn't happen.
This would be more predictable if you set a lower max_pencillercachesize in the start options, e.g. {max_pencillercachesize, 16000}. This should mean more penciller merge events, and the second merge will be prompted well before the load/delete is complete, so the store will slow down if it doesn't happen in a timely manner.
As the resting cache size is size + jitter, this should also make the < 20K test reliably true.
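Applying the suggested lower penciller cache size to the test's start options might look like this. A sketch only: the option names come from the review and from the existing test code, and the value 16000 is the reviewer's example:

```erlang
%% Test start options with the reviewer's suggested addition of
%% {max_pencillercachesize, 16000} to force more frequent merges.
StartOpts =
    [
        {root_path, RootPath},
        {sync_strategy, testutil:sync_strategy()},
        {log_level, info},
        {stats_percentage, 100},
        {max_journalobjectcount, 10000},
        {max_pencillercachesize, 16000},
        {forced_logs, []}
    ],
```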
```erlang
    {sync_strategy, testutil:sync_strategy()},
    {log_level, info},
    {stats_percentage, 100},
    {forced_logs, []}
```
Set {max_journalobjectcount, 10000} in StartOpts, so that there are non-active journals when the compaction is run, and compaction will have work to do.
I think you still need to increment on open_reader. I expected you needed an additional increment on cdb_roll, not to remove the open_reader counts. Otherwise, when a store is restarted, it will simply not count all the old cdb files it starts as read-only. Also, it will not count new files created by compaction.
(force-pushed dd7eeff to 0bf6905)
Some comments on some of the stats: …
(force-pushed 9aed1a6 to b2915f5)
I have …
This implements martinsumner#478.
When called, it returns a map containing the reportable items described in the issue above.
Not all items may be available at the time of calling, in which case the backend_status will not include the corresponding keys.
In riak_control, the results will show as follows:
[screenshots omitted]