-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libbpf-tools: add profile #3782
Conversation
In general a conversion of a bcc python API tool to a libbpf tool is desired and very appreciated. |
Thank you for your reply. |
This patch has basic functionality with some limitations of the profile.
This is the result of using the -f (folded) option.
Current limitations are as follows.
Alternatively, the code to get kernel_ip is excluded in this patch. How about having this for now and including the kernel_ip code after libbpf-tools supports the macro?
Thank you for reading this. |
libbpf-tools/Makefile
Outdated
@@ -54,6 +54,7 @@ APPS = \ | |||
tcpconnlat \ | |||
tcprtt \ | |||
vfsstat \ | |||
profile \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep the list sorted and add an entry to .gitignore
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fixed. Thank you.
libbpf-tools/profile.bpf.c
Outdated
@@ -0,0 +1,67 @@ | |||
// SPDX-License-Identifier: GPL-2.0 | |||
// Copyright (c) 2021 Wenbo Zhang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong copyright info. Please use C++ style comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I edited comment style.
For copyright information, I've now followed opensnoop's license-identifier.
I hope to have one chance to fix it after more code review.
Thank you.
libbpf-tools/profile.bpf.c
Outdated
|
||
struct { | ||
__uint(type, BPF_MAP_TYPE_STACK_TRACE); | ||
__uint(key_size, sizeof(u32)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use __type(key, u32);
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, but the code is throwing a runtime error like below. So I keep the original code.
Thank you.
libbpf: Error in bpf_create_map_xattr(stackmap):Invalid argument(-22). Retrying without BTF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you update your git submodule ?
libbpf-tools/profile.bpf.c
Outdated
u64 *valp; | ||
static const u64 zero; | ||
|
||
if (include_idle == false && pid == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!include_idle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fixed. Thank you.
libbpf-tools/profile.bpf.c
Outdated
return 0; | ||
|
||
// create map key | ||
struct key_t key = {.pid = tgid}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declare variables first.
This .pid = tgid
is very counterintuitive. I would suggest you use pid
and tid
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. It's fixed. Thank you.
For the |
libbpf-tools/profile.bpf.c
Outdated
return 0; | ||
|
||
// create map key | ||
struct key_t key = {.pid = pid}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declare variable first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed it and fixed it now.
Thank you.
libbpf-tools/profile.c
Outdated
|
||
libbpf_set_print(libbpf_print_fn); | ||
|
||
err = bump_memlock_rlimit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refer to other tools, we don't need this anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed it. Thank you.
Thank you for the information and I checked the page.
Alternatively, I'm trying to check if I can refer or use machine__get_kernel_start of perf(linux/tools/perf/util/machine.h) to get kernel start address. I've shared my status for feedback to you, any opinions are welcome if you have for this. |
68df9ba
to
3c5a967
Compare
libbpf-tools/profile.bpf.c
Outdated
|
||
// arm64 only supported | ||
if (CONFIG_ARM64) { | ||
extern __u32 CONFIG_ARM64_VA_BITS __kconfig __weak; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember the 'declare variables first.' as you commented.
However, referencing other tools in libbpf-tools, I found some exceptions in cases like this (for variables used only in if conditions).
Could you please share your opinion?
libbpf-tools/profile.bpf.c
Outdated
// populate extras to fix the kernel stack | ||
extern bool CONFIG_ARM64 __kconfig __weak; | ||
|
||
// arm64 only supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I've added logic to set kernel_ip only for arm64 due to some limitations and difficulty accessing kernel PAGE_OFFSET values at runtime with hope of more support for other architectures with next patches by me or more technical engineer.
Could you please share your opinion?
Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote code to apply the same logic for x86 but to be honest: I do not understand what does this logic.
I saw it was present in bcc
version of this gadget:
Line 178 in 55dcd2e
// populate extras to fix the kernel stack |
So, does someone know what do we need this code and what is the meaning of "fixing kernel stack"?
[Here is my code for information:
diff --git a/libbpf-tools/profile.bpf.c b/libbpf-tools/profile.bpf.c
index 08ac5687..9e4f5d0f 100644
--- a/libbpf-tools/profile.bpf.c
+++ b/libbpf-tools/profile.bpf.c
@@ -59,13 +59,13 @@ int do_perf_event(struct bpf_perf_event_data *ctx) {
if (key.kern_stack_id >= 0) {
// populate extras to fix the kernel stack
+ u64 ip = PT_REGS_IP(&ctx->regs);
+ u64 page_offset;
// arm64 only supported
-#ifdef __TARGET_ARCH_arm64
+#if defined(__TARGET_ARCH_arm64)
extern __u32 CONFIG_ARM64_VA_BITS __kconfig __weak;
extern bool CONFIG_ARM64_64K_PAGES __kconfig __weak;
- u64 ip = PT_REGS_IP(&ctx->regs);
- u64 page_offset;
u32 va;
if (CONFIG_ARM64_VA_BITS)
@@ -79,11 +79,19 @@ int do_perf_event(struct bpf_perf_event_data *ctx) {
page_offset = (-(1UL << (va)));
else
page_offset = (0xffffffffffffffffUL - (1UL << (va - 1)) + 1);
-
- if (ip > page_offset) {
+#elif defined(__TARGET_ARCH_x86)
+#define __PAGE_OFFSET_BASE_L4 0xffff888000000000UL
+#define __PAGE_OFFSET_BASE_L5 0xff11000000000000UL
+ extern bool CONFIG_DYNAMIC_MEMORY_LAYOUT __kconfig __weak;
+ extern bool CONFIG_X86_5LEVEL __kconfig __weak;
+
+ if (CONFIG_DYNAMIC_MEMORY_LAYOUT && CONFIG_X86_5LEVEL)
+ page_offset = __PAGE_OFFSET_BASE_L5;
+ else
+ page_offset = __PAGE_OFFSET_BASE_L4;
+#endif /* __TARGET_ARCH_x86 */
+ if (ip > page_offset)
key.kernel_ip = ip;
- }
-#endif
}
valp = bpf_map_lookup_or_try_init(&counts, &key, &zero);
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, as I understand, It's to append ip address of the register to the top of the kernel stack trace to provide more detailed information to the user. page_offset is used to determine whether ip is a kernel address.
Thank you very much for the code for x86. But unfortunately I changed the code from yesterday...
because
- If the target doesn't have Kconfig, the profile won't work at all.
- On some versions of Linux, page_offset is not an effective value for this purpose.
Could you please check the updated code?
I'm really sorry and thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really sorry and thank you.
No problem, your actual version is simpler (and so better) than the previous and my proposal for x86.
Nonetheless, I do not really understand the goal of this code (or how it is printed).
Indeed, from my tests, I got the following output:
$ sudo ./profile -K 5
Sampling at 49 Hertz of all threads by kernel stack for 5 secs.
#0 0xffffffff98ed69cb arch_local_irq_enable+0xb
#1 0xffffffff99a00ec3 __schedule+0x773
#2 0xffffffff99a0109f schedule+0x4f
#3 0xffffffff98edf5f7 do_sched_yield+0x87
#4 0xffffffff98edf62e __x64_sys_sched_yield+0xe
#5 0xffffffff999f3b21 do_syscall_64+0x61
#6 0xffffffff99c0007c entry_SYSCALL_64_after_hwframe+0x44
kernel_ip#7 0xffffffff98ed69cb arch_local_irq_enable+0xb
...
You can see the entry where k->kernel_ip
is not NULL begining with kernel_ip
.
But this entry is the same as entry #0, so this is why I do not understand this code (maybe I am not using the tool like I should do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, I fixed kernel_ip to be at the top of the call stack. Thank you so much for finding it.
And yes, as you said, I found a duplicate with the first entry in the call stack. I will check and update my opinion on it. I have ever seen this before and it's was produced with some conditions in my faint memory. please let me check it. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where I see it before was a result of bcc docker. (https://github.com/iovisor/bcc/blob/master/QUICKSTART.md)
These are the test results for profile.py. And it has the same kernel_ip and call stack entry #0..
...
do_syscall_64 // kernel_ip
do_syscall_64 // call stack entry #0
entry_SYSCALL_64_after_hwframe
[unknown]
[unknown]
- qemu-system-x86 (36165)
7
native_write_msr_safe // kernel_ip
native_write_msr_safe // call stack entry #0
kvm_set_shared_msr
vmx_save_host_state
...
@yonghong-song
If you have the same experience, could you please share your opinion on the equality of call stack entry #0 and kernel ip in profile.py?
Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi.
This code was indeed already present in the bcc version of this tool, I just do not understand why we need this (maybe it was mandatory with some older kernels because we cannot get the full kernel stack from the BPF_MAP_TYPE_STACK_TRACE
?).
Thank you for your investigation though :)
Best regards.
libbpf-tools/profile.bpf.c
Outdated
if (LINUX_KERNEL_VERSION >= KERNEL_VERSION(5, 4, 0)) | ||
page_offset = (-(1UL << (va))); | ||
else | ||
page_offset = (0xffffffffffffffffUL - (1UL << (va - 1)) + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since v5.4, the definition of PAGE_OFFSET has changed and it has been divided into two cases.
If you don't need it, I'd appreciate your feedback.
Thank you.
I found and fixed a bug in the first patch. I will update soon. Sorry and thank you. |
8033322
to
5a07dfe
Compare
This is the test result of c88462c
|
This is the test result of 5a07dfe Before:
After:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi.
Thank you for translating this tool!
I tested it and it works, I have just some comments (regarding things I do not understand).
Note that, I am simple contributor here, my review will not have consequence regarding the merging.
Best regards.
libbpf-tools/profile.bpf.c
Outdated
if (CONFIG_ARM64_VA_BITS) | ||
va = CONFIG_ARM64_VA_BITS; | ||
else if (CONFIG_ARM64_64K_PAGES) | ||
va = 42; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please indicate what this value is.
I guess it is the number of bit used for a page virtual address, but a comment (or a define) would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
libbpf-tools/profile.bpf.c
Outdated
// populate extras to fix the kernel stack | ||
extern bool CONFIG_ARM64 __kconfig __weak; | ||
|
||
// arm64 only supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote code to apply the same logic for x86 but to be honest: I do not understand what does this logic.
I saw it was present in bcc
version of this gadget:
Line 178 in 55dcd2e
// populate extras to fix the kernel stack |
So, does someone know what do we need this code and what is the meaning of "fixing kernel stack"?
[Here is my code for information:
diff --git a/libbpf-tools/profile.bpf.c b/libbpf-tools/profile.bpf.c
index 08ac5687..9e4f5d0f 100644
--- a/libbpf-tools/profile.bpf.c
+++ b/libbpf-tools/profile.bpf.c
@@ -59,13 +59,13 @@ int do_perf_event(struct bpf_perf_event_data *ctx) {
if (key.kern_stack_id >= 0) {
// populate extras to fix the kernel stack
+ u64 ip = PT_REGS_IP(&ctx->regs);
+ u64 page_offset;
// arm64 only supported
-#ifdef __TARGET_ARCH_arm64
+#if defined(__TARGET_ARCH_arm64)
extern __u32 CONFIG_ARM64_VA_BITS __kconfig __weak;
extern bool CONFIG_ARM64_64K_PAGES __kconfig __weak;
- u64 ip = PT_REGS_IP(&ctx->regs);
- u64 page_offset;
u32 va;
if (CONFIG_ARM64_VA_BITS)
@@ -79,11 +79,19 @@ int do_perf_event(struct bpf_perf_event_data *ctx) {
page_offset = (-(1UL << (va)));
else
page_offset = (0xffffffffffffffffUL - (1UL << (va - 1)) + 1);
-
- if (ip > page_offset) {
+#elif defined(__TARGET_ARCH_x86)
+#define __PAGE_OFFSET_BASE_L4 0xffff888000000000UL
+#define __PAGE_OFFSET_BASE_L5 0xff11000000000000UL
+ extern bool CONFIG_DYNAMIC_MEMORY_LAYOUT __kconfig __weak;
+ extern bool CONFIG_X86_5LEVEL __kconfig __weak;
+
+ if (CONFIG_DYNAMIC_MEMORY_LAYOUT && CONFIG_X86_5LEVEL)
+ page_offset = __PAGE_OFFSET_BASE_L5;
+ else
+ page_offset = __PAGE_OFFSET_BASE_L4;
+#endif /* __TARGET_ARCH_x86 */
+ if (ip > page_offset)
key.kernel_ip = ip;
- }
-#endif
}
valp = bpf_map_lookup_or_try_init(&counts, &key, &zero);
]
libbpf-tools/profile.c
Outdated
@@ -250,13 +252,65 @@ static int cmp_counts(const void *dx, const void *dy) | |||
return x > y ? -1 : !(x == y); | |||
} | |||
|
|||
static bool batch_map_ops = true; /* hope for the best */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand this code.
You want to detect if bpf_map_lookup_batch()
is available, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes batch_map_ops is initially set to true and then changed to false if bpf_map_lookup_batch() is not supported.
In fact, it was referenced in syscount.c. :). I couldn't figure out the meaning of the comments alone. So if you like, I'll edit the comment to be more detailed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry I did not see you set this to false later, this totally makes sense then!
Regarding the comment, if you think it is better to edit it, then do, otherwise I am fine with this, this is a cool feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for your test, review and opinion.
libbpf-tools/profile.c
Outdated
@@ -250,13 +252,65 @@ static int cmp_counts(const void *dx, const void *dy) | |||
return x > y ? -1 : !(x == y); | |||
} | |||
|
|||
static bool batch_map_ops = true; /* hope for the best */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes batch_map_ops is initially set to true and then changed to false if bpf_map_lookup_batch() is not supported.
In fact, it was referenced in syscount.c. :). I couldn't figure out the meaning of the comments alone. So if you like, I'll edit the comment to be more detailed.
libbpf-tools/profile.bpf.c
Outdated
// populate extras to fix the kernel stack | ||
extern bool CONFIG_ARM64 __kconfig __weak; | ||
|
||
// arm64 only supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, as I understand, It's to append ip address of the register to the top of the kernel stack trace to provide more detailed information to the user. page_offset is used to determine whether ip is a kernel address.
Thank you very much for the code for x86. But unfortunately I changed the code from yesterday...
because
- If the target doesn't have Kconfig, the profile won't work at all.
- On some versions of Linux, page_offset is not an effective value for this purpose.
Could you please check the updated code?
I'm really sorry and thank you.
libbpf-tools/profile.bpf.c
Outdated
if (CONFIG_ARM64_VA_BITS) | ||
va = CONFIG_ARM64_VA_BITS; | ||
else if (CONFIG_ARM64_64K_PAGES) | ||
va = 42; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
I think you will need to add the |
Could you review it? Or are you waiting for me to do something? |
It's just rebased on the master branch. |
This patch includes the following changes:
This version of the code looks cleaner to read. Could you please review this? @yezhem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
Thank you for rebasing and still working on this PR despite it was first opened two years ago!
I will take a deeper look and test it later, as I want to fully understand the modifications you brought to the code. For now, I have some comments.
Best regards.
case 'U': | ||
env.user_stacks_only = true; | ||
break; | ||
case 'K': | ||
env.kernel_stacks_only = true; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my usage of your tool, I am currently modifying this to have only one flag:
inspektor-gadget/inspektor-gadget#1417
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your good opinion.
But applying it in this patch make difference with bcc python Profile.
How about applying it to both the Python and C versions through a separate patch?
@chenhengqi
Could you please share your thoughts on this?
libbpf-tools/profile.c
Outdated
items[i].k.pid = keys[i].pid; | ||
items[i].k.user_stack_id = keys[i].user_stack_id; | ||
items[i].k.kern_stack_id = keys[i].kern_stack_id; | ||
strncpy(items[i].k.name, keys[i].name, TASK_COMM_LEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you are using the key_t
both in userspace and eBPF, you can use memcpy
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it. Thank you.
libbpf-tools/profile.c
Outdated
static void print_count(struct key_t *event, __u64 count, int sfd, | ||
struct ksyms *ksyms, struct syms_cache *syms_cache) | ||
{ | ||
unsigned long *ip; | ||
|
||
ip = calloc(env.perf_max_stack_depth, sizeof(*ip)); | ||
if (!ip) { | ||
fprintf(stderr, "failed to alloc ip\n"); | ||
return; | ||
} | ||
|
||
/* kernel stack */ | ||
if (!env.user_stacks_only && !STACK_ID_EFAULT(event->kern_stack_id)) { | ||
if (bpf_map_lookup_elem(sfd, &event->kern_stack_id, ip) != 0) | ||
printf(" [Missed Kernel Stack]\n"); | ||
else | ||
print_kernel_stacktrace(ip, ksyms); | ||
} | ||
|
||
/* user stack */ | ||
if (!env.kernel_stacks_only && !STACK_ID_EFAULT(event->user_stack_id)) { | ||
if (NEED_DELIMITER(env.delimiter, event->user_stack_id, event->kern_stack_id)) | ||
printf(" --\n"); | ||
|
||
if (bpf_map_lookup_elem(sfd, &event->user_stack_id, ip) != 0) | ||
printf(" [Missed User Stack]\n"); | ||
else | ||
print_user_stacktrace(ip, syms_cache, event->pid); | ||
} | ||
|
||
printf(" %-16s %s (%d)\n", "-", event->name, event->pid); | ||
printf(" %lld\n\n", count); | ||
|
||
free(ip); | ||
} | ||
|
||
static void print_count_folded(struct key_t *event, __u64 count, int sfd, | ||
struct ksyms *ksyms, struct syms_cache *syms_cache) | ||
{ | ||
unsigned long *ip; | ||
|
||
ip = calloc(env.perf_max_stack_depth, sizeof(*ip)); | ||
if (!ip) { | ||
fprintf(stderr, "failed to alloc ip\n"); | ||
return; | ||
} | ||
|
||
printf("%s", event->name); | ||
|
||
/* user stack */ | ||
if (!env.kernel_stacks_only && !STACK_ID_EFAULT(event->user_stack_id)) { | ||
if (bpf_map_lookup_elem(sfd, &event->user_stack_id, ip) != 0) | ||
printf(";[Missed User Stack]"); | ||
else | ||
print_user_stacktrace_folded(ip, syms_cache, event->pid); | ||
} | ||
|
||
/* kernel stack */ | ||
if (!env.user_stacks_only && !STACK_ID_EFAULT(event->kern_stack_id)) { | ||
if (NEED_DELIMITER(env.delimiter, event->user_stack_id, event->kern_stack_id)) | ||
printf(";-"); | ||
|
||
if (bpf_map_lookup_elem(sfd, &event->kern_stack_id, ip) != 0) | ||
printf(";[Missed Kernel Stack]"); | ||
else | ||
print_kernel_stacktrace_folded(ip, ksyms); | ||
} | ||
|
||
printf(" %lld\n", count); | ||
|
||
free(ip); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can factorize this code to reduce the duplicated one.
Maybe, we can check the folded
argument in the print_*_stacktrace()
functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, we can check the
folded
argument in theprint_*_stacktrace()
functions?
It can reduce duplication slightly, but it cannot reduce meaningfully because iterating direction and format are different.
In principle, it's good to separate functions to handle data and representation. For separating this, I can try using iterator that takes addresses and syms and then return resolved symbols, or I can try returning resolved symbols through another array.
But It can make complex code in C language or need more memory usage.
Any thoughts on using iterators for this? Shall I try it?
992702f
to
9098638
Compare
I discovered and fixed an issue where some stack traces had zero counts. And I'm working on removing duplication. |
I reduced duplicate code and excluded the folded option for the sake of simplicity. The folded option can be added in the next step if needed. |
Hi!
Thank you! I will take a look at it during this week! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
I tested it and it works perfectly:
$ yes > /dev/null &
$ sudo ./profile -p $!
...
1
vfs_write
ksys_write
__x64_sys_write
do_syscall_64
entry_SYSCALL_64_after_hwframe
write
[unknown]
- yes (36571)
1
...
I just found one small nit and I have a question with regard to resource usage but in my opinion, this is ready to go.
Best regards.
libbpf-tools/profile.c
Outdated
if (x > y) | ||
return -1; | ||
|
||
if (x < y) | ||
return 1; | ||
|
||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (x > y) | |
return -1; | |
if (x < y) | |
return 1; | |
return 0; | |
return y - x; |
qsort
function requires than something greater, equal or less than 0 is returned, so using this should do the trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback.
I have applied it.
fprintf(stderr, "failed to attach perf event on cpu: " | ||
"%d\n", i); | ||
links[i] = NULL; | ||
close(fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not call close()
in cleanup
, I suppose bpf_link__destroy()
also closes the underlying perf event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I checked it again and confirmed that the link->detach called in bpf_link__destroy() closes the file descriptor. Thank you.
Signed-off-by: Eunseon Lee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been updated with your feedback along with some other minor changes.
Thank you.
fprintf(stderr, "failed to attach perf event on cpu: " | ||
"%d\n", i); | ||
links[i] = NULL; | ||
close(fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I checked it again and confirmed that the link->detach called in bpf_link__destroy() closes the file descriptor. Thank you.
libbpf-tools/profile.c
Outdated
if (x > y) | ||
return -1; | ||
|
||
if (x < y) | ||
return 1; | ||
|
||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback.
I have applied it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested and approve this pull request.
Followings are the steps I did on Ubuntu 22.04.3 LTS, x86_64 machine.
/* Build profile */
$ git clone https://github.com/ekyooo/bcc.git -b profile & cd bcc
$ mkdir build & cd build
$ cmake ..
$ cd ../libbpf-tools
$ make profile
/* Build test program */
$ cat test.c
#include <stdlib.h>
unsigned int foo(unsigned int i) {
unsigned int tmp = i + 50;
return tmp * 3;
}
int main(int argc, char* argv[]) {
unsigned int i = 0;
while (i >= 0) {
i = foo(i);
++i;
}
return 0;
}
$ gcc test.c
/* Run test program and profile */
$ ./a.out &
$ sudo ./profile -p 134827 -U
Sampling at 49 Hertz of PID 134827 by user... Hit Ctrl-C to end.
^C main
[unknown]
- a.out (134827)
611
main
[unknown]
- a.out (134827)
278
foo
main
[unknown]
- a.out (134827)
254
foo
main
[unknown]
- a.out (134827)
232
main
[unknown]
- a.out (134827)
221
foo
main
[unknown]
- a.out (134827)
199
foo
main
[unknown]
- a.out (134827)
123
foo
main
[unknown]
- a.out (134827)
117
foo
main
[unknown]
- a.out (134827)
114
foo
main
[unknown]
- a.out (134827)
113
foo
[unknown]
- a.out (134827)
105
/* Run profile.py and compare the result */
$ sudo python3 ../tools/profile.py -p 134827 -U
Sampling at 49 Hertz of PID [134827] by user stack... Hit Ctrl-C to end.
^C
foo
main
__libc_start_call_main
- a.out (134827)
3
foo
main
__libc_start_call_main
- a.out (134827)
3
foo
main
__libc_start_call_main
- a.out (134827)
4
foo
main
__libc_start_call_main
- a.out (134827)
5
foo
__libc_start_call_main
- a.out (134827)
5
foo
main
__libc_start_call_main
- a.out (134827)
6
main
__libc_start_call_main
- a.out (134827)
7
foo
main
__libc_start_call_main
- a.out (134827)
8
main
__libc_start_call_main
- a.out (134827)
9
foo
main
__libc_start_call_main
- a.out (134827)
11
main
__libc_start_call_main
- a.out (134827)
27
It looks same except unknown
printing. And unknown
printing issue is related with #4849. Which means it is not directly related with this PR.
@eiffel-fl @Bojun-Seo Can anybody take a look and possibly merge this? |
Thanks for detailed reviews by @chenhengqi @eiffel-fl @Bojun-Seo. The patch looks good to me. I see in the current patch bpf_map_get_next_key() is used. The kernel has implemented batch lookup which should be more reliable. But this can be done as a followup. |
I have code that uses batch lookup. I will apply it with the next patch. @yonghong-song @chenhengqi @eiffel-fl @Bojun-Seo |
You are more than welcome! It was a pleasure to work with you! |
@eiffel-fl |
Hi!
Sadly no, I will mail you on your Best regards and enjoy OSS :D! |
Hello and thank you for your work! There was already a discussion in the thread above: #3782 (comment) |
@alebsys Feel free to make a contribution! |
It is based on Brendan Gregg's BCC profile and much of the code for it is borrowed from offcputime.c and ruqlen.c. This is an example of usage. # ./profile -iK Sampling at 49 Hertz of all threads by kernel stack... Hit Ctrl-C to end. ^C _raw_spin_unlock_irq finish_task_switch __schedule schedule schedule_timeout msleep HDMI21_Tx_EARC_MainThread kthread ret_from_fork - hdmi21_earc_eng (1427) 1 __pi___clean_dcache_area_poc dma_direct_map_page dma_direct_map_sg dma_buf_unified_map dma_buf_map_attachment kbase_mem_umm_map_attachment kbase_mem_import kbase_ioctl __arm64_compat_sys_ioctl el0_svc_common.constprop.0 el0_svc_compat_handler el0_svc_compat - surface-manager (1911) 4 __pi___clean_dcache_area_poc dma_direct_map_page dma_direct_map_sg dma_buf_unified_map dma_buf_map_attachment kbase_mem_umm_map_attachment kbase_mem_import kbase_ioctl __arm64_compat_sys_ioctl el0_svc_common.constprop.0 el0_svc_compat_handler el0_svc_compat - tAiToneService (2141) 12 [Missed Kernel Stack] - WebAppMgr (3624) 1 ... This is the result of using the -f (folded) option. # ./profile -fdK ... tCMState;low_mem_notify_threshold;__alloc_pages_nodemask;__get_free_pages;__pollwait;unix_poll;sock_poll;do_sys_poll;__arm64_sys_poll;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 2 surface-manager;sock_poll;do_sys_poll;__arm64_sys_poll;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 1 qml-runner;sock_poll;do_sys_poll;__arm64_sys_poll;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 96 mali-cmar-backe;_raw_spin_unlock_irqrestore;kbase_pm_update_cores_state;kbase_pm_do_poweron;kbase_pm_update_active;kbase_hwaccess_pm_gpu_active;kbase_pm_context_active_handle_suspend;kbase_js_sched;kbase_jd_submit;kbase_ioctl;__arm64_compat_sys_ioctl;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 2 surface-manager;__fget;__fget_light;__fdget;sockfd_lookup_light;__sys_sendmsg;__arm64_compat_sys_sendmsg;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 1 ... WARNING: 4 stack traces could not be displayed. Consider increasing --stack-storage-size. Signed-off-by: Eunseon Lee [email protected]
It is based on Brendan Gregg's BCC profile and much of the code for it is borrowed from offcputime.c and ruqlen.c.
I'm not sure if libbpf-tools requires a profile. So I'm sharing this just to get some feedback before I finish developing of this patch.
Could you give me your opinion? (if necessary or not)
Thank you.
This is an example of usage.
This is the result of using the -f (folded) option.
Signed-off-by: Eunseon Lee [email protected]