Li Bin [Mon, 5 Jun 2017 00:34:09 +0000 (08:34 +0800)]
perf symbols: Fix plt entry calculation for ARM and AARCH64
On x86, the plt header size is the same as the plt entry size, and can be
identified from the sh_entsize of the plt shdr.
But we can't assume that the sh_entsize of the plt shdr is always the
plt entry size on all architectures, and the plt header size may differ
from the plt entry size on some architectures.
On ARM, the plt header size is 20 bytes and the plt entry size is 12
bytes (ignoring the FOUR_WORD_PLT case), according to the binutils
implementation. The plt section is as follows:
Since commit 9aaf5a5f479b ("perf probe: Check kprobes blacklist when
adding new events"), 'perf probe' checks the blacklist of
functions which can not be probed. But the checking condition is wrong:
the end_addr of a symbol, which is the start_addr of the next
symbol, must not be included.
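The intended condition is a half-open interval test. The following is a minimal sketch rather than the perf source: struct blacklist_range and range_contains() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one blacklist entry covering [start, end). */
struct blacklist_range {
	unsigned long start;	/* first address of the symbol */
	unsigned long end;	/* first address past the symbol */
};

/* True only if addr lies inside the symbol; 'end' itself is excluded. */
static bool range_contains(const struct blacklist_range *r, unsigned long addr)
{
	assert(r->start <= r->end);
	return addr >= r->start && addr < r->end;
}
```

With this form, an address equal to 'end' is attributed to the next symbol and not to the blacklisted one.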
Committer notes:
IOW make it match its kernel counterpart in kernel/kprobes.c:
bool within_kprobe_blacklist(unsigned long addr)
Each entry has as its end address not its last address, but the first
address _outside_ that symbol, which for related functions, is the first
address of the next symbol, like these from kernel/trace/trace_probe.c:
Signed-off-by: Li Bin <huawei.libin@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: zhangmengting@huawei.com Fixes: 9aaf5a5f479b ("perf probe: Check kprobes blacklist when adding new events") Link: http://lkml.kernel.org/r/1504011443-7269-1-git-send-email-huawei.libin@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The tools/include/uapi/asm-generic/mman-common.h file is used to find
the access rights defines for the pkey_alloc syscall second argument.
Since we have the detector of changes for the tools/include header files
versus its kernel origin (include/uapi/asm-generic/mman-common.h), we'll
get whatever new flag appears for that argument automatically.
This method should be used in other cases where it is easy to generate
those flags tables because the header has properly namespaced defines
like PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-3xq5312qlks7wtfzv2sk3nct@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools headers: Sync cpu features kernel ABI headers with tooling headers
These changes made the tools/arch/x86/include/ headers drift from their
kernel origins:
910448bbed06 ("perf/x86/amd/uncore: Rename cpufeatures macro for cache counters")
5442c2699552 ("x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag")
cba4671af755 ("x86/mm: Disable PCID on 32-bit kernels")
Which was detected while building perf:
make: Entering directory '/home/acme/git/linux/tools/perf'
BUILD: Doing 'make -j4' parallel build
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
This sync causes just these perf object files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And the changes in the above changesets don't entail any need for change
in the above 'perf bench' files.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-456aafouj911a4x4zwt8stkm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Prior to this patch, make scripts tested for CLANG with ifeq ($(CC),
clang), failing to detect CLANG binaries with different names. Fix it by
testing for the existence of __clang__ macro in the list of compiler
defined macros.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20170827075442.108534-5-davidcc@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 24 Aug 2017 16:27:35 +0000 (18:27 +0200)]
perf values: Zero value buffers
We need to make sure the array of value pointers is zero initialized,
because we pass them to realloc later on, and an uninitialized non-zero
value will cause an allocation error and aborted execution.
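The reasoning can be sketched as follows. This is an illustration under assumptions, not the perf code: struct rows and the rows_*() helpers are hypothetical. It relies on calloc() zero-filling the pointer array, so that each slot is NULL and realloc() on it behaves like malloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container: one growable buffer per row. */
struct rows {
	int nr;
	int **value;	/* nr pointers, each grown later with realloc() */
};

/* calloc() zero-fills, so every value[i] starts out as NULL. */
static int rows_init(struct rows *r, int nr)
{
	assert(nr > 0);
	r->value = calloc(nr, sizeof(*r->value));
	if (!r->value)
		return -1;
	r->nr = nr;
	return 0;
}

/* realloc(NULL, size) acts like malloc(size); a garbage pointer would not. */
static int rows_grow(struct rows *r, int i, size_t count)
{
	int *tmp = realloc(r->value[i], count * sizeof(*tmp));

	if (!tmp)
		return -1;
	r->value[i] = tmp;
	return 0;
}

static void rows_free(struct rows *r)
{
	for (int i = 0; i < r->nr; i++)
		free(r->value[i]);
	free(r->value);
}
```

Had rows_init() used malloc() instead, rows_grow() would hand an indeterminate pointer to realloc(), which is undefined behaviour.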
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170824162737.7813-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 24 Aug 2017 16:27:32 +0000 (18:27 +0200)]
perf report: Add dump_read function
Adding dump_read function to gather all the dump output of read
function. Adding output of enabled and running times and id if enabled
(3 new lines with '...' prefix below).
Do not use 'read' as a variable name as it breaks the build on older
systems, such as RHEL6:
CC /tmp/build/perf/util/session.o
cc1: warnings being treated as errors
util/session.c: In function 'dump_read':
util/session.c:1132: error: declaration of 'read' shadows a global declaration
/usr/include/bits/unistd.h:35: error: shadowed declaration is here
mv: cannot stat `/tmp/build/perf/util/.session.o.tmp': No such file or directory
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170824162737.7813-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 24 Aug 2017 08:57:32 +0000 (10:57 +0200)]
perf c2c: Fix remote HITM detection for Skylake
Skylake introduced a new mem_remote bit in union perf_mem_data_src [1].
It applies to any other memory level, to express a remote access of
unknown level as reported by Skylake.
Add this extra check to c2c_decode_stats to properly decode remote
HITMs on Skylake.
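The shape of the extra check can be sketched like this. It is a simplified illustration, not the c2c_decode_stats code: struct sample_src and its field names are hypothetical stand-ins for the decoded data source, and only the mem_remote name comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a decoded sample's data source. */
struct sample_src {
	bool lvl_remote_cache;	/* older style: the level itself says "remote" */
	bool mem_remote;	/* separate remote bit, as reported by Skylake */
	bool snoop_hitm;	/* snoop hit a modified line */
};

/* A remote HITM is a HITM that either encoding marks as remote. */
static bool is_remote_hitm(const struct sample_src *s)
{
	assert(s != NULL);
	return s->snoop_hitm && (s->lvl_remote_cache || s->mem_remote);
}
```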
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170824085732.28481-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 25 Aug 2017 18:45:10 +0000 (15:45 -0300)]
perf tools: Fix static build with newer toolchains
We can't pass --dynamic-list into a static build anymore, because
toolchains have started to complain about it. Fedora 26 started to fail
the build with the following error:
$ make LDFLAGS=-static
...
/usr/bin/ld: dynamic STT_GNU_IFUNC symbol `strcmp' with pointer equality in `/usr/lib/gcc/x86_64-redhat-linux/7/../../../../lib64/libc.a(strcmp.o)' can not be used when making an executable; recompile with -fPIE and relink with -pie
--dynamic-list makes no sense in a static build, because there's no
.dynsym table in a static binary. Consequently the traceevent plugins have
never worked with a static build, but this went unnoticed.
To fix this in the future I think we should add support for compiling
plugins directly into the perf binary for static builds.
Andi Kleen [Tue, 22 Aug 2017 18:52:01 +0000 (11:52 -0700)]
perf/x86: Export some PMU attributes in caps/ directory
It can be difficult for user programs to figure out what features
the x86 CPU PMU driver actually supports. Currently it requires
grepping in dmesg, but dmesg is not always available.
This adds a caps directory to /sys/bus/event_source/devices/cpu/,
similar to the caps already used on intel_pt, which can be used to
discover the available capabilities cleanly.
Three capabilities are defined:
- pmu_name: Underlying CPU name known to the driver
- max_precise: Max precise level supported
- branches: Known depth of LBR.
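A user program could probe these files roughly as follows. This is a minimal sketch: read_cap() is a hypothetical helper, and the only facts taken from the description above are the caps directory location and the three file names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one capability file into buf; returns 0 on success, -1 if unavailable. */
static int read_cap(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* e.g. a kernel without the caps directory */
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}
```

A caller would pass, for example, "/sys/bus/event_source/devices/cpu/caps/max_precise" and fall back to a conservative default when read_cap() returns -1.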
Andi Kleen [Tue, 22 Aug 2017 18:52:00 +0000 (11:52 -0700)]
perf/x86: Only show format attributes when supported
Only show the Intel format attributes in sysfs when the feature is actually
supported with the current model numbers. This allows programs to probe
what format attributes are available, and give a sensible error message
to users if they are not.
This handles nearly all cases for Intel attributes since Nehalem,
except the (obscure) case when the model number is known, but PEBS
is disabled in PERF_CAPABILITIES.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170822185201.9261-2-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
tracing, perf: Adjust code layout in get_recursion_context()
In an XDP redirect application using tracepoint xdp:xdp_redirect to
diagnose TX overrun, I noticed perf_swevent_get_recursion_context()
was consuming 2% CPU. This was reduced to 1.85% with this simple
change.
Looking at the annotated asm code, it was clear that the unlikely case
in_nmi() test was chosen (by the compiler) as the most likely
event/branch. This small adjustment makes the compiler (GCC version
7.1.1 20170622 (Red Hat 7.1.1-3)) put in_nmi() as an unlikely branch.
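One common way to steer the compiler's branch layout in C is a __builtin_expect() hint, sketched below. This is an illustration only: the exact form of the adjustment is not shown in this message, struct ctx is a hypothetical stand-in for the in_nmi()/in_irq()/in_softirq() tests, and the slot numbers are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang builtin: tell the compiler the condition is rarely true. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical context flags standing in for the kernel's context tests. */
struct ctx {
	int nmi, irq, softirq;
};

/* Map the execution context to a recursion slot (values are illustrative). */
static int recursion_slot(const struct ctx *c)
{
	assert(c != NULL);
	if (unlikely(c->nmi))
		return 3;
	if (c->irq)
		return 2;
	if (c->softirq)
		return 1;
	return 0;
}
```

The hint does not change the result, only which path the compiler lays out as the fall-through.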
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/150342256382.16595.986861478681783732.stgit@firesoul Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Tue, 22 Aug 2017 15:59:28 +0000 (17:59 +0200)]
perf/core: Don't report zero PIDs for exiting tasks
An exiting/dead task has no PIDs, and in this case perf_event_pid/tid()
return zero. Change them to return -1 to distinguish this case from
idle threads.
Andi Kleen [Wed, 16 Aug 2017 22:21:54 +0000 (15:21 -0700)]
perf/x86: Fix data source decoding for Skylake
Skylake changed the encoding of the PEBS data source field.
Some combinations are not available anymore, but some new cases,
e.g. for L4 cache hit, are added.
Fix up the conversion table for Skylake, similar to what was done
for Nehalem.
On Skylake server the encoding for L4 actually means persistent
memory. Handle this case too.
To properly describe it in the abstracted perf format I had to add
some new fields. Since a hit can have only one level add a new
field that is an enumeration, not a bit field to describe
the level. It can describe any level. Some numbers are also
used to describe PMEM and LFB.
Also add a new generic remote flag that can be combined with
the generic level to signify a remote cache.
And there is an extension field for the snoop indication to handle
the Forward state.
I didn't add a generic flag for hops because it's not needed
for Skylake.
I changed the existing encodings for older CPUs to also fill in the
new level and remote fields.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: http://lkml.kernel.org/r/20170816222156.19953-3-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Wed, 16 Aug 2017 22:21:53 +0000 (15:21 -0700)]
perf/x86: Move Nehalem PEBS code to flag
Minor cleanup: use an explicit x86_pmu flag to handle the
missing Lock / TLB information on Nehalem, instead of always
checking the model number for each PEBS sample.
Will Deacon [Wed, 16 Aug 2017 16:18:17 +0000 (17:18 +0100)]
perf/aux: Ensure aux_wakeup represents most recent wakeup index
The aux_watermark member of struct ring_buffer represents the period (in
terms of bytes) at which wakeup events should be generated when data is
written to the aux buffer in non-snapshot mode. On hardware that cannot
generate an interrupt when the aux_head reaches an arbitrary wakeup index
(such as ARM SPE), the aux_head sampled from handle->head in
perf_aux_output_{skip,end} may in fact be past the wakeup index. This
can lead to wakeup slowly falling behind the head. For example, consider
the case where hardware can only generate an interrupt on a page-boundary
and the aux buffer is initialised as follows:
and the hardware will be programmed to generate an interrupt at
PAGE_SIZE.
When the interrupt is raised, the hardware head will be at PAGE_SIZE,
so calling perf_aux_output_end(handle, PAGE_SIZE) puts the ring buffer
into the following state:
and then the next call to perf_aux_output_begin will result in:
handle->head = handle->wakeup = PAGE_SIZE
for which the semantics are unclear and, for a smaller aux_watermark
(e.g. PAGE_SIZE / 4), then the wakeup would in fact be behind head at
this point.
This patch fixes the problem by rounding down the aux_head (as sampled
from the handle) to the nearest aux_watermark boundary when updating
rb->aux_wakeup, therefore taking into account any overruns by the
hardware.
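The rounding itself is simple arithmetic. A minimal sketch, assuming non-negative offsets and a non-zero watermark; wakeup_index() is a hypothetical name, not a kernel function:

```c
#include <assert.h>

/* Largest multiple of 'watermark' that is less than or equal to 'head'. */
static long wakeup_index(long head, long watermark)
{
	assert(head >= 0 && watermark > 0);
	return head - (head % watermark);
}
```

For instance, with a watermark of 1024 bytes and a head that has overrun to 4196, the recorded wakeup index is 4096, so the next wakeup is still due at 5120 rather than drifting with the overrun.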
Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1502900297-21839-2-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Will Deacon [Wed, 16 Aug 2017 16:18:16 +0000 (17:18 +0100)]
perf/aux: Make aux_{head,wakeup} ring_buffer members long
The aux_head and aux_wakeup members of struct ring_buffer are defined
using the local_t type, despite the fact that they are only accessed via
the perf_aux_output_*() functions, which cannot race with each other for a
given ring buffer.
This patch changes the type of the members to long, so we can avoid
using the local_*() API where it isn't needed.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1502900297-21839-1-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mark Rutland [Thu, 22 Jun 2017 14:41:38 +0000 (15:41 +0100)]
perf/core: Fix group {cpu,task} validation
Regardless of which events form a group, it does not make sense for the
events to target different tasks and/or CPUs, as this leaves the group
inconsistent and impossible to schedule. The core perf code assumes that
these are consistent across (successfully initialised) groups.
Core perf code only verifies this when moving SW events into a HW
context. Thus, we can violate this requirement for pure SW groups and
pure HW groups, unless the relevant PMU driver happens to perform this
verification itself. These mismatched groups subsequently wreak havoc
elsewhere.
For example, we handle watchpoints as SW events, and reserve watchpoint
HW on a per-CPU basis at pmu::event_init() time to ensure that any event
that is initialised is guaranteed to have a slot at pmu::add() time.
However, the core code only checks the group leader's cpu filter (via
event_filter_match()), and can thus install follower events onto CPUs
violating their (mismatched) CPU filters, potentially installing them
into a CPU without sufficient reserved slots.
This can be triggered with the below test case, resulting in warnings
from arch backends.
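#define _GNU_SOURCE
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <sched.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open(), so go through syscall(). */
static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			   int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

char watched_char;

/* Read/write watchpoint on a single byte. */
struct perf_event_attr wp_attr = {
	.type = PERF_TYPE_BREAKPOINT,
	.bp_type = HW_BREAKPOINT_RW,
	.bp_addr = (unsigned long)&watched_char,
	.bp_len = 1,
	.size = sizeof(wp_attr),
};

int main(int argc, char *argv[])
{
	int leader, ret;
	cpu_set_t cpus;

	/*
	 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
	 */
	CPU_ZERO(&cpus);
	CPU_SET(0, &cpus);
	ret = sched_setaffinity(0, sizeof(cpus), &cpus);
	if (ret) {
		printf("Unable to set cpu affinity\n");
		return 1;
	}

	/* open leader event, bound to this task, CPU0 only */
	leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
	if (leader < 0) {
		printf("Couldn't open leader: %d\n", leader);
		return 1;
	}

	/*
	 * Open a follower event that is bound to the same task, but a
	 * different CPU. This means that the group should never be possible to
	 * schedule.
	 */
	ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
	if (ret < 0) {
		printf("Couldn't open mismatched follower: %d\n", ret);
		return 1;
	} else {
		printf("Opened leader/follower with mismatched CPUs\n");
	}

	/*
	 * Open as many independent events as we can, all bound to the same
	 * task, CPU0 only.
	 */
	do {
		ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
	} while (ret >= 0);

	/*
	 * Force enable/disable all events to trigger the erroneous
	 * installation of the follower event.
	 */
	printf("Opened all events. Toggling..\n");
	for (;;) {
		prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
		prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
	}

	return 0;
}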
Fix this by validating this requirement regardless of whether we're
moving events.
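The requirement being validated can be sketched as a plain comparison between a follower and its group leader. This is not the core perf code: struct event is a hypothetical, reduced stand-in that keeps only the target CPU and task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced event: only the fields the check needs. */
struct event {
	int cpu;		/* target CPU, -1 for "any CPU" */
	const void *task;	/* target task, NULL for a per-CPU event */
};

/* A follower may join a group only if it targets the same task and CPU. */
static bool group_target_matches(const struct event *leader,
				 const struct event *follower)
{
	assert(leader != NULL && follower != NULL);
	return leader->cpu == follower->cpu && leader->task == follower->task;
}
```

Applied unconditionally at group creation, such a test rejects the mismatched follower regardless of whether events are being moved between contexts.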
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhou Chengming <zhouchengming1@huawei.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Thu, 24 Aug 2017 22:48:38 +0000 (15:48 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull more rdma fixes from Doug Ledford:
"Well, I thought we were going to be done for this -rc cycle. I should
have known better than to say so though.
We have four additional items that trickled in.
One was a simple mistake on my part. I took a patch into my for-next
thinking that the issue was less severe than it was. I was then
notified that it needed to be in my -rc area instead.
The other three were just found late in testing.
Summary:
- One core fix accidentally applied first to for-next and then cherry
picked back because it needed to be in the -rc cycles instead
- Another core fix
- Two mlx5 fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/mlx5: Always return success for RoCE modify port
IB/mlx5: Fix Raw Packet QP event handler assignment
IB/core: Avoid accessing non-allocated memory when inferring port type
RDMA/uverbs: Initialize cq_context appropriately
Linus Torvalds [Thu, 24 Aug 2017 21:56:20 +0000 (14:56 -0700)]
Merge tag 'acpi-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two recent regressions (in ACPICA and in the ACPI EC driver)
and one bug in code introduced during the 4.12 cycle (ACPI device
properties library routine).
Specifics:
- Fix a regression in the ACPI EC driver causing a kernel to crash
during initialization on some systems due to a code ordering issue
exposed by a recent change (Lv Zheng).
- Fix a recent regression in ACPICA due to a change of the behavior
of a library function in a way that is not backwards compatible
with some existing callers of it (Rafael Wysocki).
- Fix a coding mistake in a library function related to the handling
of ACPI device properties introduced during the 4.12 cycle (Sakari
Ailus)"
* tag 'acpi-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: device property: Fix node lookup in acpi_graph_get_child_prop_value()
ACPICA: Fix acpi_evaluate_object_typed()
ACPI: EC: Fix regression related to wrong ECDT initialization order
Linus Torvalds [Thu, 24 Aug 2017 21:22:27 +0000 (14:22 -0700)]
Merge tag 'kbuild-fixes-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix linker script regression caused by dead code elimination support
- fix typos and outdated comments
- specify kselftest-clean as a PHONY target
- fix "make dtbs_install" when $(srctree) includes shell special
characters like '~'
- Move -fshort-wchar to the global option list because defining it
partially emits warnings
* tag 'kbuild-fixes-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: update comments of Makefile.asm-generic
kbuild: Do not use hyphen in exported variable name
Makefile: add kselftest-clean to PHONY target list
Kbuild: use -fshort-wchar globally
fixdep: trivial: typo fix and correction
kbuild: trivial cleanups on the comments
kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured
Linus Torvalds [Thu, 24 Aug 2017 21:08:22 +0000 (14:08 -0700)]
Merge tag 'trace-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various bug fixes:
- Two small memory leaks in error paths.
- A missed return error code on an error path.
- A fix to check the tracing ring buffer CPU when it doesn't exist
(caused by setting maxcpus on the command line that is less than
the actual number of CPUs, and then onlining them manually).
- A fix to have the reset of boot tracers called by lateinit_sync()
instead of just lateinit(). As some of the tracers register via
lateinit(), and if the clear happens before the tracer is
registered, it will never start even though it was told to via the
kernel command line"
* tag 'trace-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix freeing of filter in create_filter() when set_str is false
tracing: Fix kmemleak in tracing_map_array_free()
ftrace: Check for null ret_stack on profile function graph entry function
ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU
tracing: Missing error code in tracer_alloc_buffers()
tracing: Call clear_boot_tracer() at lateinit_sync
Linus Torvalds [Thu, 24 Aug 2017 21:01:18 +0000 (14:01 -0700)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"A small number of bugfixes, again nothing serious.
- Alexander Dahl found multiple bugs in the Atmel memory interface
driver
- A randconfig build fix for at91 was incomplete, the second attempt
fixes the remaining corner case
- One fix for the TI Keystone queue handler
- The Odroid XU4 HDMI port (added in 4.13) needs a small DT fix"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: exynos: add needs-hpd for Odroid-XU3/4
ARM: at91: don't select CONFIG_ARM_CPU_SUSPEND for old platforms
soc: ti: knav: Add a NULL pointer check for kdev in knav_pool_create
memory: atmel-ebi: Fix smc cycle xlate converter
memory: atmel-ebi: Allow t_DF timings of zero ns
memory: atmel-ebi: Fix smc timing return value evaluation
When /dev/ptmx (as opposed to /dev/pts/ptmx) is opened, the wrong
vfsmount is passed to dentry_open, which results in the kernel displaying
the wrong pathname for the peer.
The second is simply that caching the vfsmount and dentry of the peer leaves
them open in a way they were not previously, which, because of the increased
reference counts, can cause unnecessary behaviour differences resulting in
regressions.
To fix these move the ioctl into tty_io.c at a generic level allowing
the ioctl to have access to the struct file on which the ioctl is
being called. This allows the path of the slave to be derived when
opening the slave through TIOCGPTPEER instead of requiring the path to
the slave be cached. Thus removing the need for caching the path.
A new function devpts_ptmx_path is factored out of devpts_acquire and
used to implement a function devpts_mntget. The new function devpts_mntget
takes a filp to perform the lookup on and fsi so that it can confirm
that the superblock that is found by devpts_ptmx_path is the proper superblock.
v2: Lots of fixes to make the code actually work
v3: Suggestions by Linus
- Removed the unnecessary initialization of filp in ptm_open_peer
- Simplified devpts_ptmx_path as gotos are no longer required
[ This is the fix for the issue that was reverted in commit 143c97cc6529, but this time without breaking 'pbuilder' due to
increased reference counts - Linus ]
Fixes: 54ebbfb16034 ("tty: add TIOCGPTPEER ioctl") Reported-by: Christian Brauner <christian.brauner@canonical.com> Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Majd Dibbiny [Wed, 23 Aug 2017 05:35:42 +0000 (08:35 +0300)]
IB/mlx5: Always return success for RoCE modify port
CM layer calls ib_modify_port() regardless of the link layer.
For the Ethernet ports, qkey violation and Port capabilities
are meaningless. Therefore, always return success for ib_modify_port
calls on the Ethernet ports.
Cc: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Noa Osherovich [Wed, 23 Aug 2017 05:35:40 +0000 (08:35 +0300)]
IB/core: Avoid accessing non-allocated memory when inferring port type
Commit 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
introduced the concept of type in ah_attr:
* During ib_register_device, each port is checked for its type which
is stored in ib_device's port_immutable array.
* During uverbs' modify_qp, the type is inferred using the port number
in ib_uverbs_qp_dest struct (address vector) by accessing the
relevant port_immutable array and the type is passed on to
providers.
IB spec (version 1.3) enforces a valid port value only in Reset to
Init. During Init to RTR, the address vector must be valid but port
number is not mentioned as a field in the address vector, so its
value is not validated, which leads to accesses to non-allocated
memory when inferring the port type.
Save the real port number in ib_qp during modify to Init (when the
comp_mask indicates that the port number is valid) and use this value
to infer the port type.
Avoid copying the address vector fields if the matching bit is not set
in the attr_mask. Address vector can't be modified before the port, so
no valid flow is affected.
Omar Sandoval [Wed, 23 Aug 2017 06:45:59 +0000 (23:45 -0700)]
Btrfs: fix blk_status_t/errno confusion
This fixes several instances of blk_status_t and bare errno ints being
mixed up, some of which are real bugs.
In the normal case, 0 matches BLK_STS_OK, so we don't observe any
effects of the missing conversion, but in case of errors or passes
through the repair/retry paths, the errors get mixed up.
The changes were identified using 'sparse'; we don't have reports of the
buggy behaviour.
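The confusion is easy to see with a small model of the two encodings. The sketch below is illustrative only: sts_t, its values, and the two conversion helpers are hypothetical and do not reproduce the block layer's definitions. It shows that only the success value coincides, so errors must be converted explicitly at every boundary.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical status type, kept distinct from negative errno ints. */
typedef enum { STS_OK = 0, STS_NOMEM = 9, STS_IOERR = 10 } sts_t;

/* Success is 0 in both encodings, which is why the bug hides in the normal case. */
static_assert(STS_OK == 0, "success must be 0 in both encodings");

static int sts_to_errno(sts_t s)
{
	switch (s) {
	case STS_OK:
		return 0;
	case STS_NOMEM:
		return -ENOMEM;
	default:
		return -EIO;
	}
}

static sts_t errno_to_sts(int err)
{
	switch (err) {
	case 0:
		return STS_OK;
	case -ENOMEM:
		return STS_NOMEM;
	default:
		return STS_IOERR;
	}
}
```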
Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
The function create_filter() is passed a 'filterp' pointer that gets
allocated, and if "set_str" is true, it is up to the caller to free it, even
on error. The problem is that the pointer is not freed by create_filter()
when set_str is false. This is a bug, and it is not up to the caller to free
the filter on error if it doesn't care about the string.
Link: http://lkml.kernel.org/r/1502705898-27571-2-git-send-email-chuhu@redhat.com Cc: stable@vger.kernel.org Fixes: 38b78eb85 ("tracing: Factorize filter creation") Reported-by: Chunyu Hu <chuhu@redhat.com> Tested-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
ftrace: Check for null ret_stack on profile function graph entry function
There's a small race between the shutdown of function graph and the calling of the
registered function graph entry callback. The callback must not reference
the task's ret_stack without first checking that it is not NULL. Note, when
a ret_stack is allocated for a task, it stays allocated until the task exits.
The problem here is that function_graph is shut down, and a new task is
created, which doesn't have its ret_stack allocated. But since some of the
functions are still being traced, the callbacks can still be called.
The normal function_graph code handles this, but starting with commit 8861dd303c ("ftrace: Access ret_stack->subtime only in the function
profiler") the profiler code references the ret_stack on function entry, but
doesn't check if it is NULL first.
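The missing guard has roughly the following shape. This is a sketch under assumptions, not the ftrace profiler code: struct task, its fields, and profile_entry() are hypothetical and reduced to what the check needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced task: only the return stack matters here. */
struct task {
	unsigned long long *ret_stack;	/* NULL until allocated for this task */
	int depth;			/* index of the current entry */
};

/*
 * Entry callback sketch: returns 1 if the entry was recorded, 0 if it
 * was skipped because the task has no return stack.
 */
static int profile_entry(struct task *t, unsigned long long now)
{
	assert(t != NULL);
	if (!t->ret_stack)
		return 0;	/* tracer shutting down, task created afterwards */
	t->ret_stack[t->depth] = now;
	return 1;
}
```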
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196611 Cc: stable@vger.kernel.org Fixes: 8861dd303c ("ftrace: Access ret_stack->subtime only in the function profiler") Reported-by: lilydjwg@gmail.com Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Ingo Molnar [Thu, 24 Aug 2017 08:12:59 +0000 (10:12 +0200)]
Merge tag 'perf-core-for-mingo-4.14-20170823' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- Expression parser enhancements for metrics (Andi Kleen)
- Fix buffer overflow while freeing events in 'perf stat' (Andi Kleen)
- Fix static linking with elfutils's libdw and with libunwind
in Debian/Ubuntu (Konstantin Khlebnikov)
- Tighten detection of BPF events, avoiding matching some other PMU
events such as 'cpu/uops_executed.core,cmask=1/' as a .c source
file that ended up being considered a BPF event (Andi Kleen)
- Add Skylake server uncore JSON vendor events (Andi Kleen)
- Add support for printing new mem_info encodings, including
'perf test' checks (Andi Kleen)
- Really install manpages via 'make install-man' (Konstantin Khlebnikov)
- Fix documentation for perf_event_paranoid and perf_event_mlock_kb
sysctls (Konstantin Khlebnikov)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
It turns out that while fixing the ptmx file descriptor to have the
correct 'struct path' to the associated slave pty is a really good
thing, it breaks some user space tools for a very annoying reason.
The problem is that /dev/ptmx and its associated slave pty (/dev/pts/X)
are on different mounts. That was what caused us to have the wrong path
in the first place (we would mix up the vfsmount of the 'ptmx' node,
with the dentry of the pty slave node), but it also means that now while
we use the right vfsmount, having the pty master open also keeps the pts
mount busy.
And it turns out that that makes 'pbuilder' very unhappy, as noted by
Stefan Lippers-Hollmann:
"This patch introduces a regression for me when using pbuilder
0.228.7[2] (a helper to build Debian packages in a chroot and to
create and update its chroots) when trying to umount /dev/ptmx (inside
the chroot) on Debian/unstable (full log and pbuilder configuration
file[3] attached).
[...]
Setting up build-essential (12.3) ...
Processing triggers for libc-bin (2.24-15) ...
I: unmounting dev/ptmx filesystem
W: Could not unmount dev/ptmx: umount: /var/cache/pbuilder/build/1340/dev/ptmx: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)"
Apparently pbuilder tries to unmount the /dev/pts filesystem while still
holding at least one master node open, which is arguably not very nice,
but we don't break user space even when fixing other bugs.
So this commit has to be reverted.
I'll try to figure out a way to avoid caching the path to the slave pty
in the master pty. The only thing that actually wants that slave pty
path is the "TIOCGPTPEER" ioctl, and I think we could just recreate the
path at that time.
Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Cc: Eric W Biederman <ebiederm@xmission.com> Cc: Christian Brauner <christian.brauner@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hans Verkuil [Wed, 23 Aug 2017 16:24:50 +0000 (18:24 +0200)]
ARM: dts: exynos: add needs-hpd for Odroid-XU3/4
CEC support was added for Exynos5 in 4.13, but for the Odroids we need to set
'needs-hpd' as well since CEC is disabled when there is no HDMI hotplug signal,
just as for the exynos4 Odroid-U3.
This is due to the level-shifter that is disabled when there is no HPD, thus
blocking the CEC signal as well. Same close-but-no-cigar board design as the
Odroid-U3.
Tested with my Odroid XU4.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Linus Torvalds [Wed, 23 Aug 2017 19:05:46 +0000 (12:05 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Late arm64 fixes.
They fix very early boot failures with KASLR where the early mapping
of the kernel is incorrect, so the failure mode looks like a hang with
no output. There's also a signal-handling fix when a uaccess routine
faults with a fatal signal pending, which could be used to create
unkillable user tasks using userfaultfd and finally a state leak fix
for the floating point registers across a call to exec().
We're still seeing some random issues crop up (inode memory corruption
and spinlock recursion) but we've not managed to reproduce things
reliably enough to debug or bisect them yet.
Summary:
- Fix very early boot failures with KASLR enabled
- Fix fatal signal handling on userspace access from kernel
- Fix leakage of floating point register state across exec()"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kaslr: Adjust the offset to avoid Image across alignment boundary
arm64: kaslr: ignore modulo offset when validating virtual displacement
arm64: mm: abort uaccess retries upon fatal signal
arm64: fpsimd: Prevent registers leaking across exec
Linus Torvalds [Wed, 23 Aug 2017 18:43:38 +0000 (11:43 -0700)]
Merge tag 'gpio-v4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Here are the (hopefully) last GPIO fixes for v4.13:
- an important core fix to reject invalid GPIOs *before* trying to
obtain a GPIO descriptor for it.
- a driver fix for the mvebu driver IRQ handling"
* tag 'gpio-v4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: mvebu: Fix cause computation in irq handler
gpio: reject invalid gpio before getting gpio_desc
Linus Torvalds [Wed, 23 Aug 2017 18:34:40 +0000 (11:34 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six minor and error leg fixes, plus one major change: the reversion of
scsi-mq as the default.
We're doing the latter temporarily (with a backport to stable) to give
us time to fix all the issues that turned up with this default before
trying again"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: cxgb4i: call neigh_event_send() to update MAC address
Revert "scsi: default to scsi-mq"
scsi: sd_zbc: Write unlock zone from sd_uninit_cmnd()
scsi: aacraid: Fix out of bounds in aac_get_name_resp
scsi: csiostor: fail probe if fw does not support FCoE
scsi: megaraid_sas: fix error handle in megasas_probe_one
Arnd Bergmann [Wed, 23 Aug 2017 14:46:15 +0000 (16:46 +0200)]
ARM: at91: don't select CONFIG_ARM_CPU_SUSPEND for old platforms
My previous patch fixed a link error for all at91 platforms when
CONFIG_ARM_CPU_SUSPEND was not set, however this caused another
problem on a configuration that enabled CONFIG_ARCH_AT91 but none
of the individual SoCs, and that also enabled CPU_ARM720 as
the only CPU:
warning: (ARCH_AT91 && SOC_IMX23 && SOC_IMX28 && ARCH_PXA && MACH_MVEBU_V7 && SOC_IMX6 && ARCH_OMAP3 && ARCH_OMAP4 && SOC_OMAP5 && SOC_AM33XX && SOC_DRA7XX && ARCH_EXYNOS3 && ARCH_EXYNOS4 && EXYNOS5420_MCPM && EXYNOS_CPU_SUSPEND && ARCH_VEXPRESS_TC2_PM && ARM_BIG_LITTLE_CPUIDLE && ARM_HIGHBANK_CPUIDLE && QCOM_PM) selects ARM_CPU_SUSPEND which has unmet direct dependencies (ARCH_SUSPEND_POSSIBLE)
arch/arm/kernel/sleep.o: In function `cpu_resume':
(.text+0xf0): undefined reference to `cpu_arm720_suspend_size'
arch/arm/kernel/suspend.o: In function `__cpu_suspend_save':
suspend.c:(.text+0x134): undefined reference to `cpu_arm720_do_suspend'
This improves the hack some more by only selecting ARM_CPU_SUSPEND
for the part that requires it, and changing pm.c to drop the
contents of unused init functions so we no longer refer to
cpu_resume on at91 platforms that don't need it.
Sakari Ailus [Tue, 22 Aug 2017 20:39:58 +0000 (23:39 +0300)]
ACPI: device property: Fix node lookup in acpi_graph_get_child_prop_value()
acpi_graph_get_child_prop_value() is intended to find a child node with a
certain property value pair. The check
if (!fwnode_property_read_u32(fwnode, prop_name, &nr))
continue;
is faulty: fwnode_property_read_u32() returns zero on success, not on
failure, leading to comparing values only if the searched property was not
found.
Moreover, the check is made against the parent device node instead of
the child one as it should be.
Fixes: 79389a83bc38 (ACPI / property: Add support for remote endpoints) Reported-by: Hyungwoo Yang <hyungwoo.yang@intel.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: 4.12+ <stable@vger.kernel.org> # 4.12+
[ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
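A minimal sketch of the corrected lookup, assuming a hypothetical node type and a hypothetical read_u32() helper that follows the same convention the text describes for fwnode_property_read_u32() (zero on success, nonzero on failure). It is not the ACPI code itself. The read is done on each child rather than on the parent, and a nonzero return skips the child.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node carrying a single named u32 property. */
struct node {
	const char *prop_name;
	uint32_t value;
};

/* Hypothetical reader: returns 0 on success, nonzero if the property is absent. */
static int read_u32(const struct node *n, const char *name, uint32_t *out)
{
	if (!n->prop_name || strcmp(n->prop_name, name) != 0)
		return -1;
	*out = n->value;
	return 0;
}

/* Find the first child whose property 'name' equals 'wanted'. */
static const struct node *find_child(const struct node *children, size_t count,
				     const char *name, uint32_t wanted)
{
	for (size_t i = 0; i < count; i++) {
		uint32_t nr;

		/* Nonzero means the property was not found: skip this child. */
		if (read_u32(&children[i], name, &nr))
			continue;
		if (nr == wanted)
			return &children[i];
	}
	return NULL;
}
```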
Commit 2d2a954375a0 (ACPICA: Update two error messages to emit
control method name) causes acpi_evaluate_object_typed() to fail
if its pathname argument is NULL, but some callers of that function
in the kernel, particularly acpi_nondev_subnode_data_ok(), pass
NULL as pathname to it and expect it to work.
For this reason, make acpi_evaluate_object_typed() check if its
pathname argument is NULL and fall back to using the pathname of
its handle argument if that is the case.
Reported-by: Sakari Ailus <sakari.ailus@intel.com> Tested-by: Yang, Hyungwoo <hyungwoo.yang@intel.com> Fixes: 2d2a954375a0 (ACPICA: Update two error messages to emit control method name) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Bharat Potnuri [Tue, 1 Aug 2017 05:28:35 +0000 (10:58 +0530)]
RDMA/uverbs: Initialize cq_context appropriately
Initializing cq_context with ev_queue in create_cq() leads to a NULL pointer
dereference in ib_uverbs_comp_handler() if the application does not use a
completion channel. This patch fixes the cq_context initialization.
Fixes: 1e7710f3f65 ("IB/core: Change completion channel to use the reworked") Cc: stable@vger.kernel.org # 4.12 Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 699a2d5b1b880b4e4e1c7d55fa25659322cf5b51)
Linus Torvalds [Tue, 22 Aug 2017 17:21:05 +0000 (10:21 -0700)]
Merge tag 'mfd-fixes-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD fix from Lee Jones:
"Revert duplicate commit in da9062-core"
* tag 'mfd-fixes-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
Revert "mfd: da9061: Fix to remove BBAT_CONT register from chip model"
Catalin Marinas [Tue, 22 Aug 2017 14:39:00 +0000 (15:39 +0100)]
arm64: kaslr: Adjust the offset to avoid Image across alignment boundary
With 16KB pages and a kernel Image larger than 16MB, the current
kaslr_early_init() logic for avoiding mappings across swapper table
boundaries fails since increasing the offset by kimg_sz just moves the
problem to the next boundary.
This patch rounds the offset down to (1 << SWAPPER_TABLE_SHIFT) if the
Image crosses a PMD_SIZE boundary.
Fixes: afd0e5a87670 ("arm64: kaslr: Fix up the kernel image alignment") Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
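A minimal sketch of the rounding described above, using plain integers in place of the kernel's _text and _end symbols. TABLE_SHIFT and adjust_offset() are hypothetical names, and the shift of 25 (32 MB windows) is picked only for illustration.

```c
#include <stdint.h>

/* Hypothetical window size: 1 << 25 = 32 MB. */
#define TABLE_SHIFT 25

/*
 * If the image [text, end] shifted by 'offset' would straddle a window
 * boundary, round the offset down to a multiple of the window size.
 * Assumes the unshifted image itself fits inside one window.
 */
static uint64_t adjust_offset(uint64_t text, uint64_t end, uint64_t offset)
{
	const uint64_t window = UINT64_C(1) << TABLE_SHIFT;

	if (((text + offset) >> TABLE_SHIFT) != ((end + offset) >> TABLE_SHIFT))
		offset &= ~(window - 1);
	return offset;
}
```

Rounding down, rather than adding the image size, keeps the image's position relative to the window unchanged, so a large image cannot be pushed onto the next boundary.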
Ard Biesheuvel [Fri, 18 Aug 2017 17:42:30 +0000 (18:42 +0100)]
arm64: kaslr: ignore modulo offset when validating virtual displacement
In the KASLR setup routine, we ensure that the early virtual mapping
of the kernel image does not cover more than a single table entry at
the level above the swapper block level, so that the assembler routines
involved in setting up this mapping can remain simple.
In this calculation we add the proposed KASLR offset to the values of
the _text and _end markers, and reject it if they would end up falling
in different swapper table sized windows.
However, when taking the addresses of _text and _end, the modulo offset
(the physical displacement modulo 2 MB) is already accounted for, and
so adding it again results in incorrect results. So disregard the modulo
offset from the calculation.
Mark Rutland [Tue, 11 Jul 2017 14:19:22 +0000 (15:19 +0100)]
arm64: mm: abort uaccess retries upon fatal signal
When there's a fatal signal pending, arm64's do_page_fault()
implementation returns 0. The intent is that we'll return to the
faulting userspace instruction, delivering the signal on the way.
However, if we take a fatal signal during fixing up a uaccess, this
results in a return to the faulting kernel instruction, which will be
instantly retried, resulting in the same fault being taken forever. As
the task never reaches userspace, the signal is not delivered, and the
task is left unkillable. While the task is stuck in this state, it can
inhibit the forward progress of the system.
To avoid this, we must ensure that when a fatal signal is pending, we
apply any necessary fixup for a faulting kernel instruction. Thus we
will return to an error path, and it is up to that code to make forward
progress towards delivering the fatal signal.
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Steve Capper <steve.capper@arm.com> Tested-by: Steve Capper <steve.capper@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: James Morse <james.morse@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Dave Martin [Fri, 18 Aug 2017 15:57:01 +0000 (16:57 +0100)]
arm64: fpsimd: Prevent registers leaking across exec
There are some tricky dependencies between the different stages of
flushing the FPSIMD register state during exec, and these can race
with context switch in ways that can cause the old task's regs to
leak across. In particular, a context switch during the memset() can
cause some of the task's old FPSIMD registers to reappear.
Disabling preemption for this small window would be no big deal for
performance: preemption is already disabled for similar scenarios
like updating the FPSIMD registers in sigreturn.
So, instead of rearranging things in ways that might swap existing
subtle bugs for new ones, this patch just disables preemption
around the FPSIMD state flushing so that races of this type can't
occur here. This brings fpsimd_flush_thread() into line with other
code paths.
Cc: stable@vger.kernel.org Fixes: 674c242c9323 ("arm64: flush FP/SIMD state correctly after execve()") Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* libunwind-x86_64 must be linked before libunwind
* libunwind requires liblzma
* static libunwind conflicts with static libgcc_eh
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/150322917247.129799.14247751517961953155.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Fix static linking with libdw from elfutils
Fix feature test for static libdw: link required dependencies. Backends
of libebl are not statically linked thus libdl is required.
In Debian/Ubuntu libdw-dev includes libebl.a starting from 0.166-1.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/150322916720.129772.7959925864494283854.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf: Fix documentation for sysctls perf_event_paranoid and perf_event_mlock_kb
Fix misprint CAP_IOC_LOCK -> CAP_IPC_LOCK. This capability has nothing
to do with raw tracepoints. This part is about bypassing mlock limits.
Sysctl kernel.perf_event_paranoid = -1 allows raw and ftrace function
tracepoints without CAP_SYS_ADMIN.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/150322916080.129746.11285255474738558340.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andi Kleen [Wed, 16 Aug 2017 22:21:55 +0000 (15:21 -0700)]
perf tools: Add support for printing new mem_info encodings
Add decoding for the new "lvlx" and "snoopx" meminfo fields added
earlier to the kernel so that "perf mem report" and other tools can
print it properly.
v2: Merge with persistent memory patch.
Switch to new bit encoding for each combination.
Andi Kleen [Fri, 10 Mar 2017 20:51:38 +0000 (12:51 -0800)]
perf vendor events: Add Skylake server uncore event list
Add JSON uncore events for Skylake Server to perf.
Based on JSON list V1.01
This is a much fuller list than with earlier uncores, including
more low level (but also harder to understand) events. It does not
include the "experimental" events. The previous
high level metrics (LLC_* etc.) are still available when applicable.
C state power events are not included at this point.
This is useful to fix up problems caused by multiplexing.
- Support | & ^ operators
- Minor cleanups and fixes
- Support a \ escape for operators. This allows specifying event names
like c2-residency
- Support @ as an alternative for / to be able to specify pmus without
conflicts with operators (like msr/tsc/ as msr@tsc@)
Andi Kleen [Fri, 11 Aug 2017 23:26:19 +0000 (16:26 -0700)]
perf bpf: Tighten detection of BPF events
perf stat -e cpu/uops_executed.core,cmask=1/
would be detected as a BPF source event because the .c matches the .c
source BPF pattern.
v2:
Originally I tried to use lex lookahead, but it doesn't seem to work.
This now extends the BPF pattern to match longer events, but then does
an extra check in the C code to reject BPF matches that do not end with
.c/.o/.obj
This uses REJECT, which makes the flex scanner slower, but that
shouldn't be a big problem for the perf events.
It continues doing what was expected, i.e. identifying
/home/acme/bpf/tracepoint.c as a BPF event and activates the clang
machinery to build an eBPF object and then uses sys_bpf() to hook it up
to the raw_syscalls:sys_enter tracepoint, etc.
Andi forgot to add Wang to the CC list, fix it.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20170811232634.30465-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
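A minimal sketch of the kind of extra check the v2 notes describe: after the scanner has matched a candidate, reject it unless it ends in .c, .o or .obj. The helper names are hypothetical and this is not the actual perf parser code.

```c
#include <stdbool.h>
#include <string.h>

/* True if 's' ends with 'suffix'. */
static bool has_suffix(const char *s, const char *suffix)
{
	size_t n = strlen(s), m = strlen(suffix);

	return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Hypothetical post-match filter: accept only BPF source or object names. */
static bool looks_like_bpf_file(const char *event)
{
	return has_suffix(event, ".c") ||
	       has_suffix(event, ".o") ||
	       has_suffix(event, ".obj");
}
```

With this filter, cpu/uops_executed.core,cmask=1/ is rejected because it ends in '/', while /home/acme/bpf/tracepoint.c is still accepted.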
Andi Kleen [Fri, 11 Aug 2017 23:26:17 +0000 (16:26 -0700)]
perf evsel: Fix buffer overflow while freeing events
Fix buffer overflow for:
% perf stat -e msr/tsc/,cstate_core/c7-residency/ true
that causes glibc free list corruption. For some reason it doesn't
trigger in valgrind, but it is visible in ASan:
=================================================================
==32681==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000003f5c at pc 0x0000005671ef bp 0x7ffdaaac9ac0 sp 0x7ffdaaac9ab0
READ of size 4 at 0x603000003f5c thread T0
#0 0x5671ee in perf_evsel__close_fd util/evsel.c:1196
#1 0x56c57a in perf_evsel__close util/evsel.c:1717
#2 0x55ed5f in perf_evlist__close util/evlist.c:1631
#3 0x4647e1 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:749
#4 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
#5 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
#6 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
#7 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
#8 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
#9 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
#10 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
#11 0x428419 in _start (/home/ak/hle/obj-perf/perf+0x428419)
0x603000003f5c is located 0 bytes to the right of 28-byte region [0x603000003f40,0x603000003f5c)
allocated by thread T0 here:
#0 0x7f0675139020 in calloc (/lib64/libasan.so.3+0xc7020)
#1 0x648a2d in zalloc util/util.h:23
#2 0x648a88 in xyarray__new util/xyarray.c:9
#3 0x566419 in perf_evsel__alloc_fd util/evsel.c:1039
#4 0x56b427 in perf_evsel__open util/evsel.c:1529
#5 0x56c620 in perf_evsel__open_per_thread util/evsel.c:1730
#6 0x461dea in create_perf_stat_counter /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:263
#7 0x4637d7 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:600
#8 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
#9 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
#10 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
#11 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
#12 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
#13 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
#14 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
The event is allocated with cpus == 1, but freed with cpus == the real number of CPUs.
When the evsel close function walks the file descriptors it exceeds the
fd xyarray boundaries and reads random memory.
v2:
Now that xyarrays save their original dimensions we can use these to
iterate the two dimensional fd arrays. Fix some users (close, ioctl) in
evsel.c to use these fields directly. This allows simplifying the code
and dropping quite a few function arguments. Adjust all callers by
removing the unneeded arguments.
The actual perf event reading still uses the original values from the
evsel list.
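A minimal sketch of the idea in the v2 notes: a two-dimensional array that records its own dimensions at allocation time, so that a later walk cannot exceed them. The type and function names are hypothetical and only loosely modelled on the xyarray mentioned above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical 2D fd array that remembers its dimensions. */
struct xyarray_sketch {
	size_t max_x, max_y;
	int *fds;
};

static struct xyarray_sketch *xy_new(size_t xlen, size_t ylen)
{
	struct xyarray_sketch *xy = calloc(1, sizeof(*xy));

	if (!xy)
		return NULL;
	xy->fds = calloc(xlen * ylen, sizeof(int));
	if (!xy->fds) {
		free(xy);
		return NULL;
	}
	xy->max_x = xlen;
	xy->max_y = ylen;
	for (size_t i = 0; i < xlen * ylen; i++)
		xy->fds[i] = -1;	/* -1 marks "no fd" */
	return xy;
}

/* Walk every slot using the stored dimensions, not caller-supplied ones. */
static size_t xy_close_all(struct xyarray_sketch *xy)
{
	size_t visited = 0;

	for (size_t x = 0; x < xy->max_x; x++) {
		for (size_t y = 0; y < xy->max_y; y++) {
			xy->fds[x * xy->max_y + y] = -1;
			visited++;
		}
	}
	return visited;
}

static void xy_delete(struct xyarray_sketch *xy)
{
	free(xy->fds);
	free(xy);
}
```

Because the walk takes no dimension arguments, an array allocated for one CPU cannot be traversed as if it covered every CPU.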
Ingo Molnar [Tue, 22 Aug 2017 10:16:39 +0000 (12:16 +0200)]
Merge tag 'perf-core-for-mingo-4.14-20170821' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- Support --show-nr-samples in annotate's --stdio and --tui, using
the existing 't' toggle to circulate 'percent', 'total-period' and
'nr-samples' as the first column (Taeung Song)
- Support FCMask and PortMask in JSON vendor events (Andi Kleen)
- Fix off by one string allocation problem in 'perf trace' (Arnaldo Carvalho de Melo)
- Use just one parse events state struct in yyparse(), fixing one
reported segfault when a routine received a different data struct,
smaller than the one it expected to use (Arnaldo Carvalho de Melo)
- Remove unused cpu_relax() macros, they stopped being used when
tools/perf lived in Documentation/ (Arnaldo Carvalho de Melo)
- Fix double file test in libbpf's Makefile (Daniel Díaz)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: b77eb79acca3 ("mfd: da9061: Fix to remove BBAT_CONT register from chip model") Reported-by: Steve Twiss <stwiss.opensource@diasemi.com> Reviewed-by: Steve Twiss <stwiss.opensource@diasemi.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
Thomas Petazzoni [Sun, 13 Aug 2017 21:14:58 +0000 (23:14 +0200)]
sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus()
When building the kernel for Sparc using gcc 7.x, the build fails
with:
arch/sparc/kernel/pcic.c: In function ‘pcibios_fixup_bus’:
arch/sparc/kernel/pcic.c:647:8: error: ‘cmd’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
cmd |= PCI_COMMAND_IO;
^~
I.e., the code assumes that pcic_read_config() will always initialize
cmd. But it's not the case. Looking at pcic_read_config(), if
bus->number is != 0 or if the size is not one of 1, 2 or 4, *val will
not be initialized.
As a simple fix, we initialize cmd to zero at the beginning of
pcibios_fixup_bus.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
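A minimal sketch of the pattern behind the warning, assuming a hypothetical read_config() that, like the pcic_read_config() behaviour described above, leaves *val untouched for a non-zero bus or an unsupported size. CMD_IO stands in for PCI_COMMAND_IO and the value written on success is made up.

```c
#include <stdint.h>

#define CMD_IO 0x1	/* stands in for PCI_COMMAND_IO */

/* Hypothetical reader: writes *val only for bus 0 and sizes 1, 2 or 4. */
static int read_config(int bus, int size, uint32_t *val)
{
	if (bus != 0)
		return -1;
	switch (size) {
	case 1:
	case 2:
	case 4:
		*val = 0x0006;	/* made-up register contents */
		return 0;
	default:
		return -1;
	}
}

static uint32_t fixup_cmd(int bus)
{
	uint32_t cmd = 0;	/* the fix: start from a defined value */

	read_config(bus, 2, &cmd);
	cmd |= CMD_IO;
	return cmd;
}
```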
Linus Torvalds [Mon, 21 Aug 2017 20:30:36 +0000 (13:30 -0700)]
Merge tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- PAE40 related updates
- SLC errata for region ops
- intc line masking by default
* tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
arc: Mask individual IRQ lines during core INTC init
ARCv2: PAE40: set MSB even if !CONFIG_ARC_HAS_PAE40 but PAE exists in SoC
ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
ARC: dma: implement dma_unmap_page and sg variant
ARCv2: SLC: Make sure busy bit is set properly for region ops
ARC: [plat-sim] Include this platform unconditionally
ARC: [plat-axs10x]: prepare dts files for enabling PAE40 on axs103
ARC: defconfig: Cleanup from old Kconfig options
2) Fix timer access to freed object in dccp, from Eric Dumazet.
3) Use kmalloc_array() in ptr_ring to avoid overflow cases which are
triggerable by userspace. Also from Eric Dumazet.
4) Fix infinite loop in unmapping cleanup of nfp driver, from Colin Ian
King.
5) Correct datagram peek handling of empty SKBs, from Matthew Dawson.
6) Fix use after free in TIPC, from Eric Dumazet.
7) When replacing a route in ipv6 we need to reset the round robin
pointer, from Wei Wang.
8) Fix bug in pci_find_pcie_root_port() which was unearthed by the
relaxed ordering changes, from Thierry Reding. I made sure to get
an explicit ACK from Bjorn this time around :-)
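For item 3, a minimal userspace sketch of the overflow check that an array allocator such as kmalloc_array() performs before multiplying. alloc_array() is a hypothetical name and malloc() stands in for the kernel allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate n elements of 'size' bytes, refusing if n * size would overflow. */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```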
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
ipv6: repair fib6 tree in failure case
net_sched: fix order of queue length updates in qdisc_replace()
tools lib bpf: improve warning
switchdev: documentation: minor typo fixes
bpf, doc: also add s390x as arch to sysctl description
net: sched: fix NULL pointer dereference when action calls some targets
rxrpc: Fix oops when discarding a preallocated service call
irda: do not leak initialized list.dev to userspace
net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
PCI: Allow PCI express root ports to find themselves
tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set
ipv6: reset fn->rr_ptr when replacing route
sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
tipc: fix use-after-free
tun: handle register_netdevice() failures properly
datagram: When peeking datagrams with offset < 0 don't skip empty skbs
bpf, doc: improve sysctl knob description
netxen: fix incorrect loop counter decrement
nfp: fix infinite loop on umapping cleanup
...
Oleg Nesterov [Mon, 21 Aug 2017 15:35:02 +0000 (17:35 +0200)]
pids: make task_tgid_nr_ns() safe
This was reported many times, and this was even mentioned in commit 52ee2dfdd4f5 ("pids: refactor vnr/nr_ns helpers to make them safe") but
somehow nobody bothered to fix the obvious problem: task_tgid_nr_ns() is
not safe because task->group_leader points to nowhere after the exiting
task passes exit_notify(), rcu_read_lock() can not help.
We really need to change __unhash_process() to nullify group_leader,
parent, and real_parent, but this needs some cleanups. Until then we
can turn task_tgid_nr_ns() into another user of __task_pid_nr_ns() and
fix the problem.
Current max_register setting breaks reading nvram on certain chips and
also reading the standard registers on RX8130 where register map starts
at 0x10.
soc: ti: knav: Add a NULL pointer check for kdev in knav_pool_create
knav_pool_create is an exported function. In the event of a call
before knav_queue_probe, we encounter a NULL pointer dereference
in the following line. Hence return -EPROBE_DEFER to the caller till
the kdev pointer is non-NULL.
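A minimal sketch of the deferral described above. The names are hypothetical, and the function returns a plain int for simplicity rather than whatever the real knav_pool_create() returns. EPROBE_DEFER is defined locally because it is a kernel-internal errno value that userspace headers do not provide.

```c
#include <stddef.h>

#define EPROBE_DEFER 517	/* kernel-internal value, defined here for the sketch */

/* Hypothetical device state, set once probing has completed. */
struct knav_device_sketch {
	int ready;
};

static struct knav_device_sketch *kdev;	/* NULL until probe runs */

static void probe_sketch(struct knav_device_sketch *dev)
{
	kdev = dev;
}

/* Exported-style entry point: ask the caller to retry until probe has run. */
static int pool_create_sketch(void)
{
	if (!kdev)
		return -EPROBE_DEFER;
	return 0;
}
```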
Wei Wang [Sat, 19 Aug 2017 00:14:49 +0000 (17:14 -0700)]
ipv6: repair fib6 tree in failure case
In fib6_add(), it is possible that fib6_add_1() picks an intermediate
node and sets the node's fn->leaf to NULL in order to add this new
route. However, if fib6_add_rt2node() fails to add the new
route for some reason, fn->leaf will be left as NULL and could
potentially cause crash when fn->leaf is accessed in fib6_locate().
This patch makes sure fib6_repair_tree() is called to properly repair
fn->leaf in the above failure case.
net_sched: fix order of queue length updates in qdisc_replace()
It is important to call qdisc_tree_reduce_backlog() after changing the queue
length. The parent qdisc should deactivate the class in ->qlen_notify() called from
qdisc_tree_reduce_backlog(), but this happens only if qdisc->q.qlen is zero.
Missed class deactivations lead to crashes/warnings when picking packets
from an empty qdisc and to corrupted state when reactivating this class in the future.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: 86a7996cc8a0 ("net_sched: introduce qdisc_replace() helper") Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
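A minimal sketch of why the ordering matters, using hypothetical stand-ins for the child qdisc and the parent's class state. The notification only deactivates the class when it sees a queue length of zero, so it has to run after the length is reset.

```c
/* Hypothetical stand-ins for the child qdisc and the parent's class state. */
struct child {
	unsigned int qlen;
};

struct parent {
	int class_active;
};

/* The parent deactivates the class only if the child's queue is already empty. */
static void qlen_notify(struct parent *p, const struct child *c)
{
	if (c->qlen == 0)
		p->class_active = 0;
}

/* Old order: notify first, then reset the length. The class stays active. */
static void replace_wrong_order(struct parent *p, struct child *c)
{
	qlen_notify(p, c);
	c->qlen = 0;
}

/* Fixed order: reset the length first, then notify. */
static void replace_fixed_order(struct parent *p, struct child *c)
{
	c->qlen = 0;
	qlen_notify(p, c);
}
```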
Eric Leblond [Sun, 20 Aug 2017 19:48:14 +0000 (21:48 +0200)]
tools lib bpf: improve warning
Signed-off-by: Eric Leblond <eric@regit.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 20 Aug 2017 22:26:03 +0000 (00:26 +0200)]
bpf, doc: also add s390x as arch to sysctl description
Looks like this was accidentally missed, so still add s390x
as supported eBPF JIT arch to bpf_jit_enable.
Fixes: 014cd0a368dc ("bpf: Update sysctl documentation to list all supported architectures") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sat, 19 Aug 2017 21:30:02 +0000 (22:30 +0100)]
kbuild: Do not use hyphen in exported variable name
This definition in Makefile.dtbinst:
export dtbinst-root ?= $(obj)
should define and export dtbinst-root when handling the root dts
directory, and do nothing in the subdirectories. However some shells,
including dash, will not pass through environment variables whose name
includes a hyphen. Usually GNU make does not use a shell to recurse,
but if e.g. $(srctree) contains '~' it will use a shell here.
Rename the variable to dtbinst_root.
References: https://bugs.debian.org/833561 Fixes: 323a028d39cd ("dts, kbuild: Implement support for dtb vendor subdirs") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
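A minimal sketch of the naming rule behind this rename: a name that a shell will treat as a variable consists of letters, digits and underscores and does not start with a digit. is_portable_env_name() is a hypothetical helper, not part of kbuild.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if 'name' is usable as a shell variable name. */
static bool is_portable_env_name(const char *name)
{
	if (!(isalpha((unsigned char)name[0]) || name[0] == '_'))
		return false;
	for (const char *p = name + 1; *p; p++) {
		if (!(isalnum((unsigned char)*p) || *p == '_'))
			return false;
	}
	return true;
}
```

Under this rule dtbinst-root is rejected because of the hyphen, while dtbinst_root is accepted.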
Commit 971a69db7dc0 ("Xen: don't warn about 2-byte wchar_t in efi")
added the --no-wchar-size-warning to the Makefile to avoid this
harmless warning:
arm-linux-gnueabi-ld: warning: drivers/xen/efi.o uses 2-byte wchar_t yet the output is to use 4-byte wchar_t; use of wchar_t values across objects may fail
Changing kbuild to use thin archives instead of recursive linking
unfortunately brings the same warning back during the final link.
The kernel does not use wchar_t string literals at this point, and
xen does not use wchar_t at all (only efi_char16_t), so the flag
has no effect, but as pointed out by Jan Beulich, adding a wchar_t
string literal would be bad here.
Since wchar_t is always defined as u16, independent of the toolchain
default, always passing -fshort-wchar is correct and lets us
remove the Xen specific hack along with fixing the warning.
Link: https://patchwork.kernel.org/patch/9275217/ Fixes: 971a69db7dc0 ("Xen: don't warn about 2-byte wchar_t in efi") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Linus Torvalds [Sun, 20 Aug 2017 20:26:27 +0000 (13:26 -0700)]
Sanitize 'move_pages()' permission checks
The 'move_pages()' system call was introduced long long ago with the
same permission checks as for sending a signal (except using
CAP_SYS_NICE instead of CAP_KILL for the overriding capability).
That turns out to not be a great choice - while the system call really
only moves physical page allocations around (and you need other
capabilities to do a lot of it), you can check the return value to map
out some of the virtual address choices and defeat ASLR of a binary that
still shares your uid.
So change the access checks to the more common 'ptrace_may_access()'
model instead.
This tightens the access checks for the uid, and also effectively
changes the CAP_SYS_NICE check to CAP_SYS_PTRACE, but it's unlikely that
anybody really _uses_ this legacy system call any more (we have better
NUMA placement models these days), so I expect nobody to notice.
Famous last words.
Reported-by: Otto Ebeling <otto.ebeling@iki.fi> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Willy Tarreau <w@1wt.eu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 20 Aug 2017 16:36:52 +0000 (09:36 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Another pile of small fixes and updates for x86:
- Plug a hole in the SMAP implementation which fails to clear AC on
NMI entry
- Fix the norandmaps/ADDR_NO_RANDOMIZE logic so the command line
parameter works correctly again
- Use the proper accessor in the startup64 code for next_early_pgt to
prevent accessing of invalid addresses and faulting in the early
boot code.
- Prevent CPU hotplug lock recursion in the MTRR code
- Unbreak CPU0 hotplugging
- Rename overly long CPUID bits which got introduced in this cycle
- Two commits which mark data 'const' and restrict the scope of data
and functions to file scope by making them 'static'"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Constify attribute_group structures
x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'
x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks
x86: Fix norandmaps/ADDR_NO_RANDOMIZE
x86/mtrr: Prevent CPU hotplug lock recursion
x86: Mark various structures and functions as 'static'
x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag
x86/smpboot: Unbreak CPU0 hotplug
x86/asm/64: Clear AC on NMI entries
Linus Torvalds [Sun, 20 Aug 2017 16:20:57 +0000 (09:20 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Two fixes for the perf subsystem:
- Fix an inconsistency of RDPMC mm struct tagging across exec() which
causes RDPMC to fault.
- Correct the timestamp mechanics across IOC_DISABLE/ENABLE which
causes incorrect timestamps and total time calculations"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix time on IOC_ENABLE
perf/x86: Fix RDPMC vs. mm_struct tracking