TSan/MSan builds failures #64086

Closed
azat opened this issue May 18, 2024 · 8 comments · Fixed by #64090 or #64091
Labels
comp-ci Continuous integration


azat commented May 18, 2024

TSan

May 17 18:34:12 FAILED: contrib/arrow-cmake/orc_proto.pb.h contrib/arrow-cmake/orc_proto.pb.cc /build/build_docker/contrib/arrow-cmake/orc_proto.pb.h /build/build_docker/contrib/arrow-cmake/orc_proto.pb.cc 
May 17 18:34:12 cd /build/build_docker/contrib/arrow-cmake && /build/build_docker/contrib/google-protobuf-cmake/protoc -I /build/contrib/orc/c++/../proto --cpp_out="/build/build_docker/contrib/arrow-cmake" /build/contrib/orc/c++/../proto/orc_proto.proto
May 17 18:34:12 ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:282 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=123304)
May 17 18:34:12 Segmentation fault (core dumped)

MSan

May 17 18:31:34 [3061/12202] Generating orc_proto.pb.h, orc_proto.pb.cc
May 17 18:31:34 FAILED: contrib/arrow-cmake/orc_proto.pb.h contrib/arrow-cmake/orc_proto.pb.cc /build/build_docker/contrib/arrow-cmake/orc_proto.pb.h /build/build_docker/contrib/arrow-cmake/orc_proto.pb.cc 
May 17 18:31:34 cd /build/build_docker/contrib/arrow-cmake && /build/build_docker/contrib/google-protobuf-cmake/protoc -I /build/contrib/orc/c++/../proto --cpp_out="/build/build_docker/contrib/arrow-cmake" /build/contrib/orc/c++/../proto/orc_proto.proto
May 17 18:31:34 FATAL: Code 0x603049277240 is out of application range. Non-PIE build?
May 17 18:31:34 FATAL: MemorySanitizer can not mmap the shadow memory.
May 17 18:31:34 FATAL: Make sure to compile with -fPIE and to link with -pie.
May 17 18:31:34 FATAL: Disabling ASLR is known to cause this error.
May 17 18:31:34 FATAL: If running under GDB, try 'set disable-randomization off'.
May 17 18:31:34 ==110628==Process memory map follows:
May 17 18:31:34 	0x603049036000-0x60304925a000	/build/build_docker/contrib/google-protobuf-cmake/protoc
May 17 18:31:34 	0x60304925a000-0x60304a10a000	/build/build_docker/contrib/google-protobuf-cmake/protoc
May 17 18:31:34 	0x60304a10a000-0x60304a121000	/build/build_docker/contrib/google-protobuf-cmake/protoc
May 17 18:31:34 	0x60304a121000-0x60304a128000	/build/build_docker/contrib/google-protobuf-cmake/protoc
May 17 18:31:34 	0x60304a128000-0x60304bb14000	
May 17 18:31:34 	0x7e0b4d700000-0x7e0b4d800000	
May 17 18:31:34 	0x7e0b4d900000-0x7e0b4da00000	
May 17 18:31:34 	0x7e0b4db00000-0x7e0b4dc00000	
May 17 18:31:34 	0x7e0b4dd00000-0x7e0b4de00000	
May 17 18:31:34 	0x7e0b4dee3000-0x7e0b4e291000	
May 17 18:31:34 	0x7e0b4e291000-0x7e0b4e292000	/usr/lib/x86_64-linux-gnu/libdl.so.2
May 17 18:31:34 	0x7e0b4e292000-0x7e0b4e293000	/usr/lib/x86_64-linux-gnu/libdl.so.2
May 17 18:31:34 	0x7e0b4e293000-0x7e0b4e294000	/usr/lib/x86_64-linux-gnu/libdl.so.2
May 17 18:31:34 	0x7e0b4e294000-0x7e0b4e295000	/usr/lib/x86_64-linux-gnu/libdl.so.2
May 17 18:31:34 	0x7e0b4e295000-0x7e0b4e296000	/usr/lib/x86_64-linux-gnu/libdl.so.2
May 17 18:31:34 	0x7e0b4e296000-0x7e0b4e297000	/usr/lib/x86_64-linux-gnu/librt.so.1
May 17 18:31:34 	0x7e0b4e297000-0x7e0b4e298000	/usr/lib/x86_64-linux-gnu/librt.so.1
May 17 18:31:34 	0x7e0b4e298000-0x7e0b4e299000	/usr/lib/x86_64-linux-gnu/librt.so.1
May 17 18:31:34 	0x7e0b4e299000-0x7e0b4e29a000	/usr/lib/x86_64-linux-gnu/librt.so.1
May 17 18:31:34 	0x7e0b4e29a000-0x7e0b4e29b000	/usr/lib/x86_64-linux-gnu/librt.so.1
May 17 18:31:34 	0x7e0b4e29b000-0x7e0b4e2a9000	/usr/lib/x86_64-linux-gnu/libm.so.6
May 17 18:31:34 	0x7e0b4e2a9000-0x7e0b4e325000	/usr/lib/x86_64-linux-gnu/libm.so.6
May 17 18:31:34 	0x7e0b4e325000-0x7e0b4e380000	/usr/lib/x86_64-linux-gnu/libm.so.6
May 17 18:31:34 	0x7e0b4e380000-0x7e0b4e381000	/usr/lib/x86_64-linux-gnu/libm.so.6
May 17 18:31:34 	0x7e0b4e381000-0x7e0b4e382000	/usr/lib/x86_64-linux-gnu/libm.so.6
May 17 18:31:34 	0x7e0b4e382000-0x7e0b4e3aa000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e3aa000-0x7e0b4e53f000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e53f000-0x7e0b4e597000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e597000-0x7e0b4e598000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e598000-0x7e0b4e59c000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e59c000-0x7e0b4e59e000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e59e000-0x7e0b4e5ab000	
May 17 18:31:34 	0x7e0b4e5ab000-0x7e0b4e5ac000	/usr/lib/x86_64-linux-gnu/libpthread.so.0
May 17 18:31:34 	0x7e0b4e5ac000-0x7e0b4e5ad000	/usr/lib/x86_64-linux-gnu/libpthread.so.0
May 17 18:31:34 	0x7e0b4e5ad000-0x7e0b4e5ae000	/usr/lib/x86_64-linux-gnu/libpthread.so.0
May 17 18:31:34 	0x7e0b4e5ae000-0x7e0b4e5af000	/usr/lib/x86_64-linux-gnu/libpthread.so.0
May 17 18:31:34 	0x7e0b4e5af000-0x7e0b4e5b0000	/usr/lib/x86_64-linux-gnu/libpthread.so.0
May 17 18:31:34 	0x7e0b4e5b1000-0x7e0b4e5b7000	
May 17 18:31:34 	0x7e0b4e5b7000-0x7e0b4e5b9000	/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
May 17 18:31:34 	0x7e0b4e5b9000-0x7e0b4e5e3000	/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
May 17 18:31:34 	0x7e0b4e5e3000-0x7e0b4e5ee000	/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
May 17 18:31:34 	0x7e0b4e5ee000-0x7e0b4e5ef000	
May 17 18:31:34 	0x7e0b4e5ef000-0x7e0b4e5f1000	/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
May 17 18:31:34 	0x7e0b4e5f1000-0x7e0b4e5f3000	/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
May 17 18:31:34 	0x7ffd73396000-0x7ffd733b7000	[stack]
May 17 18:31:34 	0x7ffd733c1000-0x7ffd733c5000	[vvar]
May 17 18:31:34 	0x7ffd733c5000-0x7ffd733c7000	[vdso]
May 17 18:31:34 	0xffffffffff600000-0xffffffffff601000	[vsyscall]
May 17 18:31:34 ==110628==End of process memory map.

This looks ASLR-related, though I'm not sure why it pops up only now; maybe there were some changes on CI?
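
One way to compare hosts is to read the two ASLR-related sysctls straight from procfs. This is a minimal Python sketch, not something taken from the CI scripts; the helper name read_aslr_settings and its proc_sys parameter are made up for illustration, and a missing or unreadable sysctl is reported as None rather than treated as an error.

```python
from pathlib import Path

# Sysctls that control address-space randomization on Linux.
ASLR_SYSCTLS = ("kernel.randomize_va_space", "vm.mmap_rnd_bits")


def read_aslr_settings(proc_sys="/proc/sys"):
    """Return {sysctl name: int value, or None if it cannot be read}."""
    settings = {}
    for name in ASLR_SYSCTLS:
        # "vm.mmap_rnd_bits" lives at <proc_sys>/vm/mmap_rnd_bits.
        path = Path(proc_sys, *name.split("."))
        try:
            settings[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings
```

Calling read_aslr_settings() on a working host and on a failing CI runner would show whether the two differ in either value.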

CI: https://s3.amazonaws.com/clickhouse-test-reports/64058/5ad98e88a254c6c9bc67af52b43702356d6c3950/clickhouse_build_check/report.html
Play: https://play.clickhouse.com/play?user=play#U0VMRUNUIGNoZWNrX3N0YXJ0X3RpbWUsIGNoZWNrX3N0YXR1cywgcmVwb3J0X3VybApGUk9NIGNoZWNrcwpXSEVSRSByZXBvcnRfdXJsIExJS0UgJyVjbGlja2hvdXNlX2J1aWxkX2NoZWNrJScKICAgIEFORCBjaGVja19zdGFydF90aW1lID49IG5vdygpIC0gSU5URVJWQUwgMyBEQVkKICAgIGFuZCBwdWxsX3JlcXVlc3RfbnVtYmVyID0gMAogICAgQU5EIGNoZWNrX3N0YXR1cyBub3QgaW4gKCdzdWNjZXNzJywgJ3BlbmRpbmcnKQpPUkRFUiBCWSBjaGVja19zdGFydF90aW1l

azat added the fuzz (Problem found by one of the fuzzers) and comp-ci (Continuous integration) labels and removed the fuzz label on May 18, 2024

azat commented May 18, 2024

There are also occasional SIGSEGVs during installation of TSan/MSan builds:

Stateless tests (tsan) [1/5] — Invalid check_status.tsv

Setting up clickhouse-server (24.5.1.1318+tsan) ...
Segmentation fault (core dumped)
dpkg: error processing package clickhouse-server (--install):
 installed clickhouse-server package post-installation script subprocess returned error exit status 139

But I cannot reproduce the issue:

root@5acc492f7e7e:/# dpkg -i /root/*.deb  |& cat
(Reading database ... 4437 files and directories currently installed.)
Preparing to unpack .../clickhouse-common-static_24.5.1.1318+tsan_amd64.deb ...
Unpacking clickhouse-common-static (24.5.1.1318+tsan) over (24.5.1.1318+tsan) ...
Preparing to unpack .../clickhouse-server_24.5.1.1318+tsan_amd64.deb ...
Unpacking clickhouse-server (24.5.1.1318+tsan) over (24.5.1.1318+tsan) ...
Setting up clickhouse-common-static (24.5.1.1318+tsan) ...
Setting up clickhouse-server (24.5.1.1318+tsan) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 78.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.34.0 /usr/local/share/perl/5.34.0 /usr/lib/x86_64-linux-gnu/perl5/5.34 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl-base /usr/lib/x86_64-linux-gnu/perl/5.34 /usr/share/perl/5.34 /usr/local/lib/site_perl) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7.)
debconf: falling back to frontend: Teletype
groupadd: group 'clickhouse' already exists
useradd: user 'clickhouse' already exists
Cannot set 'net_admin' or 'ipc_lock' or 'sys_nice' or 'net_bind_service' capability for clickhouse binary. This is optional. Taskstats accounting will be disabled. To enable taskstats accounting you may add the required capability later manually.
ClickHouse binary is already located at /usr/bin/clickhouse
Symlink /usr/bin/clickhouse-server already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-server to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-extract-from-config already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-extract-from-config to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-keeper already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-keeper-converter already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper-converter to /usr/bin/clickhouse.
Symlink /usr/bin/ch already exists. Will keep it.
Symlink /usr/bin/chl already exists. Will keep it.
Symlink /usr/bin/chc already exists. Will keep it.
Creating clickhouse group if it does not exist.
 groupadd -r clickhouse
Creating clickhouse user if it does not exist.
 useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse clickhouse
Will set ulimits for clickhouse user in /etc/security/limits.d/clickhouse.conf.
Config file /etc/clickhouse-server/config.xml already exists, will keep it and extract path info from it.
/etc/clickhouse-server/config.xml has /var/lib/clickhouse/ as data path.
/etc/clickhouse-server/config.xml has /var/log/clickhouse-server/ as log path.
Users config file /etc/clickhouse-server/users.xml already exists, will keep it and extract users info from it.
Log directory /var/log/clickhouse-server/ already exists.
Data directory /var/lib/clickhouse/ already exists.
Pid directory /var/run/clickhouse-server already exists.
 chown -R clickhouse:clickhouse '/var/log/clickhouse-server/'
 chown -R clickhouse:clickhouse '/var/run/clickhouse-server'
 chown  clickhouse:clickhouse '/var/lib/clickhouse/'
Password for the default user is an empty string. See /etc/clickhouse-server/users.xml and /etc/clickhouse-server/users.d to change it.
Setting capabilities for clickhouse binary. This is optional.
 chown -R clickhouse:clickhouse '/etc/clickhouse-server'

ClickHouse has been successfully installed.

Start clickhouse-server with:
 sudo clickhouse start

Start clickhouse-client with:
 clickhouse-client

Play: https://play.clickhouse.com/play?user=play#U0VMRUNUIGNoZWNrX2R1cmF0aW9uX21zLCBjaGVja19zdGF0dXMsIHJlcG9ydF91cmwKRlJPTSBjaGVja3MKV0hFUkUgcmVwb3J0X3VybCBMSUtFICclc3RhdGVsZXNzX3Rlc3RzX190c2FuX18lJwogICAgQU5EIGNoZWNrX3N0YXJ0X3RpbWUgPj0gbm93KCkgLSBJTlRFUlZBTCAzIERBWQogICAgYW5kIGNoZWNrX2R1cmF0aW9uX21zIDwgMzAwZTMKICAgIGFuZCBjaGVja19zdGF0dXMgPSAnZXJyb3InCk9SREVSIEJZIGNoZWNrX3N0YXJ0X3RpbWU=
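
The part of each Play link after the # appears to be the SQL query itself, base64-encoded (both fragments in this thread start with U0VMRUNU, which decodes to SELECT). A small Python sketch for reading such a link without opening it; the function name is illustrative, and standard (not URL-safe) base64 is an assumption.

```python
import base64
from urllib.parse import urlsplit


def decode_play_query(url):
    """Return the SQL text stored base64-encoded in the URL fragment."""
    fragment = urlsplit(url).fragment
    # Restore any '=' padding that may have been dropped from the fragment.
    padded = fragment + "=" * (-len(fragment) % 4)
    return base64.b64decode(padded).decode("utf-8")
```

For example, a shortened, hypothetical link ending in #U0VMRUNUIDE= decodes to SELECT 1.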


azat commented May 18, 2024

We could try sudo sysctl vm.mmap_rnd_bits=28 (if that sysctl exists on the Ubuntu hosts in CI).
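
For intuition about what that value changes: vm.mmap_rnd_bits is the number of random bits the kernel applies, in units of pages, to the mmap base address, so every extra bit doubles the range in which mappings can land. A rough Python sketch of the arithmetic, assuming 4 KiB pages; the function name is illustrative.

```python
def mmap_random_span_bytes(rnd_bits, page_size=4096):
    """Size of the address range the randomized mmap base can fall in."""
    return (1 << rnd_bits) * page_size
```

With 28 bits the span is 2**40 bytes (1 TiB). A hypothetical setting of 32 bits would give 2**44 bytes (16 TiB), which leaves far less room for the fixed address ranges the sanitizers expect to be free.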

azat added a commit to azat/ClickHouse that referenced this issue May 18, 2024
Rebuild for clang 18.1.3, that contains a workaround [1] for sanitizers
issue [2]:

    $ git tag --contains  c2a57034eff048cd36c563c8e0051db3a70991b3 | tail -1
    llvmorg-18.1.3

 [1]: llvm/llvm-project@c2a5703
 [2]: ClickHouse#64086

Since right now version is not enough:

    $ docker run --rm -it clickhouse/test-util llvm-nm-18 --version
    llvm-nm, compatible with GNU nm
    Ubuntu LLVM version 18.1.2
      Optimized build.

But I don't see any fix for TSan, only MSan, but let's try.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
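
The version comparison in the commit message can be scripted. A minimal Python sketch, assuming the tool prints a line of the form "LLVM version X.Y.Z" as in the llvm-nm-18 output above; the function names are illustrative, and 18.1.3 as the threshold comes from the git tag lookup in the commit message.

```python
import re

# First release that, per the commit message above, contains the workaround.
MIN_VERSION = (18, 1, 3)


def parse_llvm_version(tool_output):
    """Extract (major, minor, patch) from '... LLVM version X.Y.Z' output."""
    match = re.search(r"LLVM version (\d+)\.(\d+)\.(\d+)", tool_output)
    if match is None:
        raise ValueError("no LLVM version found in tool output")
    return tuple(int(part) for part in match.groups())


def has_workaround(tool_output):
    """True if the reported version is at least MIN_VERSION."""
    return parse_llvm_version(tool_output) >= MIN_VERSION
```

Comparing integer tuples avoids the pitfalls of string comparison, where "18.1.10" would sort before "18.1.3".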
azat changed the title from "TSan/MSan builds periodic failures" to "TSan/MSan builds failures" on May 19, 2024

azat commented May 19, 2024

I've tried #64090, since I saw in google/sanitizers#856 that issues were reported for the 6.5.0-28-generic kernel, and CI runs 6.5.0-1014-aws, which is more or less the same, but it did not help completely.
For some reason it helps for builds, where protoc is used, but not for tests, where the clickhouse binary is used.


azat commented May 19, 2024

And it will be hard to build protoc without sanitizers:

if (ENABLE_FUZZING)
    # `protoc` will be built with sanitizer and it could fail during ClickHouse build
    # It easily reproduces in oss-fuzz building pipeline
    # To avoid this we can try to build `protoc` without any sanitizer with option `-fno-sanitize=all`, but
    # it this case we will face with linker errors, because libcxx still will be built with sanitizer
    # So, we can simply suppress all of these failures with a combination this flag and an environment variable
    # export MSAN_OPTIONS=exit_code=0
    target_compile_options(protoc PRIVATE "-fsanitize-recover=all")
endif()
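
The environment variable mentioned in that comment could be applied to a single command rather than exported globally. A hedged Python sketch; the helper names are made up, and whether exit_code=0 would mask the startup failures from this issue is not established here.

```python
import os
import subprocess


def msan_lenient_env(base_env=None):
    """Copy of the environment with exit_code=0 appended to MSAN_OPTIONS."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("MSAN_OPTIONS", "")
    # Sanitizer options are colon-separated; keep whatever was already set.
    env["MSAN_OPTIONS"] = f"{existing}:exit_code=0" if existing else "exit_code=0"
    return env


def run_lenient(cmd):
    """Run cmd (a list of arguments) with the lenient MSan environment."""
    return subprocess.run(cmd, env=msan_lenient_env(), check=False)
```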


azat commented May 19, 2024

So setting kernel.randomize_va_space=0 helps for both MSan and TSan.

With ASLR enabled, MSan has some strange issues with the getauxval override, but turning ASLR off helps anyway:

# cd /src/clickhouse/.cmake-msan/contrib/arrow-cmake && /src/clickhouse/.cmake-msan/contrib/google-protobuf-cmake/protoc -I /src/clickhouse/contrib/orc/c++/../proto --cpp_out="/src/clickhouse/.cmake-msan/contrib/arrow-cmake" /src/clickhouse/contrib/orc/c++/../proto/orc_proto.proto
MemorySanitizer:DEADLYSIGNAL
==41141==ERROR: MemorySanitizer: SEGV on unknown address 0x2ffec5be94a8 (pc 0x630a1c8069c6 bp 0x000000000002 sp 0x7ffec5be94a0 T41141)
==41141==The signal is caused by a WRITE memory access.
    #0 0x630a1c8069c6 in __auxv_init_procfs .cmake-msan/./base/glibc-compatibility/musl/getauxval.c:78
    #1 0x630a1c807049 in getauxval .cmake-msan/./base/glibc-compatibility/musl/getauxval.c:200:12
    #2 0x630a1b963851 in __sanitizer::ReExec() crtstuff.c
    #3 0x630a1b9df4ca in __msan::InitShadowWithReExec(bool) crtstuff.c
    #4 0x630a1b97536b in __msan_init (/src/clickhouse/.cmake-msan/contrib/google-protobuf-cmake/protoc+0x24136b) (BuildId: 95c3c051aef70edc003ed3a9cd175c6b301391e5)
    #5 0x630a1b9e6978 in msan.module_ctor main.cc
    #6 0x630a1c807e8c in __libc_csu_init (/src/clickhouse/.cmake-msan/contrib/google-protobuf-cmake/protoc+0x10d3e8c) (BuildId: 95c3c051aef70edc003ed3a9cd175c6b301391e5)
    #7 0x71919742a263 in __libc_start_main csu/../csu/libc-start.c:343:6
    #8 0x630a1b957f0d in _start (/src/clickhouse/.cmake-msan/contrib/google-protobuf-cmake/protoc+0x223f0d) (BuildId: 95c3c051aef70edc003ed3a9cd175c6b301391e5)
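
Setting kernel.randomize_va_space=0 is system-wide. A narrower option, sketched here in Python as an assumption rather than what CI does, is to disable ASLR for one command with util-linux setarch -R. The helper names are illustrative. Note that setarch relies on personality(ADDR_NO_RANDOMIZE) itself, so it may be refused inside a container with a restrictive seccomp profile.

```python
import platform
import subprocess


def no_aslr_command(cmd, arch=None):
    """Prefix cmd with `setarch <arch> -R`, which disables ASLR for it."""
    return ["setarch", arch or platform.machine(), "-R", *cmd]


def run_without_aslr(cmd):
    """Run cmd (a list of arguments) with ASLR disabled for that process."""
    return subprocess.run(no_aslr_command(cmd), check=False)
```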


azat commented May 19, 2024

May 17 18:34:12 ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:282 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=123304)

This is likely because Docker forbids this call via seccomp.
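
To check that hypothesis inside a container, the call from the failed CHECK can be repeated from a small probe. A minimal Python sketch using ctypes; the function names are made up, and the constants are taken from <linux/personality.h>. It assumes Linux with a libc that exports personality().

```python
import ctypes

ADDR_NO_RANDOMIZE = 0x0040000  # from <linux/personality.h>
QUERY_ONLY = 0xFFFFFFFF        # personality(0xffffffff) only returns the current value


def aslr_disabled(persona):
    """True if the persona bits have ADDR_NO_RANDOMIZE set."""
    return bool(persona & ADDR_NO_RANDOMIZE)


def can_set_no_randomize():
    """Repeat the call from the TSan CHECK; False if it fails (e.g. EPERM)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.personality.argtypes = [ctypes.c_ulong]
    libc.personality.restype = ctypes.c_int
    old = libc.personality(QUERY_ONLY)
    if old == -1:
        return False
    if libc.personality(old | ADDR_NO_RANDOMIZE) == -1:
        return False
    libc.personality(old)  # restore the previous persona
    return True
```

If can_set_no_randomize() returns False in the CI container but True on the host, that would support the seccomp explanation.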


azat commented May 19, 2024

BTW, the initial issue is likely because Ubuntu decided to adjust mmap_rnd_bits: https://bugs.launchpad.net/ubuntu/+source/llvm-toolchain-14/+bug/2048768/comments/8
