
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes
1 answers
130 views
Not able to use drbd over lustre zfs
I am running the command below to format an NVMe drive for Lustre with ZFS as the backing filesystem (backfstype):
mkfs.lustre --mdt --reformat --mgsnode=mgsmaster@tcp --backfstype=zfs --fsname=lustre mdtpool/mdt --index=0 /dev/nvme1n1
After formatting the drive, I am trying to set up DRBD using the resource config below:
resource "r0" {
protocol C; //Updated
device /dev/drbd0; //Updated
disk /dev/nvme1n1;
meta-disk internal;
options {
auto-promote no;
}
on "hostname1" {
node-id 0;
}
on "hostname2" {
node-id 1;
}
connection {
host "hostname1" address 10.40.40.1:7789;
host "hostname2" address 10.40.40.2:7789;
}
}
After this, I run the command below to initialize DRBD:
drbdadm create-md r0
open(/dev/nvme1n1) failed: Device or resource busy
...
I found this post on the DRBD website, which mentions that ZFS doesn't hold the device open in the kernel the same way other filesystems or processes in Linux do: [DRBD Reference website for zfs](https://kb.linbit.com/using-zfs-over-drbd-with-pacemaker)
I need some help here. I first tried setting up DRBD on the drive and then formatting it with mkfs.lustre, but the result is the same: mkfs.lustre wants the DRBD service to be down, and only then can it format the drive.
Update: the issue has been resolved. The steps for replication are as follows:
drbdadm create-md r0
drbdadm up r0
drbdadm primary r0 --force
mkfs.lustre --mdt --reformat --mgsnode=mgsmaster@tcp --backfstype=zfs --fsname=lustre mdtpool/mdt --index=0 /dev/drbd0
mount -t lustre mdtpool/mdt /mnt/MDT
To remount after a reboot:
drbdadm up r0
drbdadm primary r0 --force
zpool import -o cachefile=none mdtpool
mount -t lustre mdtpool/mdt /mnt/MDT
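The remount steps can be wrapped in a small script. This is only a sketch: the function name `remount_mdt` and the `RUN` variable are hypothetical, the filesystem type passed to `mount -t` is assumed to be `lustre`, and the `--force` flag is copied from the steps above rather than recommended. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run by default: RUN=echo prints each command instead of executing it.
# Run with an empty RUN (RUN= ./remount.sh) to execute the commands for real.
RUN=${RUN-echo}

# remount_mdt RESOURCE POOL DATASET MOUNTPOINT  (hypothetical helper)
remount_mdt() {
    res=$1; pool=$2; dataset=$3; mnt=$4
    # Stop at the first failing step so later steps never run on a bad state.
    $RUN drbdadm up "$res" &&
    $RUN drbdadm primary "$res" --force &&
    $RUN zpool import -o cachefile=none "$pool" &&
    $RUN mount -t lustre "$pool/$dataset" "$mnt"
}

remount_mdt r0 mdtpool mdt /mnt/MDT
```

Chaining the steps with `&&` means a failed `drbdadm` or `zpool import` prevents the mount attempt.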
Neil Karania (3 rep)
Jun 11, 2024, 09:09 AM • Last activity: Jun 13, 2024, 08:56 AM
0 votes
1 answers
39 views
How does Lustre decide how much of a file to restore to front-end storage when a program reads a file?
I [read](https://unix.stackexchange.com/a/772007/16704) about the Lustre file system:
> With huge files, it’s unusual for the entire file to be restored to frontend storage
How does Lustre decide how much of a file to restore to front-end storage when a program reads a file?
Franck Dernoncourt (5533 rep)
Mar 11, 2024, 02:11 AM • Last activity: Mar 12, 2024, 06:57 AM
3 votes
1 answers
89 views
Why does reading a file with hexdump sometimes change find's sparseness value (%S)?
I use a Lustre file system. I've noticed that if I check find's sparseness value (%S) for a file, print the file with hexdump, and then check the sparseness value again, the value has sometimes changed. Why does it change?
---
Command to check find's sparseness value (%S) for the file myvideo.mp4:
find myvideo.mp4 -printf "%S"
Command to read the file myvideo.mp4 with hexdump:
hexdump myvideo.mp4
---
I noticed that behavior on several files. Examples of changes in find's sparseness value (%S):
- 0.000135559 to 0.631297
- 0.00466808 to 0.228736
Is it because the file is being partly cached locally when it is read with hexdump? I noticed that this change isn't specific to hexdump, e.g. the same happens with nano (and likely any other program that reads the file):
dernoncourt@server:/videos$ find myvideo.mp4 -printf "%S"
0.00302331
dernoncourt@server:/videos$ nano myvideo.mp4 
dernoncourt@server:/videos$ find myvideo.mp4 -printf "%S"
0.486752
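For reference, the GNU find manual describes %S as BLOCKSIZE * st_blocks / st_size, so the value rises whenever more blocks are allocated to the file while its size stays the same. A minimal sketch that recomputes the ratio from `stat` output, assuming GNU coreutils; the function name `sparseness` and the temporary demo file are hypothetical:

```shell
#!/bin/sh
# Recompute find's %S from stat fields:
#   %b = number of allocated blocks, %B = size in bytes of each such block,
#   %s = file size in bytes (st_size).
sparseness() {
    stat -c '%b %B %s' -- "$1" |
        awk '{ if ($3 == 0) print "n/a"; else printf "%g\n", $1 * $2 / $3 }'
}

# Demo on a freshly truncated file: it has a size but few or no allocated blocks.
demo=$(mktemp)
truncate -s 1M "$demo"
sparseness "$demo"
rm -f -- "$demo"
```

Running `sparseness myvideo.mp4` before and after reading the file would show whether the allocated block count is what changes.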
Franck Dernoncourt (5533 rep)
Mar 10, 2024, 02:37 AM • Last activity: Mar 10, 2024, 10:33 PM
0 votes
1 answers
807 views
Why is the apparent size of a file much larger than the actual disk usage in this case? (4.4GiB vs. 512B)
While browsing folders via ncdu, I noticed that the apparent size of a file was sometimes much larger than the actual disk usage. Example via ncdu, then a to toggle between showing disk usage and showing apparent size, then i to show more details:
[screenshot: ncdu details for the file]
I was told this may be due to some automatic process that keeps only a small portion of the data in a "fast" layer and keeps the rest in a slower place such as AWS S3. How can I check that?
---
As [suggested](https://unix.stackexchange.com/questions/771965/why-is-the-the-apparent-size-of-a-file-much-larger-than-the-actual-disk-usage-in?noredirect=1#comment1473737_771965) by [Chris Down](https://unix.stackexchange.com/users/10762/chris-down "125,094 reputation"), here is part of the output of hexdump run on that file:
[screenshot: hexdump output]
It seems to indicate the file isn't sparse.
As [suggested](https://unix.stackexchange.com/questions/771965/why-is-the-the-apparent-size-of-a-file-much-larger-than-the-actual-disk-usage-in#comment1473744_771965) by [Artem S. Tashkinov](https://unix.stackexchange.com/users/260833/artem-s-tashkinov "28,590 reputation"), the file system is [Lustre](https://en.wikipedia.org/wiki/Lustre_(file_system)) (checked with sudo df -T).
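The two numbers ncdu toggles between can also be read directly with GNU du. The sketch below is a hypothetical helper (`size_report`) that prints both for one file; the demo file is created only for illustration. On a Lustre client where HSM is configured (an assumption, not something the question confirms), `lfs hsm_state FILE` is the command that reports whether a file's data has been released to an archive tier.

```shell
#!/bin/sh
# Print apparent size (st_size) and allocated size for one file, both in bytes.
size_report() {
    apparent=$(du -B1 --apparent-size -- "$1" | cut -f1)
    on_disk=$(du -B1 -- "$1" | cut -f1)
    printf 'apparent=%s on_disk=%s\n' "$apparent" "$on_disk"
}

# Demo: a 1 MiB file created with truncate has a size but little or no allocated data.
demo=$(mktemp)
truncate -s 1M "$demo"
size_report "$demo"
rm -f -- "$demo"
```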
Franck Dernoncourt (5533 rep)
Mar 9, 2024, 04:50 PM • Last activity: Mar 10, 2024, 01:55 PM
0 votes
1 answers
314 views
Benchmarking lustre filesystem
I want to benchmark how fast a single Lustre client can write to its Lustre-mounted filesystem. I am an application developer, not a storage maintainer, so I am not worried about several clients saturating the storage write bandwidth. I am worried about how much a single application server (Lustre client) can write at once, so that I can compare it to my application's performance.
I found this page with several benchmarks, but all of them seem to be about configuring several clients at once instead of using just one. Are any of these a better fit for what I am looking for?
Alternatively, I have a naive script that uses dd to benchmark filesystems with different block sizes and counts. Can I trust the results I obtained by running this dd script on my Lustre client? If not, why? I know I am limited by my network bandwidth, but I am also interested in understanding how it limits my performance.
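The dd script itself is not shown, so the following is only a sketch of what such a loop might look like, assuming GNU dd. The function name `bench_write` and the `TARGET_DIR` variable are hypothetical. `conv=fsync` asks dd to flush the file before it reports a rate, so the figure is less likely to reflect only the client's page cache.

```shell
#!/bin/sh
# bench_write DIR BLOCK_SIZE COUNT: write COUNT blocks of zeros, print dd's summary line.
bench_write() {
    dir=$1; bs=$2; count=$3
    out="$dir/ddbench.$$"
    # dd prints its statistics on stderr; keep only the final summary line.
    dd if=/dev/zero of="$out" bs="$bs" count="$count" conv=fsync 2>&1 | tail -n 1
    rm -f -- "$out"
}

# Small demo sizes; point TARGET_DIR at the Lustre mount and raise the counts for a real run.
for bs in 4k 1M; do
    printf '%s: ' "$bs"
    bench_write "${TARGET_DIR:-/tmp}" "$bs" 16
done
```

A single dd stream is one writer with one block size in flight, so it may understate what an application with several concurrent writers can reach.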
Marco Montevechi Filho (187 rep)
Feb 2, 2024, 06:31 PM • Last activity: Feb 3, 2024, 09:08 AM
4 votes
3 answers
3491 views
Removing dirs with lots of tiny files on Lustre
I have a dir with a gigantic number of very small files that I want to remove, and simply removing the dir with rm -rf /path/to/the/dir has already been running for multiple days. It might sound strange that this is so slow, but the dir is not on a regular filesystem. It's on the Lustre filesystem of a cluster. I'm running the rm command on node A of the cluster, which has the Lustre filesystem mounted, but the backend of the Lustre filesystem is 2 ZFS filesystems, one on node B and one on node C, so all the network traffic might be what makes rm slow. Does anyone know a faster way to remove the dir?
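One approach that is sometimes tried for this kind of workload is to issue the unlinks from several processes at once instead of a single rm. The sketch below assumes GNU find and xargs; the function name `parallel_rm` and the default of 8 jobs are hypothetical, and whether it is actually faster on a given Lustre setup has not been measured here.

```shell
#!/bin/sh
# parallel_rm DIR [JOBS]: remove DIR using JOBS concurrent rm processes.
parallel_rm() {
    dir=$1; jobs=${2:-8}
    # Pass 1: unlink everything that is not a directory, 1000 names per rm call.
    find "$dir" ! -type d -print0 | xargs -0 -r -P "$jobs" -n 1000 rm -f --
    # Pass 2: remove the now-empty directories, children before parents.
    find "$dir" -depth -type d -exec rmdir {} +
}

# Usage (destructive, so left as a comment):
#   parallel_rm /path/to/the/dir 8
```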
Eduardo J. Culpepper (41 rep)
Jun 12, 2016, 11:44 AM • Last activity: Feb 20, 2023, 07:27 AM
0 votes
0 answers
363 views
strace with errors on openat syscall with Lustre FS, RHEL 8, overall intermittent latency
Does anyone see what could be causing this intermittent latency? At various times, all commands become slow on login, compute, and head nodes, but only for logins associated with Lustre; root and local logins do not have this latency. 45 errors on the openat syscall do not sound normal.
strace -ttt -T -C -w touch newfilethat
1662064377.602061 execve("/usr/bin/touch", ["touch", "newfilethat"], 0x7fffffffd278 /* 75 vars */) = 0
1662064377.602361 brk(NULL) = 0x55555576b000
1662064377.602385 arch_prctl(0x3001 /* ARCH_??? */, 0x7fffffffd1a0) = -1 EINVAL (Invalid argument)
1662064377.602443 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
1662064377.602516 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/tls/haswell/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
1662064377.602582 stat("/path/to/slurm/current/lib64/slurm/tls/haswell/avx512_1/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory)
1662064377.602610 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/tls/haswell/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
1662064377.602634 stat("/path/to/slurm/current/lib64/slurm/tls/haswell/avx512_1", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory)
1662064377.602657 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
1662064377.602680 stat("/path/to/slurm/current/lib64/slurm/tls/haswell/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory)
1662064377.602703 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
1662064377.602725 stat("/path/to/slurm/current/lib64/slurm/tls/haswell", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory)
1662064377.602748 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/tls/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory) 1662064377.602772 stat("/path/to/slurm/current/lib64/slurm/tls/avx512_1/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.602794 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/tls/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.602817 stat("/path/to/slurm/current/lib64/slurm/tls/avx512_1", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.602840 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.602862 stat("/path/to/slurm/current/lib64/slurm/tls/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.602885 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.602907 stat("/path/to/slurm/current/lib64/slurm/tls", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.602930 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/haswell/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.602955 stat("/path/to/slurm/current/lib64/slurm/haswell/avx512_1/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.602979 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/haswell/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603002 stat("/path/to/slurm/current/lib64/slurm/haswell/avx512_1", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603025 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603048 stat("/path/to/slurm/current/lib64/slurm/haswell/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603071 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/haswell/libc.so.6", 
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603094 stat("/path/to/slurm/current/lib64/slurm/haswell", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603117 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603140 stat("/path/to/slurm/current/lib64/slurm/avx512_1/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603164 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603187 stat("/path/to/slurm/current/lib64/slurm/avx512_1", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603211 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603233 stat("/path/to/slurm/current/lib64/slurm/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603257 openat(AT_FDCWD, "/path/to/slurm/current/lib64/slurm/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603280 stat("/path/to/slurm/current/lib64/slurm", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0 1662064377.603308 openat(AT_FDCWD, "/path/to/slurm/current/lib64/tls/haswell/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603332 stat("/path/to/slurm/current/lib64/tls/haswell/avx512_1/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603356 openat(AT_FDCWD, "/path/to/slurm/current/lib64/tls/haswell/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603379 stat("/path/to/slurm/current/lib64/tls/haswell/avx512_1", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603402 openat(AT_FDCWD, "/path/to/slurm/current/lib64/tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 
ENOENT (No such file or directory) 1662064377.603424 stat("/path/to/slurm/current/lib64/tls/haswell/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603446 openat(AT_FDCWD, "/path/to/slurm/current/lib64/tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603469 stat("/path/to/slurm/current/lib64/tls/haswell", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603491 openat(AT_FDCWD, "/path/to/slurm/current/lib64/tls/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603517 stat("/path/to/slurm/current/lib64/tls/avx512_1/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603540 openat(AT_FDCWD, "/path/to/slurm/current/lib64/tls/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603569 stat("/path/to/slurm/current/lib64/tls/avx512_1", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603592 openat(AT_FDCWD, "/path/to/slurm/current/lib64/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603615 stat("/path/to/slurm/current/lib64/tls/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603638 openat(AT_FDCWD, "/path/to/slurm/current/lib64/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603661 stat("/path/to/slurm/current/lib64/tls", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603684 openat(AT_FDCWD, "/path/to/slurm/current/lib64/haswell/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603707 stat("/path/to/slurm/current/lib64/haswell/avx512_1/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603730 openat(AT_FDCWD, "/path/to/slurm/current/lib64/haswell/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603753 
stat("/path/to/slurm/current/lib64/haswell/avx512_1", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603775 openat(AT_FDCWD, "/path/to/slurm/current/lib64/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603798 stat("/path/to/slurm/current/lib64/haswell/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603820 openat(AT_FDCWD, "/path/to/slurm/current/lib64/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603844 stat("/path/to/slurm/current/lib64/haswell", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603867 openat(AT_FDCWD, "/path/to/slurm/current/lib64/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603890 stat("/path/to/slurm/current/lib64/avx512_1/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603913 openat(AT_FDCWD, "/path/to/slurm/current/lib64/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603935 stat("/path/to/slurm/current/lib64/avx512_1", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.603958 openat(AT_FDCWD, "/path/to/slurm/current/lib64/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.603981 stat("/path/to/slurm/current/lib64/x86_64", 0x7fffffffc3f0) = -1 ENOENT (No such file or directory) 1662064377.604004 openat(AT_FDCWD, "/path/to/slurm/current/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.604028 stat("/path/to/slurm/current/lib64", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 1662064377.604053 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 1662064377.604074 fstat(3, {st_mode=S_IFREG|0644, st_size=93583, ...}) = 0 1662064377.604095 mmap(NULL, 93583, PROT_READ, MAP_PRIVATE, 3, 0) = 0x155555537000 1662064377.604117 close(3) = 0 1662064377.604139 
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 1662064377.604160 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\2607\2\0\0\0\0\0"..., 832) = 832 1662064377.604183 fstat(3, {st_mode=S_IFREG|0755, st_size=3149120, ...}) = 0 1662064377.604203 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x155555535000 1662064377.604225 mmap(NULL, 3938144, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x155554f69000 1662064377.604245 mprotect(0x155555122000, 2093056, PROT_NONE) = 0 1662064377.604268 mmap(0x155555321000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b8000) = 0x155555321000 1662064377.604294 mmap(0x155555327000, 14176, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x155555327000 1662064377.604318 close(3) = 0 1662064377.604345 arch_prctl(ARCH_SET_FS, 0x155555536580) = 0 1662064377.604394 mprotect(0x155555321000, 16384, PROT_READ) = 0 1662064377.604426 mprotect(0x555555769000, 4096, PROT_READ) = 0 1662064377.604448 mprotect(0x155555553000, 4096, PROT_READ) = 0 1662064377.604468 munmap(0x155555537000, 93583) = 0 1662064377.604539 brk(NULL) = 0x55555576b000 1662064377.604559 brk(0x55555578c000) = 0x55555578c000 1662064377.604579 brk(NULL) = 0x55555578c000 1662064377.604601 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.604631 openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3 1662064377.604652 fstat(3, {st_mode=S_IFREG|0644, st_size=2997, ...}) = 0 1662064377.604673 read(3, "# Locale name alias data base.\n#"..., 4096) = 2997 1662064377.604703 read(3, "", 4096) = 0 1662064377.604722 close(3) = 0 1662064377.604750 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.604772 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = 3 1662064377.604793 fstat(3, 
{st_mode=S_IFREG|0644, st_size=368, ...}) = 0 1662064377.604812 mmap(NULL, 368, PROT_READ, MAP_PRIVATE, 3, 0) = 0x15555554d000 1662064377.604832 close(3) = 0 1662064377.604852 openat(AT_FDCWD, "/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3 1662064377.604872 fstat(3, {st_mode=S_IFREG|0644, st_size=26998, ...}) = 0 1662064377.604892 mmap(NULL, 26998, PROT_READ, MAP_SHARED, 3, 0) = 0x155555546000 1662064377.604911 close(3) = 0 1662064377.604935 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.604956 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = 3 1662064377.604976 fstat(3, {st_mode=S_IFREG|0644, st_size=23, ...}) = 0 1662064377.604996 mmap(NULL, 23, PROT_READ, MAP_PRIVATE, 3, 0) = 0x155555545000 1662064377.605017 close(3) = 0 1662064377.605040 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_TELEPHONE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.605060 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_TELEPHONE", O_RDONLY|O_CLOEXEC) = 3 1662064377.605081 fstat(3, {st_mode=S_IFREG|0644, st_size=59, ...}) = 0 1662064377.605101 mmap(NULL, 59, PROT_READ, MAP_PRIVATE, 3, 0) = 0x155555544000 1662064377.605121 close(3) = 0 1662064377.605143 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_ADDRESS", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.605164 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_ADDRESS", O_RDONLY|O_CLOEXEC) = 3 1662064377.605184 fstat(3, {st_mode=S_IFREG|0644, st_size=167, ...}) = 0 1662064377.605203 mmap(NULL, 167, PROT_READ, MAP_PRIVATE, 3, 0) = 0x155555543000 1662064377.605223 close(3) = 0 1662064377.605245 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_NAME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.605266 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_NAME", O_RDONLY|O_CLOEXEC) = 3 1662064377.605286 fstat(3, {st_mode=S_IFREG|0644, 
st_size=77, ...}) = 0 1662064377.605305 mmap(NULL, 77, PROT_READ, MAP_PRIVATE, 3, 0) = 0x155555542000 1662064377.605324 close(3) = 0 1662064377.605347 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.605367 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_PAPER", O_RDONLY|O_CLOEXEC) = 3 1662064377.605388 fstat(3, {st_mode=S_IFREG|0644, st_size=34, ...}) = 0 1662064377.605407 mmap(NULL, 34, PROT_READ, MAP_PRIVATE, 3, 0) = 0x155555541000 1662064377.605426 close(3) = 0 1662064377.605450 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.605471 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3 1662064377.605492 fstat(3, {st_mode=S_IFDIR|0755, st_size=29, ...}) = 0 1662064377.605514 close(3) = 0 1662064377.605533 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_MESSAGES/SYS_LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3 1662064377.605554 fstat(3, {st_mode=S_IFREG|0644, st_size=57, ...}) = 0 1662064377.605573 mmap(NULL, 57, PROT_READ, MAP_PRIVATE, 3, 0) = 0x155555540000 1662064377.605592 close(3) = 0 1662064377.605614 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_MONETARY", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.605636 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_MONETARY", O_RDONLY|O_CLOEXEC) = 3 1662064377.605656 fstat(3, {st_mode=S_IFREG|0644, st_size=286, ...}) = 0 1662064377.605675 mmap(NULL, 286, PROT_READ, MAP_PRIVATE, 3, 0) = 0x15555553f000 1662064377.605695 close(3) = 0 1662064377.605717 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_COLLATE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.605738 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_COLLATE", O_RDONLY|O_CLOEXEC) = 3 1662064377.605758 fstat(3, {st_mode=S_IFREG|0644, st_size=2586930, ...}) = 0 1662064377.605777 mmap(NULL, 2586930, PROT_READ, MAP_PRIVATE, 3, 0) = 
0x155554cf1000 1662064377.605797 close(3) = 0 1662064377.605825 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_TIME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.605847 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_TIME", O_RDONLY|O_CLOEXEC) = 3 1662064377.605868 fstat(3, {st_mode=S_IFREG|0644, st_size=3316, ...}) = 0 1662064377.605889 mmap(NULL, 3316, PROT_READ, MAP_PRIVATE, 3, 0) = 0x15555553e000 1662064377.605909 close(3) = 0 1662064377.605931 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_NUMERIC", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.605952 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_NUMERIC", O_RDONLY|O_CLOEXEC) = 3 1662064377.605972 fstat(3, {st_mode=S_IFREG|0644, st_size=54, ...}) = 0 1662064377.605991 mmap(NULL, 54, PROT_READ, MAP_PRIVATE, 3, 0) = 0x15555553d000 1662064377.606011 close(3) = 0 1662064377.606035 openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 1662064377.606056 openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = 3 1662064377.606076 fstat(3, {st_mode=S_IFREG|0644, st_size=337024, ...}) = 0 1662064377.606095 mmap(NULL, 337024, PROT_READ, MAP_PRIVATE, 3, 0) = 0x1555554e2000 1662064377.606115 close(3) = 0 1662064377.606142 openat(AT_FDCWD, "newfilethat", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3 1662064411.577554 dup2(3, 0) = 0 1662064411.577613 close(3) = 0 1662064411.577662 utimensat(0, NULL, NULL, 0) = 0 1662064411.595221 close(0) = 0 1662064411.595453 close(1) = 0 1662064411.595474 close(2) = 0 1662064411.595512 exit_group(0) = ? 
1662064411.595645 +++ exited with 0 +++
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.94   33.971842      539235        63        45 openat
  0.05    0.017515       17514         1           utimensat
  0.00    0.000348          16        21           close
  0.00    0.000280           8        32        30 stat
  0.00    0.000204         203         1           execve
  0.00    0.000129           7        18           mmap
  0.00    0.000106           6        17           fstat
  0.00    0.000034           8         4           mprotect
  0.00    0.000028           6         4           brk
  0.00    0.000020           6         3           read
  0.00    0.000018          18         1           dup2
  0.00    0.000012           6         2         1 arch_prctl
  0.00    0.000011          11         1           munmap
  0.00    0.000008           7         1         1 access
------ ----------- ----------- --------- --------- ----------------
100.00   33.990554                   169        77 total
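Almost all of the time in the summary is attributed to openat, so it helps to find which individual call stalled. The sketch below is a hypothetical awk filter (`slowest_gap`) that reads `strace -ttt` output with one syscall per line and reports the largest gap between consecutive timestamps. Since each timestamp marks the start of a call, the gap after a line approximates how long that call took in a single-threaded trace.

```shell
#!/bin/sh
# Read strace -ttt output on stdin; print the largest gap and the line it follows.
slowest_gap() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ {
        if (prev != "" && $1 - prev > max) { max = $1 - prev; call = prevline }
        prev = $1; prevline = $0
    }
    END { printf "%.3f s after: %s\n", max, call }'
}

# Demo with three lines taken from the trace above.
printf '%s\n' \
    '1662064377.606115 close(3) = 0' \
    '1662064377.606142 openat(AT_FDCWD, "newfilethat", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3' \
    '1662064411.577554 dup2(3, 0) = 0' |
    slowest_gap
```

On these lines the gap falls after the openat that creates newfilethat, not after any of the failed library lookups.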
RobbieTheK (133 rep)
Sep 1, 2022, 08:46 PM
3 votes
0 answers
2424 views
How can I fix the error 'Is the MGS running?' when I try to mount FSx for lustre in Vagrant ( Virtualbox, Centos 7 )
I am running an OpenVPN gateway to an AWS VPC. I can normally mount AWS FSx for Lustre on my bare-metal CentOS 7 machine using a command like this:
sudo mount -t lustre -o noatime,flock 10.1.1.90@tcp:/fsx /fsx
However, if I try to do the same thing in a Vagrant CentOS 7 box on the same network, I encounter this apparently networking-related error:
[vagrant@localhost ~]$ sudo mount -t lustre -o noatime,flock 10.1.1.90@tcp:/fsx /fsx
mount.lustre: mount 10.1.1.90@tcp:/fsx at /fsx failed: Input/output error
Is the MGS running?
The vagrant box has no problem mounting NFS shares from the same AWS subnet, so this is a mystery to me. Being able to get it working in the vagrant image matters even though I can get it to work on bare metal because we use the vagrant environment for testing. I can also share an example vagrant file that I can replicate the problem with.
Vagrant.configure("2") do |config|
    config.vm.box = "centos/7"
    config.vagrant.plugins = ['vagrant-vbguest', 'vagrant-disksize', 'vagrant-reload']
    config.vm.provider "virtualbox" do |v|
        v.gui = true
        v.memory = 2048
        v.cpus = 2
        
    end
    config.disksize.size = "65000MB"
    config.vm.network "public_network", use_dhcp_assigned_default_route: true
    config.vm.provision "shell", inline: "sudo yum update -y"
    config.vm.provision "shell", inline: "sudo yum install wget -y"
    config.vm.provision "shell", inline: "sudo wget https://fsx-lustre-client-repo-public-keys.s3.amazonaws.com/fsx-rpm-public-key.asc  -O /tmp/fsx-rpm-public-key.asc"
    config.vm.provision "shell", inline: "sudo rpm --import /tmp/fsx-rpm-public-key.asc"
    config.vm.provision "shell", inline: "sudo wget https://fsx-lustre-client-repo.s3.amazonaws.com/el/7/fsx-lustre-client.repo  -O /etc/yum.repos.d/aws-fsx.repo"
    config.vm.provision "shell", inline: "sudo yum install -y kmod-lustre-client lustre-client"
    config.vm.provision :reload
end
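One way to narrow this down from inside the VM is to check whether the file server is reachable on TCP port 988, which Lustre's LNet uses for its TCP network type. The sketch below assumes bash and the coreutils `timeout` command are available; the function name `check_tcp` is hypothetical. If the Lustre client utilities are installed, `lctl list_nids` and `lctl ping 10.1.1.90@tcp` are the Lustre-level counterparts of this check.

```shell
#!/bin/sh
# check_tcp HOST PORT: report whether a TCP connection can be opened within 3 seconds.
check_tcp() {
    host=$1; port=$2
    # bash's /dev/tcp pseudo-device opens a TCP connection; the host and port
    # are passed as arguments rather than pasted into the command string.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "$host:$port reachable"
    else
        echo "$host:$port NOT reachable"
        return 1
    fi
}

# Address taken from the mount command above; "|| true" keeps a failure from aborting the script.
check_tcp 10.1.1.90 988 || true
```

A successful TCP connection does not prove the mount will work, but a failure here points at routing or firewalling rather than at Lustre itself.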
openCivilisation (233 rep)
Aug 9, 2020, 01:10 AM
1 votes
2 answers
122 views
What does [0x200000401:0x4:0x0] for a file ID signify
I tried to get the ID (descriptor) of a file using DFID, which is defined as #define DFID "["DFID_NOBRACE"]" in Lustre, and got the output [0x200000401:0x4:0x0]. What do the fields separated by : signify?
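For orientation, Lustre's `struct lu_fid` has three members, `f_seq`, `f_oid`, and `f_ver`, and DFID prints them in that order. A minimal sketch that splits the printed form into those three parts; the function name `parse_fid` and the output labels are hypothetical:

```shell
#!/bin/sh
# parse_fid '[SEQ:OID:VER]': split a printed FID into its three fields.
parse_fid() {
    fid=${1#\[}        # drop the leading bracket
    fid=${fid%\]}      # drop the trailing bracket
    # IFS=: applies only to this read, so the shell's global IFS is untouched.
    IFS=: read -r seq oid ver <<EOF
$fid
EOF
    printf 'sequence=%s object_id=%s version=%s\n' "$seq" "$oid" "$ver"
}

parse_fid '[0x200000401:0x4:0x0]'
# prints: sequence=0x200000401 object_id=0x4 version=0x0
```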
Bhagyesh Dudhediya (764 rep)
Sep 10, 2015, 02:39 PM • Last activity: Apr 27, 2018, 07:14 PM
1 votes
1 answers
587 views
Striping a directory in lustre
What exactly happens when a directory has a default layout in Lustre? As far as I know, if a directory has a default layout, then the striping parameters set for the directory are applied to files created in that directory (unless a layout is specified explicitly). However, when I tried this, I did not see that behavior. I set the stripe count of the directory to 2 (lfs setstripe -c 2 /mnt/lustre/directory) and created a file inside it using lfs setstripe /mnt/lustre/directory/file1, but when I run lfs getstripe /mnt/lustre/directory/file1, I see a stripe count of 1. Why is that?
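A useful comparison is a file created by ordinary means, such as touch, instead of by lfs setstripe, since that is the case the directory default is described as covering. The sketch below only prints the commands unless `RUN` is set to an empty string; the function name `stripe_demo` and the file name `file2` are hypothetical, and no output is shown because it has not been run against a real Lustre mount.

```shell
#!/bin/sh
# Dry-run by default: RUN=echo prints each command instead of executing it.
RUN=${RUN-echo}

# stripe_demo DIR: set a default stripe count of 2 on DIR, create a file
# with touch, and query the stripe count that file ended up with.
stripe_demo() {
    dir=$1
    $RUN lfs setstripe -c 2 "$dir"
    $RUN touch "$dir/file2"
    $RUN lfs getstripe -c "$dir/file2"
}

stripe_demo /mnt/lustre/directory
```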
Bhagyesh Dudhediya (764 rep)
Sep 7, 2015, 06:36 AM • Last activity: Apr 6, 2018, 08:40 AM
0 votes
1 answers
549 views
Lustre and HPC configuration
I have an HPC cluster with 22 nodes and one head node as a master running Rocks Cluster OS (which is based on CentOS). The nodes and the master communicate over a private network (10.10.0.0/16). We ssh to the server over a routed public network (192.168.xxx.xxx/24), and this network is not routed to the worker nodes. Our data has now reached its limit, and we can't add any more disks to the master, so we want to build a Lustre cluster consisting of 2 OSSs and one MDS. My question is: do we have to connect the Lustre OSSs and MDS to the same network as the HPC nodes (10.10.0.0/16), so that the nodes can mount our new Lustre filesystem as Lustre clients? Or can we just mount Lustre on the master node as a client and share it through NFS with the HPC worker nodes? We will have other Lustre clients outside the HPC environment, so we will configure Lustre on 192.168.xxx.xxx/24. Any suggestions?
luthfi.imanal (1 rep)
Jan 29, 2018, 08:35 AM • Last activity: Mar 6, 2018, 06:11 PM
1 votes
0 answers
133 views
What happens to directory entry when a file is renamed
What exactly happens to the directory entry when a file is renamed? Is the entry for the old file name cleared and the new name written in its place, or is a completely new entry added at the end of the directory?
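One part of this can be observed from user space: a rename within the same filesystem changes the name-to-inode mapping but not the inode itself. Where the new entry lands inside the directory is filesystem-specific and is not visible this way. A minimal sketch, assuming GNU stat; the function name `rename_demo` and the file names are hypothetical:

```shell
#!/bin/sh
# Create a file, rename it within the same directory, and compare inode numbers.
rename_demo() {
    d=$(mktemp -d)
    : > "$d/oldfile"
    before=$(stat -c '%i' "$d/oldfile")
    mv "$d/oldfile" "$d/newfile"
    after=$(stat -c '%i' "$d/newfile")
    rm -rf -- "$d"
    if [ "$before" = "$after" ]; then
        echo "same inode"
    else
        echo "different inode"
    fi
}

rename_demo
```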
Bhagyesh Dudhediya (764 rep)
Sep 8, 2015, 04:49 PM • Last activity: Sep 8, 2015, 05:11 PM
1 votes
0 answers
203 views
File of Doom freezes any computer that attempts interaction
I have a file on an internal server that causes any machine that interacts with it to immediately and irrecoverably freeze. By interaction, I mean just about anything: rm, mv, cp, cat, vi, gedit, less, touch. (ls is fine.) Try any of these commands and the machine will completely halt: the GUI will freeze and all network connectivity goes down. The server hosting the file only goes down if a Windows machine attempts to interact with the file. The data server is mounted with Lustre and all the servers in question use RHEL 6. This is an internal, well-protected network, and the file was created within the network. There are many, many other files that have been created in a similar fashion that do not exhibit this problem. What could possibly be going on here? Is there some strange character sequence that causes Lustre or RHEL to flip out? I'm not a sysadmin, and I'm loath to further interact with the file for fear of bringing down machines in active use for other purposes, but I'd like to know what's going on, how to remove the file if possible, or at least how to prevent creating another similar File of Doom.
pattivacek (111 rep)
Oct 27, 2014, 02:10 PM • Last activity: Sep 7, 2015, 09:17 PM