[OpenAFS] Unexpected no space left on device error
Theo Ouzhinski
2018-11-13 23:35:40 UTC
Hi all,

Recently, I've seen an uptick in "no space left on device" errors for
some of the home directories I administer. 

For example,

matsumoto <USERNAME> # touch a
touch: cannot touch 'a': No space left on device

We are not even close to filling up the cache (located at
/var/cache/openafs) on this client machine.

matsumoto ~ # fs getcacheparms
AFS using 10314 of the cache's available 10000000 1K byte blocks.
matsumoto ~ # df -h
Filesystem                   Size  Used Avail Use% Mounted on
....
/dev/mapper/vgwrkstn-root    456G   17G  417G   4% /
....
AFS                          2.0T     0  2.0T   0% /afs
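For scale, 10314 of 10000000 1K blocks is about 0.1% of the cache. Here is a minimal Python sketch that extracts that figure. It assumes the one-line `fs getcacheparms` wording shown above; other OpenAFS releases may phrase it differently, so treat the regex as an assumption:

```python
import re

# Matches the summary line printed by `fs getcacheparms`, e.g.
# "AFS using 10314 of the cache's available 10000000 1K byte blocks."
# The exact wording is an assumption based on the output in this thread.
CACHE_RE = re.compile(r"AFS using (\d+) of the cache's available (\d+) 1K byte blocks")


def cache_usage(line: str) -> tuple[int, int, float]:
    """Return (used_kb, total_kb, percent_used) parsed from getcacheparms output."""
    m = CACHE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised getcacheparms output: {line!r}")
    used, total = int(m.group(1)), int(m.group(2))
    return used, total, 100.0 * used / total


used, total, pct = cache_usage(
    "AFS using 10314 of the cache's available 10000000 1K byte blocks.")
print(f"{used} of {total} KB in use ({pct:.2f}%)")
```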


Nor is this home directory, or any other problematic home directory, close
to its quota.

matsumoto <USERNAME> # fs lq
Volume Name                    Quota       Used %Used   Partition
<VOLUME NAME>              4194304     194403    5%         37% 
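`fs lq` reports quota and usage in 1K blocks, so this volume has a 4 GiB quota with roughly 190 MiB used. The following small Python sketch computes the remaining headroom from one output row. It assumes the five whitespace-separated columns shown above, and `home.example` is a hypothetical stand-in for the redacted volume name:

```python
def quota_headroom(row: str) -> dict:
    """Summarise one data row of `fs listquota` (`fs lq`) output.

    Assumes five whitespace-separated columns, as in the output above:
    volume name, quota (1K blocks), used (1K blocks), %Used, partition %used.
    """
    name, quota, used, _pct_used, part_pct = row.split()
    quota_kb, used_kb = int(quota), int(used)
    if quota_kb == 0:
        # A zero quota is treated here as "no limit", so there is no headroom to compute.
        raise ValueError(f"volume {name} has no quota limit")
    return {
        "volume": name,
        "quota_kb": quota_kb,
        "used_kb": used_kb,
        "free_kb": quota_kb - used_kb,
        "percent_used": 100.0 * used_kb / quota_kb,
        "partition_percent_used": int(part_pct.rstrip("%")),
    }


# "home.example" is a hypothetical placeholder for the redacted volume name.
info = quota_headroom("home.example 4194304 194403 5% 37%")
print(f"{info['free_kb']} KB free, {info['percent_used']:.1f}% of quota used")
```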

According to previous posts on this list, many such errors can be attributed
to high inode usage. However, this is not the case on our machines.

Here is sample output from one of our OpenAFS servers; the other four look
similar.

openafs1 ~ # df -i
Filesystem         Inodes   IUsed      IFree IUse% Mounted on
udev              1903816     413    1903403    1% /dev
tmpfs             1911210     551    1910659    1% /run
/dev/vda1         1905008  154821    1750187    9% /
tmpfs             1911210       1    1911209    1% /dev/shm
tmpfs             1911210       5    1911205    1% /run/lock
tmpfs             1911210      17    1911193    1% /sys/fs/cgroup
/dev/vdb         19660800 3461203   16199597   18% /vicepa
/dev/vdc         19660800 1505958   18154842    8% /vicepb
tmpfs             1911210       4    1911206    1% /run/user/0
AFS            2147483647       0 2147483647    0% /afs
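The IUse% column is the number to watch. The busiest partition here, /vicepa, is at 18% (3461203 of 19660800 inodes). The rough Python sketch below flags mounts above a threshold. It assumes the GNU coreutils `df -i` column order, and the 90% default threshold is an arbitrary choice, not an OpenAFS limit:

```python
def high_inode_mounts(df_output: str, threshold: int = 90) -> list[tuple[str, int]]:
    """Return (mount point, IUse%) pairs at or above `threshold` percent.

    Assumes GNU coreutils `df -i` columns: Filesystem, Inodes, IUsed,
    IFree, IUse%, Mounted on. Rows whose IUse% is "-" are skipped.
    """
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        iuse = int(fields[4].rstrip("%"))
        if iuse >= threshold:
            flagged.append((" ".join(fields[5:]), iuse))
    return flagged


# A subset of the server output quoted above.
sample = """\
Filesystem         Inodes   IUsed      IFree IUse% Mounted on
/dev/vda1         1905008  154821    1750187    9% /
/dev/vdb         19660800 3461203   16199597   18% /vicepa
/dev/vdc         19660800 1505958   18154842    8% /vicepb
"""

print(high_inode_mounts(sample))                # nothing at or above 90%
print(high_inode_mounts(sample, threshold=15))  # only /vicepa
```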


We are running the latest HWE kernel (4.15.0-38-generic) for Ubuntu
16.04, which is the OS on both the server and client machines. On the
clients, we are running the following versions:

openafs-client/xenial,now 1.8.2-0ppa2~ubuntu16.04.1 amd64 [installed]
openafs-krb5/xenial,now 1.8.2-0ppa2~ubuntu16.04.1 amd64 [installed]
openafs-modules-dkms/xenial,xenial,now 1.8.2-0ppa2~ubuntu16.04.1 all [installed]

and on the servers, the following versions:

openafs-client/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
openafs-dbserver/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
openafs-fileserver/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
openafs-krb5/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
openafs-modules-dkms/xenial,xenial,now 1.6.15-1ubuntu1 all [installed]

What could be the problem? Is there something I missed?


Thanks,

Theo Ouzhinski

Theo Ouzhinski
2018-11-14 01:46:28 UTC
Hi all,

Sorry for my previous incorrectly formatted email.

Thanks,

Theo Ouzhinski
Benjamin Kaduk
2018-11-14 03:36:54 UTC
Post by Theo Ouzhinski
openafs-client/xenial,now 1.8.2-0ppa2~ubuntu16.04.1 amd64 [installed]
openafs-krb5/xenial,now 1.8.2-0ppa2~ubuntu16.04.1 amd64 [installed]
openafs-modules-dkms/xenial,xenial,now 1.8.2-0ppa2~ubuntu16.04.1 all [installed]
openafs-client/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
openafs-dbserver/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
openafs-fileserver/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
openafs-krb5/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
openafs-modules-dkms/xenial,xenial,now 1.6.15-1ubuntu1 all [installed]
(Off-topic, but that looks to be missing some security fixes.)
Post by Theo Ouzhinski
What could be the problem? Is there something I missed?
It's not really ringing a bell off the top of my head, no.

That said, there are a number of potential ways to get ENOSPC, so it would be
good to get more data, like an strace of the failing touch, and maybe a
packet capture (port 7000) during the touch, both from a clean cache and
potentially on a second attempt.
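As an illustration of the first suggestion: after capturing a trace with something like `strace -f -o touch.trace touch a` (and, in parallel, `tcpdump -w afs.pcap udp port 7000` for the packets), the Python sketch below picks out the system calls that failed with ENOSPC. It assumes strace's default `name(args) = -1 ERRNO (message)` line format and does not stitch together the `<unfinished ...>`/`resumed` pairs that `-f` can produce:

```python
import re

# One strace result line, optionally prefixed by a PID (as written with -f -o), e.g.
# openat(AT_FDCWD, "a", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 ENOSPC (No space left on device)
FAILED_CALL_RE = re.compile(r"^(?:\d+\s+)?(\w+)\((.*)\)\s+=\s+-1\s+(E[A-Z0-9]+)\b")


def failing_syscalls(trace_text: str, errno_name: str = "ENOSPC") -> list[tuple[str, str]]:
    """Return (syscall, argument string) for every call that failed with errno_name."""
    hits = []
    for line in trace_text.splitlines():
        m = FAILED_CALL_RE.match(line.strip())
        if m and m.group(3) == errno_name:
            hits.append((m.group(1), m.group(2)))
    return hits


# Hypothetical two-line excerpt; a real trace of touch would be much longer.
sample = (
    "close(3)                                = 0\n"
    'openat(AT_FDCWD, "a", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 ENOSPC (No space left on device)\n'
)

for name, args in failing_syscalls(sample):
    print(name, args)
```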

-Ben
Jeffrey Altman
2018-11-14 04:23:24 UTC
I'd bet a beer on the directory being full. For extra credit, I will guess that the directory is full as a result of abandoned silly-rename files. You should try salvaging the volume with the rebuild-directories option.
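One way to test this hypothesis before salvaging is to count the entries in the affected directory. The Python sketch below gives only an estimate and rests on assumptions not stated in this thread: that silly-rename leftovers carry a `.__afs` prefix, that an AFS directory tops out at about 1023 pages of 64 32-byte slots, and that a name of n bytes occupies `1 + (n + 16) // 32` slots. For the salvage itself, the rebuild option is, we believe, `bos salvage` with `-salvagedirs`.

```python
import os

# Assumptions, not confirmed in this thread:
#  * silly-rename leftovers are named ".__afs" followed by hex digits;
#  * a directory holds at most about 1023 pages x 64 slots of 32 bytes
#    (header slots are ignored here, so this overstates capacity slightly);
#  * a name of n bytes occupies 1 + (n + 16) // 32 slots.
SILLY_PREFIX = ".__afs"
APPROX_MAX_SLOTS = 1023 * 64


def slots_for(name: str) -> int:
    """Approximate number of 32-byte directory slots one entry uses."""
    return 1 + (len(name.encode()) + 16) // 32


def summarize_names(names: list[str]) -> dict:
    """Estimate how full a directory is from its list of entry names."""
    used = sum(slots_for(n) for n in names)
    return {
        "entries": len(names),
        "silly_renames": sum(1 for n in names if n.startswith(SILLY_PREFIX)),
        "approx_slots_used": used,
        "approx_fraction_full": used / APPROX_MAX_SLOTS,
    }


def directory_report(path: str) -> dict:
    """Run summarize_names over a directory ("." and ".." are not counted)."""
    return summarize_names(os.listdir(path))


print(summarize_names(["a", ".__afs1A2B", "x" * 20]))
```

Calling `directory_report(".")` from inside an affected home directory would show whether the fraction is near 1.0 or whether `silly_renames` is unusually large, either of which would support the full-directory theory.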

Jeffrey Altman
_______________________________________________
OpenAFS-info mailing list
https://lists.openafs.org/mailman/listinfo/openafs-info