- The actual size of a file, which is the number of bytes that make up the file, and the effective size on the hard disk, which is the number of file system blocks necessary to store it, are different due to the allocation of disk space in blocks.
- The du command can be used to check the size of files, directories, and the total disk space used by the current directory and subdirectories.
- Run “du -h” to see a list of files and folders in a human-readable format.
When you use the Linux du command, you obtain both the actual disk usage and the true size of a file or directory. We’ll explain why these values aren’t the same.
Why are Actual Disk Usage and True Size Different?
The size of a file and the space it occupies on your hard drive are rarely the same. Disk space is allocated in blocks. If a file is smaller than a block, an entire block is still allocated to it because the file system doesn’t have a smaller unit of real estate to use.
Unless a file’s size is an exact multiple of the block size, the space it uses on the hard drive is always rounded up to the next whole block. For example, if a file is larger than two blocks but smaller than three, it still takes three blocks of space to store it.
Two measurements are used in relation to file size. The first is the actual size of the file, which is the number of bytes of content that make up the file. The second is the effective size of the file on the hard disk. This is the number of file system blocks necessary to store that file.
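The rounding described above can be worked through with shell arithmetic. This is a minimal sketch: the 4,096-byte block size and the 9,000-byte file size are assumed values chosen only for illustration.

```shell
# Assumed values for illustration: a 4,096-byte block and a 9,000-byte file.
block_size=4096
file_size=9000

# Round up to the next whole block using integer arithmetic.
blocks=$(( (file_size + block_size - 1) / block_size ))
on_disk=$(( blocks * block_size ))

echo "actual size:   $file_size bytes"
echo "blocks needed: $blocks"
echo "space on disk: $on_disk bytes"
```

The file is larger than two blocks (8,192 bytes) but smaller than three (12,288 bytes), so it needs three blocks.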
How to Check a File’s Size
Let’s look at a simple example. We’ll redirect a single character into a file to create a small file:
echo "1" > geek.txt
Now, we’ll use ls with the -l (long format) option to look at the file length:
ls -l geek.txt
The length is the numeric value that follows the dave dave (owner and group) entries, which is two bytes. Why is it two bytes when we only sent one character to the file? Let’s take a look at what’s happening inside the file.
We’ll use the hexdump command, which will give us an exact byte count and allow us to “see” non-printing characters as hexadecimal values. We’ll also use the -C (canonical) option to force the output to show hexadecimal values in the body of the output, as well as their alphanumeric character equivalents:
hexdump -C geek.txt
The output shows us that, beginning at offset 00000000 in the file, there’s one byte that contains a hexadecimal value of 31, and one that contains a hexadecimal value of 0A. The right-hand portion of the output depicts these values as alphanumeric characters, wherever possible.
The hexadecimal value of 31 is used to represent the digit one. The hexadecimal value of 0A is used to represent the Line Feed character, which cannot be shown as an alphanumeric character, so it’s shown as a period (.) instead. The Line Feed character is added by echo. By default, echo starts a new line after it displays the text it needs to write to the terminal window.
That tallies with the output from ls and agrees with the file length of two bytes.
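To confirm that the extra byte comes from echo, we can count the bytes each command writes. This is a small sketch that uses wc -c and printf, neither of which appears in the walkthrough above. printf writes only the characters it is given, with no trailing Line Feed.

```shell
# echo appends a Line Feed, so it writes two bytes.
with_newline=$(( $(echo "1" | wc -c) ))

# printf writes only what the format string contains: one byte.
without_newline=$(( $(printf '1' | wc -c) ))

echo "echo:   $with_newline bytes"
echo "printf: $without_newline bytes"
```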
Now, we’ll use the du command to look at the file size:
du geek.txt
It says the size is four, but four of what?
There Are Blocks, and Then There Are Blocks
du reports file sizes in blocks, and the block size it uses depends on several factors. You can specify which block size it should use on the command line. If you don’t force du to use a particular block size, it follows a set of rules to decide which one to use.
First, it checks the following environment variables:
- DU_BLOCK_SIZE
- BLOCK_SIZE
- BLOCKSIZE
If any of these exist, the block size is set accordingly, and du stops checking. If none are set, du defaults to a block size of 1,024 bytes, unless an environment variable called POSIXLY_CORRECT is set. In that case, du defaults to a block size of 512 bytes.
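The lookup order can be sketched as a shell function. This is only an illustration of the rules, not du’s actual code. The function name du_default_block_size is made up for this example, and the variable names DU_BLOCK_SIZE, BLOCK_SIZE, and BLOCKSIZE are taken from the GNU coreutils documentation, so check them against your own system. In real use these variables can also hold values such as 1K, which this sketch simply passes through.

```shell
# Hypothetical helper that mirrors the rules described above.
du_default_block_size() {
    if [ -n "${DU_BLOCK_SIZE+set}" ]; then
        echo "$DU_BLOCK_SIZE"      # du-specific variable wins
    elif [ -n "${BLOCK_SIZE+set}" ]; then
        echo "$BLOCK_SIZE"
    elif [ -n "${BLOCKSIZE+set}" ]; then
        echo "$BLOCKSIZE"
    elif [ -n "${POSIXLY_CORRECT+set}" ]; then
        echo 512                   # POSIX mode default
    else
        echo 1024                  # usual default
    fi
}

# Show which block size the rules select in the current environment.
du_default_block_size
```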
So, how do we find out which one is in use? You can check each environment variable to work it out, but there’s a quicker way. Let’s compare the results to the block size the file system uses instead.
To discover the block size the file system uses, we’ll use the tune2fs program, which works on ext2, ext3, and ext4 file systems. We’ll use the -l (list superblock) option, pipe the output through grep, and print only the lines that contain the word “Block.”
In this example, we’ll look at the file system on the first partition of the first hard drive, sda1, and we’ll need to use sudo:
sudo tune2fs -l /dev/sda1 | grep Block
The file system block size is 4,096 bytes. If we divide that by the result we got from du (four), it shows the du default block size is 1,024 bytes. We now know several important things.
First, we know the smallest amount of file system real estate that can be devoted to storing a file is 4,096 bytes. This means even our tiny, two-byte file is taking up 4 KB of hard drive space.
The second thing to keep in mind is that applications dedicated to reporting on hard drive and file system statistics, such as du and tune2fs, can have different notions of what “block” means. The tune2fs application reports true file system block sizes, while du can be configured or forced to use other block sizes. Those block sizes are not intended to relate to the file system block size; they’re just “chunks” the command uses in its output.
Finally, other than using different block sizes, the answers from du and tune2fs convey the same meaning. The tune2fs result was one block of 4,096 bytes, and the du result was four blocks of 1,024 bytes.
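The equivalence of the two answers is simple arithmetic. Here is a short sketch that uses the values from this walkthrough; your own system may report different numbers.

```shell
du_blocks=4          # value du reported for geek.txt
du_block_size=1024   # du's default block size, in bytes
fs_block_size=4096   # file system block size reported by tune2fs

# Convert du's answer to bytes, then to file system blocks.
bytes_on_disk=$(( du_blocks * du_block_size ))
fs_blocks=$(( bytes_on_disk / fs_block_size ))

echo "$bytes_on_disk bytes on disk = $fs_blocks file system block(s)"
```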
Using du to Check File Size
With no command line parameters or options, du lists the total disk space the current directory and all subdirectories are using.
Let’s take a look at an example:
du
The size is reported in the default block size of 1,024 bytes per block. The entire subdirectory tree is traversed.
du on a Different Directory
If you want du to report on a different directory than the current one, you can pass the path to the directory on the command line.
du on a Specific File
If you want du to report on a specific file, pass the path to that file on the command line. You can also pass a shell pattern to select a group of files.
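The following sketch shows all three forms: a directory path, a single file, and a shell pattern. The scratch directory and the file names a.txt, b.txt, and c.log are hypothetical and are created only for this demonstration.

```shell
# Hypothetical scratch directory and files, created only for this demo.
demo_dir=$(mktemp -d)
echo "one" > "$demo_dir/a.txt"
echo "two" > "$demo_dir/b.txt"
echo "three" > "$demo_dir/c.log"

# Report on a directory other than the current one.
du "$demo_dir"

# Report on a single file.
du "$demo_dir/a.txt"

# Report on every file that matches a shell pattern.
pattern_report=$(du "$demo_dir"/*.txt)
echo "$pattern_report"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

The shell expands the pattern before du runs, so du simply receives a list of matching file paths.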
Reporting on Files in Directories
To have du report on the files in the current directory and subdirectories, use the -a (all files) option:
du -a
For each directory, the size of each file is reported, as well as a total for each directory.
Limiting Directory Tree Depth
You can tell du to list the directory tree to a certain depth. To do so, use the -d (max depth) option and provide a depth value as a parameter. Note that all subdirectories are scanned and used to calculate the reported totals, but they’re not all listed. To set a maximum directory depth of one level, use this command:
du -d 1
The output lists each subdirectory of the current directory along with its total size, followed by a total for the current directory itself.
To list directories one level deeper, use this command:
du -d 2
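To see how the depth limit changes what is listed, we can build a small nested tree and count the lines du prints. The directory names a, b, and c below are hypothetical and exist only for this sketch.

```shell
# Hypothetical nested tree: demo_dir/a/b/c/file.txt
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a/b/c"
echo "data" > "$demo_dir/a/b/c/file.txt"

# Depth 1 lists demo_dir/a and demo_dir itself.
depth1=$(du -d 1 "$demo_dir")
# Depth 2 also lists demo_dir/a/b.
depth2=$(du -d 2 "$demo_dir")

echo "$depth1"
echo "---"
echo "$depth2"

rm -rf "$demo_dir"
```

The total on the last line is the same in both reports, because every subdirectory is still scanned even when it is not listed.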
Setting the Block Size
You can use the --block-size option to set a block size for du for the current operation. To use a block size of one byte, use the following command to get the exact sizes of the directories and files:
du --block-size=1
If you want to use a block size of one megabyte, you can use the -m (megabyte) option, which is the same as --block-size=1M:
du -m
If you want the sizes reported in the most appropriate block size according to the disk space used by the directories and files, use the -h (human-readable) option:
du -h
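Here is a quick way to compare these block-size settings on a single file. This sketch assumes GNU du, since the long --block-size option is not available in every du implementation, and it uses a throwaway file of roughly 3 MB created with head.

```shell
# Create a scratch file of about 3 MB filled with zero bytes.
demo_file=$(mktemp)
head -c 3000000 /dev/zero > "$demo_file"

# Disk usage in bytes.
du --block-size=1 "$demo_file"

# Disk usage in megabyte blocks, written two equivalent ways.
with_m=$(du -m "$demo_file")
with_1m=$(du --block-size=1M "$demo_file")
echo "$with_m"
echo "$with_1m"

# Let du choose the most readable unit.
du -h "$demo_file"

rm -f "$demo_file"
```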
To see the apparent size of the file rather than the amount of hard drive space used to store the file, use the --apparent-size option:
du --apparent-size
You can combine this with the -a (all) option to see the apparent size of each file:
du --apparent-size -a
Each file is listed, along with its apparent size.
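We can tie this back to the two-byte file from earlier. This sketch assumes GNU du and recreates geek.txt in a scratch directory. With a one-byte block size, the apparent size should match the length ls reported, while the disk usage depends on the file system.

```shell
demo_dir=$(mktemp -d)
echo "1" > "$demo_dir/geek.txt"

# Apparent size in bytes: the length of the file's content.
apparent=$(du --apparent-size --block-size=1 "$demo_dir/geek.txt" | cut -f1)

# Disk usage in bytes: varies with the file system's block size.
on_disk=$(du --block-size=1 "$demo_dir/geek.txt" | cut -f1)

echo "apparent size: $apparent bytes"
echo "disk usage:    $on_disk bytes"

rm -rf "$demo_dir"
```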
Displaying Only Totals
If you want du to report only the total for the directory, use the -s (summarize) option. You can also combine this with other options, such as the -h (human-readable) option:
du -h -s
Here, we’ll use it with the --apparent-size option:
du --apparent-size -s
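As a quick check, the summary line should match the final line of the full report. The scratch tree below is hypothetical and exists only for this sketch.

```shell
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a/b"
echo "data" > "$demo_dir/a/b/file.txt"

# Full report: one line per directory, with the grand total last.
full_report=$(du "$demo_dir")
# Summary: only the grand total.
summary=$(du -s "$demo_dir")

echo "$full_report"
echo "---"
echo "$summary"

rm -rf "$demo_dir"
```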
Displaying Modification Times
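To see the last modification time and date, use the --time option. For a directory, this is the most recent modification time of any file in that directory or its subdirectories: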
du --time -d 2
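With --time, GNU du adds a timestamp column between the size and the path, and the columns are separated by tabs. The sketch below assumes GNU du and uses a hypothetical scratch tree to show the layout.

```shell
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
echo "data" > "$demo_dir/sub/file.txt"

# Each line holds three tab-separated fields: size, time, path.
time_report=$(du --time "$demo_dir")
echo "$time_report"

rm -rf "$demo_dir"
```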
If you see strange results from du, especially when you cross-reference sizes to the output from other commands, it’s usually due to the different block sizes to which different commands can be set or those to which they default. It could also be due to the differences between real file sizes and the disk space required to store them.
If you need to match the output of other commands, experiment with the --block-size option in du.