Data Storage and File Compression | O Level Computer Science 2210 & IGCSE Computer Science 0478 | Detailed Free Notes To Score An A Star (A*)
Measurement of Data Storage
- Bit
- Basic unit of computer memory storage
- It either has a value of 0 or 1.
- The word "bit" comes from Binary digIT.
- Byte
- Smallest addressable unit of memory in a computer
- 1 byte has 8 bits.
- Nibble
- Half a byte
- Contains 4 bits
- A single byte cannot store much information, so memory size is given in the following multiples
- 1 KB or 1 kilobyte is 1000 bytes
- 1 MB or 1 megabyte is 1000000 bytes
- 1 GB or 1 gigabyte is 1 000 000 000 bytes
- 1 TB or 1 terabyte is 1 000 000 000 000 bytes
- 1 PB or 1 petabyte is 1 000 000 000 000 000 bytes
- 1 EB or 1 exabyte is 1 000 000 000 000 000 000 bytes
- This system is used for only some storage devices
- It is technically inaccurate for memory sizes
- It uses the SI base-10 system, where 1 kilo is 1000.
- Memory size is actually measured in terms of powers of 2.
- Therefore, the system adopted by the International Electrotechnical Commission (IEC) is based on the binary system
- 1 KiB or 1 kibibyte is 2^10 or 1024 bytes
- 1 MiB or 1 mebibyte is 2^20 or 1048576 bytes
- 1 GiB or 1 gibibyte is 2^30 or 1073741824 bytes
- 1 TiB or 1 tebibyte is 2^40 or 1099511627776 bytes
- 1 PiB or 1 pebibyte is 2^50 or 1125899906842624 bytes
- 1 EiB or 1 exbibyte is 2^60 or 1152921504606846976 bytes
- This system is more accurate
- Internal memories such as RAM and ROM are measured using this system.
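The gap between the two systems can be checked with a short Python sketch. The names used here (SI_UNITS, IEC_UNITS, to_bytes) are illustrative assumptions and not part of the syllabus; the values are the ones listed above.

```python
# Bytes per unit in the decimal (SI) system and the binary (IEC) system.
SI_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9,
            "TB": 10**12, "PB": 10**15, "EB": 10**18}
IEC_UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30,
             "TiB": 2**40, "PiB": 2**50, "EiB": 2**60}


def to_bytes(amount, unit):
    """Convert an amount in a named unit (for example 4, "GiB") to bytes."""
    table = {**SI_UNITS, **IEC_UNITS}
    return amount * table[unit]


# The difference between the two systems grows with each step up in unit.
gap_kilo = to_bytes(1, "KiB") - to_bytes(1, "KB")   # 24 bytes
gap_giga = to_bytes(1, "GiB") - to_bytes(1, "GB")   # 73741824 bytes
print(gap_kilo, gap_giga)
```

For instance, 8 GiB of RAM is 8 x 2^30 = 8589934592 bytes, which is roughly 7% more than 8 GB.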
Calculation of File Size
- File size of an image (in bits) is calculated as follows
- Image resolution (width x height, in pixels) x Color depth (in bits per pixel)
- Mono Sound File
- Sample Rate (in Hz) x Sample Resolution (in bits) x Length of Sample (in seconds)
- Stereo Sound File
- The result of the mono sound file calculation is multiplied by 2
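The formulas above can be sketched in Python as follows. The function names and the sample values (a 1024 x 768 image at 24-bit color, and a 60-second sound clip sampled at 44100 Hz with 16-bit resolution) are assumptions chosen only for illustration. Each formula gives a size in bits, so the result is divided by 8 to get bytes.

```python
def image_size_bits(width_px, height_px, color_depth_bits):
    """Image resolution (width x height) multiplied by color depth."""
    return width_px * height_px * color_depth_bits


def mono_sound_size_bits(sample_rate_hz, sample_resolution_bits, length_s):
    """Sample rate x sample resolution x length of the sample."""
    return sample_rate_hz * sample_resolution_bits * length_s


def stereo_sound_size_bits(sample_rate_hz, sample_resolution_bits, length_s):
    """Stereo has two channels, so the mono result is doubled."""
    return 2 * mono_sound_size_bits(sample_rate_hz, sample_resolution_bits, length_s)


def bits_to_bytes(bits):
    """Convert bits to whole bytes, rounding up any partial byte."""
    return (bits + 7) // 8


image_bytes = bits_to_bytes(image_size_bits(1024, 768, 24))          # 2359296
mono_bytes = bits_to_bytes(mono_sound_size_bits(44100, 16, 60))      # 5292000
stereo_bytes = bits_to_bytes(stereo_sound_size_bits(44100, 16, 60))  # 10584000
print(image_bytes, mono_bytes, stereo_bytes)
```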
Data Compression
- It is often necessary to reduce (compress) the size of a file
- Saves storage space
- Reduces the time taken for streaming
- Reduces the time taken for download, transfer or upload
- Bandwidth
- The maximum rate of transfer of data across a network, measured in bits per second.
- It is used up when we upload or download something
- Fewer bits in compressed files will use less bandwidth, ensuring faster transfer
- Reduces costs
- Cloud storage costs are based on the size of data stored
- An ISP will charge less when less data is transferred
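The link between file size, bandwidth and transfer time can be shown with a small calculation. This is a sketch under stated assumptions: the function name, the 40 MB file, the 90% size reduction and the 8 megabits per second bandwidth are all made-up example values. Since bandwidth is a maximum rate, the result is a best-case time.

```python
def transfer_time_seconds(file_size_bytes, bandwidth_bits_per_second):
    """Best-case transfer time: total bits divided by the maximum bit rate."""
    return (file_size_bytes * 8) / bandwidth_bits_per_second


BANDWIDTH = 8_000_000  # assumed bandwidth: 8 megabits per second

original_time = transfer_time_seconds(40_000_000, BANDWIDTH)    # 40.0 seconds
compressed_time = transfer_time_seconds(4_000_000, BANDWIDTH)   # 4.0 seconds
print(original_time, compressed_time)
```

With these values, compressing the file to a tenth of its size also cuts the best-case transfer time to a tenth.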
Lossy and Lossless File Compression
- Lossy File Compression
- Algorithm eliminates unnecessary data from the file.
- The original file cannot be reconstructed once compressed
- Some loss of detail occurs
- The algorithm decides which parts of the file can be discarded.
- Lossy compression
- May reduce the resolution or the bit (color) depth of images
- May reduce the sampling rate or sample resolution of sound files
- The final file is smaller than one produced by lossless compression
- This reduces storage requirements
- It also reduces data transfer rate requirements
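Why lossy compression cannot be undone can be seen in a toy example that reduces bit depth. This is only an illustration of discarding data, not how JPEG or MP3 actually work; the function names and pixel values are assumptions.

```python
def reduce_depth(value_8bit):
    """Keep only the top 4 bits of an 8-bit value (0-255 becomes 0-15)."""
    return value_8bit >> 4


def restore_depth(value_4bit):
    """Scale a 4-bit value back to the 8-bit range; the low bits are gone."""
    return value_4bit << 4


pixels = [200, 201, 202, 203]                       # four slightly different shades
compressed = [reduce_depth(p) for p in pixels]      # [12, 12, 12, 12]
restored = [restore_depth(c) for c in compressed]   # [192, 192, 192, 192]
print(compressed, restored)
```

Each value now needs 4 bits instead of 8, but the four original shades have collapsed into one and cannot be recovered.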
- Common lossy file compression formats
- MP3 (MPEG Audio Layer 3, often written as MPEG-3)
- MP3 files are used for playing music
- It is a compression technology that reduces the size of normal music files by about 90%.
- MP3 files never match the quality of the audio on a CD or DVD
- It is still satisfactory in most cases
- The algorithm removes sounds that the human ear cannot hear
- Sounds outside the range of human hearing
- Perceptual Music Shaping
- If two sounds are playing at the same time, only the louder sound can be heard by our ears
- Therefore, the softer sound is eliminated
- MPEG-4 (MP4)
- Allows storage of multimedia files instead of just sound.
- It retains acceptable quality of sounds and videos
- No real loss in discernible quality
- Videos and music are often streamed online in this format
- JPEG
- A raw bitmap file is very large
- Such files are temporary
- JPEG is a lossy compression format used for bitmap images
- A new file is formed and the original file can no longer be reconstructed
- JPEG compression relies on two key concepts
- The human eye cannot detect differences in color shades as well as it detects differences in image brightness.
- Our eyes are more sensitive to brightness variations than to color variations
- By separating pixel color from brightness, images can be split into 8 x 8 pixel blocks.
- Certain information can then be discarded
- This will not cause any real or noticeable deterioration in quality.
- Lossless File Compression
- All data from the original uncompressed file can be reconstructed
- Important where any data loss would be disastrous
- For example, very large and complex spreadsheets being transferred
- Or a very large computer application being transferred
- Lossless file compression does not lose any data from the original file
- One method is run-length encoding (RLE)
- Reversible file compression
- Reduces the size of a string of adjacent, identical data
- For example, repeated letters or repeated colors
- A run of repeated items is encoded as two values
- The first value gives the number of identical items in the run
- The second value represents the code of the data item, for example the ASCII code of the keyboard character that was repeated.
- Only effective where there are long runs of repeated units or bits.
- How is RLE used on text-based data?
- Suppose the string is bbbbdddddaaaaacc.
- If each character requires one byte, the string requires 16 bytes.
- We can encode it using ASCII codes as follows.
- bbbb becomes 04 (showing that the character is repeated 4 times) and 98 (the ASCII code of b). Similarly, ddddd becomes 05 100, aaaaa becomes 05 97 and cc becomes 02 99.
- If each value needs 1 byte, the encoded string 04 98 05 100 05 97 02 99 requires 8 bytes. This reduces the original size by half.
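The text example above can be reproduced with a minimal Python sketch. The function names are assumptions made for illustration, and each run is capped at 255 so that the count would fit in one byte, which is also an assumption rather than something the notes specify.

```python
def rle_encode(text):
    """Return a list of (count, ASCII code) pairs, one pair per run."""
    pairs = []
    for ch in text:
        code = ord(ch)
        # Extend the current run if the character repeats and the count
        # still fits in one byte; otherwise start a new run.
        if pairs and pairs[-1][1] == code and pairs[-1][0] < 255:
            pairs[-1] = (pairs[-1][0] + 1, code)
        else:
            pairs.append((1, code))
    return pairs


def rle_decode(pairs):
    """Rebuild the original text exactly from the (count, code) pairs."""
    return "".join(chr(code) * count for count, code in pairs)


original = "bbbbdddddaaaaacc"
encoded = rle_encode(original)   # [(4, 98), (5, 100), (5, 97), (2, 99)]
print(encoded)
print(len(original), "bytes before,", 2 * len(encoded), "bytes after")
print(rle_decode(encoded) == original)
```

Decoding returns exactly the original string, which is what makes RLE a lossless (reversible) method.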
- If the data is something like cdcdcdcdcd, where no character repeats consecutively, plain RLE would make the data larger, so a flag is used
- A flag preceding the data indicates that what follows is a repeat count and the repeated item.
- If no flag is used, the following data is taken at face value, as a run of 1.
- For example
- If the string is aaaaaaaa bbbbbbbbbb c d c d c d eeeeeeee (spaces are shown only for readability)
- Then, without a flag, it is coded as 08 97 10 98 01 99 01 100 01 99 01 100 01 99 01 100 08 101
- Here, the original data uses 32 bytes and the compressed version uses 18 bytes
- On the other hand, we can use a flag, here 255, which reduces the overall number of bytes used.
- 255 08 97 255 10 98 99 100 99 100 99 100 255 08 101
- 15 values are used, so only 15 bytes are required.
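The flagged scheme can be sketched in Python as follows, using a string of eight a's, ten b's, cdcdcd and eight e's (32 characters). The flag value 255 comes from the notes; the function names and the rule of flagging only runs of 4 or more (MIN_RUN) are assumptions, chosen because a flagged run always costs 3 values. The sketch also assumes the data never contains the value 255 itself and that no run is longer than 255.

```python
from itertools import groupby

FLAG = 255     # marks "a count and a code follow"
MIN_RUN = 4    # assumed threshold: shorter runs are stored at face value


def rle_encode_flagged(text):
    """Encode text as a flat list of values, flagging only long runs."""
    values = []
    for ch, group in groupby(text):
        count = len(list(group))
        code = ord(ch)
        if count >= MIN_RUN:
            values += [FLAG, count, code]   # flag, run length, ASCII code
        else:
            values += [code] * count        # unflagged: each value is a run of 1
    return values


def rle_decode_flagged(values):
    """Reverse rle_encode_flagged and return the original text."""
    out = []
    i = 0
    while i < len(values):
        if values[i] == FLAG:
            count, code = values[i + 1], values[i + 2]
            out.append(chr(code) * count)
            i += 3
        else:
            out.append(chr(values[i]))
            i += 1
    return "".join(out)


text = "a" * 8 + "b" * 10 + "cdcdcd" + "e" * 8
coded = rle_encode_flagged(text)
print(coded)
print(len(text), "bytes before,", len(coded), "bytes after")
```

With these assumptions the output matches the 15 values in the worked example, and decoding the list gives back the original 32-character string.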
- RLE can be used with images
- It can be used for color and black-and-white images
- Real-life reductions in file size are not very large when using lossless compression
- Other data, such as the file header, has to be stored as well.
