Data Storage And Compression
1. Data Storage Measurement Units
- Bit (b)
- Smallest unit of data in computing.
- Represents a single binary digit: 0 or 1.
- All larger units are multiples of bits.
- Nibble
- Group of 4 bits.
- Can represent 2⁴ = 16 different values (0–15 in denary, 0–F in hex).
- Byte (B)
- Group of 8 bits.
- Commonly used to store a single character in ASCII encoding.
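The value counts above follow from the rule that n bits can represent 2ⁿ different values. A minimal sketch to check this; the function name `distinct_values` is only an illustrative choice:

```python
def distinct_values(bits):
    """Number of different values that `bits` binary digits can represent."""
    return 2 ** bits

for name, bits in [("bit", 1), ("nibble", 4), ("byte", 8)]:
    count = distinct_values(bits)
    print(f"{name}: {bits} bit(s) -> {count} values (0 to {count - 1})")

# The largest nibble value, 1111 in binary, is a single hex digit.
largest_nibble_hex = format(0b1111, "X")
print(largest_nibble_hex)  # F
```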
Binary prefixes (IEC standard)
- Each prefix is a power of 2 (1 KiB = 2¹⁰ = 1024 bytes), not a power of 10 (1 kB = 1000 bytes).
| Unit | Symbol | Value | Typical example |
|---|---|---|---|
| Kibibyte | KiB | 1024 bytes | Small text file |
| Mebibyte | MiB | 1024 KiB | Image or short audio |
| Gibibyte | GiB | 1024 MiB | Video file |
| Tebibyte | TiB | 1024 GiB | Hard drive |
| Pebibyte | PiB | 1024 TiB | Data center storage |
| Exbibyte | EiB | 1024 PiB | Large-scale backups |
Relationship between units
- 8 bits = 1 byte
- 1024 bytes = 1 KiB
- 1024 KiB = 1 MiB
- 1024 MiB = 1 GiB
- 1024 GiB = 1 TiB
- 1024 TiB = 1 PiB
- 1024 PiB = 1 EiB
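The chain of ×1024 steps above can be sketched as a small converter. This is one possible way to write it; the names `UNITS`, `to_bytes` and `humanise` are assumptions made for the example:

```python
# Units in ascending order; each one is 1024 times the previous one.
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * 1024 ** UNITS.index(unit)

def humanise(num_bytes):
    """Express a byte count in the largest unit that keeps the number below 1024."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024

print(to_bytes(1, "MiB"))    # 1048576
print(humanise(6_220_800))   # 5.93 MiB
```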
2. Calculating File Size – Images
- Formula for uncompressed image file size (in bits):
File size = Resolution (width × height) × Colour depth
- Resolution = total number of pixels.
- Colour depth = bits per pixel.
- To convert bits → bytes: divide by 8.
- To convert bytes → KiB/MiB: divide by 1024 accordingly.
Example:
Image size: 1920×1080 pixels, colour depth 24-bit.
- Pixels = 1920 × 1080 = 2,073,600 pixels.
- Bits = 2,073,600 × 24 = 49,766,400 bits.
- Bytes = 49,766,400 ÷ 8 = 6,220,800 bytes.
- In MiB = 6,220,800 ÷ 1024² ≈ 5.93 MiB.
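The same calculation can be checked in code. A short sketch, assuming an uncompressed image with no header or metadata; the function name `image_size_bits` is illustrative:

```python
def image_size_bits(width, height, colour_depth):
    """Uncompressed image size in bits: pixels multiplied by bits per pixel."""
    return width * height * colour_depth

bits = image_size_bits(1920, 1080, 24)
size_bytes = bits / 8               # bits -> bytes
size_mib = size_bytes / 1024 ** 2   # bytes -> MiB

print(bits)                 # 49766400
print(size_bytes)           # 6220800.0
print(round(size_mib, 2))   # 5.93
```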
3. Calculating File Size – Sound
- Formula for uncompressed audio file size (in bits):
File size = Sample rate × Sample resolution × Duration (seconds) × Number of channels
- Sample rate = number of samples per second (Hz).
- Sample resolution = bits per sample.
- Channels = 1 (mono) or 2 (stereo).
Example:
Stereo audio, 44,100 Hz, 16-bit, 5 seconds:
- Bits = 44,100 × 16 × 5 × 2 = 7,056,000 bits.
- Bytes = 7,056,000 ÷ 8 = 882,000 bytes.
- In MiB = 882,000 ÷ 1024² ≈ 0.84 MiB.
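The audio example works the same way in code. A short sketch, assuming uncompressed audio with no file header; the function name `audio_size_bits` is illustrative:

```python
def audio_size_bits(sample_rate, sample_resolution, duration_s, channels):
    """Uncompressed audio size in bits."""
    return sample_rate * sample_resolution * duration_s * channels

bits = audio_size_bits(44_100, 16, 5, 2)
size_bytes = bits / 8               # bits -> bytes
size_mib = size_bytes / 1024 ** 2   # bytes -> MiB

print(bits)                 # 7056000
print(size_bytes)           # 882000.0
print(round(size_mib, 2))   # 0.84
```

Halving the number of channels (stereo to mono) halves the result, since every term in the formula is a simple multiplier.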
4. Purpose and Need for Data Compression
- Purpose: Reduce file size to:
- Save storage space.
- Reduce transmission time over networks.
- Reduce bandwidth usage.
- Make files easier to send via email or upload/download.
- Impact:
- Smaller files → faster downloads/uploads.
- Less data storage required on devices and servers.
- May affect quality (depending on method).
5. Compression Methods
Lossless Compression
- Definition: Reduces file size without losing any original data.
- When decompressed, file is identical to original.
- Example techniques:
- Run-Length Encoding (RLE): Stores sequences of repeated values as a single value and count.
- Example: AAAAABBBCC → 5A3B2C.
- Huffman Coding: Assigns shorter binary codes to frequently used symbols and longer codes to less frequent ones.
- Uses:
- Text documents (where accuracy is critical).
- Program files.
- Some image formats (PNG, GIF).
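Both lossless techniques named above can be sketched in a few lines. This is a simplified illustration, not how any particular file format implements them: the RLE functions assume the data contains no digit characters (so counts and symbols can be told apart), and the Huffman function only builds the code table. All function names are hypothetical.

```python
import heapq
import re
from collections import Counter
from itertools import groupby

def rle_encode(text):
    """Replace each run of a repeated symbol with <count><symbol>."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))

def rle_decode(encoded):
    """Rebuild the original text from <count><symbol> pairs."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(symbol * int(count) for count, symbol in pairs)

def huffman_codes(text):
    """Build a code table in which frequent symbols get shorter bit strings."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "AAAAABBBCC"
encoded = rle_encode(text)
print(encoded)                        # 5A3B2C
print(rle_decode(encoded) == text)    # True

codes = huffman_codes(text)
huffman_bits = "".join(codes[ch] for ch in text)
print(len(huffman_bits), "bits, versus", len(text) * 8, "bits at 8 bits per character")
```

For AAAAABBBCC, this sketch gives A a 1-bit code and B and C 2-bit codes, so the text needs 15 bits in total (before storing the code table). Note that RLE only helps when the data contains runs: encoding ABC this way gives 1A1B1C, which is longer than the input.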
Lossy Compression
- Definition: Reduces file size by permanently removing some data, ideally data whose loss people are unlikely to notice.
- Common methods:
- Lower image resolution or colour depth.
- Lower audio sample rate or bit depth.
- Remove detail that is hard to perceive, such as very high-frequency sounds or fine visual detail.
- Uses:
- JPEG images.
- MP3 audio.
- MPEG video.
- Advantages:
- Much smaller file sizes than lossless.
- Disadvantages:
- Some quality is permanently lost.
- Not suitable where exact reproduction is required.
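Why the loss is permanent can be seen by lowering the bit depth of a few sample values. The sketch below is a deliberately simple illustration (real formats such as JPEG and MP3 use far more sophisticated methods); the function names and sample values are made up for the example:

```python
def reduce_bit_depth(samples, from_bits=8, to_bits=4):
    """Keep only the most significant bits of each sample (lossy)."""
    shift = from_bits - to_bits
    return [s >> shift for s in samples]

def restore_bit_depth(quantised, from_bits=4, to_bits=8):
    """Scale samples back up; the discarded low bits cannot be recovered."""
    shift = to_bits - from_bits
    return [q << shift for q in quantised]

original = [0, 17, 100, 101, 200, 255]   # 8-bit samples
stored = reduce_bit_depth(original)      # 4 bits each: half the storage
restored = restore_bit_depth(stored)

print(stored)     # [0, 1, 6, 6, 12, 15]
print(restored)   # [0, 16, 96, 96, 192, 240]
```

The values 100 and 101 both become 6 when stored, so after restoring there is no way to tell which one the original was. This is what makes the method lossy.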
6. Summary Table – Lossless vs Lossy
| Feature | Lossless | Lossy |
|---|---|---|
| Data lost? | No | Yes |
| File size | Larger than lossy | Smaller than lossless |
| Quality | Identical to original | Some loss, depending on how much data is removed |
| Uses | Text, program files, PNG, GIF | JPEG, MP3, MPEG video |
| Example method | RLE, Huffman coding | Downsampling, quantization |
