File Organization And Access (Copy)
1. Introduction to File Organization
- Computers store vast amounts of data, making efficient file organization essential.
- File organization determines how records are arranged in a file.
- Different methods of organizing data influence file access speed and efficiency.
- File organization methods:
- Serial file organization
- Sequential file organization
- Random file organization
2. Serial File Organization
- Definition: Records are stored in the order they are added, without any predefined order.
- Characteristics:
- New records are appended at the end.
- Searching for a record requires scanning the file from start to finish.
- Usage:
- Commonly used for temporary transaction files, such as:
- Meter readings in gas/electricity billing.
- Logs of financial transactions before permanent storage.
- Commonly used for temporary transaction files, such as:
- Example:
- A meter reading system records customer readings in the order they arrive.
- If five records exist, a new record becomes the sixth entry in the file.
3. Sequential File Organization
- Definition: Similar to serial files but with records stored in a specific order based on a key field (e.g., customer ID).
- Characteristics:
- Records are arranged in ascending or descending order.
- Searching for a record is faster than in serial files because the search can stop when the next record surpasses the target.
- Usage:
- Used when data needs to be processed in order, such as:
- Payroll processing.
- Monthly billing systems.
- Used when data needs to be processed in order, such as:
- Example:
- Customer records sorted by ID:
- If searching for Customer 6, the search stops at Customer 7, concluding that Customer 6 does not exist.
4. Random File Organization
- Definition: Records are placed at arbitrary locations using a hashing function.
- Characteristics:
- Each record has a unique key, used with a hashing algorithm to determine storage location.
- Records do not follow any fixed order.
- Direct access to records is much faster than sequential access.
- Usage:
- Ideal for large databases where quick lookup is needed, such as:
- Airline reservation systems.
- Banking databases.
- Ideal for large databases where quick lookup is needed, such as:
- Example:
- A hashing function calculates storage location:
- The key determines the storage location rather than the order of entry.
5. File Access Methods
- Definition: Determines how records are searched and retrieved from a file.
- Two main access methods:
- Sequential Access
- Direct Access
5.1 Sequential Access
- Definition: Records are accessed in a linear order, starting from the beginning.
- Characteristics:
- Used for serial and sequential files.
- Searching requires reading every record until the target is found.
- Efficient when processing all records at once.
- Usage:
- Payroll systems.
- Reading all customer records for an electricity bill.
- Example:
- Searching for Customer 6:
- Since Customer 7 appears, it confirms Customer 6 is missing.
5.2 Direct Access
- Definition: A record can be located instantly using a key field without scanning other records.
- Characteristics:
- Suitable for both sequential and random files.
- Uses either indexing or hashing algorithms.
- Usage:
- Bank account lookups.
- Retrieving specific student records.
- Example:
- A hashing function computes a record’s location:
- The system jumps directly to Address 3, skipping irrelevant records.
6. Indexing in Sequential Files
- Definition: An index table stores key fields and their corresponding file locations.
- Characteristics:
- Instead of searching the whole file, the system looks up the index table first.
- Reduces search time significantly.
- Example:
- An index table for customer IDs:
- If searching for Customer 002, the system retrieves it directly from address 20.
7. Hashing Algorithms in Random Access Files
- Definition: A mathematical formula that determines a record’s storage location.
- Characteristics:
- Converts a key field into an address within a file.
- Eliminates the need for sequential searching.
- Hash Collisions occur when two keys map to the same address.
- Example:
- Given 2000 storage locations and customer IDs from 1-9999, a simple hashing formula:
- For Customer ID 3024:
- The record is stored at location 1024.
8. Handling Hash Collisions
- Definition: When two keys generate the same storage location.
- Two solutions:
- Open Hashing:
- The conflicting record is stored in the next available location.
- Closed Hashing (Overflow Area):
- A dedicated overflow area is used to store conflicts.
- Open Hashing:
- Example:
- If Customer 5024 hashes to address 1024, which is occupied, the system:
- Open hashing: Places the record in the next free address.
- Closed hashing: Moves it to a predefined overflow area.
- If Customer 5024 hashes to address 1024, which is occupied, the system:
9. Comparing File Organization and Access Methods
| Feature | Serial | Sequential | Random |
|---|---|---|---|
| Order of Records | Order of entry | Sorted order | Any order |
| Search Time | Slow (full scan) | Faster than serial | Fastest (direct access) |
| Data Modification | New records added at the end | Inserted in order | Any position |
| Best for | Temporary logs | Payroll, billing | Large databases |
10. Choosing the Right File Organization
- Serial Files: Suitable for log files, transaction histories.
- Sequential Files: Best for processing all records, like payroll.
- Random Files: Ideal for fast lookups, like bank databases.
