Translation Software (Copy)
16.3 Translation Software
Key Terminologies and Concepts
- Lexical Analysis: First stage of compilation that removes unnecessary characters and tokenizes the program.
- Syntax Analysis: Second stage where the tokenized program is checked for syntax errors.
- Code Generation: Third stage that produces the object code (machine code) for execution.
- Optimisation: Fourth stage aimed at improving efficiency by reducing memory usage, execution time, and redundant computations.
- Backus-Naur Form (BNF) Notation: A formal representation of the grammar rules of a programming language.
- Syntax Diagrams: Graphical representations of grammatical rules in programming languages.
- Reverse Polish Notation (RPN): A way to express arithmetic expressions without brackets, useful for evaluation using a stack.
16.3.1 How an Interpreter Differs from a Compiler
- Interpreter
- Executes a program line-by-line without generating an object code.
- Useful for debugging since it stops at errors and allows correction before proceeding.
- Requires re-execution every time the program is run.
- Compiler
- Translates the entire source code into machine code before execution.
- Produces an object file that can be executed multiple times without recompilation.
- More efficient for execution but slower during development due to the need for full recompilation.
- Comparison Summary:
Feature Compiler Interpreter Execution Speed Faster (After Compilation) Slower (Executes Line-by-Line) Error Detection After Full Compilation Line-by-Line Output Object Code No Object Code Memory Usage Requires Storage for Object Code No Additional Storage Required
16.3.2 Stages in Compilation of a Program
1. Lexical Analysis
- Removes unnecessary characters such as white spaces and comments.
- Converts source code into tokens (symbolic representations of reserved words and identifiers).
- Uses keyword tables to store reserved words and their tokenized representations.
- Constructs a symbol table for variables, constants, and identifiers.
Example Keyword Table:
Keyword/Symbol Token = 01 + 02 DECLARE 31 INTEGER 32 INPUT 33 OUTPUT 34 Example Symbol Table:
Token Symbol Type Data Type 81 X Variable Integer 82 Y Variable Integer 84 “Please enter two numbers” Constant String
2. Syntax Analysis
- Checks if the tokenized program follows the grammatical rules of the programming language.
- Uses parsing techniques such as BNF notation and syntax diagrams.
- Detects errors such as missing operators, misplaced brackets, or incorrect expressions.
- If errors exist, they are reported, and compilation stops without proceeding to code generation.
Example Error Detection:
- Given source code:
z + x + y - Expected tokenized list:
83 02 81 02 82 - Error detected:
(01) expected(missing assignment operator=).
- Given source code:
3. Code Generation
- Converts the syntax-checked code into object code (machine-readable form).
- The object program can be:
- Directly executable machine code.
- Intermediate code, which needs further translation during execution.
- Allows relocatable code, linking with library routines, and merging multiple programs.
Example:
- High-level Code:
x = a + b y = a + b + c - Equivalent Assembly Code:
LDD 436 // Load value of A ADD 437 // Add value of B STO 612 // Store result in X LDD 436 // Load value of A ADD 437 // Add value of B ADD 438 // Add value of C STO 613 // Store result in Y
- High-level Code:
4. Code Optimisation
- Improves efficiency by:
- Reducing redundant calculations.
- Minimising memory usage.
- Speeding up execution.
- Example:
- Unoptimised Code:
LDD x ADD y STO w LDD x ADD y ADD z STO v - Optimised Code:
LDD x ADD y STO w ADD z STO v - This avoids redundant loading and re-computation.
- Unoptimised Code:
16.3.3 Syntax Diagrams and Backus-Naur Form (BNF)
- Syntax Diagrams:
- Graphical representations of programming grammar.
- Example: Variable Syntax Diagram
Variable → Letter → Digit
- Backus-Naur Form (BNF):
- A formal notation for defining syntax rules.
- Example:
<assignment_statement> ::= <variable> "=" <expression> <variable> ::= <letter> | <letter> <digit> <expression> ::= <term> | <expression> "+" <term> - Ensures the compiler enforces grammatical correctness.
16.3.4 Reverse Polish Notation (RPN) and Its Advantages
- RPN removes the need for brackets by writing expressions in postfix order.
- Example:
- Infix Notation:
(A + B) * C - RPN Equivalent:
A B + C *
- Infix Notation:
- Stack-based evaluation:
- Example Expression:
b a * c d a + + - - Using values:
a = 2, b = 2, c = 1, d = 3 - Stack Processing:
Push 2 Push 2 Multiply → 4 Push 1 Push 3 Push 2 Add → 5 Add → 6 Subtract → -2
- Example Expression:
- Advantages of RPN:
- No ambiguity or precedence rules.
- Efficient for execution in low-memory devices like calculators.
- Eliminates the need for parentheses.
Summary of Key Points
- Translators convert high-level programs into machine-executable formats.
- Compilers produce object code, while interpreters execute line-by-line.
- Compilation involves four stages: Lexical Analysis, Syntax Analysis, Code Generation, and Optimisation.
- Syntax rules are represented using BNF and syntax diagrams.
- Reverse Polish Notation (RPN) is a postfix notation useful for stack-based computation.
