CS3 — Floating-Point Representation
- I can explain why integers are insufficient to represent very large numbers and fractions
- I can describe the structure of a floating-point number as mantissa × 2^exponent
- I can explain what normalised form means and why it is used
- I can convert a denary number to normalised floating-point binary
- I can convert a normalised floating-point binary number back to denary
- I can state that a normalised positive mantissa starts with 0.1 and a negative with 1.0
- I can convert a positive denary number to normalised floating-point using a three-step process
- I can convert a negative denary number to floating-point by finding the two's complement of the positive mantissa
- I can extract a mantissa and exponent from a bit pattern and compute the denary value
Key vocabulary
Why integers are not enough
Everything we have stored so far — positive integers in CS1, negative integers in CS2 — has been a whole number. But real-world data is full of values that cannot be expressed as integers:
- Temperatures: 36.8°C, −3.5°C, absolute zero at −273.15°C
- Measurements: 1.602 × 10⁻¹⁹ coulombs (charge of an electron)
- Coordinates: GPS latitude 55.9533° N
- Financial data: £14.99, exchange rate 1.2847
None of these fit into any integer type — they require a fractional component. We could scale by 100 and pretend everything is in pence, but this breaks for very large or very small scientific values. We need a fundamentally different representation.
Floating-point solves this using the same idea as scientific notation: separate the significant digits from the scale. Just as 6.5 × 10³ = 6500 in denary, floating-point stores a mantissa and an exponent independently in binary.
The floating-point structure
Every floating-point number is stored as:
value = mantissa × 2^exponent
- Mantissa — a binary fraction representing the significant digits, stored in two's complement with the binary point after the sign bit
- Exponent — a two's complement integer telling you how far to shift the binary point
In our lessons we use a simplified 8-bit format: 5 bits for the mantissa + 3 bits for the exponent. Real systems use far more (IEEE 754 double precision uses 64 bits total), but the method is identical.
| Mantissa (5 bits) | | | | | Exponent (3 bits) | | |
|---|---|---|---|---|---|---|---|
| M₄ (sign) | M₃ | M₂ | M₁ | M₀ | E₂ (sign) | E₁ | E₀ |
| sign bit | 2⁻¹ | 2⁻² | 2⁻³ | 2⁻⁴ | sign bit | 2¹ | 2⁰ |
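The field layout above can be sketched in Python. This is only an illustration of the simplified 8-bit teaching format; the helper names `twos_complement` and `split_fields` are made up for this example and do not come from any library.

```python
def twos_complement(bits: str) -> int:
    """Interpret a string of '0'/'1' characters as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # sign bit set: subtract 2^width
        value -= 1 << len(bits)
    return value


def split_fields(pattern: str) -> tuple[str, str]:
    """Split an 8-bit pattern into (5-bit mantissa, 3-bit exponent)."""
    bits = pattern.replace(" ", "")
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    return bits[:5], bits[5:]


mantissa_bits, exponent_bits = split_fields("01101 010")
print(mantissa_bits, exponent_bits, twos_complement(exponent_bits))

# Every exponent the 3-bit field can hold, from most negative to most positive.
print(sorted(twos_complement(format(n, "03b")) for n in range(8)))
```

The second line of output lists the integers from −4 to 3, which is the full range of shifts this format can express.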
Binary fractions — place values
Just as denary fractions use tenths, hundredths, thousandths…, binary fractions use halves, quarters, eighths…. The binary point divides whole-number place values (left) from fractional place values (right):
| Sign | . | Bit 1 | Bit 2 | Bit 3 | Bit 4 |
|---|---|---|---|---|---|
| sign | . | 2⁻¹ | 2⁻² | 2⁻³ | 2⁻⁴ |
| ± | . | 0.5 | 0.25 | 0.125 | 0.0625 |
To read a mantissa: sign bit 0 = positive, sign bit 1 = negative (two's complement). Then add up all fractional columns with a 1, treating the sign bit as −1 if set.
Example: mantissa 01101 → sign=0, then 0.5 + 0.25 + 0 + 0.0625 = 0.8125
Example: mantissa 10011 → sign=1, then −1 + 0 + 0 + 0.125 + 0.0625 = −0.8125
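The column-adding rule can be checked with a few lines of Python. This is a minimal sketch, assuming the mantissa arrives as a string of bits; `mantissa_value` is a hypothetical helper name, and `Fraction` is used so the halves, quarters and eighths stay exact.

```python
from fractions import Fraction


def mantissa_value(bits: str) -> Fraction:
    """Read a two's complement mantissa with the binary point after the sign bit."""
    total = Fraction(0)
    if bits[0] == "1":
        total -= 1                                  # the sign bit is worth -1
    for position, bit in enumerate(bits[1:], start=1):
        if bit == "1":
            total += Fraction(1, 2 ** position)     # 1/2, 1/4, 1/8, 1/16, ...
    return total


print(float(mantissa_value("01101")))   # 0.8125
print(float(mantissa_value("10011")))   # -0.8125
```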
Normalised form
Many values can be expressed in floating-point in more than one way. For example, 0.375 could be represented as 0.1100 × 2⁻¹, or as 0.0110 × 2⁰, or 0.0011 × 2¹. This ambiguity causes problems: comparisons fail, precision is wasted, and hardware gets complicated.
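A quick check, sketched in Python, that the three bit patterns named above really do stand for the same number. The helper `value_of` is hypothetical and only handles positive 5-bit mantissas.

```python
def value_of(mantissa_bits: str, exponent: int) -> float:
    """Value of a positive 5-bit mantissa (4 fractional bits) times 2**exponent."""
    return int(mantissa_bits, 2) / 16 * 2.0 ** exponent


# (mantissa bits, exponent) pairs for 0.375
candidates = [("01100", -1), ("00110", 0), ("00011", 1)]
for bits, exponent in candidates:
    print(bits, exponent, value_of(bits, exponent))
```

Each line prints 0.375, yet the stored bits differ, which is exactly the ambiguity that normalisation removes.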
Normalised form defines exactly one valid representation for every value:
- Positive normalised: mantissa starts 0.1…, so the bit immediately after the sign bit is always 1
- Negative normalised: mantissa starts 1.0…, so the bit immediately after the sign bit is always 0
This ensures the mantissa is as large as possible in magnitude (at least 0.5 for positive, below −0.5 for negative), using every available bit for precision. If a mantissa is not in normalised form, shift it left and subtract 1 from the exponent until it is.
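The "shift left and subtract 1" rule can be sketched as a loop. This example assumes the mantissa is held as a signed integer scaled by 16, so 0.0110 is stored as 6. The function name `normalise` is illustrative, and the sketch does not check whether the final exponent still fits in 3 bits.

```python
def normalise(mantissa: int, exponent: int, frac_bits: int = 4) -> tuple[int, int]:
    """Normalise a two's complement mantissa held as a scaled signed integer."""
    if mantissa == 0:
        raise ValueError("zero has no normalised form")
    half = 1 << (frac_bits - 1)           # 0.5 in scaled units
    # Normalised means 0.5 <= m < 1 (starts 0.1) or -1 <= m < -0.5 (starts 1.0).
    while -half <= mantissa < half:
        mantissa *= 2                     # shift the mantissa left one place
        exponent -= 1                     # compensate so the value is unchanged
    return mantissa, exponent


m, e = normalise(0b00110, 0)              # 0.0110 x 2^0, i.e. 0.375
print(format(m & 0b11111, "05b"), e)      # 01100 -1
```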
Converting denary to floating-point (positive numbers)
- Convert the denary number to binary (whole part + fractional part)
- Normalise: shift the binary point until the mantissa is in the form 0.1…, counting how many places you shift. That count becomes the exponent: positive if the point moved left, negative if it moved right
- Store the normalised mantissa in two's complement, padding with zeros on the right to fill all mantissa bits; store the exponent in two's complement
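The three steps can be sketched in Python for the 5-bit mantissa, 3-bit exponent format. Note the assumption: instead of moving the point in a written bit string, this version normalises arithmetically by halving or doubling, which gives the same mantissa and exponent. `encode_positive` is a hypothetical name, and the value is passed as text so that `Fraction` can hold it exactly.

```python
from fractions import Fraction


def encode_positive(value: str, mant_bits: int = 5, exp_bits: int = 3) -> str:
    """Encode a positive denary value, given as text such as "0.375"."""
    x = Fraction(value)
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    exponent = 0
    while x >= 1:                 # point moves left: halve, exponent goes up
        x /= 2
        exponent += 1
    while x < Fraction(1, 2):     # point moves right: double, exponent goes down
        x *= 2
        exponent -= 1
    scaled = x * 2 ** (mant_bits - 1)     # mantissa as a whole number of 1/16ths
    if scaled.denominator != 1:
        raise ValueError("value needs more mantissa bits than the format has")
    if not -(2 ** (exp_bits - 1)) <= exponent < 2 ** (exp_bits - 1):
        raise ValueError("exponent does not fit in the exponent field")
    mantissa_field = format(scaled.numerator, f"0{mant_bits}b")
    exponent_field = format(exponent % 2 ** exp_bits, f"0{exp_bits}b")
    return f"{mantissa_field} {exponent_field}"


print(encode_positive("0.375"))   # 01100 111
print(encode_positive("6.5"))     # 01101 011
```

The sketch raises an error rather than rounding when a value needs more than four fractional bits, so a value such as 0.1 is rejected instead of being stored inexactly.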
Converting denary to floating-point (negative numbers)
- Convert the positive version to normalised floating-point (steps 1–3 above)
- Find the two's complement of the positive mantissa: invert all bits, then add 1
- Keep the exponent the same — the exponent is always stored for the normalised magnitude
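The invert-and-add-1 step, sketched on bit strings. `negate_mantissa` is a hypothetical helper name; the example uses +6.5, whose normalised positive form is mantissa 01101 with exponent 011.

```python
def negate_mantissa(bits: str) -> str:
    """Two's complement of a mantissa bit string: invert every bit, then add 1."""
    width = len(bits)
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    plus_one = (int(inverted, 2) + 1) % (1 << width)   # discard any carry out
    return format(plus_one, f"0{width}b")


positive = "01101"                   # +0.8125, the normalised mantissa of +6.5
negative = negate_mantissa(positive)
print(negative + " 011")             # 10011 011, exponent 3 is unchanged
```

One caveat, offered as an observation rather than as part of the method above: when the positive mantissa is exactly 0.5 (01000), the result 11000 starts 1.1, so it would need one further left shift to satisfy the 1.0… rule.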
Worked examples
Example 1: represent 0.375 in normalised floating-point
Convert to binary by repeated doubling:
0.375 × 2 = 0.75 → bit 0
0.75 × 2 = 1.5 → bit 1
0.5 × 2 = 1.0 → bit 1
So 0.375 = 0.011₂
Normalise to 0.1… form: shift left once (multiply mantissa by 2, subtract 1 from exponent): 0.011 → 0.110, exponent = −1
Pad to 4 fractional bits: 0.1100
Mantissa = 01100 (sign bit 0, fractional bits 1100)
Exponent = −1 → 3-bit two's complement = 111
Verify: mantissa 01100 = 0.75, exponent 111 = −1, 0.75 × 2⁻¹ = 0.375 ✓
Normalised? Sign=0, next bit=1 → ✓
Example 2: represent −6.5 in normalised floating-point
6 = 110₂, 0.5 = .1₂, so 6.5 = 110.1₂
Normalise to 0.1… form: shift right 3 places: 110.1 → 0.1101, exponent = 3
Positive mantissa bits: 01101
Invert 01101 → 10010, add 1 → 10011
Exponent = 3 → 3-bit two's complement = 011
Verify: mantissa 10011 = −1 + 0.125 + 0.0625 = −0.8125, exponent 011 = 3, −0.8125 × 8 = −6.5 ✓
Normalised? Sign=1, next bit=0 → ✓
Example 3: convert the bit pattern 01101 010 to denary
Mantissa bits = 01101 · Exponent bits = 010
Sign bit = 0 → positive
Fractional bits .1101: 0.5 + 0.25 + 0 + 0.0625 = 0.8125
010 in two's complement = 2 (MSB = 0, so positive)
Value = 0.8125 × 2² = 3.25
Normalised? Sign=0, next bit=1 → ✓
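Converting a bit pattern back to denary can be sketched as a short function. It assumes the 5 + 3 bit layout used in these notes and reads each field as a signed integer (the mantissa is then divided by 16 to place the binary point); `decode` is a hypothetical name.

```python
def decode(pattern: str) -> float:
    """Decode 'MMMMM EEE' (5-bit mantissa, 3-bit exponent, both two's complement)."""
    bits = pattern.replace(" ", "")
    mantissa = int(bits[:5], 2)
    if mantissa >= 16:              # mantissa sign bit set
        mantissa -= 32
    exponent = int(bits[5:], 2)
    if exponent >= 4:               # exponent sign bit set
        exponent -= 8
    return (mantissa / 16) * 2.0 ** exponent


print(decode("01101 010"))   # 3.25
print(decode("10011 011"))   # -6.5
print(decode("01100 111"))   # 0.375
```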
- Forgetting to normalise. Always check that the bit after the sign bit is 1 (positive) or 0 (negative). A non-normalised mantissa like 0.0110 costs you the mark even if the maths is otherwise correct.
- Wrong direction of shift and wrong sign of exponent. Shifting the mantissa left by n places means the exponent is −n (not +n). Think of it as: you made the mantissa bigger, so the exponent must compensate by being smaller.
- Not converting the mantissa to two's complement for negative numbers. Finding the binary for +6.5 and writing it with a 1 sign bit is wrong. You must invert all 5 bits and add 1 to get the proper two's complement negative mantissa.
- Confusing binary fractions with denary fractions. 0.1₂ = 0.5₁₀ — not 0.1₁₀. Always convert binary fractions using the place value table.
- Losing bits when padding. If the mantissa only has 3 significant fractional bits and your format has 4, pad with a 0 on the right. Dropping bits or misaligning changes the value.
Floating-point is the most commonly failed topic in Higher Computing. Pupils who lose marks almost always do so for one reason: they rush. This is a completely mechanical process — there is no trick and no insight required beyond the method. Every mark is available if you write every step.
Expect these question forms:
- "Represent X in normalised floating-point" (1–2 marks: method + answer)
- "Convert bit pattern XXXXX XXX to denary" (1–2 marks)
- "Explain what normalised form means" (1–2 marks: definition + reason)
- "Explain why 0.1 cannot be represented exactly" (1–2 marks)
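For the last question form, the repeated-doubling method can be run in Python to show what happens with 0.1. This is a sketch; `binary_fraction_bits` is a hypothetical helper, and `Fraction` keeps the arithmetic exact so that any repetition comes from the value itself rather than from rounding.

```python
from fractions import Fraction


def binary_fraction_bits(value: Fraction, count: int) -> str:
    """Return the first `count` bits after the binary point of 0 <= value < 1."""
    bits = []
    for _ in range(count):
        value *= 2
        if value >= 1:
            bits.append("1")
            value -= 1
        else:
            bits.append("0")
    return "".join(bits)


print(binary_fraction_bits(Fraction(1, 10), 16))   # 0001100110011001
print(0.1 + 0.1 + 0.1 == 0.3)                      # False
```

The pattern 0011 repeats without end, so any fixed number of mantissa bits has to cut it off, and the stored value is only close to 0.1.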
Task Set A
Task Set B
0.1 + 0.1 + 0.1 == 0.3 evaluates to False. Using your knowledge of binary fractions, explain why.
Timing (120 min double):
5 min — warm up (CS2 recap), circulate
5 min — key vocabulary together
10 min — why integers aren't enough (discuss: how would you store 36.8°C?)
5 min — binary fraction place values (do a few together: what is 0.101₂?)
10 min — normalised form: show what it means visually with the format diagram
15 min — Examples 1 and 2 worked on board together, Example 3 and Now You Try independently
5 min — common mistakes ("has anyone made this one just now?")
25 min — tasks
5 min — cold call review on B4/B5 (conversion back, fastest to check)
Watch for: pupils who normalise in the wrong direction (shifting right instead of left, flipping the exponent sign); pupils who forget the two's complement step for negative numbers (just writing a 1 in the sign bit); and pupils who misread binary fractions (0.1₂ ≠ 0.1₁₀).
Whiteboard tip: draw the format diagram on the board before the lesson. Keep it visible throughout. Pupils lose track of which bits are which under exam conditions.
C1 is worth a brief mention even for pupils who don't attempt it — the 0.1 + 0.1 + 0.1 ≠ 0.3 result visibly surprises most pupils and motivates why floating-point precision matters.