String Encode and Decode

🔗 NeetCode Problem: https://neetcode.io/problems/encode-and-decode-strings 🔗 LeetCode Problem: https://leetcode.com/problems/encode-and-decode-strings

Problem Statement

Design an algorithm to encode a list of strings to a single string, and decode the encoded string back to the original list of strings. Constraints:

The string may contain any possible characters.
Strings can be empty.
You cannot use external libraries like base64 encoding.

Example:

Input: ["Hello","World"]
Encoded: 5#Hello5#World
Decoded: ["Hello","World"]

Approach 1: Brute Force

Idea

Use a simple delimiter (like comma) to separate strings. However, this fails if the delimiter appears in the string itself.

Pseudocode

encode(strs):
    result = ""
    for each str in strs:
        result += str + ","
    return result

decode(s):
    return s.split(",")

C++ Implementation

class Solution {
public:
    string encode(vector<string>& strs) {
        string result = "";
        for(string str : strs)
            result += str + ",";
        return result;
    }

    vector<string> decode(string s) {
        vector<string> result;
        string current = "";
        for(char c : s) {
            if(c == ',') {
                result.push_back(current);
                current = "";
            } else {
                current += c;
            }
        }
        return result;
    }
};

Complexity Analysis

Time Complexity: O(n) → Where n is the total number of characters across all strings
Space Complexity: O(n) → For storing the encoded result

Why it’s not optimal?

If any string contains the delimiter character (comma in this case), the decoding will break. For example:

Input: ["a,b", "c"]
Encoded: a,b,c,
Decoded: ["a", "b", "c"] ❌ (Wrong!)

This approach cannot handle strings with delimiters in them.

Optimized Approach: Length-Prefixed Encoding

Idea

Instead of using a delimiter, encode the length of each string followed by a separator, then the string itself. This way, we know exactly how many characters belong to each string, making it impossible for content to interfere with the format.

Key Insight

By storing the length of each string before the string itself, we can unambiguously decode any string, regardless of its content. The format becomes: length#string where # is a separator that guides our parsing, not a content delimiter.

How it works:

Encoding: For each string, append its length, followed by #, followed by the string itself
Decoding: Parse the length until we hit #, then read exactly that many characters for the string
Repeat until we’ve decoded all strings

C++ Implementation

class Solution {
public:
    string encode(vector<string>& strs) {
        string result = "";
        for(string str : strs)
            result += to_string(str.size()) + "#" + str;
        return result;
    }

    vector<string> decode(string s) {
        vector<string> result;
        int i = 0;
        while(i < s.size()) {
            int j = i;
            // Find the '#' that separates length from string
            while(s[j] != '#')
                j++;
            
            // Extract the length
            int len = stoi(s.substr(i, j - i));
            
            // Extract the string of that length
            string word = s.substr(j + 1, len);
            result.push_back(word);
            
            // Move to next string
            i = j + 1 + len;
        }
        return result;
    }
};

Complexity Analysis

Time Complexity: O(n) → Where n is the total number of characters. We process each character exactly once.
Space Complexity: O(n) → For storing the encoded result (or decoded result in space-efficient implementation)

Why it’s optimal?

Works with any character in the string, including #, commas, or any special character
Linear time complexity - we never backtrack or revisit characters
Linear space complexity - only storing the output
Simple and elegant implementation
No edge cases that break the algorithm

Comparison: Brute Force vs Optimized

Approach	Time Complexity	Space Complexity	Pros	Cons
Brute Force (Delimiter)	O(n)	O(n)	Simple to understand	Fails with delimiter in string
Length-Prefixed Encoding	O(n)	O(n)	Handles all characters, Unambiguous decoding	Slightly more complex parsing

Why Length-Prefixed Encoding is Most Optimal

Universal Delimiter Independence - No character can break the format since length dictates how many chars to read
Linear Efficiency - Single pass through the data for both encoding and decoding
Handles All Edge Cases - Empty strings, special characters, very long strings all work perfectly
Deterministic Parsing - No ambiguity; each byte of input maps to a unique output
Industry Standard - Used in protocols like gRPC, Protobuf, and network communication
No External Dependencies - Pure algorithm, no external libraries needed

Example Walkthroughs

Example 1: Simple Strings

Input: ["Hello", "World"] Using Length-Prefixed Encoding:

Encoding:
- "Hello" has length 5 → "5#Hello"
- "World" has length 5 → "5#World"
- Result: "5#Hello5#World"

Decoding "5#Hello5#World":
- i=0: Find '#' at j=1
  - Length = 5
  - Read 5 chars from j+1: "Hello"
  - i = 1 + 1 + 5 = 7
- i=7: Find '#' at j=8
  - Length = 5
  - Read 5 chars from j+1: "World"
  - i = 8 + 1 + 5 = 14 (end)
- Result: ["Hello", "World"] ✓

Strings successfully encoded and decoded!

Example 2: Strings with Special Characters

Input: ["a#b", "c,d", "e"] Using Length-Prefixed Encoding:

Encoding:
- "a#b" has length 3 → "3#a#b"
- "c,d" has length 3 → "3#c,d"
- "e" has length 1 → "1#e"
- Result: "3#a#b3#c,d1#e"

Decoding "3#a#b3#c,d1#e":
- i=0: Find '#' at j=1
  - Length = 3
  - Read 3 chars: "a#b" (the # inside doesn't matter!)
  - i = 1 + 1 + 3 = 5
- i=5: Find '#' at j=6
  - Length = 3
  - Read 3 chars: "c,d" (the comma is just data)
  - i = 6 + 1 + 3 = 10
- i=10: Find '#' at j=11
  - Length = 1
  - Read 1 char: "e"
  - i = 11 + 1 + 1 = 13 (end)
- Result: ["a#b", "c,d", "e"] ✓

Notice how special characters don’t break the decoding!

Example 3: Empty Strings

Input: ["", "abc", ""] Using Length-Prefixed Encoding:

Encoding:
- "" has length 0 → "0#"
- "abc" has length 3 → "3#abc"
- "" has length 0 → "0#"
- Result: "0#3#abc0#"

Decoding "0#3#abc0#":
- i=0: Find '#' at j=1
  - Length = 0
  - Read 0 chars: ""
  - i = 1 + 1 + 0 = 2
- i=2: Find '#' at j=3
  - Length = 3
  - Read 3 chars: "abc"
  - i = 3 + 1 + 3 = 7
- i=7: Find '#' at j=8
  - Length = 0
  - Read 0 chars: ""
  - i = 8 + 1 + 0 = 9 (end)
- Result: ["", "abc", ""] ✓

Empty strings are handled gracefully!

Key Takeaways

Length-Prefixed Encoding is the standard solution for this problem, used in real-world protocols
Never rely solely on delimiters when the delimited content is untrusted or arbitrary
Separating metadata (length) from content allows safe serialization of any data
Linear time and space is optimal for serialization problems
Test edge cases: empty strings, special characters, very long strings
This pattern appears everywhere: network protocols, binary serialization, and data encoding

Documentation Index

​String Encode and Decode

​Problem Statement

​Approach 1: Brute Force

​Idea

​Pseudocode

​C++ Implementation

​Complexity Analysis

​Why it’s not optimal?

​Optimized Approach: Length-Prefixed Encoding

​Idea

​Key Insight

​How it works:

​C++ Implementation

​Complexity Analysis

​Why it’s optimal?

​Comparison: Brute Force vs Optimized

​Why Length-Prefixed Encoding is Most Optimal

​Example Walkthroughs

​Example 1: Simple Strings

​Example 2: Strings with Special Characters

​Example 3: Empty Strings

​Key Takeaways

String Encode and Decode

Problem Statement

Approach 1: Brute Force

Idea

Pseudocode

C++ Implementation

Complexity Analysis

Why it’s not optimal?

Optimized Approach: Length-Prefixed Encoding

Idea

Key Insight

How it works:

C++ Implementation

Complexity Analysis

Why it’s optimal?

Comparison: Brute Force vs Optimized

Why Length-Prefixed Encoding is Most Optimal

Example Walkthroughs

Example 1: Simple Strings

Example 2: Strings with Special Characters

Example 3: Empty Strings

Key Takeaways