Group Anagrams

🔗 NeetCode Problem:
https://neetcode.io/problems/anagram-groups 🔗 LeetCode Problem:
https://leetcode.com/problems/group-anagrams/

Problem Statement

Given an array of strings strs, group the anagrams together. You can return the answer in any order. An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. Example:

Input: ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Approach 1: Brute Force (Sorting)

Idea

Sort each string and use the sorted version as a key. All anagrams will have the same sorted representation, so we can group them together in a hash map.

Pseudocode

1. Create an unordered_map where key is sorted string
2. For each string in input:
   a. Sort the string
   b. Use sorted string as key
   c. Add original string to the vector mapped by this key
3. Extract all vectors from the map and return

C++ Implementation

class Solution {
public:
    vector<vector<string>> groupAnagrams(vector<string>& strs) {
        vector<vector<string>> ans;
        unordered_map<string, vector<string>> umap;
        
        for(string &str : strs) {
            string original = str;
            sort(str.begin(), str.end());
            umap[str].push_back(original);
        }
        
        for(auto x : umap) {
            vector<string> temp = x.second;
            ans.push_back(temp);
        }
        
        return ans;
    }
};

Complexity Analysis

Time Complexity: O(n * k log k) → Where n is the number of strings and k is the maximum length of a string. Each string is sorted which takes O(k log k), and we do this for all n strings.
Space Complexity: O(n * k) → We store all strings in the hash map.

Why it’s not optimal?

Sorting each string takes O(k log k) time. If strings are very long, this becomes the bottleneck. We can do better by counting character frequencies instead of sorting.

Optimized Approach: Character Frequency Counting

Idea

Instead of sorting, count the frequency of each character (a-z) in every string. Use this frequency array as the key. All anagrams will have identical frequency arrays.

Key Insight

Sorting is unnecessary! Two strings are anagrams if and only if they have the same character frequencies. A frequency array is a direct fingerprint of a string’s composition and is faster to compute than sorting.

How it works:

Create a frequency array of size 26 (for a-z)
For each string, count character frequencies
Use the frequency array as the key in a map
Group strings by their frequency fingerprint
Extract all groups from the map

C++ Implementation

class Solution {
public:
    vector<vector<string>> groupAnagrams(vector<string>& strs) {
        map<vector<int>, vector<string>> mp;
        vector<vector<string>> res;

        for(string str : strs) {
            vector<int> freq(26, 0);
            for(char ch : str)
                freq[ch - 'a']++;
            mp[freq].push_back(str);
        }

        for(auto m : mp)
            res.push_back(m.second);

        return res;
    }
};

Complexity Analysis

Time Complexity: O(n * k) → Where n is the number of strings and k is the maximum length of a string. We iterate through each character once for each string. No sorting involved!
Space Complexity: O(n * k) → We store all strings in the map.

Why it’s optimal?

We eliminate the sorting step, reducing time complexity from O(n * k log k) to O(n * k). Character frequency counting is linear and much faster than sorting.

Comparison: Brute Force vs Optimized

Approach	Time Complexity	Space Complexity	Pros	Cons
Sorting	O(n * k log k)	O(n * k)	Simple to understand, works with strings directly	Slower due to sorting overhead
Frequency Array	O(n * k)	O(n * k)	Faster! Linear time per string	Requires understanding of frequency mapping

Why Frequency Array Approach is Most Optimal

Linear character processing: We count frequencies in O(k) time instead of O(k log k) for sorting
No sorting overhead: Avoids expensive comparison operations inherent in sorting
Scalable: Better performance as string lengths increase
Direct fingerprinting: Frequency arrays directly represent what makes strings anagrams
Cache-friendly: Sequential array iteration is cache-efficient
Predictable performance: No worst-case slowdown (unlike some sorting algorithms)

🚨 Important: Why Use `map` Instead of `unordered_map`?

The Problem with `unordered_map<vector<int>, ...>`

// ❌ This WILL NOT COMPILE
unordered_map<vector<int>, vector<string>> mp;

Reason: unordered_map requires a hash function for its keys. While vector<int> has comparison operators (needed for map), it doesn’t have a built-in hash function (needed for unordered_map).

Why `map` Works

// ✅ This COMPILES and WORKS
map<vector<int>, vector<string>> mp;

map is a tree-based map (implemented as a Red-Black Tree) that only needs comparison operators (<), which vector<int> has by default.

Comparison: `map` vs `unordered_map`

Feature	`map`	`unordered_map`
Data Structure	Red-Black Tree	Hash Table
Requires	`operator<`	Hash function + `operator==`
Lookup Time	O(log n)	O(1) average
Key Requirements	Any comparable type	Must have hash function
Insertion Time	O(log n)	O(1) average
Best for this problem	✅ Works immediately	❌ Requires custom hash

Can We Use `unordered_map`? Yes, But It’s Complex

You’d need to define a custom hash function:

struct VectorHash {
    size_t operator()(const vector<int>& v) const {
        size_t hash = 0;
        for(int val : v) {
            hash ^= std::hash<int>{}(val) + 0x9e3779b9 + (hash << 6) + (hash >> 2);
        }
        return hash;
    }
};

unordered_map<vector<int>, vector<string>, VectorHash> mp;

Why we don’t do this: It adds unnecessary complexity. map works fine, and the O(log n) lookup vs O(1) lookup difference is negligible for this problem.

Rule of Thumb for Future Problems

Can you use map? → Check if your key type is comparable (has < operator)
- If YES: Use map (simpler, always works)
- If NO: Go to step 2
Does your key type have a built-in hash? → Basic types like int, string, long long have built-in hashes
- If YES: Use unordered_map (faster)
- If NO: Go to step 3
Can you define a custom hash function? → For complex types like vector, pair, array
- If YES and it’s worth it: Use unordered_map with custom hash
- If NO or too complex: Use map

Example Walkthroughs

Example 1: Basic Example

Input: ["eat","tea","tan","ate","nat","bat"] Using Frequency Array Approach:

Processing "eat":
  Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (a=1 at index 0, e=1 at index 4, t=1 at index 19)
  
Processing "tea":
  Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (same as "eat" - anagram!)
  
Processing "tan":
  Frequency: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
             (a=1 at index 0, n=1 at index 13, t=1 at index 19)
  
Processing "ate":
  Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (same as "eat" and "tea" - anagram!)
  
Processing "nat":
  Frequency: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
             (same as "tan" - anagram!)
  
Processing "bat":
  Frequency: [1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (a=1 at index 0, b=1 at index 1, t=1 at index 19)
             (only 'a', 'b', 't' - unique)

Map after processing:
  [a=1,e=1,t=1] → ["eat", "tea", "ate"]
  [a=1,n=1,t=1] → ["tan", "nat"]
  [a=1,b=1,t=1] → ["bat"]

Output: [["eat","tea","ate"], ["tan","nat"], ["bat"]]

Example 2: Single Characters

Input: ["a","b","ba","bb","baa"] Using Frequency Array Approach:

"a":  [1,0,0,...] → key represents single 'a'
"b":  [0,1,0,...] → key represents single 'b'
"ba": [1,1,0,...] → key represents one 'a' and one 'b'
"bb": [0,2,0,...] → key represents two 'b's
"baa":[2,1,0,...] → key represents two 'a's and one 'b'

Map groups:
  [1,0,0,...] → ["a"]
  [0,1,0,...] → ["b"]
  [1,1,0,...] → ["ba"]
  [0,2,0,...] → ["bb"]
  [2,1,0,...] → ["baa"]

Output: [["a"], ["b"], ["ba"], ["bb"], ["baa"]]

Example 3: Identical Anagrams

Input: ["abc","bca","cab"] Using Frequency Array Approach:

"abc": [1,1,1,0,0,...] → one 'a', one 'b', one 'c'
"bca": [1,1,1,0,0,...] → same frequency (anagram!)
"cab": [1,1,1,0,0,...] → same frequency (anagram!)

Map groups:
  [1,1,1,0,0,...] → ["abc", "bca", "cab"]

Output: [["abc", "bca", "cab"]]

Key Takeaways

Anagrams have identical character frequencies: This is the core insight that makes the solution work.
Frequency counting beats sorting: O(nk) is faster than O(nk log k) for this problem.
Use map when key type is comparable: It’s simpler than implementing custom hash functions.
vector<int> works as a key in map: Because vectors are comparable by default.
vector<int> doesn’t work in unordered_map: Unless you define a custom hash function.
Linear traversal is often better than sorting: When you just need to count/group, avoid expensive sorting operations.
Understand data structure requirements: Different containers have different requirements for their key types.

Documentation Index

​Group Anagrams

​Problem Statement

​Approach 1: Brute Force (Sorting)

​Idea

​Pseudocode

​C++ Implementation

​Complexity Analysis

​Why it’s not optimal?

​Optimized Approach: Character Frequency Counting

​Idea

​Key Insight

​How it works:

​C++ Implementation

​Complexity Analysis

​Why it’s optimal?

​Comparison: Brute Force vs Optimized

​Why Frequency Array Approach is Most Optimal

​🚨 Important: Why Use map Instead of unordered_map?

​The Problem with unordered_map<vector<int>, ...>

​Why map Works

​Comparison: map vs unordered_map

​Can We Use unordered_map? Yes, But It’s Complex

​Rule of Thumb for Future Problems

​Example Walkthroughs

​Example 1: Basic Example

​Example 2: Single Characters

​Example 3: Identical Anagrams

​Key Takeaways

Group Anagrams

Problem Statement

Approach 1: Brute Force (Sorting)

Idea

Pseudocode

C++ Implementation

Complexity Analysis

Why it’s not optimal?

Optimized Approach: Character Frequency Counting

Idea

Key Insight

How it works:

C++ Implementation

Complexity Analysis

Why it’s optimal?

Comparison: Brute Force vs Optimized

Why Frequency Array Approach is Most Optimal

🚨 Important: Why Use `map` Instead of `unordered_map`?

The Problem with `unordered_map<vector<int>, ...>`

Why `map` Works

Comparison: `map` vs `unordered_map`

Can We Use `unordered_map`? Yes, But It’s Complex

Rule of Thumb for Future Problems

Example Walkthroughs

Example 1: Basic Example

Example 2: Single Characters

Example 3: Identical Anagrams

Key Takeaways