Skip to main content

Group Anagrams

πŸ”— NeetCode Problem:
https://neetcode.io/problems/anagram-groups
πŸ”— LeetCode Problem:
https://leetcode.com/problems/group-anagrams/

Problem Statement

Given an array of strings strs, group the anagrams together. You can return the answer in any order. An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. Example:
  • Input: ["eat","tea","tan","ate","nat","bat"]
  • Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Approach 1: Brute Force (Sorting)

Idea

Sort each string and use the sorted version as a key. All anagrams will have the same sorted representation, so we can group them together in a hash map.

Pseudocode

1. Create an unordered_map where key is sorted string
2. For each string in input:
   a. Sort the string
   b. Use sorted string as key
   c. Add original string to the vector mapped by this key
3. Extract all vectors from the map and return

C++ Implementation

class Solution {
public:
    vector<vector<string>> groupAnagrams(vector<string>& strs) {
        vector<vector<string>> ans;
        unordered_map<string, vector<string>> umap;
        
        for(string &str : strs) {
            string original = str;
            sort(str.begin(), str.end());
            umap[str].push_back(original);
        }
        
        for(auto x : umap) {
            vector<string> temp = x.second;
            ans.push_back(temp);
        }
        
        return ans;
    }
};

Complexity Analysis

  • Time Complexity: O(n * k log k) β†’ Where n is the number of strings and k is the maximum length of a string. Each string is sorted which takes O(k log k), and we do this for all n strings.
  • Space Complexity: O(n * k) β†’ We store all strings in the hash map.

Why it’s not optimal?

Sorting each string takes O(k log k) time. If strings are very long, this becomes the bottleneck. We can do better by counting character frequencies instead of sorting.

Optimized Approach: Character Frequency Counting

Idea

Instead of sorting, count the frequency of each character (a-z) in every string. Use this frequency array as the key. All anagrams will have identical frequency arrays.

Key Insight

Sorting is unnecessary! Two strings are anagrams if and only if they have the same character frequencies. A frequency array is a direct fingerprint of a string’s composition and is faster to compute than sorting.

How it works:

  1. Create a frequency array of size 26 (for a-z)
  2. For each string, count character frequencies
  3. Use the frequency array as the key in a map
  4. Group strings by their frequency fingerprint
  5. Extract all groups from the map

C++ Implementation

class Solution {
public:
    vector<vector<string>> groupAnagrams(vector<string>& strs) {
        map<vector<int>, vector<string>> mp;
        vector<vector<string>> res;

        for(string str : strs) {
            vector<int> freq(26, 0);
            for(char ch : str)
                freq[ch - 'a']++;
            mp[freq].push_back(str);
        }

        for(auto m : mp)
            res.push_back(m.second);

        return res;
    }
};

Complexity Analysis

  • Time Complexity: O(n * k) β†’ Where n is the number of strings and k is the maximum length of a string. We iterate through each character once for each string. No sorting involved!
  • Space Complexity: O(n * k) β†’ We store all strings in the map.

Why it’s optimal?

We eliminate the sorting step, reducing time complexity from O(n * k log k) to O(n * k). Character frequency counting is linear and much faster than sorting.

Comparison: Brute Force vs Optimized

ApproachTime ComplexitySpace ComplexityProsCons
SortingO(n * k log k)O(n * k)Simple to understand, works with strings directlySlower due to sorting overhead
Frequency ArrayO(n * k)O(n * k)Faster! Linear time per stringRequires understanding of frequency mapping

Why Frequency Array Approach is Most Optimal

  1. Linear character processing: We count frequencies in O(k) time instead of O(k log k) for sorting
  2. No sorting overhead: Avoids expensive comparison operations inherent in sorting
  3. Scalable: Better performance as string lengths increase
  4. Direct fingerprinting: Frequency arrays directly represent what makes strings anagrams
  5. Cache-friendly: Sequential array iteration is cache-efficient
  6. Predictable performance: No worst-case slowdown (unlike some sorting algorithms)

🚨 Important: Why Use map Instead of unordered_map?

The Problem with unordered_map<vector<int>, ...>

// ❌ This WILL NOT COMPILE
unordered_map<vector<int>, vector<string>> mp;
Reason: unordered_map requires a hash function for its keys. While vector<int> has comparison operators (needed for map), it doesn’t have a built-in hash function (needed for unordered_map).

Why map Works

// βœ… This COMPILES and WORKS
map<vector<int>, vector<string>> mp;
map is a tree-based map (implemented as a Red-Black Tree) that only needs comparison operators (<), which vector<int> has by default.

Comparison: map vs unordered_map

Featuremapunordered_map
Data StructureRed-Black TreeHash Table
Requiresoperator<Hash function + operator==
Lookup TimeO(log n)O(1) average
Key RequirementsAny comparable typeMust have hash function
Insertion TimeO(log n)O(1) average
Best for this problemβœ… Works immediately❌ Requires custom hash

Can We Use unordered_map? Yes, But It’s Complex

You’d need to define a custom hash function:
struct VectorHash {
    size_t operator()(const vector<int>& v) const {
        size_t hash = 0;
        for(int val : v) {
            hash ^= std::hash<int>{}(val) + 0x9e3779b9 + (hash << 6) + (hash >> 2);
        }
        return hash;
    }
};

unordered_map<vector<int>, vector<string>, VectorHash> mp;
Why we don’t do this: It adds unnecessary complexity. map works fine, and the O(log n) lookup vs O(1) lookup difference is negligible for this problem.

Rule of Thumb for Future Problems

  1. Can you use map? β†’ Check if your key type is comparable (has < operator)
    • If YES: Use map (simpler, always works)
    • If NO: Go to step 2
  2. Does your key type have a built-in hash? β†’ Basic types like int, string, long long have built-in hashes
    • If YES: Use unordered_map (faster)
    • If NO: Go to step 3
  3. Can you define a custom hash function? β†’ For complex types like vector, pair, array
    • If YES and it’s worth it: Use unordered_map with custom hash
    • If NO or too complex: Use map

Example Walkthroughs

Example 1: Basic Example

Input: ["eat","tea","tan","ate","nat","bat"] Using Frequency Array Approach:
Processing "eat":
  Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (a=1 at index 0, e=1 at index 4, t=1 at index 19)
  
Processing "tea":
  Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (same as "eat" - anagram!)
  
Processing "tan":
  Frequency: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
             (a=1 at index 0, n=1 at index 13, t=1 at index 19)
  
Processing "ate":
  Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (same as "eat" and "tea" - anagram!)
  
Processing "nat":
  Frequency: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
             (same as "tan" - anagram!)
  
Processing "bat":
  Frequency: [1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (a=1 at index 0, b=1 at index 1, t=1 at index 19)
             (only 'a', 'b', 't' - unique)

Map after processing:
  [a=1,e=1,t=1] β†’ ["eat", "tea", "ate"]
  [a=1,n=1,t=1] β†’ ["tan", "nat"]
  [a=1,b=1,t=1] β†’ ["bat"]
Output: [["eat","tea","ate"], ["tan","nat"], ["bat"]]

Example 2: Single Characters

Input: ["a","b","ba","bb","baa"] Using Frequency Array Approach:
"a":  [1,0,0,...] β†’ key represents single 'a'
"b":  [0,1,0,...] β†’ key represents single 'b'
"ba": [1,1,0,...] β†’ key represents one 'a' and one 'b'
"bb": [0,2,0,...] β†’ key represents two 'b's
"baa":[2,1,0,...] β†’ key represents two 'a's and one 'b'

Map groups:
  [1,0,0,...] β†’ ["a"]
  [0,1,0,...] β†’ ["b"]
  [1,1,0,...] β†’ ["ba"]
  [0,2,0,...] β†’ ["bb"]
  [2,1,0,...] β†’ ["baa"]
Output: [["a"], ["b"], ["ba"], ["bb"], ["baa"]]

Example 3: Identical Anagrams

Input: ["abc","bca","cab"] Using Frequency Array Approach:
"abc": [1,1,1,0,0,...] β†’ one 'a', one 'b', one 'c'
"bca": [1,1,1,0,0,...] β†’ same frequency (anagram!)
"cab": [1,1,1,0,0,...] β†’ same frequency (anagram!)

Map groups:
  [1,1,1,0,0,...] β†’ ["abc", "bca", "cab"]
Output: [["abc", "bca", "cab"]]

Key Takeaways

  • Anagrams have identical character frequencies: This is the core insight that makes the solution work.
  • Frequency counting beats sorting: O(nk) is faster than O(nk log k) for this problem.
  • Use map when key type is comparable: It’s simpler than implementing custom hash functions.
  • vector<int> works as a key in map: Because vectors are comparable by default.
  • vector<int> doesn’t work in unordered_map: Unless you define a custom hash function.
  • Linear traversal is often better than sorting: When you just need to count/group, avoid expensive sorting operations.
  • Understand data structure requirements: Different containers have different requirements for their key types.