Group Anagrams
π NeetCode Problem:https://neetcode.io/problems/anagram-groups π LeetCode Problem:
https://leetcode.com/problems/group-anagrams/
Problem Statement
Given an array of stringsstrs, group the anagrams together. You can return the answer in any order.
An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.
Example:
- Input:
["eat","tea","tan","ate","nat","bat"] - Output:
[["bat"],["nat","tan"],["ate","eat","tea"]]
Approach 1: Brute Force (Sorting)
Idea
Sort each string and use the sorted version as a key. All anagrams will have the same sorted representation, so we can group them together in a hash map.Pseudocode
C++ Implementation
Complexity Analysis
- Time Complexity: O(n * k log k) β Where
nis the number of strings andkis the maximum length of a string. Each string is sorted which takesO(k log k), and we do this for allnstrings. - Space Complexity: O(n * k) β We store all strings in the hash map.
Why itβs not optimal?
Sorting each string takesO(k log k) time. If strings are very long, this becomes the bottleneck. We can do better by counting character frequencies instead of sorting.
Optimized Approach: Character Frequency Counting
Idea
Instead of sorting, count the frequency of each character (a-z) in every string. Use this frequency array as the key. All anagrams will have identical frequency arrays.Key Insight
Sorting is unnecessary! Two strings are anagrams if and only if they have the same character frequencies. A frequency array is a direct fingerprint of a stringβs composition and is faster to compute than sorting.How it works:
- Create a frequency array of size 26 (for a-z)
- For each string, count character frequencies
- Use the frequency array as the key in a map
- Group strings by their frequency fingerprint
- Extract all groups from the map
C++ Implementation
Complexity Analysis
- Time Complexity: O(n * k) β Where
nis the number of strings andkis the maximum length of a string. We iterate through each character once for each string. No sorting involved! - Space Complexity: O(n * k) β We store all strings in the map.
Why itβs optimal?
We eliminate the sorting step, reducing time complexity fromO(n * k log k) to O(n * k). Character frequency counting is linear and much faster than sorting.
Comparison: Brute Force vs Optimized
| Approach | Time Complexity | Space Complexity | Pros | Cons |
|---|---|---|---|---|
| Sorting | O(n * k log k) | O(n * k) | Simple to understand, works with strings directly | Slower due to sorting overhead |
| Frequency Array | O(n * k) | O(n * k) | Faster! Linear time per string | Requires understanding of frequency mapping |
Why Frequency Array Approach is Most Optimal
- Linear character processing: We count frequencies in O(k) time instead of O(k log k) for sorting
- No sorting overhead: Avoids expensive comparison operations inherent in sorting
- Scalable: Better performance as string lengths increase
- Direct fingerprinting: Frequency arrays directly represent what makes strings anagrams
- Cache-friendly: Sequential array iteration is cache-efficient
- Predictable performance: No worst-case slowdown (unlike some sorting algorithms)
π¨ Important: Why Use map Instead of unordered_map?
The Problem with unordered_map<vector<int>, ...>
unordered_map requires a hash function for its keys. While vector<int> has comparison operators (needed for map), it doesnβt have a built-in hash function (needed for unordered_map).
Why map Works
map is a tree-based map (implemented as a Red-Black Tree) that only needs comparison operators (<), which vector<int> has by default.
Comparison: map vs unordered_map
| Feature | map | unordered_map |
|---|---|---|
| Data Structure | Red-Black Tree | Hash Table |
| Requires | operator< | Hash function + operator== |
| Lookup Time | O(log n) | O(1) average |
| Key Requirements | Any comparable type | Must have hash function |
| Insertion Time | O(log n) | O(1) average |
| Best for this problem | β Works immediately | β Requires custom hash |
Can We Use unordered_map? Yes, But Itβs Complex
Youβd need to define a custom hash function:
map works fine, and the O(log n) lookup vs O(1) lookup difference is negligible for this problem.
Rule of Thumb for Future Problems
-
Can you use
map? β Check if your key type is comparable (has<operator)- If YES: Use
map(simpler, always works) - If NO: Go to step 2
- If YES: Use
-
Does your key type have a built-in hash? β Basic types like
int,string,long longhave built-in hashes- If YES: Use
unordered_map(faster) - If NO: Go to step 3
- If YES: Use
-
Can you define a custom hash function? β For complex types like
vector,pair,array- If YES and itβs worth it: Use
unordered_mapwith custom hash - If NO or too complex: Use
map
- If YES and itβs worth it: Use
Example Walkthroughs
Example 1: Basic Example
Input:["eat","tea","tan","ate","nat","bat"]
Using Frequency Array Approach:
[["eat","tea","ate"], ["tan","nat"], ["bat"]]
Example 2: Single Characters
Input:["a","b","ba","bb","baa"]
Using Frequency Array Approach:
[["a"], ["b"], ["ba"], ["bb"], ["baa"]]
Example 3: Identical Anagrams
Input:["abc","bca","cab"]
Using Frequency Array Approach:
[["abc", "bca", "cab"]]
Key Takeaways
- Anagrams have identical character frequencies: This is the core insight that makes the solution work.
- Frequency counting beats sorting: O(nk) is faster than O(nk log k) for this problem.
- Use
mapwhen key type is comparable: Itβs simpler than implementing custom hash functions. vector<int>works as a key inmap: Because vectors are comparable by default.vector<int>doesnβt work inunordered_map: Unless you define a custom hash function.- Linear traversal is often better than sorting: When you just need to count/group, avoid expensive sorting operations.
- Understand data structure requirements: Different containers have different requirements for their key types.