Top K Frequent Elements
🔗 NeetCode Problem:https://neetcode.io/problems/top-k-frequent-elements 🔗 LeetCode Problem:
https://leetcode.com/problems/top-k-frequent-elements
Problem Statement
Given an integer arraynums and an integer k, return the k most frequent elements. You may return the answer in any order.
Example:
- Input:
nums = [1,1,1,2,2,3],k = 2 - Output:
[1,2]
Approach 1: Brute Force
Idea
Count the frequency of each element using a hash map, then repeatedly find the element with the maximum frequency, add it to the result, and remove it from the map. Repeat this k times.Pseudocode
C++ Implementation
Complexity Analysis
- Time Complexity: O(n + k*m) → O(n) to build frequency map, then O(km) where m is the number of unique elements. In worst case, this becomes O(nk).
- Space Complexity: O(m) → m is the number of unique elements (at most n).
Why it’s not optimal?
This approach repeatedly scans the entire frequency map k times, making it inefficient. Each time we search for the maximum frequency element, we iterate through all remaining unique elements. For large k or many unique elements, this becomes slow.Optimized Approach 1: Max Heap
Idea
Use a max heap (priority queue) to automatically maintain elements in order of frequency. Push all frequency-element pairs into the max heap, then pop k times to get the k most frequent elements.Key Insight
A heap allows us to extract the maximum element in O(log m) time instead of O(m) time, reducing the overall complexity significantly.How it works:
- Build a frequency map by counting occurrences of each element - O(n)
- Insert all frequency-element pairs into a max heap (pairs are compared by frequency first)
- Pop k elements from the heap - each pop is O(log m)
- Return the popped elements
C++ Implementation
Complexity Analysis
- Time Complexity: O(n + mlog m + klog m) → O(n) to build frequency map, O(mlog m) to build heap, O(klog m) to extract k elements.
- Space Complexity: O(m) → space for frequency map and heap.
Why it’s optimal?
This approach is very efficient and straightforward. Extracting from a heap is O(log m) which is much better than scanning the entire map. This is a clean and commonly used solution.Optimized Approach 2: Min Heap (Most Optimal)
Idea
Use a min heap of size k instead of a max heap. As we iterate through frequencies, maintain only the k largest elements in the heap. This way we only store k elements in memory and achieve better space efficiency.Key Insight
We don’t need to store all m elements in the heap. By using a min heap and keeping only k elements, we maintain O(k) space instead of O(m). When heap size exceeds k, we remove the smallest frequency element. At the end, the heap contains exactly the k most frequent elements.How it works:
- Build a frequency map by counting occurrences - O(n)
- Create a min heap (compare by frequency)
- For each unique element:
- Push the frequency-element pair into the min heap - O(log k)
- If heap size exceeds k, pop the minimum element - O(log k)
- All remaining elements in the heap are the k most frequent
C++ Implementation
Complexity Analysis
- Time Complexity: O(n + m*log k) → O(n) to build frequency map, O(m*log k) to maintain heap of size k as we iterate through m unique elements.
- Space Complexity: O(k) → heap stores at most k elements.
Why it’s optimal?
This is superior to max heap when k is much smaller than m (the number of unique elements). We only maintain k elements in the heap, making it very space-efficient. The heap operations are O(log k) instead of O(log m).Optimized Approach 3: Bucket Sort (Most Optimal for Many Duplicates)
Idea
Create buckets indexed by frequency (0 to n). Place each element into the bucket corresponding to its frequency. Then iterate from the highest frequency bucket backwards, collecting elements until we have k elements.Key Insight
Since frequencies range from 1 to n, we can use frequency as a direct index rather than comparing elements. This avoids heap overhead and achieves linear time complexity.How it works:
- Build a frequency map - O(n)
- Create n+1 buckets (bucket[i] stores all elements with frequency i)
- Iterate through frequency map and place each element in its corresponding frequency bucket - O(m)
- Iterate from the last bucket backwards and collect elements until we have k - O(n)
C++ Implementation
Complexity Analysis
- Time Complexity: O(n) → O(n) to build frequency map, O(n) to fill buckets, O(n) to collect k elements from buckets. Total is O(n).
- Space Complexity: O(n) → bucket array of size n+1.
Why it’s optimal?
This is the most optimal approach overall. It achieves linear time complexity which is the best possible since we must read all n elements. No comparison-based sorting or heap operations are needed. Perfect for when we need the absolute best performance.Comparison: All Approaches
| Approach | Time Complexity | Space Complexity | Pros | Cons |
|---|---|---|---|---|
| Brute Force | O(n*k) | O(m) | Simple logic | Very slow for large k; inefficient |
| Max Heap | O(n + m*log m) | O(m) | Easy to understand; works well | Stores all m elements in heap |
| Min Heap | O(n + m*log k) | O(k) | Better when k is small; space efficient | Still uses logarithmic operations |
| Bucket Sort | O(n) | O(n) | Linear time; no comparisons needed | Uses O(n) extra space for buckets |
Why Bucket Sort is Most Optimal
- Linear Time Complexity: O(n) is the theoretical best we can achieve since we must read all n elements. No algorithm can do better.
- No Comparison Operations: Unlike heap-based approaches that rely on comparisons, bucket sort uses frequency as a direct index. This is faster in practice.
- Cache Efficient: Sequential iteration through buckets has better cache locality than random heap operations.
- Predictable Performance: No worst-case degradation like heaps can have. Always O(n) regardless of input distribution.
- Space-Time Tradeoff Justified: The O(n) space is worth it for O(n) time, which is optimal.
- Scalability: As n grows, bucket sort remains the clear winner. Other approaches’ logarithmic factors become more significant.
Example Walkthroughs
Example 1: nums = [1,1,1,2,2,3], k = 2
Using Bucket Sort:Example 2: nums = [4,1,1,1,2,2,3], k = 1
Using Bucket Sort:Example 3: nums = [1,2], k = 2
Using Bucket Sort:Key Takeaways
- Know Your Trade-offs: Brute force is simple but slow. Heaps are a good middle ground. Bucket sort is optimal.
- Min Heap Advantage: Use when k is much smaller than the number of unique elements. Better space complexity O(k) than max heap.
- Bucket Sort Power: When frequency values are bounded (1 to n), bucket sort achieves linear time with no comparisons. This is the theoretical best.
- Always Count Frequencies First: All approaches start with frequency counting. This is unavoidable and necessary.
- Think About Constraints: If k is small, min heap is practical. If you need absolute best performance, use bucket sort. If simplicity matters, use max heap.
- Real-World Choice: For interviews and production, bucket sort is impressive and optimal. It demonstrates deep algorithmic thinking beyond standard data structures.