Arrays
What is an Array?
An array is a collection of elements stored at contiguous memory locations, each identified by at least one index. It's the most fundamental data structure in computer science, forming the basis for more complex data structures.
Key Characteristics
- Contiguous Memory: Elements are stored in adjacent memory locations
- Fixed Size: In traditional arrays, size is defined at creation
- Homogeneous Elements: All elements typically have the same data type
- Index-Based Access O(1): Elements can be directly accessed by their position
Types of Arrays
1. Static Arrays
Fixed-size arrays with size defined at compile time. Their size cannot be modified after creation.
2. Dynamic Arrays
Resizable arrays that can grow or shrink during program execution. They handle memory management automatically.
Memory Layout of Arrays
Arrays store elements in contiguous memory blocks, making them efficient for access but inflexible for dynamic operations.
How Arrays Work in Memory
- Base Address: Memory address of the first element
- Element Size: Each element occupies the same amount of memory
- Index Calculation O(1): The address of the element at index `i` is computed as `address = base_address + (i * element_size)`. For example, with base address 1000 and 4-byte elements, index 3 maps to 1000 + 3 * 4 = 1012.
Memory Allocation
When an array is created, the system allocates a contiguous block of memory:
- Static Arrays: Memory is allocated at compile time and fixed
- Dynamic Arrays: Memory is allocated at runtime and can grow as needed
Memory Consideration O(n): Resizing a dynamic array often requires allocating a new, larger block of memory and copying all elements, which is an expensive O(n) operation.
Array Operations Complexity
Operation | Time Complexity | Space Complexity | Notes |
---|---|---|---|
Access | O(1) | O(1) | Direct indexing allows constant-time access |
Search (Unsorted) | O(n) | O(1) | Requires examining each element in worst case |
Search (Sorted) | O(log n) | O(1) | Binary search can be applied |
Insert (end) | O(1)* | O(1) | *Amortized O(1) for dynamic arrays |
Insert (position) | O(n) | O(1) | Requires shifting elements |
Delete (end) | O(1) | O(1) | Simple end pointer adjustment |
Delete (position) | O(n) | O(1) | Requires shifting elements |
Resize | O(n) | O(n) | Must copy elements to new array |
Optimization Tip: When possible, add and remove elements from the end of an array to avoid the O(n) operations required for insertions and deletions at arbitrary positions.
Accessing Elements O(1)
One of the key advantages of arrays is constant-time O(1) access to any element using its index.
Watch out: Accessing an index out of bounds will cause runtime errors in most languages. Always validate indices before access.
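As a concrete illustration, here is a small Java example of index-based access with a bounds check (the array contents are arbitrary):

```java
public class ArrayAccess {
    public static void main(String[] args) {
        int[] numbers = {10, 20, 30, 40, 50};

        // Direct index access: computed from the base address, so it is O(1)
        System.out.println(numbers[0]);                  // 10 (first element)
        System.out.println(numbers[numbers.length - 1]); // 50 (last element)

        // Validate the index before accessing to avoid ArrayIndexOutOfBoundsException
        int i = 7;
        if (i >= 0 && i < numbers.length) {
            System.out.println(numbers[i]);
        } else {
            System.out.println("Index " + i + " is out of bounds");
        }
    }
}
```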
Traversing Arrays O(n)
Traversing is the process of visiting each element in an array exactly once, typically to perform some operation on each element.
Common Traversal Patterns
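The original examples are not reproduced in this text, so below is a Java sketch of three common patterns: an index-based loop, an enhanced for loop, and reverse traversal.

```java
public class ArrayTraversal {
    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4, 5};

        // Index-based for loop: gives access to the index, usually the fastest option
        for (int i = 0; i < numbers.length; i++) {
            System.out.println("numbers[" + i + "] = " + numbers[i]);
        }

        // Enhanced for loop: cleaner when the index itself is not needed
        for (int value : numbers) {
            System.out.println(value);
        }

        // Reverse traversal: useful when later elements must be processed first
        for (int i = numbers.length - 1; i >= 0; i--) {
            System.out.println(numbers[i]);
        }
    }
}
```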
Performance Tip: For large arrays, a simple for loop with an index is often the most performant traversal method in most languages.
Inserting Elements
Insertion operations add new elements to an array, either at the end or at a specific position.
Insertion Operations
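As an illustration, here is a minimal Java sketch of insertion into a fixed-capacity array with manual shifting (the helper `insertAt` is ours, not a library method):

```java
import java.util.Arrays;

public class ArrayInsertion {
    // Inserts value at index, shifting later elements right; size is the
    // current element count, and the backing array must have spare capacity.
    static int insertAt(int[] arr, int size, int index, int value) {
        for (int i = size; i > index; i--) {
            arr[i] = arr[i - 1];   // shift elements right: O(n) worst case
        }
        arr[index] = value;
        return size + 1;
    }

    public static void main(String[] args) {
        int[] arr = new int[6];              // capacity 6
        int size = 0;
        size = insertAt(arr, size, 0, 10);   // insert into empty array
        size = insertAt(arr, size, 1, 30);   // insert at end: no shifting, O(1)
        size = insertAt(arr, size, 1, 20);   // insert in middle: shifts 30
        System.out.println(Arrays.toString(Arrays.copyOf(arr, size))); // [10, 20, 30]
    }
}
```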
Efficiency Tip: Inserting at the end of an array is generally O(1) on average, while inserting at the beginning or middle is O(n) due to element shifting.
Deleting Elements
Deletion operations remove elements from an array, either from the end, beginning, or a specific position.
Deletion Operations
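A matching Java sketch for deletion by shifting (the helper `deleteAt` is ours):

```java
import java.util.Arrays;

public class ArrayDeletion {
    // Removes the element at index, shifting later elements left; returns the new size.
    static int deleteAt(int[] arr, int size, int index) {
        for (int i = index; i < size - 1; i++) {
            arr[i] = arr[i + 1];   // shift elements left: O(n) worst case
        }
        return size - 1;           // deleting from the end is just this decrement: O(1)
    }

    public static void main(String[] args) {
        int[] arr = {10, 20, 30, 40};
        int size = arr.length;
        size = deleteAt(arr, size, 1);   // remove 20: shifts 30 and 40 left
        System.out.println(Arrays.toString(Arrays.copyOf(arr, size))); // [10, 30, 40]
    }
}
```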
Performance Note: Deleting elements from the beginning or middle of an array requires shifting subsequent elements, resulting in O(n) time complexity.
Searching in Arrays
Searching operations locate specific elements within an array, with efficiency depending on whether the array is sorted.
Linear Search O(n)
Examines each element sequentially until the target is found or the end is reached. Works on both sorted and unsorted arrays.
Linear Search Time: O(n) | Space: O(1)
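One way to write linear search in Java:

```java
public class LinearSearch {
    // Returns the index of target, or -1 if absent; works on unsorted arrays
    static int linearSearch(int[] arr, int target) {
        for (int i = 0; i < arr.length; i++) {
            if (arr[i] == target) {
                return i;          // found: stop at the first match
            }
        }
        return -1;                 // examined every element without a match
    }

    public static void main(String[] args) {
        int[] arr = {7, 3, 9, 1, 5};
        System.out.println(linearSearch(arr, 9));  // 2
        System.out.println(linearSearch(arr, 4));  // -1
    }
}
```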
Binary Search O(log n)
A divide-and-conquer algorithm that works only on sorted arrays. It repeatedly divides the search interval in half, achieving logarithmic time complexity.
Binary Search Time: O(log n) | Space: O(1)
When to use: For sorted arrays, prefer binary search (O(log n)) over linear search (O(n)). For small arrays (roughly n < 20) or unsorted arrays, linear search is often the more practical choice.
Binary Search Algorithm O(log n)
Binary Search is an efficient algorithm for finding a target value within a sorted array. It works by repeatedly dividing the search interval in half.
How Binary Search Works
The algorithm compares the target value to the middle element of the array:
- If the target matches the middle element, the search is successful
- If the target is less than the middle element, the search continues in the lower half
- If the target is greater than the middle element, the search continues in the upper half
This process continues until the target is found or the subarray is empty.
Implementation
Binary Search Time: O(log n) | Space: O(1)
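A standard iterative implementation in Java (note the overflow-safe middle calculation and the `left <= right` loop condition, both discussed under Common Mistakes below):

```java
public class BinarySearch {
    // Iterative binary search on a sorted array; returns the index of target or -1
    static int binarySearch(int[] arr, int target) {
        int left = 0;
        int right = arr.length - 1;
        while (left <= right) {                    // note: <=, not <
            int mid = left + (right - left) / 2;   // avoids overflow of (left + right)
            if (arr[mid] == target) {
                return mid;                        // target found
            } else if (arr[mid] < target) {
                left = mid + 1;                    // search the upper half
            } else {
                right = mid - 1;                   // search the lower half
            }
        }
        return -1;                                 // subarray became empty
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9, 11};
        System.out.println(binarySearch(sorted, 7));   // 3
        System.out.println(binarySearch(sorted, 4));   // -1
    }
}
```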
When to use Binary Search:
- The collection must be sorted
- Random access is available (like in arrays)
- The collection is large enough that linear search would be inefficient
- The data doesn't change frequently (or can be re-sorted efficiently)
Time Complexity Analysis
- Best Case: O(1) - Target is at the middle position
- Average Case: O(log n) - Dividing search interval in half each time
- Worst Case: O(log n) - Target is at the extreme ends or not present
Common Mistakes:
- Calculating the middle index as `(left + right) / 2`, which can cause integer overflow in some languages; prefer `left + (right - left) / 2`
- Using the wrong loop condition (it should be `left <= right`)
- Using binary search on an unsorted array
Two Pointer Technique O(n)
A technique that uses two pointers to solve array problems efficiently, typically reducing time complexity from O(n²) to O(n).
Opposite Direction Pointers
Pointers start from opposite ends of the array and move toward each other.
Generic Template
Two Pointers (Opposite Directions) Time: O(n) | Space: O(1)
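As a representative instance of this template, here is a Java sketch of Two Sum on a sorted array (one of the common problems listed below); the name `twoSumSorted` is illustrative:

```java
import java.util.Arrays;

public class TwoPointersOpposite {
    // Finds indices of two numbers in a sorted array that add up to target,
    // or returns null if no such pair exists.
    static int[] twoSumSorted(int[] arr, int target) {
        int left = 0;
        int right = arr.length - 1;
        while (left < right) {
            int sum = arr[left] + arr[right];
            if (sum == target) {
                return new int[] {left, right};
            } else if (sum < target) {
                left++;        // sum too small: move the left pointer right
            } else {
                right--;       // sum too large: move the right pointer left
            }
        }
        return null;           // pointers met without finding a pair
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 4, 7, 11, 15};
        System.out.println(Arrays.toString(twoSumSorted(sorted, 15))); // [2, 4] (values 4 and 11)
    }
}
```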
Common Problems
- Two Sum: Find a pair of numbers that add up to a target (in a sorted array)
- Container With Most Water: Find two lines that together with x-axis form a container that holds the most water
- Palindrome Verification: Check if a string is a palindrome
- Reverse Array: Reverse an array in-place
- 3Sum/4Sum: Finding triplets/quadruplets that satisfy certain conditions
How to Identify: Consider using the two pointer technique when:
- You need to find a pair of elements in a sorted array that satisfy some condition
- You need to compare elements from both ends of an array
- The problem involves checking for palindromes or symmetry
- You're searching for a pair with a specific sum, product, or difference
- You need to find the smallest/largest value satisfying a condition
Same Direction Pointers
Both pointers move in the same direction but at different paces or with different conditions.
Generic Template
Two Pointers (Same Direction) Time: O(n) | Space: O(1)
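A representative Java sketch of this template, using Remove Duplicates from a sorted array (listed among the common problems below):

```java
import java.util.Arrays;

public class TwoPointersSameDirection {
    // Removes duplicates from a sorted array in-place; returns the new length.
    // slow marks the end of the deduplicated prefix, fast scans ahead.
    static int removeDuplicates(int[] arr) {
        if (arr.length == 0) return 0;
        int slow = 0;
        for (int fast = 1; fast < arr.length; fast++) {
            if (arr[fast] != arr[slow]) {
                slow++;
                arr[slow] = arr[fast];   // copy the next distinct value forward
            }
        }
        return slow + 1;
    }

    public static void main(String[] args) {
        int[] arr = {1, 1, 2, 2, 2, 3};
        int n = removeDuplicates(arr);
        System.out.println(Arrays.toString(Arrays.copyOf(arr, n))); // [1, 2, 3]
    }
}
```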
Common Problems
- Remove Duplicates: Remove duplicates from a sorted array
- Move Zeroes: Move all zeroes to the end while maintaining the order of other elements
- Remove Element: Remove all instances of a given value from an array
- Merge Sorted Arrays: Merge two sorted arrays into one
How to Identify: Consider using same-direction two pointers when:
- You need to process elements sequentially while keeping track of a specific condition
- You're dealing with a sorted array where you need to remove or identify specific values
- You need to maintain two separate parts of an array (e.g., processed and unprocessed sections)
- A fast pointer needs to "scout ahead" while a slow pointer tracks a position of interest
- You need to detect cycles in linked lists (fast and slow pointers)
Sliding Window Technique O(n)
An efficient algorithmic technique used to process contiguous subarrays or sublists, typically converting O(n²) approaches to O(n).
Fixed Size Window
Generic Template
Fixed Size Sliding Window Time: O(n) | Space: O(1)
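A representative Java sketch, using the Maximum Sum Subarray of Size K problem listed below:

```java
public class FixedSlidingWindow {
    // Returns the maximum sum over all contiguous subarrays of size k.
    static int maxSumOfSizeK(int[] arr, int k) {
        int windowSum = 0;
        for (int i = 0; i < k; i++) {
            windowSum += arr[i];              // sum of the first window
        }
        int maxSum = windowSum;
        for (int i = k; i < arr.length; i++) {
            windowSum += arr[i] - arr[i - k]; // slide: add new element, drop oldest
            maxSum = Math.max(maxSum, windowSum);
        }
        return maxSum;
    }

    public static void main(String[] args) {
        int[] arr = {2, 1, 5, 1, 3, 2};
        System.out.println(maxSumOfSizeK(arr, 3)); // 9 (subarray [5, 1, 3])
    }
}
```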
Common Problems
- Maximum/Minimum Sum Subarray of Size K: Find the subarray of size K with the maximum/minimum sum
- Count Occurrences of Anagrams: Count occurrences of all anagrams of a pattern in a string
- Maximum of All Subarrays of Size K: Find maximum element in each subarray of size K
- First Negative Number in Every Window of Size K: Find the first negative integer in each sliding window
How to Identify: Consider using fixed-size sliding window when:
- You need to process subarrays/substrings of a specific size k
- You're asked to find maximum/minimum/average of all subarrays of size k
- You need to calculate something for every contiguous segment of fixed size
- The problem involves finding patterns of exact length k
Variable Size Window
Generic Template
Variable Size Sliding Window Time: O(n) | Space: O(1) or O(k)
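A representative Java sketch, using the Minimum Size Subarray Sum problem listed below:

```java
public class VariableSlidingWindow {
    // Returns the length of the shortest subarray with sum >= target, or 0 if none.
    static int minSubarrayLen(int target, int[] arr) {
        int left = 0, windowSum = 0, best = Integer.MAX_VALUE;
        for (int right = 0; right < arr.length; right++) {
            windowSum += arr[right];                 // expand the window
            while (windowSum >= target) {            // condition met: try to shrink
                best = Math.min(best, right - left + 1);
                windowSum -= arr[left];
                left++;
            }
        }
        return best == Integer.MAX_VALUE ? 0 : best;
    }

    public static void main(String[] args) {
        int[] arr = {2, 3, 1, 2, 4, 3};
        System.out.println(minSubarrayLen(7, arr)); // 2 (subarray [4, 3])
    }
}
```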
Common Problems
- Minimum Size Subarray Sum: Find the minimum length subarray with a sum greater than or equal to target
- Longest Substring Without Repeating Characters: Find the length of the longest substring without repeating characters
- Longest Substring with K Distinct Characters: Find the longest substring with at most k distinct characters
- Fruits into Baskets: Collect the maximum number of fruits from a row of trees while holding at most 2 types of fruit
How to Identify: Consider using variable-size sliding window when:
- You need to find a subarray that satisfies a specific condition (e.g., sum >= target)
- You need to find the longest/shortest subarray that meets a specific criterion
- You're working with problems involving substring constraints (like at most k distinct characters)
- The problem has a "minimizing" or "maximizing" aspect to continuous segments
- The problem involves some constraint that can be satisfied by expanding or contracting a window
Key Insight: In variable size windows, expand the window by adding elements until you satisfy a condition, then contract it by removing elements until the condition is no longer satisfied.
Prefix Sum Technique O(1) queries after O(n) preprocessing
A preprocessing technique that allows fast calculation of the sum of elements in a given range. It precomputes cumulative sums to enable O(1) range-sum queries.
How It Works
The prefix sum array `prefix[i]` contains the sum of all elements from `arr[0]` to `arr[i]`. This allows us to compute the sum of any range `[i, j]` as `prefix[j] - prefix[i-1]` in O(1) time.
Generic Template
Prefix Sum Precomputation: O(n) | Query: O(1)
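A minimal Java sketch. Note that the formula above needs a special case at `i = 0`; the variant below avoids it by padding the prefix array with a leading zero, so `prefix[i]` holds the sum of the first `i` elements:

```java
public class PrefixSum {
    final int[] prefix;   // prefix[i] = sum of arr[0..i-1]; prefix[0] = 0

    PrefixSum(int[] arr) {
        prefix = new int[arr.length + 1];
        for (int i = 0; i < arr.length; i++) {
            prefix[i + 1] = prefix[i] + arr[i];   // O(n) precomputation
        }
    }

    // Sum of arr[i..j] inclusive, answered in O(1)
    int rangeSum(int i, int j) {
        return prefix[j + 1] - prefix[i];
    }

    public static void main(String[] args) {
        PrefixSum ps = new PrefixSum(new int[] {3, 1, 4, 1, 5, 9});
        System.out.println(ps.rangeSum(1, 3)); // 1 + 4 + 1 = 6
        System.out.println(ps.rangeSum(0, 5)); // 23
    }
}
```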
Common Problems
- Range Sum Queries: Find the sum of all elements in a given range
- Subarray Sum Equals K: Find number of subarrays with sum equal to K
- Maximum Subarray Sum: Find the subarray with the largest sum
- Number of Subarrays with Sum Divisible by K: Count subarrays with sum divisible by K
- Product of Array Except Self: Calculate product of all elements except the current one
How to Identify: Consider using prefix sums when:
- You need to calculate multiple range sums efficiently
- You're repeatedly querying for sums over different ranges of an array
- You need to find all subarrays with a specific sum
- The problem involves cumulative properties (sum, product, XOR, etc.) over ranges
- You see phrases like "find the number of subarrays that..." or "find the total of all subarrays that..."
Performance Tip: For multiple range sum queries, it's almost always more efficient to precompute the prefix sum once (O(n)) and then answer each query in O(1) time rather than computing each range sum directly (O(n) per query).
Kadane's Algorithm O(n)
A dynamic programming approach to find the maximum sum subarray in an array containing both positive and negative numbers, with O(n) time complexity.
How It Works
Kadane's algorithm scans the array from left to right, keeping track of the maximum sum subarray ending at each position. The key insight is that if the current running sum becomes negative, it's better to start a new subarray from the current position.
Generic Template
Kadane's Algorithm Time: O(n) | Space: O(1)
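A minimal Java sketch of the algorithm:

```java
public class Kadane {
    // Returns the maximum sum over all non-empty contiguous subarrays.
    static int maxSubarraySum(int[] arr) {
        int currentSum = arr[0];   // best sum of a subarray ending at this position
        int maxSum = arr[0];       // best sum seen anywhere so far
        for (int i = 1; i < arr.length; i++) {
            // Either extend the current subarray or start fresh at arr[i]
            currentSum = Math.max(arr[i], currentSum + arr[i]);
            maxSum = Math.max(maxSum, currentSum);
        }
        return maxSum;
    }

    public static void main(String[] args) {
        int[] arr = {-2, 1, -3, 4, -1, 2, 1, -5, 4};
        System.out.println(maxSubarraySum(arr)); // 6 (subarray [4, -1, 2, 1])
    }
}
```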
Variations and Extensions
- Circular Maximum Subarray Sum: Find maximum subarray sum in a circular array
- Maximum Product Subarray: Find the contiguous subarray that has the largest product
- Maximum Subarray Sum After One Deletion: Find maximum subarray sum after removing exactly one element
- 2D/Matrix Kadane's Algorithm: Extension to find maximum sum submatrix in a 2D array
How to Identify: Consider using Kadane's algorithm when:
- You need to find the maximum (or minimum) sum subarray
- You're dealing with problems asking for "maximum contiguous..." or "largest sum of contiguous..."
- The problem involves finding optimal subarray/substring with additivity property
- You need to find the best segment of data according to some criteria
- The problem can be solved by determining whether to extend the current segment or start a new one
Boyer-Moore Voting Algorithm O(n)
An efficient algorithm to find the majority element in an array (an element appearing more than n/2 times) with O(n) time complexity and O(1) space complexity.
How It Works
The algorithm uses a voting mechanism where we maintain a candidate and a count. We increment the count when we see the same element and decrement when we see a different element. If the count becomes zero, we choose a new candidate. The intuition is that the majority element will survive the "elimination" process.
Generic Template
Boyer-Moore Voting Algorithm Time: O(n) | Space: O(1)
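A minimal Java sketch, assuming a majority element is guaranteed to exist (otherwise a second verification pass is needed):

```java
public class BoyerMooreVoting {
    // Returns the majority element, assuming one exists (appears > n/2 times).
    static int majorityElement(int[] arr) {
        int candidate = arr[0];
        int count = 0;
        for (int value : arr) {
            if (count == 0) {
                candidate = value;   // previous candidate eliminated: pick a new one
            }
            count += (value == candidate) ? 1 : -1;
        }
        // If a majority is not guaranteed, verify the candidate with a second pass here.
        return candidate;
    }

    public static void main(String[] args) {
        int[] arr = {2, 2, 1, 1, 1, 2, 2};
        System.out.println(majorityElement(arr)); // 2
    }
}
```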
Extensions
The algorithm can be extended to find multiple majority elements:
- Finding elements that appear more than n/3 times: Use two counters and two candidates (Moore's extended voting algorithm)
- Finding elements that appear more than n/k times: Use k-1 counters and candidates
How to Identify: Consider using Boyer-Moore Voting algorithm when:
- You need to find the majority element (element appearing more than n/2 times) in an array
- The problem asks for O(1) space complexity for finding dominant elements
- You need to identify elements with frequency above certain thresholds (n/2, n/3, etc.)
- The problem involves finding elements that "dominate" in some sense
- You see terms like "majority element," "dominant element," or "element appearing more than n/x times"
Building an ArrayList from Scratch
An ArrayList is a resizable array implementation that combines the random access efficiency of arrays with dynamic sizing capability.
Core Design Principles
- Backing Array: Uses a standard array to store elements
- Capacity Management: Tracks both size (number of elements) and capacity (total available slots)
- Automatic Resizing: When array becomes full, creates a new larger array (typically double the size)
- Amortized Efficiency: Most operations run in amortized O(1) time
Key Operations and Complexity
Operation | Description | Time Complexity |
---|---|---|
add(element) | Append element to end | Amortized O(1) |
add(index, element) | Insert at specific position | O(n) worst case |
get(index) | Access element by index | O(1) |
set(index, element) | Update element at position | O(1) |
remove(index) | Delete element by position | O(n) worst case |
size() | Get number of elements | O(1) |
Design Insight: The key to ArrayList's efficiency is amortized analysis. While resizing is an O(n) operation, it happens infrequently enough that the amortized cost per operation remains constant.
ArrayList Implementation
Let's implement a basic ArrayList from scratch to understand its internal workings.
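Since the full listing is not reproduced in this text, here is a minimal generic Java sketch of such a class (the name `MyArrayList` is ours). It illustrates the doubling strategy, shifting via `System.arraycopy`, boundary checks, and null-ing removed slots, as described below:

```java
import java.util.Arrays;

public class MyArrayList<E> {
    private Object[] elements;
    private int size;                        // number of elements actually stored

    public MyArrayList() {
        elements = new Object[10];           // default initial capacity
    }

    // Doubles the backing array when full: O(n), but amortized over many appends
    private void ensureCapacity(int minCapacity) {
        if (minCapacity > elements.length) {
            elements = Arrays.copyOf(elements, Math.max(elements.length * 2, minCapacity));
        }
    }

    private void checkIndex(int index, int upperBound) {
        if (index < 0 || index >= upperBound) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
    }

    public void add(E element) {             // amortized O(1)
        ensureCapacity(size + 1);
        elements[size++] = element;
    }

    public void add(int index, E element) {  // O(n): shifts later elements right
        checkIndex(index, size + 1);
        ensureCapacity(size + 1);
        System.arraycopy(elements, index, elements, index + 1, size - index);
        elements[index] = element;
        size++;
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {                // O(1): direct index access
        checkIndex(index, size);
        return (E) elements[index];
    }

    @SuppressWarnings("unchecked")
    public E set(int index, E element) {     // O(1)
        checkIndex(index, size);
        E old = (E) elements[index];
        elements[index] = element;
        return old;
    }

    @SuppressWarnings("unchecked")
    public E remove(int index) {             // O(n): shifts later elements left
        checkIndex(index, size);
        E removed = (E) elements[index];
        System.arraycopy(elements, index + 1, elements, index, size - index - 1);
        elements[--size] = null;             // clear the slot to aid garbage collection
        return removed;
    }

    public int size() {                      // O(1)
        return size;
    }
}
```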
Key Implementation Aspects
- Dynamic Resizing: The `ensureCapacity` method doubles the array size when needed
- Efficient Shifting: Uses `System.arraycopy` for fast bulk element movement when applicable
- Boundary Checks: Includes proper index validation to prevent out-of-bounds access
- Null Handling: Explicitly sets removed slots to null to aid garbage collection
Performance Insight: The most expensive operation is resizing, which has O(n) complexity. However, it happens infrequently (logarithmically often with respect to the number of appends), leading to amortized O(1) performance for appends.
Implementation Note: The key to efficient ArrayList operations is balancing between excessive resizing (wasting memory) and too little resizing (wasting time copying elements). Most implementations double the size when capacity is reached, which provides a good balance.