Site Search:

find the only repeating element from an array


Problem:

Put integer 1 to 1,000,000 into an array of size 1,000,001. There is only one repeating element, the other elements only appear once. How to find that repeating element.

Analyze:
Approach 1:
We can create a HashMap, then iterate the array and fill the HashMap. The HashMap's key is the integer 1 to 1,000,000, the HashMap's value is how many times they appear in the array. Finally, the integer with value 2 in the HashMap is the repeating element. For example, giving array {4, 2, 1, 3, 5, 1}, you got the following HashMap:
4, 1
2, 1
1, 2
3, 1
5, 1
The number 1 appears 2 times, thus has value 2 in the HashMap, therefore is the repeating element.

This approach need to visit each element, the time complexity is O(1), we need a HashMap to store all the elements, the space complexity is O(N) as well.

Approach 2:
A better approach is to utilize the math. Let's add all the elements in the array together, the sum is Sum1. Then we add all the integer 1, 2, 3, 4, 5...,1,000,000 together, the sum is Sum0. The repeating element is Sum1 - Sum0. For example, your array is {4, 2, 1, 3, 5, 1}. The Sum1 = 4 + 2 + 1 + 3 + 5 + 1 = 16, the Sum0 = 1 + 2 + 3 + 4 + 5 = (1 + 5) * 5 / 2 = 15, the repeating element is Sum1 - Sum0 = 1.

This approach need to visit each element once, the time complexity is O(N). We only have to store Sum0 and Sum1, the space complexity is O(1). 

When the array size is N, the Sum0 is (1 + N) * N / 2  which is dominated by N square. If N is large, N square can easily overflow the Integer.MAX_VALUE or Long.MAX_VALUE. We need a better mathematical tool than this.

Approach 3:
Xor operator returns true if one — and only one — of the operands evaluates to true. Returns false if both operands evaluate to true or if both operands evaluate to false.

1 ^ 0 = 1
0 ^ 1 = 1
1 ^ 1 = 0
0 ^ 0 = 0

So when an integer is Xor to itself, the result will be 0; when an integer is Xor to 0, the result is the integer itself.

We can use this property to solve the problem: iterate the array and the numbers from 1 to 1,000,000, apply the Xor operator to each of them, the final result is the repeating element. For example, giving array {4, 2, 1, 3, 5, 1}, we can apply Xor operator (4^2^1^3^5^1)^1^2^3^4^5 = 1, 1 is the repeating element.

The time complexity is O(N) and the space complexity is O(1).
Besides, 1,000,000 = 0b11110100001001000000, the Xor operator at worst can produce a number 0b11111111111111111111 = 1,048,575, which won't overflow the Integer.MAX_VALUE.


Xor operator is a handy tool to cherry picking special numbers in an array.
The above problem is almost identical to the following problem.

Problem:
Put integer 1 to 999,999 into an array of size 1,000,000. There is only one missing element, the other elements only appear once. How to find that missing element.


A harder one is the following:

Problem:

There are N + 2 integer in an array. The N integers appeared even number of times, the other 2 integers are different, and they each appeared once. Find the 2 integers that appeared once. For example, we have array [2, 3, 5, 5, 4, 4], find 2 and 3.

analyze:
Once we apply xor to the array elements, the integers that appeared even number of times canceled out. We finally get 2 ^ 3 = 0b01 = 1.
2        10
3        11
1        01
Question is how to tell which is which from the result of a ^ b = c?
Notice c ^ b = a and c ^ a = b.
If we can manage to divide the array into 2 groups, group0 contains a, group1 contains b. Then we can have c xor elements in group1, those integers that appeared even number of times canceled out, we reduce the xor to c ^ b, which is a.

How to divide the array into 2 groups?
For example, the result of a ^ b = 2 ^ 3 = 0b01, means a and b must be different at bit 0, otherwise their xor result can not have bit 0 be 1. By the value at bit 0, we can divide the array into 2 groups.

group0 have bit 0's value 0: 2, 4, 4            10, 100, 100
group1 have bit 0's value 1: 3, 5, 5             11, 101, 101

c ^ group1 = (a ^ b) ^ (b ^ 5 ^ 5) = a

finally:
c ^ a = b

we found both a and b.

The above problem is almost identical to the following problem:

Problem:

There are N + 2 integer in an array. The N integers appeared even number of times, the other 2 integers are different, and they each appeared odd number of times. Find the 2 integers that appeared odd number of times. For example, we have array [2, 3, 2, 5, 2, 5, 4, 4], find 2 and 3.


A slightly different problem is the following one.

Problem:

There are  N + 3 integers in an array. The N integers appeared even number of times, the other 3 integers are different, and they each appeared only once. Find the 3 integers. For example, we have array [2, 3, 6, 5, 5, 4, 4], find 2, 3, 6

analyze:
Those 3 integers a, b, c are different, so we can manage to group them with a certain bit's. The integer a goes to group1 because the bit value is 1, integers b and c go to group0 because the bit value is 0.

2       010
3       011
6       110

For example, if we group the 3 integers with bit 0's value, integer a (3) goes to group1, while integer b and c (2 and 6) go to group0. The group1 has only one integer a we are looking for, even though we mix it with some integers that appear even number of times, we can easily get that integer with xor operator. (a ^ 5 ^ 5) = a

Actually, we group the 3 integers with bit 0's value, we can group the whole array with bit 0's value as well. Group0 will have even number of elements, while group1 will have odd number of elements.

group0 have bit 0's value 0: 2, 4, 4, 6            10, 100, 100, 110
group1 have bit 0's value 1: 3, 5, 5                11, 101, 101

We can apply xor to group1, the result will be a: (a ^ 5 ^ 5) = (3 ^ 5 ^ 5) = 3 =a

In order to find the value of b, c from group0, we can apply the technique used in the previous problem.

So we are able to find all values of a, b and c.

Last but not least, how do we know a b and c differ at bit 0?

b        2       010
a        3       011
c        6       110
d        7       111
we can first xor them, d = a ^ b ^ c = 2 ^ 3 ^ 6 = 0b111=7. The fact that d's certain bit is 1 can mean two things.

  1. case one, one integer has that bit's value 1, the other 2 integers have that bit's value 0,  0 ^ 1 ^ 0 = 1.
  2. case two, all 3 integers have that bit's value 1, 1 ^ 1 ^ 1 = 1 as well.
We can differentiate the 2 cased by an extra test.

For case 1, we choose to use bit 0's value to group the array, a (3) goes to group1, while b and c (2 and 6) go to group0. We got groups: 
group0 have bit 0's value 0: 2, 4, 4, 6            10, 100, 100, 110
group1 have bit 0's value 1: 3, 5, 5                11, 101, 101

apply xor to group1 we found a, apply xor to group0, we found b^c, which won't be 0.

For case 2, if we choose to use bit 1's value to group the array, a, b and c (2, 3 and 6) all go to group1. We got groups: 
group0 have bit 1's value 0: 4, 4, 5, 5        100, 100, 101, 101
group1 have bit 1's value 1: 2, 3, 6            010, 011, 110

apply xor to group1 gave us wrong answer, however apply xor to group0 gave us 0 as well, we can detect this case with xor(group0) == 0, then throw away the wrong answer.

The above problem is almost identical to the following problems:

Problem:

There are  N + 3 integers in an array. The N integers appeared even number of times, the other 3 integers are different, and they each appeared odd number of times. Find the 3 integers. For example, we have array [2, 3, 6, 3, 5, 3, 5, 4, 4], find 2, 3, 6

Problem:

There are  a few million integers in an array. All of the integers repeated 2 times except 3 integers. These 3 different integers only appeared once. Design an algorithm with time complexity O(N) and space complexity O(1) to find the 3 integers that only appeared once. For example, we have array [1, 1, 2, 3, 5, 5, 6, 4, 4, 7, 7...3000000, 3000000], find 2, 3, 6.


Using mathematics, your solution will appear to be very clever.

Problem:


Rotate an array N position. For example, given array [1, 2, 3, 4, 5, 6, 7], rotate it 3 position, it becomes [4, 5, 6, 7, 1, 2, 3]

approach 1:
A straightforward solution will be copy the last N numbers into a new array, then copy the original array's element N positions left, then copy the N numbers back into the original array. For example, copy 1, 2, 3 into a new array, then copy the elements in the original array leftwards, so it becomes [4, 5, 6, 7, 5, 6, 7], finally copy the 3 numbers from the new array back to the original array, therefore get [4, 5, 6, 7, 1, 2, 3].

This solution need an extra array, it also need to copy each element once plus copy the N numbers back.

approach 2:
a better solution will be to divide the array elements into 2 groups:
(1, 2, 3) and (4, 5, 6, 7)
then we rotate each group, in order to get:
(3, 2, 1) and (7, 6, 5, 4)
finally, we combine the 2 groups into one group
(3, 2, 1, 7, 6, 5, 4), then rotate it, the final result is:
(4, 5, 6, 7, 1, 2, 3)
which met our goal.

This solution have time complexity O(N) and space complexity O(1).

You have to be very clever in order to notice this trick. If you are not that clever, you'd better spent a few school years studying mathematics. The linear algebra tells us, for vector X and Y, we can define rotate operator I, applying I twice, we got the original element, applying I to two elements, we got their rotate.




The solution's cleverness actually came from applying a text book conclusion.