Thursday, August 15, 2013

Find First Non-Repeating Character in A String

Find First Unique Character in A String

Given a string, find the first non-repeating character in the string.

For example, given "geeksforgeeks", return 'f'.

Solution

A brute-force solution is to loop through the string and for each letter, check remaining letters to see whether it has duplicates. The average and worst-case running times are both O(n^2) which is bad.

Another solution is to use a HashMap to track the occurrence of each letter.
We iterate through the string twice: the first one is to compute the occurrences; and the second one is to find the first letter with occurrence = 1.
 public static Character findFirstUnique(String s) {  
   // map: char -- occurrence  
   HashMap<Character, Integer> map = new HashMap<Character, Integer>();  

   // loop 1: compute occurrence  
   for (int i=0; i<s.length(); ++i) {  
     char c = s.charAt(i);  
     if (map.containsKey(c)) {  
       map.put(c, map.get(c)+1);  
     } else {  
       map.put(c, 1);  
     }  
   }  

   // loop 2: find the first unique  
   for (int i=0; i<s.length(); ++i) {  
     char c = s.charAt(i);  
     if (map.get(c) == 1) return c;  
   }  

   return null;  
 }  
This algorithm runs in time O(2n)=O(n) and takes O(n) spaces.

Do we have to iterate through the entire string twice? For a string like "aaaaaaaaeeeeeeeeeeddddddddf", if we can avoid revisiting all of the duplicates, the running time will be improved (still the same order though).

So, ideally, besides the hashmap for occurrences, if we could have a data structure to store the known unique letters in order and remove ones that have at least one duplicates, then at the end of the first loop the first element is what we are looking for.

What we need for the data structure are:
  • O(1) time for operations like insert, remove, contains
  • elements are in insertion order
LinkedHashSet seems to be a good option here. LinkedHashSet can be thought as HashSet plus Doubly Linked List.
 import java.util.LinkedHashSet;  
   
 public Character findFirstUnique(String s) {  
   // map: char -- occurrence  
   HashMap<Character, Integer> map = new HashMap<Character, Integer>();  
   LinkedHashSet<Character> set = new LinkedHashSet<Character>(s.length());  

   // loop: compute occurrence  
   for (int i=0; i<s.length(); ++i) {  
     char c = s.charAt(i);  
     if (map.containsKey(c)) {  
       set.remove(c);  
     } else {  
       map.put(c, 1);  
       set.add(c);  
     }  
   }  

   // find the first unique in the ordered set  
   Iterator<Character> it = set.iterator();  
   if (it.hasNext()) return it.next();  
   return null;  
 }  
This algorithm runs in time O(n) and takes O(2n)=O(n) spaces. Note that it increases the complexity of implementation and maintaining two hash table also requires some extra time.

Is the O(n) time optimal?
Yes. Even to verify whether a given character is unique in the string, it requires to iterate through the entire string and compare each letter, which is O(n) time.

No comments:

Post a Comment