Repeated DNA Sequences

Repeated DNA Sequences

问题:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

思路:

  位操作得到hash

我的代码:

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> rst = new ArrayList<String>();
        if(s == null || s.length() < 10) return rst;
        int len = s.length();
        Set<Integer> set = new HashSet<Integer>();
        for(int i = 0; i <= len-10; i++)
        {
            String substr = s.substring(i,i+10);
            Integer key = getHash(substr);
            if(set.contains(key))
            {
                if(!rst.contains(substr))
                    rst.add(substr);
            }
            else
            {
                set.add(key);
            }
        }
        return rst;
    }
    public int getCode(char c)
    {
        switch(c)
        {
            case 'A':   return 0;
            case 'C':   return 1;
            case 'G':   return 2;
            default:    return 3;
        }
    }
    public Integer getHash(String s)
    {
        int hash = 0;
        for(int i = 0; i < s.length(); i++)
        {
            hash = hash << 2 | getCode(s.charAt(i));
        }
        return hash;
    }
}
View Code

学习之处:

  • 平常写代码的时候很少用位操作,所以每次看到位操作的问题,都会很生疏,需要加强这方面的内容
  • 常用的位操作
    • 用与操作可以得到一个数中的每一位,n&(0x00000001<<i)
    • 用或操作去构建一个新的数出来,逆向构建:rst |= (0x80000000 >>> i) 正向构建:hash = hash << 2 | getCode(s.charAt(i));
原文地址:https://www.cnblogs.com/sunshisonghit/p/4355337.html