TopCoder真题讲解之二

前几天写了一篇关于TopCoder上的习题的问题+代码+简评的文章,现在开始写该系列的第二篇文章.
注意,这个系列会有三篇文章,是以前我做过的一次完整的SRM的套题,有250分题,500分题和1000分题,本篇文章将介绍500分题.之后我将不定期抽时间去参加最新的SRM,然后把题目和解答奉上:)

题目来源:SRM250 DIV2
级别:500Points
题目:
Problem Statement:
Problem Statement
    
For computers it can be hard to determine in which language a given text is written. A simple way to try to determine the language is the following: for the given text and for some sample texts, for which we know the languages, we determine the letter frequencies and compare these.
The frequency of a letter is the total number of occurrences of that letter divided by the total number of letters in the text. To determine this, we ignore case and non-letter characters.
Once the letter frequencies of the text and of a language are known, we can calculate the difference between the two. This difference we define by the sum of the squared differences of the frequencies:
 
The lesser this value, the closer text resembles that language. Compare text with each element of languages and return the (0-based) index of the language that has the smallest difference with text. In case of a tie, return the smallest index.
Definition
    
Class:
LanguageRecognition
Method:
whichLanguage
Parameters:
vector <string>, string
Returns:
int
Method signature:
int whichLanguage(vector <string> languages, string text)
(be sure your method is public)
    

Constraints
-
languages contains between 1 and 50 elements, inclusive.
-
Each element of languages has length between 1 and 50, inclusive.
-
text has length between 1 and 50, inclusive.
-
Each element of languages and text consists only of characters with ASCII value between 32 and 127, inclusive.
-
Each element of languages and text contains at least one letter ('A'-'Z' and 'a'-'z').
Examples
0)

    
{"This is an English sentence.",
 "Dieser ist ein Deutscher Satz.",
 "C'est une phrase Francaise.",
 "Dit is een Nederlandse zin."
}
"In welke taal is deze zin geschreven?"
Returns: 3
The differences are 0.0385, 0.0377, 0.0430 and 0.0276, so the sentence is written in language 3, Dutch. Note that Dutch is somewhat similar to German, somewhat less similar to English and not similar to French.
1)

    
{"aaaaa","bbbb","ccc","dd","e"}
"xxx"
Returns: 0
In case of a tie, return the language with the smallest index.
2)

    
{"AABB","AaBb","A? B!","ab!@#$%"}
"ab"
Returns: 0
Ignore case and the non-letter characters.
/*
分析:在TopCoder或者ACM做题,最重要的是要看懂题目:(,这时,程序员才能真正的感受到英文的重要,因为不光是要看懂,更重要的是因为比赛是限时的,还需要看的快.本题的大致意思是要做个语言识别,当然实际上需要你做的可远没这个题目名字吓人.具体描述是,给你几个句子作为参选句子,然后给你一个句子作为需要识别的句子.你要选出这个句子是属于上述句子中哪种类型的.判断的依据就是根据字母出现的频率来判断,把各个字母出现频率和减去上面每个句子的频率和,值最小的,就是最接近的,则把该句子索引选出.这道题其实不难,关键在于理清思路,把需要实现的功能划分成子函数来实现,最后就可以较轻松的完成.我的解题代码如下:
*/

#include <iostream>
#include 
<string>
#include 
<stdio.h>
#include 
<vector>
#include 
<set>
#include 
<map>
#include 
<algorithm>

using namespace std;

typedef vector
<float> VEC_FLT;
class LanguageRecognition
{
public:
    
int lanNum;
    VEC_FLT difs;
public:
    
int whichLanguage(vector <string> languages, string text)
    
{
        
int mostone = 0;
        
float smallest = 100;
        
float tmp;
        VEC_FLT textFren;
        VEC_FLT curFren;
        textFren 
= GetVecOfString(text);
        
        
for(int i=0;i<languages.size();i++)
        
{
            curFren 
= GetVecOfString(languages[i]);
            tmp 
= GetDifferences(curFren,textFren);
            cout
<<tmp<<endl;
            
if(tmp<smallest) //如果更小,则换
            {
                smallest 
= tmp;
                mostone 
= i;
            }

        }

        
return mostone;
        
    }


    
float GetDifferences(VEC_FLT f1,VEC_FLT f2)
    
{
        
float returnValue = 0;

        
for(int i=0;i<26;i++)
        
{
            returnValue
+= (f1[i]-f2[i])*(f1[i]-f2[i]);
        }

        
return returnValue;
    }

    
    VEC_FLT GetVecOfString(
string str)
    
{
        VEC_FLT returnFlt(
26,0);
        vector
<int> letters(26,0);
        
int totalletters = 0;
        
for(int i=0;i<str.size();i++//首先统计有效字母的总个数和分别数量
        {
            
if(str[i]>=97&&str[i]<=122//为非大写的字母才统计
            {
                letters[str[i]
-97]++;
                totalletters
++;
            }

            
else if(str[i]>=65&&str[i]<=90//为非小写的字母才统计
            {
                letters[str[i]
-65]++;
                totalletters
++;
            }

        }


        
        
if(totalletters!=0)
        
{
            
for(int i=0;i<26;i++//生成频率
            {
                returnFlt[i] 
= letters[i]/(float)totalletters;
            }

        }

        
return returnFlt;
    }

}
;

呵呵,是不是也不难呢?所以说,高分的题也不一定难,关键还是要理清思路,才能快速的解决.
原文地址:https://www.cnblogs.com/wuxilin/p/362933.html