Using a regular expression to strip HTML tags in C# (reposted)

// Requires: using System.Text.RegularExpressions;
private string StripHTML(string htmlString)
{
    // This pattern matches every HTML tag, from '<' through the next '>'.
    // (.|\n) -> any character, or a newline (which '.' alone does not match)
    // *?     -> zero or more occurrences, matched non-greedily, meaning
    //           that the match stops at the first available '>' it sees,
    //           and not at the last one.
    // (If it stopped at the last one, a single match would swallow the text
    // and all the other tags between the first '<' and the final '>'.)
    // Thanks to Oisin and Hugh Brown for helping on this one...
    string pattern = @"<(.|\n)*?>";
    return Regex.Replace(htmlString, pattern, string.Empty);
}
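
To see why the non-greedy `*?` matters, here is a minimal sketch that runs the same pattern with and without the `?`. The class name, method names, and sample string are made up for illustration; only `Regex.Replace` and the pattern itself come from the snippet above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original snippet.
public static class LazyVersusGreedyDemo
{
    // Non-greedy: each match ends at the first '>' after a '<'.
    public static string StripLazy(string html)
    {
        return Regex.Replace(html, @"<(.|\n)*?>", string.Empty);
    }

    // Greedy: one match runs from the first '<' to the last '>'.
    public static string StripGreedy(string html)
    {
        return Regex.Replace(html, @"<(.|\n)*>", string.Empty);
    }

    public static void Main()
    {
        string sample = "<p>Hello <b>world</b></p>"; // made-up input

        Console.WriteLine(StripLazy(sample));   // prints "Hello world"
        Console.WriteLine(StripGreedy(sample)); // prints an empty line
    }
}
```

With the greedy form the whole sample is one match, so the text between the tags is deleted along with them.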

Or with just one line of code:

string stripped = Regex.Replace(textBox1.Text, @"<(.|\n)*?>", string.Empty);
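
The `(.|\n)` part exists because `.` does not match a newline by default in .NET. The sketch below applies the one-liner to a tag that spans two lines and compares it with a plain `<.*?>`. It also shows `RegexOptions.Singleline`, which should behave the same as `(.|\n)` for this input. The class, the method names, and the sample markup (including `page.html`) are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original snippet.
public static class MultiLineTagDemo
{
    // The pattern from the article: '.' or an explicit newline.
    public static string Strip(string html)
    {
        return Regex.Replace(html, @"<(.|\n)*?>", string.Empty);
    }

    // Plain '.' with default options: cannot cross a '\n' inside a tag.
    public static string StripDotOnly(string html)
    {
        return Regex.Replace(html, @"<.*?>", string.Empty);
    }

    // Plain '.' with Singleline, which lets '.' match '\n' as well.
    public static string StripSingleline(string html)
    {
        return Regex.Replace(html, @"<.*?>", string.Empty, RegexOptions.Singleline);
    }

    public static void Main()
    {
        // Made-up input: the opening tag is split across two lines.
        string html = "<a\nhref=\"page.html\">link</a> text";

        Console.WriteLine(Strip(html));           // prints "link text"
        Console.WriteLine(StripSingleline(html)); // prints "link text"
        Console.WriteLine(StripDotOnly(html));    // the two-line <a ...> tag is left in place
    }
}
```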