163 Mail Analysis

One captured send request:

POST /jy6/xhr/compose/compose.do?action=DELIVER&sid=ECAZYLPuiDZLwdoZkZuuCKcpMsNTceaZ HTTP/1.1

Accept: application/json, text/javascript, */*; q=0.01

Content-Type: application/json

Request body (the mail itself):

{"composeId":"c:1445749068279","priority":3,"saveSentCopy":0,"subject":"test......","html":true,"account":""..."<tips1057@163.com>","to":""tips1057@sohu.com"<tips1057@sohu.com>","cc":"","bcc":"","content":"<div style="color: rgb(0, 0, 0); font-family: arial; font-size: 14px;"><div>xxxxxxxxxx</div><div></div><div></div><div></div></div><!-- jy5ContentSuffix -->","attachments":[],"requestReadReceipt":false}

Another message:

POST /js6/s?sid=UCWtLGQFLvIEoQSctGFFcKpWiwuDvCkq&func=mbox:compose&cl_send=1&l=compose&action=deliver HTTP/1.1

Host: mail.163.com

Connection: keep-alive

Content-Length: 1249

Accept: text/javascript
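The query string of this request line carries the session id and the action being performed. As a minimal sketch (the variable names are illustrative, not part of the capture), Python's standard urllib.parse can split it into fields:

```python
from urllib.parse import urlsplit, parse_qs

# Request target copied from the captured request line above.
target = ("/js6/s?sid=UCWtLGQFLvIEoQSctGFFcKpWiwuDvCkq"
          "&func=mbox:compose&cl_send=1&l=compose&action=deliver")

parts = urlsplit(target)
# parse_qs returns a list per key; each key occurs once here, so keep the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)        # /js6/s
print(params["func"])    # mbox:compose
print(params["action"])  # deliver
```

In this capture, func=mbox:compose together with action=deliver is what marks the request as a mail send.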

Body:

var=%3C%3Fxml%20version%3D%221.0%22%3F%3E%3Cobject%3E%3Cstring%20name%3D%22id%22%3Ec%3A1447060124093%3C%2Fstring%3E%3Cobject%20name%3D%22attrs%22%3E%3Cstring%20name%3D%22account%22%3E%22%E6%9D%8E%22%26lt%3Btips1057%40163.com%26gt%3B%3C%2Fstring%3E%3Cboolean%20name%3D%22showOneRcpt%22%3Efalse%3C%2Fboolean%3E%3Carray%20name%3D%22to%22%3E%3Cstring%3E%22tips1057%40126.com%22%26lt%3Btips1057%40126.com%26gt%3B%3C%2Fstring%3E%3C%2Farray%3E%3Carray%20name%3D%22cc%22%2F%3E%3Carray%20name%3D%22bcc%22%2F%3E%3Cstring%20name%3D%22subject%22%3E%E4%B8%A4%E4%B8%AA%E9%99%84%E4%BB%B6%3C%2Fstring%3E%3Cboolean%20name%3D%22isHtml%22%3Etrue%3C%2Fboolean%3E%3Cstring%20name%3D%22content%22%3E%26lt%3Bdiv%20style%3D%22line-height%3A1.7%3Bcolor%3A%23000000%3Bfont-size%3A14px%3Bfont-family%3AArial%22%26gt%3B%E6%B5%8B%E8%AF%95%E4%B8%A4%E4%B8%AA%E9%99%84%E4%BB%B6%26lt%3B%2Fdiv%26gt%3B%3C%2Fstring%3E%3Cint%20name%3D%22priority%22%3E3%3C%2Fint%3E%3Cboolean%20name%3D%22saveSentCopy%22%3Etrue%3C%2Fboolean%3E%3Cstring%20name%3D%22charset%22%3EGBK%3C%2Fstring%3E%3C%2Fobject%3E%3Cboolean%20name%3D%22returnInfo%22%3Efalse%3C%2Fboolean%3E%3Cstring%20name%3D%22action%22%3Edeliver%3C%2Fstring%3E%3Cint%20name%3D%22saveTextThreshold%22%3E1048576%3C%2Fint%3E%3C%2Fobject%3E

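After URL decoding (this sample comes from a different capture of the same request type, so its id, subject and content differ from the encoded body above):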

<?xml version="1.0"?><object><string name="id">c:1447059862152</string><object name="attrs"><string name="account">"李"&lt;tips1057@163.com&gt;</string><boolean name="showOneRcpt">false</boolean><array name="to"><string>"tips1057@126.com"&lt;tips1057@126.com&gt;</string></array><array name="cc"/><array name="bcc"/><string name="subject">特使</string><boolean name="isHtml">true</boolean><string name="content">&lt;div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"&gt;特使的不好使用&lt;/div&gt;</string><int name="priority">3</int><boolean name="saveSentCopy">true</boolean><string name="charset">GBK</string></object><boolean name="returnInfo">false</boolean><string name="action">deliver</string><int name="saveTextThreshold">1048576</int></object>
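The URL-decoding step can be sketched with urllib.parse.unquote. To keep the example short, the input is only the var= prefix followed by the subject element cut out of the encoded body above; the full body decodes the same way.

```python
from urllib.parse import unquote

# "var=" plus the subject element taken from the captured, percent-encoded body.
body = ("var=%3Cstring%20name%3D%22subject%22%3E"
        "%E4%B8%A4%E4%B8%AA%E9%99%84%E4%BB%B6%3C%2Fstring%3E")

# Split the form field name from its value, then decode the value.
name, _, value = body.partition("=")
decoded = unquote(value)  # percent-escapes are interpreted as UTF-8 by default

print(name)     # var
print(decoded)  # <string name="subject">两个附件</string>
```

Note that the escapes in this capture decode cleanly as UTF-8, even though the payload's own charset field says GBK.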

HTML entities are the way HTML represents certain special characters (characters that are part of HTML syntax itself).

A few commonly used ones:

&nbsp;  non-breaking space

&amp;   &

&lt;       <

&gt;      >

&quot;   "

&apos;   '
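In Python, the standard html module already knows these entities, so no hand-written lookup table is needed. A minimal sketch, using the recipient string from the capture as sample input:

```python
import html

# Recipient field as it appears in the decoded XML, entities still escaped.
escaped = '"tips1057@126.com"&lt;tips1057@126.com&gt;'
unescaped = html.unescape(escaped)

print(unescaped)  # "tips1057@126.com"<tips1057@126.com>

# The entities listed above, decoded in one call.
symbols = html.unescape("&amp; &lt; &gt; &quot; &apos;")
print(symbols)    # & < > " '
```

html.unescape("&nbsp;") returns the non-breaking space character U+00A0, not an ordinary ASCII space.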

Replacing &lt; and &gt; with the characters they stand for gives:

<?xml version="1.0"?><object><string name="id">c:1447059862152</string><object name="attrs"><string name="account">"李"<tips1057@163.com></string><boolean name="showOneRcpt">false</boolean><array name="to"><string>"tips1057@126.com"<tips1057@126.com></string></array><array name="cc"/><array name="bcc"/><string name="subject">特使</string><boolean name="isHtml">true</boolean><string name="content"><div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">特使的不好使用</div></string><int name="priority">3</int><boolean name="saveSentCopy">true</boolean><string name="charset">GBK</string></object><boolean name="returnInfo">false</boolean><string name="action">deliver</string><int name="saveTextThreshold">1048576</int></object>
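One caveat: once &lt; and &gt; are replaced, the text is no longer well-formed XML, because something like <tips1057@163.com> now looks like a tag. To extract individual fields it is therefore easier to parse the form that still has the entities escaped and let the XML parser unescape the values. The sketch below assumes a trimmed copy of the decoded payload (some elements are left out for brevity) and uses xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the URL-decoded payload; entities are still escaped.
xml_text = (
    '<?xml version="1.0"?><object><string name="id">c:1447059862152</string>'
    '<object name="attrs">'
    '<string name="account">"李"&lt;tips1057@163.com&gt;</string>'
    '<array name="to"><string>"tips1057@126.com"&lt;tips1057@126.com&gt;</string></array>'
    '<string name="subject">特使</string>'
    '<string name="content">&lt;div style="line-height:1.7;color:#000000;'
    'font-size:14px;font-family:Arial"&gt;特使的不好使用&lt;/div&gt;</string>'
    '</object><string name="action">deliver</string></object>'
)

root = ET.fromstring(xml_text)
attrs = root.find("object[@name='attrs']")

# The parser unescapes &lt; and &gt; while reading the element text.
sender = attrs.find("string[@name='account']").text
recipients = [s.text for s in attrs.findall("array[@name='to']/string")]
subject = attrs.find("string[@name='subject']").text
content = attrs.find("string[@name='content']").text

print(sender)      # "李"<tips1057@163.com>
print(recipients)  # ['"tips1057@126.com"<tips1057@126.com>']
print(subject)     # 特使
```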

For content, what we actually care about is the plain-text body. So how do we convert the HTML document to plain text?

A regular expression can handle this:
replace every match of "<[^>]*>" with the empty string "" (see http://www.cnblogs.com/noblepaul/archive/2004/09/25/46532.aspx).

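A minimal Python sketch of that substitution follows. The function name html_to_text is made up for illustration, and the entity decoding after tag removal is an addition to the regex described above. It is a rough approach: it does not drop the text inside script or style elements, and a ">" inside an attribute value or comment would cut a match short.

```python
import html
import re

# Matches one tag (or comment) from "<" up to the next ">".
TAG_RE = re.compile(r"<[^>]*>")

def html_to_text(markup: str) -> str:
    """Strip tags with the regex, then decode any remaining HTML entities."""
    text = TAG_RE.sub("", markup)
    return html.unescape(text).strip()

# The content value from the decoded request, after unescaping &lt; and &gt;.
content = ('<div style="line-height:1.7;color:#000000;font-size:14px;'
           'font-family:Arial">特使的不好使用</div>')

print(html_to_text(content))  # 特使的不好使用
```

For mail bodies with more complex markup, a real HTML parser such as the standard library's html.parser is likely a more robust choice than a regex.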
Original post: https://www.cnblogs.com/tangxiaosheng/p/4951080.html