Notes on gSOAP bugs

1. Bug: using _write to check socket communication before sending socket data

version: gsoap_2.7.13

operating system: Windows XP

description:

I want to send the HTTP request multiple times over the same socket, so I wrote the following:

for ( size_t retrytimes = 0; retrytimes < 3; retrytimes ++ )
{
    // Send the HTTP request over the current connection.
    if ( soap_end_send( dom.soap ) )
    {
        // Sending failed: (re)create the socket session and retry.
        soap_connect( dom.soap, url, NULL );
    }
    else
    {
        // Sent successfully: stop retrying.
        break;
    }
}

The problem is: whether or not the connection is established, soap_end_send() always succeeds.

I found this is because of the way the code checks the socket connection:

In fsend() in stdsoap2.cpp:

#ifdef WIN32
nwritten = _write(soap->sendfd, s, (unsigned int)n);
#else
nwritten = write(soap->sendfd, s, (unsigned int)n);
#endif

_write() is just an ordinary OS-level I/O function, and it cannot tell whether the file handle is a socket handle. So if the handle is merely an ordinary open file, the call simply returns success.

I think send() would be more appropriate here.
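A minimal POSIX sketch of why send() is the stricter choice, assuming Linux-like semantics (the original environment is Windows XP, where Winsock's send() likewise fails, with WSAENOTSOCK, on a handle that is not a socket): write() accepts any file descriptor, while send() rejects a descriptor that is not a socket. try_write and try_send are hypothetical helpers, not gSOAP functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write n bytes with write(); returns 0 or the errno value.
 * write() accepts any file descriptor, so a pipe or ordinary file "succeeds" too.
 * Short writes are not retried in this sketch. */
static int try_write(int fd, const char *s, size_t n)
{
    assert(s != NULL);
    ssize_t r = write(fd, s, n);
    return r < 0 ? errno : 0;
}

/* Hypothetical helper: write n bytes with send(); returns 0 or the errno value.
 * send() fails with ENOTSOCK when fd is not a socket, exposing the problem.
 * Short writes are not retried in this sketch. */
static int try_send(int fd, const char *s, size_t n)
{
    assert(s != NULL);
    ssize_t r = send(fd, s, n, 0);
    return r < 0 ? errno : 0;
}
```

For example, on the write end of a pipe(), try_write() returns 0 while try_send() returns ENOTSOCK; on one end of a socketpair(), try_send() succeeds.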

2. Fault tolerance:

Consider markup such as the following, where <ul> is never closed:

<div><ul><span>hello!</span>
</div>

When the end tag does not match, my patch to soap_element_end_in() forces it to match:

soap_element_end_in(struct soap *soap, const char *tag)

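  if (tag && (soap->mode & SOAP_XML_STRICT))
  { soap_pop_namespace(soap);
    if (soap_match_tag(soap, soap->tag, tag))
    { DBGLOG(TEST, SOAP_MESSAGE(fdebug, "End element tag name does not match\n"));
      // On a mismatch, manually supply the expected end tag so that it matches
      soap->bufidx -= strlen(soap->tag);
      soap->bufidx = offset;
      soap->ahead = SOAP_TT;
      // Copy the expected name; strncpy() does not add '\0' when truncating, so terminate explicitly
      strncpy( soap->tag, tag, sizeof(soap->tag) - 1 );
      soap->tag[sizeof(soap->tag) - 1] = '\0';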
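As I understand the patch above, its intent is that a mismatched end tag implicitly closes the innermost open element and is left for an enclosing element to consume. Below is a simplified model of that idea over an explicit stack of open element names; gSOAP itself parses by recursive descent, so this is not its code, and tag_stack, push_tag and close_tag are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 32
#define MAX_TAG 64

/* Hypothetical stack of open element names (a simplified model, not gSOAP code). */
typedef struct {
    char names[MAX_DEPTH][MAX_TAG];
    int depth;
} tag_stack;

/* Record a start tag. Returns 0, or -1 if the stack is full. */
static int push_tag(tag_stack *st, const char *name)
{
    assert(st != NULL && name != NULL);
    if (st->depth >= MAX_DEPTH)
        return -1;
    strncpy(st->names[st->depth], name, MAX_TAG - 1);
    st->names[st->depth][MAX_TAG - 1] = '\0'; /* strncpy may not terminate */
    st->depth++;
    return 0;
}

/* Handle an end tag tolerantly. If it names the innermost open element, that
 * element is closed normally (returns 0). Otherwise every open element above the
 * nearest one with this name is closed implicitly, and that count is returned.
 * If no open element has this name, the stray end tag is ignored (returns -1). */
static int close_tag(tag_stack *st, const char *name)
{
    for (int i = st->depth - 1; i >= 0; i--) {
        if (strcmp(st->names[i], name) == 0) {
            int implicit = st->depth - 1 - i;
            st->depth = i; /* pop the match and everything opened after it */
            return implicit;
        }
    }
    return -1;
}
```

For <div><ul><span>hello!</span></div>, close_tag(&st, "span") returns 0 and close_tag(&st, "div") returns 1, because <ul> is closed implicitly.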

Garbled output when the first character of an element's text is a Chinese character.

Take the following UTF-8 encoded input:

<ul>
<li>公司：先原房产佳林店</li>
<li>地址：上海市浦东新区佳林路134号 </li>
</ul>

After parsing, the saved result is:

<ul>
<li>l 司：立好信房屋</li>
<li>0 址：常德路1041号-1043号</li>
</ul>

The characters “公” and “地” come out garbled. This is a flaw in the algorithm gSOAP uses to parse XML.

In the function soap_wchar soap_get(struct soap *soap), one character is read as follows:

If soap->ahead holds a character, return that character. Otherwise, take the next character from soap->buf and increment soap->bufidx.
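A minimal sketch of that one-slot lookahead, assuming a simplified reader struct in place of struct soap. The names reader, reader_get and reader_unget are hypothetical, and in this model an ahead value of 0 means "empty".

```c
#include <assert.h>
#include <stddef.h>

#define READER_EOF (-1)

/* Hypothetical, simplified stand-in for the reader state inside struct soap:
 * a byte buffer, a read index and a one-slot lookahead. */
typedef struct {
    const unsigned char *buf; /* input bytes */
    size_t buflen;            /* number of bytes in buf */
    size_t bufidx;            /* index of the next unread byte */
    unsigned int ahead;       /* pushed-back character, 0 = empty */
} reader;

/* Return the pushed-back character if there is one, else the next byte. */
static int reader_get(reader *r)
{
    assert(r != NULL);
    if (r->ahead != 0) {
        int c = (int)r->ahead;
        r->ahead = 0; /* consume the lookahead */
        return c;
    }
    if (r->bufidx >= r->buflen)
        return READER_EOF;
    return r->buf[r->bufidx++];
}

/* Push one character back. There is only one slot, so a second unget overwrites
 * the first, and a 0 character cannot be pushed back because 0 marks "empty". */
static void reader_unget(reader *r, unsigned int c)
{
    r->ahead = c;
}
```

The key point for the bug below is that whatever is stored in ahead is handed back as-is by the next read, rather than being re-read from the byte buffer.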

The value in soap->ahead comes from soap_peek_element(), whose logic is:

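After reading the tag <li>, it reads one UTF-8 character. If that character is not <, the parser decides that what follows is the data of <li> (not a child node), so the UTF-8 character it just read has to be pushed back.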

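Instead of pushing the character back with soap->bufidx--, the code stores it in soap->ahead, and what soap->ahead holds is the character's Unicode code point, not its UTF-8 bytes. The garbage above is consistent with only the low byte of that code point being written back out: 公 is U+516C, whose low byte 0x6C is 'l', and 地 is U+5730, whose low byte 0x30 is '0'. The relevant code in soap_peek_element():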

  if (c != SOAP_LT)
  { *soap->tag = '\0';
    if ((int)c == EOF)
      return soap->error = SOAP_EOF;
    soap_unget(soap, c);
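The suspected mechanism can be reproduced in isolation. This C sketch assumes, as the 'l' and '0' in the output suggest, that the pushed-back code point is later emitted as a single byte; it also shows that re-encoding the code point as UTF-8 restores the original bytes. utf8_decode, emit_truncated and emit_utf8 are hypothetical helpers, not gSOAP functions, and the decoder handles only well-formed sequences of up to three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode one well-formed UTF-8 sequence of 1 to 3 bytes (enough for the CJK
 * characters here); stores the byte count in *len and returns the code point. */
static unsigned int utf8_decode(const unsigned char *s, size_t *len)
{
    if (s[0] < 0x80) {
        *len = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *len = 2;
        return ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
    }
    *len = 3;
    return ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
}

/* The suspected bug: the pushed-back code point is written out as one byte,
 * so only its low 8 bits survive. */
static size_t emit_truncated(unsigned int cp, char *out)
{
    out[0] = (char)(cp & 0xFFu);
    return 1;
}

/* A possible fix: re-encode the code point as UTF-8 (BMP only) before reuse. */
static size_t emit_utf8(unsigned int cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}
```

Feeding it the UTF-8 bytes of 公 (E5 85 AC) gives 'l' from emit_truncated() and the original three bytes from emit_utf8().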

3. Level of abstraction

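The gSOAP DOM has two basic kinds of object, node and attribute, and each object has two properties, name and value. The rules are simple and easy to use. But suppose we have the following HTML content: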

<div>大海是<span>你</span>的故乡</div>

gSOAP then treats 大海是<span>你</span>的故乡 as a single value and does not parse <span> as a node.

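XMLSpy offers a higher level of abstraction than gSOAP. In XMLSpy, a node is an element of the abstract DOM tree, while a value represents the actual content, which may be a tag or a text value. XMLSpy parses the HTML above into a div node with three child nodes: the text 大海是, the <span> element containing 你, and the text 的故乡.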
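To make the difference concrete, here is a minimal mixed-content tree in C in which text runs and elements are both child nodes, so a div like the one above gets three children. The types and functions are hypothetical and model neither XMLSpy's nor gSOAP's real data structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 8

/* Hypothetical mixed-content tree: text runs and elements are both child nodes. */
typedef enum { NODE_ELEMENT, NODE_TEXT } node_kind;

typedef struct node {
    node_kind kind;
    const char *name;                    /* element name; NULL for text nodes */
    const char *text;                    /* text content; NULL for elements */
    struct node *children[MAX_CHILDREN];
    int nchildren;
} node;

static node *new_node(node_kind kind, const char *name, const char *text)
{
    node *n = calloc(1, sizeof *n); /* zeroes nchildren */
    if (n != NULL) {
        n->kind = kind;
        n->name = name;
        n->text = text;
    }
    return n;
}

static node *new_element(const char *name) { return new_node(NODE_ELEMENT, name, NULL); }
static node *new_text(const char *text) { return new_node(NODE_TEXT, NULL, text); }

/* Append child to an element. Returns 0, or -1 if parent is not an element,
 * child is NULL, or the child array is full. */
static int append_child(node *parent, node *child)
{
    if (parent == NULL || parent->kind != NODE_ELEMENT || child == NULL
        || parent->nchildren >= MAX_CHILDREN)
        return -1;
    parent->children[parent->nchildren++] = child;
    return 0;
}

/* Free a node and all of its descendants. */
static void free_tree(node *n)
{
    if (n == NULL)
        return;
    for (int i = 0; i < n->nchildren; i++)
        free_tree(n->children[i]);
    free(n);
}
```

In the two-level node/value model described above, the same content would instead be stored as one value string on the div, with no span node to navigate to.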