<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to feature-requests</title><link>https://sourceforge.net/p/uriparser/feature-requests/</link><description>Recent changes to feature-requests</description><atom:link href="https://sourceforge.net/p/uriparser/feature-requests/feed.rss" rel="self"/><language>en</language><lastBuildDate>Sun, 04 Oct 2015 21:34:28 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/uriparser/feature-requests/feed.rss" rel="self" type="application/rss+xml"/><item><title>#7 a way to get complete path</title><link>https://sourceforge.net/p/uriparser/feature-requests/7/?limit=25#8904</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Ticket moved from /p/uriparser/bugs/26/&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Sebastian Pipping</dc:creator><pubDate>Sun, 04 Oct 2015 21:34:28 -0000</pubDate><guid>https://sourceforge.net7bc577fe823874ea5069322ac9f404616d17362b</guid></item><item><title>#6 URL segments and UTF8 support for REST API</title><link>https://sourceforge.net/p/uriparser/feature-requests/6/?limit=25#679a</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hi!&lt;/p&gt;
&lt;p&gt;For UTF-8 maybe check feature request #5 (http://sourceforge.net/p/uriparser/feature-requests/5/).&lt;/p&gt;
&lt;p&gt;About a helper for accessing a path segment by index: I have limited time, and it is too low a priority for me right now, to be honest.&lt;/p&gt;
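&lt;p&gt;For illustration, here is a minimal sketch of such a helper. It assumes the path is available as a plain zero-terminated string such as "/api/mpd/playlists/..."; path_segment_at is a hypothetical name and not part of uriparser. With uriparser itself, the same idea would mean following the linked list that starts at UriUriA.pathHead (each UriPathSegmentA carries a text range and a next pointer) index times.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```c
/* Hypothetical helper (not part of uriparser): locates the zero-based
   segment "index" of a path such as "/api/mpd/playlists/x" and reports it
   as a half-open range [*first, *afterLast), like uriparser's text ranges.
   Returns 1 on success, 0 if the path has too few segments. */
static int path_segment_at(const char *path, int index,
                           const char **first, const char **afterLast)
{
    const char *p = path;
    const char *end;

    if (*p == '/')
        p++;                          /* skip the leading slash */
    for (;;) {
        for (end = p; *end != '\0'; end++)
            if (*end == '/')
                break;                /* end now marks the end of this segment */
        if (index == 0) {
            *first = p;
            *afterLast = end;
            return 1;
        }
        if (*end == '\0')
            return 0;                 /* no more segments */
        index--;
        p = end + 1;                  /* step over the separating slash */
    }
}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The returned range points into the original string, so a segment comes back exactly as it appears in the URI, percent-encoded if it was encoded there.&lt;/p&gt;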
&lt;p&gt;About the simple example of a test: I'm not sure yet what exactly you're asking for. If there are open questions about usage, I can offer support via voice chat.&lt;/p&gt;
&lt;p&gt;Best, Sebastian&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Sebastian Pipping</dc:creator><pubDate>Sat, 07 Feb 2015 22:06:26 -0000</pubDate><guid>https://sourceforge.net71973db9e0380251115b3f2cc51c7b9647cb61a2</guid></item><item><title>URL segments and UTF8 support for REST API</title><link>https://sourceforge.net/p/uriparser/feature-requests/6/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;First of all - thank you for all your efforts for creating uriparser.&lt;br /&gt;
I am trying to use it in an application server based on Cesanta Mongoose. I want to create a meaningful REST API, and for this I need to handle URLs of the following type:&lt;br /&gt;
&lt;a href="http://server/api/mpd/playlists" rel="nofollow"&gt;http://server/api/mpd/playlists/&lt;/a&gt;интернетрадио&lt;br /&gt;
There are two things that would make it easier to implement the REST API:&lt;br /&gt;
Having a facility to access a path segment by index (just a convenience)&lt;br /&gt;
Having a way to process the UTF-8 parts of the URL without converting to and from multibyte strings, if possible.&lt;br /&gt;
Even a simple example of a test in the test suite would be enough.&lt;br /&gt;
Thanks&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mircho</dc:creator><pubDate>Sat, 07 Feb 2015 18:54:44 -0000</pubDate><guid>https://sourceforge.netf13d4b7a3eb126e10f9f0b0561e0d2a6c85c64d9</guid></item><item><title>#5 Add support for UTF8</title><link>https://sourceforge.net/p/uriparser/feature-requests/5/?limit=25#e3d1</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;If you are speaking of the length of the verbatim content in field UriQueryListStructA.value, that's stored implicitly (as mentioned before).&lt;/p&gt;
&lt;p&gt;If you are speaking of the length of the content in field UriQueryListStructA.value &lt;em&gt;after&lt;/em&gt; decoding UTF-8, that would require uriparser to decode UTF-8 internally and to know the encoding of the data, etc.&lt;/p&gt;
&lt;p&gt;Also, please note that adding fields to structures breaks ABI compatibility with prior releases, so that's something library authors need to think twice about.&lt;/p&gt;
&lt;p&gt;If your aim is to know the space requirements up front, a (safe) heuristic might help:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A character in UTF-8 may take 1 to 4 bytes&lt;/li&gt;
&lt;li&gt;A character in UTF-16 may take 2 to 4 bytes (see &lt;a href="https://en.wikipedia.org/wiki/UTF-16#Examples" rel="nofollow"&gt;https://en.wikipedia.org/wiki/UTF-16#Examples&lt;/a&gt; for four-byte examples)&lt;/li&gt;
&lt;li&gt;If the input consisted only of four-byte UTF-8 characters, the UTF-16 output would take the same number of bytes (x1), since each such character becomes a four-byte surrogate pair; that is "strlen(...) / 2 + 1" wchar_t elements at worst, counting the terminator.&lt;/li&gt;
&lt;li&gt;If the input consisted only of single-byte UTF-8 characters, the UTF-16 output would take x2 the space in bytes (x4 if deliberately overestimated), or "strlen(...) * 2 + 1" wchar_t elements at worst.&lt;/li&gt;
&lt;li&gt;So "strlen(...) * 2 + 1" wchar_t elements is a safe worst-case buffer size for a later conversion to UTF-16.&lt;/li&gt;
&lt;/ul&gt;
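&lt;p&gt;The heuristic can be checked against an exact count. Below is a minimal sketch, assuming well-formed UTF-8 input (nothing is validated); utf16_units_needed and utf16_worst_case_elements are hypothetical names. The exact count treats a four-byte UTF-8 sequence as a surrogate pair, i.e. two UTF-16 code units, and every shorter sequence as one.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```c
/* Sketch, assuming well-formed UTF-8: counts the UTF-16 code units needed
   for a zero-terminated UTF-8 string (terminator excluded). Lead bytes are
   classified by value range. */
static unsigned long utf16_units_needed(const char *utf8)
{
    const unsigned char *p = (const unsigned char *)utf8;
    unsigned long units = 0;

    while (*p != 0) {
        if (*p >= 0xF0) {
            units += 2;   /* 4-byte sequence: surrogate pair in UTF-16 */
            p += 4;
        } else if (*p >= 0xE0) {
            units += 1;   /* 3-byte sequence */
            p += 3;
        } else if (*p >= 0xC0) {
            units += 1;   /* 2-byte sequence */
            p += 2;
        } else {
            units += 1;   /* single byte (ASCII) */
            p += 1;
        }
    }
    return units;
}

/* The worst-case estimate from the list above: strlen(...) * 2 + 1. */
static unsigned long utf16_worst_case_elements(const char *utf8)
{
    unsigned long len = 0;

    while (utf8[len] != '\0')
        len++;
    return len * 2 + 1;
}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because no UTF-8 byte ever yields more than one UTF-16 code unit, "strlen(...) + 1" elements would already be enough; the doubled figure simply leaves extra margin.&lt;/p&gt;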
&lt;p&gt;Is that what you are looking for?&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Sebastian Pipping</dc:creator><pubDate>Fri, 23 Jan 2015 20:18:30 -0000</pubDate><guid>https://sourceforge.net39856c78e75674ab55f6b03ed62a8f2c440e7968</guid></item><item><title>#5 Add support for UTF8</title><link>https://sourceforge.net/p/uriparser/feature-requests/5/?limit=25#6bd4</link><description>&lt;div class="markdown_content"&gt;&lt;blockquote&gt;
&lt;p&gt;The only null byte in a UTF-8 string possible is an actual null character.&lt;br /&gt;
Please check the table at &lt;a href="https://en.wikipedia.org/wiki/UTF-8#Description" rel="nofollow"&gt;https://en.wikipedia.org/wiki/UTF-8#Description&lt;/a&gt; .&lt;br /&gt;
So UTF-8 can contain null bytes to the very same degree as ASCII.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OK. But to exclude any &lt;strong&gt;potential&lt;/strong&gt; error while converting or operating on the value, it would be very useful to have the size of the buffer that I will convert to UTF-16 or treat as a UTF-8 string. It's simple to add, isn't it? You parse the query anyway, so adding the size is not very difficult, but it would be very useful.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Sebastian Pipping</dc:creator><pubDate>Fri, 23 Jan 2015 19:44:08 -0000</pubDate><guid>https://sourceforge.net59276a95f7af5858096276b1958c6ea72c036b14</guid></item><item><title>#5 Add support for UTF8</title><link>https://sourceforge.net/p/uriparser/feature-requests/5/?limit=25#3002</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello again,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;UTF-8 String can contain zeroes! That's why the field "size" is needed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The only null byte in a UTF-8 string possible is an actual null character.&lt;br /&gt;
Please check the table at &lt;a href="https://en.wikipedia.org/wiki/UTF-8#Description" rel="nofollow"&gt;https://en.wikipedia.org/wiki/UTF-8#Description&lt;/a&gt; .&lt;br /&gt;
So UTF-8 can contain null bytes to the very same degree as ASCII.&lt;/p&gt;
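&lt;p&gt;The claim can be checked by brute force. The following sketch (function names are hypothetical) encodes every Unicode scalar value as UTF-8 and reports whether any value other than U+0000 produced a zero byte. It works because lead bytes of multi-byte sequences are 0xC0 or above and continuation bytes lie between 0x80 and 0xBF, so only U+0000 encodes to 0x00.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```c
/* Encodes the code point cp as UTF-8 into out (room for 4 bytes) and
   returns the number of bytes written. Assumes cp is a Unicode scalar
   value: at most 0x10FFFF and not a surrogate. */
static int utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp > 0xFFFF) {                                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 + cp / 262144);
        out[1] = (unsigned char)(0x80 + cp / 4096 % 64);
        out[2] = (unsigned char)(0x80 + cp / 64 % 64);
        out[3] = (unsigned char)(0x80 + cp % 64);
        return 4;
    }
    if (cp > 0x7FF) {                                  /* 3 bytes */
        out[0] = (unsigned char)(0xE0 + cp / 4096);
        out[1] = (unsigned char)(0x80 + cp / 64 % 64);
        out[2] = (unsigned char)(0x80 + cp % 64);
        return 3;
    }
    if (cp > 0x7F) {                                   /* 2 bytes */
        out[0] = (unsigned char)(0xC0 + cp / 64);
        out[1] = (unsigned char)(0x80 + cp % 64);
        return 2;
    }
    out[0] = (unsigned char)cp;                        /* ASCII, 1 byte */
    return 1;
}

/* Returns 1 if any scalar value other than U+0000 encodes to a sequence
   containing a zero byte, otherwise 0. Surrogates are skipped. */
static int utf8_has_zero_byte_besides_nul(void)
{
    unsigned char buf[4];
    unsigned long cp;
    int i, n;

    for (cp = 1; cp != 0x110000; cp++) {
        if (cp == 0xD800)
            cp = 0xE000;              /* jump over the surrogate range */
        n = utf8_encode(cp, buf);
        for (i = 0; i != n; i++)
            if (buf[i] == 0)
                return 1;
    }
    return 0;
}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, utf8_encode(0x440, buf) writes the two bytes 0xD1 0x80, which is the %D1%80 at the start of the query in this thread (Cyrillic "р").&lt;/p&gt;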
&lt;p&gt;More importantly, the string in field "value" is not a &lt;em&gt;full&lt;/em&gt; UTF-8 string; it uses only the single-byte characters shared with ASCII. UTF-8 is what you have after the conversion, in another buffer.&lt;/p&gt;
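&lt;p&gt;To illustrate that conversion step, here is a minimal percent-decoding sketch. It is not uriparser's implementation (uriparser provides uriUnescapeInPlaceA for this), and the function names are hypothetical. It turns the ASCII-only escaped form into raw bytes, so "%D1%80" becomes the two bytes 0xD1 0x80, the UTF-8 encoding of Cyrillic "р". It does not translate "+" into a space.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```c
/* Returns the value of the hexadecimal digit c, or -1 if c is not one. */
static int hex_value(char c)
{
    static const char digits[] = "0123456789abcdef0123456789ABCDEF";
    int i;

    for (i = 0; i != 32; i++)
        if (digits[i] == c)
            return i % 16;    /* indices 26..31 hold A..F, i.e. 10..15 */
    return -1;
}

/* Decodes %XX escapes in place and returns the decoded length in bytes.
   Malformed escapes are copied unchanged. The length is returned because
   a decoded %00 would put a zero byte inside the result. */
static unsigned long percent_decode_in_place(char *s)
{
    char *src = s;
    char *dst = s;

    while (*src != '\0') {
        if (*src == '%') {
            int hi = hex_value(src[1]);
            int lo = (hi == -1) ? -1 : hex_value(src[2]);
            if (lo != -1) {
                *dst++ = (char)(hi * 16 + lo);
                src += 3;
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
    return (unsigned long)(dst - s);
}
```
&lt;/code&gt;&lt;/pre&gt;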
&lt;blockquote&gt;
&lt;p&gt;So there is currently no UTF-8 support, at least on Windows.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's the same for Linux.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;And I have to convert the UTF-8 bytes to UTF-16 manually, as I did.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;One way or another, an additional call to a converter function is needed. uriparser could ship with a UTF-8 to UTF-16 function, but I do not consider that to be uriparser's job. Other libraries do that, and you can easily use them together with uriparser at more or less the same level of convenience.&lt;/p&gt;
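&lt;p&gt;For a sense of what such a converter call does, here is a minimal UTF-8 to UTF-16 sketch. It assumes well-formed UTF-8 (no validation), stores UTF-16 code units in unsigned short, and does not write a terminator; utf8_to_utf16 is a hypothetical name. In practice one would use an existing converter, such as MultiByteToWideChar on Windows as used elsewhere in this thread, or iconv on POSIX systems.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```c
/* Converts zero-terminated, well-formed UTF-8 into UTF-16 code units.
   Returns the number of units written (no terminator is added), or -1
   if outCapacity units are not enough. Input validation is omitted. */
static long utf8_to_utf16(const char *in, unsigned short *out, long outCapacity)
{
    const unsigned char *p = (const unsigned char *)in;
    long n = 0;

    while (*p != 0) {
        unsigned long cp;
        int extra;                      /* continuation bytes that follow */

        if (*p >= 0xF0)      { cp = *p - 0xF0u; extra = 3; }
        else if (*p >= 0xE0) { cp = *p - 0xE0u; extra = 2; }
        else if (*p >= 0xC0) { cp = *p - 0xC0u; extra = 1; }
        else                 { cp = *p;         extra = 0; }
        p++;
        while (extra-- != 0)
            cp = cp * 64 + (*p++ - 0x80u);  /* append 6 payload bits */

        if (cp > 0xFFFF) {              /* needs a surrogate pair */
            if (n + 2 > outCapacity)
                return -1;
            cp -= 0x10000;
            out[n++] = (unsigned short)(0xD800 + cp / 1024);  /* high */
            out[n++] = (unsigned short)(0xDC00 + cp % 1024);  /* low */
        } else {
            if (n + 1 > outCapacity)
                return -1;
            out[n++] = (unsigned short)cp;
        }
    }
    return n;
}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, the bytes 0xD1 0x80 (decoded from %D1%80) yield the single code unit 0x0440, Cyrillic "р".&lt;/p&gt;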
&lt;p&gt;I'm happy to have a quick Skype/Mumble/phone/Jitsi call about it some time, if you feel that could help. In that case, please contact me off-list about a time and the medium of choice.&lt;/p&gt;
&lt;p&gt;Best, Sebastian&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Sebastian Pipping</dc:creator><pubDate>Thu, 22 Jan 2015 13:47:25 -0000</pubDate><guid>https://sourceforge.net800e007d81f99a3d08bc442aeeb23fef780372df</guid></item><item><title>#5 Add support for UTF8</title><link>https://sourceforge.net/p/uriparser/feature-requests/5/?limit=25#e2d2</link><description>&lt;div class="markdown_content"&gt;&lt;blockquote&gt;
&lt;p&gt;Your options are...&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So there is currently no UTF-8 support, at least on Windows, and I have to convert to UTF-8 manually, as I did.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;About adding length to UriQueryList: member "value" is a zero terminated string so it is &amp;gt;carrying its length around implicitly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;"member "value" is a zero terminated string"&lt;/em&gt; will not work for UTF-8 String.&lt;br /&gt;
UTF-8 String can contain zeroes! That's why the field "size" is needed.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">rasjv</dc:creator><pubDate>Thu, 22 Jan 2015 08:27:51 -0000</pubDate><guid>https://sourceforge.net2670c74a3a734992b375dba79bd8845fdbfb083d</guid></item><item><title>#5 Add support for UTF8</title><link>https://sourceforge.net/p/uriparser/feature-requests/5/?limit=25#302a</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;My understanding is that&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;you have a string in a wchar_t array&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;with a percent-encoded UTF-8 string.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Your options are:&lt;/p&gt;
&lt;p&gt;a) Convert the string into a char array by picking every second byte (this assumes a two-byte, little-endian wchar_t, as on Windows). If the URI is valid, that's a lossless operation. You then parse the URI using uriParseUriA, run uriDissectQueryMallocExA to dissect the query, and run uriUnescapeInPlaceExA on the query parts. That should give valid UTF-8 if the input was valid UTF-8 initially.&lt;/p&gt;
&lt;p&gt;b) Keep the string in the wchar_t array, use uriParseUriW, then use uriDissectQueryMallocExW, copy the query parts into a char array by picking every second byte (again lossless), and run uriUnescapeInPlaceExA on those; again, that gives valid UTF-8 if the input was valid initially.&lt;/p&gt;
&lt;p&gt;About adding length to UriQueryList: member "value" is a zero terminated string so it is carrying its length around implicitly.&lt;/p&gt;
&lt;p&gt;Best, Sebastian&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Sebastian Pipping</dc:creator><pubDate>Tue, 20 Jan 2015 20:45:46 -0000</pubDate><guid>https://sourceforge.net56702ced3eeb81d69aac3085866615f429ca0d21</guid></item><item><title>#5 Add support for UTF8</title><link>https://sourceforge.net/p/uriparser/feature-requests/5/?limit=25#c443</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello, Sebastian.&lt;/p&gt;
&lt;p&gt;I use uriParseUriW. The whole code is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;UriParserStateW state={0};&lt;br /&gt;
  UriUriW uri={0};&lt;/p&gt;
&lt;p&gt;state.uri = &amp;amp;uri;&lt;br /&gt;
  if (uriParseUriW(&amp;amp;state, L"https://www.google.com/search?q=%D1%80%D0%B0%D0%B7%D0%B1%D0%BE%D1%80+URL+%D0%BD%D0%B0+%D0%BF%D0%B0%D1%80%D0%B0%D0%BC%D0%B5%D1%82%D1%80%D1%8B+C%2B%2B&amp;amp;ie=utf-8&amp;amp;oe=utf-8#q=parse+URL+C%2B%2B\x0") != URI_SUCCESS)&lt;br /&gt;
  {&lt;br /&gt;
     /* Failure: free and stop here */&lt;br /&gt;
     uriFreeUriMembersW(&amp;amp;uri);&lt;br /&gt;
     return;&lt;br /&gt;
  }&lt;br /&gt;
  //success&lt;br /&gt;
  //do something with uri&lt;/p&gt;
&lt;p&gt;UriQueryListW * queryList=0;&lt;br /&gt;
  int itemCount;&lt;br /&gt;
  if (uriDissectQueryMallocW(&amp;amp;queryList, &amp;amp;itemCount, uri.query.first,&lt;br /&gt;
     uri.query.afterLast) != URI_SUCCESS)&lt;br /&gt;
  {&lt;br /&gt;
     /* Failure: free and stop here */&lt;br /&gt;
     uriFreeUriMembersW(&amp;amp;uri);&lt;br /&gt;
     return;&lt;/p&gt;
&lt;p&gt;}&lt;br /&gt;
  //success&lt;br /&gt;
  //do something with queryList&lt;br /&gt;
  const wchar_t *query1;&lt;br /&gt;
  query1=queryList-&amp;gt;value;&lt;/p&gt;
&lt;p&gt;uriFreeQueryListW(queryList);&lt;br /&gt;
  uriFreeUriMembersW(&amp;amp;uri);&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;UTF-8 characters are not double-byte characters; they take from 1 to 4 bytes (up to 6 under the original, now obsolete, definition).&lt;br /&gt;
UTF-16 (wchar_t on Windows) characters are double-byte in most cases (in rare cases a UTF-16 character takes four bytes, as a surrogate pair, but that is off-topic).&lt;/p&gt;
&lt;p&gt;Let me show you screenshots to make this clearer.&lt;br /&gt;
I have a query in UTF-8, percent-encoded in the URL:&lt;br /&gt;
%D1%80%D0%B0%D0%B7%D0%B1%D0%BE%D1%80+URL+%D0%BD%D0%B0+%D0%BF%D0%B0%D1%80%D0%B0%D0%BC%D0%B5%D1%82%D1%80%D1%8B+C%2B%2B&lt;br /&gt;
This is a UTF-8 string, not UTF-16! It means: &lt;strong&gt;разбор URL на параметры C++&lt;/strong&gt; (Russian for "parsing a URL into parameters, C++").&lt;/p&gt;
&lt;p&gt;If you treat it as UTF-16, you get what is highlighted in red in the screenshot:&lt;br /&gt;
&lt;img alt="query1" src="http://s30.postimg.org/86xh8dnkh/query1.png" rel="nofollow" /&gt;&lt;/p&gt;
&lt;p&gt;So I need to manually copy the bytes of query1 into a UTF-8 string and convert it to UTF-16, with this code (for Windows):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;wchar_t query1_utf16[256];&lt;br /&gt;
  char query1_corrected[256];&lt;br /&gt;
  int len, rez, i;&lt;br /&gt;
  len = (int)wcslen(query1);&lt;br /&gt;
  if (len &amp;gt; 255) len = 255; /* stay inside both buffers */&lt;br /&gt;
  for (i = 0; i &amp;lt; len; i++)&lt;br /&gt;
     query1_corrected[i] = (char)query1[i]; /* each wchar_t holds one decoded byte */&lt;br /&gt;
  /* leave room for the terminator; rez is 0 on failure */&lt;br /&gt;
  rez = MultiByteToWideChar(CP_UTF8, 0, query1_corrected, len, query1_utf16, 255);&lt;br /&gt;
  query1_utf16[rez] = 0;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And we get exactly what we expect:&lt;br /&gt;
&lt;img alt="query1_corrected" src="http://s29.postimg.org/el095ukhj/query1_corrected.png" rel="nofollow" /&gt;&lt;/p&gt;
&lt;p&gt;If I use your char functions (the ones ending in A), it is the same, except that I get multibyte characters in "query1" and must manually convert this char string to UTF-16 to work with it on Windows. If, as you say, &lt;em&gt;"UTF-8 is supported already"&lt;/em&gt;, then the "query1" string must be treated as a UTF-8 string. And it must be converted to UTF-16 on Windows, because that is the default Unicode format of this OS.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It would also be very useful if you provided the size in bytes of such a UTF-8 string (it can be calculated during parsing), so that no additional calculations are needed. So I think a new field named size should be added to the UriQueryListA struct, next to the existing key, value and next:&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;int size;&lt;/p&gt;
&lt;/blockquote&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">rasjv</dc:creator><pubDate>Tue, 20 Jan 2015 07:28:24 -0000</pubDate><guid>https://sourceforge.netf8d71f5f567325f344d5a3cfffecf17e93e766b4</guid></item><item><title>#5 Add support for UTF8</title><link>https://sourceforge.net/p/uriparser/feature-requests/5/?limit=25#e288</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello rasjv,&lt;/p&gt;
&lt;p&gt;are you using uriParseUriA/char or uriParseUriW/wchar_t?  All that uriparser knows about encoding is single-byte or double-byte characters.  If I'm not mistaken, all characters I see in the URI above have the same single-byte encoding in both ASCII and UTF-8. In that sense, UTF-8 is supported already.  Please help me understand what you are asking for.&lt;/p&gt;
&lt;p&gt;Best, Sebastian&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Sebastian Pipping</dc:creator><pubDate>Mon, 19 Jan 2015 19:58:11 -0000</pubDate><guid>https://sourceforge.net865213f503d5f8e052e1d8692998f242919b338d</guid></item></channel></rss>