Hi @ll,
almost 4 years ago, with Windows 10 1903, after more than a year
beta-testing in insider previews, Microsoft finally released UTF-8
support for the -A interfaces of the Windows API.
0) <https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page#activeCodePage>
| If the ANSI code page is configured for UTF-8, -A APIs typically
| operate in UTF-8. This model has the benefit of supporting
| existing code built with -A APIs without any code changes.
The last claim is but a bloody DANGEROUS lie!
As shown hereafter, it must read instead:
"This model has the malefit of causing buffer overruns in existing
code!"
1) For 30 years, the documentation of the -A interfaces for file and
directory management of the Win32 API states:
"The maximum path name length is 260 characters."
See <https://msdn.microsoft.com/en-us/library/aa363855.aspx>
"CreateDirectoryA function" for example:
| For the ANSI version of this function, there is a default string
| size limit for paths of 248 characters (MAX_PATH - enough room
| for a 8.3 filename). ...
...
| The 255 character limit per path segment still applies.
This constitutes a contractual GUARANTEE for the product behaviour!
2) The documentation for the file systems supported by Windows says
too "The maximum length of a file name segment is 255 characters."
See <https://msdn.microsoft.com/en-us/library/ee681827.aspx>
"File System Functionality Comparison"
3) With these 2 contractually GUARANTEED preconditions, the following
code is safe, i.e. not susceptible to a buffer overrun:
CreateDirectoryA() fails as soon as szPath exceeds the documented
limit which is less than the buffer size of 260 characters.
CHAR szANSI[] = "€"; // or one of the following other 122 characters
// from ANSI code page 1252:
// ‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂ
// ÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
CHAR szPath[MAX_PATH] = "";
do
strcat(szPath, szANSI);
while (CreateDirectoryA(szPath));
4) With UTF-8 support enabled, the same code now suffers from a
buffer overrun:
CHAR szUTF8[] = u"€"; // or "\xE2\x82\xAC"
CHAR szPath[MAX_PATH] = "";
do
strcat(szPath, szUTF8);
while (CreateDirectoryA(szPath), NULL);
STRIKE 1!
5) Given the 2 guarantees from 1) and 2), the following code is
also safe and not susceptible to a buffer overrun: see
<https://msdn.microsoft.com/en-us/library/aa365740.aspx>
"WIN32_FIND_DATAA structure" and "FindFirstFile function"
<https://msdn.microsoft.com/en-us/library/aa364418.aspx>
wfd.cFileName can NEVER receive a file/directory name longer
than 255 characters, so the concatenation of C: or .\ (as
well as C:\ and ..\ too) and wfd.cFileName NEVER overruns a
buffer of MAX_PATH!
#define PATTERN "C:*" // or "C:\\*" or ".\\*" or "..\\*"
WIN32_FIND_DATAA wfd;
CHAR szPath[MAX_PATH] = PATTERN;
HANDLE hFind = FindFirstFileA(szPath, &wfd);
if (hFind != INVALID_HANDLE_VALUE) {
do {
strcat(szPath + strlen(PATTERN) - 1, wfd.cFileName);
GetFileAttributesA(szPath); // do something with the
... // found file system objects
} while (FindNextFile(hFind, &wfd));
FindClose(hFind);
}
6) With UTF-8 support enabled and a file or directory named
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
(or other 86 characters from the above 123) present in the
CWD of drive C:, FindFirstFileA() (and FindNextFileA() too)
return a string of 86 * 3 = 258 characters in wfd.cFileName
which causes a buffer overrun in previously safe code!
STRIKE 2!
7) The following code enumerates ALL file system objects in a
(root) directory or network share:
WIN32_FIND_DATAA wfd;
HANDLE hFind = FindFirstFileA("\\\\host\\share\\*", &wfd);
if (hFind != INVALID_HANDLE_VALUE) {
do { // code to process wfd.cFileName omitted
} while (FindNextFile(hFind, &wfd));
FindClose(hFind);
}
8) With UTF-8 support enabled and a directory or file named
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
(i.e. 87 or more of the 123 characters from above) present,
both FindFirstFile() and FindNextFile() FAIL with the
previously impossible, NEVER encountered Win32 error code
234 alias ERROR_MORE_DATA: wfd.cFileName is to short for
UTF-8 encoded file/directory segment names!
STRIKE 3!
stay tuned, and far away from UTF-8 in Windows
Stefan Kanthak
PS: for the full story of Microsoft's EPIC failures with UTF-8
in Windows, see
<https://skanthak.homepage.t-online.de/quirks.html#quirk33>
<https://skanthak.homepage.t-online.de/quirks.html#quirk32>
<https://skanthak.homepage.t-online.de/quirks.html#quirk31>
Data
Build on a solid foundation with Vulners data
We provide the essential building blocks for cybersecurity solutions with comprehensive, structured, and constantly updated vulnerability and exploits data
Api
Power your application with Vulners API
The Vulners REST API offers reliable, high-performance access to vulnerability intelligence, with 99.9% SLA uptime and CDN-backed data delivery for seamless global access
App
Assess and manage vulnerabilities with Vulners tools
Built on top of Vulners' database and SDK, end-user solutions give security professionals and developers lightweight and powerful tools for vulnerability remediation