Writing cross-platform huge file I/O code is tricky because the 64-bit offset versions of fseek and ftell are not standard across compilers and platforms. This article documents the CMarkup experience with this issue and related discoveries made along the way, and it may be useful to anyone dealing with the same problem.
The limit of signed 32-bit integer offsets in ftell and fseek is 2^31-1 = 2147483647, which is around 2GB. For example, a common cross-platform way of getting file size is fseek with SEEK_END and then ftell, but ftell cannot return a number higher than the limit of its return type.
The 64-bit versions of these functions deal in offsets of billions of gigabytes (8 exabytes) which should be enough for single file sizes in the next decade or two ;).
There are no 64-bit offset versions of the fread and fwrite functions. This is because they do not deal explicitly with file offsets, though they do operate relative to the current file pointer. The underlying file pointer can generally handle the huge file, so in theory you can read multiple 2GB blocks using fread even if you don't have a 64-bit ftell function available to query the current offset.
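As a rough sketch of that idea, the loop below reads a stream to the end with fread and keeps its own 64-bit running total instead of asking ftell for the position. The name count_bytes is hypothetical, and whether the underlying stream really advances past 2GB depends on the C library in use.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads the stream to the end in fixed-size chunks
   and returns the number of bytes consumed. The total is tracked in a
   64-bit variable, so it does not depend on the range of ftell. */
unsigned long long count_bytes(FILE *fp)
{
    assert(fp != NULL);
    char buffer[65536];
    unsigned long long total = 0;
    size_t got;
    while ((got = fread(buffer, 1, sizeof buffer, fp)) > 0)
        total += got;
    return total;
}
```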
So even as huge file support was added to the file I/O functions in C/C++ libraries by the early 90s, there was no graceful way to promote the integer types of offset variables used in existing programs. The fread and fwrite functions were upgraded under the covers, while new versions of ftell and fseek were added because they dealt explicitly with the offset.
I've seen #ifdef WIN64 used in conjunction with _fseeki64. I am not sure why the authors did that, but don't be confused. A 64-bit operating system is not required for 64-bit file offsets; you can usually use 64-bit offsets in a 32-bit operating system.
The 64-bit offset versions of fseek and ftell are used to support huge files with the CMarkup release 11.0 file mode methods. The file mode methods give read and write access to files without loading the entire document into memory. File mode does not require 64-bit offsets, but 64-bit offsets are needed if you are dealing with files over 2GB.
Most UNIX-flavor compilers like gcc have standardized on ftello and fseeko, which use the off_t integer type whose width depends on compiler setup. The functions resolve to the old fseek/ftell or the huge fseeko64/ftello64 in concert with the _FILE_OFFSET_BITS and _LARGEFILE_SOURCE macro defines.
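On the UNIX side, a minimal sketch of that setup might look like this, assuming a POSIX C library such as glibc. The function name file_size_64 is hypothetical; the important detail is that the macros are defined before any system header is included.

```c
/* These must come before any system header so that off_t is 64 bits wide
   and fseeko/ftello are declared. */
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: returns the file size as a 64-bit value,
   or -1 on failure. */
long long file_size_64(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long long size = -1;
    if (fseeko(fp, (off_t)0, SEEK_END) == 0)
        size = (long long)ftello(fp);
    fclose(fp);
    return size;
}
```

The same defines can usually be passed on the compiler command line instead, which avoids having to order them before the includes in every source file.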
Unfortunately VC++ doesn't use this system.
Visual C++ provides its huge versions, _fseeki64 and _ftelli64, inconsistently. There is no clean way to always know whether the 64-bit versions are available and to automatically compile accordingly.
In CMarkup release 11.0 the 64-bit functions are used by default in Visual Studio 2005. This is not correct for target platforms where the 64-bit functions are deliberately excluded such as for Windows CE.
Since the 64-bit functions are covertly distributed with Visual Studio 6.0 (see below), in CMarkup release 11.1 I declared prototypes for the 64-bit functions by default in Visual Studio 6.0. This caused more problems because the functions are not available in some build configurations of VC++ 6.0.
In CMarkup release 11.2 with any version of Visual Studio you will need to define MARKUP_HUGEFILE in the project definitions if you want huge file access. This will eliminate these linker errors for first time users, while putting the burden on those who need huge file access to specify the define.
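The opt-in approach can be sketched roughly as below. This is not the actual Markup.h source: MARKUP_HUGEFILE, MCD_FSEEK and MCD_FTELL are the names used in this article, while file_offset_t and stream_size are hypothetical names introduced only for illustration, and the non-Microsoft branch assumes a POSIX library with fseeko/ftello.

```c
/* Hypothetical sketch, not the actual Markup.h source. */
#if defined(MARKUP_HUGEFILE) && !defined(_MSC_VER)
#define _FILE_OFFSET_BITS 64   /* must precede the system headers */
#define _LARGEFILE_SOURCE
#endif
#include <assert.h>
#include <stdio.h>

#if defined(MARKUP_HUGEFILE) && defined(_MSC_VER)
typedef __int64 file_offset_t;      /* hypothetical type name */
#define MCD_FSEEK _fseeki64
#define MCD_FTELL _ftelli64
#elif defined(MARKUP_HUGEFILE)
#include <sys/types.h>
typedef off_t file_offset_t;
#define MCD_FSEEK fseeko
#define MCD_FTELL ftello
#else
typedef long file_offset_t;         /* plain 32-bit-safe default */
#define MCD_FSEEK fseek
#define MCD_FTELL ftell
#endif

/* Hypothetical helper: returns the size of an open file through whichever
   pair was selected, or -1 on failure. Leaves the position at the end. */
file_offset_t stream_size(FILE *fp)
{
    assert(fp != NULL);
    if (MCD_FSEEK(fp, 0, SEEK_END) != 0)
        return -1;
    return MCD_FTELL(fp);
}
```

Without the define, the code never references the 64-bit functions, so a project that cannot link them still builds.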
One alternative would be to use the Win32 File APIs and the SetFilePointerEx function, available on Win2K+, but this would greatly increase the code differences between the different platforms.
CMarkup started using _fseeki64 and _ftelli64 in release 11, but they are optional. In the next release I will not use them unless you set a define. To turn them off, do something like this:
Old line 223:
#elif _MSC_VER >= 1000 // VC++
New line 223:
#elif _MSC_VER > 4000 // never
11.1 Issue: linking fseeki64 and ftelli64
Robin Hilliard 12-Jun-2009
I'm still on VC++ 6.0 (don't ask) and the functions _fseeki64 and _ftelli64 don't exist. To fix this, you just need to bump the compiler version directive as follows:
Old line 223:
#elif _MSC_VER >= 1000 // VC++
New line 223:
#elif _MSC_VER > 1200 // > VC++ 6.0?
In CMarkup release 11.1 I implemented a workaround to offer huge file (>2GB) access by default in Visual C++ 6.0 and Visual Studio .NET versions before VC++ 2005. Huge file access is only useful for the developer version file mode (see Open) methods.
However, I did not realize that it depended on a project setting for these functions to be available in the VC++ libraries you link with. If in your project settings you have the default Microsoft Foundation Classes setting "Use MFC in a Shared DLL" you will get the following problem when linking CMarkup release 11.1:
Linking...
Markup.obj : error LNK2001: unresolved external symbol __ftelli64
Markup.obj : error LNK2001: unresolved external symbol __fseeki64
If you are able to change your Microsoft Foundation Classes project setting to "Use MFC in a Static Library", that is one way to alleviate the linker error and keep huge file access. Otherwise, you can change the compiler version directive as shown above, but then you cannot use file mode with files over 2GB.
Microsoft Visual Studio\VC98\CRT\SRC\FSEEKI64.C
Microsoft Visual Studio\VC98\CRT\SRC\FTELLI64.C
Microsoft Visual Studio\VC98\Lib\LIBCMT.LIB
dumpbin /exports libcmt.lib
does not show it but the following does:
dumpbin /all libcmt.lib
Linking Problem of CMarkup V11.1
Kang Yiqi 27-Jul-2009
I'm using the CMarkup (V11.1) class in my laboratory project (Windows XP & Visual C++ 6.0) for parsing XML documents. I've downloaded Markup.h & Markup.cpp from www.firstobject.com and added them to my VC++ project.
There's a problem in Markup.h, line 223: "#elif _MSC_VER >= 1000". As the version of VC++ 6.0 is 1200, MCD_FSEEK and MCD_FTELL are defined as _fseeki64 and _ftelli64. This causes 2 linking errors, e.g. "unresolved external symbol __ftelli64".
However, if I choose to "Use MFC in a Static Library", just the same as the sample project does, it's okay. But the size of the executable file seems to be too big. So I changed line 223 like this: "#elif _MSC_VER > 1200", and the problem is solved. I also tried version 11.0 of CMarkup and there's no such problem, because it defines MCD_FSEEK as _fseeki64 only when _MSC_VER is bigger than 1400 (VC++ 2005).
I would suggest that for VC++ 6.0, it's better to define MCD_FSEEK and MCD_FTELL as fseek and ftell, rather than _fseeki64 and _ftelli64.
Thank you for the excellent feedback and research. Release 11.2 is fixed according to your suggestion.
David 17-Jun-2009
I got an error in my application, which is a Windows Mobile program developed in VS 2008. I created the application with the following steps:
a) Create a smart device project: New project -> smart device -> Win32 Smart Device project
b) Add markup files: copy the two files into the project directory, then add them to the project, select "cmarkup.cpp", and set its properties to "Not Using Precompiled Headers"
c) Add some code in the testmarkup.cpp:
#include "Markup.h"
...CMarkup xml; xml.Load(_T("UserInfo.xml"));
d) Compile the project, then I got the following error [configuration Debug Windows Mobile 5.0 Pocket PC SDK (ARMV4I)]: