KHTML on Win32 Native

Development Journal

Unresolved issues are marked with a bold exclamation point (!).

Wednesday, March 24th, 2004

I finally got around to finding the long-lost password and login for this site.

This project stalled. Life got difficult for me in early 2003. I spoke with folks who were interested in this project, but none so interested as to assume control.

I am working on a new initiative to create an open-source XML database, Momento. I've written an XML testing framework and an encapsulation of Java SAX for reuse that you might find interesting. It is all at The Engine Room.

The modifications I made to the source so that it compiles under VC++ 6.2 were rolled into the KHTML code base back in 2003.

I am frequently asked about ports of Apple's libraries, things like WebCore. That never happened. I was looking to implement those services using the Win32 API and standard C++.

I must say that I am astounded that no one has picked up this project. The KHTML source base is quite excellent. I suppose fear of the GPL and a healthy respect for the Qt license have done a lot to keep KHTML off of Windows. Pity. It is excellent code.

In the coming months, provided I can get Momento off the ground, I'll start playing around with an ASF-licensed browser in C++. If anything, this project taught me that the world longs for a standards-compliant, lightweight, embeddable browser under a BSD license.

Let me know if you're one of those people.
Friday, January 17th, 2003

- KWQString.cpp begins.

The KWQ code is difficult to follow since there are three string classes: one for NextStep, one for CoreFoundation, and one for KWQ. I will not be needing the first two. The QTextCodec class converts its data to a CoreFoundation string in order to encode and decode. In fact, the transcodeFrom methods are really implemented in QString.

If the data is not copied when assigned to a QString, then I'd like to move the actual transcoding into QString, where I can allocate the underlying character buffer and read from and write to it directly. KWQString.cpp now compiles, except for the allocation functions and the transcode-from-Unicode function.

The calls to Xerces to transcode are simple enough to implement; I just need to find a place to put them.
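A minimal sketch of transcoding straight into a string's own buffer, assuming UTF-8 input and using std::u16string as a stand-in for QString's character storage. The function name transcodeFromUtf8 is hypothetical, and the hand-written decoder only marks the spot where a Xerces transcoder call would go in the port. Because UTF-8 never produces more UTF-16 code units than it has bytes, the buffer can be sized once, written in place, and then trimmed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: decode UTF-8 directly into the buffer of a UTF-16
// string (std::u16string stands in for QString's internal storage).
// Invalid or truncated sequences become U+FFFD. Overlong forms and encoded
// surrogates are NOT rejected in this sketch.
std::u16string transcodeFromUtf8(const std::string& in) {
    std::u16string out;
    out.resize(in.size());               // worst case: one code unit per byte
    char16_t* const begin = &out[0];
    char16_t* dst = begin;
    const std::size_t n = in.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else { *dst++ = 0xFFFD; ++i; continue; }   // invalid lead byte

        bool ok = i + len <= n;                    // sequence fits in input?
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) ok = false;  // not a continuation byte
            else cp = (cp << 6) | (cont & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { *dst++ = 0xFFFD; ++i; continue; }
        i += len;

        if (cp >= 0x10000) {                 // outside the BMP: surrogate pair
            cp -= 0x10000;
            *dst++ = static_cast<char16_t>(0xD800 + (cp >> 10));
            *dst++ = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            *dst++ = static_cast<char16_t>(cp);
        }
    }
    out.resize(static_cast<std::size_t>(dst - begin));  // trim unused tail
    return out;
}
```

Each error path consumes at least one input byte and writes exactly one code unit, so the up-front allocation is never exceeded.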

Thursday, January 16th, 2003

- KWQTextCodec.cpp nears completion.

! QTextCodec::codecForLocale is the last step in KWQTextCodec.cpp, but it is confusing. This method is only called once in KHTML, when nothing else can be found to encode a form submission. This will happen when none of the encodings accepted by the server can be found on the system, or when the server has failed to specify which encodings it accepts.

Implementing this method is confusing because Win32 does not assign a locale-specific code page to Unicode applications. The Win32 API documentation states:

The system locale determines which code page (ANSI, DOS, Macintosh) is used on the system by default. The system locale setting affects only ANSI applications, that is, applications that are not fully Unicode-compliant.

KHTML is a Unicode application. I suppose it may be that no code pages are installed on the Win32 machine and all applications are emulating ANSI applications. Should I just return UTF-8, rather than call GetOEMCP()?

Since this is only called to submit a form, and since form submission has far more dependencies, I am going to defer the implementation of this function.
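If codecForLocale is eventually implemented, the fallback question above could be sketched as follows. The helper name charsetForCodePage and its short table are assumptions for illustration only. On Win32, GetACP() returns the ANSI code page used by non-Unicode GUI applications, while GetOEMCP() returns the OEM code page used by the console, so the ANSI page is arguably the closer match for a form submission. Anything unrecognized falls back to UTF-8.

```cpp
#include <string>

// Hypothetical helper for QTextCodec::codecForLocale: map a Windows code
// page number (e.g. the value of GetACP() or GetOEMCP()) to an IANA charset
// name. Only a few common pages are listed; unknown pages fall back to UTF-8.
std::string charsetForCodePage(unsigned int codePage) {
    switch (codePage) {
    case 437:   return "IBM437";        // OEM United States
    case 850:   return "IBM850";        // OEM Multilingual Latin 1
    case 1250:  return "windows-1250";  // ANSI Central European
    case 1252:  return "windows-1252";  // ANSI Western European
    case 65001: return "UTF-8";         // the UTF-8 code page
    default:    return "UTF-8";         // unknown page: assume Unicode
    }
}
```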

! I feel that it is fair to say that KWQTextCodec.cpp can now do without the asterisk next to its name in the list of ported files. There is one open issue regarding this file, but it has been documented. Hopefully another developer with a better understanding of I18N will be able to propose a solution. Once I complete the wrappers around Xerces, I can move on to the next file.

Added High Relevance, a page that will contain the links I am visiting most frequently at any point in the project. I keep this page open during development to take notes and to follow the links I've collected.

I am growing ever more impressed with SourceForge, its offerings and its implementation. It was a little confusing at first, probably because there is so much on offer. It is all starting to make sense now.

! I am gathering up the private e-mail discussions I've had about this project and culling them for a project mission statement.

Wednesday, January 15th, 2003

Using the IANA character-sets file and Perl, I created static lookup tables to map MIBenums to character encoding names. This is needed to implement QTextCodec::codecForMib. It also helps me quickly validate that a requested codec exists. I wrote the lookup functions in C and packaged them in their own DLL so that, in theory, the DLL could be updated separately should the database change.
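A sketch of such a table and its two lookups, written in C++ to match the rest of the port rather than in C. The rows are a small hand-picked subset of the IANA registry, not generated Perl output, and the names MibEntry, nameForMib and mibForName are hypothetical. Keeping the rows sorted by MIBenum allows a binary search; the name lookup is case-insensitive because charset names are.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iterator>

// One row of the lookup table (hypothetical layout).
struct MibEntry {
    int mib;
    const char* name;
};

// Sorted by MIBenum so it can be binary-searched. A generator script would
// emit the full registry; only a few rows are shown here.
static const MibEntry kMibTable[] = {
    {3, "US-ASCII"},
    {4, "ISO-8859-1"},
    {5, "ISO-8859-2"},
    {17, "Shift_JIS"},
    {106, "UTF-8"},
    {1013, "UTF-16BE"},
    {1014, "UTF-16LE"},
    {1015, "UTF-16"},
    {2252, "windows-1252"},
};

// Returns the name for a MIBenum, or nullptr if it is not in the table.
const char* nameForMib(int mib) {
    const MibEntry* first = std::begin(kMibTable);
    const MibEntry* last = std::end(kMibTable);
    const MibEntry* it = std::lower_bound(first, last, mib,
        [](const MibEntry& e, int value) { return e.mib < value; });
    return (it != last && it->mib == mib) ? it->name : nullptr;
}

// Case-insensitive comparison of two NUL-terminated ASCII names.
static bool equalsIgnoreCase(const char* a, const char* b) {
    for (; *a && *b; ++a, ++b) {
        if (std::tolower(static_cast<unsigned char>(*a)) !=
            std::tolower(static_cast<unsigned char>(*b)))
            return false;
    }
    return *a == *b;  // equal only if both strings ended together
}

// Returns the MIBenum for a charset name, or -1 if unknown. A linear scan
// is fine for a small table; this doubles as the "does it exist?" check.
int mibForName(const char* name) {
    for (const MibEntry& e : kMibTable)
        if (equalsIgnoreCase(e.name, name)) return e.mib;
    return -1;
}
```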

Win32 provides a transcoder API, and the database of available codecs is kept in the Windows registry. Native support means a smaller binary and better performance. Rather than sift through this API myself, I am going to statically link to the Xerces C++ Parser. Xerces has a C++ interface for transcoding, with an implementation that uses the Win32 transcoding API. They have sorted out the details of reading the registry and invoking the appropriate API functions.

The Unicode resources I've encountered: 

! I still have not looked for an API function that will give me the character encoding for the locale. This is required to implement QTextCodec::codecForLocale.

I am using an STL map to cache the codec objects that have been created, although it's a pity that I must, since they will only be wrappers around Xerces transcoders.
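A minimal sketch of that cache, assuming a hypothetical Codec class that, in the real port, would hold a Xerces transcoder. The map owns the codecs, so every request for the same name returns the same object and callers never delete it.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a KWQ QTextCodec; in the port each instance
// would only wrap a Xerces transcoder.
class Codec {
public:
    explicit Codec(const std::string& name) : name_(name) {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Returns the shared Codec for `name`, creating it on first request.
// The map itself is unsynchronized, so concurrent callers would need a lock.
Codec* codecForName(const std::string& name) {
    static std::map<std::string, std::unique_ptr<Codec>> cache;
    std::unique_ptr<Codec>& slot = cache[name];  // inserts an empty slot if absent
    if (!slot)
        slot.reset(new Codec(name));
    return slot.get();
}
```

std::map is an ordered tree rather than a hash table, so lookups are O(log n), which hardly matters for a few dozen codecs. Names should be normalized before lookup (for example, resolved through the MIBenum table) so that "utf-8" and "UTF-8" share one entry.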

The STL resources I have encountered:

I am surprised that the inclusion of <map> did not drastically slow down the compile time of KWQTextCodec.cpp.

Tuesday, January 14th, 2003

My raw thoughts form this development journal. My free programming time today will be spent creating this site and configuring SourceForge.

KHTML compiled. Visual C++ works, probably because KHTML makes use of Qt and therefore follows Qt's coding standards. Qt is cross-platform, so KHTML has inherited cross-platform traits. Many people are quick to suggest a flavor of GCC for Win32, but it is unnecessary and misses the point. Native means that it does not require any form of UNIX abstraction layer.

There are a few classes in KHTML that do not compile. KHTML also has one class that depends on pthreads, which means that a pthreads-like wrapper around Win32 threads is necessary.
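A minimal sketch of such a wrapper. The names kthread_t, kthread_create and kthread_join are hypothetical; they only mirror pthread_create and pthread_join. The wrapper's main job is adapting the start routine: a pthread routine takes and returns void*, while a Win32 thread procedure returns DWORD, so a small heap record carries the routine and its argument through a trampoline. The non-Windows branch uses std::thread only so the sketch builds elsewhere. Threads that call into the C runtime would normally be started with _beginthreadex rather than CreateThread.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#include <thread>  // stand-in so the sketch also builds off Windows
#endif

// Hypothetical pthread-style thread handle.
struct kthread_t {
#ifdef _WIN32
    HANDLE handle;
#else
    std::thread* thread;
#endif
    void* result;  // value returned by the start routine, read by join
};

// Carries the pthread-style routine and argument into the new thread.
struct KThreadStart {
    void* (*routine)(void*);
    void* arg;
    kthread_t* self;
};

// Runs the routine, stores its result, and frees the start record.
static void runStart(KThreadStart* start) {
    start->self->result = start->routine(start->arg);
    delete start;
}

#ifdef _WIN32
// Adapts the Win32 thread-procedure signature to the pthread-style routine.
static DWORD WINAPI kthreadTrampoline(LPVOID param) {
    runStart(static_cast<KThreadStart*>(param));
    return 0;
}
#endif

// Mirrors pthread_create: returns 0 on success, nonzero on failure.
int kthread_create(kthread_t* t, void* (*routine)(void*), void* arg) {
    KThreadStart* start = new KThreadStart{routine, arg, t};
    t->result = nullptr;
#ifdef _WIN32
    t->handle = CreateThread(NULL, 0, kthreadTrampoline, start, 0, NULL);
    if (t->handle == NULL) {
        delete start;
        return -1;
    }
#else
    t->thread = new std::thread(runStart, start);
#endif
    return 0;
}

// Mirrors pthread_join: waits for the thread and hands back its result.
int kthread_join(kthread_t* t, void** result) {
#ifdef _WIN32
    WaitForSingleObject(t->handle, INFINITE);
    CloseHandle(t->handle);
#else
    t->thread->join();
    delete t->thread;
#endif
    if (result)
        *result = t->result;
    return 0;
}
```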

The KWQ classes are the real chore. KWQ is built on top of NextStep and is written in Objective-C++. It depends on an OS X library, CoreFoundation, for Unicode support, URLs, and some other basics.

Last night I began to go through and port all of the KWQ classes, starting with the basic classes and working my way up. I would like to do without CoreFoundation, since the services it provides are already provided by Win32.

The Unicode support is the first to port, and it has been quite simple so far, since the Unicode support in Win32 is fairly strong. QChar simply calls the wide-character variants of the standard C library functions to determine whether a character is a digit, or what case it is. I suspect that there may be caveats to these functions.
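A stripped-down sketch of that approach, using a hypothetical QCharSketch class rather than KWQ's actual QChar. It makes some likely caveats concrete: the <cwctype> functions depend on the current C locale; iswdigit is only specified for the ASCII digits 0 through 9, whereas Unicode's decimal-digit category also covers digits from other scripts; and a single 16-bit unit cannot represent characters outside the Basic Multilingual Plane.

```cpp
#include <cwctype>

// Hypothetical, stripped-down QChar: one UTF-16 code unit whose properties
// come from the C library's wide-character classification functions.
// Results depend on the current C locale (LC_CTYPE).
class QCharSketch {
public:
    explicit QCharSketch(unsigned short unicode) : ucs_(unicode) {}
    unsigned short unicode() const { return ucs_; }

    bool isDigit() const  { return std::iswdigit(wide()) != 0; }  // 0-9 only
    bool isLetter() const { return std::iswalpha(wide()) != 0; }
    bool isSpace() const  { return std::iswspace(wide()) != 0; }

    QCharSketch lower() const {
        return QCharSketch(static_cast<unsigned short>(std::towlower(wide())));
    }
    QCharSketch upper() const {
        return QCharSketch(static_cast<unsigned short>(std::towupper(wide())));
    }

private:
    std::wint_t wide() const { return static_cast<std::wint_t>(ucs_); }
    unsigned short ucs_;  // a single UTF-16 code unit
};
```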