An old/new approach to QtWebKit Hybrid

Published Wednesday August 31st, 2011
Posted in QtWebKit, WebKit

In a previous blog post, we were talking about the new direction we’re taking with WebKit2, and a few people were concerned about the future of hybrid application development. To reiterate the issues we were facing (see Ademar’s comment on that post), it’s extremely costly to maintain a deep C++ API like the one we provided for QtWebKit on top of a fast-moving project like WebKit. Since I’m a fan of the hybrid approach, I have been thinking about it a lot. Though I’d much rather start my blog post with “The QtWebKit bridge now works with WebKit2”, that’s unfortunately not feasible. The QtWebKit bridge relies too much on the internals of WebKit(1), which has the application and the web-view bound together in a single process.

What we’re trying to achieve with hybrid can be divided into three categories:

  1. Tweak and control the browser window and interactions. This would still be possible to some extent as we’re still exposing a WebView to QML/C++.
  2. Interacting with existing web sites, for example by injecting JavaScript into the frame. This is currently not exposed in WebKit2.
  3. Exposing functionality and “local” data to the web view that would otherwise not be available to the browser.

The problem I’m trying to address here is the third one. If we look at what kind of functionality we want to access from our web view, it’s in most cases data or off-screen functionality. For example, using C++ for heavier computing, a custom caching technique, or accessing capabilities such as a local calendar or microphone.
This brings me back to the way this had been achieved in the past. Since one of the things web views could always do is access the network via HTTP, creating a small local HTTP server that exposes local functionality would give us the same value, without tying us to a particular implementation detail in WebKit, such as the QtWebKit bridge.

So what am I selling?

The idea is to use a headless web-server as a “channel” between the web-view and the host application. The server would listen on a port which would be accessible to the HTML page via HTTP requests, and the Qt application can respond to those requests and publish messages to that channel. This approach gives us that same data/functionality channel between the Qt application and the HTML page, without the need for custom APIs inside WebKit.
This is not a new approach. But, there have been a couple of issues with it in the past. First of all, security. Unlike the bridge, a local web server is not tied to a particular web view, which means that any web site can attempt to access your locally exposed functionality if it knows what port you use for your web server.
Second, integrating a full-blown Apache web-server or even lighttpd adds a lot of complexity to the equation, and buys us very little, as we don’t necessarily need to support PHP, CGI, serving files with different mime-types, or a scalable process model. This becomes even more apparent when compared to the ease of use of the QtWebKit bridge, which really makes writing a standalone Qt application with a web view easier.

Introducing Qt Web Channel

So I decided to try and come up with the smallest possible implementation of a web server that would give us two things – a web server that is only accessible to certain web-views, and tight integration with the host Qt application, without changing anything in the web view internals.  A key guideline to this implementation was to not try to solve any problem beyond that. The result is below. The HTML page can send requests, and we can respond to those requests from our QML or Qt-C++ container.  The web view can also subscribe to a named message channel that QML/C++ can later publish messages to.

From JavaScript, it might look like this:

navigator.webChannel.exec("doSomething", function(response) {
    // Do something with the response.
});

While this can be handled from QML:

WebChannel {
        onExecute: {
            if (requestData == "doSomething")
                response.send("OK");
        }
}

The per-web-view security is handled through a simple shared secret that needs to be passed to the web-view, for example via the URL. The secret is a string that the HTML page needs to send with every request to our web server; otherwise the request is automatically rejected. In the example below, a random website cannot access your local functionality without knowing your web channel’s base URL, which contains that shared secret.

WebView {
        url: "index.html?webChannelBaseUrl=" + webChannel.baseUrl;
}
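
The shared-secret handling described above can be sketched in plain JavaScript. This is a minimal sketch and not the project's actual code: the function names `readWebChannelBaseUrl` and `isAuthorizedPath`, and the assumption that the secret is the first path segment of each request, are hypothetical. Only the `webChannelBaseUrl` query parameter comes from the example above.

```javascript
// Hypothetical page-side helper: read the base URL that the QML side
// appended to the page URL as "?webChannelBaseUrl=...".
// In a real page, `search` would be window.location.search.
function readWebChannelBaseUrl(search) {
    var params = new URLSearchParams(search);
    // get() percent-decodes the value and returns null if the key is absent.
    return params.get("webChannelBaseUrl");
}

// Hypothetical server-side check, ASSUMING the secret is the first path
// segment of every request (for example "/<secret>/exec").
// Requests that do not carry the secret are rejected.
function isAuthorizedPath(requestPath, secret) {
    var segments = requestPath.split("/"); // "/abc/exec" -> ["", "abc", "exec"]
    return secret.length > 0 && segments[1] === secret;
}
```

If the base URL can contain characters such as `&` or `#`, the QML side would probably want to wrap it in `encodeURIComponent()` before appending it to the page URL; the helper above decodes it again on the page side.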

This can be installed and accessed easily as a QML import, which allows you to use this web-server as a channel between web and native in your own QML application. It’s also possible to use this from C++, without QML. See the README and examples.

A nice outcome of this endeavor was that it was pretty easy to write an abstraction on top of it that produces the same QObject integration as the good-old bridge. See http://qt.gitorious.org/qt-labs/qwebchannel/trees/master/examples/qtobject.
But there are some things in the current API that this approach doesn’t handle, such as injecting JavaScript into an existing page, or C++ manipulation of web elements. Those would unfortunately require an actual additional WebKit API.

Flow of a web-channel based application

To explain a bit how the application would behave in reality, this is the basic flow:

  1. A WebChannel element is created from QML.
  2. The WebChannel generates a secret-protected URL pointing to a script.
  3. The QML application passes the script URL to the WebView.
  4. The HTML page in the WebView loads the script from the URL.
  5. The HTML page now has a webChannel object, that looks something like this:
    {
        exec: function(message, onSuccess) { ... },
        subscribe: function(eventID, onMessage) { ... }
    }
  6. The web page can execute or subscribe to messages that are handled in the QML container.
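
The flow above can be tried out without any server by using an in-memory stand-in for the channel. This is only a sketch of the call pattern: `createMockChannel` and `publish` are hypothetical names, there is no HTTP and no shared secret involved, and the real `webChannel` object delivers responses asynchronously over the network rather than synchronously as here.

```javascript
// In-memory stand-in for the channel: no HTTP, no secret, just the call flow.
// `onExecute(requestData, response)` plays the role of the QML onExecute handler.
function createMockChannel(onExecute) {
    // eventID -> array of onMessage callbacks (no prototype, so any ID is safe)
    var subscribers = Object.create(null);

    return {
        // Page side: send a request and receive the host's reply.
        exec: function (message, onSuccess) {
            onExecute(message, { send: function (data) { onSuccess(data); } });
        },
        // Page side: register interest in a named message channel.
        subscribe: function (eventID, onMessage) {
            (subscribers[eventID] = subscribers[eventID] || []).push(onMessage);
        },
        // Host side (hypothetical name): publish a message to all subscribers.
        publish: function (eventID, data) {
            (subscribers[eventID] || []).forEach(function (cb) { cb(data); });
        }
    };
}

// Usage mirroring the "doSomething" example from earlier in the post.
var channel = createMockChannel(function (requestData, response) {
    if (requestData === "doSomething")
        response.send("OK");
});

var received = [];
channel.exec("doSomething", function (response) { received.push(response); });
channel.subscribe("status", function (message) { received.push(message); });
channel.publish("status", "ready");
// `received` now holds the exec response followed by the published message.
```

Keeping the page-facing surface down to `exec` and `subscribe` means the transport behind it (plain HTTP requests today, possibly something else later) could change without touching page code.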

What’s next

This is, for now, a concept. Whether or not it gains momentum is up to you.
Code is at http://qt.gitorious.org/qt-labs/qwebchannel.
Thoughts?


28 comments

Adam Higerd says:

It just so happens that point #2 is absolutely mission-critical for my application. It also happens that being able to work with non-HTTP protocols is incredibly mission-critical — arguably MORE critical than the other point because I could always implement the other behavior Comet-style if I really had to but that would be… very suboptimal. I’m currently able to achieve this through the current QtWebKit combined with QNetworkAccessManager. (And I’m worried about how I’m going to port all of this to stock WebKit for an iOS port!)

I know what advantages WebKit2 brings for actual web browsers. What advantages will it bring to hybrid applications?

Furthermore, one thing that had been mentioned a few times with the WebKit1 implementation was that there was no way to manipulate the page from C++. It sounds to me like WebKit2 is bringing us FARTHER away from this, rather than CLOSER to it, which is a significant disadvantage for a number of application concepts. What can we expect to see in this regard?

No'am Rosenthal says:

@Adam, regarding your first question, WebKit2 provides a much better browsing experience, which for some class of hybrid applications, like ones that render the web content with zooming/scrolling rather than full-screen, makes a lot of sense.
Some class of hybrid applications are better off with WebKit1, especially if they rely on deep integration with WebKit internals like QWebElement.

Regarding the second question, if I understand it correctly, in WebKit1 you can manipulate pages with C++ today. With WebKit2 this is trickier because the internals of the web page is decoupled from the application’s process. I believe that something like QWebFrame::evaluateJavaScript, which can actually give you a lot of capabilities in terms of manipulating existing web pages, is technically possible in some form or another (probably more async-ish than the current version), but right now it’s more a matter of priorities as we’re trying to flesh out the base API.

Dragan says:

Right, and the logical next step would be to expose C++ application via a websocket and build a small JS lib around it to make it easier to use, no?

No'am Rosenthal says:

@Dragan: my original version for this was web-socket based, but I wanted to publish something simpler first. Yes, this would be a logical next step that can save some overhead.

Ryan says:

Why not register a custom protocol with WebKit so websites can then go to “qt://whatever/resource” and get the required page? This would call out to a defined Qt interface which applications can derive from.

If adding POST-like functionality is also possible, then you’re golden.

Probably some security ideas, like only locally loaded files being able to access the qt:// page, or maybe the Qt interface implementation could check the referer to decide whether the data should be loaded or not.

No'am Rosenthal says:

@Ryan: That requires APIs in WebKit. It’s already available in WebKit1. The concept of this particular project was to enable basic bridging functionality for WebKit2 without additional APIs – no more, no less 🙂

JarJarThomas says:

Hi

This is now a really critical theme for us. For the next iteration of our software we decided to use a hybrid approach.
An html view (probably generated automatically or by a customer) and objects providing functionality and signals.
QML is just not an option because we need a real desktop ui for the rest and all the stuff
like manipulation of the ui by c++ at runtime,
different screensizes
complex userinterfaces
are just nothing that qml is designed for.

Therefore we need a robust way to use webkit with c++ in a widget application.
With Webkit1 and the Bridge it works really well.
We have an Object that exposes functionality, inject it into webkit, finished.

One developer can simply handle this, no need to do anything special in html, no need to do some complicated (and therefore dangerous) workarounds.

So basically this is what is needed for real hybrid applications.
One way to expose functionality simple to use
without doubling code (create an object in javascript that encapsulates the functionalty that an object in c++ already has)
Simply manipulate a page if something changes.

If that simplicity is removed regardless why and not another SIMPLE TO USE way exists, it is a real backstep and nothing can make that better.

So whatever design decision is, it should first
-> Not see qml as primary target, if it works easy in c++ it will be easy in qml also, but if it is a monster to use in c++ it failed
-> Really see the point that you don’t want to define your code literally twice, once in c++ once in javascript.
-> Think big … think about an application that has to handle 300-400 parameters, complex data like on the fly generated images, curves and other non trivial data.

That has to work or it is just useless besides some playstuff that can be easily done without html.

JarJarThomas says:

Regarding the headless webserver

I do not especially have something against it, i used it already in other projects.
The big plus is that you CAN, if you want to allow it, have remote control easily.

What we did was a webserver that allowed to access registered QObjects.
If a QObject had a property, we could request and set it.
If a QObject had slots, we could call it.

What we were missing were signals (and that’s the big drawback here regarding webbridge).
Because of the missing signal functionality we had to poll all the time to see if something had happened, and had to write our own signal handling system in JavaScript that asked every 10 seconds whether a signal had happened.

Needless to say … the performance was really ugly. Mostly because of the HTTP overhead.
I don’t know what overhead you have in your solution, but we had up to 1kb of traffic for a single value
(http request header, the continue response, the data response, the caching tags).

No'am Rosenthal says:

@JarJarThomas: The qtobject example in this solution handles signals using comet, and it’s possible to optimize it in the future to use web sockets.
There’s no QML dependency in the code, it’s pure C++.

JarJarThomas says:

So in the end (and hopefully not hitting the submit button too early again) 🙂

If that system can be as simple to use and handle as webbridge -> GREAT you are a lifesaver.
Especially the possible remote control would be great (if i want to allow it).
But please do not focus only on qml but also on more performance relevant tasks and c++ interface.

So great to hear that something is tried to make webkit 2 usable for us too.

So now really end 🙂

Greetings Thomas

Æ says:

What about utilizing mobility api (http://tinyurl.com/3clxymo) for finding services and using them e.g. with web-socket backend?

zbenjamin says:

These days i really feel like Qt is getting crippled. It seems like everything that is needed on the Desktop is getting removed.
I considered using QWebKit as our main GUI tool. I still can’t see things like real QAbstractItemModels in QML, and no, a ListModel is NOT a real ItemModel. But now I’m not sure anymore what will be possible in the new implementation. Just 2 classes that are only usable from QML? That does not sound like it will offer a lot of API.
Maybe at least you guys should make the internal WebKit API accessible, so everyone can just use what they need.
I think injecting JavaScript is really critical if you try to build hybrid web applications.

No'am Rosenthal says:

@zbenjamin: of course it feels like that. We’ve just started with WebKit2. Give us some time to finish the API.
WebKit2 is a huge step forward, especially as a backend for a browser, but also for interactivity and security in general.
Also, we’re trying not to repeat past mistakes – WebKit1 had dozens of permutations of how to render the view and how to access the internals. It became very hard to maintain and slowed down development. In WebKit2 we’re trying to do it right by exposing little at first and gradually add more. It’s not about removing or crippling anything.

zbenjamin says:

@JarJarThomas

You are definitely right, lots of changes these days feel like backsteps.
And sometimes i also feel a “we want to change that and we don’t care that its proven and used by a lot of people” kind of mentality.
I can not remember i had a feeling like that in good ol’ Trolltech days. Lets hope for the best.

zbenjamin

Jim says:

We use an application that renders QWebElement into images. It would be nice if you could add this functionality to WebKit 2.
I think a priority would be to create a C++ api. I know that webkit2 is different from webkit 1 , and probably it will require a way to synchronize the threads.

Dragan says:

@Noam,
Yes the optimization is great but I am more interested in the ability of a websocket to deliver events rather than be forced to use long poll. Do you plan to publish the WS variant too and which version of the protocol does it come with (http://wiki.tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol-13 or the old Hickson drafts)?

No'am Rosenthal says:

@Dragan: I see the websockets approach as an optimization to the long-polling approach. I want to see first if the base solution gains any traction.

Aaron Seigo says:

I spoke with a webkit/gtk+ hacker the other week about what they were doing about these things and their current solution is to create render-side plugins that can be loaded from the application via IPC directives (like your HTTP daemon here). Coupled with your HTTP bridge solution, could this not address point #2 in your list?

Also, as a side note: a single shared secret that passes in clear text is no security. It’s vulnerable to guess attacks and man-in-the-middle. As a PoC (proof-of-concept) it’s indeed fine as it shows you can add security measures, but the end result must certainly be more robust than this, right? 🙂

No'am Rosenthal says:

@Aaron: Guess attacks, yes. Man in the middle, not if it’s only exposed through localhost. But, yes, for now this is just an initial solution, and suggestions to improve its security are welcome.

Scorp1us says:

Why not just use a local socket with digest auth?

I’m getting worried that we’re going to go from 0 to 10 QtWebServers. I suggest we put on the brakes and come up with one base server with plugins.

Niels Mayer says:

For another approach to “web views [that] access the network via http, creating a small local HTTP server that exposes local functionality would give us the same value, without tying us to a particular implementation detail in WebKit” — please take a look at http://code.google.com/p/qtzibit/ (
MeeGo: http://nielsmayer.com/meego/qml/qtzibit-0.1.0-1.i586.rpm
Harmattan: http://nielsmayer.com/meego/qml/qtzibit_0.1.0_armel.deb ).

This code, mostly javascript and HTML-template delegates for rendering the data:
http://qtzibit.googlecode.com/svn/trunk/exhibit/src/webapp/examples/YouTube/YouTube.html
Produces this result:
http://nielsmayer.com/meego/qml/qtzibit-youtube-named-feeds.png

Helmut Muelner says:

To repeat Aaron: a single shared secret that passes in clear text is no security! That shared secret will also be visible as a string in a distributed executable, or in distributed QML or published source code.

No'am Rosenthal says:

The shared secret is generated in runtime when the web channel is created. It’s not distributed with the source code or executable.

Domenico Zucchetti says:

Many application will need to implement a REST API (http://en.wikipedia.org/wiki/Representational_State_Transfer)
But I’m not able to judge if the same approach could also be used to interact between WebKit2 and C++ Qt.

No'am Rosenthal says:

@Domenico: no reason why we couldn’t use REST here; though my solution doesn’t have that kind of abstraction/convention.

Adam Higerd says:

So what’s happening with QNetworkAccessManager? Why does QWebChannel require the use of an HTTP channel? Why does it require binding a socket at all?

One of the most valuable features for hybrid applications, I think, is the fact that QNetworkAccessManager is so beautifully extensible. (As I mentioned in my first comment, I shudder to imagine how much work it’s going to take me to port my app to iOS.) With the current QtWebKit, I could implement this QWebChannel as a scheme (perhaps “qt://”) for QNetworkAccessManager take care of it. It wouldn’t require any networking at all — no need for a shared secret, no need for a socket, and even no need for the whole bulky mess that is HTTP protocol headers. All you need is a URI, a content-length, and the content — you don’t even need a verb for something that’s an internal implementation detail of a high-level API, although if you expose the channel to code on either side you may want to include the verb anyway for the sake of satisfying the people who want REST.

If WebKit2 isn’t going to be using QNetworkAccessManager, I’m… worried, to say the least. Porting to WebKit2 will be like porting to iOS at that point, and that’s not something I’m looking forward to.

No'am Rosenthal says:

@Adam: extending QNetworkAccessManager is an orthogonal problem to what QWebChannel is trying to solve. For now, I wanted to create basic hybrid application support for WebKit2 without the need for new APIs.
I don’t see a major technical blocker in allowing QNetworkAccessManager extensibility in WebKit2 in the future, but that would probably come after we have the base API working well.

Tobias says:

I’m using QWebKit as primary display solution in a client. It displays complex data sets, which get preformatted and displayed in various tables. Why I had chosen this approach is to allow the graphical designer to create a HTML+CSS page template which is just altered by the application.

Some points affect this application, some not. I will write a short explanation – it may be helpful for some to solve the own problems.

First I created my own QNetworkAccessManager, which introduced a new protocol. I will call it “app:/” here in this example. This protocol is the interface to my client. If the QWebKit component requests a page with this protocol, the page is in fact provided by the client application.

Second I created JavaScript libraries which are using JSON to communicate with the client over this protocol. After loading a page (template), a JavaScript function is triggered which is fetching the data via JSON over the “app:/” interface. This does not require any JavaScript injection – it just uses JavaScript and JSON.

At last, and this is the part which I have to change, there is a special Interface I introduce to the script. It is used to trigger some actions in the interface, and to avoid implementing the whole value -> HTML rendering in JavaScript. Also some settings about collapsed sections are read over this interface. But – all this points I’m able to solve by moving these actions into JavaScript. So I can implement the value rendering in JavaScript. And trigger the application actions by JSON messages.

If you are interested to see my solution code, feel free to contact me:
n7p7imgc1hbytr40 at gmail.com

