Now that the IrCOMM code will be used more widely, there should be an explanation about the current bugs and why crashes occur.
The most common symptoms of something going wrong are:
1. a spontaneous restart of the Newton
2. an error message saying "Sorry a problem has occurred" with a positive, random error number
3. stalled communication between the Newton and the peer device
4. crash of the peer device
5. no connection to the peer device
6. failure of an upper level protocol (PPP, TCP/IP, HTTP)
Symptoms 1 and 2 hint at a mistreatment of some of the internal IrDA data. One example is the treatment of an IrLMP control packet as a data packet. I've tried to make the code robust for these cases, but it depends mostly on the packet types sent by the peer device (and thus on the peer's IrDA stack implementation) and on how far my reverse engineering of the IrDA and comm layers has succeeded.
Symptom 3 happens when there is either a protocol error (i.e. one side sends packets the other side doesn't handle in its current state) or, more likely, there are not enough resources on the Newton to deal with the packet. A typical situation is receiving lots of small packets and handling each packet in a time-consuming way. The layers of the comm system are not completely decoupled, meaning that an upper layer can effectively block the whole stack. If that happens, the Newton sends RNR (Receive Not Ready) packets to the peer, telling it that the Newton is currently not able to handle data packets. The cure for a situation like this is to increase the number and size of the IrDA receive buffers (currently, I use seven 512 byte buffers).
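To make the buffer argument a bit more concrete, here is a minimal sketch of a fixed pool of receive buffers. It is not the Nitro code: the class ReceivePool and its method names are made up for illustration, and the only figures taken from above are the seven buffers of 512 bytes. The point is simply that once every buffer is held by a slow upper layer, the only thing left to do is to answer with RNR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a fixed pool of IrDA receive buffers.
class ReceivePool {
public:
    ReceivePool(std::size_t count, std::size_t size)
        : total_(count), free_(count), size_(size) {
        assert(count > 0);
    }

    // Called when a packet arrives; fails if every buffer is still in use.
    bool acquire() {
        if (free_ == 0) return false;
        --free_;
        return true;
    }

    // Called when the upper layer has finished with a packet.
    void release() {
        if (free_ < total_) ++free_;
    }

    // True while the stack would have to answer with RNR.
    bool receiverNotReady() const { return free_ == 0; }

    std::size_t bufferSize() const { return size_; }

private:
    std::size_t total_;
    std::size_t free_;
    std::size_t size_;
};
```

With ReceivePool pool(7, 512), seven small packets in a row that are not released in time are enough to make receiverNotReady() true, regardless of how few bytes each packet carries. That is why lots of small packets hurt more than a few large ones.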
Symptom 4 is caused by sending a malformed IrCOMM or TinyTP packet. Although I tried to catch all packets leaving the Newton to transform them to correct IrCOMM packets, there might still be a place somewhere where a packet is passed through an unknown channel (this happened with the PPP driver talking directly to the serial comm tool).
Symptom 5 is usually a sign that the peer device and the Newton disagree on which IrDA service to use. Currently, the Newton requests the IrDA:IrCOMM service over TinyTP. Peer devices can however provide IrLPT or something else and, as a consequence, deny the connection.
Symptom 6 is unrelated to IrCOMM, but it could still be caused by errors in the Nitro code, e.g. when packet data is corrupted.
Anyway, I'm working on all of these...
This is it: Nitro. It comes with source code so you can take a look at all the stuff I recently wrote about. I will also put the hacked TSerialEndpoint and the TSerialChip implementation on the web at some time.
Nitro for now advertises its services under the "ircm" label. Any application that wants to make an IrCOMM connection should use this as the service identifier. For applications that haven't been modified (e.g. the NIE Modem and Serial support module), patching the application package by replacing the Unicode string "aser" or "irda" with "ircm" should do the trick (Unicode strings use two-byte characters, in this case the high byte is zero and the low byte is the ASCII code). I've provided a patched NIE Modem & Serial module on Nitro's web page. I'll try LookOut as soon as I receive my USB IrDA dongle.
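For those who want to patch a package themselves, the replacement can be sketched like this. This is only an illustration of the byte pattern described above (high byte zero, low byte ASCII); the function names toTwoByte and patchLabel are made up, and the sketch works on a package that has already been read into memory. It does not know anything about the package format, so a coincidental match somewhere in the binary data would be patched as well. Check the number of replacements before using the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encode an ASCII label as two-byte characters: high byte zero, low byte ASCII.
std::vector<std::uint8_t> toTwoByte(const std::string& ascii) {
    std::vector<std::uint8_t> out;
    for (unsigned char c : ascii) {
        out.push_back(0x00);
        out.push_back(c);
    }
    return out;
}

// Replace every occurrence of oldLabel with newLabel inside the package bytes.
// Returns the number of replacements. Labels must have the same length so that
// the package size and all offsets stay unchanged.
std::size_t patchLabel(std::vector<std::uint8_t>& pkg,
                       const std::string& oldLabel,
                       const std::string& newLabel) {
    if (oldLabel.empty() || oldLabel.size() != newLabel.size()) return 0;
    const std::vector<std::uint8_t> from = toTwoByte(oldLabel);
    const std::vector<std::uint8_t> to = toTwoByte(newLabel);
    assert(from.size() == to.size());

    std::size_t count = 0;
    auto it = pkg.begin();
    while ((it = std::search(it, pkg.end(), from.begin(), from.end())) != pkg.end()) {
        std::copy(to.begin(), to.end(), it);  // overwrite in place
        it += to.size();
        ++count;
    }
    return count;
}
```

Calling patchLabel(pkg, "aser", "ircm") and then patchLabel(pkg, "irda", "ircm") covers both labels mentioned above.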
Eric Schneck (of iTunes Plug-in fame) has had success using Nitro with his Motorola Timeport T280, Earthlink and SimpleMail. I currently have some trouble establishing a PPP connection with a Nokia 3650 with Radiolinja as the provider, so this calls for some logging capabilities to find out what exactly happens.
Other missing pieces are a working OBEX layer (which shouldn't be too difficult), the associated data handlers and a way to not only initiate but also accept incoming connections.
Task almost complete: I just got a working PPP connection via IrDA from my trusty development MP2100 to a Linux box. Including fetching a web page, so there was also some data going back and forth. There is still some serious testing left (including other devices such as cell phones), and some features like hardware flow control are missing for now, but this is definitely a nice success!
The latest difficulty (as if there haven't been enough) was to find a place to handle the IrCOMM specific data in the regular IrDA data flow. My first idea was to use the StartOutput and GetComplete functions as hooks. But again, the NewtonScript layer caused some trouble. To get it working anyway, the inner workings of the TIrDATool had to be exposed a bit more.
The TIrDATool class is derived from TAsyncSerTool. TAsyncSerTool provides asynchronous serial I/O via two ring buffers (TCircleBuf class), one for reading, one for writing. On the MP2x000, the ring buffers are filled by the Voyager chipset using DMA.
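For readers who haven't met ring buffers before, here is a minimal, generic one. It is not a reconstruction of TCircleBuf (whose real layout and methods are not shown here); the class name RingBuffer and its methods are made up, and it only illustrates the idea of a fixed block of memory with a write position and a read position that wrap around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Generic byte ring buffer; a stand-in for the idea behind TCircleBuf.
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity)
        : buf_(capacity), head_(0), tail_(0), count_(0) {}

    // Producer side (on the Newton this would be the DMA filling the buffer).
    bool put(std::uint8_t b) {
        if (count_ == buf_.size()) return false;  // full, byte is dropped
        buf_[head_] = b;
        head_ = (head_ + 1) % buf_.size();
        ++count_;
        assert(count_ <= buf_.size());
        return true;
    }

    // Consumer side (the comm tool copying data to the target buffers).
    bool get(std::uint8_t& b) {
        if (count_ == 0) return false;  // empty
        b = buf_[tail_];
        tail_ = (tail_ + 1) % buf_.size();
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t head_;   // next write position
    std::size_t tail_;   // next read position
    std::size_t count_;  // bytes currently stored
};
```

One such buffer sits in the read path and one in the write path. The sketch ignores everything that makes the real thing hard, in particular the synchronisation between the DMA side and the task reading from the buffer.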
The two scenarios that had to be investigated further are reading and writing. In the TIrDATool class, this involves copying data from the ring buffers to the buffers used by the comm layer (and thus by the NewtonScript layer). These buffers are parameters to the StartInput and StartOutput functions. They are instances of the CBufferList class and are directly linked to the parameters given to the Input and Output functions of the associated NewtonScript endpoint. And this is also the reason why hooking into StartOutput and GetComplete doesn't work very well: these buffers do not reflect any of the packet-related characteristics of the IrDA data flow, they merely hold the final streams of data.
So there has to be a way to get into the data flow from the ring buffers to the target buffers. This data transfer is done in the DoOutput and DoInput methods of TIrDATool. In fact, these methods are called for any IR data transfer, including the low level IrLMP and IrLAP packets. Quite handy.
Since there is some protocol overhead and logic for IrDA, the TIrDATool uses a number of helper classes. The most important one is TIrSIR. TIrSIR uses TIrLAPPutBuffer for any outgoing requests and contains a CBufferSegment for the incoming data. The task is now to put the IrCOMM data into the TIrLAPPutBuffer before sending and to remove the IrCOMM data from the CBufferSegment after receiving.
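What "putting in" and "removing" the IrCOMM data means can be sketched without any of the Newton classes. The sketch below assumes the usual layout of a cooked IrCOMM data packet over TinyTP: one TinyTP byte carrying the delta credit in its lower seven bits, one IrCOMM byte giving the length of the control channel data, the control bytes and finally the user data. The struct IrCommPacket and the two functions are made-up names, and plain std::vector stands in for TIrLAPPutBuffer and CBufferSegment, whose interfaces are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parsed form of one TinyTP/IrCOMM data packet.
struct IrCommPacket {
    std::uint8_t deltaCredit;           // TinyTP credit advanced to the peer
    std::vector<std::uint8_t> control;  // IrCOMM control channel parameters
    std::vector<std::uint8_t> user;     // the actual serial stream data
};

// Sending side: prepend the TinyTP and IrCOMM headers to the user data.
std::vector<std::uint8_t> wrapIrCommData(std::uint8_t deltaCredit,
                                         const std::vector<std::uint8_t>& control,
                                         const std::vector<std::uint8_t>& user) {
    assert(deltaCredit <= 0x7F);    // only seven bits are available
    assert(control.size() <= 255);  // the length field is a single byte
    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(deltaCredit & 0x7F));
    out.push_back(static_cast<std::uint8_t>(control.size()));
    out.insert(out.end(), control.begin(), control.end());
    out.insert(out.end(), user.begin(), user.end());
    return out;
}

// Receiving side: strip the headers again. Returns false for malformed input.
bool unwrapIrCommData(const std::vector<std::uint8_t>& packet, IrCommPacket& out) {
    if (packet.size() < 2) return false;
    const std::size_t clen = packet[1];
    if (2 + clen > packet.size()) return false;  // control length overruns packet
    out.deltaCredit = static_cast<std::uint8_t>(packet[0] & 0x7F);
    out.control.assign(packet.begin() + 2, packet.begin() + 2 + clen);
    out.user.assign(packet.begin() + 2 + clen, packet.end());
    return true;
}
```

The length check in unwrapIrCommData is the kind of test that keeps a control packet from being handed upwards as stream data.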
Implementing the IrCOMM comm tool seems to go quite well. The first step was to get all the base classes of TIrDATool correctly declared. Lots of digging through C++ vtable declarations... I've got this love/hate relationship with C++, at the moment I'm happy how easy it is to reverse engineer and patch it. And how cross-platform and cross-compiler it is.
Much of the added logic goes into the ConnectionComplete, StartOutput and GetComplete functions of the new comm tool. Sending the IrCOMM negotiation packet right after connecting works already, receiving the answer is however still a bit unclear. I hope that the comm tool mechanism lets me schedule something via StartInput. Otherwise, a timer event based mechanism like the one I tried with the serial endpoint event handler might be an alternative.
I guess a bit of clarification regarding the whole comms system on the Newton might be useful at this point. The big picture is still unclear at some points and might even be incorrect here and there, but this is how it is supposed to work...
A communications mechanism is encapsulated in a comm tool. There are several tools on the Newton, including the serial and IrDA tool. They are derived from the class TCommTool. For the IrDA tool, the full inheritance chain is: TUTaskWorld, TCommTool, TSerTool, TAsyncSerTool and finally TIrDATool. Comm tool classes make use of virtual functions to overload needed functionality (good for hacking too). A comm tool is running in its own task.
Comm tools are instantiated via an associated service. For the IrDA tool, this is the TIrDAService class. A service is an implementation of the TCMService protocol. When declaring a service class, the capabilities macro has to be used to state which service this class provides. This is also the parameter used when instantiating an endpoint.
Services are managed by the Communications Manager. The CM is not implemented in a single class but instead via several global functions and classes. The CMGetEndpoint function is the starting point for all endpoint communication. It triggers starting the necessary comm tool task by looking up the requested service and calling the Start method.
Endpoints and comm tools are bound together via an event handler (TEndpointEventHandler). It is important to know that an endpoint can be used by the NewtonScript task and the comm tool task, requiring synchronisation to prevent crashes.
On the lowest level, a comm tool uses hardware specific drivers to perform the actual I/O operations. The drivers can be layered, in the IrDA case it is the TIrGlue class talking to a serial chip that controls the IR LED. The TSerialChip protocol is an example of such a driver.
So far, I have been able to implement, replace or modify some of the more interesting classes in this picture: TSerialChip, TSerialEndpoint, TEndpointEventHandler and finally, TIrDATool. This would eventually pave the way for stuff like Bluetooth, SSL or SSH.
Good news first? Ok, here it is: The IrCOMM endpoint is working transparently. The last changes included asynchronous sending; still missing are nice packaging and a configuration method. In this incarnation, it should work for all IrCOMM connections via NewtonScript. And this brings us to the bad news...
The PPP serial driver does something fairly unusual. While it starts out instantiating a regular serial endpoint (which we can then replace with our enhanced IrCOMM endpoint), it then talks to the IrDA comm tool directly. This is unusual for several reasons: The comm tool interface is not visible on the NewtonScript side, it is not really intended to be called directly (only indirectly through the endpoints) and there is already an endpoint available for data transfer. I didn't dig deeper into this, but I suspect that this was either easier for the people implementing the PPP serial driver or that the PPP serial driver is derived in a strange way from the serial or IrDA tool.
Anyway, by talking directly to the comm tool, our IrCOMM endpoint is bypassed and cannot do its magic. The solution: Implement an IrCOMM comm tool derived from the IrDA tool. Paul Guyot has suggested that from the beginning but I had the feeling that a modified endpoint might be a bit easier. There is some uncertainty about the complexity of a comm tool but it might indeed be not too hard...
As it turns out, the NIE Modem & Serial support is a good test case. It uses only asynchronous function calls, including nBind and nConnect. Binding an endpoint is not an issue, but as you might remember, an IrCOMM connect requires some handshaking. Also, the connection options are a bit different.
Two problems need to be solved to make an asynchronous connect work: find a place to do the IrCOMM handshake, and manage the IrCOMM connection options in a way that does not break anything else. The option management is indeed quite interesting in that the class that controls a TSerialEndpoint from the NewtonScript side evaluates the result of the connect and passes it on to the NewtonScript layer. For that purpose, it uses the options objects set in the nConnect call. To make the IrCOMM stuff play nice with this mechanism, the IrCOMM connect options have to be removed before control passes on...
The asynchronous IrCOMM handshake will be handled before the first send or receive operation, whichever comes first. Another possibility is starting it from the endpoint's event handling routine.
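The "before the first send or receive" idea boils down to a guard that every I/O path passes through. A minimal sketch, with a made-up class name LazyHandshake and with the handshake itself reduced to a callback that reports success or failure (on the Newton it would of course not be a simple synchronous call):

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Runs the handshake once, the first time any I/O path asks for it.
class LazyHandshake {
public:
    explicit LazyHandshake(std::function<bool()> handshake)
        : handshake_(std::move(handshake)), done_(false) {
        assert(static_cast<bool>(handshake_));  // a callback must be supplied
    }

    // Call this at the start of every send and every receive.
    // A failed handshake is retried on the next call.
    bool ensure() {
        if (!done_) done_ = handshake_();
        return done_;
    }

private:
    std::function<bool()> handshake_;  // hypothetical stand-in for the negotiation
    bool done_;
};
```

Both the send and the receive path would call ensure() first and only touch the data if it returns true. The alternative, starting the handshake from the event handler, moves the same flag into the event handling code.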
Good news: A previously dumb IrDA endpoint now knows all about TinyTP and IrCOMM. This is on the NewtonScript side of things, so if you now instantiate an IrDA endpoint, you can initiate an IrCOMM connection (notice that accepting IrCOMM connections is not done yet).
There are some things to note regarding the current implementation: If the code is active, all IrDA connections are IrCOMM connections. This could get in the way of existing applications such as beaming. It is a bit difficult to anticipate when to use IrCOMM and when to use IrLMP (the Newton's default). For that reason, I will add a preference setting to turn on IrCOMM manually.
It is also possible to talk to another IrCOMM peer directly using the C++ interface. This is easier because it uses synchronous communication. I'll add a NewtonScript interface to these functions as well.
One big piece of the puzzle is however currently missing: how to make existing applications that use a plain serial connection (i.e. not IrDA) use IrCOMM? I was hacking the NIE Modem/Serial support package a bit, changing the instantiation parameters from 'aser' (serial connection) to 'irda' (IrDA connection) and the physical location of the port from 'extr' (external port) to 'infr' (IR LED). It didn't really do the job, but some parts of the IrCOMM code already came to life. Time to set some breakpoints and find out what's going on!
A pretty nasty problem showed up that required going deeper into the endpoint mechanism: Asynchronous data transfer. It is needed because the NewtonScript world works based on the execution of small, non-blocking pieces of code that either return quickly or kick off some sort of background task (such as the whole endpoint infrastructure). One of the central points of data handling in a NewtonScript-based communication application is therefore the InputScript that processes incoming data received by an endpoint. There is no blocking read. This is something to get used to and regularly confuses programmers. For that reason, more "recent" example code from Apple introduced finite state machines which deal with these asynchronous events on the NewtonScript side. In the C++ world, most things are however pretty standard (one reason why a rewrite of parts of NHttpLib, Raissa and MAD Max in C++ could be very promising).
The problem in the IrCOMM case is that data reception is not handled by the nRcv function in the endpoint but in the endpoint's event handler code. I hadn't looked into that mechanism before, but fortunately, the event handling code is part of the TEndpoint protocol interface and can be overridden. In the process of mapping the mechanism, some of the classes and data structures had to be reverse engineered as well. It was not too painful - usually, it is possible to gain some understanding by looking at the C++ constructor and the initialization code. But nevertheless, I think that reverse engineering data structures is more tedious than trying to understand code.
Asynchronous reception of data works now. With one little problem left: sending a packet when handling an incoming packet doesn't seem to work correctly. It is needed to advance TinyTP credits to the peer. The situation becomes critical when a large amount of data is sent from the peer. If the peer doesn't get any credit, it just stops. The solution to this is to either find a way to send data properly in the event handler or set up some sort of idle handler in a different thread that checks the remote credit from time to time and sends a packet if needed. Time to look at more event handler code!
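The bookkeeping an idle handler would need is small. The sketch below is not taken from Nitro; the class CreditTracker and its methods are made up. It only assumes what TinyTP itself prescribes: the peer may send one data packet per credit it holds, and new credit is advanced as a seven-bit delta, so at most 127 per packet.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical bookkeeping for the credit we have advanced to the peer.
class CreditTracker {
public:
    CreditTracker(int initialCredit, int lowWater)
        : peerCredit_(initialCredit), pending_(0), lowWater_(lowWater) {
        assert(initialCredit >= 0 && lowWater >= 0);
    }

    // The peer used up one credit by sending us a data packet.
    void onPacketReceived() {
        if (peerCredit_ > 0) --peerCredit_;
    }

    // A receive buffer became free again and can be offered to the peer.
    void onBufferFreed() { ++pending_; }

    // What the idle handler would poll from time to time.
    bool needsCreditPacket() const {
        return peerCredit_ <= lowWater_ && pending_ > 0;
    }

    // Delta credit to put into the next outgoing packet (0..127).
    int takeDeltaCredit() {
        const int delta = std::min(pending_, 127);
        pending_ -= delta;
        peerCredit_ += delta;
        return delta;
    }

    int peerCredit() const { return peerCredit_; }

private:
    int peerCredit_;  // packets the peer may still send
    int pending_;     // freed buffers not yet advertised
    int lowWater_;    // send credit when peerCredit_ falls to this level
};
```

The event handler only calls onPacketReceived and onBufferFreed, which involve no sending at all. Whoever is able to send (the event handler, once that works, or the idle handler) checks needsCreditPacket and sends a packet carrying takeDeltaCredit() to the peer.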