This discussion is about the challenges posed by a hybrid radio / Internet network, and the management thereof.
I will start by giving some background, then throw it open for your input at the end.
This debate is not about whether or not we should connect packet radio nodes together via the Internet. It’s a fait accompli - the hybrid network exists, and it won’t go away.
It certainly works, and it’s better than what we had before, but it’s not ideal. The question is, can we agree on how to make it work better?
And if it’s not future proof, can we start laying the foundations for something that will be?
In the old days, before Internet linking, node tables were never full. About 150 nodes was the most ever seen, and a good proportion of them were not connectible.
The packet network topology was more or less defined by the terrain. Where you had favourable terrain you tended to get links, and vice versa. As a result, some areas, like Wales, the south-west and Scotland were very badly served. You tended to route traffic in the expected compass direction, and there was a definite horizon to the network seen from each node.
The network was slow and unreliable, and real time uses such as chat were not really feasible. As we moved more and more BBS traffic through it, it got worse. Then we started to lose sites due to commercial interests.
After the novelty had worn off, the users got dissatisfied and went away to pursue other hobbies such as building web sites and arguing in newsgroups. BBS’s lost users and closed down, and with nothing to do, many nodes closed too. That left just a hard core of more dedicated users and sysops who believed Packet had a future.
A couple of years ago, nodes started to link together using the Internet as virtual radio links. It was just a trickle at first, but it’s turning into a flood.
As a result we now have huge nodes tables, and it’s quite easy to have 2000 nodes in a table. A much larger proportion of these are connectible, quickly and reliably.
The network topology is no longer defined by the underlying geography. Providing you can get an Internet connection, even if you’re the other side of the world, you can be part of the network. This has brought some neglected areas back to life. People are dusting off their TNC’s and using Packet again.
The faster network enables real time uses such as chat, and longer circuits are possible. As a result of all this, users are a lot more satisfied, and the packet experience is more rewarding.
We must capitalise on this, and keep up the momentum. We mustn’t let poor network management destroy this opportunity to revitalise Packet.

This is a map, drawn a few months ago by G0SYR, of the UK packet node network. I’m afraid it doesn’t show up very well, but the red lines are the RF connections and the blue ones are the Internet ones. The user ports are not shown. It’s immediately striking how much internet linking is going on, and it’s getting more all the time.
You can see nodes, for instance in the south-west (point) and north east (point), which rely on the Internet for their connectivity. You can also see isolated bits of radio network, "glued together" by the Internet.
But all this new connectivity is not without its problems, the main ones being those I’ve listed. I’ll talk about each one in more detail shortly...
The huge nodes tables are a problem, as is the volume of routing traffic.
Parts of the RF network are becoming invisible from some nodes.
The number of overseas nodes is also perceived by some as a problem.
The presence of nodes using (X)net software is causing us some headaches, and the older node software gives us some management issues too.
Finally, there is greater potential for routing loops.
What’s so bad about large nodes tables?
Well, for a start, they use lots of memory, which makes life difficult for dinosaurs like me who are still programming for DOS. But more importantly, the nodes broadcasts use lots of precious bandwidth - I will talk more about that shortly.
Users can’t cope with big tables. Anything more than a 25 line screenful, or about 100 nodes at most, is too much information for the brain to take in, and we have to be mindful of users who haven’t got scrollback.
Over the average 1200 baud shared simplex user link, managing 50 characters per second with a following wind, it would take 40 seconds to download a 100-node table, but nearly 14 minutes for a 2000-node table. Just imagine if someone issued the N command as an innocent enquiry. Worse, if they did it via the network it would tie up bandwidth for a very long time. And worse still, if they maliciously or innocently issued a string of "N" commands... well that would be an effective denial of service situation.
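The arithmetic behind those figures can be checked with a rough sketch. The 20 characters per table entry is an assumption, back-calculated from the 40 second figure at 50 characters per second; the function name is illustrative only.

```python
def table_download_seconds(nodes, chars_per_node=20, chars_per_second=50):
    """Seconds needed to list `nodes` table entries over a user link.

    chars_per_node is an assumed average (alias, callsign and spacing),
    chosen so that 100 nodes take the 40 seconds quoted in the talk.
    """
    return nodes * chars_per_node / chars_per_second


for n in (100, 2000):
    secs = table_download_seconds(n)
    print(f"{n} nodes: {secs:.0f} s ({secs / 60:.1f} min)")
```

On these assumptions the 2000-node listing takes 800 seconds, a little over 13 minutes, and that is on an otherwise idle link.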
Finally, some of the older software has problems handling lots of nodes, and I’ll come back to this point.
If you have large node tables you’ll be exchanging a lot of routing information such as nodes broadcasts or INP3 traffic. This can sometimes be too much for a poor 1200 baud simplex link, although good links and full duplex ones can cope reasonably well. Your rigs spend a lot more time transmitting, so everything could get hotter.
Some sysops have Internet traffic quotas. They may have to pay a lot more for overseas traffic, or might be penalised if they exceed a monthly allowance. They obviously have a keen interest in keeping the unnecessary traffic to a minimum.
An important point is that, in order to avoid network instability and routing loops, routing information must propagate through the network very quickly. If not, the network could go into a state of oscillation, with routes constantly flipping around. Thus the routing information needs to have priority over other traffic, and naturally this delays the other traffic. The delays may be imperceptible if the network is stable, or they may get excessive when nodes are being switched on and off.
Some people welcome the presence of overseas nodes in our tables, others consider them an unwelcome nuisance.
Unless you have friends in Japan, you’re statistically unlikely to want to connect to a Japanese node. We have most in common with people in our own country and a lot less in common with people abroad, due to language and cultural differences.
If you aren’t interested in them, they are just visual clutter in the nodes tables, and if the tables are size limited, they are taking up valuable space which could be occupied by more relevant nodes.
Although foreign nodes are only of interest to a few users, we should do our best not to exclude the minority in our attempts to please the majority, and there’s another point - these nodes can sometimes provide useful connectivity to nodes we *do* want. Parts of the UK network are for example glued together via nodes in New Zealand.
Consider a node with 40 peer links, which is not uncommon. With a nodes table limited (by any means) to 100 nodes, almost half the table is occupied by those peers.
Add in their associated chat servers, BBS’s, PMS’s, DX Clusters and so on, and you could have most, if not all the table taken up by the callsigns of immediate neighbour systems. That leaves no room for anything more than 1 hop away. You may appear in other people’s tables, but they don’t appear in yours.
It may not get that bad - you may be capping the nodes table, or controlling its size using route quality, MINQUAL, round trip time or whatever.
But the RF nodes tend to have long Round Trip Times and low qualities, so they tend to get excluded from the list in favour of faster, higher quality ones. Nodes just a few miles away by radio are then more difficult to find than the ones much further away, but reached via the Internet. You get black holes in the network.
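A toy illustration of how such a black hole arises, assuming the table is simply capped to the entries with the shortest round trip time. The aliases and timings are made up for the example, and real node software applies its own, more involved rules.

```python
from typing import NamedTuple


class Node(NamedTuple):
    alias: str
    via: str      # "rf" or "internet"
    rtt_ms: int   # hypothetical round trip time


def capped_table(nodes, limit):
    """Keep only the `limit` entries with the shortest round trip time."""
    return sorted(nodes, key=lambda n: n.rtt_ms)[:limit]


# Two nearby RF nodes and three distant Internet-linked ones (all invented).
nodes = [
    Node("LOCAL1", "rf", 4000),
    Node("LOCAL2", "rf", 6500),
    Node("FAR1", "internet", 300),
    Node("FAR2", "internet", 450),
    Node("FAR3", "internet", 600),
]

table = capped_table(nodes, limit=3)
print([n.alias for n in table])
```

With room for only three entries, both RF nodes fall off the end of the list even though they are the closest ones geographically.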
I don’t want to start a software war, but this foreign software, and some of the sysops who run it, have caused us quite a few headaches, and no doubt we’ve done the same to them. In the old days it wouldn’t have been much of an issue, as they had their network and we had ours, the two being largely separate. But increasing connectivity has caused us to rub shoulders a lot more often.
The first problem we noticed was that the default settings caused netrom nodes broadcasts every 10 minutes. These were duplicated in INP3 broadcasts at the same interval, plus updates every 10 seconds. However, the volume of routing information traffic wasn’t the issue, because these were Internet links. Nor were the corrupt frames they were sending. What really caused a problem was the fact that their obsolescence counters were being decremented 6 times faster than ours, so our nodes would drop out of their tables in the interval between our hourly NetRom broadcasts. This might not sound like our problem, but those Xnet nodes were connected to other nodes in our system, and our traffic was being routed through them. When our nodes dropped out of their tables, circuits would hang.
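The timing mismatch can be sketched as follows. The initial obsolescence count of 6 is an assumption (a commonly used default, but not a figure stated above), and the model is simplified: an entry is refreshed to the initial count whenever a broadcast is heard, loses one count per tick, and is dropped at zero.

```python
def entry_lifetime_minutes(obs_init, tick_minutes):
    """Minutes an un-refreshed entry survives before its count reaches zero."""
    return obs_init * tick_minutes


def survives_between_broadcasts(obs_init, tick_minutes, broadcast_minutes):
    """True if the entry is still present when the next broadcast arrives."""
    return entry_lifetime_minutes(obs_init, tick_minutes) > broadcast_minutes


OBS_INIT = 6  # assumed initial obsolescence count

# Counter decremented hourly, broadcasts hourly: the entry lasts 6 hours.
print(survives_between_broadcasts(OBS_INIT, 60, 60))

# Counter decremented every 10 minutes, broadcasts hourly: the entry
# runs out after 60 minutes, just as the next broadcast is due.
print(survives_between_broadcasts(OBS_INIT, 10, 60))
```

Under these assumptions a node that decrements every 10 minutes has no margin at all against an hourly broadcast, so a single late or lost broadcast is enough for the entry to vanish.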
This happened quite frequently due to another feature of Xnet. As you may know, they use both time and quality-based routing metrics. But rather than keep these two completely separate, they attempt to equate them. When Xnet receives only Netrom qualities they are converted to fictitious target time and hop counts. These get modified through the Xnet network, and where they broadcast back to Netrom, a fictitious quality is generated from the time and hop counts. The net result is that the quality emerges a lot higher than if it had degraded through a pure netrom network, and this bends all the routing towards the Xnet network. You can’t simply drop the route quality, because you end up routing to that neighbour via a third party.
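For comparison, here is a sketch of how quality normally decays hop by hop in a pure NetRom network, assuming the commonly cited rule that the quality heard from a neighbour is multiplied by the quality of the link to that neighbour and divided by 256, with rounding. The link quality of 192 is just an example value, and the Xnet conversion itself is not modelled because its formula is not given here.

```python
def degrade(quality, link_quality):
    """Quality of a route after crossing one link (assumed NetRom rule)."""
    return (quality * link_quality + 128) // 256


def quality_after_hops(start_quality, link_qualities):
    """Apply the per-hop degradation across a chain of links."""
    quality = start_quality
    for link_quality in link_qualities:
        quality = degrade(quality, link_quality)
    return quality


# A node advertised at 255, seen across one, two and three links of quality 192.
for hops in (1, 2, 3):
    print(hops, quality_after_hops(255, [192] * hops))
```

Quality falls off quickly with each hop, so any route whose quality is regenerated from time and hop counts rather than degraded in this way can easily come out ahead of the equivalent pure NetRom path.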
Good old BPQ has a limit of around 180 nodes in a typical configuration, possibly less if you have lots of ports. TheNet X1J seems to have a practical limit of just over 100.
If presented with more nodes than they can fit in the table, the nodes tend to cycle in and out of the table, causing havoc to the network. TheNet may actually crash under these conditions, or at least behave very strangely, so I am told.
The older software doesn’t use or facilitate time-domain routing metrics, so the presence of one of these nodes blocks the flow of such information, breaking up the time-based network. Many of the BPQ ones don’t support IP routing, so we have to tunnel through them.
Certain Telnet clients send each character as it is typed, causing commands to be split over several packets. BPQ doesn’t like this, treating each packet as a separate command rather than waiting for the end of line character, which makes BPQ nodes unusable from those clients. And there are certain sequences which are known to crash BPQ stone dead every time.
BPQ’s static parameters can make it seem very tardy at times on the modern fast network.
There are other problems, but I don’t wish to denigrate old software or dissuade anyone from using it. It was the state of the art when it was written, and in many cases is perfectly adequate for the job.
In a well-connected network there are bound to be multiple routes through it, many of which will lead back to the start point. That’s the mark of a good robust network. However it also makes routing loops more likely. A routing loop is the case where a packet follows a circular path through the network, never reaching its target. It results in level 4 connection failures.
There are several contributory factors, but the main cause is delay in propagating routing changes. In a network of older software, if a node dies, or goes invisible, it takes several hours per hop for the node to disappear from all the tables. As this process is asynchronous, the routing can flip dramatically, and if there is lots of connectivity this can end up in a loop.
Links which bridge large sections of the network, such as the typical Internet link, make loops more likely, by connecting nodes whose routing information differs widely in freshness. If the links are good, the quality doesn’t drop off quickly, and the node is visible over a wider area.
Luckily, modern software propagates changes a lot more quickly, and attempts to prevent loops occurring. But there is still a lot of old software in the network.
We’ve alleviated many of the problems by using routing metrics based on round trip time rather than netrom quality. These give a true and up to date measure of the goodness of a route, and the sysops can’t mess with them. But they can’t be used everywhere, because there are some poor routes and old software in the network.
Using these metrics, we can restrict the number of hops accepted from each neighbour. Thus on the overseas nodes we can say we only want 1 hop, which is the neighbour itself, but none of his neighbours. We’ve tried limiting the overseas visibility by setting MINQUAL equal to the route quality, but it wasn’t so successful.
When the foreign nodes creep in using Netrom qualities instead of time-based metrics, an effective way to restrict their numbers is to degrade the quality according to callsign. We can do this in such a way as to favour the countries according to their likelihood of anyone wanting to connect there.
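One way this might look is sketched below. The prefix table, the scaling factors and the callsigns are all hypothetical, chosen only to show the idea of matching on the longest known callsign prefix; this is not how any particular node software configures it.

```python
# Hypothetical scaling factors per callsign prefix (1.0 = no degradation).
PREFIX_FACTORS = {
    "G": 1.0,    # UK
    "M": 1.0,    # UK
    "ZL": 0.75,  # New Zealand
    "JA": 0.25,  # Japan
}


def degraded_quality(callsign, quality, factors=PREFIX_FACTORS, default=0.5):
    """Scale a received quality by the factor for the longest matching prefix."""
    call = callsign.upper()
    for length in range(len(call), 0, -1):
        factor = factors.get(call[:length])
        if factor is not None:
            return int(quality * factor)
    # Prefix not listed: fall back to the assumed default factor.
    return int(quality * default)


for call in ("G0AAA", "ZL0AAA", "JA0AAA", "W0AAA"):
    print(call, degraded_quality(call, 200))
```

Entries whose degraded quality falls below MINQUAL would then simply not make it into the table, so the factors decide which countries get crowded out first.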
Nodes tables were capped for various reasons, usually to save memory, cut the volume of broadcasts and make life easier for users. But as I said previously, it can cause problems and I don’t think it’s a good long-term solution.
Internet links *are* very high quality compared to radio links, and ought to be given higher netrom qualities. At first we used 240, to prevent everything going via Xnet, but that caused too many Internet nodes in the tables. So we reduced the qualities to manage that. It’s not ideal, and we need to agree the figures.
Taking it to the extreme, you can actually set the Xrouter Internet link qualities at or below MINQUAL, so they don’t appear in the RF network at all. This partitions the networks, with the Xrouters acting as the gateways between them.
I’m now coming to the firm conclusion that we have too much connectivity, and should limit it, so I haven’t been replacing Internet links when they die.
This is a slide on which I jotted a few random ideas while developing this presentation, but quite frankly I ran out of time, so I never got the chance to finish it.
One of the ideas was to try and make netrom quality reflect the length of the link, so that the distant nodes don’t displace the more local ones. It would be very difficult to do, and I don’t think it’s a good idea.
It bugs me that some sysops distort quality to manipulate the routing, such as favouring RF links over Internet ones. This isn’t doing the users any service. I have always felt that quality should be a real measure, or a decent estimate, of the goodness of a link, and we should all agree on the figures to use.
If we don’t use a common standard for quality, the routing is distorted, and traffic ends up going where it shouldn’t.
I built dynamic route quality measurement and adjustment into Xrouter, but it’s a dangerous tool because NetRom information propagates too slowly. It’s best used as a guide, after which you lock the quality in.
Where do we go from here? Is the current network future proof, or is it doomed to eventual failure?
My own view is that NetRom is struggling, and after nearly 20 years of it we should probably be thinking about something to replace it. That won’t happen overnight, and probably not even in my lifetime!
I can’t really see how a single, un-partitioned mesh network can cope with ever-increasing connectivity, and I feel it should be partitioned into self-contained and self-administered subnets, with carefully controlled interconnections between them. Perhaps the UK needs to be more insular, entrusting overseas connectivity to just a few strategic gateways.
Lastly, here are a few suggestions for discussion. You probably have many others.
Firstly, is it desirable, necessary or feasible to coordinate the network? There’s a lot of duplicated effort, so should we attempt to make proper use of it?
Should we sysops agree and publish some guidelines on how to properly set up an internet-linked node?
Should we ask sysops to restrict the number of Internet neighbours they have, especially overseas ones? Can those links be redeployed in a more useful way, perhaps to strengthen the UK network?
There currently seems to be total disagreement over Internet route qualities, in fact route qualities in general. Can we reach some agreement, or is it impossible?