Copyright (C) 2002 Stephen Compall, licensed under the GNU Free
Documentation License, version 1.1 or later.
The Network SEE
***************
This is the Network SEE architecture, otherwise known as "Compall
Analytic Engine" or "the 00110110" :), version 2.1.
This version is a bit more like inetd, and in fact, I intend to base
much of the code on Serveez, which is a more generic inetd with lots
of useful portable-server stuff. That's good because I don't intend to
ever build this thing myself with M$ VC++.
Also, I noticed that the components are somewhat independent, so I have
settled on a process-based model, rather than a thread-based model.
The design is inspired by the GNU Hurd. As a matter of fact, a great
deal of the IPC work could be done better with Mach messaging; as the
SEE is a GNU project, it is intended to work best with the GNU system,
so some changes for the Hurd will be in order. Watch.
Version 2.1 is also a little more service-focused, with more thought
given to the request/reply architecture. Every connection method must
provide a request/reply mechanism, and every one must be addressable
by URL. Services and clients receive bidirectional "put" and "get" pipes.
The daemon, startsee (or whatever), detaches and starts up a few
subprocesses, to handle the different aspects of the SEE:
seedaemons: The portion that automatically starts up webservices
(servers) as they are needed.
see*port: Listens for "remote" requests to connect to webservices
hosted by "this" SEE. Also serves those requests, acting as a proxy.
see*users: Listens for local requests to connect to webservices
hosted by other SEEs.
The *s indicate that multiple daemons are supported. For example, you
may have a SOAP port, that operates on port 80, and accepts both SOAP
and HTTP, passing them to appropriate daemons run in the seedaemons
(or just proxy-forwarding them to apache!); another seeport might be a
Jabber "client" that really acts as a server.
The see*users handle the opposite side of these communications,
sending requests to seeports on other machines.
Note: these systems only strip away the *transport* layer of the
communications protocol used.
seeport
-------
This abstracts away *transport mechanisms*, not the data that goes
over the transport. Services have two ways in which to react to
different uses of mechanisms:
* Ignore it: always use the same data protocols. This gives people
with different connection capabilities different ways in which to
connect to your service.
* Use as metadata: Assume the capabilities of the client are
dependent on the transport used. For example, an XML-RPC or Jabber
talk would send straight XML data and expect a client app to offer
a UI, but an HTTP GET/POST would give HTML back.
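The two strategies above can be sketched as follows. This is only an illustration: the transport labels, the function names, and the `<result>` element are assumptions made for the example, not part of the SEE design.

```python
from xml.sax.saxutils import escape

# Hypothetical transport labels; the SEE design does not fix these names.
UI_CAPABLE_TRANSPORTS = {"xml-rpc", "jabber"}


def render_reply(transport, result):
    """'Use as metadata': pick a reply format from the transport used."""
    if transport.lower() in UI_CAPABLE_TRANSPORTS:
        # The client app is assumed to offer its own UI: send bare XML.
        return f"<result>{escape(result)}</result>"
    # Otherwise assume a browser-like client (HTTP GET/POST): send HTML.
    return f"<html><body><p>{escape(result)}</p></body></html>"


def render_reply_ignoring_transport(transport, result):
    """'Ignore it': one data protocol, whatever transport was used."""
    return f"<result>{escape(result)}</result>"
```

With the first function, a Jabber request gets XML back and an HTTP request gets a page; with the second, every client gets the same XML.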
Say we have an application that accepts registrations in this XML
form:
Rhys Weatherley
Qt/Embedded
DotGNU Portable .NET
Tree Compiler-Compiler
Unix(TM) V7 Upgrade
One of those "Free Software" geeks...we don't have to pay
him much.
Ok, now you have the data, which is ideally separated from the method
of communication used. (In a perfect world...) But you could wrap this
in a SOAP envelope, use straight HTTP PUT, send to a Jabber ID, get
from a Unix(TM) named pipe, whatever.
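As a rough sketch of what "stripping away the transport" means, the same payload can be wrapped two ways and recovered unchanged. The envelope below is a deliberately simplified stand-in for SOAP (no namespaces or headers), and all function names are made up for the example.

```python
def wrap_http_put(path, payload):
    """Frame the payload as a bare-bones HTTP PUT request."""
    body = payload.encode("utf-8")
    head = f"PUT {path} HTTP/1.0\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + body


def strip_http_put(message):
    """Drop the HTTP framing and return only the payload."""
    _head, separator, body = message.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no header/body separator found")
    return body.decode("utf-8")


def wrap_envelope(payload):
    """Simplified envelope; real SOAP adds namespaces and headers."""
    return f"<Envelope><Body>{payload}</Body></Envelope>"


def strip_envelope(message):
    """Return what sits between the outermost Body tags."""
    start = message.index("<Body>") + len("<Body>")
    end = message.rindex("</Body>")
    return message[start:end]
```

Whichever wrapper is used, the service should see the same payload once the seeport has removed the framing.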
Every time a service on the local machine is requested, the preferred
seeport agreed upon by the two systems accepts a connection. A
connection is defined as a route over which the service can receive
requests and send replies; e.g., HTTP/1.0 can still be considered a
connection in the abstract, but you must reestablish a concrete
network connection for each request/reply series. If a seeport/seeuser
pair so desires (and it's allowed by the protocol), they can maintain
a persistent connection across requests, and even across services and
clients, as long as this sharing doesn't hold up other
services/clients for the benefit of one. So if it's overloaded, make
another connection anyway!
Another agreed-upon seeport also opens up a "connection", but this
time, it is for the client. Why? This is a more P2P way of
interfacing. It also allows for more intelligent asynchronous servers
and clients. The current model doesn't allow for out-of-order replies
to requests, but a service/client pair can hack these with
service-to-client requests.
Also, this is a way to support different types of "interfaces." For
example, we use the special format over Jabber, but if you get an HTTP
POST of regular form data, you would react differently, sending back a
web page. (And a GET would give you the form!) This is an example of
something the seeport is *not* supposed to strip away, because it
deals with data, not the transport mechanism. It would be feasible to
hack Phoenix to get pages via Jabber as well as HTTP. Your choice.
seeuser
-------
These subprocesses implement the transport mechanisms on the other
side. Because different sorts of transport mechanisms use different
addressing methods, your own application will need to handle the
addressing of each transport you wish to support. Beyond that, what
you do over the different transports depends on how the service reacts
to each transport mechanism.
You will usually want to use RLS to access a new service. This will
have the SEE send you information about which interface it should use;
this information may contain a `service stub', or a portable
executable that contains all the right RPC calls.
seeuser uses URLs to find the seeport to connect to.
When a client is received, the RLS system will set up a connection
through one of these to allow the client program to connect to it
through FIFOs, and the seeuser connects to the seeport through the
given URL. Again, as with seeports, connections can be cached and
queued, as long as one request/reply sequence doesn't get in the way
of others.
Every service that accepts a connection will make a connection back
through seeuser to the client. The reason for this is described in the
previous section.
Why is this a program, not a library? To allow more language
independence. Think of it as a library implemented over IPC.
seedaemons
----------
This runs all the different PEs that are registered as services this
machine should offer. It runs the VMs with the PEs as subprocesses
owned by whatever user insisted that the service should run on this
machine. Also, it informs the service of new client connections.
All connections that originated in the SEE will continue to go through
the SEE; see Use Cases. This operates by way of a private FIFO
collection.
seeauth
-------
The truth is, I don't know much about how the auth system is going to
work. But we do have two requirements:
1. When a service user connects to the service, he/she must send
identity information to the service,
2. and the service must identify itself immediately to the user.
The SEE handles the latter, but only provides a capability for the
stub to easily do the former. This is so the user can trust the
service before loading the PE; the reverse is unnecessary, because
the stub is provided by the service. The assumption here is that we
will need some kind of persistent state, and that is what the seeauth
component is all about: to hold that state. An actual seeauth system
is not included in this specification, because I really don't know how
to do it.
Finally, because the auth systems are interoperable, the user should
never know what sort of auth system is in use; it should be a
single-item interface.
RLS
***
These are the resource locator strings defined by DotGNU. Their
purpose in SEE's realm is to specify a webservice without specifying
the method of communication with the webservice.
For example, jabber://sirian@theoretic.com/SEE/echo may connect you to
the echo server over Jabber, and http://antares.evansville.edu/echo may
connect you over HTTP (either SOAP or straight HTML), but neither
specifies the echo service without also specifying the method of
communication. rls://csserver.evansville.edu/~sc87/echo is an
unambiguous specifier. It is good to think of the RLS server as a nameserver, to
point you in the direction of the service. It operates as a transport
mechanism, because it is special that way; we can't do it as a
webservice itself, because webservices need to be resolved.
The rls nameserver that operates on a system port is reserved and
nonduplicatable by users, for obvious reasons; however, Jabber-based
rls nameservers are acceptable, e.g. rls://sirian@theoretic.com and
rls://voridor@jabber.com could be on the same machine, but connected
as different Jabber users. The assumption here is that if there is a
user field in the URI, it is a Jabber address; but if there is no user
field, it operates on the port.
Note that this is abuse of the URI syntax; a proper user field (where
sirian is above) is used to identify the client, not the server. But
this is how it is in Jabber world.
Also! Jabber is not essential to the operation of the SEE. Finally,
the port for the non-Jabber RLS is 39879.
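The user-field rule can be sketched with Python's standard URL parser. The function name and the shape of the returned dictionary are assumptions made for the example; only the rule itself (user field means Jabber, no user field means port 39879) comes from the text above.

```python
from urllib.parse import urlsplit

RLS_PORT = 39879  # port given above for the non-Jabber RLS


def classify_rls(uri):
    """Decide how to reach the RLS nameserver named by an rls:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "rls":
        raise ValueError(f"not an rls URI: {uri!r}")
    if parts.username is not None:
        # A user field is taken to mean a Jabber address.
        return {
            "transport": "jabber",
            "address": f"{parts.username}@{parts.hostname}",
            "path": parts.path,
        }
    # No user field: talk to the reserved system port on that host.
    return {
        "transport": "port",
        "address": (parts.hostname, RLS_PORT),
        "path": parts.path,
    }
```

For instance, `classify_rls("rls://sirian@theoretic.com")` selects the Jabber branch, while `classify_rls("rls://csserver.evansville.edu/~sc87/echo")` selects the port branch.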
Services
********
The sorts of operations a service must support have changed somewhat
from 2.0.1 to 2.1, in order to focus on the webservice easy-implement
model, while still being truly language-independent (language bindings
for an API don't count as language independence).
A service must accept small XML blocks that give 4 filenames. The
service must open 2 of these for reading, and 2 of these for writing
(which is which will be defined in the XML).
If the service only receives 2 filenames, then the SEE on the service
machine was unable to establish a connection back to the requesting
machine. Therefore, requests to the client are not available. The
service must be ready for this, and kill it if, say, the service
should be P2P.
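A service might handle the four-versus-two filename cases roughly as below. The XML layout used here (a `connection` element holding `file` elements with a `mode` attribute) is purely hypothetical, since this document does not pin down the real format; the function names are likewise invented.

```python
import xml.etree.ElementTree as ET


def plan_connection(xml_block):
    """Split the filenames in a (hypothetical) XML block by direction.

    Returns (read_paths, write_paths, client_requests_available).
    """
    root = ET.fromstring(xml_block)
    read_paths, write_paths = [], []
    for node in root.iter("file"):
        path = (node.text or "").strip()
        mode = node.get("mode")
        if mode == "read":
            read_paths.append(path)
        elif mode == "write":
            write_paths.append(path)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    if (len(read_paths), len(write_paths)) not in ((1, 1), (2, 2)):
        raise ValueError("expected 2 or 4 filenames, half per direction")
    # Only 2 filenames means no connection back to the client exists.
    return read_paths, write_paths, len(read_paths) == 2


def open_connection(xml_block):
    """Open the planned files; on real FIFOs, open() may block until
    the other end has been opened too."""
    read_paths, write_paths, can_call_client = plan_connection(xml_block)
    readers = [open(p, "rb") for p in read_paths]
    writers = [open(p, "wb") for p in write_paths]
    return readers, writers, can_call_client
```

A service that needs P2P behaviour would check the third return value before going any further.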
TODO: lookup stream-function behavior on FIFO so we can decide how to
end a request.
SMALL NOTE ON LATENCY: The other aspect of the SEE <-> service
protocol currently defined is the latency test. This is a string,
"", to which a service program should respond with
"" if it is ready to immediately accept a request. The SEE is
not required to then provide a request, so do not assert that the next
message will be that request.
If the service takes too long to ack, seedaemons will start up another
copy of the service. Beware: this can result in a feedback loop, with
an overloaded system leading to the SEE putting even more of a load
on. The answer is obvious: build good services.
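Both sides of the latency test could look something like this. The probe and ack strings are passed in as parameters because their actual values are not shown here. The `max_copies` cap is an assumption of this sketch, not part of the design; it is one possible way to keep an overloaded machine from being loaded further.

```python
def handle_message(message, probe, ack, ready, serve):
    """Service side: answer one incoming message.

    Returns the reply to send, or None to stay silent.
    """
    if message == probe:
        # Ack only when a request could be accepted right now.
        return ack if ready() else None
    # Anything else is treated as a request. Note that a probe is not
    # necessarily followed by a request, so nothing is assumed here.
    return serve(message)


def should_start_another_copy(ack_delay, timeout, running, max_copies):
    """seedaemons side: decide whether to launch one more copy.

    ack_delay is the seconds waited for the ack, or None if none came.
    max_copies is a hypothetical cap added for this sketch.
    """
    late = ack_delay is None or ack_delay > timeout
    return late and running < max_copies
```

The cap only limits the damage; a service that answers probes promptly never triggers it.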
Configuration
*************
This doesn't cover the idiot options, whose importance will be
realized during development and added to this spec then. These are the
major elements needed for a working SEE:
* ability for users to add/remove their own webservices, either at
boot-time or on demand, subject to allowance by the
superuser/owner of SEE.
* ability for users to add/remove their own transport-layer handlers
(seeports). This is especially useful in the realm of Jabber,
where each user could have his/her own virtual server.
The owner of the SEE will be able to moderate these requests as he/she
sees fit.
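For user-added webservices, the on-disk layout described in use case 2 of the Use Cases section (a service file under `~/SEE/Services`, plus one file per interface suffixed with the type) can be sketched like this. The function names are invented, and rejecting ".." in the virtual path is a precaution added for the example, not a stated rule.

```python
from pathlib import PurePosixPath


def service_file(home, virtual_path):
    """Map an RLS virtual path to the service file under a home dir."""
    relative = PurePosixPath(virtual_path.lstrip("/"))
    if ".." in relative.parts:
        # Assumption of this sketch: never escape the Services tree.
        raise ValueError(f"unsafe virtual path: {virtual_path!r}")
    return PurePosixPath(home) / "SEE" / "Services" / relative


def interface_file(home, virtual_path, interface):
    """Name the per-interface file, e.g. echo-HTML or echo-PNET."""
    base = service_file(home, virtual_path)
    return base.with_name(f"{base.name}-{interface}")
```

With a home directory of `/home/luser`, the virtual path `/luser/echo` maps to `/home/luser/SEE/Services/luser/echo`, and its HTML interface file sits beside it as `echo-HTML`.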
Interface
*********
The SEE itself offers multiple services. I don't know how to describe
them concisely yet; see above. Or better yet, continue.
Asterisk
********
There has been some confusion about the use of the * symbol in
see*user and see*port descriptions, as well as the seedaemons daemon.
The * is a shell glob; there is more than one of each of these, and
they are recognized by their names beginning with what comes before
the * and ending with what comes after the *. As every seeport has a
matching seeuser, startsee uses these names to decide what transports
can be agreed upon for communication. For example, if seeIRCport is
available on the replying machine, but seeIRCuser (of some sort) is
not available on the requesting machine, then the transport `IRC'
can't be used.
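The name-matching rule can be sketched as a set intersection. How startsee actually learns the names available on the other machine is not covered here; this example simply takes two lists of program names, and the function names are made up.

```python
import re

# see<transport>port / see<transport>user, e.g. seeIRCport, seeRLSuser.
_NAME = re.compile(r"^see(?P<transport>.+)(?P<role>port|user)$")


def transports(names, role):
    """Collect the transport part of every name with the given role."""
    found = set()
    for name in names:
        match = _NAME.match(name)
        if match and match.group("role") == role:
            found.add(match.group("transport"))
    return found


def usable_transports(replying_machine, requesting_machine):
    """A transport is usable only if the replying machine has the
    seeport and the requesting machine has the matching seeuser."""
    return (transports(replying_machine, "port")
            & transports(requesting_machine, "user"))
```

Given `seeIRCport` and `seeRLSport` on the replying machine but only `seeRLSuser` on the requesting one, the result contains `RLS` and not `IRC`.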
I don't know whether there will be different running auth daemons or
not at this time. But for this architecture, auth is very basic, so
there is only one.
The term `seedaemons' refers to a single daemon. seedaemons keeps all
the service daemons (which *are* separate, and are not called
seedaemons) alive, and sends new connections to them, kills, receives
kills, etc.
Use Cases
*********
Yet more abuse of terminology :] Here are some examples of things you
might do in the SEE framework. Things in [] are explanations of why
things are the way they are; they can be skipped.
Also, where the method of IPC isn't defined, there is a ?; the options
are probably pipe, FIFO (named pipe), and socket.
1. I want to communicate with the echo server at
rls://csserver.evansville.edu/sc87/echo! Here's what happens:
a. I run a simple program that connects to seeRLSuser via ? and
gives it the rls
b. seeRLSuser attempts to connect to whatever sort of RLS the URI
specifies, Jabber or Internet port
c. Upon successful connection, the seeRLSport (or
seeJabberRLSport) on the server retrieves information about
different interfaces (e.g., Portable .NET GUI, HTML) to the
service available, and returns them to the client.
d. The client picks one of these methods, and tells the server
about it.
e. The server returns a data package to match what the user
wanted. This could be a URL for a pseudo-website for HTML, for
example, or a full PE. Also returned is information about the
service's virtual identity. [It isn't returned until now because
it may not be necessary; you need auth for a PE because it's a
trusted action, but HTML doesn't require that much trust.]
f. Whatever your return type, you'll probably want to connect to
the actual server now. So you'll do that by starting the PE or
loading the web-URL up in a browser. (seedaemons can analyze the
PE---or URL, you should consider the URLs as PEs---and pass back
a command-line to execute. But it is the user's choice what
interface to use, and it is your runner program's responsibility
to filter certain kinds you don't understand.) The PE will
connect, and using the appropriate protocol, talk to one of the
seeusers on the system. (Theoretically, a webservice author could
skip the seeports, but that would eliminate their benefits:
multiple services on 1 port, for example.)
g. seeRLSport and seeRLSuser will negotiate another transport
that is available on both machines, *and* with the service in
question. You can ignore this; information about the thing to use
will be included in the PE execution line, the PE will open the
files it receives and never know the difference.
h. Anyway, if the echo service at that URL isn't running, the
seedaemons pool will start it, and the seeport will link the
client to the service through itself like this: client <->
seeport <-> service. [The seeport-service link will probably be a
FIFO. We can't use a pipe, because the comm link is created by
seedaemons.]
2. I am a luser, and I want to add my own echo server!
a. Well, since RLS is based on URI, add a path from your home
directory (or if system, from ??). If the "virtual path" you want
through RLS is "/luser/echo", then the path to the file would be
`~/SEE/Services/luser/echo'. This is a Portable Executable that
seedaemons can start.
b. Then, for each interface you are offering, there is a file
named like the service, but suffixed with the type: echo-HTML for
HTML, echo-PNET for CIL executables. HTML should contain a URL
that can be loaded in a web-browser to access a service, and PNET
should contain a PE! These types are not required; you may use
any suffix you like, as long as it is supported by users!
c. You need a service stub. This consists of writing a program
that accepts two filenames. You will open one of these for
reading, and the other for writing. As far as you can see, these
are direct links to the service daemon on the other machine.
d. Finally, you have to tell SEE that the service is
available. This is only for first-time; on each subsequent boot,
SEE will scan this directory for services. Send your username to
see through ?, followed by newline, and it will do a rescan. [If
it receives more than, say, 10 of these requests in a second, it
will stop listening for a whole minute.]
3. I am connected to the echo server, and I want to download the
service and use it on my machine! Well, I don't exactly know how
that's going to work, maybe you can figure it out. After the
download, however, it should probably go into ~/SEE/Private.
4. I wish to create a new transport mechanism!
a. Well, first, you can just implement this in your webservice
like a normal standalone server would. You still get the
convenience of automatic client distribution and automatic
startup, and can still access the other seeports. However, this
just isn't nice.
b. Write a seeport. This involves listening for remote connects,
stripping out a path to the service, giving the path to
seedaemons, and kindly accepting the returned ?, whereupon you will
connect to the service daemon and pass messages, first stripping
off the non-service data for the service, then wrapping it back
up when returning to the client.
c. Write a seeuser. This does the same thing as the seeport, but
in mirror. You should be able to take connect-data in the form of
a URI, figure out what to do with it to get to the seeport, and
that's it.
d. Release your transport under the GNU General Public License,
and contribute to the DotGNU SEE project. This is the way your
program can be of the greatest possible use to everyone,
including yourself.
5. I don't want seedaemons to start up a new instance of my service
every time I start up!
a. Here's what happens in a SEE-compatible webservice: it starts
up, and is passed an index and two filenames in stdin, in an XML
block. Here's an example of what that block could be:
/var/SEE/Connections/~sc87/echo.3.in
b. It opens the first of these for reading, and the second of
these for writing, and stores them with the index. This is an
open connection.
c. Communication is always done in blocks; requests and replies,
that is.
d. When the connection is remote-closed, the service reads from
stdin an XML piece telling it the connection is closed. It closes
the files matching the index passed with the XML.
e. When the connection is service-closed, the service writes an
XML piece on stdout just like the one in 5d.
f. Oh, how does SEE know the service is SEE-compatible? All
services are SEE compatible. If they're not, change them!
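The bookkeeping in use case 5 (one long-running service holding several indexed connections) can be sketched as a small table. The control-block layout used here (`open` and `close` elements with an `index` attribute, and `in`/`out` children) is hypothetical, as the real XML is not shown in this document. The `opener` parameter is also an addition, so that the table can be exercised without real FIFOs.

```python
import xml.etree.ElementTree as ET


class ConnectionTable:
    """Open connections of one service process, keyed by index."""

    def __init__(self, opener=open):
        self._opener = opener
        self._open = {}

    def handle_control(self, xml_block):
        """Apply one control block as read from stdin."""
        node = ET.fromstring(xml_block)
        index = int(node.get("index"))
        if node.tag == "open":
            # First file is opened for reading, second for writing.
            reader = self._opener(node.findtext("in"), "rb")
            writer = self._opener(node.findtext("out"), "wb")
            self._open[index] = (reader, writer)
        elif node.tag == "close":
            # Remote close: release the files stored under this index.
            reader, writer = self._open.pop(index)
            reader.close()
            writer.close()
        else:
            raise ValueError(f"unknown control block: {node.tag!r}")

    def indices(self):
        """Indices of the connections that are currently open."""
        return sorted(self._open)
```

A service loop would feed each block it reads from stdin to `handle_control` and keep serving every index still listed by `indices()`, so no new process is needed per connection.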
On a side note, please help the Hurd developers by trying Debian
GNU/Hurd out; it's a chance to pioneer the next generation of GNU.
Thanks to Rich333 for alternative names to "Network SEE".
**********************************************************************
Older versions:
"Plan to throw one away; you will, anyhow."
Version 2.0.x: Totally ignoring the assertion in use case 5(c),
which was there then, I became obsessed with raw network
communication through the transports. While this is admirable in
terms of librarization, it doesn't work in terms of protocol.
Version 1: A multi-mode, multi-threaded monolithic daemon that could
talk across the network, listen across the network, deliver
service-stubs for services that would be required to listen on their
own ports, and each user would start up his/her own daemon.
Consists of the following:
Runner-listener: listens for the user to run a stub.
Plugin-listener: listens for plugin information requests
ignorer: The top SEE on the machine, listens for coord. between others.
whiner: Other SEEs, inferior to ignorer
server: Listens on network
client: sends requests to other SEEs
pluginAPI: How programs can talk to the plugin-listener.
runningAPI: How programs can talk to the Runner-listener.