Copyright (C) 2002 Stephen Compall, licensed under the GNU Free Documentation License, version 1.1 or later.

The Network SEE
***************

This is the Network SEE architecture, otherwise known as "Compall Analytic Engine" or "the 00110110" :), version 2.1.

This version is a bit more like identd, and in fact, I intend to base much of the code on Serveez, which is a more generic identd with lots of useful portable-server stuff. That's good because I don't intend to ever build this thing myself with M$ VC++. Also, I noticed that the components are somewhat independent, so I have settled on a process-based model, rather than a thread-based model.

The design is inspired by the GNU Hurd. As a matter of fact, a great deal of the IPC work could be done better with Mach messaging; as the SEE is a GNU project, it is intended to work best with the GNU system, so some changes for the Hurd will be in order. Watch.

Version 2.1 is also a little more service-focused, with more thought given to the request/reply architecture. All connection methods must have a request/reply method, and all must support URLs for each one. Services and clients receive bidirectional "put" and "get" pipes.

The daemon, startsee (or whatever), detaches and starts up a few subprocesses, to handle the different aspects of the SEE:

seedaemons: The portion that automatically starts up webservices (servers) as they are needed.

see*port: Listens for "remote" requests to connect to webservices hosted by "this" SEE. Also serves those requests, acting as a proxy.

see*user: Listens for local requests to connect to webservices hosted by other SEEs.

The *s indicate that multiple daemons are supported. For example, you may have a SOAP port that operates on port 80 and accepts both SOAP and HTTP, passing them to appropriate daemons run in the seedaemons (or just proxy-forwarding them to apache!); another seeport might be a Jabber "client" that really acts as a server.
The see*users handle the opposite side of these communications, sending requests to seeports on other machines.

Note: these systems only strip away the *transport* layer of the communications protocol used.

seeport
-------

This abstracts away *transport mechanisms*, not the data that goes over the transport. Services have two ways in which to react to different uses of mechanisms:

* Ignore it: always use the same data protocols. This gives people with different connection capabilities different ways in which to connect to your service.

* Use as metadata: Assume the capabilities of the client are dependent on the transport used. For example, an XML-RPC or Jabber talk would send straight XML data and expect a client app to offer a UI, but an HTTP GET/POST would give HTML back.

Say we have an application that accepts registrations in this XML form:

Rhys Weatherley Qt/Embedded DotGNU Portable .NET Tree Compiler-Compiler Unix(TM) V7 Upgrade One of those "Free Software" geeks...we don't have to pay him much.

Ok, now you have the data, which is ideally separated from the method of communication used. (In a perfect world...) But you could wrap this in a SOAP envelope, use straight HTTP PUT, send to a Jabber ID, get from a Unix(TM) named pipe, whatever.

Every time a service on the local machine is requested, the preferred seeport agreed upon by the two systems accepts a connection. A connection is defined as a route over which the service can receive requests and send replies; e.g., HTTP/1.0 can still be considered a connection in the abstract, but you must reestablish a concrete network connection for each request/reply series. If a seeport/seeuser pair so desires (and it's allowed by the protocol), they can maintain a persistent connection across requests, and even across services and clients, as long as this sharing doesn't hold up other services/clients for the benefit of one. So if it's overloaded, make another connection anyway!
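
The two ways a service can react to the transport (ignore it, or use it as metadata) might be sketched roughly as follows. This is only an illustration: the transport labels ("jabber", "xml-rpc", "http"), the policy names and the `reply_format` helper are all hypothetical, and nothing here is defined by the SEE itself.

```python
# Hypothetical transport labels; the SEE spec does not fix any identifiers.
XML_TRANSPORTS = {"jabber", "xml-rpc"}
HTML_TRANSPORTS = {"http"}


def reply_format(transport, policy="metadata", default="xml"):
    """Pick a reply format for a request that arrived over `transport`.

    policy="ignore":   always answer in `default`, whatever the transport.
    policy="metadata": guess the client's capabilities from the transport.
    """
    if policy == "ignore":
        return default
    if policy != "metadata":
        raise ValueError("unknown policy: %r" % (policy,))
    name = transport.lower()
    if name in XML_TRANSPORTS:
        return "xml"   # the client application is expected to offer its own UI
    if name in HTML_TRANSPORTS:
        return "html"  # the client is most likely a web browser
    return default     # unknown transport: fall back to the default format


if __name__ == "__main__":
    for transport in ("Jabber", "HTTP", "IRC"):
        print(transport, "->", reply_format(transport))
```

Either way, the data protocol is the service's decision; the seeport only tells it which transport was used.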
Another agreed-upon seeport also opens up a "connection", but this time, it is for the client. Why? This is a more P2P way of interfacing. It also allows for more intelligent asynchronous servers and clients. The current model doesn't allow for out-of-order replies to requests, but a service/client pair can hack these with service-to-client requests.

Also, this is a way to support different types of "interfaces." For example, we use the special format over Jabber, but if you get an HTTP POST of regular form data, you would react differently, sending back a web page. (And a GET would give you the form!) This is an example of something the seeport is *not* supposed to strip away, because it deals with data, not the transport mechanism. It would be feasible to hack Phoenix to get pages via Jabber as well as HTTP. Your choice.

seeuser
-------

These subprocesses implement the transport mechanisms on the other side. Because different sorts of transport mechanisms use different addressing methods, your own application will need to handle the addressing method of each transport you wish to support. However, beyond that, what you do over the different transports is up to how the service reacts to different transport mechanisms.

You will usually want to use RLS to access a new service. This will have the SEE send you information about which interface it should use; this information may contain a `service stub', or a portable executable that contains all the right RPC calls.

seeuser uses URLs to find the seeport to connect to. When a client is received, the RLS system will set up a connection through one of these to allow the client program to connect to it through FIFOs, and the seeuser connects to the seeport through the given URL. Again, as with seeports, connections can be cached and queued, as long as one request/reply sequence doesn't get in the way of others.

Every service that accepts a connection will make a connection back through seeuser to the client.
The reason for this is described in the previous section.

Why is this a program, not a library? To allow more language independence. Think of it as a library implemented over IPC.

seedaemons
----------

This runs all the different PEs that are registered as services this machine should offer. It runs the VMs with the PEs as subprocesses owned by whatever user insisted that the service should run on this machine. Also, it informs the service of new client connections. All connections that originated in the SEE will continue to go through the SEE; see Use Cases. This operates by way of a private FIFO collection.

seeauth
-------

The truth is, I don't know much about how the auth system is going to work. But we do have two requirements:

1. When a service user connects to the service, he/she must send identity information to the service,

2. and the service must identify itself immediately to the user.

The SEE handles the latter, but only provides a capability for the stub to easily do the former. This is so the user can trust the service before loading the PE; the reverse is unnecessary, because the stub is provided by the service.

The assumption here is that we will need some kind of persistent state, and that is what the seeauth component is all about: to hold that state. An actual seeauth system is not included in this specification, because I really don't know how to do it.

Finally, because the auth systems are interoperable, the user should never know what sort of auth system is in use; it should be a single-item interface.

RLS
***

These are the resource locator strings defined by DotGNU. Their purpose in SEE's realm is to specify a webservice without specifying the method of communication with the webservice.
For example, jabber://sirian@theoretic.com/SEE/echo may connect you to the echo server over Jabber, and http://antares.evansville.edu/echo may connect you over HTTP (either SOAP or straight HTML), but neither specifies the echo service without also specifying the method of communication. rls://csserver.evansville.edu/~sc87/echo is an unambiguous specifier.

It is good to think of the RLS server as a nameserver, to point you in the direction of the service. It operates as a transport mechanism, because it is special that way; we can't do it as a webservice itself, because webservices need to be resolved.

The rls nameserver that operates on a system port is reserved and nonduplicatable by users, for obvious reasons; however, Jabber-based rls nameservers are acceptable, e.g. rls://sirian@theoretic.com and rls://voridor@jabber.com could be on the same machine, but connected as different Jabber users. The assumption here is that if there is a user field in the URI, it is a Jabber address; but if there is no user field, it operates on the port. Note that this is abuse of the URI syntax; a proper user field (where sirian is above) is used to identify the client, not the server. But this is how it is in Jabber world.

Also! Jabber is not essential to the operation of the SEE.

Finally, the port for the non-Jabber RLS is 39879.

Services
********

The sorts of operations a service must support have changed somewhat from 2.0.1 to 2.1, in order to focus on the webservice easy-implement model, while still being truly language-independent (language bindings for an API don't count as language independence).

A service must accept small XML blocks that give 4 filenames. The service must open 2 of these for reading, and 2 of these for writing (which is which will be defined in the XML). If the service only receives 2 filenames, then the SEE on the service machine was unable to establish a connection back to the requesting machine. Therefore, requests to the client are not available.
The service must be ready for this, and kill the connection if, say, the service needs to be P2P.

TODO: lookup stream-function behavior on FIFO so we can decide how to end a request.

SMALL NOTE ON LATENCY: The other aspect of the SEE <-> service protocol currently defined is the latency test. This is a string, "", to which a service program should respond with "" if it is ready to immediately accept a request. The SEE is not required to then provide a request, so do not assert that the next message will be that request. If the service takes too long to ack, seedaemons will start up another copy of the service. Beware: this can result in a race condition, with an overloaded system leading to the SEE putting even more of a load on it. The answer is obvious: build good services.

Configuration
*************

This doesn't cover the idiot options, whose importance will be realized during development and added to this spec then. These are the major elements needed for a working SEE:

* ability for users to add/remove their own webservices, either at boot-time or on demand, subject to allowance by the superuser/owner of SEE.

* ability for users to add/remove their own transport-layer handlers (seeports). This is especially useful in the realm of Jabber, where each user could have his/her own virtual server. The owner of the SEE will be able to moderate these requests as he/she sees fit.

Interface
*********

The SEE itself offers multiple services. I don't know how to describe them concisely yet; see above. Or better yet, continue.

Asterisk
********

There has been some confusion about the use of the * symbol in see*user and see*port descriptions, as well as the seedaemons daemon.

The * is a shell glob; there is more than one of each of these, and they are recognized by their names beginning with what comes before the * and ending with what comes after the *. As every seeport has a matching seeuser, startsee uses these names to decide what transports can be agreed upon for communication.
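
The name matching described above could be sketched like this. It is only a guess at the mechanism: the helper names and the idea of comparing two plain lists of daemon names are assumptions made for illustration, since the spec does not say how startsee learns which daemons the other machine runs.

```python
from fnmatch import fnmatchcase


def transports(daemon_names, role):
    """Return the transport names found in `daemon_names` for one role.

    `role` is "port" or "user"; a daemon counts if its name matches the
    shell glob see*port or see*user, and the transport name is whatever
    the * matched (for example "IRC" for seeIRCport).
    """
    prefix, suffix = "see", role
    pattern = prefix + "*" + suffix
    return {
        name[len(prefix):len(name) - len(suffix)]
        for name in daemon_names
        if fnmatchcase(name, pattern)
    }


def usable_transports(replying_daemons, requesting_daemons):
    """Transports with a seeport on the replying side and a seeuser on the requesting side."""
    return transports(replying_daemons, "port") & transports(requesting_daemons, "user")


if __name__ == "__main__":
    # Hypothetical daemon lists for two machines.
    replying = ["seedaemons", "seeSOAPport", "seeIRCport"]
    requesting = ["seeSOAPuser", "seeJabberuser"]
    print(sorted(usable_transports(replying, requesting)))  # prints ['SOAP']
```
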
For example, if seeIRCport is available on the replying machine, but seeIRCuser (of some sort) is not available on the requesting machine, then the transport `IRC' can't be used.

I don't know whether there will be different running auth daemons or not at this time. But for this architecture, auth is very basic, so there is only one.

The term `seedaemons' refers to a single daemon. seedaemons keeps all the service daemons (which *are* separate, and are not called seedaemons) alive, and sends new connections to them, kills, receives kills, etc.

Use Cases
*********

Yet more abuse of terminology :]

Here are some examples of things you might do in the SEE framework. Things in [] are explanations of why things are the way they are; they can be skipped. Also, where the method of IPC isn't defined, there is a ?; the options are probably pipe, FIFO (named pipe), or socket.

1. I want to communicate with the echo server at rls://csserver.evansville.edu/sc87/echo! Here's what happens:

   a. I run a simple program that connects to seeRLSuser via ? and gives it the rls.

   b. seeRLSuser attempts to connect to whatever sort of RLS the URI specifies, Jabber or Internet port.

   c. Upon successful connection, the seeRLSport (or seeJabberRLSport) on the server retrieves information about the different interfaces (e.g., Portable .NET GUI, HTML) available for the service, and returns it to the client.

   d. The client picks one of these methods, and tells the server about it.

   e. The server returns a data package to match what the user wanted. This could be a URL for a pseudo-website for HTML, for example, or a full PE. Also returned is information about the service's virtual identity. [It isn't returned until now because it may not be necessary; you need auth for a PE because it's a trusted action, but HTML doesn't require that much trust.]

   f. Whatever your return type, you'll probably want to connect to the actual server now.
      So you'll do that by starting the PE or loading the web-URL up in a browser. (seedaemons can analyze the PE---or URL, you should consider the URLs as PEs---and pass back a command-line to execute. But it is the user's choice what interface to use, and it is your runner program's responsibility to filter certain kinds you don't understand.) The PE will connect, and using the appropriate protocol, talk to one of the seeusers on the system. (Theoretically, a webservice author could skip the seeports, but that would eliminate their benefits: multiple services on 1 port, for example.)

   g. seeRLSport and seeRLSuser will negotiate another transport that is available on both machines, *and* with the service in question. You can ignore this; information about the thing to use will be included in the PE execution line, the PE will open the files it receives and never know the difference.

   h. Anyway, if the echo service at that URL isn't running, the seedaemons pool will start it, and the seeport will link the client to the service through itself like this: client <-> seeport <-> service. [The seeport-service link will probably be a FIFO. We can't use a pipe, because the comm link is created by seedaemons.]

2. I am a luser, and I want to add my own echo server!

   a. Well, since RLS is based on URI, add a path from your home directory (or if system, from ??). If the "virtual path" you want through RLS is "/luser/echo", then the path to the file would be `~/SEE/Services/luser/echo'. This is a Portable Executable that seedaemons can start.

   b. Then, for each interface you are offering, there is a file named like the service, but suffixed with the type: echo-HTML for HTML, echo-PNET for CIL executables. HTML should contain a URL that can be loaded in a web-browser to access a service, and PNET should contain a PE! These types are not required; you may use any suffix you like, as long as it is supported by users!

   c. You need a service stub.
      This consists of writing a program that accepts two filenames. You will open one of these for reading, and the other for writing. As far as you can tell, these are direct links to the service daemon on the other machine.

   d. Finally, you have to tell SEE that the service is available. This is only for first-time; on each subsequent boot, SEE will scan this directory for services. Send your username to SEE through ?, followed by newline, and it will do a rescan. [If it receives more than, say, 10 of these requests in a second, it will stop listening for a whole minute.]

3. I am connected to the echo server, and I want to download the service and use it on my machine!

   Well, I don't exactly know how that's going to work, maybe you can figure it out. After the download, however, it should probably go into ~/SEE/Private.

4. I wish to create a new transport mechanism!

   a. Well, first, you can just implement this in your webservice like a normal standalone server would. You still get the convenience of automatic client distribution and automatic startup, and can still access the other seeports. However, this just isn't nice.

   b. Write a seeport. This involves listening for remote connects, stripping out a path to the service, giving the path to seedaemons, and kindly accepting the returned ?, whereupon you will connect to the service daemon and pass messages, first stripping off the non-service data for the service, then wrapping it back up when returning to the client.

   c. Write a seeuser. This does the same thing as the seeport, but in mirror. You should be able to take connect-data in the form of a URI, figure out what to do with it to get to the seeport, and that's it.

   d. Release your transport under the GNU General Public License, and contribute to the DotGNU SEE project. This is the way your program can be of the greatest possible use to everyone, including yourself.

5. I don't want seedaemons to start up a new instance of my service every time I start up!

   a. Here's what happens in a SEE-compatible webservice: it starts up, and is passed an index and two filenames in stdin, in an XML block. Here's an example of what that block could be:

      /var/SEE/Connections/~sc87/echo.3.in
      /var/SEE/Connections/~sc87/echo.3.out

   b. It opens the first of these for reading, and the second of these for writing, and stores them with the index. This is an open connection.

   c. Communication is always done in blocks; requests and replies, that is.

   d. When the connection is remote-closed, the service reads from stdin an XML piece telling it the connection is closed. It closes the files matching the index passed with the XML.

   e. When the connection is service-closed, the service writes an XML piece on stdout just like the one in 5d.

   f. Oh, how does SEE know the service is SEE-compatible? All services are SEE compatible. If they're not, change them!

On a side note, please help the Hurd developers by trying Debian GNU/Hurd out; it's a chance to pioneer the next generation of GNU.

Thanks to Rich333 for alternative names to "Network SEE".

**********************************************************************

Older versions:

"Plan to throw one away; you will, anyhow."

Version 2.0.x: Totally ignoring the assertion in use case 5(c), which was there then, I became obsessed with raw network communication through the transports. While this is admirable in terms of librarization, it doesn't work in terms of protocol.

Version 1: A multi-mode, multi-threaded monolithic daemon that could talk across the network, listen across the network, and deliver service-stubs for services that would be required to listen on their own ports; each user would start up his/her own daemon. Consists of the following:

Runner-listener: listens for the user to run a stub.
Plugin-listener: listens for plugin information requests
ignorer: The top SEE on the machine, listens for coord. between others.
whiner: Other SEEs, inferior to ignorer
server: Listens on network
client: sends requests to other SEEs
pluginAPI: How programs can talk to the plugin-listener.
runningAPI: How programs can talk to the Runner-listener.