557 – Advanced kernel-level clustering concept outline

Summary: Advanced kernel-level clustering concept outline

Status:	REJECTED INVALID

Alias:	None

Product:	Process Management
Classification:	Unclassified
Component:	Other
Hardware:	i386 Linux

Importance:	P2 normal
Assignee:	Bugme Janitors Team

URL:
Keywords:

Depends on:
Blocks:

Reported:	2003-04-08 13:16 UTC by mlmoser
Modified:	2003-04-08 14:02 UTC
CC List:	0 users

See Also:
Kernel Version:	Future
Subsystem:
Regression:	---
Bisected commit-id:

Description mlmoser 2003-04-08 13:16:44 UTC

Distribution:  All future
Hardware Environment:  All must be of compatible architecture
Software Environment:  Transparent alterations to process and thread management
Problem Description:  I have a bunch of s****y computers

Steps to reproduce:  Upgrade and get new machines over the years

What I wish to propose is a system called RAIP-U (Redundant Array of
Interdependent Processing Units, "rape you").  This allows a fault-tolerant
(semi-FT?) method of doing parallel processing using many boxes and their RAM.
I will outline it lightly here; more detail may appear later, but I don't know
what goes on inside process management.

Alterations to the fundamental process and thread management are obviously
needed; the exact same interface has to be able to take a process ID or thread
handle, find which machine it lives on, and find who holds the backups.

The first thing is the easiest: connectivity.  The protocols should work over
TCP/IP, direct (crossover) connections with a special protocol (yes, with NICs,
but a switch/hub can't be used), serial lines (serial-serial, parallel ports,
even USB-USB), or anything else you can think of.  It should be tight, to
reduce latency.  This is the most important part.

Next is what is communicated and how things are handled.  I will break this
down.  Each part should be as independent as possible (so RAM and HW sharing
can be on while process load sharing is off).


PROCESSES

The main thing is process transfers.  A central master sends jobs out to one of
the slaves, chosen by its stats and loads.  This job is a thread or a process. 
The job always has the slave/job ID of its owners (related processes, ones that
forked it or something, or which process the thread is a part of) and related
threads, as well as the Master/job ID of the host process (and the Master
system).  When these slaves make a new thread or fork a process, they make their
own decisions, pass it to another slave or back to the master, and still give
the same data (slaves never become masters).
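
The placement rule above could be sketched roughly like this.  It is only an
illustration in Python, not kernel code: the Node and Job fields, pick_node,
and spawn are all made-up names, and "stats and loads" is assumed to mean CPU
load plus free RAM.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    cpu_load: float      # fraction of CPU in use, 0.0 to 1.0 (assumed stat)
    free_ram_mb: int     # assumed stat


@dataclass
class Job:
    job_id: int
    master: str                     # the Master system the job belongs to
    master_job_id: int              # ID of the host process on the Master
    parent: Optional[tuple] = None  # (node name, job ID) of whoever forked it
    related: list = field(default_factory=list)  # (node, job ID) of sibling threads


def pick_node(nodes, needed_ram_mb):
    """Return the least-loaded node with enough free RAM, or None."""
    candidates = [n for n in nodes if n.free_ram_mb >= needed_ram_mb]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.cpu_load)


def spawn(parent_job, parent_node, new_id, nodes, needed_ram_mb):
    """A slave forking a child decides placement itself, but the child
    still carries the same Master identity (slaves never become masters)."""
    child = Job(job_id=new_id,
                master=parent_job.master,
                master_job_id=parent_job.master_job_id,
                parent=(parent_node.name, parent_job.job_id))
    return child, pick_node(nodes, needed_ram_mb)
```

The node list passed to spawn can include the Master itself, which covers the
"pass it ... back to the master" case.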

When processes have to send data, a buffering scheme should watch how far the
job runs without a break in the data sending.  For example, if it sends data
and then executes N instructions without sending more, halts, has already
queued X bytes of data to send through IPC, or otherwise seems to need the data
sent out NOW, the data is sent.  It goes to whatever machine the target process
or thread is on.  No buffering is done at all if the target job is on the same
machine; IPC works EXACTLY the same if at all possible.
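
As a rough sketch of that flush policy (Python for illustration only): the
SendBuffer class, its method names, and the default values for N and X are all
assumptions, and a real version would get the instruction count from the
scheduler rather than from a tick() call.

```python
class SendBuffer:
    """Hypothetical outgoing IPC buffer for one job-to-job link."""

    def __init__(self, send, same_machine, max_bytes=4096,
                 max_idle_instructions=10000):
        self.send = send                    # callable that ships bytes out
        self.same_machine = same_machine    # target job is local
        self.max_bytes = max_bytes          # the "X bytes" limit
        self.max_idle = max_idle_instructions  # the "N instructions" limit
        self.pending = bytearray()
        self.idle = 0

    def write(self, data):
        if self.same_machine:
            self.send(bytes(data))   # local IPC: no buffering at all
            return
        self.pending += data
        self.idle = 0                # sending resets the idle count
        if len(self.pending) >= self.max_bytes:
            self.flush()

    def tick(self, instructions):
        """Account for instructions executed without sending more data."""
        self.idle += instructions
        if self.pending and self.idle >= self.max_idle:
            self.flush()

    def halt(self):
        """The job stopped, so nothing more is coming: send what we have."""
        self.flush()

    def flush(self):
        if self.pending:
            self.send(bytes(self.pending))
            self.pending.clear()
        self.idle = 0
```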

From the job's point of view, the Master is always the physical hardware and
software that the job is on.  Always.  If the job asks about its hardware or
tries to write to hardware or a driver, the request is sent to the Master, which
handles it and responds as if the job were running on the Master.

If a job seems to need extensive communication with another job or with the
master, it is relocated to the machine that has that job or to the Master.  If
it seems particularly... pointless... it may be relocated to a Slave IF and only
if the Master is starting to reach 95%+ CPU usage.
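
One way to read the relocation rules, sketched in Python.  The function name,
the message-count threshold, and the low_priority flag (my guess at what
"pointless" would mean in practice) are assumptions; only the 95% figure comes
from the text above.

```python
def relocation_target(job_node, master, slave_loads, msgs_by_node,
                      master_cpu, low_priority,
                      chatty_threshold=1000, cpu_limit=0.95):
    """Return the node a job should move to, or None to leave it alone.

    slave_loads:  slave name -> CPU load (0.0 to 1.0)
    msgs_by_node: node name -> messages exchanged with jobs on that node
                  during the last sampling window (assumed metric)
    """
    # Rule 1: follow heavy communication to wherever the partner lives,
    # which may be another slave or the Master.
    remote = {n: c for n, c in msgs_by_node.items() if n != job_node}
    if remote:
        peer, count = max(remote.items(), key=lambda kv: kv[1])
        if count >= chatty_threshold:
            return peer
    # Rule 2: push unimportant work off the Master only when it is
    # close to saturated.
    if (job_node == master and low_priority
            and master_cpu >= cpu_limit and slave_loads):
        return min(slave_loads, key=slave_loads.get)
    return None
```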

If a job sends or receives data relatively infrequently, it may be subject to
RAIP-U fault tolerance.  If it appears safe and easy, you could shove a copy of
the job in its current state on another Slave or back on the Master.  Then, if
that job blinks out, the related jobs and the Master would cause a switchover to
a machine with a fault tolerance copy.
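
The backup-and-switchover idea could look something like the following Python
sketch.  The Cluster class and its methods are invented for illustration, and a
job's "current state" is stood in for by a plain dictionary.

```python
import copy


class Cluster:
    """Hypothetical bookkeeping for primaries and their FT copies."""

    def __init__(self):
        self.primary = {}   # job ID -> (node, live state)
        self.backup = {}    # job ID -> (node, state snapshot)

    def run(self, job_id, node, state):
        self.primary[job_id] = (node, state)

    def checkpoint(self, job_id, backup_node):
        node, state = self.primary[job_id]
        if backup_node == node:
            raise ValueError("backup must live on a different machine")
        # Deep copy so later changes to the live job don't touch the snapshot.
        self.backup[job_id] = (backup_node, copy.deepcopy(state))

    def node_failed(self, dead):
        """Switch jobs from the dead node to their backups.
        Returns the IDs of jobs that had no usable copy."""
        lost = []
        for job_id, (node, _) in list(self.primary.items()):
            if node != dead:
                continue
            if job_id in self.backup and self.backup[job_id][0] != dead:
                self.primary[job_id] = self.backup.pop(job_id)
            else:
                del self.primary[job_id]
                lost.append(job_id)
        # Any snapshots stored on the dead node are gone too.
        self.backup = {j: b for j, b in self.backup.items() if b[0] != dead}
        return lost
```

Note that the job resumes from the snapshot, not from where it died, so
anything it did after the last checkpoint has to be redone.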

An API to this should be given, allowing the process to disable auto-relocation,
auto-fault tolerance, and automatic buffering.  This will allow the jobs to
handle these most efficiently.  For example, a music program may thread its
mixing thread, place it on a slave, give itself a large buffer on output, and
then the mixer may make a fault-tolerance backup.  Then, if the machine with the
mixer dies, the job that handles it will be notified (via API call with a
previously passed function pointer), and the job will be able to readjust to
resend any sent data that wasn't processed and returned, then tell the mixer job
to make another RAIP-U FT backup.
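
A sketch of what such an API might feel like from the music program's side, in
Python for readability.  JobControl, MixerFeed, and every method name here are
hypothetical; the real thing would presumably be a C interface taking a
function pointer, as described above.

```python
class JobControl:
    """Hypothetical per-job handle for the RAIP-U knobs."""

    def __init__(self):
        self.auto_relocate = True
        self.auto_fault_tolerance = True
        self.auto_buffering = True
        self._failure_callbacks = []

    def on_failure(self, callback):
        """Register a function to call when a watched job's machine dies."""
        self._failure_callbacks.append(callback)

    def notify_failure(self, job_name):
        """Called by the cluster layer, not by the application."""
        for callback in self._failure_callbacks:
            callback(job_name)


class MixerFeed:
    """Application side: remembers chunks sent to the mixer but not
    yet processed and returned."""

    def __init__(self):
        self.unreturned = {}   # sequence number -> chunk
        self.resent = []

    def sent(self, seq, chunk):
        self.unreturned[seq] = chunk

    def returned(self, seq):
        self.unreturned.pop(seq, None)

    def mixer_died(self, job_name):
        # Queue everything still outstanding for resending, in order.
        self.resent = sorted(self.unreturned)
```

The program would turn off the automatic behavior it wants to manage itself
(for example, ctl.auto_relocate = False) and register feed.mixer_died with
ctl.on_failure().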

As a final note, it may be possible to use other machines with similar, more
advanced hardware to get around h/w incompatibility (e.g. an Athlon's 3DNow!
when you just have a 386 on the Master).



RAM

Oh, there's more to this thing than just process sharing.  The next module is
inter-boxen RAM.  This is an extension of virtual RAM.  Basically, you don't
map RAM based on job; you map RAM as... well... blocks of RAM.  So just as a
30 MB partition on your HDD is a virtual RAM block, so is a 200 MB segment of
the 4 GB of RAM on Slave A.  You could use a set of machines (diskless even) as
RAM, even to the point that each keeps just whatever its OS needs and offers
the rest as shared RAIP-U RAM.
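
To make the "blocks of RAM" idea concrete, here is a small Python sketch of a
block map that treats a local swap partition and a chunk of a slave's RAM as
one pooled space.  BlockMap and the backing names are made up, and offsets are
in whole megabytes just to keep the numbers from the example above.

```python
from bisect import bisect_right
from itertools import accumulate


class BlockMap:
    """Hypothetical table of virtual RAM blocks, in pool order."""

    def __init__(self, blocks):
        # blocks: list of (backing name, size in MB)
        self.blocks = list(blocks)
        # Running totals give the end offset of each block.
        self.ends = list(accumulate(size for _, size in self.blocks))

    def total_mb(self):
        return self.ends[-1] if self.ends else 0

    def locate(self, offset_mb):
        """Translate a pool offset to (backing name, offset inside block)."""
        if not 0 <= offset_mb < self.total_mb():
            raise IndexError("offset outside the pooled space")
        i = bisect_right(self.ends, offset_mb)
        start = self.ends[i] - self.blocks[i][1]
        return self.blocks[i][0], offset_mb - start
```

With BlockMap([("local-swap", 30), ("slaveA-ram", 200)]), offsets 0 to 29 land
on the local partition and offsets 30 to 229 land on Slave A.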

You should be able to use this to indirectly access virtual RAM, but only the
Master manages this, lest the stupid Slaves make a loop (Give me some of your
RAM as my VRAM, which is in turn some of your VRAM that you got from someone
else's RAM...).
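
Since the Master is the only one handing out RAM, it can refuse any grant that
would close a loop.  A minimal Python sketch of that check follows; the
LendingTable name and the lender/borrower graph are assumptions about how the
Master might track it.

```python
class LendingTable:
    """Hypothetical Master-side record of who lends RAM to whom."""

    def __init__(self):
        self.lends_to = {}   # lender -> set of borrowers

    def _reaches(self, start, goal):
        """True if RAM lent by `start` already flows, directly or
        through other nodes, to `goal`."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.lends_to.get(node, ()))
        return False

    def grant(self, lender, borrower):
        """Record lender -> borrower unless it would create a loop."""
        if lender == borrower or self._reaches(borrower, lender):
            return False
        self.lends_to.setdefault(lender, set()).add(borrower)
        return True
```

So after A lends to B and B lends to C, a request for C to lend to A is
refused, while A lending to C directly is still fine.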



HARDWARE

This is the scariest part.  Each Slave holds a definition of its own hardware,
if it has drivers for it or can otherwise write to/read from it.  This allows
the Master to map out devs for the Slaves' devices (burners, HDDs, etc.) in
/dev, and (somehow) mess with the drivers to communicate over the RAIP-U
connection to share, say, HARD DISKS!  (YES!)  Some 400 SCSI disks maybe?  80
USB2.0 ports?
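
Leaving the driver part aside, the bookkeeping could be as simple as the Python
sketch below.  DeviceRegistry is a made-up name and the /dev/raipu/<slave>/<dev>
layout is purely an assumption for illustration, not an existing convention.

```python
class DeviceRegistry:
    """Hypothetical Master-side map from /dev entries to Slave devices."""

    def __init__(self):
        self.remote = {}   # local path -> (slave name, device name there)

    def register(self, slave, devices):
        """devices: names the Slave reports it can read from or write to."""
        paths = []
        for dev in devices:
            path = f"/dev/raipu/{slave}/{dev}"   # assumed naming scheme
            self.remote[path] = (slave, dev)
            paths.append(path)
        return paths

    def route(self, path):
        """Where should I/O on this path go?  Unknown paths stay local."""
        return self.remote.get(path, ("master", path))
```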



NetWare/RAIP-U

The kernel should be able to send a copy of itself to act as a dedicated RAIP-U
server if it gets a NetWare or RAIP-U network boot request (diskless machines
using BOOTP?).  This should work, provided the machines are of similar
architecture.




Go think about it, I don't know if I missed this.  Bye.

--Bluefox
