Proposed Keen Bulletin Board design.
----------------------------------------------------------------------

Administrivia.
====================

This is a proposed design for the Keen Bulletin Board (KBB) system. This document was written by Simon Fowler, incorporating ideas from Garth Wallace and Eric Schissel.

Comments on this document should be sent to Simon Fowler. Comments on KBB in general should be sent to one or all of Simon Fowler, Garth Wallace, or Eric Schissel, or posted to the KBB list.

More information can be found on the KBB homepage: http://himi.org/kbb/

This document is released under the GNU General Public License, version 2 or greater.

This version: $Id: kbb-design.txt,v 1.1 2002/10/05 17:16:49 simon Exp $

Introduction.
====================

Keen Bulletin Board is a web-based bulletin board, intended to be a stand-alone system (i.e., not dependent on anything other than a CGI-capable web server and a standard Perl installation).

Design Overview.
====================

The KBB system is made up of two major components: the data files and the directory hierarchy supporting them, and the Perl code that manipulates and maintains that data.

The data can be broken down into several elements: posts (the data received directly from the poster, along with some meta-data related directly to the post), structural meta-data, which is used to maintain threads, topics and the overall site, and presentation data, which is what the end user actually sees in their browser.

The system code can be divided into two major components: the frontend code, which generates and maintains the presentation data, and the backend code, which stores post data and maintains the structural meta-data. (user data?)

Data.
====================

(note: I'm currently ignoring user data, because it's basically separate from the rest of the data . . . )

All data in the KBB system is stored as flat ASCII text files in a directory tree.
Conceptually, a site is structured as a tree, rooted at the site level, with a set of topic nodes below that, threads below them, and individual posts as leaf nodes. This tree structure is represented primarily by the directory tree: the site is rooted at a base directory, with topics as subdirectories of this base; threads are similarly subdirectories of the topic directories. The posts themselves are stored as files in the appropriate thread directory.

Post and Meta-Data.
--------------------

Meta-data about non-leaf nodes is stored in that node's directory; post meta-data is stored as a header on the post's file (to reduce the number of files stored - it'd get _really_ big otherwise).

Posts.
---------------

All data about a particular post is stored in a single flat ASCII file in the post's thread directory. The file is made up of two sections: a header, which contains the post's meta-data (the username of the poster, the time it was posted, etc), and the body of the post, which contains the unparsed data, directly from the posting form. This data is generated at the time of posting, and should remain unchanged unless explicitly edited by the poster.

Threads.
---------------

A thread is made up of a set of post files stored in a directory, with a thread meta-data file recording the subject, creator, time it was created, time it was last updated, etc. The thread meta-data file is updated every time a post is made, by the process that handles new posts.

(note: it may be an idea to have the unchanging data in one file and the highly variable data in another - this would minimise the chances of corruption)

Topics.
---------------

A topic is made up of a set of thread directories, with a thread summary file for each one, and a topic meta-data file. A thread's summary is updated by the new post handler every time the thread's meta-data file is updated, as is the topic meta-data.

The Site.
---------------

A site is made up of a set of topic directories, with a summary for each one, and a site meta-data file. The topic summaries are updated by the new post handler whenever the topic's meta-data file changes. The site meta-data should be almost unchanging.

Presentation Data.
--------------------

(This is your stuff, gwalla . . . Feel free to fix it up however you want . . . ;-)

Presentation data is basically the HTML representation of all the other data.

Posts.
---------------

Post presentation data is generated by passing the post body through the KBBCode parser, to produce an HTML fragment that is stored in a .fhtml file.

Threads.
---------------

The presentation version of a thread is essentially generated by taking the .fhtml version of each post and concatenating them together with a header and a footer (that's why the .fhtml files, btw - simple generation of the thread file). Long threads are broken up over multiple pages - these are generated the same way, using a subset of the post files for each page.

(question: should _all_ pages of a multiple page thread be updated at once, or merely the last page? Updating all of them would be more consistent with UBB, but also take a hell of a lot more work . . . )

Topics.
---------------

A topic's presentation version is generated similarly to a thread's: the thread summaries are concatenated to produce a set of HTML files containing a list of threads, sorted by most recent update.

(similar question regarding updating all the pages . . . )

The Site.
---------------

The site's presentation version is generated much the same way as the topic, using the topic summaries.

Code.
====================

Backend.
--------------------

The backend code is made up of a set of programs, each designed to handle a single task, providing the service to other programs via a simple message passing mechanism.
Requests and data are passed from the frontend code to the appropriate backend servers, which then process the data and update the site accordingly. The backend programs are concerned with maintaining the site's structure and meta-data, and handling the storage of the post data.

The intention is to make the system as parallelisable as possible, and hence as scalable as possible - to handle more traffic, simply create more servers.

KBB_IPC.pm
---------------

KBB_IPC.pm is a module defining a simple, two-way, Unix-socket-based IPC mechanism, designed to pass the contents of a hash between processes. The hash's contents are key/value pairs - the key is the name of a property defined in the interface of the server that is being communicated with; the value is simply the value of that property.

The properties passed in a communication session are defined by the server process: this defines an interface that should be used exclusively by both sides of the communication. The interface should consist of a request from the client to the server, containing the requested command and the data required by the server to complete it, and an optional reply. Aside from those general constraints, a server can define an arbitrary interface.

KBB_Defines.pm
---------------

KBB_Defines.pm provides definitions for various shared values, most notably the paths for the various servers' communication sockets. Also included are paths for the base of the data and presentation trees, for the html-docs root, and so forth. Any server that needs to export an interface via KBB_IPC should put its server socket path in KBB_Defines, in order to centralise the information required for communication.

new-post.pl
---------------

new-post.pl is the core of the backend - it is the process that handles requests directly from the frontend CGIs, controlling the overall progress of the processing of a post.

Question: should new threads be handled here, or via a dedicated process?
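
(note: a minimal sketch of the kind of hash passing that KBB_IPC.pm is described as providing, written in Python for brevity rather than the Perl that KBB itself uses. The wire format (a 4-byte length prefix followed by JSON), the function names send_hash, recv_hash and demo_round_trip, and the example values such as the thread-id are all assumptions made for illustration; this document does not specify how KBB_IPC encodes a hash. The property names follow the new-post.pl interface.)

```python
import json
import socket
import struct


def send_hash(sock, props):
    """Send a dict of properties as one length-prefixed JSON message."""
    payload = json.dumps(props).encode("utf-8")
    # Assumed wire format: 4-byte big-endian length, then the payload.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, count):
    """Read exactly `count` bytes, raising if the peer closes early."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the socket mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_hash(sock):
    """Receive one message written by send_hash and return it as a dict."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


def demo_round_trip():
    """Pass one request and one reply across a connected Unix socket pair."""
    client, server = socket.socketpair()  # AF_UNIX by default on Unix
    try:
        send_hash(client, {
            "command": "new",
            "username": "example-user",     # placeholder values only
            "password": "example-pass",
            "thread-id": "forum1/thread1",  # hypothetical id format
            "post-data": "Hello, world.",
        })
        request = recv_hash(server)
        # A real server would act on the request; this only checks the command.
        known = request.get("command") in ("new", "update", "delete")
        send_hash(server, {"success": ""} if known else {"error": ""})
        return request, recv_hash(client)
    finally:
        client.close()
        server.close()
```

Calling demo_round_trip() returns the request as the server saw it and the reply as the client saw it. Length-prefixing is used here because post-data may contain newlines, which would complicate a line-oriented format.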

new-post.pl defines the following interface:

command: one of new, update, delete.
    new and update require the following properties: username, password, thread-id, and post-data.
    delete requires: username, password, thread-id, and postnumber.

Replies:
    new, update, delete: a field with either success or error as key (no value is specified - it could be an error or success code).

user-data.pl
---------------

user-data.pl essentially implements a very simple, single-threaded DBMS, using Berkeley DBs to store data as sets of key/value pairs, using the username as the key. It is intended that only one instance should be running at any time, in order to simplify the task of maintaining consistency within the database.

user-data.pl defines the following interface:

command: one of verify, add and update.
    verify requires: username and password.
    add currently requires: username, password, email (this list will increase as more fields are added).
    update requires: username, password, and optionally newpass and newmail (the new values of password and email, respectively).

Replies:
    verify, add, update: a field with either success or error as key (no value is specified - a useful error/success response would be sensible).

(gwalla, this is yours - feel free to fix it ;-)

parse-kbbcode.pl
---------------

parse-kbbcode.pl parses post data to convert the KBBCode it contains to HTML fragments, and to filter out any HTML in the post.

parse-kbbcode.pl defines the following interface:

command: parse.
    parse requires: post-data, filename.

Replies:
    No replies are defined.

gen-html.pl
---------------

gen-html.pl generates the final HTML versions of threads and topic pages, using the HTML fragments generated by parse-kbbcode.pl and summaries of thread meta-data generated by new-post.pl.

gen-html.pl defines the following interface:

command: update.
    update requires: thread-id.

Replies:
    No replies are defined.

The Algorithm.
--------------------

The algorithm followed by the backend processes is quite simple: a CGI script sends a new-post request to the new-post server, which stores the post data and meta-data in a post file. The new-post server then requests the KBBCode parser to parse the post data, updates the thread and topic meta-data, asks the HTML generator to update the relevant pages, and finally returns a success message to the CGI script. The data is added to the tree at the post level, and then filters up to the higher levels in summary form.

Frontend.
--------------------

(note: this isn't my area, so this part is rather sketchy at present)

The frontend code comprises the various posting and thread creation CGIs and the HTML forms that feed data to them.

(note: some of these divisions are probably arguable - should gen-html.pl be considered backend or frontend, for example. I'm open to correction on this)

The frontend functionality required is: posting forms - one form per thread, maintained as part of that thread's meta-data (this is to allow us to have thread-specific data in cookies set by the posting form. It sounds like a terrible waste of space, but it makes things sufficiently easier to be worth it). A new thread posting form is also required, one per forum for similar reasons. A search form will be required, probably communicating with a backend search process - at the very least, a recursive grep through the data tree.

(What else? Suggestions, please ;-)

Most of these pages will need to be generated by scripts - to do this we'll be using the HTML::Template module, and defining templates for the various forms and pages.

Administration.
--------------------

(note: again, this isn't my stuff, and in fact I don't think this has been discussed much at all . . . )

Administration functionality required includes: user management (this functionality should be available to the users themselves as well as to administrators).
Forum creation and management (delegated to forum-specific moderators as well as to general admins). Site creation and management (available only to general admins - this includes things like classifying forums, creating new sites, etc).

(What else? Help me!!! Please . . . ;-)

Miscellaneous.
====================

The current organisation of the source tree is as follows:

kbb:
    html-templates    Templates for generating HTML pages.
    lib               General modules (KBB_IPC, KBB_Defines).
    backend           Backend code.
    frontend          Frontend code.
    html              Plain HTML pages.
    doc               Documentation.

A proposed organisation for a site directory tree is as follows:

site
    forum1
        thread1
        thread2
    forum2
        thread1
site-code
    cgi-bin
    html-templates
    lib
    backend
    frontend
    comms             (directory containing the IPC sockets)

This organisation separates the code from the HTML, and keeps user-supplied data in a non-executable directory.

An alternative organisation, which would separate the post data from the presentation data, would be as follows:

site
    forum1
        thread1
site-data
    forum1-data
        thread1-data
site-code
    cgi-bin
    html-templates
    lib
    backend
    frontend
    comms

In this organisation post-data and meta-data are stored in the site-data directory, with only the HTML visible to the webserver. For a more security-conscious site, this might be preferable. The site organisation should be configurable by the admin.

Another issue pertaining to this sort of organisation is the question of maximising flexibility - ideally, a modular system should allow the replacement of any module with an alternative implementation that provides the same interface. In this case, I would like to be able to replace the meta-data, structural data and post-data storage with a database implementation, in order to take advantage of the scalability and reliability provided by an RDBMS, without /requiring/ one.
To make this possible, it must be possible to keep the data storage entirely separate from the files the webserver sees - hence the requirement for a configurable site structure.
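
(note: a minimal sketch of what "the same interface" could look like for post storage, written in Python for brevity rather than the Perl that KBB itself uses. The names PostStore, FlatFileStore, save_post and load_post, the ".post" file extension, and the "Key: value" header layout followed by a blank line are all assumptions made for illustration; this document only says that a post file has a meta-data header followed by the unparsed body.)

```python
import os
import tempfile
from abc import ABC, abstractmethod


class PostStore(ABC):
    """Hypothetical storage interface; a database backend could implement it too."""

    @abstractmethod
    def save_post(self, topic, thread, number, meta, body):
        """Store one post's meta-data and unparsed body."""

    @abstractmethod
    def load_post(self, topic, thread, number):
        """Return (meta, body) for one post."""


class FlatFileStore(PostStore):
    """Flat-file backend: one file per post under data_root/topic/thread/."""

    def __init__(self, data_root):
        # data_root is configurable, so it can live outside the webserver's tree.
        self.data_root = data_root

    def _path(self, topic, thread, number):
        return os.path.join(self.data_root, topic, thread, f"{number}.post")

    def save_post(self, topic, thread, number, meta, body):
        path = self._path(topic, thread, number)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8", newline="") as handle:
            # Header values are assumed to be single-line strings.
            for key, value in meta.items():
                handle.write(f"{key}: {value}\n")
            handle.write("\n")  # a blank line ends the header
            handle.write(body)

    def load_post(self, topic, thread, number):
        meta = {}
        path = self._path(topic, thread, number)
        with open(path, encoding="utf-8", newline="") as handle:
            while True:
                line = handle.readline().rstrip("\n")
                if line == "":  # blank line (or end of file) ends the header
                    break
                key, _, value = line.partition(": ")
                meta[key] = value
            body = handle.read()  # everything after the header, unparsed
        return meta, body


def demo():
    """Round-trip one post through a temporary data directory."""
    with tempfile.TemporaryDirectory() as root:
        store = FlatFileStore(root)
        meta = {"username": "example-user", "posted": "2002-10-05 17:16:49"}
        store.save_post("forum1", "thread1", 1, meta, "Hello\n\nworld")
        return store.load_post("forum1", "thread1", 1)
```

Code that talks only to PostStore would not need to change if a database-backed class replaced FlatFileStore, and because data_root is a constructor argument the data tree can sit in a site-data directory that the webserver never serves.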