This is the Small Time Intranet Logger project.

INTRODUCTION:

Intranet Logger is a suite of programs designed to centralize the parsing and presentation of the system logs generated by the computers in an intranet. Each client machine pushes its log data to the logging server using shell scripts and NFS. The logging server maintains the information in an RDBMS and responds to queries via an HTTP daemon interfaced to the RDBMS. The log data is parsed at the server, formatted for database loading, and loaded into the database; in its present state, this is done using Perl. The RDBMS is MySQL, the httpd is Apache, and the two are interfaced via PHP.

With the exception of Apache and its own license, the system is a GNU-OTS, i.e., it uses GNU programs throughout, according to GNU's own documentation: (G)NU (O)ff (T)he (S)helf. It's tempting to just say "GOTS", isn't it? But Apache is not licensed (to my knowledge) under GNU, though it is free. So it's a (F)ree OTS system. Free to me, free to you. I say "FOTS" because I haven't tried to write or rewrite daemons, define a new network protocol, define new log format standards, or anything else like that. As it turned out, the available software was more than enough foundation. Besides, one of the central themes in the design and coding is ease of adaptability to existing software and environments.

The only part I would even come close to calling 'innovative' is the design of the database. It is built of table 'families', each consisting of a 'main' table and 'auxiliary' tables, one auxiliary table for each field in the main table that is not already numeric or date/time in format. Unique text strings are stored in the auxiliary tables, and the index numbers of those strings are stored in the main table. More on this later (a minimal sketch appears at the end of this file). Yes, it does set the stage for some convoluted, funky looking queries, at least until you get the hang of how the queries need to be constructed. Then they are just cumbersome. The result, however, is a drastic reduction in the size of the database on disk and a way to almost eliminate endlessly repetitive text strings in the stored data. There is also the speed payoff that comes from all the data in the large tables being date/time or numeric. Actually, I get the feeling that the word 'innovative' is a stretch. For all I really know this may be a method in use since the early 1940's. Maybe the 50's.

The people at MySQL say we should assume that data will occupy about five times (500%) as much disk space after it is inserted into a database. My results so far show the database on disk to be only 50%-100% larger than the raw log data, even with every field of every table of the database indexed. I think this is good.

It all started from a desire to learn more about Perl. Then came an increased interest in security. So, looking for some security-focused data on which to do some practical extracting and reporting, I thought of log data. When I found Devshed and its 'The Soothingly Seamless Setup of Apache, SSL, MySQL, and PHP', the course of Intranet Logger was pretty much set. If you find it only fractionally as cool as I do, I will be very pleased.

ASSUMPTIONS:

1. The log data is in text.
2. Record fields in the log data are separated by white space.
3. TBD
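To make assumptions 1 and 2 concrete, here is a minimal Perl sketch of the kind of splitting involved. This is not the actual parse sub-project code; the field names and the sample lines are illustrative only. It assumes a syslog-style record whose free-form message comes last and whose continuation lines begin with whitespace (conditions 'B' and 'C' in the FEATURES section below).

#!/usr/bin/perl -w
use strict;

# Hypothetical sketch only, not the actual parse sub-project code.
# Splits a whitespace-separated record into a fixed number of leading
# fields plus a trailing free-form message, and folds continuation
# lines (lines beginning with whitespace) into the entry they belong to.

my @entries;
while (my $line = <DATA>) {
    chomp $line;
    if ($line =~ /^\s/ && @entries) {
        # A continuation line: tack it onto the previous entry's message.
        $line =~ s/^\s+/ /;
        $entries[-1]{message} .= $line;
        next;
    }
    # A syslog-style first line: month, day, time, host, then the message.
    my ($mon, $day, $time, $host, $msg) = split ' ', $line, 5;
    push @entries, { mon => $mon, day => $day, time => $time,
                     host => $host, message => $msg };
}

# Print one reassembled record per entry, ready for further formatting.
for my $e (@entries) {
    print "$e->{host} | $e->{mon} $e->{day} $e->{time} | $e->{message}\n";
}

__DATA__
Oct  3 14:02:11 client42 sshd[812]: Accepted password for alice from 10.0.0.7
Oct  3 14:02:15 client42 kernel: a long kernel message that the daemon
    wrapped onto a second, whitespace-indented line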
FEATURES:

1. Presently handles all logs generated by TCP Wrappers and Apache, the output of 'last', 'lastlog', and 'w', as well as all logs generated by, or through, the syslog daemon.

2. It should handle any log generated by virtually any operating system as long as:
   A. The log data is in ASCII text.
   B. The message data is at the end of the record, i.e., regardless of how many fields may come before the message and regardless of how long the message may be.
   C. Lines after the first in multiline log entries begin with whitespace.
   D. The clients deliver log data to the server, i.e., data is pushed to the logging server, not pulled from the clients.
   E. The format of the log entries is consistent enough, in terms of which data is in which field, to make processing cost effective.
   (Actually I expect conditions 'A' and 'C' to be temporary in the long run, though they are necessary at this point. The parsing sketch after the ASSUMPTIONS section above illustrates conditions 'B' and 'C'.)

3. It minimizes disk space by maintaining 'auxiliary' tables in the database. By using these 'auxiliary' tables, the database stores only one copy of the text data of any given field of any given record. Each unique string is referenced by an index number, and it is these numbers that are stored in the 'main' tables. So the 'main' tables (the tables that have a record for every unique log entry) hold only numeric/date/time data and are therefore relatively small and fast (again, see the sketch at the end of this file). Another way to look at it: the system I have been using recently inserted its 1 millionth record into its database, and the database occupied approximately 180MB on disk. That comes out to approximately 180 bytes per record. If the raw data really were 1/5 of its size in the database, the original log entries would average only 36 bytes apiece, which is far shorter than a typical log line actually is. Wow, if I do say so myself.

4. Does not depend on constant network connectivity to work. In other words, any client machine, or the server itself, can go down for an almost indefinite length of time and the logs will simply accumulate on their respective hosts and/or the logging server until the connection is re-established. The only constraint here is available disk space.

5. The code is written to be as 'self-documenting' as possible, and I opted for making the code understandable first and efficient second. In other words, I have tried to make the system accessible to the widest possible audience. There is an index of functions, settings, and other points of interest in the files, so that a cut and paste into the search feature of your editor will take you to the code you want without scrolling. Also, when debugging, there are many checkpoints which place text strings in the debugging output that you can use to search your way back to the code generating the output. I got really tired of scrolling through the code. Lastly, the majority of the documentation is in the code files. Indeed, a good deal of the text in this documentation is just a cut and paste, or a rewrite, of the comments in the code.

6. The whole of the work done until now has been done with an eye toward collaboration, i.e., this has been an open source project since its conception.

7. Compartmentalized development. The project has 6 more or less distinct sub-projects:
   a. acquire - Collecting logs on the client and transferring them to the server.
   b. parse - Filtering the log entries for corruption and duplication, formatting the entries for loading into the database, and loading them.
   c. dbms - Everything about the database: design, structure, implementation.
   d. analysis - How the information in the log entries is used.
   e. httpd - Everything about the HTTP interface to the database and to the world.
   f. security - Everything about security.
   Of course, there are some dependencies and some overlap.
   But these are the sub-project classifications I use in my local CVS repository, and they have worked well enough that I have seen no reason to modify them.

8. As of the initial release, sub-projects a, b, and c are functional enough that I am using the system now. Sub-projects d, e, and f are still waiting.

9. Initial development was with Slackware 7.1. Development to this point was finished with Slackware 8.0. If you are using a 'straight' installation and setup of Slackware, this should work fairly easily. I don't know about other OS's. This is one of the reasons I have looked forward to the initial release of the project.

ATTENTION:

1. This project is totally beta. Use at your own risk. Whatever you do, do it with copies of the logs and test databases first. If you set the 'rootdir' variable in 'prepLogs4DBMS_Vars.pm' and 'load_tables.pl' and maintain the directory structure within the new root, the system should work anywhere. That is, it's pretty easy to set up a testbed.
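On that testbed note, the following is a purely hypothetical illustration of the 'rootdir' idea. It is NOT the actual contents of 'prepLogs4DBMS_Vars.pm' or 'load_tables.pl', and the directory names beneath the root are invented; it only shows how a single setting can relocate the whole tree.

#!/usr/bin/perl -w
use strict;

# Hypothetical illustration; not the actual contents of
# 'prepLogs4DBMS_Vars.pm' or 'load_tables.pl'.  The point is only that a
# single 'rootdir' setting relocates the whole tree, so a testbed can live
# anywhere as long as the directory layout beneath it is preserved.

my $rootdir  = '/home/tester/logger-testbed';  # a test root instead of the production root
my $incoming = "$rootdir/incoming";            # hypothetical: logs pushed here by the clients
my $archive  = "$rootdir/archive";             # hypothetical: processed logs kept here

print "A testbed would read from $incoming and archive to $archive\n";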
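And here is the minimal sketch, promised in the INTRODUCTION and in item 3 of FEATURES, of one table 'family' using Perl and DBI against MySQL. The table names, column names, and connection details are illustrative only; they are not the actual Intranet Logger schema. It shows an 'auxiliary' table that stores each unique string once, a 'main' table that stores only date/time data and index numbers, and the kind of join query needed to turn the numbers back into text.

#!/usr/bin/perl -w
use strict;
use DBI;

# Hypothetical sketch of one table 'family': a 'main' table holding only
# date/time and numeric data, plus an 'auxiliary' table holding each unique
# text string exactly once.  Names and credentials are illustrative only.

my $dbh = DBI->connect('DBI:mysql:database=logger_test;host=localhost',
                       'testuser', 'testpass', { RaiseError => 1 });

# Auxiliary table: one row per unique string, keyed by an index number.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS aux_hostname (
        id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        UNIQUE KEY (name)
    )
});

# Main table: one row per log entry, but only a date/time column and the
# index numbers; the text itself lives in the auxiliary table(s).
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS main_syslog (
        logged      DATETIME NOT NULL,
        hostname_id INT UNSIGNED NOT NULL,
        KEY (logged),
        KEY (hostname_id)
    )
});

# Loading an entry: look up (or create) the string's index number first,
# then store only the number in the main table.
my $host = 'client42.example.lan';
$dbh->do('INSERT IGNORE INTO aux_hostname (name) VALUES (?)', undef, $host);
my ($host_id) = $dbh->selectrow_array(
    'SELECT id FROM aux_hostname WHERE name = ?', undef, $host);
$dbh->do('INSERT INTO main_syslog (logged, hostname_id) VALUES (NOW(), ?)',
         undef, $host_id);

# Querying: the 'funky looking' part.  A join resolves the index numbers
# back into the original text.
my $rows = $dbh->selectall_arrayref(q{
    SELECT m.logged, a.name
    FROM   main_syslog m, aux_hostname a
    WHERE  m.hostname_id = a.id
      AND  a.name LIKE 'client42%'
});
print "$_->[0]  $_->[1]\n" for @$rows;

$dbh->disconnect;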