

19 String Handling and I/O

This chapter shows how to use normal strings and E-strings, and also how to read data from a file. The programs use a number of the string functions and make effective (but different) use of memory where possible. The key points to understand are:

The problem to solve is reading a CSV (comma-separated values) file, which is a standard file format for databases and spreadsheets. The format is very simple: each record is a line (i.e., terminated with a line-feed) and the fields in a record are separated by commas. To make this example a lot simpler, we will forbid a field to contain a comma (normally this would require the field to be quoted). So, a typical input file would look like this:

Field1,Field2,Field3
10,19,-3
fred,barney,wilma
,,last
first,,

In this example all records have three fields, as is well illustrated by the first line (i.e., the first record). The last two records may seem a bit strange, but they just show how fields can be blank. In the last record all but the first field are blank, and in the previous record all but the last are blank.
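Because blank fields are allowed, the number of fields in a record is always one more than the number of commas in it. A minimal sketch of that rule, using the built-in InStr function, might look like this (count_fields is a hypothetical helper introduced only for illustration; it is not part of the programs in this chapter):

```e
/* count_fields is a hypothetical helper used only for illustration */
PROC count_fields(line)
  DEF n=1, pos=0
  /* Each comma found adds one more field; blank fields still count */
  WHILE (pos:=InStr(line, ',', pos))<>-1
    INC n
    /* Continue the search just after the comma we found */
    INC pos
  ENDWHILE
ENDPROC n

PROC main()
  WriteF('\d\n', count_fields('10,19,-3'))
  WriteF('\d\n', count_fields(',,last'))
  WriteF('\d\n', count_fields('first,,'))
ENDPROC
```

Each of the three calls should report three fields, including the records where some fields are blank.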

So now we know the format of the file to be read. To operate on a file we must first open it using the Open function (from the `dos.library'), and to read the lines from the file we will use the (built-in) ReadStr function. There will be four versions of a program to read a CSV file: two read the data line-by-line and two read the whole file at once. Of the two which read line-by-line, one manipulates the lines it reads as E-strings and the other as normal strings. The use of normal strings is arguably more advanced than the use of E-strings, since cunning tricks are employed to make effective use of memory. However, the programs are not meant to show that E-strings are better than normal strings (or vice versa); rather, they are meant to show how to use strings properly.
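As a reminder of the distinction the programs rely on, here is a small sketch contrasting the two kinds of string. The variable names es and ns and the size of 20 are assumptions chosen only for this illustration:

```e
PROC main()
  /* An E-string: room for up to 20 characters, with its length tracked */
  DEF es[20]:STRING
  /* A normal string: just a pointer to NIL-terminated characters */
  DEF ns
  ns:='fred,barney'
  /* E-string functions such as StrCopy() need an E-string destination */
  StrCopy(es, ns)
  /* EstrLen() is only for E-strings; StrLen() works on any string */
  WriteF('E-string "\s" has length \d\n', es, EstrLen(es))
  WriteF('Normal string "\s" has length \d\n', ns, StrLen(ns))
ENDPROC
```

An E-string can be used wherever a normal string is expected, but not the other way round. The first of the four programs, which reads line-by-line and handles each record as an E-string, follows.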

/* A suitably large size for the record buffer */
CONST BUFFERSIZE=512

PROC main()
  DEF filehandle, status, buffer[BUFFERSIZE]:STRING, filename
  filename:='datafile'
  IF filehandle:=Open(filename, OLDFILE)
    REPEAT
      status:=ReadStr(filehandle, buffer)
      /* This is the way to check ReadStr() actually read something */
      IF buffer[] OR (status<>-1) THEN process_record(buffer)
    UNTIL status=-1
    /* If Open() succeeded then we must Close() the file */
    Close(filehandle)
  ELSE
    WriteF('Error: Failed to open "\s"\n', filename)
  ENDIF
ENDPROC

PROC process_record(line)
  DEF i=1, start=0, end, len, s
  /* Show the whole line being processed */
  WriteF('Processing record: "\s"\n', line)
  REPEAT
    /* Find the index of a comma after the start index */
    end:=InStr(line, ',', start)
    /* Length is end index minus start index */
    len:=(IF end<>-1 THEN end ELSE EstrLen(line))-start
    IF len>0
      /* Allocate an E-string of the correct length */
      IF s:=String(len)
        /* Copy the portion of the line to the E-string s */
        MidStr(s, line, start, len)
        /* At this point we could do something useful... */
        WriteF('\t\d) "\s"\n', i, s)
        /* We've finished with the E-string so deallocate it */
        DisposeLink(s)
      ELSE
        /* It's a non-fatal error if the String() call fails */
        WriteF('\t\d) Memory exhausted! (len=\d)\n', i, len)
      ENDIF
    ELSE
      WriteF('\t\d) Empty Field\n', i)
    ENDIF
    /* The new start is after the end we found */
    start:=end+1
    INC i
  /* Once a comma is not found we've finished */
  UNTIL end=-1
ENDPROC

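There are a couple of points worth noting about this program. First, ReadStr returns -1 on an error or at the end of the file, but it may still have read a final line that has no terminating line-feed, so main also processes the buffer whenever it is not empty. Second, each field is copied into an E-string allocated with String, and that E-string is deallocated with DisposeLink as soon as it has been used; a failed allocation is reported but does not stop the program.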

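To change this to use normal strings (in a very memory efficient way), we need to alter only the process_record procedure. Instead of copying each field into a newly allocated E-string, this version overwrites each comma in the line with a NIL and points s at the start of the field within the line itself. No extra memory is needed, and an empty field is detected by testing the first character of s rather than by calculating a length.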

PROC process_record(line)
  DEF i=1, start=0, end, s
  /* Show the whole line being processed */
  WriteF('Processing record: "\s"\n', line)
  REPEAT
    /* Find the index of a comma after the start index */
    end:=InStr(line, ',', start)
    /* If a comma was found then terminate with a NIL */
    IF end<>-1 THEN line[end]:=NIL
    /* Point to the start of the field */
    s:=line+start
    IF s[]
      /* At this point we could do something useful... */
      WriteF('\t\d) "\s"\n', i, s)
    ELSE
      WriteF('\t\d) Empty Field\n', i)
    ENDIF
    /* The new start is after the end we found */
    start:=end+1
    INC i
  /* Once a comma is not found we've finished */
  UNTIL end=-1
ENDPROC
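The price of this trick is that the line is altered as it is processed. The following sketch shows the effect on a single record; the size of 40 and the variable name comma are assumptions made only for this illustration:

```e
PROC main()
  /* A 40-character E-string holds our copy of the record */
  DEF line[40]:STRING, comma
  StrCopy(line, 'fred,barney,wilma')
  WriteF('Before: "\s"\n', line)
  /* Overwrite the first comma with a NIL, as process_record does */
  comma:=InStr(line, ',')
  IF comma<>-1 THEN line[comma]:=NIL
  /* Printing now stops at the NIL, so only the first field is shown */
  WriteF('After: "\s"\n', line)
  /* The rest of the record is still there, just past the NIL */
  WriteF('Remainder: "\s"\n', line+comma+1)
ENDPROC
```

Writing the NIL directly does not update the length stored with the E-string, so after this kind of processing the line is best treated as a normal string and not passed to functions such as EstrLen.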

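The next two versions of the program are basically the same: they both read the whole file into one large, dynamically allocated buffer and then process the data. The second of the two versions also uses exceptions to make the program much more readable. Compared with the normal-string version above, main now uses FileLength, New and Read to load the file into the buffer, and a new procedure, process_buffer, splits the buffer into lines by overwriting each line-feed with a NIL, just as process_record does with commas. The process_record procedure itself is unchanged.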

PROC main()
  DEF buffer, filehandle, len, filename
  filename:='datafile'
  /* Get the length of data in the file */
  IF 0<(len:=FileLength(filename))
    /* Allocate just enough room for the data + a terminating NIL */
    IF buffer:=New(len+1)
      IF filehandle:=Open(filename, OLDFILE)
        /* Read whole file, checking amount read */
        IF len=Read(filehandle, buffer, len)
          /* Terminate buffer with a NIL just in case... */
          buffer[len]:=NIL
          process_buffer(buffer, len)
        ELSE
          WriteF('Error: File reading error\n')
        ENDIF
        /* If Open() succeeded then we must Close() the file */
        Close(filehandle)
      ELSE
        WriteF('Error: Failed to open "\s"\n', filename)
      ENDIF
      /* Deallocate buffer (not really necessary in this example) */
      Dispose(buffer)
    ELSE
      WriteF('Error: Insufficient memory to load file\n')
    ENDIF
  ELSE
    WriteF('Error: "\s" is an empty file\n', filename)
  ENDIF
ENDPROC

/* buffer is like a normal string since it's NIL-terminated */
PROC process_buffer(buffer, len)
  DEF start=0, end
  REPEAT
    /* Find the index of a linefeed after the start index */
    end:=InStr(buffer, '\n', start)
    /* If a linefeed was found then terminate with a NIL */
    IF end<>-1 THEN buffer[end]:=NIL
    process_record(buffer+start)
    start:=end+1
  /* We've finished if at the end or no more linefeeds */
  UNTIL (start>=len) OR (end=-1)
ENDPROC

PROC process_record(line)
  DEF i=1, start=0, end, s
  /* Show the whole line being processed */
  WriteF('Processing record: "\s"\n', line)
  REPEAT
    /* Find the index of a comma after the start index */
    end:=InStr(line, ',', start)
    /* If a comma was found then terminate with a NIL */
    IF end<>-1 THEN line[end]:=NIL
    /* Point to the start of the field */
    s:=line+start
    IF s[]
      /* At this point we could do something useful... */
      WriteF('\t\d) "\s"\n', i, s)
    ELSE
      WriteF('\t\d) Empty Field\n', i)
    ENDIF
    /* The new start is after the end we found */
    start:=end+1
    INC i
  /* Once a comma is not found we've finished */
  UNTIL end=-1
ENDPROC

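The program is now quite messy, with many error cases in the main procedure. We can very simply change this by using an exception handler and a few automatic exceptions. The process_buffer and process_record procedures are the same as before, so only the new main procedure is shown.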

/* Some constants for exceptions (ERR_NONE is zero: no error) */
ENUM ERR_NONE, ERR_LEN, ERR_NEW, ERR_OPEN, ERR_READ

/* Make some exceptions automatic */
RAISE ERR_LEN  IF FileLength()<=0,
      ERR_NEW  IF New()=NIL,
      ERR_OPEN IF Open()=NIL

PROC main() HANDLE
  /* Note the careful initialisation of buffer and filehandle */
  DEF buffer=NIL, filehandle=NIL, len, filename
  filename:='datafile'
  /* Get the length of data in the file */
  len:=FileLength(filename)
  /* Allocate just enough room for the data + a terminating NIL */
  buffer:=New(len+1)
  filehandle:=Open(filename, OLDFILE)
  /* Read whole file, checking amount read */
  IF len<>Read(filehandle, buffer, len) THEN Raise(ERR_READ)
  /* Terminate buffer with a NIL just in case... */
  buffer[len]:=NIL
  process_buffer(buffer, len)
EXCEPT DO
  /* Both of these are safe thanks to the initialisations */
  IF buffer THEN Dispose(buffer)
  IF filehandle THEN Close(filehandle)
  /* Report error (if there was one) */
  SELECT exception
  CASE ERR_LEN;   WriteF('Error: "\s" is an empty file\n', filename)
  CASE ERR_NEW;   WriteF('Error: Insufficient memory to load file\n')
  CASE ERR_OPEN;  WriteF('Error: Failed to open "\s"\n', filename)
  CASE ERR_READ;  WriteF('Error: File reading error\n')
  ENDSELECT
ENDPROC

The code is now much clearer, and the majority of errors are caught automatically. Notice that the exception handler is executed even if the program succeeds (thanks to the DO after the EXCEPT). This is because the program must deallocate the resources it allocated whether it succeeded or failed, so the same clean-up code serves both cases. When there was no error, exception is ERR_NONE (zero), so none of the CASE branches matches and nothing is reported. Conditional deallocation (of the buffer, for example) is made safe by initialising the variables to NIL.
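The behaviour of EXCEPT DO can be seen in isolation with a much smaller sketch. The names ERR_DEMO, work and attempt are hypothetical and exist only for this illustration:

```e
ENUM ERR_NONE, ERR_DEMO

/* work is a hypothetical procedure that fails on request */
PROC work(fail)
  IF fail THEN Raise(ERR_DEMO)
  WriteF('work() completed\n')
ENDPROC

PROC attempt(fail) HANDLE
  work(fail)
EXCEPT DO
  /* Reached on success (exception=ERR_NONE) and on failure */
  WriteF('Cleaning up, exception=\d\n', exception)
ENDPROC

PROC main()
  attempt(FALSE)
  attempt(TRUE)
ENDPROC
```

The clean-up line should be printed for both calls: with a zero exception after the successful call, and with the value of ERR_DEMO after the failing one.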

If you feel like a small exercise, try to write a similar program but this time using the `tools/file' module which comes in the standard Amiga E distribution. Of course, you'll first need to read the accompanying documentation, but you should find that this module makes file interaction very simple.

