This chapter shows how to use normal strings and E-strings, and also how to read data from a file. The programs use a number of the string functions and make effective (but different) use of memory where possible. The key points to understand are:
The problem to solve is reading a CSV (comma-separated values) file, which is a standard file format for databases and spreadsheets. The format is very simple: each record is a line (i.e., terminated with a line-feed) and each field in a record is separated by a comma. To keep this example simple, we will forbid a field from containing a comma (normally this would require the field to be quoted). So, a typical input file would look like this:
    Field1,Field2,Field3
    10,19,-3
    fred,barney,wilma
    ,,last
    first,,
In this example all records have three fields, as is well illustrated by the first line (i.e., the first record). The last two records may seem a bit strange, but they just show how fields can be blank. In the last record all but the first field are blank, and in the previous record all but the last are blank.
So now we know the format of the file to be read.
To operate on a file we must first open it using the Open function (from the `dos.library'), and to read the lines from the file we will use the ReadStr (built-in) function.
There will be four versions of a program to read a CSV file: two of which read data line-by-line and two which read all the file at once.
Of the two which read line-by-line, one manipulates the read lines as E-strings and the other uses normal strings.
The use of normal strings is arguably more advanced than the use of E-strings, since cunning tricks are employed to make effective use of memory.
However, the programs are not meant to show that E-strings are better than normal strings (or vice versa), rather they are meant to show how to use strings properly.
    /* A suitably large size for the record buffer */
    CONST BUFFERSIZE=512

    PROC main()
      DEF filehandle, status, buffer[BUFFERSIZE]:STRING, filename
      filename:='datafile'
      IF filehandle:=Open(filename, OLDFILE)
        REPEAT
          status:=ReadStr(filehandle, buffer)
          /* This is the way to check ReadStr() actually read something */
          IF buffer[] OR (status<>-1) THEN process_record(buffer)
        UNTIL status=-1
        /* If Open() succeeded then we must Close() the file */
        Close(filehandle)
      ELSE
        WriteF('Error: Failed to open "\s"\n', filename)
      ENDIF
    ENDPROC

    PROC process_record(line)
      DEF i=1, start=0, end, len, s
      /* Show the whole line being processed */
      WriteF('Processing record: "\s"\n', line)
      REPEAT
        /* Find the index of a comma after the start index */
        end:=InStr(line, ',', start)
        /* Length is end index minus start index */
        len:=(IF end<>-1 THEN end ELSE EstrLen(line))-start
        IF len>0
          /* Allocate an E-string of the correct length */
          IF s:=String(len)
            /* Copy the portion of the line to the E-string s */
            MidStr(s, line, start, len)
            /* At this point we could do something useful... */
            WriteF('\t\d) "\s"\n', i, s)
            /* We've finished with the E-string so deallocate it */
            DisposeLink(s)
          ELSE
            /* It's a non-fatal error if the String() call fails */
            WriteF('\t\d) Memory exhausted! (len=\d)\n', i, len)
          ENDIF
        ELSE
          WriteF('\t\d) Empty Field\n', i)
        ENDIF
        /* The new start is after the end we found */
        start:=end+1
        INC i
      /* Once a comma is not found we've finished */
      UNTIL end=-1
    ENDPROC
There are a couple of points worth noting about this program:
- An E-string, buffer, is used to hold each line before it is processed. If a record exceeds the size of this E-string then ReadStr will only read a partial record, and the next ReadStr will read some more of this record. However, the program considers each call to ReadStr to read a whole record, so it will treat the pieces of a long line as separate records. This is a limitation of the program and it should be documented so that users know to constrain themselves to datafiles without long lines.
- ReadStr may return -1 to indicate an error (usually when the end of the file has been reached), but the E-string read so far may still be valid. The check on the E-string and error value is the proper way of deciding whether ReadStr actually read anything from the file.
- Note the use of start and end, and the calculation of the length of a portion of a string.
- MidStr is used to copy a field from a record, so an E-string must be used to hold the field.
- The E-string s is only valid between the successful allocation by String and the DisposeLink. It would be incorrect to try to, for instance, print it at any other point. On the other hand, a more complicated program may want to store up all the data, and so it may be inappropriate to deallocate the E-string at this point. In this case, the pointer to the E-string could be stored and it might be valid for the rest of the program.
- The allocation by String is very closely followed by deallocation using DisposeLink. This suggests that a single E-string could be allocated and used repeatedly (like buffer is), due to the simple nature of this example.
To change this to use normal strings (in a very memory-efficient way), we need to alter only the process_record procedure. Some noteworthy differences are:
- Fields in buffer are turned into normal strings by terminating them with NIL when necessary. This involves overwriting each comma that is found.
- The buffer memory is reused (as described above). This is fine for this example, although if the fields were needed after a record had been processed they would need to be copied, since the contents of buffer are changed by ReadStr.
    PROC process_record(line)
      DEF i=1, start=0, end, s
      /* Show the whole line being processed */
      WriteF('Processing record: "\s"\n', line)
      REPEAT
        /* Find the index of a comma after the start index */
        end:=InStr(line, ',', start)
        /* If a comma was found then terminate with a NIL */
        IF end<>-1 THEN line[end]:=NIL
        /* Point to the start of the field */
        s:=line+start
        IF s[]
          /* At this point we could do something useful... */
          WriteF('\t\d) "\s"\n', i, s)
        ELSE
          WriteF('\t\d) Empty Field\n', i)
        ENDIF
        /* The new start is after the end we found */
        start:=end+1
        INC i
      /* Once a comma is not found we've finished */
      UNTIL end=-1
    ENDPROC
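If a field is needed after its record has been processed, it has to be copied before the next ReadStr overwrites buffer. A minimal sketch of one way to do this is given below. The helper keep_field is hypothetical (it is not part of the programs in this chapter), and it is assumed to be called only for non-empty fields, as process_record already checks for with s[].

```e
/* keep_field is a hypothetical helper: it copies the normal string s */
/* into a newly allocated E-string, or returns NIL if that fails.     */
PROC keep_field(s)
  DEF copy
  /* String() returns NIL if the allocation fails */
  IF copy:=String(StrLen(s)) THEN StrCopy(copy, s, ALL)
ENDPROC copy
```

The caller should check the result against NIL, and use DisposeLink on the copy once it is no longer needed.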
The next two versions of the program are basically the same: they both read the whole file into one large, dynamically allocated buffer and then process the data. The second of the two versions also uses exceptions to make the program much more readable. The differences from the above version which uses normal strings are:
- The main procedure calculates the length of the data in the file and then uses New to dynamically allocate some memory to hold it.
- The read data is terminated with NIL so that it can safely be treated as a (very long) normal string.
- The process_buffer procedure splits the read data up into lots of normal strings, one for each line of data.
    PROC main()
      DEF buffer, filehandle, len, filename
      filename:='datafile'
      /* Get the length of data in the file */
      IF 0<(len:=FileLength(filename))
        /* Allocate just enough room for the data + a terminating NIL */
        IF buffer:=New(len+1)
          IF filehandle:=Open(filename, OLDFILE)
            /* Read whole file, checking amount read */
            IF len=Read(filehandle, buffer, len)
              /* Terminate buffer with a NIL just in case... */
              buffer[len]:=NIL
              process_buffer(buffer, len)
            ELSE
              WriteF('Error: File reading error\n')
            ENDIF
            /* If Open() succeeded then we must Close() the file */
            Close(filehandle)
          ELSE
            WriteF('Error: Failed to open "\s"\n', filename)
          ENDIF
          /* Deallocate buffer (not really necessary in this example) */
          Dispose(buffer)
        ELSE
          WriteF('Error: Insufficient memory to load file\n')
        ENDIF
      ELSE
        WriteF('Error: "\s" is an empty file\n', filename)
      ENDIF
    ENDPROC

    /* buffer is like a normal string since it's NIL-terminated */
    PROC process_buffer(buffer, len)
      DEF start=0, end
      REPEAT
        /* Find the index of a linefeed after the start index */
        end:=InStr(buffer, '\n', start)
        /* If a linefeed was found then terminate with a NIL */
        IF end<>-1 THEN buffer[end]:=NIL
        process_record(buffer+start)
        start:=end+1
      /* We've finished if at the end or no more linefeeds */
      UNTIL (start>=len) OR (end=-1)
    ENDPROC

    PROC process_record(line)
      DEF i=1, start=0, end, s
      /* Show the whole line being processed */
      WriteF('Processing record: "\s"\n', line)
      REPEAT
        /* Find the index of a comma after the start index */
        end:=InStr(line, ',', start)
        /* If a comma was found then terminate with a NIL */
        IF end<>-1 THEN line[end]:=NIL
        /* Point to the start of the field */
        s:=line+start
        IF s[]
          /* At this point we could do something useful... */
          WriteF('\t\d) "\s"\n', i, s)
        ELSE
          WriteF('\t\d) Empty Field\n', i)
        ENDIF
        /* The new start is after the end we found */
        start:=end+1
        INC i
      /* Once a comma is not found we've finished */
      UNTIL end=-1
    ENDPROC
The program is now quite messy, with many error cases in the main procedure.
We can very simply change this by using an exception handler and a few automatic exceptions.
    /* Some constants for exceptions (ERR_NONE is zero: no error) */
    ENUM ERR_NONE, ERR_LEN, ERR_NEW, ERR_OPEN, ERR_READ

    /* Make some exceptions automatic */
    RAISE ERR_LEN  IF FileLength()<=0,
          ERR_NEW  IF New()=NIL,
          ERR_OPEN IF Open()=NIL

    PROC main() HANDLE
      /* Note the careful initialisation of buffer and filehandle */
      DEF buffer=NIL, filehandle=NIL, len, filename
      filename:='datafile'
      /* Get the length of data in the file */
      len:=FileLength(filename)
      /* Allocate just enough room for the data + a terminating NIL */
      buffer:=New(len+1)
      filehandle:=Open(filename, OLDFILE)
      /* Read whole file, checking amount read */
      IF len<>Read(filehandle, buffer, len) THEN Raise(ERR_READ)
      /* Terminate buffer with a NIL just in case... */
      buffer[len]:=NIL
      process_buffer(buffer, len)
    EXCEPT DO
      /* Both of these are safe thanks to the initialisations */
      IF buffer THEN Dispose(buffer)
      IF filehandle THEN Close(filehandle)
      /* Report error (if there was one) */
      SELECT exception
      CASE ERR_LEN;  WriteF('Error: "\s" is an empty file\n', filename)
      CASE ERR_NEW;  WriteF('Error: Insufficient memory to load file\n')
      CASE ERR_OPEN; WriteF('Error: Failed to open "\s"\n', filename)
      CASE ERR_READ; WriteF('Error: File reading error\n')
      ENDSELECT
    ENDPROC
The code is now much clearer, and the majority of errors can be caught automatically.
Notice that the exception handler is called even if the program succeeds (thanks to the DO after the EXCEPT).
This is because when the program terminates it needs to deallocate the resources it allocated in every case (successful or otherwise), so the code is the same.
Conditional deallocation (of the buffer, for example) is made safe by an appropriate initialisation.
If you feel like a small exercise, try to write a similar program but this time using the `tools/file' module which comes in the standard Amiga E distribution. Of course, you'll first need to read the accompanying documentation, but you should find that this module makes file interaction very simple.