How do I parse/split a data structure into fields?

soldner · May 17, 2010

I had a simple request to compare 2 files of cost center text and output the differences in the text for FY 2011 from our SAP production system. I've done it in ABAP, but how do I do it in REBOL?

The record format for file 1 is fixed:
201185242118524211PERMIT SUPPORT SERVICES DIV 1000SP01A2

The record format for file 2 is:
852421120100701BUD DEV-HAZ WASZTE SITE CLEANUP

I need to read file 1 and keep only the records whose first 4 characters are "2011" and whose last 2 are "A2".
Then I need to take the next 7 characters, which are the cost center, and match them against file 2 to compare the text and output the differences.

I think I've got how to read the file and search thru another file for matches, but how do I split/parse file 1 into segments I can use and compare?

I don't seem to "get it." Thanks!!

swhite · May 17, 2010

I had a devil of a time "getting" this also. I finally gave up, sort of. What I do now is to "COBOL-ize" my REBOL programs just to help me keep track of what I am doing. The plan is that as I get better with REBOL, I will start to write things the "real" REBOL "way."

So, you could try what I did and do the following. Warning--this will be a bit long. I am hoping you can copy out the code that follows and use/modify it.

First, I encapsulated some common things into functions, so I could remember how to do them. I put these functions into a file that I "do" in all my scripts to bring in those common functions. That is the first chunk of code that you will see below. ** Size limits on this web site required that I chop out all but the GLB-SUBSTRING function that is used in the second sample. **

Then, for a case where I am reading a text file, I make procedures to "open" the file and "read" each record. I just now took the code for a file I recently had to work with and cut out some instructive parts for a hypothetical "cost center text" file. That is the second chunk of code below.

Based on the code samples I have seen on the REBOL web site and the rebol.org sample library, I suspect that the coding style below would give a REBOL expert a heart attack. But it works for me, and by helping me actually use REBOL it helps me begin to understand it.

Code samples follow.

REBOL [
Title: "COB global services module"
]

;; [-------------------------------------------------------------------]
;; [ This is a file of global definitions that will be loaded ]
;; [ as the very first thing in a REBOL script. ]
;; [ This is done with: ]
;; [ do %glb.r ]
;; [ If this file is in its regular location, the above line will ]
;; [ be: ]
;; [ do %/L/COB_REBOL_modules/glb.r ]
;; [-------------------------------------------------------------------]

;; [-------------------------------------------------------------------]
;; [ This function accepts a string, a starting position, and an ]
;; [ ending position, and returns a substring from the starting ]
;; [ position to the ending position. If the ending position is -1, ]
;; [ the procedure returns the substring from the starting position ]
;; [ to the end of the string. ]
;; [-------------------------------------------------------------------]

GLB-SUBSTRING: func [
    "Return a substring from the start position to the end position"
    INPUT-STRING [series!] "Full input string"
    START-POS [number!] "Starting position of substring"
    END-POS [number!] "Ending position of substring"
] [
    if END-POS = -1 [END-POS: length? INPUT-STRING]
    return skip (copy/part INPUT-STRING END-POS) (START-POS - 1)
]

;; [-------------------------------------------------------------------]
;; [ Cost Center Text file. ]
;; [ Holding areas for the file and individual records don't have to ]
;; [ be "defined" as in other languages, but they may be "defined" in ]
;; [ this manner if it helps keep track of things. ]
;; [-------------------------------------------------------------------]

CCT-FILE: [] ;; Holds the whole file in memory
CCT-FILE-ID: %SAP.txt ;; Default name of the file
CCT-EOF: false ;; End-of-file flag for reading
CCT-REC: "" ;; One record, for reading or writing
CCT-REC-COUNT: 0 ;; Counter, upped by 1 as we read or write

;; [-------------------------------------------------------------------]
;; [ These are the "fields" in the fixed-format "record." ]
;; [ They don't have to be "defined" this way, but they may be ]
;; [ to ease the mental burden. ]
;; [-------------------------------------------------------------------]

CCT-YEAR: ""
CCT-COST-CENTER: ""

;; [-------------------------------------------------------------------]
;; [ "Open" the file by clearing out a block to hold the whole file. ]
;; [ Then, read in the whole file as a block of lines. ]
;; [ Set an end-of-file flag to false, and a record count to zero. ]
;; [-------------------------------------------------------------------]
CCT-OPEN-INPUT: does [
    CCT-FILE: copy []
    CCT-FILE: read/lines CCT-FILE-ID
    CCT-EOF: false
    CCT-REC-COUNT: 0
]

;; [-------------------------------------------------------------------]
;; [ "Read" a "record" by picking off the next string from the ]
;; [ block of strings that is the whole file. ]
;; [ Finding the "next" record is done by picking the block element ]
;; [ by number, and the number is obtained by using the record counter.]
;; [-------------------------------------------------------------------]
CCT-READ: does [
    CCT-REC-COUNT: CCT-REC-COUNT + 1
    CCT-REC: copy ""
    CCT-CURRENT-TYPE: copy ""
    CCT-REC: pick CCT-FILE CCT-REC-COUNT
    if none? CCT-REC [
        CCT-EOF: true
    ]
    if not CCT-EOF [
        CCT-UNSTRING-RECORD
    ]
]

;; [-------------------------------------------------------------------]
;; [ Break apart the fixed-format string that is a single record. ]
;; [ "Store" the individual "fields" in words, so they can be ]
;; [ referred to more easily. ]
;; [-------------------------------------------------------------------]

CCT-UNSTRING-RECORD: does [
    CCT-YEAR: copy ""
    CCT-COST-CENTER: copy ""
    CCT-YEAR: GLB-SUBSTRING CCT-REC 1 4
    CCT-COST-CENTER: GLB-SUBSTRING CCT-REC 5 11
]
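
A side note on GLB-SUBSTRING as posted: it returns the copied prefix positioned at START-POS, so the result prints and compares as expected, but calling `head` on it would still expose the characters before START-POS. A minimal sketch of a variant that returns a fresh string at its head follows. The name `substring` and its lowercase argument names are hypothetical and not part of the module above.

```rebol
REBOL [Title: "Substring sketch"]

; Hypothetical variant of GLB-SUBSTRING: same arguments, but the result
; is a fresh string positioned at its head.
substring: func [
    "Return characters start-pos through end-pos of a string"
    input-string [series!] "Full input string"
    start-pos [integer!] "Starting position of substring"
    end-pos [integer!] "Ending position, or -1 for the end of the string"
] [
    if end-pos = -1 [end-pos: length? input-string]
    copy/part at input-string start-pos (end-pos - start-pos + 1)
]

rec: "201185242118524211PERMIT SUPPORT SERVICES DIV 1000SP01A2"

probe substring rec 1 4     ; "2011"
probe substring rec 5 11    ; "8524211"
probe substring rec 55 -1   ; "A2"
```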

soldner · May 17, 2010

Thanks! ABAP has its roots in COBOL, so we share a common foundation. This will help tremendously.

On the new REBOL Forum, I posted the question and got this response, which is similar to yours:

Note: Nick wrote the crash course, so your thinking on this is REBOLish from the beginning!

Use copy and copy/part:

file1: "201185242118524211PERMIT SUPPORT SERVICES DIV 1000SP01A2"

copy/part file1 4

copy at tail file1 -2

copy/part (at file1 5) 7

Take a look at http://re-bol.com/rebol_crash_course.html#section-2.3 and the "REBOL strings" section.

posted by: Nick 17-May-2010/10:07:06-7:00
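
Those three expressions can be tried on the sample record as a short script. This is only a sketch: the word names `year`, `cost-center`, and `code` are made up for illustration, it assumes the cost center is the 7 characters right after the year (positions 5 through 11), and it uses `skip` with a negative offset to reach the last two characters.

```rebol
REBOL [Title: "Fixed-width slicing sketch"]

rec: "201185242118524211PERMIT SUPPORT SERVICES DIV 1000SP01A2"

year: copy/part rec 4                  ; first 4 characters
cost-center: copy/part (at rec 5) 7    ; 7 characters starting at position 5
code: copy skip tail rec -2            ; last 2 characters

probe year          ; "2011"
probe cost-center   ; "8524211"
probe code          ; "A2"
```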

Here's the link from the email notification:

=======changes=======
forum.r
--change: new script
--change: updated script
--title: Forum
--owners: notchent
--author: Nick
--purpose: A bare bones CGI forum application, running at http://rebolforum.com.
Please link to it, so that new REBOLers have a place to ask questions!

Hope this helps!

Thanks again!

Graham · May 18, 2010

record: "201185242118524211PERMIT SUPPORT SERVICES DIV 1000SP01A2"
digit: charset [ #"0" - #"9" ]

parse/all record [ copy part1 4 digit copy costcenter 7 digit copy part3 to end ( part3: skip tail part3 -2 ) ]

== true
>> ?? part1
part1: "2011"
== "2011"
>> ?? costcenter
costcenter: "8524211"
== "8524211"
>> ?? part3
part3: "A2"
== "A2"

soldner · May 18, 2010

Thank you Dr. Chiu.

Between both of you I'm getting the idea.

Great stuff!

swhite · May 18, 2010

Holy cow! That's amazing. A page of code replaced by a line of code. It's going to take me all day to figure out that line.

I thought there probably was a "REBOL way" that I just could not understand. Back to the documentation we go.

swhite · May 18, 2010

Thanks to all involved for helping me to understand parse. I read the chapter about it, again, and applied it to this example, and, by George, I understand it.

Here is a little refinement in case that description in the cost center record is a variable length, with the A2 code stuck at the end. Probably it is not, since one normally would put the fixed stuff first and the variable stuff at the end. Anyway: (If you copy this out of here and paste it into a text editor, it looks better.)

REBOL [
]

CCT-REC: "201185242118524211PERMIT SUPPORT SERVICES DIV 1000SP01A2"
DIGIT: charset [ #"0" - #"9" ]

parse/all CCT-REC [
    copy CCT-YEAR 4 DIGIT          ;; copy 4 of any digit to CCT-YEAR, then...
    copy CCT-COSTCENTER 7 DIGIT    ;; copy 7 of any digit to CCT-COSTCENTER, then...
    copy CCT-DESC to end (         ;; copy the rest to CCT-DESC, then...
        CCT-CODE: skip tail CCT-DESC -2                         ;; execute this REBOL code...
        CCT-DESC: copy/part CCT-DESC ((length? CCT-DESC) - 2)   ;; ...to cut the last two characters
    )                              ;; from the end of CCT-DESC
]

print ["CCT-YEAR = " CCT-YEAR]
print ["CCT-COSTCENTER = " CCT-COSTCENTER]
print ["CCT-DESC = " CCT-DESC]
print ["CCT-CODE = " CCT-CODE]

Graham · May 18, 2010

REBOL has a number of idioms, and here's the one for removing the last element of a series:

remove back tail series
Code:
>> s: "abcd"
== "abcd"
>> tail s
== ""
>> back tail s
== "d"
>> remove back tail s
== ""
>> s
== "abc"
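
Pulling the thread together, soldner's original task (keep the 2011/A2 records from file 1, look up each cost center in file 2, and report text that differs) might be sketched as follows. Everything about the record layout beyond what the thread states is an assumption: that file 2 is a 7-character cost center, an 8-character date, then the text, and that the file 1 text starts at position 19 with the last 10 characters being codes. The two blocks of strings stand in for `read/lines` on the real files, and all word names are hypothetical.

```rebol
REBOL [Title: "Cost center text comparison sketch"]

; Sample data standing in for the result of read/lines on each file.
file1: [
    "201185242118524211PERMIT SUPPORT SERVICES DIV 1000SP01A2"
    "201085242118524211OLD TEXT 1000SP01A2"
]
file2: [
    "852421120100701BUD DEV-HAZ WASZTE SITE CLEANUP"
]

; Build a cost-center / text lookup from file 2.
; Assumption: positions 1-7 are the cost center, 8-15 a date, text from 16.
lookup: copy []
foreach line file2 [
    repend lookup [copy/part line 7  trim copy at line 16]
]

foreach line file1 [
    ; Keep only records that start with "2011" and end with "A2".
    if all [
        "2011" = copy/part line 4
        "A2" = copy skip tail line -2
    ] [
        cc: copy/part (at line 5) 7
        ; Assumption: text starts at position 19, last 10 characters are codes.
        text1: trim copy/part (at line 19) ((length? line) - 28)
        text2: select lookup cc
        ; Report only cost centers found in file 2 whose text differs.
        if all [text2  text1 <> text2] [
            print [cc "file1:" text1 "file2:" text2]
        ]
    ]
]
```

The positions would need adjusting to the real record layout before relying on the output.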
