• Welcome to Computer Association of SIUE - Forums.

Breaking up a file

Started by Matthew Thomas, 2005-08-12T01:58:08-05:00 (Friday)


Matthew Thomas

I'm finishing up an application to run scripts based on email commands, and in the process I'm writing a 'help' script.

I know I'll probably get reamed for this later, but here it goes:

I recently wrote this AppleScript to break up a text file into as many 120-character messages as it takes, and then send off those messages.

While it does the job, Macs have Unix capabilities, so I have the ability to use awk. In my short experience, Unix scripts *seem* to run slightly faster than AppleScript, and using awk will make Linux portability that much easier. So, I started in on this awk procedure file.

When I run it, though, I just get a bunch of blank lines. I know I am writing the same code in a different language, but I don't have a lot of experience with awk, and my Unix bible, Unix in a Nutshell, doesn't give many examples of using awk to its fullest capability. That's why I'm asking for help.

I'm using the file
as a test file, but in the end, help files will look like

So, if you could, please take a look at what I have, and maybe offer some suggestions?

Superman wears Jack Bauer pajamas

William Grim

I'm not as well-versed in awk as you are, seeing as I only use it for a few print statements that have odd whitespace in them.  However, there is a unix command that can already do what you want; it is called `split'.

If your messages contain anything other than text, I don't know if `split' can handle them properly, but I think it can.  I don't think awk can handle binary.

split -b 120 EMAIL

Now you will end up with several files prefixed with "EMAIL" that are 120 bytes each (the last file is usually smaller).
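For anyone trying this at home: split takes the input file first and the output prefix second. Here is a minimal sketch of that, assuming a hypothetical input file called help.txt and a scratch directory made with mktemp (neither name is from the original post).

```sh
#!/bin/sh
# Sketch: split a text file into 120-byte pieces named EMAILaa, EMAILab, ...
# "help.txt" and the scratch directory are hypothetical names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 300-byte sample file (300 'x' characters, no newline).
printf '%300s' '' | tr ' ' 'x' > help.txt

# split [options] INPUT PREFIX: the second operand sets the output prefix.
split -b 120 help.txt EMAIL

# Show each piece with its size in bytes (wc may pad with spaces).
for piece in EMAIL*; do
    printf '%s %s\n' "$piece" "$(wc -c < "$piece" | tr -d ' ')"
done
```

With a 300-byte input this should list three pieces: two of 120 bytes and a final one of 60.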

Finally, if you want to send off all those EMAIL files in order, you could do something like

for i in `\ls -1 EMAIL*` ; do mail -s ${i} < ${i} ; done

That will send all of your emails in order to the recipients in .  The subject will be the name of the file you are sending, so you may want to modify that script a little bit to do what you want.
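A slightly more defensive version of that loop might look like the sketch below. RECIPIENT, DRY_RUN, and send_chunk are hypothetical names made up for this example, and the address is a placeholder. By default it only prints what it would send, so it is safe to try without a working mail setup.

```sh
#!/bin/sh
# Sketch: mail each chunk in order. RECIPIENT, DRY_RUN and send_chunk are
# hypothetical names introduced for this example.
RECIPIENT="user@example.com"
DRY_RUN=${DRY_RUN:-1}

send_chunk() {
    # $1 = chunk file; the subject is the file name, as in the loop above.
    if [ "$DRY_RUN" -eq 1 ]; then
        printf 'would run: mail -s %s %s < %s\n' "$1" "$RECIPIENT" "$1"
    else
        mail -s "$1" "$RECIPIENT" < "$1"
    fi
}

# Two stand-in chunks in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'first'  > EMAILaa
printf 'second' > EMAILab

# The shell expands EMAIL* in sorted order, so no ls is needed.
sent=0
for chunk in EMAIL*; do
    [ -f "$chunk" ] || continue   # skip the literal pattern if nothing matched
    send_chunk "$chunk"
    sent=$((sent + 1))
done
echo "chunks handled: $sent"
```

Quoting "$chunk" keeps the loop working even if a file name ever contains spaces, and setting DRY_RUN=0 switches to the real mail call.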

On the receiving end, you may need a script to put all the files back together.  I'm not sure what your purposes are, but if that's the case, then something like

cat EMAIL* >

could work just fine.
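To convince yourself the pieces go back together cleanly, a round trip like this rough sketch can help. The names original.txt and rebuilt.txt are hypothetical, and the 10-byte chunk size is only there to keep the demo small.

```sh
#!/bin/sh
# Sketch: split a file, glue the pieces back together, and compare.
# "original.txt" and "rebuilt.txt" are hypothetical file names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'line one\nline two\nline three\n' > original.txt
split -b 10 original.txt EMAIL

# The glob expands in sorted order (EMAILaa, EMAILab, ...), which is
# the order split wrote the pieces in.
cat EMAIL* > rebuilt.txt

# cmp -s is silent and only reports through its exit status.
if cmp -s original.txt rebuilt.txt; then
    result="identical"
else
    result="different"
fi
echo "$result"
```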

Hope that helps.

PS: Read mail(1), split(1), sh(1).
William Grim
IT Associate, Morgan Stanley

Matthew Thomas

Thanks Mike! This is exactly what I was looking for. I knew there was a real easy way to do this without re-inventing the wheel.

As a side note, I don't know how hard of a time you had configuring postfix or sendmail, but I've found that clean code email works pretty well, and is 10 times easier to configure, provided you have a POP or SMTP account to use it with.

To get back on subject, the recipient of the help file will usually be a cell phone, and mine only receives 120 characters before it cuts off the rest of the message.

Thanks again.


Matthew Thomas

I have a solution!

For those who care to see it:


I'll leave it up until Monday.

William Grim

I took a look at the script (btw, thanks for including my name, heh).  Your USAGE could be automated to include the script name with something like

USAGE="${0} "

When using split, it is possible that more files need to be created than the default suffix length of 2 can name.  So, you should probably still calculate how many messages there should be and set the --suffix-length (-a) parameter on split appropriately.
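One way to do that calculation, as a rough sketch: split's default suffixes use 26 letters per position, so a suffix of length n can name 26^n pieces. The function and variable names below are hypothetical, and the file length is just a sample number.

```sh
#!/bin/sh
# Sketch: pick the smallest split suffix length that can name every piece.
# Function and variable names are hypothetical.
suffix_len_for() {
    # $1 = number of pieces needed; each suffix position has 26 letters.
    needed=$1
    len=1
    capacity=26
    while [ "$capacity" -lt "$needed" ]; do
        len=$((len + 1))
        capacity=$((capacity * 26))
    done
    echo "$len"
}

file_len=100000      # sample size in bytes
max_msg_len=120
# Round up so a partial last message still counts as a message.
num_msgs=$(( (file_len + max_msg_len - 1) / max_msg_len ))
suffix_len=$(suffix_len_for "$num_msgs")
echo "pieces: $num_msgs, suffix length: $suffix_len"
# A later call might then look like:
#   split -a "$suffix_len" -b "$max_msg_len" INPUT PREFIX
```

For the sample numbers this prints "pieces: 834, suffix length: 3", since two letters only cover 676 pieces.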

Just as a general style guideline, I'd put a slash '/' between your directory variables and your filename variables.  It will prevent confusion and prevent people from entering incorrect values in your directory variables (meaning they forget to put a trailing slash).

Also, if this script is used for sending lots of emails simultaneously, look into using lockfiles.  It'll prevent multiple splits from overwriting each other's files.
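A common low-tech lockfile is a directory, since mkdir either creates it or fails in one step. This is only a sketch of the idea: LOCK_DIR, acquire_lock, and release_lock are hypothetical names, and the lock lives in a scratch directory here so the demo cleans up after itself.

```sh
#!/bin/sh
# Sketch: use mkdir as a lock so two runs cannot split into the same
# directory at once. All names here are hypothetical.
workdir=$(mktemp -d)
LOCK_DIR="$workdir/helpmsg.lock"   # a real script would use a fixed, shared path

acquire_lock() {
    # mkdir fails if the directory already exists, so only one caller wins.
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR"
}

second="not tried"
if acquire_lock; then
    first="acquired"
    # A second attempt while the lock is held must fail.
    if acquire_lock; then second="acquired"; else second="busy"; fi
    release_lock
else
    first="busy"
fi
echo "first=$first second=$second"
```

A real script would do its split and mail work between acquire_lock and release_lock, and would probably also release the lock from a trap on exit.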

For another style guideline, you don't have to do something like ${TempDir}"HelpMessage".  You can do it much more clearly with something like "${TempDir}/HelpMessage".  The braces separate the variable from the rest of the normal text.
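As a small illustration of the two style points, with a made-up value for TempDir (the real value in the script may differ):

```sh
#!/bin/sh
# Sketch: join a directory variable and a file name with an explicit slash.
# TempDir mirrors the variable quoted above; its value is hypothetical.
TempDir="/tmp/helpmsg"
HelpFile="${TempDir}/HelpMessage"
echo "$HelpFile"

# If someone does add a trailing slash, the path still resolves, because
# a doubled slash in the middle of a path is treated as a single slash.
TempDir="/tmp/helpmsg/"
HelpFileSlash="${TempDir}/HelpMessage"
echo "$HelpFileSlash"
```

The first echo prints /tmp/helpmsg/HelpMessage and the second prints /tmp/helpmsg//HelpMessage, which names the same file.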

Also, don't most cell phones (or cell phone companies) split a message up into multiple chunks for you already?

Matthew Thomas

Don't want to come off as mean, but it's like 4 am, man, so:

-Not a problem for mentioning your name. If there's one thing I've learned from this school, it's to give credit where credit is due.

-You are absolutely right about that. At first I thought it would be overkill to calculate that, but after your comment I thought it might be kind of interesting to put in. I can't seem to get a do...while loop (which is what I need), so I went with a while...do loop. (ha ha). Just one thing, though: the stupid parens are messing me up. I know the logic is right; maybe there's something I missed.

-Shouldn't have to worry about that. It's only intended to be run by one user on one machine at a time (like on your home PC while you're away).

-I stored the directories that way because I kept forgetting when testing out on the command line. Not a big deal to change them, though.

Take 3 cell phones, 2 on vzw, 1 on Sprint:
My co-worker (vzw) gets the email, regardless of length, in its entirety.
My wife (Sprint) gets it automagically chunked up into 130-character messages. Whether that's a Sprint thing or a phone thing, I don't know.
My phone (vzw) receives the first 120 characters of an email and then drops the rest.

Thanks again, and good night....er good morning :zzz:


Matthew Thomas

Didn't notice this until this morning, but right after
let "NumMsgs = $FileLen / $MaxMsgLen"

should be

typeset -i NumMsgs=$NumMsgs

-25/100 points!

William Grim

I think your exponent calculation is complicated and error prone.  I have posted a description of what I think should be changed at http://rafb.net/paste/results/xiyYgz44.html, along with sample code that works.

This is just another nitpick, but I notice you are invoking your script using ksh.  Not everyone will install ksh on their systems; I'd write my scripts to be ash (not bash) compatible.  As far as I know, ash is the closest a shell gets to being pure /bin/sh compatible and is the default /bin/sh interpreter on BSD systems.
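To give a feel for what that looks like, here is a sketch of the message count written with plain $(( )) arithmetic instead of ksh's let and typeset -i. FileLen, MaxMsgLen, and NumMsgs follow the names quoted in this thread; the sample file and scratch directory are hypothetical.

```sh
#!/bin/sh
# Sketch: POSIX $(( )) arithmetic in place of ksh's let and typeset -i.
# The sample file is hypothetical.
workdir=$(mktemp -d)
printf '%250s' '' | tr ' ' 'x' > "$workdir/help.txt"   # 250 bytes

MaxMsgLen=120
# wc -c pads its output on some systems, so strip the spaces.
FileLen=$(wc -c < "$workdir/help.txt" | tr -d ' ')

# $(( )) always does integer arithmetic, so no typeset -i is needed.
NumMsgs=$(( FileLen / MaxMsgLen ))
# Check to see if we undercounted.
if [ $(( NumMsgs * MaxMsgLen )) -lt "$FileLen" ]; then
    NumMsgs=$(( NumMsgs + 1 ))
fi
echo "FileLen=$FileLen NumMsgs=$NumMsgs"
```

For the 250-byte sample this prints "FileLen=250 NumMsgs=3".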

William Grim

Oh, if you're worried about rounding errors, I think this may work out nicer for you

let "NumMsgs = (FileLen + MaxMsgLen - 1) / MaxMsgLen"

than

let "NumMsgs = $FileLen / $MaxMsgLen"
let "LenTest = $MaxMsgLen * $NumMsgs"

#check to see if we undercounted
if [ $LenTest -lt $FileLen ] ; then
   let "NumMsgs += 1"
fi

I haven't tested the boundaries on that, but I remember doing work like that in some CS course or something to help speed things up and simplify the code.

So, if your FileLen is 120 and your MaxMsgLen is 120, it will evaluate the number of messages to 1.  If the FileLen is 121 and the MaxMsgLen is 120, it will evaluate the number of messages to 2.  It's a sort of off-by-one adjustment: adding MaxMsgLen-1 before the integer division rounds the result up.
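Since the boundaries were not tested, a quick brute-force check is cheap. This sketch compares the one-line formula against the divide-then-correct version for every FileLen from 1 to 1000; the variable names quick, slow, and mismatches are made up for the example.

```sh
#!/bin/sh
# Sketch: check the one-line rounding-up formula against the
# divide-then-correct version over a range of lengths.
MaxMsgLen=120
mismatches=0
FileLen=1
while [ "$FileLen" -le 1000 ]; do
    # One-line form: add MaxMsgLen-1 before the integer division.
    quick=$(( (FileLen + MaxMsgLen - 1) / MaxMsgLen ))

    # Two-step form: divide, then add one if we undercounted.
    slow=$(( FileLen / MaxMsgLen ))
    if [ $(( slow * MaxMsgLen )) -lt "$FileLen" ]; then
        slow=$(( slow + 1 ))
    fi

    [ "$quick" -eq "$slow" ] || mismatches=$(( mismatches + 1 ))
    FileLen=$(( FileLen + 1 ))
done
echo "mismatches: $mismatches"
```

It should report zero mismatches, including at the exact multiples of 120 where an off-by-one bug would show up first.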