• Welcome to Computer Association of SIUE - Forums.
 

grep?

Started by derrickb52, 2005-06-14T09:19:39-05:00 (Tuesday)


derrickb52

Anyone out there know grep? Specifically, I'm trying to search a file for all occurrences of "<img" whose closing '>' has no forward slash before it:
<img src="blah">
vs.
<img src="blah2"/>

I can't seem to get grep to parse the file and ignore newlines.

Here is what I was trying:

grep -e "<[iI][mM][gG][^>/]*" -U myfile.cfm

Also, how can I search for an expression that contains single or double quotes?

So frustrating. :evil:

FYI: I'm just doing this to get some practice with REs, while at the same time validating for XHTML.

Also, it seems like when I preview the post, it changes the text in this box to the text in the preview (e.g. "& gt;" gets changed to '>')


EDIT:
To be more specific about my reasoning, many of the image tags are split across lines:
<img src="lotsoftext.jpg" eventhandler1 eventhandler2
some-more-text blah blah
two-lines-later />

Matthew Thomas

If it's just one file that you are finding this on, why not try sed?
Superman wears Jack Bauer pajamas

derrickb52

It's for many files, but I will take a look at using sed as an alternative if I can't use grep.

William Grim

grep -rlivU '<.*img.*/.*>'

I don't know if you need all those .* matches though... the second one may be enough for you.

That will find all the files that have tags like <img src="blah">. It will ignore lines that have tags like <img src="blah2"/>.

You said "preceding" slash, but I think you meant "succeeding" slash.

If you want to check pairs of IMG tags, you will need to write a more advanced pattern-matching script.

To find quotes, you can either escape the quotes or use character classes. If you escape them, you need to use three backslashes if your pattern is surrounded by double quotes. You should really be quoting your pattern in single quotes, because that prevents shell substitution.
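A minimal sketch of these quoting options, assuming a POSIX shell such as bash (Unix or Cygwin) with GNU grep; the DOS prompt has different quoting rules. The file name quotes_demo.html is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample file: one tag uses double quotes, one uses single quotes.
printf '%s\n' '<img src="a.jpg">' "<img src='b.jpg'>" > quotes_demo.html

# A double quote inside a single-quoted pattern needs no escaping:
# the shell passes everything between single quotes through literally.
grep -c 'src="' quotes_demo.html            # counts line 1 only

# A single quote cannot appear inside single quotes, so close the string,
# add an escaped quote (\'), and open a new (here empty) quoted string.
grep -c 'src='\''' quotes_demo.html         # counts line 2 only

# A bracket expression matches either quote character. Inside double
# quotes, \" hands a literal " to grep.
grep -c "src=['\"]" quotes_demo.html        # counts both lines
```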
William Grim
IT Associate, Morgan Stanley

derrickb52

Maybe the problem is that I am using a DOS version of grep (I don't have access to an alternative OS). That command gives me an error: "The filename, directory name, or volume label syntax is incorrect."

The command runs fine if I use double quotes around the expression, except that it seems to return all the files in the directory.

The slash succeeds the opening brace '<' and precedes the terminating brace '>', or am I all mixed up? Perhaps I worded my sentence ambiguously.

What do you mean by "pairs" of img tags?

The syntax: grep -liU "" *.cfm comes AWFULLY close to what I want, except it just doesn't seem to be using the 'U' option, or I am using it wrong! Argh!

EDIT:

I just realized that if there are any '/' characters in the image tag itself, such as , which do not IMMEDIATELY precede the closing brace, they will break my grep statement... doh.
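A sketch of one way around this, assuming GNU grep in a POSIX shell (slash_demo.html is a hypothetical file): let anything except '>' appear inside the tag, and constrain only the single character right before the '>'. For comparison, the greedy <.*img.*/.*> pattern from earlier in the thread also matches a non-XHTML tag whose src path contains a slash, so combining it with -v would skip that tag.

```shell
#!/bin/sh
# Hypothetical test input, one case per line.
cat > slash_demo.html <<'EOF'
<img src="images/pic.jpg">
<img src="images/pic.jpg"/>
<img src="blah">
<img src="blah2"/>
EOF

# [^>]* may contain slashes (e.g. a path in src); [^/>] then requires the
# character just before the closing '>' to be something other than '/'.
# -n prints line numbers; -i makes IMG and img match alike.
grep -ni '<img[^>]*[^/>]>' slash_demo.html    # lines 1 and 3

# The greedy pattern matches any line with a '/' somewhere after "img",
# so it matches line 1 as well as the two self-closing tags.
grep -ni '<.*img.*/.*>' slash_demo.html       # lines 1, 2 and 4
```

This still assumes no attribute value contains a literal '>' character.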

EDIT 2:

Let me clarify for anyone who plans to nitpick... by alternative OS, I meant Non-M$ :P

William Grim

The -v option on Unix/Cygwin means to ignore lines that match the pattern given, and the -U option means to operate on binary files.  I don't know anything about the DOS version.
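A sketch of how -v and -l combine, assuming GNU grep and two hypothetical files. Because -v selects every line that does not match, almost any file has at least one such line, which may be part of why the earlier command appeared to list every file.

```shell
#!/bin/sh
# Two hypothetical files: page1.cfm has a non-XHTML tag, page2.cfm does not.
printf '%s\n' '<p>Hello</p>' '<img src="blah">' > page1.cfm
printf '%s\n' '<p>Hello</p>' '<img src="blah2"/>' > page2.cfm

# -v selects lines that do NOT match; -l prints the name of every file with
# at least one selected line. "<p>Hello</p>" never matches, so both files
# are listed, even though only page1.cfm has an unclosed tag.
grep -liv '<.*img.*/.*>' page1.cfm page2.cfm

# Matching the unwanted form directly ([^/>] = the last character before
# '>' is not a slash) lists only the offending file.
grep -li '<img[^>]*[^/>]>' page1.cfm page2.cfm
```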

You are still double-quoting your pattern.  The '*' could be globbing file names into there.

grep -liU "" *.cfm

I don't know about regex in the DOS version of grep, but in pretty much all other regex variants, your regex wouldn't find what you want either.

The regex I gave you is correct, but here is a shortened version that works fine with your examples:

grep -rlUi ''

Also, the '.' matches any character.

When I was talking about pairs of IMG tags, ignore that.  I forgot that IMG tags don't go in pairs.

Oh, and once again, STOP DOUBLE-QUOTING YOUR REGEX.  It will help you a lot.

derrickb52

It doesn't like when I use single quotes around the RE. Your RE. My RE. Nada. Every attempt at using them returns an error. I would stop using double quotes if I could, I promise. I swear I'm not ignoring your advice to be stubborn!

William Grim

Oh yeah, I forgot that DOS doesn't do substitution between double-quotes.  So you should be fine using them... sorry about that.

Did the new regex work for you?

derrickb52

Yes and no...

It is returning every line that contains "<img" and ends with '>'.

Including both non-XHTML and XHTML tags.

But not returning lines that contain "<img" and are broken by a CR+LF:

onmouseover="func1()"/>
and
onmouseover="func1()">

William Grim

Then you need to either use an "or" (pipe symbol) for getting newlines, or you need to set grep to include EOLs in the '.' character class.

You need to read all your regex documentation on this, and you need to see if grep has a switch that makes this relatively easy.

I have already given you some regex that works in all the examples you had previously mentioned; the rest is up to you.
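For what it's worth, GNU grep (including the Cygwin build) has such a switch: -z (--null-data) ends records at NUL bytes instead of newlines, so an ordinary text file is read as one record and a bracket expression like [^>] can match across line breaks. The DOS grep used in this thread may not support it. A sketch, assuming GNU grep in a POSIX shell and a hypothetical multiline_demo.cfm:

```shell
#!/bin/sh
# Hypothetical sample file with image tags split across lines.
cat > multiline_demo.cfm <<'EOF'
<p>Some text</p>
<img src="a.jpg"
  onmouseover="func1()"/>
<img src="b.jpg"
  onmouseover="func1()">
EOF

# Line by line, no tag is complete on a single line, so this prints 0.
grep -ci '<img[^>]*[^/>]>' multiline_demo.cfm

# -z: the file contains no NUL bytes, so it is read as a single record and
#     [^>] is free to match the embedded newlines.
# -o: print only the matched text, each match followed by a NUL byte.
# -i: case-insensitive, so <IMG ...> is caught as well.
# [^/>] requires the character just before the closing '>' not to be '/'.
# tr turns the NUL separators back into newlines for display.
grep -zoi '<img[^>]*[^/>]>' multiline_demo.cfm | tr '\0' '\n'

# List the files that contain at least one unclosed tag.
grep -zli '<img[^>]*[^/>]>' *.cfm | tr '\0' '\n'
```

By contrast, GNU grep's -U only affects how carriage returns are handled on MS-DOS/Windows builds; it does not let a match span lines.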

derrickb52