October 15, 2008

grep instead of python?

Filed under: automation, bash, cool, linux, one-liner, tip — michaelangela @ 5:29 pm

I have an XML file listing the images I need for a site. Normally I fire up IPython with Amara and do my XML wrangling there, but this time I also needed the image files themselves. I thought about using Python to grab and save the files too, but a quick search led to this post. Ironically, while its Python approach is good, the post also has a tip that skips Python entirely: use curl to fetch the feed, grep to pull out the image URLs, and xargs to hand them to wget for downloading.

Thinking Serious » Using Python to Grab Images From a Web Site

curl -s http://99designs.com/contests/6999/feed | grep -Po 'src="[^"]*(png|jpg)' | grep -o 'http.*' | xargs wget -q
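To see what the two grep stages actually keep, here is a minimal offline sketch that runs them on a made-up sample line instead of the live feed. The sample HTML and the example.com URLs are hypothetical. It uses `grep -Eo`, which behaves the same as `-Po` for this pattern and does not need PCRE support, and `[^"]*` so that each match stops at the closing quote.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: two images on one line.
sample='<p><img src="http://example.com/a.png" alt="a"> <img src="http://example.com/b.jpg" alt="b"></p>'

# Stage 1: keep only each src="...png|jpg" match, one per output line.
# [^"]* cannot cross a quote, so the two images give two matches
# rather than one greedy match spanning both tags.
stage1=$(printf '%s\n' "$sample" | grep -Eo 'src="[^"]*(png|jpg)')

# Stage 2: drop the src=" prefix, leaving bare URLs for xargs/wget.
stage2=$(printf '%s\n' "$stage1" | grep -o 'http.*')

printf '%s\n' "$stage2"
```

With a greedy `.*` in stage 1, the same line would yield a single broken "URL" running from the first `http` to the final `jpg`.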

My situation is a bit different, though: the image URLs have no file extensions, they sit inside a <tag> element rather than an src attribute, and I need to save the files with extensions. After a little Googling I put together the following, which worked very well.

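curl -s http://domain.tld/feed | grep -Eo '<tag>http[^<]*</tag>' | sed -e 's/<[^>]*>//g' | xargs wget -q
for f in *; do mv -- "$f" "$f.jpg"; done

Because the loop appends .jpg to every file in the current directory, run both lines in an otherwise empty directory.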
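One possible refinement, sketched under assumptions: instead of renaming everything afterwards, give each file its extension as it is downloaded, using wget's `-O` option. The function names `extract_urls`, `local_name` and `download_all` are hypothetical, `<tag>` is the placeholder element name from the pipeline above, and every image is assumed to be a JPEG.

```shell
#!/usr/bin/env bash
# Assumptions: URLs sit in <tag>...</tag> (placeholder element name)
# and every image is a JPEG, so ".jpg" is always the right extension.

# Print the URL inside each <tag>http...</tag> element read from stdin.
extract_urls() {
  grep -Eo '<tag>http[^<]*</tag>' | sed -e 's/<[^>]*>//g'
}

# Local filename: last path segment of the URL plus ".jpg".
local_name() {
  printf '%s.jpg\n' "${1##*/}"
}

# Download every URL found on stdin straight to its final filename.
download_all() {
  extract_urls | while IFS= read -r url; do
    wget -q -O "$(local_name "$url")" "$url"
  done
}
```

Usage would be `curl -s http://domain.tld/feed | download_all`. One trade-off: with `-O`, two URLs ending in the same path segment overwrite each other, whereas plain wget saves the second copy with a `.1` suffix.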

I still need to do some XML wrangling with Amara, but now the files just need to be moved to the right directories. Nice.

