Strip HTML tags on the shell

Sometimes I need to remove the tags from an HTML page that I fetched with curl on the command line. It's pretty easy to do with html2text:

$ curl -s example.org | html2text

html2text reflows the text and changes line breaks. With sed, which simply deletes everything from a < up to the next >, the line breaks are kept:

$ curl -s example.org | sed 's|<[^>]*>||g'

(But the content of CSS <style> elements is not removed, because sed only deletes the tags themselves.)
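
If the leftover CSS gets in the way, the sed call can be extended to drop the style blocks as well. The following is only a rough sketch: strip_html is a made-up helper name, and it assumes lowercase <style> tags and at most one style element per line. It first removes style elements that open and close on the same line, then deletes whole lines from an opening <style to the closing </style>, and finally strips the remaining tags as before.

```shell
# Hypothetical helper (not part of any tool): strip tags from HTML on stdin
# and also drop the content of <style> elements.
# Assumes lowercase tags and at most one <style> element per line.
strip_html() {
    # 1) remove a <style>...</style> that sits on a single line
    # 2) delete all lines from an opening <style to the closing </style>
    # 3) remove every remaining tag, keeping the line breaks
    sed -e 's|<style[^>]*>.*</style>||g' \
        -e '/<style/,/<\/style>/d' \
        -e 's|<[^>]*>||g'
}
```

It is used just like the plain sed call: curl -s example.org | strip_html. Note that step 2 deletes complete lines, so any text that shares a line with the opening or closing style tag is lost too. Tags that span several lines are still not handled, since sed works line by line.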

Written by Christian Weiske.
