sfeed

RSS and Atom parser
git clone git://git.codemadness.org/sfeed

      1 sfeed
      2 -----
      3 
      4 RSS and Atom parser (and some format programs).
      5 
      6 It converts RSS or Atom feeds from XML to a TAB-separated file. There are
      7 formatting programs included to convert this TAB-separated format to various
      8 other formats. There are also some programs and scripts included to import and
      9 export OPML and to fetch, filter, merge and order feed items.
     10 
     11 
     12 Build and install
     13 -----------------
     14 
     15 $ make
     16 # make install
     17 
     18 
     19 To build sfeed without sfeed_curses, set SFEED_CURSES to an empty string:
     20 
     21 $ make SFEED_CURSES=""
     22 # make SFEED_CURSES="" install
     23 
     24 
     25 To change the theme for sfeed_curses, you can set SFEED_THEME.  See the themes/
     26 directory for the theme names.
     27 
     28 $ make SFEED_THEME="templeos"
     29 # make SFEED_THEME="templeos" install
     30 
     31 
     32 Usage
     33 -----
     34 
     35 Initial setup:
     36 
     37 	mkdir -p "$HOME/.sfeed/feeds"
     38 	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"
     39 
     40 Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
     41 is included and evaluated as a shellscript for sfeed_update, so its functions
     42 and behaviour can be overridden:
     43 
     44 	$EDITOR "$HOME/.sfeed/sfeedrc"
     45 
     46 or you can import existing OPML subscriptions using sfeed_opml_import(1):
     47 
     48 	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"
     49 
     50 an example to export from another RSS/Atom reader called newsboat and import
     51 it for sfeed_update:
     52 
     53 	newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"
     54 
     55 an example to export from another RSS/Atom reader called rss2email (3.x+) and
     56 import it for sfeed_update:
     57 
     58 	r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"
     59 
     60 Update feeds; this script merges the new items. See sfeed_update(1) for more
     61 information on what it can do:
     62 
     63 	sfeed_update
     64 
     65 Format feeds:
     66 
     67 Plain-text list:
     68 
     69 	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"
     70 
     71 HTML view (no frames), copy style.css for a default style:
     72 
     73 	cp style.css "$HOME/.sfeed/style.css"
     74 	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"
     75 
     76 HTML view with the menu as frames, copy style.css for a default style:
     77 
     78 	mkdir -p "$HOME/.sfeed/frames"
     79 	cp style.css "$HOME/.sfeed/frames/style.css"
     80 	cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*
     81 
     82 To automatically update your feeds periodically and format them in a way you
     83 like, you can make a wrapper script and add it as a cronjob.
     84 
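For example, a minimal wrapper sketch (the script path and the hourly schedule
below are just illustrations):

	#!/bin/sh
	# sfeed_wrapper: update the feeds and regenerate the formatted output.
	sfeed_update
	sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
	sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

and a crontab(5) entry to run it hourly:

	# minute hour day month weekday command
	0 * * * * $HOME/bin/sfeed_wrapper
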
     85 Most protocols are supported because curl(1) is used by default. Proxy settings
     86 from the environment (such as the $http_proxy environment variable) are also
     87 used.
     88 
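For example, to update feeds through a local HTTP proxy (the proxy address is
just an illustration):

	http_proxy="http://127.0.0.1:8080" https_proxy="http://127.0.0.1:8080" sfeed_update
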
     89 The sfeed(1) program itself is just a parser that parses XML data from stdin
     90 and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
     91 Gopher, SSH, etc.
     92 
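For example, to parse a feed fetched over HTTPS directly without sfeed_update,
using the feed URL found in the example further below:

	curl -s "https://codemadness.org/atom.xml" | sfeed
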
     93 See the section "Usage and examples" below and the man-pages for more
     94 information on how to use sfeed(1) and the additional tools.
     95 
     96 
     97 Dependencies
     98 ------------
     99 
    100 - C compiler (C99).
    101 - libc (recommended: C99 and POSIX >= 200809).
    102 
    103 
    104 Optional dependencies
    105 ---------------------
    106 
    107 - POSIX make(1) for the Makefile.
    108 - POSIX sh(1),
    109   used by sfeed_update(1) and sfeed_opml_export(1).
    110 - POSIX utilities such as awk(1) and sort(1),
    111   used by sfeed_content(1), sfeed_markread(1), sfeed_opml_export(1) and
    112   sfeed_update(1).
    113 - curl(1) binary: https://curl.haxx.se/ ,
    114   used by sfeed_update(1), but can be replaced with any tool like wget(1),
    115   OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
    116 - iconv(1) command-line utilities,
    117   used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
    118   encoded then you don't need this. For a minimal iconv implementation:
    119   https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
    120 - xargs with support for the -P and -0 options,
    121   used by sfeed_update(1).
    122 - mandoc for documentation: https://mdocml.bsd.lv/
    123 - curses (typically ncurses), otherwise see minicurses.h,
    124   used by sfeed_curses(1).
    125 - a terminal (emulator) supporting UTF-8 and the used capabilities,
    126   used by sfeed_curses(1).
    127 
    128 
    129 Optional run-time dependencies for sfeed_curses
    130 -----------------------------------------------
    131 
    132 - xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
    133 - xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
    134 - awk, used by the sfeed_content and sfeed_markread scripts.
    135   See the ENVIRONMENT VARIABLES section in the man page to change it.
    136 - lynx, used by the sfeed_content script to convert HTML content.
    137   See the ENVIRONMENT VARIABLES section in the man page to change it.
    138 
    139 
    140 Formats supported
    141 -----------------
    142 
    143 sfeed supports a subset of XML 1.0 and a subset of:
    144 
    145 - Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
    146 - Atom 0.3 (draft, historic).
    147 - RSS 0.90+.
    148 - RDF (when used with RSS).
    149 - MediaRSS extensions (media:).
    150 - Dublin Core extensions (dc:).
    151 
    152 Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are
    153 supported by converting them to RSS/Atom or to the sfeed(5) format directly.
    154 
    155 
    156 OS tested
    157 ---------
    158 
    159 - Linux,
    160   compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
    161   libc: glibc, musl.
    162 - OpenBSD (clang, gcc).
    163 - NetBSD (with NetBSD curses).
    164 - FreeBSD.
    165 - DragonFlyBSD.
    166 - GNU/Hurd.
    167 - Illumos (OpenIndiana).
    168 - Windows (cygwin gcc + mintty, mingw).
    169 - HaikuOS.
    170 - SerenityOS.
    171 - FreeDOS (djgpp, Open Watcom).
    172 - FUZIX (sdcc -mz80, with the sfeed parser program).
    173 
    174 
    175 Architectures tested
    176 --------------------
    177 
    178 amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.
    179 
    180 
    181 Files
    182 -----
    183 
    184 sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
    185                     in TAB-separated format to stdout.
    186 sfeed_atom        - Format feed data (TSV) to an Atom feed.
    187 sfeed_content     - View item content, for use with sfeed_curses.
    188 sfeed_curses      - Format feed data (TSV) to a curses interface.
    189 sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
    190 sfeed_gopher      - Format feed data (TSV) to Gopher files.
    191 sfeed_html        - Format feed data (TSV) to HTML.
    192 sfeed_json        - Format feed data (TSV) to JSON Feed.
    193 sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
    194 sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
    195 sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
    196 sfeed_mbox        - Format feed data (TSV) to mbox.
    197 sfeed_plain       - Format feed data (TSV) to a plain-text list.
    198 sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
    199 sfeed_update      - Update feeds and merge items.
    200 sfeed_web         - Find URLs to RSS/Atom feeds from a webpage.
    201 sfeed_xmlenc      - Detect character-set encoding from an XML stream.
    202 sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
    203 style.css         - Example stylesheet to use with sfeed_html(1) and
    204                     sfeed_frames(1).
    205 
    206 
    207 Files read at runtime by sfeed_update(1)
    208 ----------------------------------------
    209 
    210 sfeedrc - Config file. This file is evaluated as a shellscript in
    211           sfeed_update(1).
    212 
    213 At least the following functions can be overridden per feed:
    214 
    215 - fetch: to use wget(1), OpenBSD ftp(1) or another download program.
    216 - filter: to filter on fields.
    217 - merge: to change the merge logic.
    218 - order: to change the sort order.
    219 
    220 See also the sfeedrc(5) man page documentation for more details.
    221 
    222 The feeds() function is called to process the feeds. The default feed()
    223 function, called from your sfeedrc(5) config file, is executed concurrently as
    224 a background job to make updating faster. The variable maxjobs can be changed
    225 to limit or increase the number of concurrent jobs (8 by default).
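
A minimal sfeedrc sketch (loosely based on sfeedrc.example; the feed names and
URLs below are just illustrations):

	#sfeedpath="$HOME/.sfeed/feeds"

	# maximum number of concurrent fetch jobs.
	maxjobs=8

	# list of feeds to fetch:
	feeds() {
		# feed <name> <feedurl> [basesiteurl] [encoding]
		feed "codemadness" "https://codemadness.org/atom_content.xml"
		feed "xkcd" "https://xkcd.com/atom.xml"
	}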
    226 
    227 
    228 Files written at runtime by sfeed_update(1)
    229 -------------------------------------------
    230 
    231 feedname     - TAB-separated format containing all items per feed. The
    232                sfeed_update(1) script merges new items with this file.
    233                The format is documented in sfeed(5).
    234 
    235 
    236 File format
    237 -----------
    238 
    239 man 5 sfeed
    240 man 5 sfeedrc
    241 man 1 sfeed
    242 
    243 
    244 Usage and examples
    245 ------------------
    246 
    247 Find RSS/Atom feed URLs from a webpage:
    248 
    249 	url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"
    250 
    251 output example:
    252 
    253 	https://codemadness.org/atom.xml	application/atom+xml
    254 	https://codemadness.org/atom_content.xml	application/atom+xml
    255 
    256 - - -
    257 
    258 Make sure your sfeedrc config file exists; see the sfeedrc.example file. To
    259 update your feeds (the configfile argument is optional):
    260 
    261 	sfeed_update "configfile"
    262 
    263 Format the feeds files:
    264 
    265 	# Plain-text list.
    266 	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
    267 	# HTML view (no frames), copy style.css for a default style.
    268 	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
    269 	# HTML view with the menu as frames, copy style.css for a default style.
    270 	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*
    271 
    272 View formatted output in your browser:
    273 
    274 	$BROWSER "$HOME/.sfeed/feeds.html"
    275 
    276 View formatted output in your editor:
    277 
    278 	$EDITOR "$HOME/.sfeed/feeds.txt"
    279 
    280 - - -
    281 
    282 View formatted output in a curses interface.  The interface has a look inspired
    283 by the mutt mail client.  It has a sidebar panel for the feeds, a panel with a
    284 listing of the items and a small statusbar for the selected item/URL. Some
    285 functions like searching and scrolling are integrated in the interface itself.
    286 
    287 Just like the other format programs included in sfeed you can run it like this:
    288 
    289 	sfeed_curses ~/.sfeed/feeds/*
    290 
    291 ... or by reading from stdin:
    292 
    293 	sfeed_curses < ~/.sfeed/feeds/xkcd
    294 
    295 By default sfeed_curses marks the items of the last day as new/bold. This limit
    296 can be overridden by setting the environment variable $SFEED_NEW_AGE to the
    297 desired maximum age in seconds. To manage read/unread items in a different way,
    298 a plain-text file with a list of the read URLs can be used. To enable this
    299 behaviour, set the environment variable $SFEED_URL_FILE to the path of this
    300 file:
    301 
    302 	export SFEED_URL_FILE="$HOME/.sfeed/urls"
    303 	[ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
    304 	sfeed_curses ~/.sfeed/feeds/*
    305 
    306 It then uses the shellscript "sfeed_markread" to process the read and unread
    307 items.
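
Similarly, the new-item limit mentioned above can be changed. For example, to
mark items of the last week (604800 seconds) as new instead of the last day:

	SFEED_NEW_AGE=604800 sfeed_curses ~/.sfeed/feeds/*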
    308 
    309 - - -
    310 
    311 Example script to view feed items in a vertical list/menu in dmenu(1). It opens
    312 the selected URL in the browser set in $BROWSER:
    313 
    314 	#!/bin/sh
    315 	url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
    316 		sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
    317 	test -n "${url}" && $BROWSER "${url}"
    318 
    319 dmenu can be found at: https://git.suckless.org/dmenu/
    320 
    321 - - -
    322 
    323 Generate a sfeedrc config file from your exported list of feeds in OPML
    324 format:
    325 
    326 	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc
    327 
    328 - - -
    329 
    330 Export an OPML file of your feeds from a sfeedrc config file (configfile
    331 argument is optional):
    332 
    333 	sfeed_opml_export configfile > myfeeds.opml
    334 
    335 - - -
    336 
    337 The filter function can be overridden in your sfeedrc file. This allows
    338 filtering items per feed. It can be used to shorten URLs, filter away
    339 advertisements, strip tracking parameters and more.
    340 
    341 	# filter fields.
    342 	# filter(name, url)
    343 	filter() {
    344 		case "$1" in
    345 		"tweakers")
    346 			awk -F '\t' 'BEGIN { OFS = "\t"; }
    347 			# skip ads.
    348 			$2 ~ /^ADV:/ {
    349 				next;
    350 			}
    351 			# shorten link.
    352 			{
    353 				if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
    354 					$3 = substr($3, RSTART, RLENGTH);
    355 				}
    356 				print $0;
    357 			}';;
    358 		"yt BSDNow")
    359 			# filter only BSD Now from channel.
    360 			awk -F '\t' '$2 ~ / \| BSD Now/';;
    361 		*)
    362 			cat;;
    363 		esac | \
    364 			# replace youtube links with embed links.
    365 			sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \
    366 
    367 			awk -F '\t' 'BEGIN { OFS = "\t"; }
    368 			function filterlink(s) {
    369 				# protocol must start with http, https or gopher.
    370 				if (match(s, /^(http|https|gopher):\/\//) == 0) {
    371 					return "";
    372 				}
    373 
    374 				# shorten feedburner links.
    375 				if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
    376 					s = substr(s, RSTART, RLENGTH);
    377 				}
    378 
    379 				# strip tracking parameters
    380 				# urchin, facebook, piwik, webtrekk and generic.
    381 				gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
    382 				gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);
    383 
    384 				gsub(/\?&/, "?", s);
    385 				gsub(/[\?&]+$/, "", s);
    386 
    387 				return s
    388 			}
    389 			{
    390 				$3 = filterlink($3); # link
    391 				$8 = filterlink($8); # enclosure
    392 
    393 				# try to remove tracking pixels: <img/> tags with 1px width or height.
    394 				gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);
    395 
    396 				print $0;
    397 			}'
    398 	}
    399 
    400 - - -
    401 
    402 Aggregate feeds. This filters new entries (maximum one day old) and sorts them
    403 by newest first. The feed name is prefixed to the title. Convert the TSV output
    404 data to an Atom XML feed (again):
    405 
    406 	#!/bin/sh
    407 	cd ~/.sfeed/feeds/ || exit 1
    408 
    409 	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
    410 	BEGIN {	OFS = "\t"; }
    411 	int($1) >= old {
    412 		$2 = "[" FILENAME "] " $2;
    413 		print $0;
    414 	}' * | \
    415 	sort -k1,1rn | \
    416 	sfeed_atom
    417 
    418 - - -
    419 
    420 To have a "tail(1) -f"-like FIFO stream that filters new unique feed items and
    421 shows them as plain-text lines similar to sfeed_plain(1):
    422 
    423 Create a FIFO:
    424 
    425 	fifo="/tmp/sfeed_fifo"
    426 	mkfifo "$fifo"
    427 
    428 On the reading side:
    429 
    430 	# This keeps track of unique lines so it might consume much memory.
    431 	# It tries to reopen the $fifo after 1 second if it fails.
    432 	while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'
    433 
    434 On the writing side:
    435 
    436 	feedsdir="$HOME/.sfeed/feeds/"
    437 	cd "$feedsdir" || exit 1
    438 	test -p "$fifo" || exit 1
    439 
    440 	# 1 day is old news, don't write older items.
    441 	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
    442 	BEGIN { OFS = "\t"; }
    443 	int($1) >= old {
    444 		$2 = "[" FILENAME "] " $2;
    445 		print $0;
    446 	}' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"
    447 
    448 cut -b is used to trim the "N " prefix of sfeed_plain(1).
    449 
    450 - - -
    451 
    452 For some podcast feeds the following code can be used to filter the latest
    453 enclosure URL (probably some audio file):
    454 
    455 	awk -F '\t' 'BEGIN { latest = 0; }
    456 	length($8) {
    457 		ts = int($1);
    458 		if (ts > latest) {
    459 			url = $8;
    460 			latest = ts;
    461 		}
    462 	}
    463 	END { if (length(url)) { print url; } }'
    464 
    465 ... or on a file already sorted from newest to oldest:
    466 
    467 	awk -F '\t' '$8 { print $8; exit }'
    468 
    469 - - -
    470 
    471 Over time your feeds file might become quite big. You can archive a feed and
    472 keep only the items of (roughly) the last week by doing for example:
    473 
    474 	awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
    475 	mv feed feed.bak
    476 	mv feed.new feed
    477 
    478 This could also be run weekly in a crontab to archive the feeds, like throwing
    479 away old newspapers. It keeps the feeds list tidy and the formatted output
    480 small.
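
A sketch of such a weekly job (the script name and schedule are just
illustrations); it applies the snippet above to every feed file:

	#!/bin/sh
	# archive_feeds: keep only items of (roughly) the last week per feed.
	cd "$HOME/.sfeed/feeds" || exit 1
	for feed in *; do
		# skip backups from previous runs.
		case "$feed" in *.bak) continue;; esac
		awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < "$feed" > "$feed.new" &&
		mv "$feed" "$feed.bak" &&
		mv "$feed.new" "$feed"
	done

and a crontab(5) entry to run it every Sunday morning:

	0 6 * * 0 $HOME/bin/archive_feeds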
    481 
    482 - - -
    483 
    484 Convert mbox to separate maildirs per feed and filter duplicate messages using the
    485 fdm program.
    486 fdm is available at: https://github.com/nicm/fdm
    487 
    488 fdm config file (~/.sfeed/fdm.conf):
    489 
    490 	set unmatched-mail keep
    491 
    492 	account "sfeed" mbox "%[home]/.sfeed/mbox"
    493 		$cachepath = "%[home]/.sfeed/fdm.cache"
    494 		cache "${cachepath}"
    495 		$maildir = "%[home]/feeds/"
    496 
    497 		# Check if message is in the cache by Message-ID.
    498 		match case "^Message-ID: (.*)" in headers
    499 			action {
    500 				tag "msgid" value "%1"
    501 			}
    502 			continue
    503 
    504 		# If it is in the cache, stop.
    505 		match matched and in-cache "${cachepath}" key "%[msgid]"
    506 			action {
    507 				keep
    508 			}
    509 
    510 		# Not in the cache, process it and add to cache.
    511 		match case "^X-Feedname: (.*)" in headers
    512 			action {
    513 				# Store to local maildir.
    514 				maildir "${maildir}%1"
    515 
    516 				add-to-cache "${cachepath}" key "%[msgid]"
    517 				keep
    518 			}
    519 
    520 Now run:
    521 
    522 	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
    523 	$ fdm -f ~/.sfeed/fdm.conf fetch
    524 
    525 Now you can view feeds in mutt(1) for example.
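
For example, to open one of the delivered maildirs (using the feed name
"tweakers" from the filter example above as an illustration):

	mutt -f "$HOME/feeds/tweakers"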
    526 
    527 - - -
    528 
    529 Read from mbox and filter duplicate messages using the fdm program and deliver
    530 them to an SMTP server. This works similarly to the rss2email program.
    531 fdm is available at: https://github.com/nicm/fdm
    532 
    533 fdm config file (~/.sfeed/fdm.conf):
    534 
    535 	set unmatched-mail keep
    536 
    537 	account "sfeed" mbox "%[home]/.sfeed/mbox"
    538 		$cachepath = "%[home]/.sfeed/fdm.cache"
    539 		cache "${cachepath}"
    540 
    541 		# Check if message is in the cache by Message-ID.
    542 		match case "^Message-ID: (.*)" in headers
    543 			action {
    544 				tag "msgid" value "%1"
    545 			}
    546 			continue
    547 
    548 		# If it is in the cache, stop.
    549 		match matched and in-cache "${cachepath}" key "%[msgid]"
    550 			action {
    551 				keep
    552 			}
    553 
    554 		# Not in the cache, process it and add to cache.
    555 		match case "^X-Feedname: (.*)" in headers
    556 			action {
    557 				# Connect to a SMTP server and attempt to deliver the
    558 				# mail to it.
    559 				# Of course change the server and e-mail below.
    560 				smtp server "codemadness.org" to "hiltjo@codemadness.org"
    561 
    562 				add-to-cache "${cachepath}" key "%[msgid]"
    563 				keep
    564 			}
    565 
    566 Now run:
    567 
    568 	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
    569 	$ fdm -f ~/.sfeed/fdm.conf fetch
    570 
    571 Now you can view feeds in mutt(1) for example.
    572 
    573 - - -
    574 
    575 Convert mbox to separate maildirs per feed and filter duplicate messages using
    576 procmail(1).
    577 
    578 procmail_maildirs.sh file:
    579 
    580 	maildir="$HOME/feeds"
    581 	feedsdir="$HOME/.sfeed/feeds"
    582 	procmailconfig="$HOME/.sfeed/procmailrc"
    583 
    584 	# message-id cache to prevent duplicates.
    585 	mkdir -p "${maildir}/.cache"
    586 
    587 	if ! test -r "${procmailconfig}"; then
    588 		printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
    589 		echo "See procmailrc.example for an example." >&2
    590 		exit 1
    591 	fi
    592 
    593 	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
    594 		name=$(basename "${d}")
    595 		mkdir -p "${maildir}/${name}/cur"
    596 		mkdir -p "${maildir}/${name}/new"
    597 		mkdir -p "${maildir}/${name}/tmp"
    598 		printf 'Mailbox %s\n' "${name}"
    599 		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
    600 	done
    601 
    602 Procmailrc(5) file:
    603 
    604 	# Example for use with sfeed_mbox(1).
    605 	# The header X-Feedname is used to split into separate maildirs. It is
    606 	# assumed this name is sane.
    607 
    608 	MAILDIR="$HOME/feeds/"
    609 
    610 	:0
    611 	* ^X-Feedname: \/.*
    612 	{
    613 		FEED="$MATCH"
    614 
    615 		:0 Wh: "msgid_$FEED.lock"
    616 		| formail -D 1024000 ".cache/msgid_$FEED.cache"
    617 
    618 		:0
    619 		"$FEED"/
    620 	}
    621 
    622 Now run:
    623 
    624 	$ procmail_maildirs.sh
    625 
    626 Now you can view feeds in mutt(1) for example.
    627 
    628 - - -
    629 
    630 The fetch function can be overridden in your sfeedrc file. This allows
    631 replacing the default curl(1) for sfeed_update with any other client to fetch
    632 the RSS/Atom data, or changing the default curl options:
    633 
    634 	# fetch a feed via HTTP/HTTPS etc.
    635 	# fetch(name, url, feedfile)
    636 	fetch() {
    637 		hurl -m 1048576 -t 15 "$2" 2>/dev/null
    638 	}
    639 
    640 - - -
    641 
    642 Caching, incremental data updates and bandwidth-saving
    643 
    644 For servers that support it, some incremental updates and bandwidth-saving can
    645 be done by using the "ETag" HTTP header.
    646 
    647 Create a directory for storing the ETags per feed:
    648 
    649 	mkdir -p ~/.sfeed/etags/
    650 
    651 The curl ETag options (--etag-save and --etag-compare) can be used to store and
    652 send the previous ETag header value. curl version 7.73+ is recommended for it
    653 to work properly.
    654 
    655 The curl -z option can be used to send the modification date of a local file as
    656 an HTTP "If-Modified-Since" request header. The server can then respond whether
    657 the data is modified or not, or respond with only the incremental data.
    658 
    659 The curl --compressed option can be used to indicate the client supports
    660 decompression. Because RSS/Atom feeds are textual XML content this generally
    661 compresses very well.
    662 
    663 These options can be set by overriding the fetch() function in the sfeedrc
    664 file:
    665 
    666 	# fetch(name, url, feedfile)
    667 	fetch() {
    668 		etag="$HOME/.sfeed/etags/$(basename "$3")"
    669 		curl \
    670 			-L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
    671 			--compressed \
    672 			--etag-save "${etag}" --etag-compare "${etag}" \
    673 			-z "${etag}" \
    674 			"$2" 2>/dev/null
    675 	}
    676 
    677 These options can come at a cost of some privacy, because they expose
    678 additional metadata from the previous request.
    679 
    680 - - -
    681 
    682 CDNs blocking requests due to a missing HTTP User-Agent request header
    683 
    684 sfeed_update will not send the "User-Agent" header by default for privacy
    685 reasons.  Some CDNs like Cloudflare or websites like Reddit.com don't like this
    686 and will block such HTTP requests.
    687 
    688 A custom User-Agent can be set by using the curl -H option, like so:
    689 
    690 	curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'
    691 
    692 The above example string pretends to be a Windows 10 (x86-64) machine running
    693 Firefox 78.
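
A sketch of a fetch() override in the sfeedrc file that adds this header, based
on the default curl options shown in the caching example above:

	# fetch(name, url, feedfile)
	fetch() {
		curl -L --max-redirs 0 -f -s -m 15 \
			-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0' \
			"$2" 2>/dev/null
	}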
    694 
    695 - - -
    696 
    697 Page redirects
    698 
    699 For security and efficiency reasons, redirects are not allowed by default and
    700 are treated as an error.
    701 
    702 For example, this prevents hijacking of an unencrypted http:// to https://
    703 redirect and avoids the delay of an unnecessary page redirect on each request.
    704 It is encouraged to use the final redirected URL in the sfeedrc config file.
    705 
    706 If you want to ignore this advice, you can override the fetch() function in the
    707 sfeedrc file and change the curl options "-L --max-redirs 0".
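
A sketch of such an override allowing up to 3 redirects, based on the default
curl options shown in the caching example above:

	# fetch(name, url, feedfile)
	fetch() {
		curl -L --max-redirs 3 -H "User-Agent:" -f -s -m 15 \
			"$2" 2>/dev/null
	}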
    708 
    709 - - -
    710 
    711 Shellscript to handle URLs and enclosures in parallel using xargs -P.
    712 
    713 This can be used to download and process URLs: for example to download podcasts
    714 and webcomics, download and convert webpages, mirror videos, etc. It uses a
    715 plain-text cache file for remembering processed URLs. The match patterns are
    716 defined in the shellscript fetch() function and in the awk script and can be
    717 modified to handle items differently depending on their context.
    718 
    719 The arguments for the script are files in the sfeed(5) format. If no file
    720 arguments are specified then the data is read from stdin.
    721 
    722 	#!/bin/sh
    723 	# sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
    724 	# Dependencies: awk, curl, flock, xargs (-P), yt-dlp.
    725 	
    726 	cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
    727 	jobs="${SFEED_JOBS:-4}"
    728 	lockfile="${HOME}/.sfeed/sfeed_download.lock"
    729 	
    730 	# log(feedname, s, status)
    731 	log() {
    732 		if [ "$1" != "-" ]; then
    733 			s="[$1] $2"
    734 		else
    735 			s="$2"
    736 		fi
    737 		printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3"
    738 	}
    739 	
    740 	# fetch(url, feedname)
    741 	fetch() {
    742 		case "$1" in
    743 		*youtube.com*)
    744 			yt-dlp "$1";;
    745 		*.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
    746 			# allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
    747 			curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
    748 		esac
    749 	}
    750 	
    751 	# downloader(url, title, feedname)
    752 	downloader() {
    753 		url="$1"
    754 		title="$2"
    755 		feedname="${3##*/}"
    756 	
    757 		msg="${title}: ${url}"
    758 	
    759 		# download directory.
    760 		if [ "${feedname}" != "-" ]; then
    761 			mkdir -p "${feedname}"
    762 			if ! cd "${feedname}"; then
    763 				log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2
    764 				return 1
    765 			fi
    766 		fi
    767 	
    768 		log "${feedname}" "${msg}" "START"
    769 		if fetch "${url}" "${feedname}"; then
    770 			log "${feedname}" "${msg}" "OK"
    771 	
    772 			# append it safely in parallel to the cachefile on a
    773 			# successful download.
    774 			(flock 9 || exit 1
    775 			printf '%s\n' "${url}" >> "${cachefile}"
    776 			) 9>"${lockfile}"
    777 		else
    778 			log "${feedname}" "${msg}" "FAIL" >&2
    779 			return 1
    780 		fi
    781 		return 0
    782 	}
    783 	
    784 	if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
    785 		# Downloader helper for parallel downloading.
    786 		# Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
    787 		# It should write the URI to the cachefile if it is successful.
    788 		downloader "$1" "$2" "$3"
    789 		exit $?
    790 	fi
    791 	
    792 	# ...else parent mode:
    793 	
    794 	tmp="$(mktemp)" || exit 1
    795 	trap "rm -f ${tmp}" EXIT
    796 	
    797 	[ -f "${cachefile}" ] || touch "${cachefile}"
    798 	cat "${cachefile}" > "${tmp}"
    799 	echo >> "${tmp}" # force it to have one line for awk.
    800 	
    801 	LC_ALL=C awk -F '\t' '
    802 	# fast prefilter what to download or not.
    803 	function filter(url, field, feedname) {
    804 		u = tolower(url);
    805 		return (match(u, "youtube\\.com") ||
    806 		        match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
    807 	}
    808 	function download(url, field, title, filename) {
    809 		if (!length(url) || urls[url] || !filter(url, field, filename))
    810 			return;
    811 		# NUL-separated for xargs -0.
    812 		printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
    813 		urls[url] = 1; # print once
    814 	}
    815 	{
    816 		FILENR += (FNR == 1);
    817 	}
    818 	# lookup table from cachefile which contains downloaded URLs.
    819 	FILENR == 1 {
    820 		urls[$0] = 1;
    821 	}
    822 	# feed file(s).
    823 	FILENR != 1 {
    824 		download($3, 3, $2, FILENAME); # link
    825 		download($8, 8, $2, FILENAME); # enclosure
    826 	}
    827 	' "${tmp}" "${@:--}" | \
    828 	SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"
    829 
    830 - - -
    831 
    832 Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
    833 TSV format.
    834 
    835 	#!/bin/sh
    836 	# Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
    837 	# The data is split per file per feed with the name of the newsboat title/url.
    838 	# It writes the URLs of the read items line by line to a "urls" file.
    839 	#
    840 	# Dependencies: sqlite3, awk.
    841 	#
    842 	# Usage: create some directory to store the feeds then run this script.
    843 	
    844 	# newsboat cache.db file.
    845 	cachefile="$HOME/.newsboat/cache.db"
    846 	test -n "$1" && cachefile="$1"
    847 	
    848 	# dump data.
    849 	# .mode ascii: Columns/rows delimited by 0x1F and 0x1E
    850 	# get the first fields in the order of the sfeed(5) format.
    851 	sqlite3 "$cachefile" <<!EOF |
    852 	.headers off
    853 	.mode ascii
    854 	.output
    855 	SELECT
    856 		i.pubDate, i.title, i.url, i.content, i.content_mime_type,
    857 		i.guid, i.author, i.enclosure_url,
    858 		f.rssurl AS rssurl, f.title AS feedtitle, i.unread
    859 		-- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
    860 	FROM rss_feed f
    861 	INNER JOIN rss_item i ON i.feedurl = f.rssurl
    862 	ORDER BY
    863 		i.feedurl ASC, i.pubDate DESC;
    864 	.quit
    865 	!EOF
    866 	# convert to sfeed(5) TSV format.
    867 	LC_ALL=C awk '
    868 	BEGIN {
    869 		FS = "\x1f";
    870 		RS = "\x1e";
    871 	}
    872 	# normal non-content fields.
    873 	function field(s) {
    874 		gsub("^[[:space:]]*", "", s);
    875 		gsub("[[:space:]]*$", "", s);
    876 		gsub("[[:space:]]", " ", s);
    877 		gsub("[[:cntrl:]]", "", s);
    878 		return s;
    879 	}
    880 	# content field.
    881 	function content(s) {
    882 		gsub("^[[:space:]]*", "", s);
    883 		gsub("[[:space:]]*$", "", s);
    884 		# escape chars in content field.
    885 		gsub("\\\\", "\\\\", s);
    886 		gsub("\n", "\\n", s);
    887 		gsub("\t", "\\t", s);
    888 		return s;
    889 	}
    890 	function feedname(feedurl, feedtitle) {
    891 		if (feedtitle == "") {
    892 			gsub("/", "_", feedurl);
    893 			return feedurl;
    894 		}
    895 		gsub("/", "_", feedtitle);
    896 		return feedtitle;
    897 	}
    898 	{
    899 		fname = feedname($9, $10);
    900 		if (!feed[fname]++) {
    901 			print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
    902 		}
    903 	
    904 		contenttype = field($5);
    905 		if (contenttype == "")
    906 			contenttype = "html";
    907 		else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
    908 			contenttype = "html";
    909 		else
    910 			contenttype = "plain";
    911 	
    912 		print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
    913 			contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
    914 			> fname;
    915 	
    916 		# write URLs of the read items to a file line by line.
    917 		if ($11 == "0") {
    918 			print $3 > "urls";
    919 		}
    920 	}'
    921 
    922 - - -
    923 
    924 Progress indicator
    925 ------------------
    926 
    927 The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
    928 config.  It then calls sfeed_update and pipes the output lines to a function
    929 that counts the current progress. It writes the total progress to stderr.
    930 Alternative: pv -l -s totallines
    931 
    932 	#!/bin/sh
    933 	# Progress indicator script.
    934 	
    935 	# Pass lines as input to stdin and write progress status to stderr.
    936 	# progress(totallines)
    937 	progress() {
    938 		total="$(($1 + 0))" # must be a number, no divide by zero.
    939 		test "${total}" -le 0 -o "$1" != "${total}" && return
    940 	LC_ALL=C awk -v "total=${total}" '
    941 	{
    942 		counter++;
    943 		percent = (counter * 100) / total;
    944 		printf("\033[K") > "/dev/stderr"; # clear EOL
    945 		print $0;
    946 		printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
    947 		fflush(); # flush all buffers per line.
    948 	}
    949 	END {
    950 		printf("\033[K") > "/dev/stderr";
    951 	}'
    952 	}
    953 	
    954 	# Counts the feeds from the sfeedrc config.
    955 	countfeeds() {
    956 		count=0
    957 	. "$1"
    958 	feed() {
    959 		count=$((count + 1))
    960 	}
    961 		feeds
    962 		echo "${count}"
    963 	}
    964 	
    965 	config="${1:-$HOME/.sfeed/sfeedrc}"
    966 	total=$(countfeeds "${config}")
    967 	sfeed_update "${config}" 2>&1 | progress "${total}"
    968 
    969 - - -
    970 
    971 Counting unread and total items
    972 -------------------------------
    973 
    974 It can be useful to show the counts of unread items, for example in a
    975 windowmanager or statusbar.
    976 
    977 The below example script counts the items of the last day in the same way the
    978 formatting tools do:
    979 
    980 	#!/bin/sh
    981 	# Count the new items of the last day.
    982 	LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
    983 	{
    984 		total++;
    985 	}
    986 	int($1) >= old {
    987 		totalnew++;
    988 	}
    989 	END {
    990 		print "New:   " totalnew;
    991 		print "Total: " total;
    992 	}' ~/.sfeed/feeds/*
    993 
    994 The below example script counts the unread items using the sfeed_curses URL
    995 file:
    996 
    997 	#!/bin/sh
    998 	# Count the unread and total items from feeds using the URL file.
    999 	LC_ALL=C awk -F '\t' '
   1000 	# URL file: amount of fields is 1.
   1001 	NF == 1 {
   1002 		u[$0] = 1; # lookup table of URLs.
   1003 		next;
   1004 	}
   1005 	# feed file: check by URL or id.
   1006 	{
   1007 		total++;
   1008 		if (length($3)) {
   1009 			if (u[$3])
   1010 				read++;
   1011 		} else if (length($6)) {
   1012 			if (u[$6])
   1013 				read++;
   1014 		}
   1015 	}
   1016 	END {
   1017 		print "Unread: " (total - read);
   1018 		print "Total:  " total;
   1019 	}' ~/.sfeed/urls ~/.sfeed/feeds/*
   1020 
   1021 - - -
   1022 
   1023 sfeed.c: adding new XML tags or sfeed(5) fields to the parser
   1024 -------------------------------------------------------------
   1025 
   1026 sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
   1027 fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
   1028 number.  This TagId is then mapped to the output field index.
   1029 
   1030 Steps to modify the code:
   1031 
   1032 * Add a new TagId enum for the tag.
   1033 
   1034 * (optional) Add a new FeedField* enum for the new output field or you can map
   1035   it to an existing field.
   1036 
   1037 * Add the new XML tag name to the array variable of parsed RSS or Atom
   1038   tags: rsstags[] or atomtags[].
   1039 
   1040   These must be defined in alphabetical order, because a binary search is used
   1041   which uses the strcasecmp() function.
   1042 
   1043 * Add the parsed TagId to the output field in the array variable fieldmap[].
   1044 
   1045   When another tag is also mapped to the same output field then the tag with
   1046   the highest TagId number value overrides the mapped field: the order is from
   1047   least important to most important.
   1048 
   1049 * If this defined tag just uses the inner data of the XML tag, then this
   1050   definition is enough. If it, for example, has to parse a certain attribute,
   1051   you have to add a check for the TagId to the xmlattr() callback function.
   1052 
   1053 * (optional) Print the new field in the printfields() function.
   1054 
   1055 Below is a patch example to add the MRSS "media:content" tag as a new field:
   1056 
   1057 diff --git a/sfeed.c b/sfeed.c
   1058 --- a/sfeed.c
   1059 +++ b/sfeed.c
   1060 @@ -50,7 +50,7 @@ enum TagId {
   1061  	RSSTagGuidPermalinkTrue,
   1062  	/* must be defined after GUID, because it can be a link (isPermaLink) */
   1063  	RSSTagLink,
   1064 -	RSSTagEnclosure,
   1065 +	RSSTagMediaContent, RSSTagEnclosure,
   1066  	RSSTagAuthor, RSSTagDccreator,
   1067  	RSSTagCategory,
   1068  	/* Atom */
   1069 @@ -81,7 +81,7 @@ typedef struct field {
   1070  enum {
   1071  	FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
   1072  	FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
   1073 -	FeedFieldLast
   1074 +	FeedFieldMediaContent, FeedFieldLast
   1075  };
   1076  
   1077  typedef struct feedcontext {
   1078 @@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
   1079  	{ STRP("enclosure"),         RSSTagEnclosure         },
   1080  	{ STRP("guid"),              RSSTagGuid              },
   1081  	{ STRP("link"),              RSSTagLink              },
   1082 +	{ STRP("media:content"),     RSSTagMediaContent      },
   1083  	{ STRP("media:description"), RSSTagMediaDescription  },
   1084  	{ STRP("pubdate"),           RSSTagPubdate           },
   1085  	{ STRP("title"),             RSSTagTitle             }
   1086 @@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
   1087  	[RSSTagGuidPermalinkFalse] = FeedFieldId,
   1088  	[RSSTagGuidPermalinkTrue]  = FeedFieldId, /* special-case: both a link and an id */
   1089  	[RSSTagLink]               = FeedFieldLink,
   1090 +	[RSSTagMediaContent]       = FeedFieldMediaContent,
   1091  	[RSSTagEnclosure]          = FeedFieldEnclosure,
   1092  	[RSSTagAuthor]             = FeedFieldAuthor,
   1093  	[RSSTagDccreator]          = FeedFieldAuthor,
   1094 @@ -677,6 +679,8 @@ printfields(void)
   1095  	string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
   1096  	putchar(FieldSeparator);
   1097  	string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
   1098 +	putchar(FieldSeparator);
   1099 +	string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
   1100  	putchar('\n');
   1101  
   1102  	if (ferror(stdout)) /* check for errors but do not flush */
   1103 @@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
   1104  	}
   1105  
   1106  	if (ctx.feedtype == FeedTypeRSS) {
   1107 -		if (ctx.tag.id == RSSTagEnclosure &&
   1108 +		if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
   1109  		    isattr(n, nl, STRP("url"))) {
   1110  			string_append(&tmpstr, v, vl);
   1111  		} else if (ctx.tag.id == RSSTagGuid &&
   1112 
   1113 - - -
   1114 
   1115 Running custom commands inside the sfeed_curses program
   1116 -------------------------------------------------------
   1117 
   1118 Running commands inside the sfeed_curses program can be useful, for example to
   1119 sync items or mark all items across all feeds as read. It can be convenient to
   1120 have a keybind for this inside the program to perform a scripted action and
   1121 then reload the feeds by sending the signal SIGHUP.
   1122 
   1123 In the input handling code you can then add a case:
   1124 
   1125 	case 'M':
   1126 		forkexec((char *[]) { "markallread.sh", NULL }, 0);
   1127 		break;
   1128 
   1129 or
   1130 
   1131 	case 'S':
   1132 		forkexec((char *[]) { "syncnews.sh", NULL }, 1);
   1133 		break;
   1134 
   1135 The specified script should be in $PATH or be an absolute path.
   1136 
   1137 Example of a `markallread.sh` shellscript to mark all URLs as read:
   1138 
   1139 	#!/bin/sh
   1140 	# mark all items/URLs as read.
   1141 	tmp="$(mktemp)" || exit 1
   1142 	(cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
   1143 	awk '!x[$0]++' > "$tmp" &&
   1144 	mv "$tmp" ~/.sfeed/urls &&
   1145 	pkill -SIGHUP sfeed_curses # reload feeds.
   1146 
   1147 Example of a `syncnews.sh` shellscript to update the feeds and reload them:
   1148 
   1149 	#!/bin/sh
   1150 	sfeed_update
   1151 	pkill -SIGHUP sfeed_curses
   1152 
   1153 
   1154 Running programs in a new session
   1155 ---------------------------------
   1156 
   1157 By default processes are spawned in the same session and process group as
   1158 sfeed_curses.  When sfeed_curses is closed this can also close the spawned
   1159 process in some cases.
   1160 
   1161 When the setsid command-line program is available, the following wrapper
   1162 command can be used to run a plumb program in a new session:
   1163 
   1164 	setsid -f xdg-open "$@"
   1165 
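For example, saved as a small executable script (the path ~/bin/plumb is just
an illustration):

	#!/bin/sh
	# plumb: open the URL in a new session so it survives closing sfeed_curses.
	setsid -f xdg-open "$@"

and used as the plumber:

	SFEED_PLUMBER="$HOME/bin/plumb" sfeed_curses ~/.sfeed/feeds/*
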
   1166 Alternatively the code can be changed to call setsid() before execvp().
   1167 
   1168 
   1169 Open an URL directly in the same terminal
   1170 -----------------------------------------
   1171 
   1172 To open an URL directly in the same terminal using the text-mode lynx browser:
   1173 
   1174 	SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*
   1175 
   1176 
   1177 Yank to tmux buffer
   1178 -------------------
   1179 
   1180 This changes the yank command to set the tmux buffer, instead of X11 xclip:
   1181 
   1182 	SFEED_YANKER="tmux set-buffer \`cat\`"
   1183 
   1184 
   1185 Known terminal issues
   1186 ---------------------
   1187 
   1188 Below are some bugs or missing features in terminals that were found while
   1189 testing sfeed_curses.  Some of them might be fixed already upstream:
   1190 
   1191 - cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
   1192   scrolling.
   1193 - HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
   1194   middle-button, right-button is incorrect / reversed.
   1195 - putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
   1196   window title.
   1197 - Mouse button encoding for extended buttons (like side-buttons) in some
   1198   terminals is unsupported or maps to the same button: for example side-buttons 7
   1199   and 8 map to the scroll buttons 4 and 5 in urxvt.
   1200 
   1201 
   1202 License
   1203 -------
   1204 
   1205 ISC, see LICENSE file.
   1206 
   1207 
   1208 Author
   1209 ------
   1210 
   1211 Hiltjo Posthuma <hiltjo@codemadness.org>