Models

class resyndicator.models.CustomSession[source]

Session that creates any custom tables.

class resyndicator.models.DefaultSession[source]

Session that creates the default table.

class resyndicator.models.Entry(**kwargs)[source]

Default SQLAlchemy entry representation.

class resyndicator.models.EntryBase[source]

An abstract entry not bound to SQLAlchemy. Subclass it as a mix-in together with the CustomBase to create your own database representation. Use Entry to use the default.
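A minimal sketch of the mix-in pattern, using plain classes as stand-ins. In real use, EntryBase comes from resyndicator.models and CustomBase would be your SQLAlchemy declarative base; the `__tablename__` and column declarations shown here are assumptions about that declarative setup:

```python
# Stand-ins for illustration only; the real EntryBase comes from
# resyndicator.models and CustomBase from your own SQLAlchemy setup.
class EntryBase:
    """Abstract entry behavior, not bound to SQLAlchemy."""

class CustomBase:
    """Stand-in for a SQLAlchemy declarative base."""

class MyEntry(EntryBase, CustomBase):
    # Assumption: with a real declarative base, you would declare
    # __tablename__ and Column attributes here.
    __tablename__ = "my_entries"
```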

Resyndicators

class resyndicator.resyndicators.Resyndicator(title, query, session=<class 'resyndicator.models.DefaultSession'>, past=None, length=30, **kwargs)[source]

The Resyndicator class represents a feed that is generated from the retrieved data on the basis of an SQLAlchemy query. It is identified by the title that you specify on instantiation, so do not change it; that would be tantamount to creating a new resyndicator.
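A sketch of the configuration shape implied by the documented signature. The interpretation of `past` as an age window is an assumption based on the parameter name, not confirmed by the source:

```python
from datetime import timedelta

# Mirrors the documented signature:
#   Resyndicator(title, query, session=DefaultSession, past=None, length=30)
feed_settings = dict(
    title="Example aggregate feed",  # identity: changing it makes a new feed
    past=timedelta(days=90),         # assumption: bounds the age of entries
    length=30,                       # documented default: entries per feed
)
```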

class Entry(**kwargs)

Default SQLAlchemy entry representation.

Resyndicator.feed()[source]

Generate the serialized feed.

Resyndicator.get_entries()[source]

Query all relevant entries from the database.

Resyndicator.publish()[source]

Write the serialized feed to a file.

Resyndicator.pubsub(fresh_entries)[source]

Publish new entries to a hub like PubSubHubbub.

Fetchers

Base

class resyndicator.fetchers.base.BaseEntryInterface(fetcher, raw_entry)[source]

Base class for entries.

Subclass this to provide a unified interface to any type of entry you want to import.
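A stand-in sketch of such a subclass for a hypothetical JSON source. The real base class is resyndicator.fetchers.base.BaseEntryInterface; the class and field names here (`JSONEntryInterface`, `headline`) are invented for illustration, while the attribute names mirror the documented interface:

```python
# Stand-in sketch; in real use you would subclass BaseEntryInterface.
class JSONEntryInterface:
    """Maps one raw JSON item onto the unified entry attributes."""

    def __init__(self, fetcher, raw_entry):
        self.fetcher = fetcher
        self.raw_entry = raw_entry

    @property
    def title(self):
        return self.raw_entry.get("headline")

    @property
    def content_type(self):
        return "html"  # or "text"; matters for the Atom output

entry = JSONEntryInterface(fetcher=None, raw_entry={"headline": "Hello"})
```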

class Entry(**kwargs)

Default SQLAlchemy entry representation.

BaseEntryInterface.author

Entry author

BaseEntryInterface.content

Full content of the entry if given

BaseEntryInterface.content_type

Either text or html (important for the Atom output)

BaseEntryInterface.entry

Return the SQLAlchemy entry.

Here, the source property is used to optionally include the entry's source code (controlled through settings.INCLUDE_SOURCE). If the entry already exists, it is returned unchanged. Otherwise, it is initialized with all the new values from the supplier.

BaseEntryInterface.fetched

Fetch time (set to datetime.datetime.utcnow() by default)

BaseEntryInterface.id

A globally unique ID for internal deduplication and identification by feed readers

BaseEntryInterface.link

Entry link

BaseEntryInterface.published

Time the entry was published

BaseEntryInterface.source

Optional field to insert any entry source code into the content field. This can be set through settings.INCLUDE_SOURCE.

BaseEntryInterface.summary

Summary or description of the entry

BaseEntryInterface.summary_type

Either text or html (important for the Atom output)

BaseEntryInterface.title

Entry title

BaseEntryInterface.updated

Time the entry was updated on the supplier side

class resyndicator.fetchers.base.BaseFetcher(url, interval, session=<class 'resyndicator.models.DefaultSession'>, default_tz=<class 'dateutil.tz.tz.tzutc'>, defaults=None, **kwargs)[source]

Base class for fetchers.

Subclass this to implement your own custom types of fetchers.
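A stand-in sketch of the subclassing idea, showing only a parse override for a hypothetical JSON feed. The class name and payload layout are invented for illustration; in real use you would subclass BaseFetcher:

```python
import json

# Stand-in sketch of a BaseFetcher subclass; only parse() is shown.
class JSONFetcher:
    def parse(self, response_text):
        """Turn the raw payload into raw entries for update() to consume."""
        return json.loads(response_text)["items"]

fetcher = JSONFetcher()
items = fetcher.parse('{"items": [{"headline": "Hello"}]}')
```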

EntryInterface

alias of BaseEntryInterface

author

The author of the data source

clean()[source]

Reset the fetcher.

entries

Yield the SQLAlchemy entries after setting any default values.

generator

The generator of the data source

hub

Optionally the endpoint of a hub such as PubSubHubbub

id

A unique ID for the data source, e.g., the feed

is_valid(entry)[source]

Test the validity of an entry. The default implementation always returns True.

link

Link to the data source

needs_update

Return whether the source is ripe for an update.

next_check

The time of the next update.

parse(response)[source]

Implement this function to convert your data source to something the update method can work with.

persist()[source]

Commit the entries or any updates to them to the database and return the entries that have been created.

retrieve()[source]

Retrieve the data source.

By default, this is a wrapper around the requests library that sets conditional headers so that servers can indicate that the content hasn’t changed since the last retrieval; it also sets a custom user agent and a timeout.
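The conditional-request mechanism can be sketched with plain header construction. The header fields are standard HTTP (RFC 7232); that resyndicator sends exactly these headers is an assumption, and the user-agent string is a placeholder:

```python
# Build conditional headers from state saved after the previous fetch.
def conditional_headers(etag=None, last_modified=None,
                        user_agent="my-fetcher/0.1"):
    headers = {"User-Agent": user_agent}
    if etag:
        headers["If-None-Match"] = etag          # server may answer 304
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

headers = conditional_headers(etag='"abc123"',
                              last_modified="Wed, 01 Jan 2025 00:00:00 GMT")
```

On a 304 Not Modified response, the fetcher can skip parsing entirely, which is what the UnchangedException below signals.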

subtitle

Subtitle of the data source

title

Title of the data source

touch()[source]

Set the time of the last check of the source to the current time.

exception resyndicator.fetchers.base.UnchangedException[source]

Indicates that a feed or sitemap has not been changed.

Feed

Sitemap

class resyndicator.fetchers.sitemap.SitemapEntryInterface(fetcher, raw_entry)[source]

Entry mapping for the sitemap entry.

id

Entry ID generated from URL.

link

Sitemap entry location (i.e., the URL).

source

Entry raw source as JSON inside an HTML snippet.

updated

Lastmod time of entry with fallback on publish times of video and news extensions.

class resyndicator.fetchers.sitemap.SitemapFetcher(url, interval, session=<class 'resyndicator.models.DefaultSession'>, default_tz=<class 'dateutil.tz.tz.tzutc'>, defaults=None, **kwargs)[source]

Fetcher that supports sitemaps and recognizes some features of some sitemap extensions.

EntryInterface

alias of SitemapEntryInterface

id

Sitemap ID generated from the explicitly set URL.

static parse(response)[source]

Return parsed sitemap.

update()[source]

Process sitemap.

class resyndicator.fetchers.sitemap.SitemapIndexFetcher(*args, **kwargs)[source]

This entry point, which distributes the sitemap URLs in a sitemap index across individual sitemap fetchers, is still a bit of a hack. It supports only one level of sitemap indices and circumvents the request scheduling, so it can block the scheduler for a while and send many consecutive requests to the same host.

EntryInterface

alias of SitemapEntryInterface

class SitemapFetcher(url, interval, session=<class 'resyndicator.models.DefaultSession'>, default_tz=<class 'dateutil.tz.tz.tzutc'>, defaults=None, **kwargs)

Fetcher that supports sitemaps and recognizes some features of some sitemap extensions.

EntryInterface

alias of SitemapEntryInterface

id

Sitemap ID generated from the explicitly set URL.

static parse(response)

Return parsed sitemap.

update()

Process sitemap.

SitemapIndexFetcher.clean()[source]

Reset the fetcher.

SitemapIndexFetcher.id

Unique ID of the sitemap, generated from the explicitly set URL.

SitemapIndexFetcher.parse(response)[source]

Parse the sitemap index.

SitemapIndexFetcher.raw_entries

The raw entries as returned by the parser.

SitemapIndexFetcher.update()[source]

Run the retrieval cycle that calls the sitemap fetcher internally.

Twitter

class resyndicator.fetchers.twitter.TweetInterface(fetcher, raw_entry)[source]

Mapping for individual tweets.

class Entry(**kwargs)

Default SQLAlchemy entry representation.

TweetInterface.author

The tweep.

TweetInterface.content

The HTML representation of the tweet.

TweetInterface.entry

The SQLAlchemy entry representing the tweet.

TweetInterface.fetched

Time the tweet was fetched.

TweetInterface.id

The tweet ID as string.

TweetInterface.link

The URL of the tweet.

TweetInterface.source_id

URN to identify the tweep.

TweetInterface.source_link

Link to the Twitter account of the author.

TweetInterface.source_title

Some generated source title based on the author name.

TweetInterface.title

Optionally shortened representation of the tweet text.

TweetInterface.tweet_html

Assemble a presentable HTML representation of the tweet.

TweetInterface.tweet_text

Assemble a presentable text representation of the tweet.

TweetInterface.updated

Time the tweet was created.

class resyndicator.fetchers.twitter.TwitterStreamer(oauth_token, oauth_secret, session=<class 'resyndicator.models.DefaultSession'>, timeout=0, **kwargs)[source]

A Twitter streaming client that doesn’t work at the moment due to an error in the Birdy library. Please use the TwitterFetcher in the meantime.

Content

class resyndicator.fetchers.content.ContentFetcher(session=<class 'resyndicator.models.DefaultSession'>, past=None, **kwargs)[source]

Fetcher class for retrieval and extraction of content from websites.

This fetcher incrementally downloads and extracts (using Readability) the content from any pages associated with entries that don’t already have long-form content associated with them.
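The incremental selection described above can be sketched as a simple filter. The threshold and the dictionary field names are assumptions for illustration, not the library's actual values:

```python
MIN_CONTENT_LENGTH = 500  # assumption: below this counts as no long-form content

def needs_content(entries):
    """Keep only entries whose pages still need to be downloaded."""
    return [entry for entry in entries
            if len(entry.get("content") or "") < MIN_CONTENT_LENGTH]

entries = [
    {"link": "https://example.com/a", "content": None},
    {"link": "https://example.com/b", "content": "x" * 1000},
]
pending = needs_content(entries)  # only the first entry remains
```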

class Entry(**kwargs)

Default SQLAlchemy entry representation.

ContentFetcher.fetch()[source]

Run one full fetching cycle. This is the main entry point for the content fetching process.

static ContentFetcher.get_hostname(entry)[source]

Wrapper for extracting the hostname from entry links. (Another ContentFetcher might need to strip the www. prefix.)

ContentFetcher.persist()[source]

Commit the updates to the entries.

ContentFetcher.select()[source]

Return a list of entries with at most one entry per host, so as to maximize the time between requests to the same host.
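The per-host selection can be sketched with a dictionary keyed by hostname. This is a sketch of the idea under assumed entry dictionaries, not the library's actual implementation:

```python
from urllib.parse import urlparse

def one_per_host(entries):
    """Pick at most one entry per hostname to space out requests."""
    chosen = {}
    for entry in entries:
        host = urlparse(entry["link"]).hostname
        chosen.setdefault(host, entry)  # keep the first entry seen per host
    return list(chosen.values())

entries = [
    {"link": "https://example.com/1"},
    {"link": "https://example.com/2"},
    {"link": "https://example.org/3"},
]
batch = one_per_host(entries)  # one for example.com, one for example.org
```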

ContentFetcher.update()[source]

Replenish the in-memory list of entries to process.

Services

resyndicator.services.content(args)[source]

Main entry point to the content fetchers.

resyndicator.services.fetchers(args)[source]

Main entry point to the fetchers, which run periodically.

resyndicator.services.stream(args)[source]

Main entry point to the streams, which run continuously.

Console

resyndicator.console.run()[source]

Main entry point to the Resyndicator.

Utils

class resyndicator.utils.sitemapparser.Sitemap(xml)[source]

Parser class for sitemaps.

class resyndicator.utils.sitemapparser.SitemapIndex(xml)[source]

Parser class for sitemap indices.

resyndicator.utils.sitemapparser.dictify(element)[source]

Convert an etree element to a dictionary.
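A minimal version of such a conversion, using the standard library. The real dictify may differ, for instance in how it handles attributes or repeated tags:

```python
import xml.etree.ElementTree as etree

def dictify(element):
    """Recursively convert an etree element to a plain dictionary."""
    result = dict(element.attrib)
    for child in element:
        # Leaf elements without attributes collapse to their text.
        if len(child) or child.attrib:
            result[child.tag] = dictify(child)
        else:
            result[child.tag] = child.text or ""
    if not result and element.text:
        return element.text
    return result

root = etree.fromstring(
    "<url><loc>https://example.com/</loc><priority>0.5</priority></url>")
data = dictify(root)  # {'loc': 'https://example.com/', 'priority': '0.5'}
```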