Models

class resyndicator.models.CustomSession[source]

Session that creates any custom tables.

class resyndicator.models.DefaultSession[source]

Session that creates the default table.

class resyndicator.models.Entry(**kwargs)[source]

Default SQLAlchemy entry representation.

class resyndicator.models.EntryBase[source]

An abstract entry not bound to SQLAlchemy. Subclass it as a mix-in together with the CustomBase to create your own database representation. Use Entry to use the default.
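A minimal sketch of the mix-in pattern, using plain classes as stand-ins. In real use, EntryBase comes from resyndicator.models and CustomBase would be your SQLAlchemy declarative base; the `__tablename__` and column declarations shown here are assumptions about that declarative setup:

```python
# Stand-ins for illustration only; the real EntryBase comes from
# resyndicator.models and CustomBase from your own SQLAlchemy setup.
class EntryBase:
    """Abstract entry behavior, not bound to SQLAlchemy."""

class CustomBase:
    """Stand-in for a SQLAlchemy declarative base."""

class MyEntry(EntryBase, CustomBase):
    # Assumption: with a real declarative base, you would declare
    # __tablename__ and Column attributes here.
    __tablename__ = "my_entries"
```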

Resyndicators

class resyndicator.resyndicators.Resyndicator(title, query, session=<class 'resyndicator.models.DefaultSession'>, past=None, length=30, **kwargs)[source]

The Resyndicator class represents a feed that is generated from the retrieved data on the basis of an SQLAlchemy query. It is identified by the title that you specify on instantiation, so do not change it; that would be tantamount to creating a new resyndicator.
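A sketch of the configuration shape implied by the documented signature. The interpretation of `past` as an age window is an assumption based on the parameter name, not confirmed by the source:

```python
from datetime import timedelta

# Mirrors the documented signature:
#   Resyndicator(title, query, session=DefaultSession, past=None, length=30)
feed_settings = dict(
    title="Example aggregate feed",  # identity: changing it makes a new feed
    past=timedelta(days=90),         # assumption: bounds the age of entries
    length=30,                       # documented default: entries per feed
)
```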

class Entry(**kwargs)

Default SQLAlchemy entry representation.

Resyndicator.feed()[source]

Generate the serialized feed.

Resyndicator.get_entries()[source]

Query all relevant entries from the database.

Resyndicator.publish()[source]

Write the serialized feed to a file.

Resyndicator.pubsub(fresh_entries)[source]

Publish new entries to a hub like PubSubHubbub.

Fetchers

Base

class resyndicator.fetchers.base.BaseEntryInterface(fetcher, raw_entry)[source]

Base class for entries.

Subclass this to provide a unified interface to any type of entry you want to import.
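A stand-in sketch of such a subclass for a hypothetical JSON source. The real base class is resyndicator.fetchers.base.BaseEntryInterface; the class and field names here (`JSONEntryInterface`, `headline`) are invented for illustration, while the attribute names mirror the documented interface:

```python
# Stand-in sketch; in real use you would subclass BaseEntryInterface.
class JSONEntryInterface:
    """Maps one raw JSON item onto the unified entry attributes."""

    def __init__(self, fetcher, raw_entry):
        self.fetcher = fetcher
        self.raw_entry = raw_entry

    @property
    def title(self):
        return self.raw_entry.get("headline")

    @property
    def content_type(self):
        return "html"  # or "text"; matters for the Atom output

entry = JSONEntryInterface(fetcher=None, raw_entry={"headline": "Hello"})
```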

class Entry(**kwargs)

Default SQLAlchemy entry representation.

BaseEntryInterface.author

Entry author

BaseEntryInterface.content

Full content of the entry if given

BaseEntryInterface.content_type

Either text or html (important for the Atom output)

BaseEntryInterface.entry

Return the SQLAlchemy entry.

Here, the source property is used to optionally include the entry's source code (controlled through settings.INCLUDE_SOURCE). If the entry already exists, it is returned unchanged. Otherwise, it is initialized with all the new values from the supplier.

BaseEntryInterface.fetched

Fetch time (set to datetime.datetime.utcnow() by default)

BaseEntryInterface.id

A globally unique ID for internal deduplication and identification by feed readers

BaseEntryInterface.link

Entry link

BaseEntryInterface.published

Time the entry was published

BaseEntryInterface.source

Optional field to insert any entry source code into the content field. This can be set through settings.INCLUDE_SOURCE.

BaseEntryInterface.summary

Summary or description of the entry

BaseEntryInterface.summary_type

Either text or html (important for the Atom output)

BaseEntryInterface.title

Entry title

BaseEntryInterface.updated

Time the entry was updated on the supplier side

class resyndicator.fetchers.base.BaseFetcher(url, interval, session=<class 'resyndicator.models.DefaultSession'>, default_tz=<class 'dateutil.tz.tz.tzutc'>, defaults=None, **kwargs)[source]

Base class for fetchers.

Subclass this to implement your own custom types of fetchers.
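A stand-in sketch of the subclassing idea, showing only a parse override for a hypothetical JSON feed. The class name and payload layout are invented for illustration; in real use you would subclass BaseFetcher:

```python
import json

# Stand-in sketch of a BaseFetcher subclass; only parse() is shown.
class JSONFetcher:
    def parse(self, response_text):
        """Turn the raw payload into raw entries for update() to consume."""
        return json.loads(response_text)["items"]

fetcher = JSONFetcher()
items = fetcher.parse('{"items": [{"headline": "Hello"}]}')
```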

EntryInterface

alias of BaseEntryInterface

author

The author of the data source

clean()[source]

Reset the fetcher.

entries

Yield the SQLAlchemy entries after setting any default values.

generator

The generator of the data source

hub

Optionally the endpoint of a hub such as PubSubHubbub

id

A unique ID for the data source, e.g., the feed

is_valid(entry)[source]

Test the validity of an entry. The default implementation always returns True.

link

Link to the data source

needs_update

Return whether the source is ripe for an update.

next_check

The time of the next update.

parse(response)[source]

Implement this function to convert your data source to something the update method can work with.

persist()[source]

Commit the entries or any updates to them to the database and return the entries that have been created.

retrieve()[source]

Retrieve the data source.

By default, this is a wrapper around the requests library that sets conditional headers so that servers can indicate that the content hasn’t changed since the last retrieval; it also sets a custom user agent and a timeout.
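The conditional-request mechanism can be sketched with plain header construction. The header fields are standard HTTP (RFC 7232); that resyndicator sends exactly these headers is an assumption, and the user-agent string is a placeholder:

```python
# Build conditional headers from state saved after the previous fetch.
def conditional_headers(etag=None, last_modified=None,
                        user_agent="my-fetcher/0.1"):
    headers = {"User-Agent": user_agent}
    if etag:
        headers["If-None-Match"] = etag          # server may answer 304
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

headers = conditional_headers(etag='"abc123"',
                              last_modified="Wed, 01 Jan 2025 00:00:00 GMT")
```

On a 304 Not Modified response, the fetcher can skip parsing entirely, which is what the UnchangedException below signals.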

subtitle

Subtitle of the data source

title

Title of the data source

touch()[source]

Set the time of the last check of the source to the current time.

exception resyndicator.fetchers.base.UnchangedException[source]

Indicates that a feed or sitemap has not been changed.

Feed

Sitemap

class resyndicator.fetchers.sitemap.SitemapEntryInterface(fetcher, raw_entry)[source]

Entry mapping for the sitemap entry.

id

Entry ID generated from URL.

link

Sitemap entry location (i.e., the URL).

source

Entry raw source as JSON inside an HTML snippet.

updated

Lastmod time of entry with fallback on publish times of video and news extensions.

class resyndicator.fetchers.sitemap.SitemapFetcher(url, interval, session=<class 'resyndicator.models.DefaultSession'>, default_tz=<class 'dateutil.tz.tz.tzutc'>, defaults=None, **kwargs)[source]

Fetcher that supports sitemaps and recognizes some features of some sitemap extensions.

EntryInterface

alias of SitemapEntryInterface

id

Sitemap ID generated from the explicitly set URL.

static parse(response)[source]

Return parsed sitemap.

update()[source]

Process sitemap.

class resyndicator.fetchers.sitemap.SitemapIndexFetcher(*args, **kwargs)[source]

This entry point, which distributes the sitemap URLs in a sitemap index across individual sitemap fetchers, is still a bit of a hack. It supports only one level of sitemap indices and circumvents the request scheduling, so it can block the scheduler for a while and send many consecutive requests to the same host.

EntryInterface

alias of SitemapEntryInterface

class SitemapFetcher(url, interval, session=<class 'resyndicator.models.DefaultSession'>, default_tz=<class 'dateutil.tz.tz.tzutc'>, defaults=None, **kwargs)

Fetcher that supports sitemaps and recognizes some features of some sitemap extensions.

EntryInterface

alias of SitemapEntryInterface

id

Sitemap ID generated from the explicitly set URL.

static parse(response)

Return parsed sitemap.

update()

Process sitemap.

SitemapIndexFetcher.clean()[source]

Reset the fetcher.

SitemapIndexFetcher.id

Unique ID of the sitemap, generated from the explicitly set URL.

SitemapIndexFetcher.parse(response)[source]

Parse the sitemap index.

SitemapIndexFetcher.raw_entries

The raw entries as returned by the parser.

SitemapIndexFetcher.update()[source]

Run the retrieval cycle that calls the sitemap fetcher internally.

Twitter

class resyndicator.fetchers.twitter.TweetInterface(fetcher, raw_entry)[source]

Mapping for individual tweets.

class Entry(**kwargs)

Default SQLAlchemy entry representation.

TweetInterface.author

The tweep.

TweetInterface.content

The HTML representation of the tweet.

TweetInterface.entry

The SQLAlchemy entry representing the tweet.

TweetInterface.fetched

Time the tweet was fetched.

TweetInterface.id

The tweet ID as string.

TweetInterface.link

The URL of the tweet.

TweetInterface.source_id

URN to identify the tweep.

TweetInterface.source_link

Link to the Twitter account of the author.

TweetInterface.source_title

Some generated source title based on the author name.

TweetInterface.title

Optionally shortened representation of the tweet text.

TweetInterface.tweet_html

Assemble a presentable HTML representation of the tweet.

TweetInterface.tweet_text

Assemble a presentable text representation of the tweet.

TweetInterface.updated

Time the tweet was created.

class resyndicator.fetchers.twitter.TwitterStreamer(oauth_token, oauth_secret, session=<class 'resyndicator.models.DefaultSession'>, timeout=0, **kwargs)[source]

A Twitter streaming client that doesn’t work at the moment due to an error in the Birdy library. Please use the TwitterFetcher in the meantime.

Content

class resyndicator.fetchers.content.ContentFetcher(session=<class 'resyndicator.models.DefaultSession'>, past=None, **kwargs)[source]

Fetcher class for retrieval and extraction of content from websites.

This fetcher incrementally downloads and extracts (using Readability) the content from any pages associated with entries that don’t already have long-form content associated with them.
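The incremental selection described above can be sketched as a simple filter. The threshold and the dictionary field names are assumptions for illustration, not the library's actual values:

```python
MIN_CONTENT_LENGTH = 500  # assumption: below this counts as no long-form content

def needs_content(entries):
    """Keep only entries whose pages still need to be downloaded."""
    return [entry for entry in entries
            if len(entry.get("content") or "") < MIN_CONTENT_LENGTH]

entries = [
    {"link": "https://example.com/a", "content": None},
    {"link": "https://example.com/b", "content": "x" * 1000},
]
pending = needs_content(entries)  # only the first entry remains
```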

class Entry(**kwargs)

Default SQLAlchemy entry representation.

ContentFetcher.fetch()[source]

Run one full fetching cycle. This is the main entry point for the content fetching process.

static ContentFetcher.get_hostname(entry)[source]

Wrapper for extracting the hostname from entry links. (Another ContentFetcher might need to strip the www. prefix.)

ContentFetcher.persist()[source]

Commit the updates to the entries.

ContentFetcher.select()[source]

Return a list of entries with at most one entry per host, so as to maximize the time between requests to the same host.
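The per-host selection can be sketched with a dictionary keyed by hostname. This is a sketch of the idea under assumed entry dictionaries, not the library's actual implementation:

```python
from urllib.parse import urlparse

def one_per_host(entries):
    """Pick at most one entry per hostname to space out requests."""
    chosen = {}
    for entry in entries:
        host = urlparse(entry["link"]).hostname
        chosen.setdefault(host, entry)  # keep the first entry seen per host
    return list(chosen.values())

entries = [
    {"link": "https://example.com/1"},
    {"link": "https://example.com/2"},
    {"link": "https://example.org/3"},
]
batch = one_per_host(entries)  # one for example.com, one for example.org
```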

ContentFetcher.update()[source]

Replenish the in-memory list of entries to process.

Services

resyndicator.services.content(args)[source]

Main entry point to the content fetchers.

resyndicator.services.fetchers(args)[source]

Main entry point to the fetchers, which run periodically.

resyndicator.services.stream(args)[source]

Main entry point to the streams, which run continuously.

Console

resyndicator.console.run()[source]

Main entry point to the Resyndicator.

Utils

class resyndicator.utils.sitemapparser.Sitemap(xml)[source]

Parser class for sitemaps.

class resyndicator.utils.sitemapparser.SitemapIndex(xml)[source]

Parser class for sitemap indices.

resyndicator.utils.sitemapparser.dictify(element)[source]

Convert an etree element to a dictionary.
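A minimal version of such a conversion, using the standard library. The real dictify may differ, for instance in how it handles attributes or repeated tags:

```python
import xml.etree.ElementTree as etree

def dictify(element):
    """Recursively convert an etree element to a plain dictionary."""
    result = dict(element.attrib)
    for child in element:
        # Leaf elements without attributes collapse to their text.
        if len(child) or child.attrib:
            result[child.tag] = dictify(child)
        else:
            result[child.tag] = child.text or ""
    if not result and element.text:
        return element.text
    return result

root = etree.fromstring(
    "<url><loc>https://example.com/</loc><priority>0.5</priority></url>")
data = dictify(root)  # {'loc': 'https://example.com/', 'priority': '0.5'}
```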