Reporters

A reporter is, quite literally, a reporter: it is dispatched according to its assigned Schedule and discovers news content on the web. It is one of the core components of News and is responsible for fetching, parsing and pipelining web content.
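The fetch, parse and pipeline responsibilities can be pictured with a small toy. This sketch does not use the News library. `ToyReporter`, `ToyReadable` and the in-memory `PAGES` dict are hypothetical stand-ins. The method names mirror fetch(), parse(), make_news() and dispatch() from this document, but their signatures here are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical in-memory "web": url -> raw page content.
PAGES: Dict[str, str] = {
    "https://example.com/a": "Title A|Body of A",
    "https://example.com/b": "Title B|Body of B",
}


@dataclass
class ToyReadable:
    # Parsed content that has not been stored yet.
    url: str
    title: str
    body: str


@dataclass
class ToyReporter:
    urls: List[str]
    stored: List[dict] = field(default_factory=list)

    def fetch(self, url: str) -> str:
        # Stand-in for an HTTP request.
        return PAGES[url]

    def parse(self, url: str, content: str) -> ToyReadable:
        # Split the toy "title|body" format into fields.
        title, body = content.split("|", 1)
        return ToyReadable(url=url, title=title, body=body)

    def make_news(self, readable: ToyReadable) -> dict:
        # Stand-in for persisting a news item through a storage backend.
        news = {"url": readable.url, "title": readable.title}
        self.stored.append(news)
        return news

    def dispatch(self) -> List[dict]:
        # The pipeline: fetch -> parse -> make_news for every assigned url.
        return [self.make_news(self.parse(url, self.fetch(url))) for url in self.urls]


reporter = ToyReporter(urls=list(PAGES))
print(reporter.dispatch())
```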

Middlewares

Reporters can be enhanced with two types of middleware: dispatch_middleware and fetch_middleware. Middlewares are essentially function decorators for dispatch() and fetch(). Any function that satisfies one of the following protocols is a valid reporter middleware.

  • dispatch_middleware: A function that takes a Reporter and its dispatch() method and returns an enhanced dispatch().
  • fetch_middleware: A function that takes a Reporter and its fetch() method and returns an enhanced fetch().

Reporter middlewares come in especially handy when you are building news pipelines or callback chains.
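The following minimal, standalone sketch shows both protocols and how middlewares can be stacked with `functools.reduce`. `DummyReporter`, `retrying_fetch`, `logging_dispatch` and `apply_middlewares` are hypothetical names. How News itself attaches middlewares to a reporter is not covered here.

```python
import functools
from typing import Callable, List


class DummyReporter:
    # Hypothetical reporter whose first fetch() fails with a transient error.
    def __init__(self):
        self.calls = 0
        self.log: List[str] = []

    def fetch(self, url: str) -> str:
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("transient failure")
        return f"<html>{url}</html>"

    def dispatch(self) -> str:
        return self.fetch("https://example.com")


def retrying_fetch(reporter, fetch: Callable[[str], str]) -> Callable[[str], str]:
    # fetch_middleware: takes (reporter, fetch) and returns an enhanced fetch.
    @functools.wraps(fetch)
    def enhanced(url: str) -> str:
        for attempt in range(3):
            try:
                return fetch(url)
            except ConnectionError:
                reporter.log.append(f"attempt {attempt + 1} failed")
        raise ConnectionError(f"gave up on {url}")
    return enhanced


def logging_dispatch(reporter, dispatch: Callable[[], str]) -> Callable[[], str]:
    # dispatch_middleware: takes (reporter, dispatch) and returns an enhanced dispatch.
    @functools.wraps(dispatch)
    def enhanced() -> str:
        reporter.log.append("dispatch start")
        result = dispatch()
        reporter.log.append("dispatch end")
        return result
    return enhanced


def apply_middlewares(reporter, name: str, middlewares) -> None:
    # Wrap the bound method with each middleware in turn (the last one ends up
    # outermost) and store it on the instance, where it shadows the class method.
    enhanced = functools.reduce(
        lambda method, mw: mw(reporter, method), middlewares, getattr(reporter, name)
    )
    setattr(reporter, name, enhanced)


reporter = DummyReporter()
apply_middlewares(reporter, "fetch", [retrying_fetch])
apply_middlewares(reporter, "dispatch", [logging_dispatch])
result = reporter.dispatch()
print(result)        # <html>https://example.com</html>
print(reporter.log)  # ['dispatch start', 'attempt 1 failed', 'dispatch end']
```

Each middleware returns a callable with the same shape as the one it received. Passing several middlewares in the list therefore simply nests them, which is what makes callback chains possible.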

Generic reporters

Generic reporters are base reporters that subclass Reporter. They provide a generic implementation of dispatch() and, where necessary, additional mechanisms for discovering news. News currently provides two generic reporters: TraversingReporter and FeedReporter. TraversingReporter implements a generic mechanism for traversing a tree of news content, while FeedReporter provides a generic mechanism for discovering news content from a static feed URL.
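As a rough picture of the two discovery strategies, this sketch performs a breadth-first traversal over a tree of linked pages, which is the TraversingReporter idea. It also extracts item links from a static RSS feed, which is the FeedReporter idea. It uses only the standard library. The in-memory `SITE` graph, the `max_depth` limit and the function names are hypothetical, and this is not the News implementation.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Dict, List

# Hypothetical site: url -> urls linked from that page.
SITE: Dict[str, List[str]] = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/b/1"],
    "/a/1": [],
    "/b/1": ["/b/1/x"],
    "/b/1/x": [],
}


def traverse(root: str, get_urls: Callable[[str], List[str]], max_depth: int) -> List[str]:
    # Breadth-first walk; `seen` prevents revisiting pages that link back.
    seen = {root}
    order = []
    queue = deque([(root, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # do not expand children beyond the depth limit
        for child in get_urls(url):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return order


def feed_links(rss_xml: str) -> List[str]:
    # Collect the <link> text of every <item> in an RSS 2.0 document.
    root = ET.fromstring(rss_xml)
    return [item.findtext("link") for item in root.iter("item")]


RSS = """<rss version="2.0"><channel><title>t</title>
<item><title>One</title><link>https://example.com/1</link></item>
<item><title>Two</title><link>https://example.com/2</link></item>
</channel></rss>"""

print(traverse("/", lambda u: SITE.get(u, []), max_depth=2))
# ['/', '/a', '/b', '/a/1', '/b/1']
print(feed_links(RSS))
# ['https://example.com/1', 'https://example.com/2']
```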

Extending reporters

You can easily build your own reporter by extending the generic reporters in news.reporters.generics, or build one from scratch on top of news.reporter.abstract. Any Reporter subclass that satisfies the dispatch(), parse() and make_news() protocols is a valid reporter.
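One way to express the "dispatch(), parse() and make_news()" contract is as a structural type. This sketch uses `typing.Protocol` with `runtime_checkable`. The `ReporterProtocol` name and the method signatures are assumptions for illustration, since the exact parameters News expects are not spelled out here. Also, `isinstance()` against a runtime-checkable protocol only checks that the methods exist, not their signatures.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ReporterProtocol(Protocol):
    # Hypothetical signatures; only method presence is checked at runtime.
    def dispatch(self) -> Any: ...

    def parse(self, content: Any) -> Any: ...

    def make_news(self, readable: Any) -> Any: ...


class CompleteReporter:
    def dispatch(self):
        return []

    def parse(self, content):
        return {"title": content}

    def make_news(self, readable):
        return readable


class IncompleteReporter:
    def dispatch(self):
        return []

    def parse(self, content):
        return content
    # make_news() is missing, so this class does not satisfy the protocol.


print(isinstance(CompleteReporter(), ReporterProtocol))    # True
print(isinstance(IncompleteReporter(), ReporterProtocol))  # False
```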

Example

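from bs4 import BeautifulSoup
from news.models.abstract import Readable
from news.reporters.generics import TraversingReporter


class ReditThreadReporter(TraversingReporter):
    def __init__(self, thread=None, *args, **kwargs):
        # Delegate to TraversingReporter; calling self.__init__ here would recurse forever.
        super().__init__(*args, **kwargs)
        # Only follow links that mention this thread ('all' by default).
        self.thread = thread or 'all'

    def parse(self, content):
        # Build a Readable from the fetched content (fields elided).
        return Readable(title=...)

    def make_news(self, readable):
        # Create a News instance under the parent news.
        return self.backend.News.create_instance(
            parent=self.parent.fetched_news,
            # ... (other arguments elided)
            **readable.kwargs()
        )

    def get_urls(self, news):
        # Assumption: the fetched page body is available as news.content.
        soup = BeautifulSoup(news.content, 'html.parser')
        # Yield the href of every <a> tag that points into the thread.
        return (a['href'] for a in soup.find_all('a', href=True)
                if self.thread in a['href'])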
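The link filtering done in get_urls() can be tried without BeautifulSoup. This sketch swaps in the standard library's `html.parser.HTMLParser` to collect `href` values that contain the thread name. `ThreadLinkCollector` and `thread_urls` are hypothetical names, and the sample HTML is made up.

```python
from html.parser import HTMLParser
from typing import List


class ThreadLinkCollector(HTMLParser):
    # Collects href attributes of <a> tags whose URL contains `thread`.
    def __init__(self, thread: str):
        super().__init__()
        self.thread = thread
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")  # attrs is a list of (name, value) pairs
        if href and self.thread in href:
            self.urls.append(href)


def thread_urls(page: str, thread: str = "all") -> List[str]:
    collector = ThreadLinkCollector(thread)
    collector.feed(page)
    collector.close()
    return collector.urls


sample_html = """
<a href="/r/python/comments/1">one</a>
<a href="/r/golang/comments/2">two</a>
<a name="anchor-without-href">three</a>
<a href="/r/python/comments/3">four</a>
"""
print(thread_urls(sample_html, thread="python"))
# ['/r/python/comments/1', '/r/python/comments/3']
```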

Mapping from schedules to reporters

To use your own customized or extended reporters, you need a way to map each schedule to a specific reporter. Mapping does exactly that.
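Conceptually, such a mapping is a lookup from a schedule to a reporter class plus the keyword arguments built from that schedule. This sketch models that with a plain dictionary. `ToySchedule`, its `kind` field, `ToyReporter`, `REGISTRY` and `build_reporter` are hypothetical and say nothing about how `news.mapping.Mapping` is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple, Type


@dataclass
class ToySchedule:
    # Hypothetical schedule: a kind key plus free-form options.
    kind: str
    options: Dict[str, Any] = field(default_factory=dict)


class ToyReporter:
    def __init__(self, thread: str = "all"):
        self.thread = thread


# schedule kind -> (reporter class, factory for its keyword arguments)
REGISTRY: Dict[str, Tuple[Type, Callable[[ToySchedule], Dict[str, Any]]]] = {
    "redit": (ToyReporter, lambda schedule: {"thread": schedule.options["thread"]}),
}


def build_reporter(schedule: ToySchedule):
    # Look up the reporter class for this schedule and instantiate it.
    try:
        reporter_class, make_kwargs = REGISTRY[schedule.kind]
    except KeyError:
        raise LookupError(f"no reporter mapped for {schedule.kind!r}") from None
    return reporter_class(**make_kwargs(schedule))


reporter = build_reporter(ToySchedule(kind="redit", options={"thread": "python"}))
print(type(reporter).__name__, reporter.thread)  # ToyReporter python
```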

Example

from news.mapping import Mapping
from .reporters import ReditThreadReporter
from .scheduler import scheduler

mapping = Mapping({
    'redit': lambda schedule: {
        'thread': schedule.options['thread']
    }
})

scheduler.configure(mapping=mapping)
scheduler.start()