Reporters¶
A Reporter is, quite literally, a reporter: it is dispatched to discover news on a web content list according to its assigned Schedule. It is one of the core components of News and is responsible for fetching, parsing and pipelining web content.
Middlewares¶
Reporters can be enhanced by two types of middleware: dispatch_middleware and
fetch_middleware. Middlewares are essentially function decorators for
dispatch() and
fetch(). Any function that satisfies one of the
following protocols is a valid reporter middleware.
dispatch_middleware: A function that takes a Reporter and its dispatch() method and returns an enhanced dispatch().
fetch_middleware: A function that takes a Reporter and its fetch() method and returns an enhanced fetch().
Reporter middlewares come in especially handy when you are building news pipelines or callback chains.
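The middleware protocol can be sketched as follows. This is a minimal illustration only: StubReporter is a hypothetical stand-in rather than the library's Reporter, its dispatch() is assumed to be a plain synchronous method, and the way the middleware is attached here is an assumption, not the registration mechanism News itself uses.

```python
import functools


class StubReporter:
    """Hypothetical stand-in for a Reporter; not the library's class."""

    def __init__(self, url):
        self.url = url

    def dispatch(self):
        # Pretend to discover one piece of news at the reporter's URL.
        return ['news from ' + self.url]


def counting_dispatch_middleware(reporter, dispatch):
    """Take a reporter and its dispatch(), return an enhanced dispatch()."""
    @functools.wraps(dispatch)
    def enhanced(*args, **kwargs):
        result = dispatch(*args, **kwargs)
        # Record how many news items each dispatch produced.
        reporter.log = getattr(reporter, 'log', []) + [len(result)]
        return result
    return enhanced


reporter = StubReporter('http://example.com')
# Attaching by hand is an assumption made for this sketch.
reporter.dispatch = counting_dispatch_middleware(reporter, reporter.dispatch)
news = reporter.dispatch()
```

A fetch_middleware would follow the same shape, wrapping fetch() instead of dispatch().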
Generic reporters¶
Generic reporters are base reporters which are subclasses of
Reporter. They provide a generic implementation
of dispatch() and, where necessary, additional
mechanisms for news discovery. News currently provides two types
of generic reporters: TraversingReporter and
FeedReporter.
TraversingReporter implements a generic
mechanism for traversing a tree of news contents.
FeedReporter, on the other hand, provides
a generic mechanism for discovering news contents from a static feed URL.
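To give a rough feel for what traversing a tree of news contents involves, here is a minimal breadth-first sketch. It is not the implementation used by TraversingReporter: the LINKS table, the traverse() function and its max_depth parameter are all hypothetical, and URL discovery is reduced to a lookup in an in-memory dictionary.

```python
from collections import deque

# Hypothetical site: each URL maps to the URLs it links to.
LINKS = {
    'root': ['a', 'b'],
    'a': ['c'],
    'b': ['c', 'root'],
    'c': [],
}


def traverse(root, get_urls, max_depth=2):
    """Breadth-first walk from root, visiting each URL at most once."""
    visited = [root]
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            # Do not expand pages at the depth limit.
            continue
        for child in get_urls(url):
            if child not in seen:
                seen.add(child)
                visited.append(child)
                queue.append((child, depth + 1))
    return visited


order = traverse('root', lambda url: LINKS.get(url, []))
```

The get_urls callback plays the role of deciding which links are worth following from a given page; a feed-based reporter needs no such step because its list of entries comes from a single static URL.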
Extending reporters¶
You can easily extend generic reporters from news.reporters.generics, or
start from news.reporters.abstract to build your own reporter from
scratch. Any Reporter subclass that satisfies
the dispatch(),
parse() and
make_news() protocols is a valid
reporter.
Example
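from bs4 import BeautifulSoup
from news.models.abstract import Readable
from news.reporters.generics import TraversingReporter


class ReditThreadReporter(TraversingReporter):
    def __init__(self, thread=None, *args, **kwargs):
        # Let the base reporter initialize itself first.
        super().__init__(*args, **kwargs)
        self.thread = thread or 'all'

    def parse(self, content):
        # Build a readable from the fetched content (fields elided).
        return Readable(title=...)

    def make_news(self, readable):
        return self.backend.News.create_instance(
            parent=self.parent.fetched_news,
            # ... other fields elided
            **readable.kwargs()
        )

    def get_urls(self, news):
        # Assumes the fetched body is available as `news.content`.
        soup = BeautifulSoup(news.content, 'html.parser')
        # Follow only anchors whose href mentions the target thread.
        return (a['href'] for a in soup.find_all('a', href=True)
                if self.thread in a['href'])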
Mapping from schedules to reporters¶
To use your own customized or extended reporters, you need a mechanism that
maps a schedule to a specific reporter. Mapping
does exactly that.
Example
from news.mapping import Mapping
from .reporters import ReditThreadReporter
from .scheduler import scheduler

# Map the 'redit' key to a function that derives reporter options from a schedule.
mapping = Mapping({
    'redit': lambda schedule: {
        'thread': schedule.options['thread']
    }
})
scheduler.configure(mapping=mapping)
scheduler.start()
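The idea behind such a mapping can be illustrated without the library. The sketch below is a hypothetical stand-in, not the API of news.mapping.Mapping: the SketchMapping class, its kwargs_for() method and the schedule.type attribute are assumptions made for illustration, and the schedule is a simple namespace object rather than a real Schedule.

```python
from types import SimpleNamespace


class SketchMapping:
    """Hypothetical stand-in; not the API of news.mapping.Mapping."""

    def __init__(self, table):
        self.table = dict(table)

    def kwargs_for(self, schedule):
        # Pick the factory registered for this schedule's type
        # (the `type` attribute is an assumption of this sketch).
        try:
            factory = self.table[schedule.type]
        except KeyError:
            raise KeyError('no reporter mapped for %r' % schedule.type)
        # Derive the reporter's keyword arguments from the schedule.
        return factory(schedule)


mapping = SketchMapping({
    'redit': lambda schedule: {'thread': schedule.options['thread']},
})
schedule = SimpleNamespace(type='redit', options={'thread': 'python'})
reporter_kwargs = mapping.kwargs_for(schedule)
```

Keeping the per-schedule logic in small functions like this lets each schedule carry its own options while the scheduler stays unaware of any particular reporter.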