Yesterday, 01:32 AM | #1 |
Member
Posts: 21
Karma: 10
Join Date: Dec 2023
Device: Amazon Kindle Paperwhite
|
Reuters: Need help creating a recipe for Reuters using these RSS feeds
A few months ago the built-in recipe for Reuters stopped working because of the human verification on the Reuters site. More info on this thread.
Reuters doesn't offer RSS feeds for reading articles from its website, so I found some RSS feeds from third-party sources that seem to fetch articles from Reuters efficiently, but with very poor formatting:
1. Paragraphs seem to break after every hyperlink.
2. The in-line image of the share button is displayed as a full-size image.
I read this file on my Kindle Paperwhite and the formatting made it terrible to read: Reuters - calibre.mobi
I don't know how to code in Python, but I have all the required RSS feeds for Reuters. Can anyone who understands Python create a recipe from these RSS feeds with good formatting? Here are all the RSS feed links: Reuters RSS feeds.txt
Thank you in advance for your help. |
Today, 02:58 AM | #2 |
Evangelist
Posts: 465
Karma: 82692
Join Date: May 2021
Device: kindle
|
Share your recipe with all the feeds. I will make changes.
|
Today, 03:31 AM | #3 |
Member
Posts: 21
Karma: 10
Join Date: Dec 2023
Device: Amazon Kindle Paperwhite
|
I didn't add any extra code. All I did was use the basic calibre interface to add the RSS feeds in the News Fetch section and name the result "Reuters".
The downloaded news periodical is attached to my previous message. The formatting is poor because I didn't add any Python code to make it download only the relevant text from the web page. As you can see in the .mobi file attached to my previous message, one image is displayed at full page size when it should be displayed in-line with the text. It wouldn't matter if it weren't displayed at all. I don't know Python well enough to make this work. That's why I posted all the RSS links here, so that someone who understands Python can create the recipe. Since the last recipe was created by you, I believe you can do this better than anyone else. |
Today, 04:23 AM | #4 |
Evangelist
Posts: 465
Karma: 82692
Join Date: May 2021
Device: kindle
|
I still get HTTP Error 401: HTTP Forbidden.
Maybe it kinda worked for you that one time. Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2020, Kovid Goyal <kovid at kovidgoyal.net>

from calibre.web.feeds.news import BasicNewsRecipe


def prefixed_classes(classes):
    # Build a tag spec that matches any element with a CSS class
    # starting with one of the space-separated prefixes.
    q = frozenset(classes.split(' '))

    def matcher(x):
        if x:
            for candidate in frozenset(x.split()):
                # Renamed from x so the loop no longer shadows the argument.
                for prefix in q:
                    if candidate.startswith(prefix):
                        return True
        return False
    return {'attrs': {'class': matcher}}


class Reuters(BasicNewsRecipe):
    title = 'Reuters'
    description = 'News from all over'
    __author__ = 'Kovid Goyal'
    language = 'en'

    # Keep only the article header and body containers.
    keep_only_tags = [
        prefixed_classes('article-body__container__ article-header__container__'),
    ]
    # Drop toolbars, social widgets, prompts and similar page furniture.
    remove_tags = [
        prefixed_classes(
            'context-widget__tabs___ article-header__toolbar__ read-next-mobile__container__ toolbar__container__ button__link__'
            ' ArticleBody-read-time-and-social Slideshow-expand-button- TwoColumnsLayout-footer- RegistrationPrompt__container___'
            ' SocialEmbed__inner___ trust-badge author-bio__social__ with-spinner__spinner__ author-bio__author-image__'
        ),
        dict(name=['button', 'link', 'svg']),
    ]
    remove_attributes = ['style', 'height', 'width']
    extra_css = '''
    img { max-width: 100%; }
    [class^="article-header__tags__"],
    [class^="author-bio__author-card__"],
    [class^="article-header__author-date__"] {
        font-size:small;
    }
    [data-testid="primary-gallery"], [data-testid="primary-image"] {
        font-size:small;
        text-align:center;
    }
    '''

    feeds = [
        ('World', 'https://rsshub.app/reuters/world'),
        ('Business', 'https://rsshub.app/reuters/business'),
        ('Finance', 'https://rsshub.app/reuters/business/finance'),
        ('Markets', 'https://rsshub.app/reuters/markets'),
        ('Technology', 'https://rsshub.app/reuters/technology'),
        ('Sports', 'https://rsshub.app/reuters/sports'),
        ('Science', 'https://rsshub.app/reuters/science'),
        ('Lifestyle', 'https://rsshub.app/reuters/lifestyle')
    ]

    def preprocess_html(self, soup):
        # Turn <noscript> wrappers that hold images into plain <div>s.
        for noscript in soup.findAll('noscript'):
            if noscript.findAll('img'):
                noscript.name = 'div'
        # Use the first URL listed in srcset as the image source.
        for img in soup.findAll('img', attrs={'srcset':True}):
            img['src'] = img['srcset'].split()[0]
        return soup
|
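To see what the prefixed_classes helper in the recipe above actually matches, it can be tried on its own, outside calibre. The snippet below is a minimal sketch that copies the helper (with clearer variable names) and calls the matcher directly. The class strings passed to it, including the "3xYz" suffix, are made-up examples and not taken from the real Reuters pages.

```python
def prefixed_classes(classes):
    # Same logic as the helper in the recipe, with clearer variable names.
    prefixes = frozenset(classes.split(' '))

    def matcher(class_attr):
        if class_attr:
            for candidate in frozenset(class_attr.split()):
                for prefix in prefixes:
                    if candidate.startswith(prefix):
                        return True
        return False
    return {'attrs': {'class': matcher}}


spec = prefixed_classes('article-body__container__ article-header__container__')
matches = spec['attrs']['class']

# Hypothetical class attribute values, used only for illustration.
print(matches('article-body__container__3xYz other'))  # True
print(matches('sidebar__container__'))                  # False
print(matches(None))                                    # False
```

The point of matching on prefixes is that the site appears to append generated suffixes to its class names, so an exact class match would break whenever the suffix changes.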
Today, 08:34 AM | #5 | |
Member
Posts: 21
Karma: 10
Join Date: Dec 2023
Device: Amazon Kindle Paperwhite
|
Quote:
Or have you written completely new code for the latest requirements? First try to download the articles from the RSS feeds only, then try to correct the formatting errors. The most annoying error is the "share" button image, which appears after 2-3 lines and fills the whole page.
Last edited by SpicyPoison; Today at 08:35 AM. Reason: Typo |
Today, 08:37 AM | #6 |
Member
Posts: 21
Karma: 10
Join Date: Dec 2023
Device: Amazon Kindle Paperwhite
|
How can I exclude a certain image from the article using Python?
How can I prevent paragraph breaks using Python? |
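One possible way to approach both questions is calibre's preprocess_regexps recipe attribute, a list of (compiled pattern, replacement callable) pairs that calibre runs over the downloaded HTML. The sketch below is only an illustration. It assumes the share icon's img tag contains the word "share" somewhere in its attributes, and that the stray breaks come from a paragraph being closed and reopened around each link. Neither assumption has been checked against the actual feed HTML, and apply_rules is a hypothetical helper used only to try the rules outside calibre.

```python
import re

# Assumption: the share icon's <img> tag mentions "share" in src, alt or class.
SHARE_IMG = re.compile(r'<img\b[^>]*share[^>]*>', re.IGNORECASE)
# Assumption: the paragraph is closed and reopened just before each link...
BREAK_BEFORE_LINK = re.compile(r'</p>\s*<p>\s*(?=<a\b)', re.IGNORECASE)
# ...and again just after it.
BREAK_AFTER_LINK = re.compile(r'(</a>)\s*</p>\s*<p>', re.IGNORECASE)

# Same shape as the preprocess_regexps attribute of a calibre recipe.
preprocess_regexps = [
    (SHARE_IMG, lambda m: ''),
    (BREAK_BEFORE_LINK, lambda m: ' '),
    (BREAK_AFTER_LINK, lambda m: m.group(1) + ' '),
]


def apply_rules(html, rules=preprocess_regexps):
    # Hypothetical helper: applies each rule in order, for testing only.
    for pattern, replace in rules:
        html = pattern.sub(replace, html)
    return html


# Made-up sample HTML that imitates the two reported problems.
sample = ('<p>Stocks rose</p><p><a href="x">Reuters</a></p>'
          '<p>on Monday.</p><img src="share-icon.png">')
cleaned = apply_rules(sample)
print(cleaned)
# <p>Stocks rose <a href="x">Reuters</a> on Monday.</p>
```

In a recipe, only the preprocess_regexps list would go in the class body. The patterns would need adjusting after looking at the HTML the feeds really return.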
Today, 10:43 AM | #7 |
Evangelist
Posts: 465
Karma: 82692
Join Date: May 2021
Device: kindle
|
|
Tags |
kindle, paperwhite, recipes, reuters, rss feeds |