Yesterday, 01:32 AM   #1
SpicyPoison
Member
Posts: 21
Karma: 10
Join Date: Dec 2023
Device: Amazon Kindle Paperwhite
Reuters: Need help creating a recipe for Reuters using these RSS feeds


A few months ago the built-in recipe for Reuters stopped working because of the human verification on the Reuters site. More info in this thread.

Reuters doesn't offer RSS feeds for reading articles from its website, so I found some RSS feeds from third-party sources that seem to fetch articles from Reuters efficiently, but with very poor formatting:
1. Paragraphs seem to break after every hyperlink.
2. The inline image of the share button is displayed as a full-size image.
I read this file on my Kindle Paperwhite and the formatting was terrible for reading. Reuters - calibre.mobi

I don't know how to code in Python. I have all the required RSS feeds for Reuters. Could anyone who understands Python create a recipe from these RSS feeds that has good formatting?

Here are all the RSS feed links. Reuters RSS feeds.txt

Thank you in advance for your help.
Today, 02:58 AM   #2
unkn0wn
Evangelist
Posts: 465
Karma: 82692
Join Date: May 2021
Device: kindle
Share your recipe with all the feeds. I will make changes.
Today, 03:31 AM   #3
SpicyPoison
Quote:
Originally Posted by unkn0wn View Post
Share your recipe with all the feeds. I will make changes.
I didn't add any extra code. All I did was use the basic calibre interface to add the RSS feeds in the News Fetch section and name the result "Reuters".
The downloaded news periodical is attached to my previous message.
That's why the formatting was poor: I didn't add any extra Python code to make it download only the relevant text from the web page. As you can see in the .mobi file attached to my previous message, one image is displayed at full-page size when it should have been displayed inline with the text. It wouldn't even matter if it weren't displayed at all. I don't know Python well enough to make this work. That's why I posted all the RSS links here, so that someone who understands Python can create the recipe.

Since the last recipe was created by you, I believe you can do this better than anyone else.
Today, 04:23 AM   #4
unkn0wn
I still get HTTP Error 401: HTTP Forbidden.
Maybe it kinda worked for you that one time.
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2020, Kovid Goyal <kovid at kovidgoyal.net>

from calibre.web.feeds.news import BasicNewsRecipe


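def prefixed_classes(classes):
    # Build a tag specification that matches any tag having a class which
    # starts with one of the space-separated prefixes in `classes`.
    q = frozenset(classes.split(' '))

    def matcher(x):
        # x is the value of the tag's class attribute (None when absent).
        if x:
            for candidate in frozenset(x.split()):
                for prefix in q:
                    if candidate.startswith(prefix):
                        return True
        return False
    return {'attrs': {'class': matcher}}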


class Reuters(BasicNewsRecipe):
    title = 'Reuters'
    description = 'News from all over'
    __author__ = 'Kovid Goyal'
    language = 'en'


    keep_only_tags = [
        prefixed_classes('article-body__container__ article-header__container__'),
    ]
    remove_tags = [
        prefixed_classes(
            'context-widget__tabs___ article-header__toolbar__ read-next-mobile__container__ toolbar__container__ button__link__'
            ' ArticleBody-read-time-and-social Slideshow-expand-button- TwoColumnsLayout-footer- RegistrationPrompt__container___'
            ' SocialEmbed__inner___ trust-badge author-bio__social__ with-spinner__spinner__ author-bio__author-image__'
        ),
        dict(name=['button', 'link', 'svg']),
    ]
    remove_attributes = ['style', 'height', 'width']

    extra_css = '''
        img { max-width: 100%; }
        [class^="article-header__tags__"],
        [class^="author-bio__author-card__"],
        [class^="article-header__author-date__"] {
            font-size:small;
        }
        [data-testid="primary-gallery"], [data-testid="primary-image"] { font-size:small; text-align:center; }
    '''

    feeds = [
        ('World', 'https://rsshub.app/reuters/world'),
        ('Business', 'https://rsshub.app/reuters/business'),
        ('Finance', 'https://rsshub.app/reuters/business/finance'),
        ('Markets', 'https://rsshub.app/reuters/markets'),
        ('Technology', 'https://rsshub.app/reuters/technology'),
        ('Sports', 'https://rsshub.app/reuters/sports'),
        ('Science', 'https://rsshub.app/reuters/science'),
        ('Lifestyle', 'https://rsshub.app/reuters/lifestyle')
    ]

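    def preprocess_html(self, soup):
        # Rename <noscript> elements that contain images to <div>.
        for noscript in soup.findAll('noscript'):
            if noscript.findAll('img'):
                noscript.name = 'div'
        # Use the first URL listed in srcset as the image source.
        for img in soup.findAll('img', attrs={'srcset': True}):
            img['src'] = img['srcset'].split()[0]
        return soup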
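
To see what `prefixed_classes` in the recipe above matches, here is a small standalone sketch that runs without calibre. The function body follows the recipe's logic with the variables renamed for readability; the sample class strings, including their suffixes, are made up for illustration and are not taken from a real Reuters page.

```python
def prefixed_classes(classes):
    # Same logic as in the recipe: match any class starting with a given prefix.
    prefixes = frozenset(classes.split(' '))

    def matcher(class_attr):
        if class_attr:
            for candidate in frozenset(class_attr.split()):
                for prefix in prefixes:
                    if candidate.startswith(prefix):
                        return True
        return False
    return {'attrs': {'class': matcher}}


spec = prefixed_classes('article-body__container__ article-header__container__')
matches = spec['attrs']['class']

# Hypothetical class strings; the suffixes after the prefixes are invented.
print(matches('article-body__container__3ypuX extra'))  # True
print(matches('site-footer__links__1abcd'))             # False
print(matches(None))                                    # False
```

This is why the recipe lists class names that end in underscores: each one is a prefix, so a changed suffix on the site should not by itself break the match.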
Today, 08:34 AM   #5
SpicyPoison
Quote:
Originally Posted by unkn0wn View Post
I still get HTTP Error 401: HTTP Forbidden.
Maybe it kinda worked for you that one time.
Are you using the same Python code from the previous built-in Reuters recipe?
Or have you written completely new code for the latest requirements?

First try just downloading the articles from the RSS feeds. Then try to correct the formatting errors.
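
One way to follow that order is to check that each feed URL answers at all before touching the formatting. The sketch below uses only the Python standard library. The `check_feeds` helper and its `opener` parameter are hypothetical names, and it only reports the HTTP status for each feed rather than parsing anything. It does not show whether calibre's own downloader gets the same answer from these servers.

```python
import urllib.error
import urllib.request

# Two of the feeds from the recipe posted earlier in the thread.
FEEDS = [
    ('World', 'https://rsshub.app/reuters/world'),
    ('Business', 'https://rsshub.app/reuters/business'),
]


def check_feeds(feeds, opener=urllib.request.urlopen, timeout=30):
    """Return a list of (title, status) tuples, one per feed.

    status is the HTTP status code, or a short error string when the
    request failed before any HTTP response came back.
    """
    results = []
    for title, url in feeds:
        try:
            with opener(url, timeout=timeout) as response:
                results.append((title, response.status))
        except urllib.error.HTTPError as err:
            # The server answered, but with an error status such as 401.
            results.append((title, err.code))
        except urllib.error.URLError as err:
            # No HTTP response at all (DNS failure, refused connection, ...).
            results.append((title, 'error: {}'.format(err.reason)))
    return results


# Example (needs network access):
# for title, status in check_feeds(FEEDS):
#     print(title, status)
```

A feed that returns 200 here can still fail later when the article pages themselves are fetched, so this only narrows down where the error comes from.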

The most annoying error is the "share" button image, which appears after 2-3 lines and is displayed at full-page size.

Last edited by SpicyPoison; Today at 08:35 AM. Reason: Typo
Today, 08:37 AM   #6
SpicyPoison
How can I exclude a certain image from the article using Python?
How can I prevent paragraph breaks using Python?
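
For the first question, calibre recipes take BeautifulSoup-style tag specifications in `remove_tags`, and an attribute value in such a specification may be a function, as `prefixed_classes` does for `class` in the recipe posted earlier in this thread. A minimal sketch along those lines follows. The `is_share_icon` helper, the `ReutersFeeds` class name, and the idea that the unwanted image has "share" in its URL are all assumptions: the real `src` of the share-button image is not shown in this thread and would have to be looked up in the downloaded HTML. The `try`/`except` around the import is only there so the snippet can be loaded outside calibre. The second question depends on what markup the third-party feeds put around links, so no fix is sketched for it here.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Outside calibre: fall back to a plain base class so the file still loads.
    BasicNewsRecipe = object


def is_share_icon(src):
    # Assumption: the unwanted image can be recognised by 'share' in its URL.
    return bool(src) and 'share' in src.lower()


class ReutersFeeds(BasicNewsRecipe):
    title = 'Reuters'
    language = 'en'

    remove_tags = [
        # Drop every <img> whose src satisfies is_share_icon.
        dict(name='img', attrs={'src': is_share_icon}),
    ]

    feeds = [
        ('World', 'https://rsshub.app/reuters/world'),
    ]
```

If the image turns out to be easier to identify by a class or an `alt` text, the same pattern applies with a different attribute key in `attrs`.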
Today, 10:43 AM   #7
unkn0wn
https://github.com/kovidgoyal/calibr...023d84d11a8412
All times are GMT -4.