Scrapy spider not following HTML links

The spider I wrote to scrape a page and follow its links is not working. It does not follow the links on the page and returns results from the first page only. The code is below for reference.

import scrapy


class LabnolSpider(scrapy.Spider):
    name = "labnolblog"
    start_urls = ['https://www.labnol.org/']

    def parse(self, response):
        # Each post card on the listing page is a div.feature__body.
        for blog in response.css('div.feature__body'):
            yield {
                'Title': blog.css('h2::text').extract(),
                'Date': blog.css('p::text').extract(),
            }

        # Classes on one element are chained with dots and no spaces. This assumes
        # the pagination link is an <a> carrying all three classes.
        # extract_first() returns a single href string, or None if nothing matches.
        next_page = response.css('a.btn.btn--primary.xtype--uppercase::attr(href)').extract_first()
        if next_page is not None:
            # response.follow resolves relative URLs against the current page.
            yield response.follow(next_page, callback=self.parse)

    def parse_blogs(self, response):
        # Not wired to any request yet; pass callback=self.parse_blogs to use it.
        def extract_with_css(query):
            # default='' avoids calling .strip() on None when nothing matches.
            return response.css(query).extract_first(default='').strip()

        yield {
            'Title': extract_with_css('h2.home__title::text'),
            'Date': extract_with_css('p.home__date::text'),
        }

1 Reply

@joshlynsathish I ran your code through a Python syntax checker and it reported no syntax issues. Looking into this further, the information you want may need to be retrieved a different way. The page data is loaded dynamically, which may be why your code isn't returning the page results you expect. This Stack Overflow post covers this in more detail and offers some alternative ways to achieve this goal.
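One way to test the dynamic-loading idea is to check whether the "next page" link appears in the raw HTML at all. Below is a minimal sketch using only the standard library. The names `ClassLinkFinder`, `find_links`, `SAMPLE_HTML`, and `NEXT_PAGE_CLASSES` are made up for illustration, and the sample markup is hypothetical, not copied from labnol.org. The class names come from the selector in your spider.

```python
from html.parser import HTMLParser


class ClassLinkFinder(HTMLParser):
    """Collect href values of <a> tags that carry every required class."""

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute is a space-separated list; it may be missing.
        classes = set((attr_map.get("class") or "").split())
        href = attr_map.get("href")
        # Keep the link only if it has an href and all required classes.
        if href and self.required <= classes:
            self.hrefs.append(href)


def find_links(html, required_classes):
    """Return hrefs of all <a> tags in html that have all required classes."""
    finder = ClassLinkFinder(required_classes)
    finder.feed(html)
    finder.close()
    return finder.hrefs


# Hypothetical markup for illustration only, not the real page.
SAMPLE_HTML = """
<div class="feature__body"><h2>Post</h2><p>Jan 1</p></div>
<a class="btn btn--primary xtype--uppercase" href="/page/2/">More</a>
<a class="btn" href="/about/">About</a>
"""

# Class names taken from the selector in the spider above.
NEXT_PAGE_CLASSES = ["btn", "btn--primary", "xtype--uppercase"]

print(find_links(SAMPLE_HTML, NEXT_PAGE_CLASSES))
```

To try this against the real page, you could pass `response.text` from inside `parse` instead of `SAMPLE_HTML`. If the list comes back empty, the link is probably not in the HTML that Scrapy receives, which would point to JavaScript adding it later. If it is not empty, the selector in the spider is the more likely culprit.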

Additional resources:

If you'd like to continue looking into web scraping using Python, you can review some of the information listed below.

Hope this helps!
