The following PHP script finds all the links on a web page, removes duplicates, and reports how many times each duplicated link appears.
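The script itself is not reproduced in this excerpt, so here is a minimal sketch of the approach using PHP's built-in DOM extension. The function name `count_links` and the sample markup are assumptions made for illustration, not the original author's code.

```php
<?php
// Hypothetical helper (not from the original post): returns an array that
// maps each distinct href found in $html to the number of times it occurs.
function count_links(string $html): array
{
    $dom = new DOMDocument();

    // Real-world pages are often invalid HTML; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $hrefs[] = $href;
        }
    }

    // array_count_values() collapses duplicates and counts occurrences.
    return array_count_values($hrefs);
}

// Sample markup, invented for this sketch.
$sample = '<p><a href="/about">About</a> <a href="/contact">Contact</a> '
        . '<a href="/about">About us</a></p>';

$counts = count_links($sample);

// The array keys are the de-duplicated links.
$unique = array_keys($counts);

// Keep only the links that appear more than once.
$duplicates = array_filter($counts, function ($n) {
    return $n > 1;
});

foreach ($counts as $href => $n) {
    echo $href, ' => ', $n, PHP_EOL;
}
```

To run this against a live page, the HTML string could come from `file_get_contents($url)`, assuming `allow_url_fopen` is enabled on the server; relative links would still need to be resolved against the page URL before counting.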
In this post I explain some techniques for scraping data from external websites.
Simple HTML DOM Parser is a PHP 5+ class for manipulating HTML elements. It works with both valid HTML and pages that do not pass W3C validation. You can find elements by ID, class, tag, and more, and you can add, delete, or alter DOM elements. The one thing to watch for is memory leaks, which you can avoid as explained later.
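As a rough sketch of what this looks like in practice, the snippet below parses a string, looks elements up by ID, class, and tag, alters one element, and then frees the parser's memory. It assumes `simple_html_dom.php` has been downloaded into the same directory as the script; the sample markup is invented, and the cleanup shown (`clear()` followed by `unset()`) is the approach commonly given in the library's documentation, which may differ from what the full post describes.

```php
<?php
// Assumption: simple_html_dom.php from the Simple HTML DOM project
// sits next to this script.
require_once __DIR__ . '/simple_html_dom.php';

// Sample markup, invented for this sketch.
$markup = '<div id="main"><p class="intro">Hello</p><a href="/docs">Docs</a></div>';
$html = str_get_html($markup);

// find() with an index returns a single element; without one, an array.
$main  = $html->find('#main', 0);   // by ID
$intro = $html->find('p.intro', 0); // by tag and class
$links = $html->find('a');          // by tag

echo $intro->plaintext, PHP_EOL;
foreach ($links as $link) {
    echo $link->href, PHP_EOL;
}

// Alter an element, then serialize the modified document to a string.
$intro->innertext = 'Hello, world';
$result = $html->save();

// Release the parser's internal references to avoid leaking memory,
// which matters most when parsing many pages in a loop.
$html->clear();
unset($html);
```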
This month I’ll complete one year of riding with Uber. The booking process and the response time have been amazing. This post, however, is about a different matter. As a data aficionado, I was curious about the various locations I had travelled to over the year and what I had spent each month. Heading over to the Uber API docs was a disappointment, as Uber does not provide any API for getting ride history data. My next plan was to scrape the data from Uber's pages using PHP or Python. Just as I was about to start that project, a quick Google search turned up a nice bookmarklet by @ummjackson that scrapes the trip data and exports it to a CSV file.
There are three ways to access a website's data. The first is through a browser, the second is through an API (if the site provides one), and the third is by parsing the web pages with code. The third, known as web scraping, is a technique for extracting information from websites using specially written programs.