Posts

Showing posts from 2016

Two Weeks with the Aquaris M10 FHD Ubuntu Edition Tablet (An Honest Review)

It has been about two weeks since I started using the Aquaris M10 FHD Ubuntu Edition tablet. First of all, I have to say that the tablet is not complete yet. It is not for people who expect an experience similar to an Android tablet or an iPad.
I started writing this blog entry on the M10 tablet, but it was not easy, since the operating system has a long list of bugs, including broken external keyboard support for layouts other than English. As far as I can see from the related forums and blogs, people are having problems even with German, Spanish and Portuguese layouts. I also had problems with my wireless Turkish keyboard. Today an Ubuntu update arrived on the device. Now the layout works and I can type the @ symbol, at least from time to time (before the update I could not type it at all with my external Turkish keyboard and the Turkish layout).
I am a Linux fan. I've been using Linux for more than 14 years now. Since Canonical first started announcing the convergent OS concept I've been long waiting…

"A Simple Recursive Web Page Crawler to Scrape Data" written in Python

I am very new to the Python language, and I bet there are better, professionally written tools or modules. But I find this code I wrote myself useful, so I wanted to share it online.
Let's say that we aim to scrape a specific kind of data (e-mail addresses are a good data type to try this code on) from an HTML-based web site. This code follows every link it finds and visits the linked pages recursively to find the data you're seeking. If the web site is laid out consistently, you can inspect the page source and identify a few HTML tags that appear just before the data string you need.
Of course, this code will not work properly unless you make the adjustments your case needs. I just wanted to give a general idea of the logic behind recursively crawling a web site and scraping data from it.
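
To make the recursive idea concrete, here is a minimal sketch of that logic using only the standard library. It is not the code from the post: the names `scrape`, `LinkParser`, `EMAIL_RE` and the `PAGES` dictionary are made up for illustration, the e-mail pattern is deliberately simple, and the `fetch` argument stands in for whatever actually downloads a page. Here it is a lookup into a small fake site, so the example runs without any network access.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Simplified e-mail pattern (an assumption; real addresses can be more exotic).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def scrape(url, fetch, visited=None, found=None, max_depth=3):
    """Visit url, collect e-mails, then recurse into same-host links."""
    visited = set() if visited is None else visited
    found = set() if found is None else found
    if url in visited or max_depth < 0:
        return found  # already seen, or too deep: stop recursing
    visited.add(url)

    html = fetch(url)  # fetch returns the page text, or None on failure
    if html is None:
        return found
    found.update(EMAIL_RE.findall(html))

    parser = LinkParser()
    parser.feed(html)
    for href in parser.links:
        link = urljoin(url, href).split("#")[0]  # absolute URL, no fragment
        if urlparse(link).netloc == urlparse(url).netloc:  # stay on this site
            scrape(link, fetch, visited, found, max_depth - 1)
    return found


# Hypothetical two-page site kept in memory instead of on the network.
PAGES = {
    "http://example.test/": '<a href="/about">About</a> <a href="http://other.test/">Other</a>',
    "http://example.test/about": 'Contact: info@example.test <a href="/">Home</a>',
}

emails = scrape("http://example.test/", PAGES.get)
print(sorted(emails))
```

The `visited` set is what keeps the recursion from looping forever when pages link back to each other, and `max_depth` puts a ceiling on how far it wanders. To crawl a real site you would pass a `fetch` function that downloads the page instead of `PAGES.get`.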

In the crawl() function I pause for a random amount of time between consecutive requests so as not to put much load on the server. It is a good idea to give that kind of br…
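
A random pause of this kind could be sketched as follows. This is only an illustration, not the post's actual crawl() code: the helper name `polite_pause` and the default bounds of 1 to 3 seconds are assumptions, and the `sleep` parameter exists only so the pause can be swapped out when testing.

```python
import random
import time


def polite_pause(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Example: a very short pause, so the demo finishes almost immediately.
waited = polite_pause(0.0, 0.01)
print(f"waited {waited:.4f} seconds")
```

Calling such a helper just before each request spreads the requests out unevenly, which is gentler on the server than firing them back to back.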