‘Data Mining’ vs ‘Data Scraping’

Apoorve
3 min read · Mar 23, 2020

These two terms share similarities but are intrinsically different.

What is Data and Web Scraping?

Data and web scraping means extracting data from websites or online databases and saving it in a more convenient format, commonly a CSV file.
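As a rough illustration of the idea, here is a minimal sketch that pulls a table out of a piece of HTML and turns it into CSV text. The HTML snippet, the `TableScraper` class and the `scrape_to_csv` function are made up for this example; a real scraper would first download the page from a website, and would need to handle messier markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a website.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Phone A</td><td>199</td></tr>
  <tr><td>Phone B</td><td>349</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def scrape_to_csv(html):
    """Return the table found in `html` as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(scraper.rows)
    return out.getvalue()


print(scrape_to_csv(SAMPLE_HTML))
```

The same CSV text could be written to a file instead of printed, which is the "convenient format" most scraping jobs end with.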

To learn more about it, check my previous blog here:

https://apoorve73.wixsite.com/techtospread

What is Data Mining?

Data Mining refers to the advanced analysis of extensive datasets. For large datasets, it usually relies on Machine Learning.

It’s similar to gold mining: you dig through a large amount of raw material to find the small part that is valuable.

Data Mining is one step of the Knowledge Discovery from Data (KDD) process.

Let’s go through it step by step!

Data Cleansing

"Over 2.5 quintillion bytes of data are created every single day, and it’s only going to grow from there. By the end of 2020, it’s estimated that more than 1.7MB of data will be created every second for every person on earth."

Such a large amount of data definitely needs to be cleaned. This is achieved by a parser. Parsers are like watchmen: they decide whether a given subset of data is acceptable within the data specifications.
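To make the "watchman" idea concrete, here is a small sketch of a check that lets a record through only if it meets a specification. The specification itself (a non-empty name and a positive numeric price) and the function names `is_valid` and `clean` are assumptions made up for this example.

```python
# Hypothetical data specification: every record needs a non-empty name
# and a positive numeric price.
def is_valid(record):
    name = record.get("name")
    price = record.get("price")
    if not isinstance(name, str) or not name.strip():
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        return False
    return price > 0


def clean(records):
    """Keep only the records that pass the specification."""
    return [r for r in records if is_valid(r)]


raw = [
    {"name": "Phone A", "price": 199},
    {"name": "", "price": 120},           # rejected: empty name
    {"name": "Phone B", "price": -5},     # rejected: negative price
    {"name": "Phone C", "price": "n/a"},  # rejected: price is not a number
]
print(clean(raw))
```

Only the first record survives. Real cleaning often also repairs bad records instead of just dropping them, which this sketch does not attempt.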

Data Integration

It is simply picking data from different databases and bringing it together in a Data Warehouse.
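As a toy version of this step, the sketch below loads two made-up sources into one in-memory SQLite database that stands in for the warehouse. The source lists, table names and the `build_warehouse` function are all hypothetical; a real warehouse would be a dedicated system fed by proper loading jobs.

```python
import sqlite3

# Two hypothetical source systems with slightly different field names.
shop_db = [{"model": "Phone A", "price": 199}]
reviews_db = [
    {"product": "Phone A", "rating": 4.5},
    {"product": "Phone B", "rating": 3.9},
]


def build_warehouse(shop_rows, review_rows):
    """Load both sources into one in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (model TEXT, price REAL)")
    conn.execute("CREATE TABLE ratings (model TEXT, rating REAL)")
    conn.executemany("INSERT INTO prices VALUES (:model, :price)", shop_rows)
    conn.executemany("INSERT INTO ratings VALUES (:product, :rating)", review_rows)
    return conn


warehouse = build_warehouse(shop_db, reviews_db)

# Once the data lives in one place, a single query can combine it.
rows = warehouse.execute(
    "SELECT r.model, p.price, r.rating "
    "FROM ratings r LEFT JOIN prices p ON p.model = r.model "
    "ORDER BY r.model"
).fetchall()
print(rows)
```

Phone B has a rating but no price, so its price comes back as `None`; gaps like this are typical when sources are first brought together.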

Data Selection

This means selecting the data relevant to our use. For example, if you search for a mobile on Google, it will show you Android phones, Windows phones, keypad phones, etc. Voila! You get a mobile, not a computer!

Don’t confuse it with Data Mining
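Selection can be sketched as a plain filter over the records. The catalog and the `select` function below are invented for illustration; note that nothing is being learned here, which is exactly why this step is not Data Mining.

```python
# Hypothetical catalog mixing mobiles and computers.
catalog = [
    {"name": "Phone A", "category": "mobile", "type": "android"},
    {"name": "Laptop X", "category": "computer", "type": "windows"},
    {"name": "Phone B", "category": "mobile", "type": "keypad"},
]


def select(records, **criteria):
    """Return the records whose fields match every given criterion."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    ]


mobiles = select(catalog, category="mobile")
print([m["name"] for m in mobiles])
```

Searching for a mobile keeps Phone A and Phone B and leaves the laptop out.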

Data Transformation

Applying changes to the data as required, making it easier to understand and work with.
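A typical transformation is turning free-form text into clean, typed values. The sketch below assumes a made-up input format (a price like "$1,199.00" and RAM like "8 gb"); the `transform` function and the output field names are hypothetical.

```python
def transform(record):
    """Return a copy with a tidy name and numeric price and RAM values."""
    # "$1,199.00" -> 1199.0
    price = float(record["price"].replace("$", "").replace(",", ""))
    # "8 gb" -> 8
    ram_gb = int(record["ram"].upper().replace("GB", "").strip())
    return {
        "name": record["name"].strip().title(),
        "price_usd": price,
        "ram_gb": ram_gb,
    }


raw_record = {"name": "  phone a ", "price": "$1,199.00", "ram": "8 gb"}
print(transform(raw_record))
```

After this step the price and RAM can be compared and sorted as numbers, which the later steps depend on.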

Data Mining

Here we are!!

After all the Data Preparation is done, the next step is to find related patterns in the data by using Machine Learning algorithms.

For example, whether a mobile is a smartphone or a keypad phone can be determined from certain patterns or features of each. Further, whether it is an iPhone or an Android phone is also classified on the basis of its features.
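One simple way to classify from features is a nearest-neighbour vote, sketched below in plain Python (it needs Python 3.8+ for `math.dist`). This is only one of many possible algorithms, and the two features (screen size and number of physical keys), the training examples and the `classify` function are all made up for this illustration.

```python
import math

# Hypothetical training data:
# (screen size in inches, number of physical keys) -> label
training = [
    ((6.1, 3), "smartphone"),
    ((6.7, 3), "smartphone"),
    ((5.8, 4), "smartphone"),
    ((2.4, 20), "keypad"),
    ((1.8, 17), "keypad"),
    ((2.8, 22), "keypad"),
]


def classify(features, examples=training, k=3):
    """Label `features` by majority vote among its k nearest examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], features))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


print(classify((6.4, 3)))   # large screen, few keys
print(classify((2.0, 18)))  # small screen, many keys
```

The features here are not scaled, so the key count dominates the distance; with real data you would normally normalise the features first and use far more examples.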

Pattern Evaluation

When a user applies filters on the mobile’s company, price range, RAM, etc., these patterns are saved in a knowledge base. This saved data is later used to evaluate and present product advertisements to users based on their previous searches.
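A very rough sketch of this idea: count the filters a user applies, then score products by how well they match those saved filters. The `knowledge_base` counter, the function names and the brand names are all hypothetical, and real advertising systems are far more involved than this.

```python
from collections import Counter

# The "knowledge base": how often each (field, value) filter was applied.
knowledge_base = Counter()


def record_filter(field, value):
    """Save one filter the user applied."""
    knowledge_base[(field, value)] += 1


def score(product):
    """Sum the counts of all saved filters that this product matches."""
    return sum(
        count
        for (field, value), count in knowledge_base.items()
        if product.get(field) == value
    )


def recommend(products):
    """Pick the product that best matches the user's past filters."""
    return max(products, key=score)


# Hypothetical search history.
record_filter("company", "BrandX")
record_filter("ram_gb", 8)
record_filter("company", "BrandX")

products = [
    {"name": "Phone A", "company": "BrandX", "ram_gb": 4},
    {"name": "Phone B", "company": "BrandY", "ram_gb": 8},
]
print(recommend(products)["name"])
```

The user filtered by BrandX twice and by 8 GB of RAM once, so Phone A (score 2) is promoted ahead of Phone B (score 1).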

Knowledge

You got it!! Knowledge is the data that is presented to the end user.

Summary:

  • Data Scraping
  • Data Preparation
  1. Cleaning
  2. Integration
  3. Selection
  4. Transformation
  • Data Mining
  1. Data Patterns
  2. Pattern Evaluation
  3. Knowledge

For the readers

I have created a database of Real Estate professionals, companies and executives.

You are welcome to fork and add more to it!

Check it here:

https://github.com/Apoorve73/Real-Estate-Companies-Database

