Data Scraping Is a Key Retail Technology in China, but It Sits in a Legal Grey Area

Post by Cecilia Wu
Published November 21, 2019

The headline “One coding line put 200 programmers from a big data company under arrest…” sent shock waves through the tech world. The culprit in this scandal is “data scraping”, a technique that typically uses bots to automatically harvest large amounts of data from pages on the internet.
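To make the idea concrete, here is a minimal sketch of the extraction step, using only Python's standard library. The HTML fragment, the `review` class name and the `extract_reviews` helper are assumptions made up for illustration; a real scraper would first download pages over HTTP, and real sites structure their markup differently.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="review">Fast delivery, good price.</li>
  <li class="ad">Sponsored link</li>
  <li class="review">Packaging was damaged.</li>
</ul>
"""


class ReviewExtractor(HTMLParser):
    """Collects the text of every <li class="review"> element."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._in_review = False

    def handle_starttag(self, tag, attrs):
        # Start a new review only for list items marked with class="review".
        if tag == "li" and dict(attrs).get("class") == "review":
            self._in_review = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_review = False

    def handle_data(self, data):
        # Text outside a review item (ads, whitespace) is ignored.
        if self._in_review:
            self.reviews[-1] += data


def extract_reviews(html):
    """Return the review texts found in an HTML string (hypothetical helper)."""
    parser = ReviewExtractor()
    parser.feed(html)
    parser.close()
    return [review.strip() for review in parser.reviews]


if __name__ == "__main__":
    for review in extract_reviews(SAMPLE_HTML):
        print(review)
```

The same pattern, repeated by a bot over thousands of pages, is what turns a public website into a structured dataset.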

The extracted data is often used for further analysis, with diverse use cases such as:

  • Search engine optimization
  • E-commerce price monitoring
  • Social media listening
  • Content aggregation
  • Enriching an existing database

To this day, although data scraping is innocent in itself, many believe it operates in a grey area and can easily be put to malicious use, especially given its ability to extract highly personal and private information. In addition, determining the legality of web scraping is not easy in a digitized era.

Back to our headline story: it all started when an ordinary programmer at a startup was asked by his supervisor to crawl massive amounts of online data from the website of a large local company. It went smoothly at first, and the programmer kept optimizing the technique until it became so efficient that the heavy load of illicit data retrieval eventually crashed the large company's server. This small incident lit the fuse: the angered company called in the police. The investigation quickly traced the source to a startup selling CVs to anyone who would pay the right price. But the startup was neither a recruitment platform nor a headhunting agency; its database of over 160 million CVs is said to have been partially acquired by scraping other major human resources websites. The police raided the startup one day and handcuffed about 200 programmers, who were stunned to learn that data scraping could put them behind bars.
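The server crash in this story comes down to request volume. As a rough sketch of how a crawler can limit its own load, the Python snippet below reads a site's robots.txt rules and spaces requests out in time. The robots.txt lines, the `demo-bot` user agent, and the `build_rules` and `Throttle` names are all assumptions for illustration. Throttling addresses server load only, not the question of whether the data may be collected at all.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; a real crawler would fetch them from the site.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /cv/",
    "Crawl-delay: 2",
]


def build_rules(lines):
    """Parse robots.txt lines into a RobotFileParser (hypothetical helper)."""
    rules = RobotFileParser()
    rules.parse(lines)
    return rules


class Throttle:
    """Enforces a minimum number of seconds between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    rules = build_rules(ROBOTS_LINES)
    # Fall back to one second if the site states no crawl delay (an assumption).
    throttle = Throttle(rules.crawl_delay("demo-bot") or 1)
    for path in ["/products/1", "/cv/123", "/products/2"]:
        if not rules.can_fetch("demo-bot", path):
            print("skipping disallowed path:", path)
            continue
        throttle.wait()
        print("would fetch:", path)  # the actual HTTP request is omitted here
```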

A chain of events followed, with regulators determined to crack down on illegally scraping and obtaining private data online. It is said that several big data startups, especially those related to Fintech, were soon drawn into scrutiny and questioning as well. Major banks reported to the police that Fintech startups of this type often collect their important user information, again via scraping and without the banks' permission.

Realistically, a big data company has to have some scraping capability in one form or another. Without scraping, it is like going to a formal dinner shirtless. But with stringent regulation on the horizon, a few startups have even decided to wash their hands of it.

One Chinese startup, which specializes in applying NLP analysis to unstructured text data to generate consumer insights or customer feedback, might be an extreme example. All of its analysis is essentially fed by the lifeblood of scraped open, public user-generated content, in particular comments from Alibaba e-commerce sites, the social media site Little Red Book, the online travel site Ctrip, etc. In the past it built a solid internal crawling/scraping division to pull data from the internet every day; now, however, it asks its clients to bring their own data instead of scraping on their behalf. If clients insist on scraping, it says it has to consult lawyers first to avoid any negative consequences. As harmless as this open data might seem on the surface, the startup said it is too small to embroil itself in legal trouble. Not to mention that Alibaba is well known to have set up anti-scraping mechanisms to prevent outsiders from abusively harvesting its open data.

Having said this, not every startup would chicken out. One local startup, which offers e-commerce intelligence monitoring for pricing, competitor stores, sales value, sales volume, reviews, etc., still feels comfortable scraping data from leading e-commerce channels like Alibaba, JD or Pinduoduo. It can update its intelligence data on an almost hourly basis. According to the startup, the huge daily traffic on these e-commerce platforms perfectly disguises its data scraping as normal website visits.

Today data scraping in China is a well-paid job, with a monthly salary between RMB 10K and 60K depending on your skills. You can always find underground shops to scrape raw data for you. But big data startups selling analysis probably need to tread more cautiously from now on.
