Web Scraper - Python

Date Posted: Jun 11, 2022
Full Time (40hrs/week)
More than 1 year
Pune, Maharashtra, India

Job Description

Job Description: Web Scraper - Python

 

Hours: Monday – Friday (some hours outside of this as required)

Work hours: EST shift (6:00 pm to 3:00 am). It will be a fixed shift.

Location: Pune, IT Cerebrum Park, Kalyani Nagar

Length: Permanent position, with three-month probation period

Salary:

 

About the Company

 

Valasys Media is a global integrated marketing and sales process outsourcing company that specializes in helping companies build a sales pipeline of qualified opportunities and shorten the sales cycle for their product and service portfolios. As part of our capability, we also help create market visibility, build awareness, and establish business relationships in new markets.

 

Job brief

We are looking for a Web Scraper (Python) to help us expand and optimize our data and our data flow. The ideal candidate will be responsible for extracting and ingesting data from websites/URLs using web crawling and scraping tools. In this role you will own the creation of these tools, services, and workflows to improve the crawling and scraping of data and the management of our databases.

To do this job successfully, you need exceptional programming and web skills. Knowledge of data science and software engineering will be an added advantage. Your ultimate goal will be to maintain the data flow by scraping, crawling, and cleaning data as required.

 

Key skills: Web Scraping, Web Crawling, Web and Windows Automation, Python/R, Selenium, NLP, Data Extraction, SQL/NoSQL, OpenCV, AutoIt, PyAutoGUI

 

Requirements and skills

  • Proven experience as a Web Scraper/Crawler or in a similar role
  • Strong understanding and working knowledge of web crawlers, web scrapers, and other automation tools for browsing web content
  • Knowledge of web scraping techniques and tools
  • Strong knowledge of one or more of the available open-source or proprietary scraping frameworks
  • Hands-on experience with SQL/NoSQL databases (MySQL, PostgreSQL, Cassandra, MongoDB)
  • Good knowledge of and coding experience in one or more programming languages such as Python, Java, or JavaScript
  • Experience creating Scrapy spiders for websites with CAPTCHAs, IP bans, geolocation bans, or Cloudflare / Distil / Imperva firewalls; sites that require login to access data; and dynamic websites that load through JS / REST API / GraphQL, etc.
  • Knowledge of object-oriented programming
  • Experience with applications designed to display archived web content
  • Experience with AWS cloud services (EC2)
  • Python tech stack (Python libraries: Scrapy, requests, urllib, Beautiful Soup, Splash, Selenium, pandas)
  • 2-4 years of experience, with a Bachelor's degree in Computer Science, Engineering, Technology, or a related field required

Responsibilities

 

  • Write programs that apply your skill set to fetch data from multiple online sources and cleanse it
  • Develop application frameworks for automating and maintaining a constant flow of data from multiple sources
  • Design and build web crawlers to scrape data and URLs using Python modules [Scrapy, Selenium, requests, Beautiful Soup, Splash, etc.]
  • Create crawlers for all types of websites, irrespective of the technical roadblocks
  • Manage the crawlers to overcome technical challenges like IP bans, geolocation bans, CAPTCHAs, and bot-blocking services
  • Design Scrapy pipelines to connect the crawler output to a MySQL database
  • Integrate the data crawled and scraped into our databases
  • Build and maintain high-quality, reusable code
  • Automate manual processes, optimize data delivery, and redesign infrastructure for greater scalability


Contact Us
Addresses
US Office
100 Franklin Sq. Drive, Ste 207 Somerset,
NJ - 08873, USA
India Office
707, Siddhartha Building, 96, Nehru Place, New Delhi – 110019, India