HTML Agility Pack

  • January 8, 2014

The engineers on vteam #261 were tasked with automating content extraction. Previously, our client relied on a laborious manual process: visiting numerous sites and copying and pasting content by hand.

The team evaluated various open source libraries and shortlisted two main contenders:

  • Abot C# Web Crawler
  • Html Agility Pack

After a technical evaluation, they chose the HTML Agility Pack for the following reasons:

  • Supports LINQ to Objects (via a LINQ to XML-like interface).
  • Page fixing or generation. You can fix a page the way you want, modify the DOM, add nodes, copy nodes, etc.
  • Web scanners. You can easily get to img/src or a/href values with a handful of XPath queries.
  • Web scrapers. You can easily scrape any existing web page into an RSS feed with just an XSLT file serving as the binding.
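The web scanner use case in the list above (collecting img/src and a/href values) can be illustrated with a small sketch. HTML Agility Pack is a .NET library and this is not the team's code: the example below is a stand-in that uses Python's standard `html.parser` module instead of XPath queries, and the class name, function name and sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class LinkAndImageCollector(HTMLParser):
    """Collects a/href and img/src values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.hrefs.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.srcs.append(attributes["src"])


def collect_links_and_images(html):
    """Return (hrefs, srcs) found in the given HTML text."""
    collector = LinkAndImageCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs, collector.srcs


# Hypothetical sample page, deliberately a little malformed (unclosed <p>).
SAMPLE_HTML = """
<html><body>
  <a href="/about">About</a>
  <img src="/logo.png" alt="logo">
  <a name="anchor-only">no href</a>
  <p>Unclosed paragraph
  <a href="https://example.com/post">Post</a>
</body></html>
"""

if __name__ == "__main__":
    hrefs, srcs = collect_links_and_images(SAMPLE_HTML)
    print(hrefs)
    print(srcs)
```

Like HTML Agility Pack, the parser used here tolerates imperfect markup, which matters because real-world pages are rarely well-formed XML.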

The HTML Agility Pack library was integrated into a Windows Forms application developed with VB.NET, Visual Studio 2010 and .NET Framework 3.5. The application sent a web request to the desired URL and received the raw HTML in response. LINQ queries then parsed the raw HTML to extract the required information, which was saved to a database (SQL Server 2008 R2) for further use.
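The request, parse and store flow described above can be sketched as follows. This is an illustration under stated assumptions, not the team's VB.NET implementation: it uses Python's standard library in place of HTML Agility Pack and LINQ, an in-memory SQLite database in place of SQL Server 2008 R2, and a hypothetical extraction target (the text of every `<h2>` element). The table and function names are invented for the example.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.request import urlopen


class HeadlineExtractor(HTMLParser):
    """Collects the text of every <h2> element (a hypothetical target)."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._buffer = None  # list of text pieces while inside an <h2>

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:
                self.headlines.append(text)
            self._buffer = None


def fetch_raw_html(url):
    """Step 1: send a web request and return the raw HTML as text."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_headlines(raw_html):
    """Step 2: parse the raw HTML and pull out the required information."""
    parser = HeadlineExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.headlines


def save_headlines(connection, source_url, headlines):
    """Step 3: store the extracted content for further use."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS extracted_content ("
        "id INTEGER PRIMARY KEY, source_url TEXT NOT NULL, headline TEXT NOT NULL)"
    )
    connection.executemany(
        "INSERT INTO extracted_content (source_url, headline) VALUES (?, ?)",
        [(source_url, headline) for headline in headlines],
    )
    connection.commit()


if __name__ == "__main__":
    # A literal string stands in for fetch_raw_html(url) so the demo runs offline.
    sample = "<h2>First story</h2><p>text</p><h2> Second <em>story</em> </h2>"
    conn = sqlite3.connect(":memory:")
    save_headlines(conn, "https://example.com/news", extract_headlines(sample))
    print(conn.execute("SELECT headline FROM extracted_content ORDER BY id").fetchall())
```

Keeping the three steps in separate functions means the parsing logic can be exercised against saved HTML without touching the network or the database.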

The task was implemented successfully.

© 2020 vteams. All Rights Reserved.

Content Protection by DMCA.com