vteam #620 was asked to develop a pilot project to test the feasibility of processing the large number of records produced by consuming various Contact APIs. The objective was to build a system that could fetch (from the APIs), transform, store, and retrieve large amounts of data. The main idea was to gather a person’s social information (Facebook, Twitter, LinkedIn, etc.) via different Contact APIs such as:
- FullContact
- Clearbit, etc.
Because different APIs returned the same information in different formats, vteams engineer Muhammad Azeem applied various data transformation strategies to make the data more consistent (> 80%). Later, the data was merged and stored after removing duplicates. Finally, the merged data was made available for other systems to consume, so that the various Contact APIs could be compared. Note, however, that these APIs did not return complete information.
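The transform, merge, and deduplicate steps described above could look roughly like the following sketch. It is not the project's actual code: the field aliases, the choice of email as the merge key, and the "first non-empty value wins" rule are all assumptions made for illustration.

```python
# Hypothetical field names: the real API payloads are not shown in this article.
FIELD_ALIASES = {
    "email": "email",
    "name": "name",
    "fullName": "name",
    "full_name": "name",
    "twitter": "twitter",
    "twitterHandle": "twitter",
    "twitter_handle": "twitter",
    "linkedin": "linkedin",
    "linkedinUrl": "linkedin",
}


def normalize(record):
    """Map provider-specific keys to canonical ones and clean the values."""
    out = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key)
        if canonical is None or value is None:
            continue  # unknown field or missing value
        value = str(value).strip()
        if not value:
            continue
        if canonical == "email":
            value = value.lower()
        elif canonical == "twitter":
            value = value.lstrip("@").lower()
        out[canonical] = value
    return out


def merge_records(records):
    """Merge normalized records by email; the first non-empty value wins."""
    merged = {}
    for raw in records:
        rec = normalize(raw)
        email = rec.get("email")
        if not email:
            continue  # without an email there is no key to merge on
        target = merged.setdefault(email, {})
        for field, value in rec.items():
            target.setdefault(field, value)
    return merged
```

Two payloads for the same person, one keyed `fullName` and the other `full_name`, collapse into a single record under the lowercased email address.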
Since the Contact APIs look up information by email address, the addresses were supplied in uploaded files, which were queued as background jobs for processing. Notifications informed the user of the result of each operation.
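As a rough illustration of this upload, queue, and notify flow, here is a minimal in-process sketch. The one-address-per-line file format, the simple validation regex, and the `lookup` callback are assumptions; the actual project ran inside Symfony with its own job and notification mechanisms.

```python
import queue
import re

# Deliberately loose check; real email validation is more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def enqueue_emails(upload, jobs):
    """Read one address per line, skip invalid or repeated ones, queue the rest."""
    queued, skipped = 0, 0
    seen = set()
    for line in upload:
        email = line.strip().lower()
        if not email:
            continue  # blank lines are ignored, not counted
        if not EMAIL_RE.match(email) or email in seen:
            skipped += 1
            continue
        seen.add(email)
        jobs.put(email)
        queued += 1
    return {"queued": queued, "skipped": skipped}


def run_worker(jobs, lookup):
    """Drain the queue, calling lookup(email) and recording a notification each time."""
    notifications = []
    while True:
        try:
            email = jobs.get_nowait()
        except queue.Empty:
            break
        try:
            lookup(email)
            notifications.append(f"{email}: imported")
        except Exception as exc:
            notifications.append(f"{email}: failed ({exc})")
        finally:
            jobs.task_done()
    return notifications
```

In a real deployment the worker would run in a separate process, and `lookup` would call the Contact APIs. Here it is only a placeholder callable.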
A large volume of data came in through the APIs. Because the data was searched frequently (full-text search), a fast and effective way of retrieving it was needed. Elasticsearch (ES) was used to index all the key data, which made retrieval faster than with MySQL. The strategy was to store all API-related data in ES, while all non-API data (users, import logs, etc.) was stored in MySQL. The client was also told that recently updated data may not be immediately available in ES, because index updates take longer than retrieval.
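The split between the two stores, and the shape of a full-text query, can be sketched as follows. The entity names and searchable fields are hypothetical. The request body uses the standard Elasticsearch `multi_match` query, but this is only an illustration and not the queries the project actually issued.

```python
# Hypothetical entity names, following the split described above.
ES_ENTITIES = {"contact"}                # API-derived data, searched full-text
MYSQL_ENTITIES = {"user", "import_log"}  # application (non-API) data


def store_for(entity):
    """Return which backing store an entity type belongs to."""
    if entity in ES_ENTITIES:
        return "elasticsearch"
    if entity in MYSQL_ENTITIES:
        return "mysql"
    raise ValueError(f"unknown entity type: {entity}")


def build_search_body(text, fields=("name", "email", "twitter", "linkedin"), size=10):
    """Build an Elasticsearch request body for a full-text search across fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
            }
        },
        "size": size,
    }
```

The resulting dictionary can be serialized to JSON and sent to an index's `_search` endpoint by whichever client is in use.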
Consuming and exposing the required APIs was only part of the work; a solid framework was also needed to handle basic requirements like:
- Roles and permissions
- Import logs
- Notifications, etc.
Hence, the client chose Symfony 3. Because Symfony applications typically work with an ORM, Azeem used Doctrine, which needed a better ES adapter so that lookups in ES would return objects instead of arrays. FOSElasticaBundle was used to solve this problem.
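The "objects instead of arrays" idea can be illustrated independently of the bundle. The sketch below is not FOSElasticaBundle's API. It only shows, in Python, how raw search hits (using the standard `hits.hits[]._source` response shape) might be turned into typed objects. The `Contact` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contact:
    """Hypothetical domain object; the project's real entities are not shown."""
    id: str
    email: str
    name: Optional[str] = None
    twitter: Optional[str] = None


def hydrate(response) -> List[Contact]:
    """Convert a raw Elasticsearch search response into Contact objects."""
    contacts = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        contacts.append(
            Contact(
                id=hit["_id"],
                email=source["email"],
                name=source.get("name"),
                twitter=source.get("twitter"),
            )
        )
    return contacts
```

Calling code can then work with `contact.email` rather than nested dictionary lookups, which is the convenience the adapter was chosen for.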
Azeem was able to integrate Elasticsearch with Symfony 3 smoothly. The following project targets were achieved:
- Provided a common interface to all the APIs for ongoing updates
- Transformed, merged, and deduplicated the data
- Provided a fast search solution
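One way to picture the common interface from the first target is an abstract provider class that every Contact API wrapper implements. This is a sketch under assumptions: the class and method names are invented, and `StaticProvider` returns canned data instead of calling a real service.

```python
from abc import ABC, abstractmethod


class ContactProvider(ABC):
    """Hypothetical common interface that every Contact API wrapper implements."""

    name: str

    @abstractmethod
    def lookup(self, email: str) -> dict:
        """Return whatever the provider knows about the given email."""


class StaticProvider(ContactProvider):
    """Stand-in provider backed by canned data; a real one would call an API."""

    def __init__(self, name, data):
        self.name = name
        self._data = data

    def lookup(self, email):
        # Copy so that callers cannot mutate the provider's stored data.
        return dict(self._data.get(email, {}))


def compare_providers(providers, email):
    """Collect each provider's answer for one email, keyed by provider name."""
    return {provider.name: provider.lookup(email) for provider in providers}
```

Because callers depend only on `lookup`, adding or replacing a provider does not affect the comparison code.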