Aesthetics Directory
🎯 BRIEF
The Aesthetics Directory is a comprehensive database of aesthetics facilities across the US. It covers facility names, addresses, websites, contact details, treatments offered, affiliated doctors, facility types, and more. The directory supports market analysis by providing insight into trends, competitive landscapes, and high-demand areas. With this data, companies can optimize marketing, tailor sales approaches, and plan expansions in the competitive aesthetics industry.
🔧 TOOLS
Python, SQL, Snowflake, Tableau, pandas, re, BeautifulSoup
🤝 CONTRIBUTION
Built an end-to-end SQL framework for a new directory product, covering the logic for data collection, standardization, linkage, and validation through to MVP development.
Devised data linkage logic to group and track similar facilities across 70+ data sources based on shared attributes.
Built SQL solutions to standardize names, addresses, websites, phone numbers, etc.
Automated the scraping of 10K+ websites per week using urllib3 and BeautifulSoup.
Designed and automated QA queries, ensuring data integrity across the product.
Used Snowflake SQL functions (SOUNDEX, JAROWINKLER_SIMILARITY) to reduce manual data validation, cutting costs by 40%.
Discussed and implemented business rules with stakeholders for the selection of the most verified information.
Developed a method to break a URL into its components (scheme, subdomain, second-level domain, top-level domain, subdirectory) regardless of the URL's length.
Wrote 1,000+ lines of SQL to find, map, and extract patterns in physician names across 70+ data sources: prefix, first initial, first name, middle initial, middle name, last initial, last name, suffix, credential text, and role.
Designed a solution to identify products from scraped website data using regular expressions.
Extensively explored the quality of data sets and presented insights to leadership, enabling the selection of high-quality data sets.
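As an illustration of the URL decomposition item above, here is a minimal Python sketch. It is not the project's SQL implementation. The function name split_url, the output keys, and the tiny MULTI_LABEL_SUFFIXES set are assumptions made for this example. A production version would need the full Public Suffix List to recognize every multi-part top-level domain.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny set of multi-label suffixes (assumption);
# a real pipeline would rely on the full Public Suffix List instead.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au"}


def split_url(url: str) -> dict:
    """Split a URL into scheme, subdomain, second-level domain,
    top-level domain, and subdirectory, whatever its length."""
    # urlsplit only recognizes the host when a scheme or "//" is present.
    if "://" not in url:
        url = "//" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".") if host else []

    # Decide how many trailing labels form the top-level domain.
    tld_len = 2 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 1

    if len(labels) > tld_len:
        tld = ".".join(labels[-tld_len:])
        sld = labels[-tld_len - 1]
        subdomain = ".".join(labels[:-tld_len - 1])
    else:
        # Bare hosts such as "localhost" have no separable suffix.
        tld, sld, subdomain = "", host, ""

    return {
        "scheme": parts.scheme,
        "subdomain": subdomain,
        "second_level_domain": sld,
        "top_level_domain": tld,
        "subdirectory": parts.path.strip("/"),
    }
```

For example, split_url("https://www.example.com/services/botox/") yields the scheme "https", subdomain "www", second-level domain "example", top-level domain "com", and subdirectory "services/botox". Splitting the host on dots and counting labels from the right keeps the logic independent of how many subdomain levels or path segments the URL has.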
🏆 ACHIEVEMENTS
Designed a pioneering data linking logic that established a strong foundation for the directory, resulting in a 15% increase in revenue.
Enhanced data reliability by innovatively integrating NPI (National Provider Identifier) data into the product.
Achieved a 20% increase in user engagement by inventing two new product features through the integration of external data sources.