What should developers do when asked to build a service on top of a database of 60 million contacts? Take it on and start developing, even if they have no experience with that much data. My name is Sergey Nikonenko, and I’m the COO at Purrweb. Today I’ll tell you about switching the database and the search engine mid-project, and about a secret revealed after 3 years of working with a client.
Reading time: 8 minutes
The story I am about to share with you is almost mystical: back in 2017, we got a request from a client who turned out to be a true conspirator. He came up with the idea of a mega lead generation service that would allow companies to find leads (contacts of organizations) in the B2B sector. The service was supposed to become a universal tool for any more or less big sales department which had to work with cold emailing and calling.
The client didn’t come empty-handed: he already had a database that contained 60 million records (don’t ask where he got it from — we don’t know and are bound by NDA). The database had everything a sales department needed for lead generation:
The lead generation service was to help sales reps filter people and companies they are interested in and get up-to-date contacts: emails and phone numbers.
We called the client a conspirator because even a signed NDA wasn’t enough for him to trust us: we learned his real name only after 3 years of cooperation. However, it wasn’t an issue, as by that time we had become a good team. Now we are working on other projects together.
When we first met on a call, we estimated the project at $15,000. The client described the idea quite roughly, and we were young, green, and inspired. We clapped our hands, thinking that all we needed to do was to link up the database, connect the interface and finish some little things.
However, by the end of the first development stage, the project exceeded time and money expectations. The client was constantly coming up with new details, so I was worried that nothing would work out, and we would end up far beyond the initial agreement.
As the saying goes, do what you must, come what may. At some point, we found common ground with the client. I stopped worrying and panicking, and we managed to prioritize the pile of new features and ideas. The work got back on track.
Later on, we learned that the client didn’t expect anything from us — he knew the real complexity of that project. It was his second attempt as he already had some questionable results with a team from India. In addition, the client knew that such a large lead generation project would take larger investments.
The client’s service was not going to be a never-before-seen product: the market already had quite a few competitors, the biggest being ZoomInfo, Clearbit, and D&B Hoovers. However, we outplayed these services because the client’s database contained more accurate contact and company data.
The project let us try out lots of services that we had never used before:
We split the development process into 5 steps:
Let me tell you about everything in detail.
When we began setting up the search engine, we initially decided to use Elasticsearch — it was pretty simple to work with. However, as time went on, its functionality turned out to be insufficient:
We had to find an alternative (and it was not the only replacement made during development). Apache Solr was our champion. The switch wasn’t difficult, since Elasticsearch and Apache Solr are broadly similar: both are built on Apache Lucene.
As a result, the search became faster and more convenient for end-users.
When we started working on the project, we didn’t have much experience in managing databases. We chose MongoDB as the database management system: quite a popular tool, yet not the most convenient one for us. MongoDB allows you to pile up all the data in a ‘heap’ and work out what you need later on. Once we started puzzling over how to structure the data, it was time to change the DBMS.
We got two main reasons to change it:
We changed the DBMS bit by bit: first, we moved all the entities except contacts. For a while, we worked with 2 databases simultaneously. We gradually adapted the code to work with PostgreSQL, then transferred the remaining data from MongoDB and switched to PostgreSQL completely.
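A gradual switch like the one described above could be sketched with a thin routing layer. This is only an illustration under stated assumptions: `StoreRouter`, `InMemoryStore`, and the entity names are hypothetical, and plain dictionaries stand in for the MongoDB and PostgreSQL clients so that the example runs on its own.

```python
class InMemoryStore:
    """Stand-in for a real database client (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.tables = {}

    def save(self, entity_type, key, record):
        self.tables.setdefault(entity_type, {})[key] = record

    def get(self, entity_type, key):
        return self.tables.get(entity_type, {}).get(key)


class StoreRouter:
    """Routes each entity type to the old or the new store during migration."""

    def __init__(self, old_store, new_store, migrated):
        self.old_store = old_store
        self.new_store = new_store
        self.migrated = set(migrated)  # entity types already moved

    def store_for(self, entity_type):
        return self.new_store if entity_type in self.migrated else self.old_store

    def save(self, entity_type, key, record):
        self.store_for(entity_type).save(entity_type, key, record)

    def get(self, entity_type, key):
        return self.store_for(entity_type).get(entity_type, key)

    def mark_migrated(self, entity_type):
        """Copy the remaining records to the new store, then switch routing."""
        for key, record in self.old_store.tables.get(entity_type, {}).items():
            self.new_store.save(entity_type, key, record)
        self.migrated.add(entity_type)


mongo = InMemoryStore("mongodb")
postgres = InMemoryStore("postgresql")
# Everything except contacts is already served by the new store.
router = StoreRouter(mongo, postgres, migrated={"companies", "users"})

router.save("companies", 1, {"name": "Acme"})
router.save("contacts", 7, {"email": "a@example.com"})
router.mark_migrated("contacts")  # final step: contacts move too
```

The point of the design is that the rest of the code talks only to the router, so moving one more entity type is a data copy plus a one-line routing change.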
When you have 60 million emails, you need a way to verify them in order to get rid of dead and broken ones. At first, we chose BriteVerify but later switched to ZeroBounce (which was easy to connect). BriteVerify gave us more accurate verification for better lead generation, yet it was much more expensive than ZeroBounce. That’s why we settled on ZeroBounce as the sweet spot.
Here is how it works: we send an email address to the service, and the service returns its status. Users pay for verification requests, but the price is low.
Typically, 10 statuses could be assigned to an email in the system. To make the service more user-friendly, we merged them into 3 groups: valid (you can send emails to this account and will most likely get a response), accept all (an in-between state: you can start sending emails, but the mail server may decline your request), and invalid (it’s pointless to try).
It’s not the only benefit provided by ZeroBounce: users can upload their own CSV file with emails into the service to check their validity.
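The grouping of statuses described above can be sketched as a simple lookup. The three groups come from the text; the raw status names and the decision to treat unrecognized statuses as invalid are assumptions made for illustration, not the provider’s exact list.

```python
# Raw status names below are illustrative assumptions, not the provider's
# exact list; the three target groups come from the article.
STATUS_GROUPS = {
    "valid": "valid",
    "catch-all": "accept all",
    "unknown": "accept all",
    "invalid": "invalid",
    "spamtrap": "invalid",
    "abuse": "invalid",
    "do_not_mail": "invalid",
}


def group_status(raw_status: str) -> str:
    """Collapse a raw verification status into one of three user-facing groups."""
    key = raw_status.strip().lower()
    # Assumption: anything unrecognized is treated as not worth emailing.
    return STATUS_GROUPS.get(key, "invalid")


def summarize(results):
    """Count emails per group for a batch of (email, raw_status) pairs."""
    counts = {"valid": 0, "accept all": 0, "invalid": 0}
    for _email, raw_status in results:
        counts[group_status(raw_status)] += 1
    return counts
```

A function like `summarize` would also fit the CSV upload case: each uploaded address gets a raw status, and the user sees only the three groups.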
When a sales department works with a particular CRM, it will hardly ever change it: it will sit tight and build its ecosystem around it.
We chose the 15 most popular CRM systems and integrated them with the lead generation service. All CRMs are different, so we had to read through the documentation of each one and implement the integration manually. Unfortunately, you cannot automate this process. Most of the chosen CRMs came with clear documentation and a built-in sandbox: in such a case, all you need to do is read the documentation and double-check yourself to make everything work flawlessly. However, some CRMs are difficult to integrate, and you need to spend some time tinkering around.
When you’re working on online lead generation software, be ready to connect it to as many CRMs as possible
The integration work spread over several years and went step by step. Whenever needed, the client added new CRMs to the backlog. Besides, this task has no deadline: to keep the lead generation experience smooth, every integration has to be supported, maintained, and updated in time.
It’s not the only part of the service that we regularly update. Even 60 million contacts in the database aren’t enough: one day, users will exhaust them, and the service will become useless. That’s why we systematically update the data.
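One common way to keep many hand-written integrations manageable is a shared adapter interface. The sketch below assumes that approach; it is not taken from the project, and the class names, CRM names, and field names are all made up for illustration.

```python
from abc import ABC, abstractmethod


class CrmAdapter(ABC):
    """Common interface; each CRM gets its own hand-written implementation."""

    @abstractmethod
    def to_payload(self, lead: dict) -> dict:
        """Translate an internal lead record into the CRM's field names."""


class ExampleCrmA(CrmAdapter):
    # Field names are made up for illustration.
    def to_payload(self, lead):
        return {"FullName": lead["name"], "Email": lead["email"]}


class ExampleCrmB(CrmAdapter):
    # This hypothetical CRM wants split names and a list of emails.
    def to_payload(self, lead):
        first, _, last = lead["name"].partition(" ")
        return {"first_name": first, "last_name": last, "emails": [lead["email"]]}


ADAPTERS = {"crm_a": ExampleCrmA(), "crm_b": ExampleCrmB()}


def export_lead(crm_name: str, lead: dict) -> dict:
    """Build the export payload for the CRM the user has connected."""
    try:
        adapter = ADAPTERS[crm_name]
    except KeyError:
        raise ValueError(f"No integration for CRM: {crm_name}") from None
    return adapter.to_payload(lead)
```

With this shape, adding a CRM from the backlog means writing one new adapter class and registering it, while the rest of the service keeps calling `export_lead`.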
Every 3-4 weeks the client sends us a new archive that contains about 1000 JSON files. It usually takes us about a week to pull out all the data and renew the database. To do it, we:
Gather all the data in one large JSON file whose size is around 1 TB
Parse this file, extract the information we need, and create a CSV file (about 2-3 days)
Upload the CSV file to PostgreSQL (about 2 days)
Index the data with Solr (another 3 days)
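The parsing step above can be sketched as a streaming conversion, so that a huge input never has to fit in memory. This is a minimal sketch under assumptions: the column names in `FIELDS` are hypothetical (the real schema is not described here), and each input file is assumed to hold one JSON object per line.

```python
import csv
import json
from pathlib import Path

# Hypothetical column set; the real schema is not described in the article.
FIELDS = ["name", "company", "email", "phone"]


def json_lines_to_csv(source_dir, csv_path):
    """Stream records from *.json files into one CSV, one row per record.

    Assumes each file holds one JSON object per line, so nothing larger
    than a single record is held in memory at a time.
    """
    written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for path in sorted(Path(source_dir).glob("*.json")):
            with open(path, encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if not line:
                        continue  # tolerate blank lines between records
                    record = json.loads(line)
                    # Keep only the needed fields; missing ones become empty.
                    writer.writerow({f: record.get(f, "") for f in FIELDS})
                    written += 1
    return written
```

A CSV produced this way can then be bulk-loaded with PostgreSQL’s `COPY ... FROM ... WITH (FORMAT csv, HEADER true)`, which is typically much faster than row-by-row inserts. Whether the project used `COPY` is not stated in the text.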
Besides the data from the database, we added some useful features to the lead generation service. This way, it got a news feed and sales triggers:
To implement the feature, we integrated Contify, a service that sends us news about the companies we have in the database. Users can choose what types of triggers they need: for instance, if they are interested in bankruptcies of IT companies, they will get all the news related to that topic.
For example, if the user is a developer, they can set up triggers on news about startups that get investments. The plot is simple: you learn that a startup got the money → you email the startup and offer your services.
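The trigger logic described above boils down to filtering incoming news by the user’s subscription. The sketch below illustrates that idea only; the `NewsItem` and `Subscription` types and their fields are hypothetical and do not reflect Contify’s actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    company: str
    industry: str
    trigger: str  # e.g. "funding", "bankruptcy"
    headline: str


@dataclass(frozen=True)
class Subscription:
    triggers: frozenset
    industries: frozenset = frozenset()  # empty means "any industry"

    def matches(self, item: NewsItem) -> bool:
        if item.trigger not in self.triggers:
            return False
        return not self.industries or item.industry in self.industries


def feed_for(subscription, news):
    """Return only the news items the user asked to be notified about."""
    return [item for item in news if subscription.matches(item)]


news = [
    NewsItem("Acme", "IT", "funding", "Acme raises a seed round"),
    NewsItem("Bolt", "IT", "bankruptcy", "Bolt files for bankruptcy"),
    NewsItem("Cafe", "Food", "funding", "Cafe raises a seed round"),
]
# A developer looking for freshly funded startups in any industry:
funded = feed_for(Subscription(triggers=frozenset({"funding"})), news)
```

Here `funded` contains the Acme and Cafe items, which are the startups worth emailing with an offer.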
Offline meetings were, are and will be the most impactful networking method. That’s why we taught the lead generation service to send news about various upcoming events and conferences. As icing on the cake, now the service can not only send the news but also bring registration links.
Stripe is widely considered one of the best payment systems out there. However, when working on the service, we faced an unexpected problem: Stripe didn’t approve the client’s product because of its ‘characteristic aspects’. Of course, we were up to the task and quickly brought in an alternative, Braintree. In fact, Braintree is no worse than Stripe, but we had never worked with it before. Recently, Stripe changed its mind, so now we are setting it up as a second payment option for users.
Although the service has been released, we continue working on it: the client is 100% into the project and has an endless flow of ideas. At first, it was a medium-sized subscription-based lead generation service. Now it is a giant with its own public API (companies can connect to the service’s databases).
As the client expected, the project reached the break-even point and began to bring in more than $5 million per year. The service has approximately 10 thousand monthly active users (out of 110 thousand registered).
There are still many ideas waiting to be implemented. The client doesn’t plan to stop, and neither do we. After all, if we learned the client’s real name only in the fourth year of working together, who knows what else this cooperation can bring to the world. 🙃
‘Unlike other developers I’ve worked with, Purrweb genuinely cares for our project. They’re a true partner in the sense that they’re investing in our relationship and making our product a success. When I compare Purrweb to past development partners, they have exceeded my expectations. Selfishly, I’d like to keep Purrweb to myself, but their team is fantastic, and I’d highly recommend them to anyone looking for a web development team’ — the client comments.