How Venture Capitalists Use Artificial Intelligence To Better Source Deals And Assess Startups

This article is an overview of the latest developments in AI for venture capital and the emerging ecosystem of solution providers (as of Q2 2018).


Inspired by Francesco Corea’s recent Medium post, “Artificial Intelligence and Venture Capital”, we did not want to miss the opportunity to add our two cents and discuss the status quo of artificial intelligence in the VC industry. Only a few years ago, a handful of VCs started to experiment with automation and machine learning in their internal operations. Today, investment firms publish job offers for data scientists and talk openly about the different ways they use machine learning. But because the VC industry deals with far less quality data than other types of investors, data-driven approaches are hard to implement. Even though the industry as a whole is still lagging behind, recent announcements from big funds pursuing AI-related initiatives are sparking the interest of more traditional startup investors. In addition to internal projects at VC firms, providers of AI solutions for VCs are starting to emerge, offering investors a quicker, more robust path to adopting machine learning.

But how do VCs benefit from artificial intelligence? Let’s look at it from 3 different angles: by player, by area of application and by data type.

1. Perspective by player


Please note that the number of firms currently employing these applications may be significantly higher, since this article is based only on publicly available information. This is especially relevant for the largest players, which are more secretive about their deal sourcing and assessment approaches. For instance, GV and Sequoia have occasionally disclosed that they employ data scientists, but have never explained what their efforts translate into.

Even with these constraints, the total volume invested by the reported funds amounted to $9.0Bn by 2017 and to $5.3Bn by 2018 YTD. Their number of investments added up to around 1,200 by 2017, with an average check size of $7.4M that shrinks to $2M after excluding the 5 largest players.

2. Perspective by area of application



Researchers and VCs have long been aware of the many biases influencing investment decisions. One of the most prominent is the perpetuation bias, by which the applicants that most closely resemble the traits of the idealized entrepreneur are more likely to get funded. The extensive work of Laura Huang provides several examples (e.g. “Investors Prefer Entrepreneurial Ventures Pitched by Attractive Men”). Can data-driven approaches mitigate the difficulty “humans” have in creating meaning out of large sets of unstructured information?

Due to the relative scarcity of start-up data and its heterogeneity, most current solutions focus on predicting successful events. For instance, Hone Capital defines success as a start-up’s ability to raise a Series A round and attempts to predict the likelihood of that outcome. Following this methodology, they claim to identify companies that go on to raise a Series A with an accuracy of 40%. This represents 2.5x the industry average, which rises to 3.5x when the results of the model are also filtered by the investment team.
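
The multiples quoted above can be turned into rough absolute rates. The sketch below is only a back-of-the-envelope check: it assumes that “accuracy” means the share of selected companies that go on to raise a Series A, and that both multiples refer to the same industry average. Neither assumption is confirmed by Hone Capital, and the function names are ours.

```python
def implied_baseline(model_rate: float, lift: float) -> float:
    """Baseline hit rate implied by a model hit rate and its multiple over the baseline."""
    if lift <= 0:
        raise ValueError("lift must be positive")
    return model_rate / lift


def rate_at_lift(baseline: float, lift: float) -> float:
    """Hit rate obtained when a baseline is improved by the given multiple."""
    return baseline * lift


model_rate = 0.40  # claimed accuracy of the model alone
baseline = implied_baseline(model_rate, 2.5)  # industry average implied by the 2.5x claim
with_team = rate_at_lift(baseline, 3.5)       # model output filtered by the investment team

print(f"implied industry average: {baseline:.0%}")
print(f"model plus investment team: {with_team:.0%}")
```

Under these assumptions, the industry average would sit at roughly 16%, and the combination of model and investment team at roughly 56%.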

The trade-off always remains data quality vs. data quantity. For example, public databases such as Crunchbase, Pitchbook, Owler, or Dealroom are in the quantity game. Information is abundant but rarely goes into detail for small companies, where data is scarce, sometimes vague, and often outdated. This makes for great industry-level analysis, but not so much at the company level. Who can blame them? Data collection at this level has to be done manually in most cases. Some players, like CB Insights, have realized they can automate parts of their data collection process (they claim 70%).

There are no shortcuts to superior data quality. By 2012, Correlation Ventures had already partnered with 20 VCs to access their internal statistics and had reached out to hundreds of companies manually. They gathered a dataset of 80,000 equity financings since 1987 in which at least one VC firm participated. These sources are now benchmarked against the internal data available for each company applying for funding. All applicants are required to submit basic planning, financial history, and legal documents (e.g. term sheets, cap tables). The data is then used in the firm’s analytic models. Their sustained efforts have translated into one of the most automated processes in the industry: once a start-up scores high on their criteria, only a single 30-minute in-person interview is needed to make a decision. This reduces the time required for decision making to an average of 2 weeks.

WR Hambrecht, a US-based investor, focuses on a different kind of question: “How can we better predict when innovations will survive or fail?”. They claim that factors related to a startup’s operations have a predictive power of 20% and that only 12% is related to the team. After 8 years of operations, their model has been accurate on 67% of its predictions, and the funds are estimated to achieve returns over 500% based on subsequent offers for their portfolio companies.


Dedicated AI applications aim to automate and expand VCs’ sourcing processes to diversify their scope, discover promising startups, and discard dubious ones.

Evaluation solutions are more prominent amongst VC firms, while external data service providers specialized in VC lead in the sourcing field. The investment required to set up an infrastructure to crawl, homogenize, and maintain various data sources pays off better when it serves a pool of customers than when a single VC bears it alone.

But the most data-intensive VCs did not wait to build their own software solutions. For instance, InReach Ventures, which also has an AI evaluation model, invested $7M to develop its proprietary software and bears maintenance costs of over $1M per year. As of December 2017, its suite of AI applications had allowed it to evaluate 95,000 European start-ups and then screen a sample of 2% that were a good initial fit.

In theory, the technical process is straightforward. A combination of public and private data sources is first selected, then crawled, consolidated, and finally filtered by investment criteria. The variety of databases in use can be a differentiator in some cases. For instance, the seed fund SignalFire disclosed that they “collect data from patents to academic publications to open source contributions to financial filings”. SignalFire’s GP also said that the firm invests in private raw consumer transaction data. Apart from using these sources to discover hidden gems, they opened their data platform to 50 third parties in exchange for their serving as on-demand advisors to SignalFire’s portfolio companies.
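
To make the consolidate-then-filter steps concrete, here is a minimal sketch. It assumes the crawling has already happened and that every source yields plain dictionaries keyed by a company website domain. The `Company` fields, the function names, and the example criteria are all hypothetical and do not describe any particular fund’s pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class Company:
    domain: str
    name: str
    country: Optional[str] = None
    employees: Optional[int] = None
    total_raised_usd: Optional[float] = None


def consolidate(*sources: Iterable[dict]) -> Dict[str, Company]:
    """Merge records from several sources, keyed by lower-cased domain.

    The first non-missing value seen for a field wins; later sources only fill gaps.
    """
    merged: Dict[str, Company] = {}
    for source in sources:
        for record in source:
            key = record["domain"].lower()
            company = merged.setdefault(key, Company(domain=key, name=record["name"]))
            for field in ("country", "employees", "total_raised_usd"):
                if getattr(company, field) is None and record.get(field) is not None:
                    setattr(company, field, record[field])
    return merged


def matches_criteria(company: Company, countries: Set[str], max_raised_usd: float) -> bool:
    """Example investment criteria: target geography and an early-stage funding cap."""
    return (
        company.country in countries
        and company.total_raised_usd is not None
        and company.total_raised_usd <= max_raised_usd
    )


public_db = [{"domain": "Acme.io", "name": "Acme", "country": "DE"}]
private_db = [
    {"domain": "acme.io", "name": "Acme GmbH", "employees": 8, "total_raised_usd": 500_000.0},
    {"domain": "beta.com", "name": "Beta", "country": "US", "total_raised_usd": 20_000_000.0},
]

companies = consolidate(public_db, private_db)
shortlist = [c.name for c in companies.values() if matches_criteria(c, {"DE", "FR"}, 1_000_000)]
print(shortlist)
```

Keying on the domain sidesteps name variations in this toy example; real sources rarely share such a clean identifier, which is where identification modules come in.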

In general terms, these AI processes normally entail data crawling modules (i.e. to map, monitor and extract unstructured sources of data), identification modules (i.e. to homogenize and consolidate company references and understand relations within the start-up network) and clustering modules (i.e. to group and categorize similar players, industries or news). Once these processes are implemented, VCs experience a noticeable increase in the quantity and diversity of the leads sourced. Fly Ventures claims to discover 1,000 new start-ups per week. Right Side Capital has been able to invest in 850 companies since establishing a new data-driven approach in 2012, allowing the fund to reduce its check size below $300k. Social Capital has an even lower average allotment, amounting to only $70k per investment, and its recent investments are distributed across 24 countries, with 80% of startups being led by non-white founders.
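
As an illustration of what an identification module has to do, the sketch below groups different spellings of the same company name. It is a deliberately simple approach based on name normalization and string similarity from Python’s standard library. The suffix list, the 0.9 threshold, and the function names are our own assumptions, and production systems would typically combine many more signals (domains, founders, addresses).

```python
import re
from difflib import SequenceMatcher
from typing import List

# Legal-form suffixes ignored when comparing names (illustrative, not exhaustive).
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "gmbh", "sas", "sl", "corp", "co"}


def normalize(name: str) -> str:
    """Lower-case, strip punctuation, and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def same_company(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two names as the same company if their normalized forms are similar enough."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def group_aliases(names: List[str], threshold: float = 0.9) -> List[List[str]]:
    """Greedy grouping: each name joins the first group whose first member it matches."""
    groups: List[List[str]] = []
    for name in names:
        for group in groups:
            if same_company(name, group[0], threshold):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


references = ["Acme Robotics Inc.", "ACME Robotics", "Acme-Robotics Ltd", "Beta Health"]
print(group_aliases(references))
```

The greedy grouping keeps the example short; it depends on input order and on the first name in each group, which is acceptable for a sketch but not for a consolidated company database.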


First, feedback solutions have been developed to provide ad-hoc recommendations by benchmarking start-ups against competitors. Providers of this kind of application are usually also present in one of the two previous categories, as they leverage data sources already gathered for sourcing and evaluation. Roberto Bonanzinga of InReach Ventures explained the synergy: “By better clarifying which data best translates to successful startups, VCs can educate current and future entrepreneurs”.

Startup Compass, for instance, argues that start-ups should grow proportionally across each of their dimensions (team, product …). They developed a tool to warn and guide start-ups that are scaling prematurely. Hone Capital aims to guide entrepreneurs with recommendations based on concrete success metrics that may have been tested while sourcing and evaluating their leads.

Within training solutions, we have broadly grouped those solutions that not only give feedback to entrepreneurs, but go one step further and deliver the means to improve portfolio companies’ performance. Some of the most disruptive solutions can be found in this category.

First, there are funds that supply data to train and validate the business models of their promising AI investees, which have suffered from data hunger since their inception. For Gradient Ventures, the most recent early-stage Google fund, the data was already there. Usually, it is not that easy. The team at Georgian Partners has found a way of achieving similar results without that home-field advantage: they use differential privacy techniques that let portfolio companies pool, share, and anonymize proprietary data and benefit from shared insights.
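
The source does not detail how Georgian Partners applies differential privacy, so the following is only a textbook illustration of the general idea, not their method. It uses the Laplace mechanism on a counting query: each company adds calibrated noise to its own count before sharing it, and the noisy counts are pooled. The function names and the example data are hypothetical, and the sketch assumes that the companies’ datasets do not overlap.

```python
import random
from typing import Callable, Iterable, List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records: Iterable[dict], predicate: Callable[[dict], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Epsilon-differentially-private count.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1/epsilon is sufficient.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def pooled_private_count(datasets: List[List[dict]], predicate: Callable[[dict], bool],
                         epsilon: float, rng: random.Random) -> float:
    """Each company perturbs its own count locally; only noisy counts are shared and summed."""
    return sum(private_count(d, predicate, epsilon, rng) for d in datasets)


rng = random.Random(42)
company_a = [{"churned": True}, {"churned": False}, {"churned": True}]
company_b = [{"churned": False}, {"churned": True}]

estimate = pooled_private_count([company_a, company_b], lambda r: r["churned"],
                                epsilon=0.5, rng=rng)
print(f"noisy pooled churn count: {estimate:.2f} (true value is 3)")
```

A smaller `epsilon` means stronger privacy and noisier answers, so pooling only pays off when the combined dataset is large enough for the noise to average out.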

In addition, external providers are also showing very creative solutions. One offers an automated conversational bot that helps founders become better at pitching by answering its questions. The bot can play the role of an incubator, a seed fund or a high-profile VC.

Lastly, matching solutions also exist for start-ups, offering them the possibility to find investors that are investing in similar startups at the same stage. For example, Dorm Room Fund offers startups and investors alike a list of prospects based on company description, industry, and location.

3. Perspective by data source


Cross-functional data

Every now and then we spot global data providers expanding their financial data services to VCs. Given that it may be in the interest of venture capitalists, we have considered a wider range of global data providers. Most data solutions cater to different kinds of investors including private equity funds, hedge funds, lenders and large corporates.

  • Digital footprint: Twitter, Facebook, App Store, web traffic, web forums… Probably the most extensive source of information, especially for B2C companies, with the challenge of extracting and transforming it into useful and understandable insights for investors. Some interesting examples here are iSentium and Dataminr. The former scrapes Twitter posts, identifies keywords expressing positive or negative sentiment, and finally ranks each company’s sentiment on a quantitative score. Following a similar philosophy, Dataminr monitors social network activity in real time to immediately alert on sentiment-changing events.
  • Financial information: AI has enabled techniques to look at traditional financial sources in a more innovative and timely manner. On the processing side, Kensho has developed a platform that crawls publicly available company data to help answer users’ financial queries instantly. On the analytical side, Prattle is discovering insights hidden deep in traditional central bank reports and company earnings calls. They do sentiment analysis based on the grammatical structures, nuanced wording and tone in use.
  • Consumer data: A few companies, such as Earnest Research, specialize in acquiring consumer data, consolidating it and extracting insights on consumer behavior. The CircleUp team, which tries to predict the likelihood of breakout success for over 1.2M US retail companies, claims that this sector is uniquely positioned to benefit from data treatment techniques. Its CEO declared: “The business models of retail companies are very similar. Whether a company is selling dog food, shampoo or water. Second, there’s endless data on consumer product and retail companies”.
  • Satellite images: AI image processing and object recognition techniques are claimed to enable the processing of satellite imagery to estimate granular economic and demographic metrics, substituting to some extent for more traditional economic indicators. For instance, in 2016 SpaceKnow launched an index to monitor industrial activity in China. It claims to process 2.2 Bn satellite observation points and to individually monitor around 6,000 industrial facilities to do so.
  • Team background and dynamics: Finally, there is a set of ventures that use team data either to support team dynamics or to evaluate which companies are most likely to thrive. AiNgel specializes in scoring companies using data such as educational background, employment history, entrepreneurial experience and personality traits.
  • Transactions data: Credit card transactions are a highly fragmented and unstructured data type, but also the most granular for understanding consumer behavior, trends and expenditures. This kind of data has been extensively pooled and analyzed by Second Measure. They claim to access an anonymized selection of 2–3% of all credit card transactions in the US, down to the store level.
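
The keyword-based sentiment scoring described under “Digital footprint” can be sketched in a few lines. This is a toy version under our own assumptions: the two word lists, the scoring rule (positive minus negative keyword hits, averaged per company), and the function names are hypothetical and are not iSentium’s actual method.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Tiny illustrative lexicons; real systems use far larger, domain-specific ones.
POSITIVE = {"love", "great", "fast", "recommend"}
NEGATIVE = {"hate", "slow", "broken", "refund"}


def post_score(text: str) -> int:
    """Positive keyword hits minus negative keyword hits in one post."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def rank_companies(posts: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Average the post scores per company and sort from most to least positive."""
    scores: Dict[str, List[int]] = defaultdict(list)
    for company, text in posts:
        scores[company].append(post_score(text))
    averages = {company: sum(s) / len(s) for company, s in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


posts = [
    ("Acme", "Love the app, great support"),
    ("Acme", "A bit slow today"),
    ("Beta", "Broken again, want a refund"),
]
print(rank_companies(posts))
```

Keyword counting ignores negation and sarcasm (“not great” still scores as positive), which is one reason the transformation into understandable insights is described above as the real challenge.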


A few years back, the deployment of AI applications constituted a bold and novel bet for VCs with the promise of better dealflow. The point we are making here, following up on Francesco’s article, is that an increasing number of VC firms are looking at developing their own in-house software to digest and analyze a growing amount of startup data. However, it’s important not to lose focus by dealing with non-core activities, or, as Cassie Kozyrkov puts it: “Are you in the business of making bread? Or making ovens?”

At PreSeries, we have designed a framework upon which you can automate start-up dealsourcing and assessment efforts. We built our platform to take full advantage of both public and proprietary investor data, while keeping all private information secure. Feel free to get in touch, we’d love to show you how it works.

Arturo Moreno, CEO

Fabien Durand, Product & Marketing

Alfonso Palomero, Strategy Intern


