We have data on 282 companies that use Apache Nutch. These companies are most often found in the United States and in the Information Technology and Services industry. Apache Nutch is most often used by companies with >10000 employees and >$1000M in revenue. Our data for Apache Nutch usage goes back 2 years and 4 months.
We use the best indexing techniques combined with advanced data science to monitor the market share of over 15,000 technology products, including Software Frameworks. By scanning billions of public documents, we are able to collect deep insights on every company, with over 100 data fields per company on average. In the Software Frameworks category, Apache Nutch has a market share of about 0.1%. Other major competing products in this category include:
Apache Nutch is a mature, production-ready web crawler. It is pluggable and provides extensible interfaces such as Parse, Index and ScoringFilter for custom implementations, e.g. Apache Tika for parsing. Nutch 1.x enables fine-grained configuration and relies on Apache Hadoop data structures, which are well suited to batch processing. Additionally, pluggable indexing exists for Apache Solr, Elasticsearch, SolrCloud, etc.
Looking at Apache Nutch customers by industry, we find that Information Technology and Services (20%), Computer Software (16%), Internet (10%) and Higher Education (10%) are the largest segments.
50% of Apache Nutch customers are in the United States and 10% are in India.
Of all the customers that are using Apache Nutch, a majority (52%) are large (>1000 employees), 22% are small (<50 employees) and 24% are medium-sized.
Of all the customers that are using Apache Nutch, 41% are small by revenue (<$50M), 7% are medium-sized and 45% are large (>$1000M).