Data on Demand

The world is filthy with data. In every field of endeavor, sources are loosening their grip on raw and structured data, feeding an ever-expanding market hungry for information and analysis. Spurred by the relatively low cost of capturing, cleaning, and delivering data, a variety of infrastructure enablers have sprouted up. These semantic web “search engines” go beyond simply crawling and indexing datasets discovered online. Using the tools of machine learning and crowdsourcing, these data-as-a-service (DaaS) providers actively model and structure data, then push the results out via APIs to media outlets that undertake geomapping, sentiment analysis, news publishing, and a host of other text analysis activities. A few have caught my attention (see the following list), and I’ll be digging into each to find out what distinguishes them from one another and how their resources best complement their partners and users.

Excerpts from the site descriptions:

  • Freebase – “Freebase is an open, Creative Commons licensed repository of structured data of almost 22 million entities. Ways to use Freebase: (i) Use Freebase’s IDs to uniquely identify entities anywhere on the web; (ii) query Freebase’s data using MQL; and (iii) build applications using our API or Acre, our hosted development platform. Freebase is also a community of thousands of data-lovers, working together to improve Freebase’s data.”
  • Sindice – “Sindice is a platform to build applications on top of [web] data. Sindice collects Web Data in many ways, following existing web standards, and offers Search and Querying across this data, updated live every few minutes. Specialized APIs and tools are also available.”
  • Kasabi – “Kasabi is a new web application that aims to support organisations in the publishing and monetization of data on the web. The benefits of ready access to high quality data sets to support the creation of innovative new web and mobile applications is well-documented, and there is growing demand from developers for new data sources. Kasabi provides an environment that leverages semantic web technology to provide a unique marketplace that supports the publishing, licensing, and re-use of datasets. Through the power of technologies like RDF, SPARQL and Linked Data, Kasabi provides a rich set of options for organisations wishing to explore new business models around data, as well as empowering the community to remix and share data in many different ways.”
  • DataMarket – “DataMarket’s unique data portal provides access to thousands of data sets holding hundreds of millions of facts and figures from a wide range of public and private data providers including the United Nations, the World Bank, Eurostat and the Economist Intelligence Unit. The portal allows all this data to be searched, visualized, compared and downloaded in a single place in a standard, unified manner.”
  • Factual – “Factual, Inc. is an open data platform for application developers that leverages large scale aggregation and community exchange. For example, you will find datasets for millions of U.S. and International local businesses and points of interest, as well as datasets on entertainment, health, and education. Factual’s hosted data comes from our community of users, developers and partners, and from our powerful data mining tools. The result is a rich, constantly improving, transparent data ecosystem, made up of what we like to call ‘living’ data. We provide a suite of simple data APIs and tools for developers to build web and mobile applications. In some cases, developers who create applications with our data may even get paid for crowdsourced data from their users.”
  • Junar – “Junar is a community-based website, in which users share the data they extract from the web. With this ‘Wikipedia-collaborative’ approach, Junar’s users will benefit from having access to a huge and diverse catalog of data.”