Questions and answers on big data on Stack Overflow


Stack Overflow is a question and answer site. It is 100% free and requires no registration.


Big Data is a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.


Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies, software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. What is considered "big data" varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain (from Wikipedia, the free encyclopedia).


run kafka producer from my local gives me Exception in thread "main" kafka.common.FailedToSendMessageException: Failed to send messages after 3

Tue, 20 March 2018 20:55 +0000 GMT

Comparing a .TTL file to a CSV file and extract "similar" results into a new file

Tue, 20 March 2018 15:33 +0000 GMT

Fetch fixed rows with filters in Hbase using Java

Tue, 20 March 2018 14:34 +0000 GMT

Zookeeper data and Znode

Tue, 20 March 2018 14:19 +0000 GMT

Get terms in solr not working

Tue, 20 March 2018 06:28 +0000 GMT

Sqoop Error: Could not find or load main class org.apache.sqoop.Sqoop

Tue, 20 March 2018 05:31 +0000 GMT

Comparing one record to another in another data set in Pig

Tue, 20 March 2018 04:13 +0000 GMT

Real-time streaming from Oracle Database 11g to Spark

Mon, 19 March 2018 17:45 +0000 GMT

Spark MLlib Very large dataset

Mon, 19 March 2018 16:27 +0000 GMT

Data handling by TraMineR

Mon, 19 March 2018 14:44 +0000 GMT

Algorithm for grouping IP ranges and detect outliers

Mon, 19 March 2018 10:46 +0000 GMT

update/override existing data via streams from Kafka (Druid Kafka indexing service)

Mon, 19 March 2018 08:17 +0000 GMT

Unable to define the graph for Graphx Scala

Mon, 19 March 2018 03:48 +0000 GMT

Distinct on an array in scala returns an empty string

Sun, 18 March 2018 22:10 +0000 GMT

Big Data batch ingestion of unstructured data

Sun, 18 March 2018 18:15 +0000 GMT

What is the equivalent of the NUTS code for other continents than Europe

Sun, 18 March 2018 11:52 +0000 GMT

Average over 2000 values with PySpark Dataframe

Fri, 16 March 2018 15:51 +0000 GMT

Azure toolkit for intellij is not submitting jobs to Spark Cluster

Fri, 16 March 2018 13:36 +0000 GMT

How to parse tuple data from csv format in pyspark?

Thu, 15 March 2018 19:51 +0000 GMT

Big Data Development Roles and Responsibilities

Thu, 15 March 2018 17:43 +0000 GMT

Impala has his own execution engine or it works on MapR in Hadoop eco system?

Thu, 15 March 2018 16:46 +0000 GMT

how to use big data in page javascript [closed]

Thu, 15 March 2018 08:43 +0000 GMT

Run Spark code written in Scala in spark cluster

Thu, 15 March 2018 07:15 +0000 GMT

How to split day, hour, minute and second data in a huge Pandas data frame?

Thu, 15 March 2018 01:02 +0000 GMT

How can I get Blockchain data? [on hold]

Wed, 14 March 2018 17:01 +0000 GMT

Split pandas dataframe based on proximity of the index

Wed, 14 March 2018 13:51 +0000 GMT

python to pyspark, converting the pivot in pyspark

Wed, 14 March 2018 13:00 +0000 GMT

Resource usage by Spark Receivers

Wed, 14 March 2018 11:17 +0000 GMT

how to make an android app with back end function in python and database is mongodb [closed]

Wed, 14 March 2018 09:25 +0000 GMT

Difficulty in interaction between two livy sessions

Wed, 14 March 2018 07:03 +0000 GMT