Questions and answers on big data on Stack Overflow

Stack Overflow is a question-and-answer site. It is free to use, and no registration is required.

Big Data is a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.

Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies, software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. What is considered "big data" varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain (from Wikipedia, the free encyclopedia).

Unable to start secondarynamenode, datanode, nodemanager while starting hadoop

Sun, 22 October 2017 11:50 +0000 GMT

Cassandra Data two dimensional data modelling

Sun, 22 October 2017 00:07 +0000 GMT

Run hadoop mapreduce program in python

Sat, 21 October 2017 20:32 +0000 GMT

Data base column insertion when the strings of the columns do not match fully

Sat, 21 October 2017 15:01 +0000 GMT

Pivoting 1,620 columns to rows in 360gb text file in aws

Sat, 21 October 2017 06:56 +0000 GMT

efficient algorithm for computing quantiles in terabytes dataset

Fri, 20 October 2017 18:51 +0000 GMT

data mining with unstructured data how to implement?

Fri, 20 October 2017 10:32 +0000 GMT

Mongoid pluck in batches

Fri, 20 October 2017 09:30 +0000 GMT

Big data and cloud computing [on hold]

Thu, 19 October 2017 23:20 +0000 GMT

Which Amazon Web Services *native* offering is closest to Apache Kudu?

Thu, 19 October 2017 19:00 +0000 GMT

Large-scale volume rendering and visualization libraries for terabyte-size data

Thu, 19 October 2017 14:39 +0000 GMT

reading a 25 GB nested json file with jsonlite in R

Thu, 19 October 2017 09:50 +0000 GMT

To continue legacy, can we implement Star Schema in Hive?

Thu, 19 October 2017 09:08 +0000 GMT

How to handle large amounts of data in tensorflow?

Wed, 18 October 2017 23:02 +0000 GMT

MongoDB Atlas alert: Query Targeting: Scanned Objects / Returned has gone above 1000

Wed, 18 October 2017 09:07 +0000 GMT

Is it possible to create a hive table with text output format?

Wed, 18 October 2017 07:03 +0000 GMT

java.lang.OutOfMemoryError in Spark Job for StringBuffer.append()

Wed, 18 October 2017 02:54 +0000 GMT

Database design for large amount of products

Tue, 17 October 2017 18:26 +0000 GMT

Storing 10^12 entries - grouped together

Tue, 17 October 2017 16:13 +0000 GMT

How to execute a plain java program using Spark-Submit?

Tue, 17 October 2017 14:43 +0000 GMT

Difference between apache Metron and apache Spot with machine learning being the differentiator?

Tue, 17 October 2017 08:40 +0000 GMT

AVRO as data structure and RPC

Tue, 17 October 2017 05:01 +0000 GMT

Create key/value-array pairs Scala/Spark

Mon, 16 October 2017 21:32 +0000 GMT

Add Machine Learning Services with Python and machine learning feature to existing SQL Server 2017

Mon, 16 October 2017 18:40 +0000 GMT

Graphframe error in Scala/Spark

Mon, 16 October 2017 08:00 +0000 GMT

Linux Slow SQLite performance with big data

Sun, 15 October 2017 22:02 +0000 GMT

Finding event sequences in event stream (real time processing)

Sun, 15 October 2017 20:43 +0000 GMT

How to save rdd action in textfile? Scala/Spark

Sun, 15 October 2017 17:35 +0000 GMT

Map-reduce algorithm

Sun, 15 October 2017 07:38 +0000 GMT

Data mining algorithm to find the relationship between two variables

Sun, 15 October 2017 04:53 +0000 GMT