How to understand the difference between traditional data and big data?

In view of the opportunities and challenges that big data brings to education, this article discusses the differences between big data and traditional data and the state of the field.

First, the difference between big data and traditional data.

Like all new things, big data is a concept that has not yet been clearly defined. It is so young that even the most fashionable universities have not had time to offer it as a major, and the most fashionable experts have not unified their theories. Everyone who studies it is still feeling out how big data differs from traditional data.

Google Scholar returns as many as 355,000 academic works for the search term "big data" over the last five years. However, when the keywords "big data" and "education" are entered together, fuzzy matching returns only about 17,600 results and exact matching fewer than 10. Big data mining in education is evidently still uncultivated ground, and no pioneer has yet set the rules. Traditional data is a different story. A single PISA exam can produce more than 300 doctoral theses worldwide. About 5,000 master's and doctoral students in education and psychometrics are trained around the world every year. There are as many as 489 SSCI core journals related to education and psychological statistics. More than 160 professional organizations provide data analysis for organizations with more than 4,000 employees, such as IES and ETS, and industry standards such as WWC have taken shape. The traditional data field, in other words, is well established.

The volume of data and the means of analysis are bound to be upgraded wholesale. In traditional data, the quantitative data available for analysis that a student generates over nine years of compulsory education basically does not exceed 10 KB. It includes basic personal and family information, information about schools and teachers, examination scores in each subject, physiological data, usage records of libraries and gymnasiums, medical and insurance information, and other kinds of evaluation data. With data of this size, a reasonably well-configured home computer running basic Excel or SPSS can handle statistical analysis for up to 5,000 students. A dual-core machine with software such as Access and SurveyCraft is enough to complete advanced statistical work for an entire region. This kind of work generally requires only intermediate knowledge of educational and psychological statistics, a set of data analysis templates to follow step by step, and two or three months of operational training.

The analysis of big data is technology on an entirely different level. According to research by Classroom Observer, a well-known American developer of classroom observation software, the holographic data generated by one student in an ordinary 40-minute middle school class is about 5-6 GB, of which about 50-60 MB is quantifiable data that can be classified, labeled and analyzed. By the figures above, that is roughly 5,000 times the traditional data the same student accumulates over nine years of schooling. Processing these data requires cloud computing, and software such as Matlab, Mathematica and Maple is needed to process and visualize them. The professionals who can handle these data generally come from mathematics or computer engineering and need strong specialist knowledge and training. What is more, there is no fixed method for big data mining, and much depends on the talent and inspiration of the person doing the mining.
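The comparison of volumes can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (at most 10 KB of traditional data per nine years, 50-60 MB of quantifiable data per class); the conversion of 1 MB = 1024 KB and all variable names are assumptions made for illustration.

```python
# Figures quoted in the text. The unit conversion is an assumption.
KB_PER_MB = 1024

TRADITIONAL_KB = 10                    # upper bound for nine years of schooling
CLASS_MB_LOW, CLASS_MB_HIGH = 50, 60   # quantifiable data from one 40-minute class


def ratio(class_mb: float, traditional_kb: float = TRADITIONAL_KB) -> float:
    """Return how many nine-year traditional records equal one class's data."""
    return class_mb * KB_PER_MB / traditional_kb


low = ratio(CLASS_MB_LOW)
high = ratio(CLASS_MB_HIGH)
print(f"One class is about {low:,.0f} to {high:,.0f} times the nine-year record")
```

On these assumptions a single class produces several thousand times the quantitative data of a student's entire traditional record.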

The essential difference between big data and traditional data lies in where the data come from and what they are used for. Traditional data is best at describing the group level: students' overall academic level, physical development and fitness, social-emotional and adaptive development, satisfaction with the school, and so on. Such data cannot and need not be collected in real time; they are obtained through periodic, staged evaluation. Traditional data reflect the dependent variables of education: how well students have learned their subjects, their physical and mental health, and their subjective feelings about the school. These data are collected entirely with students' knowledge, mainly through exams or scale surveys, which makes the collection deliberate and oppressive and puts students under considerable pressure.

Big data, by contrast, can attend to the micro-level performance of each individual student: when he opens the book, what makes him smile and nod, how long he stays on a question, how often his attention wanders in different subjects, and how many classmates he actively talks to. Such data mean little for any other individual; they embody highly personalized characteristics. The data are also entirely process-oriented: they arise in class, in homework, and in teacher-student interaction, in actions and phenomena that occur moment by moment. Integrated, these data can explain the independent variables in the micro-level reform of education: how should the classroom change to suit students' psychological characteristics? Does the course attract students? What kind of teacher-student interaction is well received? Most valuable of all, these data are observed and collected entirely without students' awareness. They require only certain observation techniques and equipment and do not interfere with students' daily study and life, so the collection is natural and the data authentic.

Therefore, based on the above viewpoints, we can easily find that in the field of education, traditional data and big data show the following differences:

1. Traditional data explain the macro, overall state of education and are used to inform education policy decisions; big data can analyze individual students and classrooms at the micro level and can be used to adjust teaching behavior and deliver personalized education.

2. For traditional data, the mining methods, collection methods, content classification and acceptance criteria all have established rules and a complete methodology; big data mining is new, and there is currently no clear method, path or evaluation standard.

3. Traditional data come from staged, targeted evaluation, and their sampling process may contain systematic errors; big data come from records of process, real-time behavior and phenomena, and because observation and sampling are done by third parties and by technical means, the error is small.

4. The talent, professional skills, facilities and equipment needed for traditional data analysis are relatively common and easy to obtain; big data mining demands a high level of talent, skill and equipment, and practitioners need creativity and a feel for data rather than a step-by-step procedure. Such people are very scarce.

Second, the hidden education crisis in the era of big data

"We have to admit that we know too little about students": this is a confession in the research introduction of the Carnegie Mellon University School of Education, and it is also the most frequent core topic at the top ten annual education conferences in the United States. This lack of knowledge about students had no negative impact over the hundreds or even thousands of years of educational history before the 21st century, but it has become a fatal weakness in the development of education in the decade since the information technology revolution.

"In the past, it was indisputably important for students to go to school to learn, because people had too few channels for acquiring knowledge, and they could not acquire systematic knowledge without school," said Arnetha Ball, a professor at Stanford University, in a keynote speech at the AERA conference. "However, the spread of the Internet has brought the school down from the altar." Ball's worry is not unfounded. According to census data published online by Kids Count, in 2012 the number of home-schooled students aged 5-17 in the United States reached 1.97 million, a considerable proportion when set against a birth population that is falling year by year.

At the same time, more and more well-crafted online courses have emerged, and Khan Academy, which was established in 2009 and quickly became popular all over the world, is one of the outstanding examples. From the open courses of famous universities to Khan Academy, the popularity of this online learning model shows that people's enthusiasm for learning has not faded, but that they are extremely eager to bid farewell to the traditional academic teaching model. The unchanging and even "arrogant" traditional model of collective teaching is increasingly unable to meet the needs of a more and more diverse and individual student population.

The Khan Academy model not only lets students choose content they are interested in but also lets them jump quickly to a suitable level of difficulty, which improves learning efficiency. Learners are under no pressure, and they control the duration, timing, setting and number of reviews themselves.

It is conceivable that if the Khan Academy model is developed further and connected to a computerized adaptive testing (CAT) evaluation system, so that users can track their learning progress through self-evaluation and be given precisely matched learning materials, then a "closed loop" of Internet products will be formed, and its advantages and strength will be disruptive.
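To make the "closed loop" concrete, here is a minimal sketch of adaptive difficulty selection. It is not how Khan Academy or any real CAT system works: production adaptive tests typically estimate ability with item response theory, whereas this example swaps in a simple staircase rule (one level up after a correct answer, one level down after a wrong one). The function names, the 1-10 difficulty scale and the starting level are all hypothetical.

```python
def next_difficulty(current: int, correct: bool,
                    lowest: int = 1, highest: int = 10) -> int:
    """Staircase rule: move up one level if correct, down one if not.

    The result is clamped to the assumed [lowest, highest] scale.
    """
    step = 1 if correct else -1
    return max(lowest, min(highest, current + step))


def run_session(answers: list[bool], start: int = 5) -> list[int]:
    """Return the difficulty level presented at each step of a session."""
    levels = [start]
    for correct in answers:
        levels.append(next_difficulty(levels[-1], correct))
    return levels


# A learner who answers right, right, wrong, right:
print(run_session([True, True, False, True]))
```

Each answer feeds back into the choice of the next item, which is the loop the paragraph describes: evaluation drives material selection, and the material produces the next evaluation.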

However, if the curriculum model of traditional education is not renewed, the form of the classroom is not thoroughly transformed, and the role and mindset of teachers do not change, then schools will matter only to students who lack modern learning resources. For students who can independently obtain more suitable learning resources, going to school may be nothing more than fulfilling an obligation attached to a social role, with no real necessity, let alone enjoyment or interest.

Big data research can help educational researchers re-examine students' needs and, through sophisticated and fine-grained analysis, find out what kinds of courses, classrooms and teachers attract students. The problem is that social development has not given educational researchers a generous time window, because too many others are also using big data mining to carve up students' limited energy and attention. To some extent, they are far more motivated and committed than educational researchers.

Game designers are first in line, since teenagers are their main consumer group. Leaving aside world-renowned giants such as Blizzard Entertainment, Electronic Arts of the United States and Nintendo of Japan, even domestic game companies such as Shanda Network, Ninth City, Giant Technology and Taomi.com have already set up professional and powerful "user experience" research teams. Using micro-behaviors such as eye tracking, heart rate tracking, blood pressure tracking, and keyboard and mouse operation rates, they study how to make players spend more time in the game and become more willing to spend real-world money on virtual items. When should the enemy appear, what level should the enemy be, and how much energy should the hero need to defeat it? These variables are strictly designed and controlled for only one reason: big data tells game creators that this design is the one most likely to keep players playing.

Next come film and television, youth novels and the other linked cultural industries. Why can't you stop watching one video after another on a website? Because the site works out from your account's browsing history what kinds of videos you like to watch and what kinds of songs you like to listen to, and then caters to those tastes. Best-selling online novels may not seem "nutritious", but their choice of words and sentences, the number of words per paragraph, the rise and fall of the story and even the personality types of the protagonists are all backed by research. Readers often do not like well-structured, carefully designed plots, which is why Korean dramas with near-identical plots are so sought after. Through repeated research on ratings, producers have found the elements the audience wants most, and the formula works every time.

In addition, there are many even more powerful researchers, such as e-commerce companies, which can always use data to find the goods you might be willing to buy. They even know that a father who buys diapers is more likely to buy beer.
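The diapers-and-beer observation is the classic illustration of association analysis over shopping baskets. The sketch below computes the three standard measures (support, confidence and lift) on a handful of invented baskets; the data are made up purely for illustration and do not come from any retailer, and the function names are hypothetical.

```python
# Invented toy baskets, for illustration only.
baskets = [
    {"diapers", "beer", "wipes"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(items: set, baskets: list) -> float:
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent: set, consequent: set, baskets: list) -> float:
    """Among baskets with the antecedent, the share that also hold the consequent.

    Assumes the antecedent appears in at least one basket.
    """
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


def lift(antecedent: set, consequent: set, baskets: list) -> float:
    """Confidence relative to the consequent's baseline frequency (>1: positive link)."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


print(round(confidence({"diapers"}, {"beer"}, baskets), 3))
print(round(lift({"diapers"}, {"beer"}, baskets), 3))
```

A lift above 1 means buyers of the first item pick up the second more often than shoppers in general, which is the kind of signal the paragraph refers to.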

These fields seem to have nothing to do with us as educators, but they are inextricably linked with the object of our greatest concern: students. A hundred years ago, or even a few decades ago, students did not face so many temptations. School occupied a large part of their lives and had the greatest influence on them, so educators were always confident of their hold over students. However, now that different social organizations and products have begun to compete for students' attention, that confidence can only be regarded as arrogance that fails to read the situation, because in this "battle for students" traditional schools do seem uncompetitive.

Even if educational researchers are willing to swallow their pride and carefully study students' needs and personalities with the help of big data, the shortage of talent remains a serious handicap. Next to the pursuit of effective research in the business world, educational research looks slow and insubstantial. When Internet companies, encouraged by venture capital, create the title of "chief data officer" and extend an olive branch to data science fanatics of every kind, the frontier of big data research is bound to remain the Internet industry's fiercest battleground.

A clear attitude once the situation has been analyzed, together with the scale and intensity of investment, may be the two prerequisites that most need to be considered before entering the field of big data research in education.

Third, who is cheering for big data: enlightenment to the study of "human nature"

Observe, record and mine massive data tirelessly, and one day an equation, simple or complex, will emerge and earn its discoverer a place on the monument of natural science. For hundreds of years this reverence for data has been the creed of physicists, chemists, biologists, astronomers and geographers. The achievements of generations of masters such as Newton, Bayes and Schrödinger likewise show how important data is to scientific discovery.

Research in the social sciences looks bleak by comparison. Social scientists attach equal importance to data, pursue the "procedural justice" of statistics and analysis, design experiments and studies diligently, pursue thousands of topics, and nest equations elegantly, yet few of their results are universally accepted, whether in sociology, psychology, economics, management or education.

Of course, the difficulties that social science researchers face are obvious: "human nature" is different from "physical attributes". The material world is relatively stable, and its laws are easy to find; a society composed of people is extremely changeable and difficult to generalize about. In terms of data, human data is not as reliable as physical data:

First of all, people do not respond as truthfully as objects do: who knows how many questionnaires have been filled out by people who were inattentive, had poor language skills, or had no intention of telling the truth at all? In addition, the differences between people are greater than those between things: two substances with the same chemical composition show almost the same properties, but even twins with identical genes will behave completely differently because of their different life experiences.

But these are not the crux. The most important point is that people cannot be studied repeatedly. A person is not Newton's block, Galileo's iron ball, or Pavlov's German shepherd. People cannot be slid down a slope again and again, dropped from the top of the Tower of Pisa again and again, or left drooling again and again waiting for the bell that brings meat. And we know that among the three standards of "science", "repeatable verification" comes first.

In other words, the data we can obtain about "human nature" are not large enough, not plentiful enough, and not available anytime and anywhere, so we cannot glimpse human nature through data. The award of the 2002 Nobel Prize in Economics to the psychologist Daniel Kahneman seemed to show that the social sciences had accepted that human behavior is unpredictable and difficult to measure by scientific methods. Social science began to doubt whether purely rational methods could explain the phenomena of "human nature". The 2012 American election stands in contrast. Relying on precise screening of online data, the Obama team won over a large number of "grassroots" voters and gained their trust by analyzing and addressing their preferences and needs, winning decisively even though traditional polls and historical patterns did not favor them. These two landmark events, ten years apart, dramatically changed people's understanding of whether data can reveal human nature.

Today the rapidly spreading Internet and mobile Internet quietly provide the most convenient and lasting medium for recording human behavior. Mobile phones, iPads and other devices that people keep close at hand are constantly recording every small thought, decision and action. Most importantly, in front of these powerful data collection terminals, people have no intention of hiding. They present their experiences in full. They tirelessly repeat, over and over, behaviors they would be unwilling to display in an experimental setting, and in doing so they create massive data that traditional data research could never obtain. The difficulties that many traditional research paradigms have struggled with dissolve the moment big data arrives.

The arrival of big data, through the development of cutting-edge technology, makes it possible for all social science fields to move from macro groups to micro individuals, makes it possible to track everyone's data, and makes it possible to study "human nature". For educational researchers, we are closer to discovering real students than ever before.