Architecture of Big Data Warehouse Project

Cloud data warehouse solution: /solution/datavexpo/datawarehouse

Offline data warehouse architecture

Offline data warehouse characteristics

Serverless cloud data warehouse solution

System characteristics

Real-time data warehouse architecture

Characteristics of the real-time data warehouse architecture

Second-level latency, real-time construction of the data warehouse, a simple architecture, and a smooth upgrade path from a traditional data warehouse.

System characteristics

What are the input data sources and output systems of the data warehouse?

Input systems: user behavior data generated by front-end event tracking, business data generated by the JavaEE back end, and, at a few companies, crawler data.

Output systems: the reporting system, the user profile system, and the recommendation system.

1) Apache: operation and maintenance are troublesome, and you have to verify compatibility between components yourself. (Generally used by large companies with strong technical capability and dedicated operations staff.)

2) CDH: the most widely used distribution in China. CM is not open source, but this has no practical effect on small and medium-sized companies (recommended). CDP is priced at $10,000 per node.

3) HDP: open source and open to secondary development, but it is not as stable as CDH and is rarely used in China.

Does the server use a physical machine or a cloud host?

1) Machine cost considerations:

(1) Physical machine: 128 GB memory, 20 physical CPU cores (40 threads), 8 TB HDD plus 2 TB SSD, priced at about 40,000 yuan per machine (an HP machine in this example). A physical machine generally lasts about 5 years.

(2) Cloud host: taking Alibaba Cloud as an example, a similar configuration costs about 50,000 yuan per year.

2) Operation and maintenance cost considerations:

(1) Physical machine: requires dedicated operations staff (10,000 yuan per month * 13 months), electricity at commercial rates, and air-conditioning installation.

(2) Cloud host: Alibaba Cloud already handles much of the operations work, so maintenance is relatively easy.

3) Enterprise selection

(1) Well-funded financial companies, and companies that have no direct conflict with Alibaba, choose Alibaba Cloud (Shanghai).

(2) Small and medium-sized companies that are seeking financing or a listing choose Alibaba Cloud, then buy physical machines once the financing arrives.

(3) Companies with a long-term plan and sufficient funds choose physical machines.
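The cost figures above can be compared with a rough sketch. It assumes amounts are in yuan, that one operations engineer covers the whole cluster regardless of size, and that electricity and air conditioning are ignored; the function names are illustrative and the result is only as good as those assumptions.

```python
# Rough multi-year cost comparison using the figures quoted in the text.
# Assumptions (not from the source): one operations engineer serves the
# whole cluster, and electricity / air-conditioning costs are left out.

def physical_cost(machines: int, years: int,
                  price_per_machine: int = 40_000,
                  ops_monthly_salary: int = 10_000,
                  ops_months_per_year: int = 13) -> int:
    """One-time hardware purchase plus yearly operations salary."""
    hardware = machines * price_per_machine
    ops = ops_monthly_salary * ops_months_per_year * years
    return hardware + ops


def cloud_cost(machines: int, years: int,
               price_per_machine_per_year: int = 50_000) -> int:
    """Yearly rental for cloud hosts of a similar configuration."""
    return machines * price_per_machine_per_year * years


if __name__ == "__main__":
    # Compare over the roughly 5-year life of a physical machine.
    for n in (1, 10):
        print(n, physical_cost(n, 5), cloud_cost(n, 5))
```

Under these assumptions a single machine is cheaper in the cloud over 5 years (250,000 versus 690,000), while a 10-machine cluster is cheaper on physical hardware (1,050,000 versus 2,500,000), which matches the advice that well-funded companies with a long-term plan buy physical machines.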

The cluster size is planned according to the data volume.
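As a minimal sketch of sizing by data volume, the node count can be estimated from daily data volume, retention period, and replication. Every parameter here is a hypothetical input rather than a figure from this article, apart from the 8 TB HDD per machine quoted in the hardware configuration; the replication factor of 3 and the 70% usable-disk fraction are assumptions.

```python
import math


def estimate_nodes(daily_raw_gb: float,
                   retention_days: int,
                   replication: int = 3,           # assumed replica count
                   disk_per_node_tb: float = 8.0,  # HDD size from the hardware spec
                   usable_fraction: float = 0.7) -> int:  # assumed headroom
    """Estimate storage nodes needed to hold the retained data."""
    total_gb = daily_raw_gb * retention_days * replication
    usable_gb_per_node = disk_per_node_tb * 1024 * usable_fraction
    # Always provision at least one node, and round up partial nodes.
    return max(1, math.ceil(total_gb / usable_gb_per_node))


if __name__ == "__main__":
    # Hypothetical workload: 100 GB per day, kept for 180 days.
    print(estimate_nodes(100, 180))
```

This only accounts for storage. Layered warehouse tables, compression, and compute or memory requirements would change the result and are deliberately left out.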

The team sits under the R&D Department / Technology Department / Data Department, and we belong to the Big Data Group. Other groups include the back-end project group, front-end group, test group, and UI group. Other departments include product, operations, personnel, finance, and administration.

Big Data Development Engineer => Big Data Group Leader => Project Manager => Department Manager => Technical Director.

The ranks are divided into junior, intermediate, and senior. Promotion rules are not fixed; they depend on the company's performance and on job vacancies.

JD.com: T1 and T2 are fresh graduates; T3 is about 14K; T4 is about 18K; T5 is about 24K-28K.

Alibaba: P5, P6, P7, P8.

Small company (about 3 people): 1 team leader; the other members have no clear division of labor and may also cover JavaEE and the front end.

Small to medium-sized company (about 3-6 people): 1 team leader, about 2 people on offline, and about 1 person on real-time (offline generally has more people than real-time). The team leader also covers JavaEE and the front end.

Medium-sized company (about 5-10 people): 1 team leader, about 3 people on offline (offline processing, data warehouse), and about 2 people on real-time. The team leader and technical experts also cover JavaEE and the front end.

Medium to large company (about 10-20 people): 1 team leader, 5-10 people on offline (offline processing, data warehouse), about 5 people on real-time, 1 person on JavaEE (responsible for interfacing with the JavaEE business), and 1 person on the front end. (Medium and large companies that are doing well may split the big data department into several big data groups, each responsible for a different business.)

The above is only a reference configuration, because companies differ greatly. For example, the big data department of ofo has only about five people. Settle on a reasonable range based on the size of the company you choose. Think this staffing through carefully before the interview, and give the answer with certainty.

How many people work on iOS, Android, front end, JavaEE, and testing?

iOS and Android: 1-2 people. Front end: 1-3 people. JavaEE: generally 1-1.5 times the size of the big data team. Testing: some companies have testers and some do not; about 1 person. Product manager: 1. Product assistants: 1-2. Operations: 1-3.
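The 1-1.5x ratio for JavaEE can be turned into a quick sanity check for an interview answer. This is a small sketch; the function name is made up for illustration, and rounding the upper bound up to a whole person is an assumption.

```python
import math


def javaee_headcount_range(big_data_team_size: int) -> tuple[int, int]:
    """Return the (low, high) JavaEE headcount implied by the 1-1.5x rule."""
    low = big_data_team_size
    # Assumption: round the 1.5x upper bound up to a whole person.
    high = math.ceil(big_data_team_size * 1.5)
    return low, high


if __name__ == "__main__":
    # A 5-person big data group implies roughly 5 to 8 JavaEE developers.
    print(javaee_headcount_range(5))
```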

Company classification by size:

0-50 people: small company

50-500 people: medium-sized company

500-1000 people: large company

Over 1000 people: leading, top-tier company

From:/article/details/116003357