This detailed step-by-step guide shows you how to install the latest Hadoop (v3.2.1) on Windows 10. It's based on the previous articles I published, with some updates to reflect the feedback collected from readers and make it easier for everyone to install.

Please follow all the instructions carefully. Once you complete the steps, you will have a shiny pseudo-distributed single-node Hadoop to work with.

*The yellow elephant logo is a registered trademark of Apache Hadoop; the blue window logo is a registered trademark of Microsoft.

References

Refer to the following articles if you prefer to install other versions of Hadoop, or if you want to configure a multi-node cluster or use WSL.

Required tools

Before you start, make sure you have the following tools enabled in Windows 10.

Tool: PowerShell
Comments: We will use this tool to download the package.

In my system, the PowerShell version table is listed below:

$PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.1.18362.145
PSEdition                      Desktop
PSCompatibleVersions           1.0, 2.0, 3.0, 4.0...
BuildVersion                   10.0.18362.145
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

Tool: Git Bash or 7 Zip
Comments: We will use Git Bash or 7 Zip to unzip the Hadoop binary package. You can choose to install either tool, or any other tool, as long as it can unzip *.tar.gz files on Windows.

Tool: Command Prompt
Comments: We will use it to start Hadoop daemons and run some commands as part of the installation process.

Tool: Java JDK
Comments: JDK is required to run Hadoop as the framework is built using Java. In my system, my JDK version is jdk1.8.0_161. Check out the supported JDK versions on the following page:

https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions


Now we will start the installation process.

Step 1 - Download Hadoop binary package

Select download mirror link

Apache Download Mirrors - Hadoop 3.2.1

And then choose one of the mirror links. The page lists the mirrors closest to you based on your location. For me, I am choosing the following mirror link:

http://apache.mirror.digitalpacific.com.au/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz


Info: In the following sections, this URL will be used to download the package. Your URL might be different from mine, and you can replace the link accordingly.

Download the package


Info: In this guide, I am installing Hadoop in the folder big-data of my F drive (F:\big-data). If you prefer to install on another drive, please remember to change the path accordingly in the following command lines. This directory is also called the destination directory in the following sections.

Open PowerShell and then run the following command lines one by one:

$dest_dir="F:\big-data"
$url = "http://apache.mirror.digitalpacific.com.au/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz"
$client = new-object System.Net.WebClient
$client.DownloadFile($url,$dest_dir+"\hadoop-3.2.1.tar.gz")

It may take a few minutes to download.

Once the download completes, verify that hadoop-3.2.1.tar.gz is present in the destination directory.

Warning: Please keep this PowerShell window open, as we will use some variables from this session in the following steps. If you have already closed it, that is okay; just remember to reinitialise the above variables: $client, $dest_dir.
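
Beyond checking that the file exists, one option is to compare its SHA-512 digest with the checksum published on the official Apache download page. The following is a minimal Python sketch, assuming Python 3 is available (this guide does not otherwise require it). The helper names file_sha512 and matches_checksum are illustrative, and the expected digest must be copied from the official page.

```python
import hashlib


def file_sha512(path, chunk_size=1024 * 1024):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read in chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare the file digest with a published checksum.

    Whitespace and letter case in `expected_hex` are ignored, because
    published checksum files are often wrapped or upper-cased.
    """
    normalised = "".join(expected_hex.split()).lower()
    return file_sha512(path) == normalised
```

For example, matches_checksum(r"F:\big-data\hadoop-3.2.1.tar.gz", expected) returns True only when the downloaded archive matches the published digest held in the (hypothetical) variable expected.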

Step 2 - Unpack the package

Now we need to unpack the downloaded package using a GUI tool (like 7 Zip) or the command line. For me, I will use Git Bash to unpack it.

Open Git Bash and change the directory to the destination folder:

cd F:/big-data

And then run the following command to unzip:

tar -xvzf hadoop-3.2.1.tar.gz

The command will take quite a few minutes, as there are numerous files included and the latest version introduced many new features.

After the unzip command is completed, a new folder hadoop-3.2.1 is created under the destination folder.

Info: When running the command you will experience errors like the following:

tar: hadoop-3.2.1/lib/native/libhadoop.so: Cannot create symlink to ‘libhadoop.so.1.0.0’: No such file or directory

Please ignore them for now, as those native libraries are for Linux/UNIX and we will create Windows native IO libraries in the following steps.
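
To see which archive entries trigger these messages, you can list the link entries before extracting. This is a minimal sketch using Python's standard tarfile module, assuming Python 3 is available; link_members is a hypothetical helper name, not part of Hadoop or Git Bash.

```python
import tarfile


def link_members(archive_path):
    """Return the names of symbolic-link and hard-link entries in a tar archive."""
    # "r:*" lets tarfile detect the compression (gzip for *.tar.gz) itself.
    with tarfile.open(archive_path, "r:*") as archive:
        return [m.name for m in archive.getmembers() if m.issym() or m.islnk()]
```

Calling link_members("hadoop-3.2.1.tar.gz") from the destination folder lists the entries that tar could not create as symlinks on Windows.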

Step 3 - Install Hadoop native IO binary

Hadoop on Linux includes optional Native IO support. However, Native IO is mandatory on Windows, and without it you will not be able to get your installation working. The Windows native IO libraries are not included as part of the Apache Hadoop release. Thus we need to build and install them.

I also published another article with very detailed steps about how to compile and build native Hadoop on Windows: Compile and Build Hadoop 3.2.1 on Windows 10 Guide.

The build may take about one hour, so to save time we can just download the binary package from GitHub.


Info: The following repository has already pre-built the Hadoop Windows native libraries for us: https://github.com/cdarlint/winutils
Warning: These libraries are not signed and there is no guarantee that they are 100% safe. We use them purely for test and learning purposes.

Download all the files in the following location and save them to the bin folder under the Hadoop folder. For my environment, the full path is F:\big-data\hadoop-3.2.1\bin. Remember to change it to your own path accordingly.

https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.1/bin

Alternatively, you can run the following commands in the previous PowerShell window to download:

$client.DownloadFile("https://github.com/cdarlint/winutils/raw/master/hadoop-3.2.1/bin/hadoop.dll",$dest_dir+"\hadoop-3.2.1\bin\"+"hadoop.dll")
$client.DownloadFile("https://github.com/cdarlint/winutils/raw/master/hadoop-3.2.1/bin/hadoop.exp",$dest_dir+"\hadoop-3.2.1\bin\"+"hadoop.exp")
$client.DownloadFile("https://github.com/cdarlint/winutils/raw/master/hadoop-3.2.1/bin/hadoop.lib",$dest_dir+"\hadoop-3.2.1\bin\"+"hadoop.lib")
$client.DownloadFile("https://github.com/cdarlint/winutils/raw/master/hadoop-3.2.1/bin/hadoop.pdb",$dest_dir+"\hadoop-3.2.1\bin\"+"hadoop.pdb")
$client.DownloadFile("https://github.com/cdarlint/winutils/raw/master/hadoop-3.2.1/bin/libwinutils.lib",$dest_dir+"\hadoop-3.2.1\bin\"+"libwinutils.lib")
$client.DownloadFile("https://github.com/cdarlint/winutils/raw/master/hadoop-3.2.1/bin/winutils.exe",$dest_dir+"\hadoop-3.2.1\bin\"+"winutils.exe")
$client.DownloadFile("https://github.com/cdarlint/winutils/raw/master/hadoop-3.2.1/bin/winutils.pdb",$dest_dir+"\hadoop-3.2.1\bin\"+"winutils.pdb")

After this, the bin folder should contain the seven downloaded files.
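
A quick way to confirm that none of the downloads was skipped is to check the bin folder for each expected file. The sketch below assumes Python 3 is available; the file list is taken from the download commands in this step, while the helper name missing_native_files is made up for illustration.

```python
from pathlib import Path

# File names taken from the download commands in this step.
NATIVE_FILES = [
    "hadoop.dll",
    "hadoop.exp",
    "hadoop.lib",
    "hadoop.pdb",
    "libwinutils.lib",
    "winutils.exe",
    "winutils.pdb",
]


def missing_native_files(bin_dir):
    """Return the expected native IO files that are absent from `bin_dir`."""
    folder = Path(bin_dir)
    return [name for name in NATIVE_FILES if not (folder / name).is_file()]
```

An empty list from missing_native_files(r"F:\big-data\hadoop-3.2.1\bin") means all seven files are in place; adjust the path to your own destination directory.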

Step 4 - (Optional) Java JDK installation

Java JDK is required to run Hadoop. If you have not installed the Java JDK, please install it.

You can install JDK 8 from the following page:

https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Once you complete the installation, please run the following command in PowerShell or Git Bash to verify:

$ java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

If you get an error such as "cannot find java command or executable", don't worry; we will resolve this in the following step.
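
If you want to check the installed version in a script rather than by eye, the version string can be parsed. The following is a small sketch, assuming Python 3 is available; java_major_version is a hypothetical helper, and which major versions are acceptable should be taken from the supported-versions page linked under Required tools.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Releases up to Java 8 use the legacy "1.x" scheme (1.8.0_161 means Java 8).
    if first == "1" and second is not None:
        return int(second)
    return int(first)
```

Note that java -version writes to standard error, so when capturing it (for example with subprocess.run(["java", "-version"], capture_output=True, text=True)) the text to parse is in the stderr attribute.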

Step 5 - Configure environment variables

Now that we've downloaded and unpacked all the artefacts, we need to configure two important environment variables.

Configure JAVA_HOME environment variable

As mentioned earlier, Hadoop requires Java, so we need to configure the JAVA_HOME environment variable (it is not mandatory, but I recommend it).

First, we need to find out the location of the Java SDK. In my system, the path is D:\Java\jdk1.8.0_161.

Your location can be different depending on where you installed your JDK.

And then run the following command in the previous PowerShell window:

SETX JAVA_HOME "D:\Java\jdk1.8.0_161"

Remember to quote the path, especially if you have spaces in your JDK path.


Info: You can set the environment variable at system level by adding the option /M; however, in case you don't have access to change system variables, you can just set it up at user level.

The command should report that the specified value was saved.

Configure HADOOP_HOME environment variable

Similarly, we need to create a new environment variable for HADOOP_HOME using the following command. The path should be your extracted Hadoop folder. For my environment it is F:\big-data\hadoop-3.2.1.

If you used PowerShell to download and if the window is still open, you can simply run the following command (the parentheses make PowerShell join the two strings before passing the result to SETX):

SETX HADOOP_HOME ($dest_dir+"\hadoop-3.2.1")

The command should report that the specified value was saved.

Alternatively, you can specify the full path:

SETX HADOOP_HOME "F:\big-data\hadoop-3.2.1"

Now you can also verify the two environment variables in the system.

Configure PATH environment variable

Once we finish setting up the above two environment variables, we need to add the bin folders to the PATH environment variable.

If the PATH environment variable exists in your system, you can also manually add the following two paths to it:

%JAVA_HOME%/bin
%HADOOP_HOME%/bin

Alternatively, you can run the following command to add them. SETX does not update the session it runs in, so run it in a new PowerShell window where $env:JAVA_HOME and $env:HADOOP_HOME are already populated:

setx PATH "$env:PATH;$env:JAVA_HOME/bin;$env:HADOOP_HOME/bin"

If you don't have other user variables set up in the system, you can also directly add a Path environment variable that references the others to make it short: %JAVA_HOME%/bin;%HADOOP_HOME%/bin
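
Re-running the PATH command appends the same folders again, and SETX truncates values longer than 1024 characters, so it can be worth building the new value without duplicates first. This is a minimal sketch of that idea, assuming Python 3 is available; append_path_entries is a hypothetical helper that only computes the string and does not change any environment variable.

```python
def append_path_entries(current_path, new_entries):
    """Append entries to a ';'-separated PATH string, skipping duplicates."""
    parts = [p for p in current_path.split(";") if p]
    # Windows paths are case-insensitive, and a trailing slash does not matter.
    seen = {p.rstrip("\\/").lower() for p in parts}
    for entry in new_entries:
        key = entry.rstrip("\\/").lower()
        if key not in seen:
            parts.append(entry)
            seen.add(key)
    return ";".join(parts)
```

The result can be inspected (including its length) before it is passed to SETX or pasted into the environment variable dialog.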

Close the PowerShell window, open a new one, and type winutils.exe directly to verify that the above steps completed successfully. If PATH is set correctly, the command is found and runs instead of being reported as not recognised.

You should also be able to run the following command:

hadoop -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

Step 6 - Configure Hadoop

Now we are ready to configure the most important part: the Hadoop configurations, which involve Core, YARN, MapReduce and HDFS configurations.

Configure core site

Edit the file core-site.xml in the %HADOOP_HOME%\etc\hadoop folder. For my environment, the actual path is F:\big-data\hadoop-3.2.1\etc\hadoop.

Replace the configuration element with the following:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>

Configure HDFS

Edit the file hdfs-site.xml in the %HADOOP_HOME%\etc\hadoop folder.

Before editing, please create two folders in your system: one for the namenode directory and another for the data directory. For my system, I created the following two sub folders:

F:\big-data\data\dfs\namespace_logs
F:\big-data\data\dfs\data

Replace the configuration element with the following (remember to replace the paths accordingly):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///F:/big-data/data/dfs/namespace_logs</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///F:/big-data/data/dfs/data</value>
  </property>
</configuration>

In Hadoop 3, the property names are slightly different from previous versions. Refer to the following official documentation to learn more about the configuration properties:

Hadoop 3.2.1 hdfs-default.xml


Info: For DFS replication we set the value to 1 because we are configuring just a single node. By default the value is 3.
Info: The directory configurations are not mandatory; by default Hadoop uses its temporary folder. For the purpose of this tutorial, I would recommend customising the values.
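
The directory values in hdfs-site.xml are file:/// URIs with forward slashes, not plain Windows paths. As an illustration of the mapping, here is a small sketch, assuming Python 3 is available; to_file_uri is a hypothetical helper name.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def to_file_uri(windows_path):
    """Convert an absolute Windows path (with a drive letter) into a file:/// URI."""
    path = PureWindowsPath(windows_path)
    if not path.is_absolute():
        raise ValueError(f"not an absolute Windows path: {windows_path}")
    # as_posix() swaps backslashes for forward slashes; quote() escapes
    # characters such as spaces while keeping '/' and the drive colon.
    return "file:///" + quote(path.as_posix(), safe="/:")
```

For the folders used in this guide, to_file_uri(r"F:\big-data\data\dfs\data") gives file:///F:/big-data/data/dfs/data, which is the form used in the configuration.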

Configure MapReduce and YARN site

Edit the file mapred-site.xml in the %HADOOP_HOME%\etc\hadoop folder.

Replace the configuration element with the following:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>%HADOOP_HOME%/share/hadoop/mapreduce/*,%HADOOP_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_HOME%/share/hadoop/common/*,%HADOOP_HOME%/share/hadoop/common/lib/*,%HADOOP_HOME%/share/hadoop/yarn/*,%HADOOP_HOME%/share/hadoop/yarn/lib/*,%HADOOP_HOME%/share/hadoop/hdfs/*,%HADOOP_HOME%/share/hadoop/hdfs/lib/*</value>
  </property>
</configuration>
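
The mapreduce.application.classpath value follows a regular pattern: for each of four folders under share/hadoop, it lists the folder itself and its lib subfolder. The sketch below, assuming Python 3 is available, rebuilds the same string so the pattern is easier to see and to retype without errors; mapreduce_classpath is a hypothetical helper, not a Hadoop tool.

```python
def mapreduce_classpath(components=("mapreduce", "common", "yarn", "hdfs"),
                        home="%HADOOP_HOME%"):
    """Build the comma-separated classpath value used in mapred-site.xml."""
    entries = []
    for name in components:
        base = f"{home}/share/hadoop/{name}"
        entries.append(f"{base}/*")      # JARs shipped in the component folder
        entries.append(f"{base}/lib/*")  # third-party JARs in its lib folder
    return ",".join(entries)
```

Printing mapreduce_classpath() and comparing it with the value in your mapred-site.xml is a quick way to spot a missing comma or folder.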

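Edit the file yarn-site.xml in the %HADOOP_HOME%\etc\hadoop folder.

Replace the configuration element with the following:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>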
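
After editing the four files by hand, it can help to check that each one is still well-formed XML and contains the properties you expect before starting any daemon. This is a minimal sketch using Python's standard xml.etree.ElementTree module, assuming Python 3 is available; read_hadoop_properties is a hypothetical helper and performs no Hadoop-specific validation beyond the configuration/property/name/value layout shown in this step.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dictionary."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    if root.tag != "configuration":
        raise ValueError("root element must be <configuration>")
    properties = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            raise ValueError("<property> without a <name> element")
        properties[name.strip()] = (prop.findtext("value") or "").strip()
    return properties
```

Reading a file's text and passing it to read_hadoop_properties lets you print the resulting dictionary and confirm, for example, that dfs.replication is present in hdfs-site.xml.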

Step 7 - Initialise HDFS and bug fix

Run the following command in Command Prompt:

hdfs namenode -format

With Hadoop 3.2.1 on Windows this command fails with an error, and we need to fix it before continuing.

Once this is fixed, the format command (hdfs namenode -format) completes successfully.

About 3.2.1 HDFS bug on Windows

This is a bug with the 3.2.1 release:

https://issues.apache.org/jira/browse/HDFS-14890

It will be resolved in versions 3.2.2 and 3.3.0.

We can apply a temporary fix; the code change for HDFS-14890 is shown in the steps that follow.

I've done the following to get this temporarily fixed before 3.2.2/3.3.0 is released:

1. Check out the source code of the Hadoop project from GitHub.
2. Check out branch 3.2.1.
3. Open the pom file of the hadoop-hdfs project.
4. Update the class StorageDirectory with the following change:

if (permission != null) {
  try {
    Set<PosixFilePermission> permissions =
        PosixFilePermissions.fromString(permission.toString());
    Files.setPosixFilePermissions(curDir.toPath(), permissions);
  } catch (UnsupportedOperationException uoe) {
    // Default to FileUtil for non posix file systems
    FileUtil.setPermission(curDir, permission);
  }
}

5. Use Maven to rebuild this project.

I've uploaded the JAR file to the following location. Please download it from the following link:

https://github.com/FahaoTang/big-data/blob/master/hadoop-hdfs-3.2.1.jar

And then rename the file hadoop-hdfs-3.2.1.jar to hadoop-hdfs-3.2.1.bk in the folder %HADOOP_HOME%\share\hadoop\hdfs.

Copy the downloaded hadoop-hdfs-3.2.1.jar to the folder %HADOOP_HOME%\share\hadoop\hdfs.
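
The rename and copy can also be done in one scripted step, which makes it harder to overwrite the original JAR by accident. The following is a sketch only, assuming Python 3 is available; replace_jar is a hypothetical helper, and the paths you pass in are your own.

```python
import shutil
from pathlib import Path


def replace_jar(hdfs_dir, patched_jar, jar_name="hadoop-hdfs-3.2.1.jar"):
    """Back up the shipped JAR as *.bk, then copy the patched JAR in its place."""
    target = Path(hdfs_dir) / jar_name
    backup = target.with_suffix(".bk")  # hadoop-hdfs-3.2.1.bk
    if backup.exists():
        # Refuse to run twice, so the original backup is never overwritten.
        raise FileExistsError(f"backup already exists: {backup}")
    target.rename(backup)
    shutil.copy2(patched_jar, target)
    return backup
```

To undo the change later, delete the patched JAR and rename the .bk file back to hadoop-hdfs-3.2.1.jar.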


Warning: This is just a temporary fix until the official improvement is published. I publish it purely so that we can complete the whole installation process, and there is no guarantee that this temporary fix won't cause any new issue.

Refer to this article for more details about how to build a native Windows Hadoop: Compile and Build Hadoop 3.2.1 on Windows 10 Guide.

Step 8 - Start HDFS daemons

Run the following command to start HDFS daemons in Command Prompt:

%HADOOP_HOME%\sbin\start-dfs.cmd

Two Command Prompt windows will open: one for the datanode and another for the namenode.

Step 9 - Start YARN daemons


Warning: You may encounter permission issues if you start the YARN daemons as a normal user. To avoid them, please open a Command Prompt window using Run as administrator. Alternatively, you can follow the comment on this page, which doesn't require Administrator permission and uses a local Windows account: https://xavipacheco.com/column/hadoop/377/latest-hadoop-321-installation-on-windows-10-step-by-step-guide#comment314

Run the following command in an elevated Command Prompt window (Run as administrator) to start YARN daemons:

%HADOOP_HOME%\sbin\start-yarn.cmd

Similarly, two Command Prompt windows will open: one for the resource manager and another for the node manager.

Step 10 - Useful Web portals exploration

The daemons also host websites that provide useful information about the cluster.

HDFS Namenode information UI
http://localhost:9870/dfshealth.html#tab-overview

HDFS Datanode information UI
http://localhost:9864/datanode.html

YARN resource manager UI
http://localhost:8088

Through the Resource Manager, you can also navigate to any Node Manager.
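
If a page does not load, it is useful to know whether the daemon behind it is answering at all. The sketch below, assuming Python 3 is available, sends a plain HTTP request to each of the three URLs listed in this step; WEB_UIS and is_reachable are illustrative names, and the ports are the ones used in this guide.

```python
import http.client
import urllib.error
import urllib.request

# URLs taken from this step; adjust the host if Hadoop runs elsewhere.
WEB_UIS = {
    "namenode": "http://localhost:9870/dfshealth.html",
    "datanode": "http://localhost:9864/datanode.html",
    "resourcemanager": "http://localhost:8088",
}


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP GET to `url` succeeds with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status or a malformed reply.
        return False


if __name__ == "__main__":
    for name, url in WEB_UIS.items():
        print(name, "up" if is_reachable(url) else "down")
```

A daemon reported as down usually means its Command Prompt window has closed or is showing an error.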

Step 11 - Shutdown YARN and HDFS daemons

You don't need to keep the services running all the time. You can stop them by running the following commands one by one:

%HADOOP_HOME%\sbin\stop-yarn.cmd
%HADOOP_HOME%\sbin\stop-dfs.cmd
