
Connecting IDEA to a Hadoop cluster on Linux, with an HDFS Java API demo

Java Technology Study Notes 2020-08-05

1. When calling a Hadoop cluster on Linux from a Windows environment, you need to configure:

1. Hadoop environment variables on Windows (HADOOP_HOME, with %HADOOP_HOME%\bin on the PATH)
2. winutils.exe and hadoop.dll added under hadoop/bin
Download from: https://github.com/steveloughran/winutils
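
If you prefer not to set a system-wide environment variable, the Hadoop client also honors the hadoop.home.dir JVM system property. A minimal sketch of my own (the path C:\hadoop is an assumption; point it at the directory whose bin\ holds winutils.exe and hadoop.dll):

public class WindowsHadoopSetup {
    public static void main(String[] args) {
        // assumption: C:\hadoop\bin contains winutils.exe and hadoop.dll;
        // must be set before the first Hadoop class is used
        System.setProperty("hadoop.home.dir", "C:\\hadoop");
        System.out.println("hadoop.home.dir = " + System.getProperty("hadoop.home.dir"));
    }
}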

2. Install the Hadoop plugin in IDEA

Big Data Tools

Note:

If clicking Test Connection fails, possible causes include:
1. The VM's firewall is still on, or the relevant port is not open (see the probe sketch after this list)
2. winutils.exe and hadoop.dll are missing
3. On a first connection there may simply be no data in HDFS yet
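
For cause 1, a quick sanity check is a plain TCP connect to the NameNode RPC port before blaming the plugin. A sketch of my own, with host and port taken from the fs.defaultFS used later in this post:

import java.net.InetSocketAddress;
import java.net.Socket;

public class NameNodePortProbe {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket()) {
            // if this times out, suspect the firewall or port configuration
            socket.connect(new InetSocketAddress("192.168.225.133", 9000), 3000);
            System.out.println("NameNode port 9000 is reachable");
        }
    }
}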

3. Exception

Error: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:823)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:846)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:907)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:756)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:494)
at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:127)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:208)
at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
at org.apache.hadoop.mapreduce.lib.input.DelegatingRecordReader.nextKeyValue(DelegatingRecordReader.java:89)

Solutions:

1. Do not call Hadoop's FileSystem#close.
2. Disable FileSystem's internal cache (this has a performance cost):
configuration.setBoolean("fs.hdfs.impl.disable.cache", true);
With the cache disabled you cannot use the singleton-bean project below, because a FileSystem instance must then be created and closed for every use, as sketched next.
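
A minimal sketch of option 2 (host and user borrowed from the demo in section 4): with the cache disabled, every FileSystem.get() builds a fresh instance, so each use creates and closes its own, and that per-call creation cost is the performance problem mentioned above.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PerUseFileSystem {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.225.133:9000");
        conf.setBoolean("fs.hdfs.impl.disable.cache", true); // fresh instance per get()
        try (FileSystem fs = FileSystem.get(new URI("hdfs://192.168.225.133:9000"), conf, "root")) {
            System.out.println(fs.exists(new Path("/"))); // fs is closed when the block exits
        }
    }
}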

4. HDFS Java API demo

# yml
hdfs:
  hdfsPath: hdfs://192.168.225.133:9000
  hdfsName: root
  bufferSize: 67108864
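
The config class below reads these values through a hdfsProperties holder. How the linked project binds them is not shown here; this is a sketch assuming standard @ConfigurationProperties binding:

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

// assumption: field names mirror the yml keys under the "hdfs" prefix
@Component
@ConfigurationProperties(prefix = "hdfs")
public class HdfsProperties {
    private String hdfsPath;  // hdfs://192.168.225.133:9000
    private String hdfsName;  // HDFS user, e.g. root
    private long bufferSize;  // 67108864 = 64 MB

    public String getHdfsPath() { return hdfsPath; }
    public void setHdfsPath(String hdfsPath) { this.hdfsPath = hdfsPath; }
    public String getHdfsName() { return hdfsName; }
    public void setHdfsName(String hdfsName) { this.hdfsName = hdfsName; }
    public long getBufferSize() { return bufferSize; }
    public void setBufferSize(long bufferSize) { this.bufferSize = bufferSize; }
}
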
# HDFSConfig
public Configuration getConfiguration() {
    Configuration configuration = new Configuration();
    configuration.set("fs.defaultFS", hdfsProperties.getHdfsPath());
    // disable the FileSystem cache (see the exception discussion in section 3)
    // configuration.setBoolean("fs.hdfs.impl.disable.cache", true);
    return configuration;
}

@Bean
public FileSystem getFileSystem() throws Exception {
    return FileSystem.get(new URI(hdfsProperties.getHdfsPath()),
            getConfiguration(), hdfsProperties.getHdfsName());
}

Project repository:

https://gitee.com/hzy100java/spring-hadoop-client.git

Why close() is unnecessary, per the underlying cache code:

//Cache.class (the FileSystem cache inside org.apache.hadoop.fs.FileSystem)
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
    FileSystem fs;
    synchronized (this) {
        fs = map.get(key);
    }
    if (fs != null) {
        return fs;
    }

    fs = createFileSystem(uri, conf);
    synchronized (this) { // refetch the lock again
        FileSystem oldfs = map.get(key);
        if (oldfs != null) { // a file system is created while lock is releasing
            fs.close(); // close the new file system
            return oldfs; // return the old file system
        }

        // now insert the new file system into the map
        if (map.isEmpty()
                && !ShutdownHookManager.get().isShutdownInProgress()) {
            ShutdownHookManager.get().addShutdownHook(clientFinalizer, SHUTDOWN_HOOK_PRIORITY);
        }
        fs.key = key;
        map.put(key, fs);
        if (conf.getBoolean("fs.automatic.close", true)) {
            toAutoClose.add(key);
        }
        return fs;
    }
}
