
Cloud Pak Geospatial Data Research Notes | IBM


Current Landscape

Globally, using commercial geographic data for site selection and consumer geo-segmentation is already common practice in developed economies. To serve China's continuously upgrading consumers more precisely, IKEA, McDonald's, Starbucks, and others have set up dedicated commercial geo-analytics teams to guide store siting in China. McKinsey's "Decoding China" commercial geo-analytics team has likewise felt increasingly strong demand from clients.

Government departments analyze the distribution of water and land resources in a region in order to better optimize the spatial pattern of land development and the layout of infrastructure, and to comprehensively promote the conservation, protection, use, and management of resources, further advancing the building of an ecological civilization.

We are living through enormous changes in the commercial world. In the big-data era, the transformation of retail site selection has quietly begun. As the old saying goes, "a single step can separate three markets" (一步差三市): a well-chosen location plays a crucial role in how a business performs. Using commercial geographic data for site selection and consumer geo-segmentation is becoming the direction of intelligent commerce.

Technology Trends and Pain Points

Widely used products and tools on the global market include ArcGIS, GeoTools, GDAL, GEOS, and JTS. Against the backdrop of China's push for domestically developed, independently controllable software, the Kingbase (人大金仓) database and PostgreSQL are also directions for future development.

Geospatial data integration products such as FME are seeing increasingly broad use on the data application side, for example to distribute and share processed data.

Common problems enterprises face when applying geospatial data today include: how to make geospatial data respond to business needs more quickly, approaching real-time analysis; and, in the context of smart cities, how to support broader derived applications, such as identifying undervalued areas ("value depressions") and analyzing living circles (生活圈).

What I'll Share

Next, let me share IBM's geospatial data products and demos:

Db2 Warehouse is an analytics data warehouse with in-memory data processing and in-database analytics capabilities.

It is client-managed and optimized for fast, flexible deployment, with automatic scaling to support analytics workloads. Based on the number of worker nodes you select, Cloud Pak for Data automatically creates the appropriate warehouse environment: with a single node, the warehouse uses a symmetric multiprocessing (SMP) architecture for cost efficiency; with two or more nodes, it is deployed with a massively parallel processing (MPP) architecture for high availability and improved performance.

Advantages

Lifecycle management: as with a cloud service, installing, upgrading, and managing Db2 Warehouse is easy; a Db2 Warehouse database can be deployed in minutes.

Rich ecosystem: data management console, REST interfaces, and graph capabilities.

Extended availability: Db2 Warehouse with a multi-tier recovery strategy.

Storage: supports software-defined storage such as OCS and IBM Spectrum Scale CSI.

DB2WH (Cloud Pak) Geospatial Data Support and Extensions

Supported geospatial data types

[Figure: table of supported geospatial data types]

The following products and development languages are supported:

1. Esri ArcGIS: you can use Esri ArcGIS for Desktop version 10.3.1 together with your warehouse to analyze and visualize geospatial data.
2. Python: the ibmdbPy package provides methods to read data from, write data to, and sample data from a Db2 database. It also provides access methods for in-database analytic and geospatial functions (a minimal connection sketch follows this list).
3. R: use the RStudio development environment provided by IBM Watson Studio, or use ODBC to connect your own, locally installed R development environment to your database.
4. SQL and stored procedures.
5. The in-database SpatialData modules: Db2 Spatial Extender and Db2 Spatial Analytics.
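To make the Python route concrete, here is a minimal connection sketch using ibmdbPy. The ODBC data source name, credentials, and table name are placeholders, not values from these notes:

from ibmdbpy import IdaDataBase, IdaDataFrame

# Placeholder ODBC data source and credentials; substitute your own
idadb = IdaDataBase(dsn='BLUDB', uid='<UID>', pwd='<PWD>')

# Open a lazy proxy over an existing table (hypothetical name); rows are fetched on demand
idadf = IdaDataFrame(idadb, 'GEO.STORES')
print(idadf.head())  # pull a small sample to verify the connection

idadb.close()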

Esri ArcGIS - Creating an ArcGIS Enterprise Geodatabase

1. Install and configure DB2WH (Cloud Pak). On the DB2WH (Cloud Pak) server, create an operating-system login account named sde; you will connect to the database through the sde account to create the geodatabase.
2. Create a DB2WH (Cloud Pak) database and register it with the Spatial Extender module. Grant the sde user DBADM authority on the database (a scripted version of this grant follows these steps).
3. Configure the client. On a 64-bit operating system, install the Db2 client by running the 64-bit executable; it installs both the 32-bit and 64-bit files, so you can connect from both 32-bit and 64-bit ArcGIS clients (IBM dataserver64-v11.5.6_ntx64_rtcl.exe).
4. Create the geodatabase. Connect to the Db2 database through the sde login account, making sure the sde password is saved in the database connection dialog. Right-click the database connection and click Enable Geodatabase; the Enable Enterprise Geodatabase tool opens.
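For step 2, the privilege grant can also be scripted. The sketch below uses the ibm_db Python driver; the connection parameters are placeholders, and you should confirm against your site's security policy before granting DBADM:

import ibm_db

# Placeholder connection string; substitute your host, database, and admin credentials
conn = ibm_db.connect(
    "DATABASE=BLUDB;HOSTNAME=<HOST>;PORT=50000;PROTOCOL=TCPIP;"
    "UID=<ADMIN_USER>;PWD=<PASSWORD>;", "", "")

# Grant the sde user database administrator authority, as the ArcGIS setup requires
ibm_db.exec_immediate(conn, "GRANT DBADM ON DATABASE TO USER sde")

ibm_db.close(conn)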


The SpatialData Module

1. Key concepts
Geometry types: points, linestrings, polygons, and so on.
Coordinate system: a geographic coordinate system uses a three-dimensional spherical surface to determine locations on the earth.
Data types: ST_Point, ST_LineString, ST_Polygon, ST_MultiPoint, ST_MultiLineString, ST_MultiPolygon, and ST_Geometry when you are not sure which of the other data types to use.
2. Performance tuning
Specify inline lengths for geospatial columns.
Register spatial columns: call st_register_spatial_column().
Filter using a bounding box.
A hedged sketch of these three tuning techniques follows this list.
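As a concrete, deliberately hedged sketch of the three tuning techniques, the statements below are issued through the ibm_db driver with placeholder credentials. The table name is hypothetical, the SRS name 'WGS84_SRS_1003' and srs_id 1003 are Db2's usual WGS84 defaults, and the output parameters of ST_REGISTER_SPATIAL_COLUMN are passed as dummies, so check the exact signature in the Db2 Spatial documentation:

import ibm_db

conn = ibm_db.connect(
    "DATABASE=BLUDB;HOSTNAME=<HOST>;PORT=50000;PROTOCOL=TCPIP;"
    "UID=<UID>;PWD=<PWD>;", "", "")

# 1. Inline length: keep small geometries in-row instead of in LOB storage
ibm_db.exec_immediate(conn,
    "CREATE TABLE stores (id INTEGER NOT NULL PRIMARY KEY, "
    "geo DB2GSE.ST_POINT INLINE LENGTH 164)")

# 2. Register the spatial column with its spatial reference system;
#    the last two arguments stand in for the procedure's output parameters (msg code/text)
ibm_db.callproc(conn, 'DB2GSE.ST_REGISTER_SPATIAL_COLUMN',
                (None, 'STORES', 'GEO', 'WGS84_SRS_1003', 0, ''))

# 3. Bounding-box filter: a cheap pre-filter before finer spatial predicates
stmt = ibm_db.exec_immediate(conn,
    "SELECT id FROM stores "
    "WHERE DB2GSE.EnvelopesIntersect(geo, -74.1, 40.6, -73.7, 41.1, 1003) = 1")
row = ibm_db.fetch_tuple(stmt)

ibm_db.close(conn)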

Main SpatialData Components

1. Db2 Spatial Extender / Db2 Spatial Analytics (its successor)

Functions provided by the Db2 Spatial Extender component can be used to analyze data stored in row-organized/column-organized tables. Spatial Extender stores geospatial data in special data types, each of which can hold up to 4 MB.

2. Enabling Db2 Spatial Analytics:

CALL SYSPROC.SYSINSTALLOBJECTS('GEO', 'C', CAST (NULL AS VARCHAR(128)), CAST (NULL AS VARCHAR(128)))

3. Db2 Spatial Extender/Analytics interfaces

Db2 Spatial has a wide variety of interfaces to help you set up and create projects that use spatial data:

Db2 Spatial Extender stored procedures called from application programs.

SQL queries that you submit from application programs.

Open-source projects that support Spatial Extender, such as:

GeoTools is a Java library for building spatial applications. For more information, see http://·/docs/en/cloud-paks/cp-data/3.5.0?topic=scripts-geospatio-temporal-library

https://·/docs/en/cloud-paks/cp-data/3.5.0?topic=libraries-geospatio-temporal-library#getting-started-with-the-library

Geospatio-temporal library - Functions

Topological functions

With the spatio-temporal library, you can use topological relations to confine the returned results of your location data analysis.

Get the aggregated bounding box for a list of geometries.

# stc is an STContext created earlier (see "Using Spark in Watson Studio" below):
#   from pyst import STContext; stc = STContext(spark.sparkContext._gateway)
# white_plains_WKT and manhattan_WKT are WKT polygon strings defined like westchester_WKT (values omitted)
westchester_WKT = 'POLYGON((-73.984 41.325,...,-74.017 40.698,-74.019 40.698,-74.023 40.703,-74.023 40.709))'
wkt_reader = stc.wkt_reader()
westchester = wkt_reader.read(westchester_WKT)
white_plains = wkt_reader.read(white_plains_WKT)
manhattan = wkt_reader.read(manhattan_WKT)

white_plains_bbox = white_plains.get_bounding_box()
westchester_bbox = westchester.get_bounding_box()
manhattan_bbox = manhattan.get_bounding_box()

# Expand the White Plains box until it also contains the Westchester and Manhattan boxes
aggregated_bbox = white_plains_bbox.get_containing_bb(westchester_bbox).get_containing_bb(manhattan_bbox)


Geohashing functions

The spatio-temporal library includes geohashing functions for proximity search (encoding latitude and longitude and grouping nearby points) in location data analysis.

Geohash coverage

test_wkt = 'POLYGON((-73.76223024988917 41.04173285255264,-73.7749331917837 41.04121496082817,-73.78197130823878 41.02748934524744,-73.76476225519923 41.023733725449326,-73.75218805933741 41.031633228865495,-73.7558787789419 41.03752486433286,-73.76223024988917 41.04173285255264))'
poly = wkt_reader.read(test_wkt)

# Compute the set of geohashes covering the polygon at a bit depth of 36
cover = stc.geohash.geohash_cover_at_bit_depth(poly, 36)


Geospatial indexing functions

With the spatio-temporal library, you can use functions to index points within a region, on a region containing points, and points within a radius to enable fast queries on this data during location analysis.

>>> tile_size = 100000
>>> # county_df: a DataFrame of county names and geometries prepared earlier
>>> si = stc.tessellation_index(tile_size=tile_size)  # leave bbox as None to use the full earth as the bounding box
>>> si.from_df(county_df, 'NAME', 'geometry', verbosity='error')
3221 entries processed, 3221 entries successfully added

Which counties are within 20 km of White Plains Hospital? The results are sorted by distance.

>>> # white_plains_hospital: a point geometry prepared earlier; distances are in meters
>>> counties = si.within_distance_with_info(white_plains_hospital, 20000)
>>> counties.sort(key=lambda tup: tup[2])
>>> for county in counties:
...     print(county[0], county[2])

Westchester 0.0

Fairfield 7320.602641166855

Rockland 10132.182241119823

Bergen 10934.1691335908

Bronx 15683.400292349625

Nassau 17994.425235412604

Ellipsoidal metrics

You can use ellipsoidal metrics to calculate the distance between points.

Compute the azimuth between two points, in radians:

>>> p1 = stc.point(47.1, -73.5)
>>> p2 = stc.point(47.6, -72.9)
>>> stc.eg_metric.azimuth(p1, p2)
0.6802979449118038
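The opening sentence of this section mentions computing distances between points. Assuming eg_metric exposes a distance method alongside azimuth (an inferred call; verify it against the library reference), it would be used the same way:

>>> stc.eg_metric.distance(p1, p2)  # hypothetical companion call: ellipsoidal distance between p1 and p2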

Routing functions

The spatio-temporal library includes routing functions that list the edges that yield a path from one node to another node.

Find the best route with minimal distance cost (the shortest route by distance):

# best_distance_route is assumed to have been computed earlier with the library's routing functions
# Check the distance cost, in meters
>>> best_distance_route.cost
2042.4082601271236

# Check the route path (showing only the first three points): a list of 3-tuples (osm_point_id, lat, lon)
>>> best_distance_route.path[:3]
[(2036943312, 33.7631862, -84.3939405),
 (3523447568, 33.7632666, -84.3939315),
 (2036943524, 33.7633273, -84.3939155)]

Spark engine - Analytics Engine

You can use Analytics Engine powered by Apache Spark as a compute engine to run analytical and machine-learning jobs. The service is not available by default; an administrator must install it on the IBM Cloud Pak for Data platform. If you have the Watson Studio service installed, Analytics Engine powered by Apache Spark automatically adds a set of default Spark environment definitions to analytics projects. You can also create custom Spark environment definitions in a project.

You can submit jobs to Spark clusters in two ways:
1. Specifying a Spark environment definition for a job in an analytics project
2. Running Spark job APIs

Each time you submit a job, a dedicated Spark cluster is created for it. You can specify the size of the Spark driver, the size of the executors, and the number of executors for the job. This enables you to achieve predictable and consistent performance. When a job completes, the cluster is automatically cleaned up so that the resources are available for other jobs. The service also includes interfaces that enable you to analyze the performance of your Spark applications and debug problems.

Spark APIs. You can run these types of workloads with the Spark jobs APIs: Spark applications that run Spark SQL, data transformation jobs, data science jobs, and machine-learning jobs.


Using Spark in Watson Studio

For Python: at the beginning of the cell, add %%writefile myfile.py to save the code as a Python file in your working directory. Notebooks that use the same runtime can then import this file. The advantage of this method is that the code is available in your notebook, and you can edit it and save it as a new Python script at any time. Use pyst, which supports most of the common geospatial formats, including shapefile, GeoJSON, and well-known text (WKT):

from pyst import STContext

# Register STContext, which is the main entry point
stc = STContext(spark.sparkContext._gateway)

For R: if you want to save code in a notebook as an R script in the working directory, you can use the writeLines(myfile.R) function. RStudio uses the sparklyr package to connect to Spark from R. The sparklyr package includes a dplyr interface to Spark data frames as well as an R interface to Spark's distributed machine-learning pipelines. There are two methods of connecting to Spark from RStudio.
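Returning to the Python side, a quick smoke test of the STContext registered above reuses only the wkt_reader and bounding-box calls shown earlier in these notes; the WKT polygon is a placeholder:

# stc is the STContext registered in the cell above
wkt_reader = stc.wkt_reader()
region = wkt_reader.read('POLYGON((-73.98 41.32,-74.02 40.70,-73.76 40.70,-73.98 41.32))')
print(region.get_bounding_box())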

Spark Jobs APIs

In IBM Cloud Pak for Data, you can run Spark jobs or applications on your IBM Cloud Pak for Data cluster without installing Watson Studio by using the Spark jobs REST APIs of Analytics Engine powered by Apache Spark.

1. Submitting Spark jobs (a Python version of all three calls follows this list):

curl -k -X POST <JOB_API_ENDPOINT> -H "Authorization: Bearer <ACCESS_TOKEN>" -d '{"engine":{"type":"spark"},"application_arguments":["/opt/ibm/spark/examples/src/main/resources/people.txt"],"application": "/opt/ibm/spark/examples/src/main/python/wordcount.py"}'

2. Viewing Spark job status:

curl -k -X GET <JOB_API_ENDPOINT> -H "Authorization: Bearer <ACCESS_TOKEN>"

3. Deleting Spark jobs:

curl -k -X DELETE <JOB_API_ENDPOINT>/<job-id> -H "Authorization: Bearer <ACCESS_TOKEN>"
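For scripting, the same three calls can be made from Python. This is a minimal sketch using the requests package, keeping the endpoint, token, and job ID as placeholders exactly as in the curl examples:

import requests

endpoint = "<JOB_API_ENDPOINT>"                      # same placeholder as in the curl examples
headers = {"Authorization": "Bearer <ACCESS_TOKEN>"}

# 1. Submit a Spark word-count job (verify=False mirrors curl's -k; avoid it in production)
payload = {
    "engine": {"type": "spark"},
    "application_arguments": ["/opt/ibm/spark/examples/src/main/resources/people.txt"],
    "application": "/opt/ibm/spark/examples/src/main/python/wordcount.py",
}
print(requests.post(endpoint, json=payload, headers=headers, verify=False).json())

# 2. View Spark job status
print(requests.get(endpoint, headers=headers, verify=False).json())

# 3. Delete a Spark job by ID
requests.delete(f"{endpoint}/<job-id>", headers=headers, verify=False)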

A Real-World Case

Objectives:

1. Determine the flood impact area: analyze the imagery and load the result into the database in WKT format.
2. Based on the WKT impact area, count the number of affected contract holders (承包人), and use machine learning to produce an insurance plan (a hedged SQL sketch of the counting step follows).
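As a hedged sketch of the counting step, assume the contract holders' locations live in a Db2 table with a registered ST_Point column (the table and column names are hypothetical, and SRS 1003 / WGS84 is assumed). ST_Within and ST_Polygon are documented Db2 Spatial functions, and the flood WKT is a placeholder produced by the imagery analysis step:

import ibm_db

conn = ibm_db.connect(
    "DATABASE=BLUDB;HOSTNAME=<HOST>;PORT=50000;PROTOCOL=TCPIP;"
    "UID=<UID>;PWD=<PWD>;", "", "")

# Count contract holders whose location falls inside the flood polygon
flood_wkt = "POLYGON((<flood impact area vertices>))"  # from the imagery analysis step
stmt = ibm_db.prepare(conn, """
    SELECT COUNT(*) FROM contract_holders
    WHERE DB2GSE.ST_Within(location, DB2GSE.ST_Polygon(CAST(? AS CLOB), 1003)) = 1""")
ibm_db.execute(stmt, (flood_wkt,))
print(ibm_db.fetch_tuple(stmt)[0])

ibm_db.close(conn)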


Reference code: Insurance Loss Estimation using Remote Sensing

https://dataplatform.cloud.ibm.com/exchange/public/entry/view/14ea8dfab582137c695a6630e90cdc32?context=cpdaas


This post covers a lot of ground; if you have any questions, please contact the author.


