借助百度LBS开放平台,在地图上展示北京招聘运维工程师的公司地址[1]。
数据搜集
首先需要有地址信息,本文的地址来源是拉勾网。
#!/bin/bash
URL="http://www.lagou.com/jobs/list_%E8%BF%90%E7%BB%B4%E5%B7%A5%E7%A8%8B%E5%B8%88?kd=%E8%BF%90%E7%BB%B4%E5%B7%A5%E7%A8%8B%E5%B8%88&spc=1&pl=&gj=&xl=&yx=&gx=&st=&labelWords=&lc=&workAddress=&city=%E5%8C%97%E4%BA%AC&requestId=&pn="
echo >links.data
for page in $(seq 1 31)
do
echo "$URL$page"
curl -s "$URL$page" |grep jobs |grep title \
| awk -F "\"" '{print $2}' |tee -a links.data
done
接下来处理成“公司名称 地址”的形式。
#!/bin/bash
[ $# -lt 1 ] && exit 1
file=$1
rm -f yunwei.txt
while read id
do
date=`date +%s`
curl -s $id -o $date.html
name=`sed -n "/<h1 title/,/<\/h1>/p" $date.html |grep "<div" |awk -F '[>|<]' '{print $3}'`
address=`grep "var address" $date.html |awk -F "\"" '{print $2}' |awk '{print $1}' |awk -F "(" '{print $1}'`
echo "$name $address" |tee -a yunwei.txt
rm -f $date.html
done <$file
地址编码
借助百度Geocoding API,将地址转换为经纬度坐标[2]。
<?php
require_once 'function_core.php'; #实现的是curlGet函数
require_once 'config/config.php';
$ak = $config['geocoding']['ak'];
$config['output'] = "json";
$config['city'] = "北京";
if(isset($_POST['address'])) {
$config['address'] = $_POST['address'];
} elseif(isset($argv[1])) {
$config['address'] = $argv[1];
$config['address'] = urlencode($config['address']);
} else {
$config['address'] = urlencode("百度大厦");
}
if(isset($_POST['city'])) {
$city = $_POST['city'];
} elseif(isset($argv[2])) {
$city = $argv[2];
$city = urlencode($config['city']);
} else {
$city = urlencode($config['city']);
}
$apiUrl= "http://api.map.baidu.com/geocoder/v2/?";
$url = $apiUrl . "ak=" . $ak . "&address=" . $config['address'] . "&output=" . $config['output'] . "&city=" . $city;
$geoData = curlGet($url);
if(isset($_POST['shell'])) {
print_r(json_decode($geoData, true));
exit();
}
die($geoData);
转换成LBS数据管理后台批量导入数据要求的csv格式。
<?php
require_once 'function_core.php';
$config['mapApiUrl'] = "http://dev.annhe.net/corp_map/geocoding.php";
$data = file('data.txt');
$add = array();
foreach ( $data as $k => $v) {
$arr = explode(" ", $v);
$add[] = array('name' => trim($arr[0]), 'address' => trim($arr[1]));
}
$file = fopen('geotable.csv', 'w');
fwrite($file, "title,address,longitude,latitude,coord_type\n");
for($i=0; $i<count($add); $i++) {
$title = $add[$i]['name'];
$address = $add[$i]['address'];
$address = str_replace(' ', '', $address);
$address = str_replace('#', '', $address);
$data = array('address' => $address);
$geodata = curlPost($data, $config['mapApiUrl']);
$geodata = json_decode($geodata, true);
if(is_array($geodata) && array_key_exists('result', $geodata)) {
$longitude = $geodata['result']['location']['lng'];
$latitude = $geodata['result']['location']['lat'];
} else {
echo $address . " failed\n";
continue;
}
$coord_type = "3"; #注意这里一定是3,表示百度加密的经纬度坐标
fwrite($file, $title . "," . $address . "," . $longitude . "," . $latitude . "," . $coord_type . "\n");
}
fclose($file);
//print_r($add);
?>
地图展示
lbs后台数据管理,批量上传。获取图层编号,直接修改百度官网的示例,替换ak和图层编号。map_demo
可以看到:中关村、上地,国贸比较集中。
一些问题
经纬度误差
一开始csv中的coord_type设置为1,即 GPS经纬度坐标 ,得到的地图误差很大。原来经纬度坐标按照规定是需要加密的,使用百度geocoding API获取的经纬度坐标应该是百度加密的经纬度坐标[3][4],因此coord_type应设置为3,即 百度加密经纬度坐标。
可选,
1.GPS经纬度坐标
2.国测局加密经纬度坐标
3.百度加密经纬度坐标
4.百度加密墨卡托坐标
ak保密
客户端ak无需保密,服务端ak有些需要保密,例如LBS云存储的ak,最好是单独使用,设置IP白名单。
参考资料
[1]. http://developer.baidu.com/map/index.php?title=jspopular/guide/datalayer [2]. http://developer.baidu.com/map/index.php?title=webapi/guide/webservice-geocoding [3]. http://developer.baidu.com/map/index.php?title=open/question [4]. http://developer.baidu.com/map/index.php?title=lbscloud/api/geodata

发表回复