◆ 前沿
因为业务要求api的一次请求响应时间在10ms以内,所以传统的数据库查询操作直接被排除(网络io和磁盘io)。通过调研,最终使用了bitset,目前已经正常运行了很久
文章末尾有测试的数据
构建环境:springboot+mybatis-plus
完整代码及sql下载地址:
链接:https://pan.baidu.com/s/1b2scjH3gZIhY5Jeq2Byghw
提取码:cxyk
代码使用说明:直接把类拷贝到项目中,改下导入的包就行了
bitset介绍
看JDK中的解释简直一头雾水,用我自己的理解概括一下
1. bitset的内部实现是long数组
2. set中每一个位的默认值为false(0)
3. bitset长度按需增长
4. bitset非线程安全
◆ 进入正题,实现bitset毫秒级查询
有一张user表
CREATE TABLE `user` (`id` int(11) NOT NULL AUTO_INCREMENT,`name` varchar(255) NOT NULL,`address` varchar(255) NOT NULL comment '地址',`gender` varchar(10) NOT NULL comment '性别',`age` varchar(10) NOT NULL,PRIMARY KEY (`id`)) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;
假设我们要查询“北京市18岁的女生”,那么对应的sql如下:
select * from `user` where address='北京' and age='18' and gender='girl';
如何使用bitset实现同样的查询呢?
1.将user表数据加载进内存中
2.为user表建立address,age,gender维度的bitset索引
3.根据索引查询数据
1.将user表数据加载进内存中
将user表从数据库读取出来存入List,这只用做一步就行
User实体类:
public class User implements Cloneable {private String name;private String address;private String gender;private String age;@Overridepublic String toString() {return "User [name=" + name + ", address=" + address + ", gender=" + gender + ", age=" + age + "]";}@Overridepublic User clone() {User user = null;try {user = (User) super.clone();} catch (CloneNotSupportedException e) {e.printStackTrace();}return user;}//省略get set 方法。。。
2.建立索引
创建bitset索引模型类
import java.util.ArrayList;import java.util.BitSet;import java.util.List;import java.util.Map;import java.util.concurrent.ConcurrentHashMap;public class BitSetIndexModel {private String type;//索引类型:address,age,genderprivate ConcurrentHashMap<String, Integer> map;//索引类型和bitSet在bsList中下标的映射关系private List<String> list;//索引类型的值集合,例如gender:girl,boyprivate List<BitSet> bsList;//bitset集合public BitSetIndexModel() {}public BitSetIndexModel(String type) {this.type = type;map = new ConcurrentHashMap<String, Integer>();list = new ArrayList<String>();bsList = new ArrayList<BitSet>();}public String getType() {return type;}public void setType(String type) {this.type = type;}public Map<String, Integer> getMap() {return map;}public void setMap(ConcurrentHashMap<String, Integer> map) {this.map = map;}public List<String> getList() {return list;}public void setList(List<String> list) {this.list = list;}public List<BitSet> getBsList() {return bsList;}public void setBsList(List<BitSet> bsList) {this.bsList = bsList;}/**** @param str* @param i*/public void createIndex(String str, int i) {BitSet bitset = null;//获取‘str'对应的bitset在bsList中的下标Integer index = this.getMap().get(str);if (index != null) {bitset = this.getBsList().get(index);if (bitset == null) {bitset = new BitSet();this.getBsList().add(index, bitset);}bitset.set(i, true);// 将str对应的位置为true,true可省略} else {bitset = new BitSet();List<String> list = this.getList();list.add(str);index = list.size() - 1;bitset.set(i, true);this.getBsList().add(bitset);this.getMap().put(str, index);}}/*** 从entity里拿出符合条件的bitset** @param str* @return*/public BitSet get(String str) {BitSet bitset = null;str = str.toLowerCase();Integer index = this.getMap().get(str);if (index != null) {bitset = this.getBsList().get(index);} else {bitset = new BitSet();}return bitset;}/*** bitset的与运算** @param str* @param bitset* @return*/public BitSet and(String str, BitSet bitset) {if (str != null) {str = str.toLowerCase();if (bitset != null) {bitset.and(get(str));} else {bitset = new BitSet();bitset.or(get(str));}}return bitset;}/*** bitset的或运算** @param str* @param bitset* @return*/public BitSet or(String str, BitSet bitset) {if (str != null) {str = str.toLowerCase();if (bitset != null) {bitset.or(get(str));} else {bitset = new BitSet();bitset.or(get(str));}}return bitset;}/*** 获取bitset值为true的 即 把 bitset翻译为list的索引** @param bitset* @return*/public static List<Integer> getRealIndexs(BitSet bitset) {List<Integer> indexs = new ArrayList<Integer>();if (bitset != null) {int i = bitset.nextSetBit(0);if (i != -1) {indexs.add(i);for (i = bitset.nextSetBit(i + 1); i >= 0; i = bitset.nextSetBit(i + 1)) {int endOfRun = bitset.nextClearBit(i);do {indexs.add(i);} while (++i < endOfRun);}}}return indexs;}}
为每一个user对象创建address,gender,age维度索引
public class UserIndexStore {private static final String ADDRESS = "address";private static final String GENDER = "gender";private static final String AGE = "age";private BitSetIndexModel address;private BitSetIndexModel gender;private BitSetIndexModel age;private ConcurrentHashMap<Integer, User> userMap;//存储所有的user数据public static final UserIndexStore INSTANCE = getInstance();private UserIndexStore() {init();}public static UserIndexStore getInstance() {return UserIndexStoreHolder.instance;}private static class UserIndexStoreHolder {private static UserIndexStore instance = new UserIndexStore();}private void init() {this.address = new BitSetIndexModel(ADDRESS);this.gender = new BitSetIndexModel(GENDER);this.age = new BitSetIndexModel(AGE);userMap = new ConcurrentHashMap<Integer, User>();}/*** 构建索引* @param users*/public void createIndex(List<User> users) {if (users != null && users.size() > 0) {for (int index = 0; index < users.size(); index++) {User user = users.get(index);getAddress().update(user.getAddress(), index);getGender().update(user.getGender(), index);getAge().update(user.getAge(), index);this.userMap.put(index, user);}}}public BitSet query(String address, String gender, String age) {BitSet bitset = null;bitset = getAddress().and(address, bitset);bitset = getGender().and(gender, bitset);bitset = getAge().and(age, bitset);return bitset;}public User findUser(Integer index) {User user = this.userMap.get(index);if (user != null) {return user.clone();//可能会对user做修改操作,要保证内存原始数据不变}return null;}public BitSetIndexModel getAddress() {return address;}public void setAddress(BitSetIndexModel address) {this.address = address;}public BitSetIndexModel getGender() {return gender;}public void setGender(BitSetIndexModel gender) {this.gender = gender;}public BitSetIndexModel getAge() {return age;}public void setAge(BitSetIndexModel age) {this.age = age;}}
◆ 测试
测试结果(查询2w次):
数据量(users.size()) | 并发数 | 平均查询时间
10w | 10 | 9ms
测试电脑:本机配置,服务器应该更快

测试说明:第一次和第二次测试不算,因为系统需要预热
◆ 总结
==优点==:
通过测试发现随着数据量的增大和并发数的提高,平均耗时并没有明显升高,并且响应时间都在10ms以内
==缺点==:
1.不适合数据量太大的情况,因为需要把数据全部加载进内存
2.不适合复杂查询
3.不适合对name,id等唯一值做查询
◆ 后记
因为我们的查询业务比较简单,唯一的要求是速度,并且数据量也不大,每张表的数据量都不超过100w,所以使用这种方式比较合适。
在本篇文章中只谈到了如何创建索引,以及最基本的查询,在下一篇中我会继续说明如何更新索引,以及一些复杂查询,比如<,>,between and。
如果对以上内容有想交流的可以:






