This repository provides a streamlined and efficient solution for importing data from CSV files into an Elasticsearch instance.
Before you begin, please ensure you have the following prerequisites in place:
- IP Geolocation Database Files: Download the required IP geolocation database files and unzip them to get the data in .csv format.
- Elasticsearch: Install and ensure Elasticsearch is up and running. Refer to the Elasticsearch installation guide if you need assistance.
- Logstash: Install and ensure Logstash is up and running. Refer to the Logstash installation guide if you need assistance.
Once the prerequisites are met, you can proceed with indexing the data into an Elasticsearch instance.
- If you have subscribed to DB-I through DB-VII, follow the steps below.
- If you have subscribed to DB-V through DB-VII, which also include the security database, index it as well by following the corresponding steps:
This index will contain data from the 'db-place.csv' file.
- Create an index called 'place_db'.
curl -X PUT "localhost:9200/place_db"
- Define the mapping for the 'place_db' index. In Elasticsearch, a "mapping" is similar to a schema definition in a traditional database. It defines how documents and their fields are stored and indexed, including their data types, field-specific indexing options, analyzers, and other settings. If you want to see or modify the mappings, just update the corresponding .json file here.
curl -s -X PUT "localhost:9200/place_db/_mapping" -H 'Content-Type: application/json' --data-binary "@index_mapping/place_db_mapping.json"
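For illustration, a mapping file of this kind has roughly the following shape (the field names here are hypothetical examples, not the actual contents of place_db_mapping.json):

```json
{
  "properties": {
    "country_name": { "type": "keyword" },
    "city": { "type": "keyword" },
    "latitude": { "type": "float" },
    "longitude": { "type": "float" }
  }
}
```

Each entry under "properties" tells Elasticsearch how to store and index that field; keyword fields support exact matching, while numeric types support range queries.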
- Use Logstash to push data from the .csv file to an Elasticsearch instance. You can find the corresponding configuration files here. These files contain the necessary configuration to map CSV data onto an index.
Note: Before running the command below, make sure to add the path of your place file (downloaded from ipgeolocation.io) inside place_db.conf.
Note: Make sure the data directory /var/lib/logstash/place/ has write permissions.
/usr/share/logstash/bin/logstash -f {path_where_repo_clone}/ipgeo-es-db-reader/logstash_config/place_db.conf --path.data /var/lib/logstash/place/
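As a rough sketch, a Logstash configuration of this shape (simplified; the column names below are hypothetical, the real file is logstash_config/place_db.conf) typically contains an input, a csv filter, and an elasticsearch output:

```
input {
  file {
    path => "/path/to/db-place.csv"          # replace with your downloaded file
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["country_name", "city"]      # hypothetical column names
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "place_db"
  }
}
```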
This index will contain country data.
- Create an index called 'country_db'.
curl -X PUT "localhost:9200/country_db"
- Define the mapping for the 'country_db' index.
curl -X PUT "localhost:9200/country_db/_mapping" -H 'Content-Type: application/json' --data-binary "@index_mapping/country_db_mapping.json"
- Create an enrich policy. Enrich policies are used to enrich documents with additional data from external sources before indexing them. This enrichment enhances search by adding relevant information to the documents that may not be present in the original dataset. Here we enhance the 'country_db' index by incorporating data from the 'place_db' index. You can find the enrich policies listed here.
curl -X PUT "localhost:9200/_enrich/policy/place-db-enrich-policy" -H 'Content-Type: application/json' --data-binary "@enrich_policies/country_db_enrich_policy.json"
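For reference, a match-type enrich policy has roughly this shape (the field names below are hypothetical; the actual definition lives in enrich_policies/country_db_enrich_policy.json):

```json
{
  "match": {
    "indices": "place_db",
    "match_field": "place_id",
    "enrich_fields": ["country_name", "city"]
  }
}
```

"match_field" is the field looked up in the source index, and "enrich_fields" are copied into the document being indexed when a match is found.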
- Execute the place-db-enrich-policy.
curl -X POST "localhost:9200/_enrich/policy/place-db-enrich-policy/_execute"
- Create an ingest pipeline for pushing data. An ingest pipeline is helpful during document indexing: it allows us to pre-process documents before they are indexed, enabling various transformations and enrichments of the data. This pipeline will use the place-db-enrich-policy enrich policy for pre-processing.
curl -X PUT "localhost:9200/_ingest/pipeline/country-db-enrich-pipeline" -H 'Content-Type: application/json' --data-binary "@pipelines_processors/country_db_pipeline_processor.json"
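A pipeline that applies an enrich policy uses an enrich processor, roughly like this (field names are hypothetical; the real definition is in pipelines_processors/country_db_pipeline_processor.json):

```json
{
  "processors": [
    {
      "enrich": {
        "policy_name": "place-db-enrich-policy",
        "field": "place_id",
        "target_field": "place_data"
      }
    }
  ]
}
```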
- Use Logstash to push data from the .csv file to an Elasticsearch instance. You can find the corresponding configuration files here. These files contain the necessary configuration to map CSV data onto an Elasticsearch index. Additionally, they include the pipelines used to enrich the data.
Note: Before running the command below, make sure to add the path of your country file (downloaded from ipgeolocation.io) inside country_db.conf.
Note: Make sure the data directory /var/lib/logstash/country/ has write permissions.
/usr/share/logstash/bin/logstash -f {path_where_repo_clone}/ipgeo-es-db-reader/logstash_config/country_db.conf --path.data /var/lib/logstash/country/
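To route documents through the ingest pipeline, the elasticsearch output section of the Logstash configuration names it, roughly like this (a sketch; the actual file is logstash_config/country_db.conf):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "country_db"
    pipeline => "country-db-enrich-pipeline"
  }
}
```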
This will be our main index, storing information about the geolocation of an IP.
- Create an index called 'geolocation_db'.
curl -X PUT "localhost:9200/geolocation_db"
- Define the mapping for the 'geolocation_db' index.
curl -X PUT "localhost:9200/geolocation_db/_mapping" -H 'Content-Type: application/json' --data-binary "@index_mapping/geolocation_db_mapping.json"
- Create an enrich policy.
curl -X PUT "localhost:9200/_enrich/policy/country-db-enrich-policy" -H 'Content-Type: application/json' --data-binary "@enrich_policies/geolocation_db_enrich_policy.json"
- Execute the country-db-enrich-policy.
curl -X POST "localhost:9200/_enrich/policy/country-db-enrich-policy/_execute"
- Create a pipeline for pushing data.
curl -X PUT "localhost:9200/_ingest/pipeline/geolocation-db-enrich-pipeline" -H 'Content-Type: application/json' --data-binary "@pipelines_processors/geolocation_db_pipeline_processor.json"
- Use Logstash to push data from the .csv file to an Elasticsearch instance. Here you will see a few extra options:
-w sets the number of pipeline workers, which can improve processing throughput at the cost of increased resource usage. The optimal value corresponds to the number of processors in your machine. -b sets the pipeline batch size, allowing each worker to collect and process that many events before sending them to the outputs.
Note: Before running the command below, make sure to add the path of your geolocation file (downloaded from ipgeolocation.io) inside geolocation_db_{DB-Version}.conf.
Note: Make sure the data directory /var/lib/logstash/geolocation/ has write permissions.
/usr/share/logstash/bin/logstash -f {path_where_repo_clone}/ipgeo-es-db-reader/logstash_config/geolocation_db_{DB-Version}.conf --path.data /var/lib/logstash/geolocation/ -w {choose as per your resources} -b 150
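If you are unsure what to pass to -w, one option is to derive it from the machine's core count, for example (assuming a Linux system where nproc is available):

```shell
# Use the number of CPU cores as the pipeline worker count (assumption: nproc is available)
WORKERS=$(nproc)
echo "Suggested Logstash flags: -w ${WORKERS} -b 150"
```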
This index will contain data from the 'db-security.csv' file.
- Create an index called 'proxy_db'.
curl -X PUT "localhost:9200/proxy_db"
- Define the mapping for the 'proxy_db' index. If you want to see or modify the mappings, just update the corresponding .json file here.
curl -s -X PUT "localhost:9200/proxy_db/_mapping" -H 'Content-Type: application/json' --data-binary "@index_mapping/proxy_db_mapping.json"
- Use Logstash to push data from the .csv file to an Elasticsearch instance. You can find the corresponding configuration files here. These files contain the necessary configuration to map CSV data onto an index.
Note: Before running the command below, make sure to add the path of your proxy file (downloaded from ipgeolocation.io) inside proxy_db.conf.
Note: Make sure the data directory /var/lib/logstash/proxy/ has write permissions.
/usr/share/logstash/bin/logstash -f {path_where_repo_clone}/ipgeo-es-db-reader/logstash_config/proxy_db.conf --path.data /var/lib/logstash/proxy/
If you need to determine the geolocation of an IP address, simply execute the following command on the machine where your Elasticsearch indices are configured:
curl -X GET "localhost:9200/geolocation_db/_search?pretty" -H "Content-Type: application/json" -d '
{
"query": {
"bool": {
"must": [
{
"range": {
"start_ip": {
"lte": "1.1.1.1"
}
}
},
{
"range": {
"end_ip": {
"gte": "1.1.1.1"
}
}
}
]
}
}
}'
If you want to assess the security of an IP address:
curl -X GET "localhost:9200/proxy_db/_search?pretty" -H "Content-Type: application/json" -d '
{
"query": {
"match": {
"ip": "104.215.214.120"
}
}
}'